CN112905736B - Quantum theory-based unsupervised text emotion analysis method - Google Patents
- Publication number
- CN112905736B (application number CN202110113463.9A)
- Authority
- CN
- China
- Prior art keywords
- emotion
- text
- psd
- dictionary
- nsd
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06F16/3344: Query execution using natural language analysis
- G06F16/3335: Syntactic pre-processing, e.g. stopword elimination, stemming
- G06F16/3346: Query execution using probabilistic model
- G06F16/35: Clustering; Classification
- G06F40/216: Parsing using statistical methods
- G06F40/242: Dictionaries
- G06F40/30: Semantic analysis
- G06N7/01: Probabilistic graphical models, e.g. probabilistic networks
Abstract
The invention relates to an unsupervised text emotion analysis method based on quantum theory, comprising the following steps: creating two emotion dictionaries, namely a positive emotion dictionary PSD and a negative emotion dictionary NSD; preprocessing the positive emotion dictionary PSD, the negative emotion dictionary NSD and the texts in the corpus; constructing a quantum text representation model and extracting features from the preprocessed PSD, the preprocessed NSD and each text to construct a positive emotion dictionary density matrix $\rho_{PSD}$, a negative emotion dictionary density matrix $\rho_{NSD}$ and a text density matrix $\rho_{text}$; and obtaining the emotion classification result of each text by using a quantum relative entropy algorithm.
Description
Technical Field
The invention relates to the technical field of text emotion classification, in particular to an unsupervised text emotion analysis method.
Background
The internet has by now penetrated every aspect of society, politics and the economy, and affects people's daily lives. With the arrival of the internet age, social platforms have sprung up rapidly, breaking through the closed social modes of the past, providing a wider platform for open interaction between users, and bringing considerable convenience to daily life. More and more users now like to publish their own attitudes and comments on social platforms (such as microblogs and WeChat), and tens of thousands of terabytes of content emerge on these platforms every day, making them one of the main sources of information in people's daily lives. This information contains not only reports of objective facts but also a large number of subjective emotional expressions. Mining and identifying the emotional information it contains has important scientific research significance and economic value for fields such as public opinion analysis, marketing and investment prediction. The invention mainly studies the emotion of tweets and blog posts, the most common kind of text on social platforms, that is, text emotion analysis technology.
One core task of text emotion analysis is text representation. Text representation is a method of encoding the semantic information contained in a text string as real-valued vectors that a computer can process, while requiring those vectors to have strong expressive and discriminative power. Vector-based text representation methods, such as one-hot coding, term frequency-inverse document frequency and word embedding, therefore form the mainstream, and their performance has been verified on many large data sets. In recent years, the field of information retrieval has produced a series of notable results based on quantum probability theory, which show that quantum probability theory can serve as an extended mathematical framework for tasks such as text representation and document ranking. The most representative of these is the quantum language model proposed by Sordoni et al. for classical information retrieval tasks. As an extension of the classical language model, the quantum language model aims to solve the problem of term dependence and achieves good results.
Text representation in emotion analysis usually concerns long, document-level texts such as movie reviews and product reviews. Such text generally has complex semantic relationships, frequent interaction between terms and deep contextual dependence, and therefore requires a stronger representation learning model than information retrieval tasks do. The standard quantum language model uses one-hot coding to construct projection operators; with long texts this easily leads to the curse of dimensionality, and training the resulting high-dimensional density matrix exposes the problem that the quantum language model may fail to converge. Compared with vector-based representation methods, however, the density matrix in quantum theory can encode more semantic information, capturing second-order correlations between word vectors. Developing novel text representation models that combine quantum theory with density matrices is therefore a valuable topic.
Disclosure of Invention
The invention aims to overcome the defects of the prior art by providing an unsupervised quantum text emotion analysis method. The method constructs a positive and a negative emotion dictionary, represents each emotion dictionary and each subjective document as a density matrix, uses quantum relative entropy to calculate the similarity score between each subjective document and each of the two emotion dictionaries, and obtains the emotion classification result by comparing the similarity scores. The aim of the invention is achieved by the following technical scheme:
An unsupervised text emotion analysis method based on quantum theory comprises the following steps:
(1): creating two emotion dictionaries, namely a positive emotion dictionary PSD and a negative emotion dictionary NSD, wherein the positive emotion dictionary contains words with positive emotion polarities, and the negative emotion dictionary contains words with negative emotion polarities;
(2): preprocessing texts in the positive emotion dictionary PSD, the negative emotion dictionary NSD and the corpus;
(3): constructing a quantum text representation model, and extracting features from the preprocessed positive emotion dictionary PSD, the preprocessed negative emotion dictionary NSD and each text to construct a positive emotion dictionary density matrix $\rho_{PSD}$, a negative emotion dictionary density matrix $\rho_{NSD}$ and a text density matrix $\rho_{text}$, as follows:
the first step: obtaining the word vector $\vec{w}_i$ of each word in the PSD, the NSD and each text, and then normalizing it: $|w_i\rangle = \vec{w}_i / \|\vec{w}_i\|_2$;
the second step: based on the vector outer product, constructing the projection matrix $\Pi_i = |w_i\rangle\langle w_i|$ of each word in the positive emotion dictionary PSD, the negative emotion dictionary NSD and each text; the projection matrices of all words in the positive emotion dictionary PSD are combined into a positive emotion projection sequence $\Pi_{PSD} = \{\Pi_1, \Pi_2, \ldots, \Pi_r\}$, the projection matrices of all words in the negative emotion dictionary NSD are combined into a negative emotion projection sequence $\Pi_{NSD} = \{\Pi_1, \Pi_2, \ldots, \Pi_k\}$, and the projection matrices of all words in each text are combined into a text projection sequence $\Pi_{text} = \{\Pi_1, \Pi_2, \ldots, \Pi_t\}$, where $r$ is the number of words in the positive emotion dictionary PSD, $k$ is the number of words in the negative emotion dictionary NSD, and $t$ is the number of words contained in each text;
the third step: after obtaining the projection sequences $\Pi_{PSD}$, $\Pi_{NSD}$ and $\Pi_{text}$ of the positive emotion dictionary, the negative emotion dictionary and the text, formulating a likelihood function $\mathcal{L}(\rho)$ by the maximum likelihood estimation (MLE) method and training the positive emotion dictionary density matrix $\rho_{PSD}$, the negative emotion dictionary density matrix $\rho_{NSD}$ and the text density matrix $\rho_{text}$ respectively;
(4): using a quantum relative entropy algorithm to calculate the positive similarity score $S_p$ between the text density matrix $\rho_{text}$ and the positive emotion dictionary density matrix $\rho_{PSD}$, and the negative similarity score $S_n$ between $\rho_{text}$ and the negative emotion dictionary density matrix $\rho_{NSD}$;
(5): comparing the positive similarity score with the negative similarity score: if $S_p > S_n$, the emotion type of the text is positive, otherwise it is negative; the emotion classification result of each text is thereby obtained.
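The first two sub-steps of step (3) above (normalization and outer-product projection matrices) can be illustrated with a minimal sketch. It assumes NumPy is available; the function names `normalize` and `projection_sequence` and the random toy vectors are hypothetical and stand in for real word vectors.

```python
import numpy as np

def normalize(vectors: np.ndarray) -> np.ndarray:
    """Scale each row (one word vector) to unit L2 norm.

    Assumes no row is the zero vector.
    """
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / norms

def projection_sequence(vectors: np.ndarray) -> np.ndarray:
    """Return an array of shape (n_words, L, L) holding |w_i><w_i| per word."""
    unit = normalize(vectors)
    # One outer product per row: result[n, i, j] = unit[n, i] * unit[n, j]
    return np.einsum("ni,nj->nij", unit, unit)

# Hypothetical toy data: 3 words with 4-dimensional vectors
rng = np.random.default_rng(0)
toy_vectors = rng.normal(size=(3, 4))
projectors = projection_sequence(toy_vectors)
```

Each resulting matrix is symmetric, has unit trace and is idempotent, which is what makes it a rank-one projector.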
Further, in the step (1), the method for creating the positive emotion dictionary PSD and the negative emotion dictionary NSD is as follows:
the first step: selecting M groups of seed word pairs with opposite polarities to respectively form an initial positive emotion dictionary PSD and a negative emotion dictionary NSD;
the second step: selecting a corpus, extracting the adjectives and adverbs in the corpus with a part-of-speech tagger based on a hidden Markov model, and taking them as candidate emotion words $W_{hx}$; the part-of-speech tagger assigns a part of speech $t_i$ to each word $w_i$ of each sentence in the corpus, assuming that each part of speech $t_i$ depends only on the part of speech $t_{i-1}$ of the previous word, i.e. $P(t_i \mid t_{i-1})$, and that the probability of each word $w_i$ depends only on its part of speech $t_i$, i.e. $P(w_i \mid t_i)$; the part-of-speech tags that maximize the joint probability distribution are then selected as the parts of speech of the words: $\hat{t}_{1:n} = \arg\max_{t_1, \ldots, t_n} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})$;
the third step: using the pointwise mutual information-information retrieval (PMI-IR) algorithm to calculate the semantic association between each candidate emotion word $W_{hx}$ and all seed words in the positive emotion dictionary PSD and the negative emotion dictionary NSD, and taking it as the emotion score of the candidate emotion word;
the fourth step: for a candidate emotion word $W_{hx}$, if the emotion score $Score(W_{hx})$ is greater than 0, the word is a positive emotion word, and if $Score(W_{hx})$ is less than 0, it is a negative emotion word; according to this emotion attribute, the candidate emotion word $W_{hx}$ is added to the corresponding emotion dictionary.
In the third step, the semantic association may be calculated as: $Score(W_{hx}) = \sum_{Seed \in PSD} PMI(W_{hx}, Seed) - \sum_{Seed \in NSD} PMI(W_{hx}, Seed)$, with $PMI(W_{hx}, Seed) = \log_2 \frac{P(W_{hx}, Seed)}{P(W_{hx})\, P(Seed)}$,
wherein $W_{hx}$ denotes a candidate emotion word and $Seed$ denotes a seed word in each emotion dictionary; $PMI(W_{hx}, Seed)$ measures how often the candidate emotion word $W_{hx}$ co-occurs with the seed word: the larger the co-occurrence probability, the closer the relation and the higher the degree of association; $Score(W_{hx})$ is the emotion score of the candidate emotion word.
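The PMI-based emotion score can be sketched as follows. This is an illustrative sketch only: it assumes document-level co-occurrence counts, treats pairs with a zero count as contributing nothing, and uses a hypothetical four-document toy corpus; the helper names `pmi` and `emotion_score` are not from the source.

```python
import math
from typing import List, Set

def pmi(word: str, seed: str, docs: List[Set[str]]) -> float:
    """Pointwise mutual information from document-level co-occurrence counts."""
    n = len(docs)
    n_w = sum(1 for d in docs if word in d)
    n_s = sum(1 for d in docs if seed in d)
    n_ws = sum(1 for d in docs if word in d and seed in d)
    if n_w == 0 or n_s == 0 or n_ws == 0:
        return 0.0  # assumption: an undefined PMI contributes nothing
    # log2( P(w, s) / (P(w) P(s)) ) with probabilities estimated as counts / n
    return math.log2((n_ws * n) / (n_w * n_s))

def emotion_score(word: str, pos_seeds: List[str], neg_seeds: List[str],
                  docs: List[Set[str]]) -> float:
    """Association with positive seeds minus association with negative seeds."""
    positive = sum(pmi(word, s, docs) for s in pos_seeds)
    negative = sum(pmi(word, s, docs) for s in neg_seeds)
    return positive - negative

# Hypothetical toy corpus: each document is a set of tokens
toy_docs = [
    {"great", "good", "speech"},
    {"great", "good", "debate"},
    {"awful", "bad", "answer"},
    {"awful", "bad", "speech"},
]
score_great = emotion_score("great", ["good"], ["bad"], toy_docs)
score_awful = emotion_score("awful", ["good"], ["bad"], toy_docs)
```

With this toy corpus, "great" only co-occurs with the positive seed and "awful" only with the negative seed, so the two scores have opposite signs.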
In step (2), preprocessing the text in the positive emotion dictionary PSD, the negative emotion dictionary NSD and the corpus should include: correcting spelling errors, removing illegal characters of each dictionary and text, and removing useless words including stop words and punctuation marks based on an English standard stop word list.
In step (3), the GloVe tool can be used to obtain the word vectors $\vec{w}_i$ of the words in the PSD, the NSD and each text.
In the third step of step (3), for the positive emotion projection sequence $\Pi_{PSD}$, the training method is as follows:
The likelihood function $\mathcal{L}(\rho_{PSD})$ is defined as: $\mathcal{L}(\rho_{PSD}) = \prod_{i=1}^{r} \mathrm{tr}(\Pi_i \rho_{PSD})$,
wherein $\Pi_i$ is the projection matrix of the $i$-th word in the positive emotion projection sequence $\Pi_{PSD}$, $\rho_{PSD}$ is the density matrix of the positive emotion dictionary, $\mathrm{tr}$ is the matrix trace, $\mathrm{tr}(\Pi_i \rho_{PSD})$ represents the probability of occurrence of word $w_i$, and the likelihood function $\mathcal{L}(\rho_{PSD})$ represents the joint probability of the co-occurrence of all words in the positive emotion dictionary.
The objective function $F(\rho_{PSD})$ is defined as: $F(\rho_{PSD}) \equiv \max_{\rho_{PSD}} \log \mathcal{L}(\rho_{PSD}) = \max_{\rho_{PSD}} \sum_{i=1}^{r} \log \mathrm{tr}(\Pi_i \rho_{PSD})$.
$F(\rho_{PSD})$ represents solving for the maximum of the joint probability of all words of the positive emotion dictionary.
A global convergence algorithm is used, which defines an iteration direction $D_k$ and iteratively updates $\rho_{PSD}$ and the objective function $F(\rho_{PSD})$ until the maximum of $F(\rho_{PSD})$ is reached, and then outputs the positive emotion dictionary density matrix $\rho_{PSD}$.
The negative dictionary density matrix $\rho_{NSD}$ and the text density matrix $\rho_{text}$ are obtained by the same training method.
In step (4), the quantum relative entropy may be calculated as:
$S_p = \mathrm{tr}(\rho_{text}(\log \rho_{text} - \log \rho_{PSD}))$
$S_n = \mathrm{tr}(\rho_{text}(\log \rho_{text} - \log \rho_{NSD}))$
wherein $S_p, S_n \ge 0$; $S_p = 0$ if and only if $\rho_{text} = \rho_{PSD}$, and $S_n = 0$ if and only if $\rho_{text} = \rho_{NSD}$.
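For reference, the trace formula can be expanded in terms of eigenvalues, which is how it is typically evaluated numerically. The eigen-decomposition symbols $\lambda_i$, $u_i$, $\mu_j$ and $v_j$ below are introduced here for illustration and do not appear in the source; the usual convention $0 \log 0 = 0$ is assumed.

```latex
\rho_{text} = \sum_i \lambda_i \, |u_i\rangle\langle u_i| , \qquad
\rho_{PSD}  = \sum_j \mu_j \, |v_j\rangle\langle v_j|

S_p = \mathrm{tr}\bigl(\rho_{text}(\log\rho_{text} - \log\rho_{PSD})\bigr)
    = \sum_i \lambda_i \log \lambda_i
      \;-\; \sum_{i,j} \lambda_i \, |\langle u_i | v_j \rangle|^2 \, \log \mu_j
```

The value is finite only when the support of $\rho_{text}$ lies inside the support of $\rho_{PSD}$; the same expansion applies to $S_n$ with $\rho_{NSD}$ in place of $\rho_{PSD}$.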
The beneficial effects of the invention are as follows:
(1) Constructing a high-quality positive emotion dictionary and a high-quality negative emotion dictionary, and expressing two basic emotions of human beings;
(2) Based on quantum probability theory, extracting text features, constructing a density matrix, and encoding term semantics and probability distribution information;
(3) Based on quantum relative entropy, similarity between density matrixes is calculated, emotion classification can be completed unsupervised, and the method has the characteristics of quick response, strong field adaptability, high accuracy and the like.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a polarity distribution diagram of emotion words in an emotion dictionary;
FIG. 3 is a quantum text representation model flow diagram;
FIG. 4 is a histogram comparing the experimental results of different emotion analysis methods.
Detailed Description
The technical solution of the present invention is described in further detail below with reference to the accompanying drawings, but the scope of the present invention is not limited to the following description. FIG. 1 shows the flow of the unsupervised text emotion analysis method based on quantum theory; FIG. 2 shows the emotion polarity distribution of words in the emotion dictionaries; FIG. 3 shows a flow chart of the quantum text representation model; FIG. 4 shows the experimental comparison of emotion classification results across the different methods. The specific steps are as follows:
(1): based on 7 groups of seed words and the OMD (Obama-McCain Debate) English corpus, two emotion dictionaries are created, namely a positive sentiment dictionary (PSD) and a negative sentiment dictionary (NSD), as follows:
The first step: 7 groups of seed word pairs with opposite polarities are manually selected, namely 'positive/negative', 'good/bad', 'love/hate', 'excellent/poor', 'amazing/shit', 'nice/terrible' and 'awesome/crap'. Thus the initial positive emotion dictionary PSD = (positive, good, love, excellent, amazing, nice, awesome), and the initial negative emotion dictionary NSD = (negative, bad, hate, poor, shit, terrible, crap).
The second step: a total of 855 adjectives and adverbs in the corpus are extracted by a hidden Markov model (HMM) part-of-speech tagger and used as candidate emotion words, for example awesome, thankful, dirty, dumb and terrible. The calculation process is as follows: the HMM part-of-speech tagger assigns a part of speech $t_i$ (e.g., adjective, verb, adverb) to each word $w_i$ in the text. Assume that each part of speech $t_i$ depends only on the part of speech $t_{i-1}$ of the previous word, i.e. $P(t_i \mid t_{i-1})$, and that the probability of each word $w_i$ depends only on its part of speech $t_i$, i.e. $P(w_i \mid t_i)$; the part-of-speech tags that maximize the joint probability distribution are then selected as the parts of speech of the words: $\hat{t}_{1:n} = \arg\max_{t_1, \ldots, t_n} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})$. The occurrence frequency of each word is counted from the corpus; once the three parameters of the HMM (the initial, transition and emission probabilities) have been estimated, the part of speech of each word can be computed, which completes the part-of-speech tagging process.
The third step: the pointwise mutual information-information retrieval (PMI-IR) method is used to calculate the semantic association between each candidate emotion word $W_{hx}$ and all seed words in the positive emotion dictionary PSD and the negative emotion dictionary NSD, which is taken as the emotion score of the candidate emotion word. The semantic association is calculated as: $Score(W_{hx}) = \sum_{Seed \in PSD} PMI(W_{hx}, Seed) - \sum_{Seed \in NSD} PMI(W_{hx}, Seed)$, wherein $W_{hx}$ denotes a candidate emotion word and $Seed$ denotes a seed word in each emotion dictionary; $PMI(W_{hx}, Seed)$ measures how often the candidate emotion word $W_{hx}$ co-occurs with the seed word: the larger the co-occurrence probability, the closer the relation and the higher the degree of association. $Score(W_{hx})$ is the emotion score of the candidate emotion word.
The fourth step: if the emotion score $Score(W_{hx})$ is greater than 0, the word is a positive emotion word; if $Score(W_{hx})$ is less than 0, it is a negative emotion word. Positive and negative emotion words are added to the corresponding emotion dictionary. In the end, the positive emotion dictionary PSD contains 150 positive emotion words, e.g., best, healthy, amazing and beautiful, while the negative emotion dictionary NSD contains 152 negative emotion words, e.g., fake, bloody, weird, offensively and sad.
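The HMM tagging used in the second step above selects the tag sequence that maximizes the product of emission and transition probabilities, which is usually done with the Viterbi algorithm. The following is a minimal sketch under stated assumptions: the two-tag tag set and all probability values are hypothetical toy numbers, not parameters estimated from the OMD corpus, and the function name `viterbi` is illustrative.

```python
import math
from typing import Dict, List

def viterbi(words: List[str], tags: List[str],
            start_p: Dict[str, float],
            trans_p: Dict[str, Dict[str, float]],
            emit_p: Dict[str, Dict[str, float]]) -> List[str]:
    """Most probable tag sequence under a bigram HMM, computed in log space."""
    def lp(p: float) -> float:
        return math.log(p) if p > 0 else float("-inf")

    # best[i][t]: best log-probability of any tag path ending in tag t at word i
    best = [{t: lp(start_p[t]) + lp(emit_p[t].get(words[0], 0.0)) for t in tags}]
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev = max(tags, key=lambda s: best[i - 1][s] + lp(trans_p[s][t]))
            best[i][t] = (best[i - 1][prev] + lp(trans_p[prev][t])
                          + lp(emit_p[t].get(words[i], 0.0)))
            back[i][t] = prev
    # Follow the back-pointers from the best final tag
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

# Hypothetical toy parameters (initial, transition and emission probabilities)
tags = ["ADJ", "NOUN"]
start_p = {"ADJ": 0.6, "NOUN": 0.4}
trans_p = {"ADJ": {"ADJ": 0.2, "NOUN": 0.8}, "NOUN": {"ADJ": 0.3, "NOUN": 0.7}}
emit_p = {"ADJ": {"awesome": 0.7, "debate": 0.05},
          "NOUN": {"awesome": 0.05, "debate": 0.8}}

words = ["awesome", "debate"]
tagged = viterbi(words, tags, start_p, trans_p, emit_p)
candidates = [w for w, t in zip(words, tagged) if t == "ADJ"]
```

Words tagged as adjectives (and, with a fuller tag set, adverbs) would then be kept as candidate emotion words.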
(2): the positive emotion dictionary PSD, the negative emotion dictionary NSD and the 1928 documents of the OMD text corpus are preprocessed with the Python Natural Language Toolkit (NLTK), correcting spelling errors and removing illegal characters. After preprocessing, the OMD corpus contains 1906 texts.
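A minimal preprocessing sketch is shown below. It uses only the Python standard library rather than the toolkit named in the text, skips spelling correction, and makes two assumptions that the source does not specify: "illegal characters" are taken to be anything other than ASCII letters, and `STOP_WORDS` is a small illustrative subset of an English stop-word list.

```python
import re
from typing import List

# Illustrative subset only; a full English stop-word list would be used in practice
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "to", "of",
              "and", "in", "it", "this", "that"}

def preprocess(text: str) -> List[str]:
    """Lowercase, drop non-ASCII characters and punctuation, remove stop words."""
    text = text.lower()
    # Assumption: non-ASCII symbols count as illegal characters
    text = text.encode("ascii", "ignore").decode("ascii")
    # Keep alphabetic tokens only, which also removes punctuation and digits
    tokens = re.findall(r"[a-z]+", text)
    return [tok for tok in tokens if tok not in STOP_WORDS]

tokens = preprocess("The debate was AMAZING!!! \u263a #obama")
```

The same function could be applied to each dictionary entry and each corpus document before word vectors are looked up.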
(3): training a quantum text representation model, extracting features from the positive and negative emotion dictionaries and the texts, and constructing the positive dictionary density matrix $\rho_{PSD}$, the negative dictionary density matrix $\rho_{NSD}$ and the text density matrix $\rho_{text}$, all of which are $L \times L$ matrices, where $L$ is the dimension of each word vector. Assume that each dictionary or text is represented as $d = \{w_1, w_2, \ldots, w_t\}$, where $t$ is the number of words in the dictionary or text, as shown in FIG. 3. The method is as follows:
The first step: the GloVe tool is used to obtain a 300-dimensional word vector $\vec{w}_i$ for each word in the positive emotion dictionary PSD, the negative emotion dictionary NSD and each text, which is then normalized: $|w_i\rangle = \vec{w}_i / \|\vec{w}_i\|_2$.
The second step: based on the vector outer product, the projection matrix of each word $w_i$ in the emotion dictionaries and the texts is constructed as $\Pi_i = |w_i\rangle\langle w_i|$. Each projection matrix $\Pi_i$ is a 300 x 300 matrix.
The projection matrices of all words in the positive emotion dictionary are then combined into a positive emotion projection sequence $\Pi_{PSD} = \{\Pi_1, \Pi_2, \ldots, \Pi_r\}$, the projection matrices of all words in the negative emotion dictionary are combined into a negative emotion projection sequence $\Pi_{NSD} = \{\Pi_1, \Pi_2, \ldots, \Pi_k\}$, and the projection matrices of all words in each text are combined into a text projection sequence $\Pi_{text} = \{\Pi_1, \Pi_2, \ldots, \Pi_t\}$, where $r$ is the number of words in the positive emotion dictionary, i.e., 150; $k$ is the number of words in the negative emotion dictionary, i.e., 152; and $t$ is the number of words each text contains.
The third step: after the projection sequences $\Pi_{PSD}$, $\Pi_{NSD}$ and $\Pi_{text}$ of the positive dictionary, the negative dictionary and the text are obtained, a likelihood function $\mathcal{L}(\rho)$ is formulated by the maximum likelihood estimation (MLE) method (the likelihood function is the probability of obtaining the document), and training of the density matrix begins. The likelihood function $\mathcal{L}(\rho)$ is defined as: $\mathcal{L}(\rho) = \prod_{i=1}^{n} \mathrm{tr}(\Pi_i \rho)$,
wherein $\Pi_i$ is the projection matrix of the $i$-th word in each projection sequence $\{\Pi_{PSD}, \Pi_{NSD}, \Pi_{text}\}$, the upper limit $n \in \{r, k, t\}$ is the number of words contained in the corresponding projection sequence, $\rho \in \{\rho_{PSD}, \rho_{NSD}, \rho_{text}\}$ is the density matrix, and $\mathrm{tr}$ is the matrix trace. $\mathrm{tr}(\Pi_i \rho)$ represents the probability of occurrence of word $w_i$, and the likelihood function $\mathcal{L}(\rho)$ represents the joint probability of all words in the positive emotion dictionary, the negative emotion dictionary or the text, respectively.
Since the logarithm is monotonically increasing, taking the logarithm of the likelihood function $\mathcal{L}(\rho)$ does not change the location of its maximum, so the objective function $F(\rho)$ can be defined as: $F(\rho) \equiv \max_{\rho} \log \mathcal{L}(\rho) = \max_{\rho} \sum_{i=1}^{n} \log \mathrm{tr}(\Pi_i \rho)$,
wherein $\mathrm{tr}(\rho) = 1$ and $\rho \ge 0$; $F(\rho) \in \{F(\rho_{PSD}), F(\rho_{NSD}), F(\rho_{text})\}$ represents solving for the maximum of the joint probability that all words in the positive emotion dictionary, the negative emotion dictionary or the text co-occur.
The fourth step: a global convergence algorithm is applied. By defining an iteration direction $D_k$, the algorithm iteratively updates $\rho$ and the objective function $F(\rho)$ until the maximum of $F(\rho)$ is reached, and then outputs the positive dictionary density matrix $\rho_{PSD}$, the negative dictionary density matrix $\rho_{NSD}$ and the text density matrix $\rho_{text}$. The update rule for the $k$-th iteration of the density matrix $\rho$ is $\rho_{k+1} = \rho_k + t_k D_k$, where $t_k \in [0, 1]$ is the step size, which sets the magnitude of the update of the objective function $F(\rho)$ at the $k$-th iteration. The iteration direction $D_k$ is defined as: [formula not reproduced in this text]
wherein $D_k$ combines two basic directions, called vertical and horizontal, and is steered between them by their weights; $q(t_k)$ represents the overall iteration direction, and $\nabla F(\rho_k)$ represents the gradient direction of the objective function at the $k$-th iteration.
They are defined as: [formulas not reproduced in this text]
These definitions also involve the frequency of each word. To demonstrate the robustness of the global convergence algorithm, a diagonal matrix $\rho_0$ is randomly initialized at the beginning of the iteration; it satisfies all properties of a density matrix, e.g. $\rho_0 \ge 0$. The iteration terminates when the change in the value of the objective function between successive iterations is within 0.0001, yielding the final density matrix $\rho \in \{\rho_{PSD}, \rho_{NSD}, \rho_{text}\}$.
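To make the training loop concrete, the sketch below maximizes the same log-likelihood objective with a different, simpler scheme: the fixed-point "R rho R" iteration commonly used for maximum-likelihood density matrix estimation. It is a stand-in and not the global convergence algorithm described above; it is not guaranteed to increase the objective at every step. The function names, the maximally mixed starting matrix, the iteration cap and the random toy vectors are all assumptions; only the 0.0001 stopping threshold is taken from the text.

```python
import numpy as np

def log_likelihood(rho: np.ndarray, projectors: np.ndarray) -> float:
    """Sum over words of log tr(Pi_i rho)."""
    probs = np.einsum("nij,ji->n", projectors, rho)  # tr(Pi_i rho) per word
    return float(np.sum(np.log(np.clip(probs, 1e-12, None))))

def fit_density_matrix(projectors: np.ndarray, tol: float = 1e-4,
                       max_iter: int = 500) -> np.ndarray:
    """Estimate rho by the R-rho-R fixed-point iteration (swapped-in method)."""
    dim = projectors.shape[1]
    rho = np.eye(dim) / dim  # diagonal, unit-trace, positive semidefinite start
    prev = log_likelihood(rho, projectors)
    for _ in range(max_iter):
        probs = np.clip(np.einsum("nij,ji->n", projectors, rho), 1e-12, None)
        # R = sum_i Pi_i / tr(Pi_i rho), the gradient of the log-likelihood
        r_op = np.einsum("n,nij->ij", 1.0 / probs, projectors)
        rho = r_op @ rho @ r_op
        rho = (rho + rho.T) / 2      # remove numerical asymmetry
        rho = rho / np.trace(rho)    # restore unit trace
        cur = log_likelihood(rho, projectors)
        if abs(cur - prev) < tol:    # stopping threshold from the text
            break
        prev = cur
    return rho

# Hypothetical toy data: 5 words with 4-dimensional unit vectors
rng = np.random.default_rng(0)
vecs = rng.normal(size=(5, 4))
vecs = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
toy_projectors = np.einsum("ni,nj->nij", vecs, vecs)
rho_hat = fit_density_matrix(toy_projectors)
```

Because each update has the form of a symmetric sandwich followed by trace normalization, the estimate stays symmetric, positive semidefinite and of unit trace throughout.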
(4): a quantum relative entropy algorithm is used to calculate the positive similarity score $S_p$ between the text density matrix $\rho_{text}$ and the positive dictionary density matrix $\rho_{PSD}$, and the negative similarity score $S_n$ between $\rho_{text}$ and the negative dictionary density matrix $\rho_{NSD}$. The quantum relative entropy is defined as:
$S_p = \mathrm{tr}(\rho_{text}(\log \rho_{text} - \log \rho_{PSD}))$
$S_n = \mathrm{tr}(\rho_{text}(\log \rho_{text} - \log \rho_{NSD}))$
wherein $S_p, S_n \ge 0$; $S_p = 0$ if and only if $\rho_{text} = \rho_{PSD}$, and $S_n = 0$ if and only if $\rho_{text} = \rho_{NSD}$.
(5): the positive similarity score $S_p$ is compared with the negative similarity score $S_n$: if $S_p > S_n$, the emotion type of the text is positive (emotion label +1), otherwise it is negative (emotion label -1); the emotion classification result of each text is thereby obtained.
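Steps (4) and (5) can be sketched with NumPy as follows. The matrix logarithm is computed by eigendecomposition with eigenvalues floored at a small `eps`, which is an assumption made for numerical stability; the function names and the 2 x 2 toy matrices are hypothetical. Note that relative entropy is zero for identical matrices, so a smaller value indicates a closer match; `classify` reproduces the comparison exactly as worded in the text, and whether $S_p$ and $S_n$ are meant as raw divergences or as derived similarity scores should be confirmed against the original filing.

```python
import numpy as np

def matrix_log(rho: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Logarithm of a symmetric PSD matrix; eigenvalues are floored at eps."""
    vals, vecs = np.linalg.eigh(rho)
    vals = np.clip(vals, eps, None)
    # V diag(log(vals)) V^T
    return (vecs * np.log(vals)) @ vecs.T

def relative_entropy(rho: np.ndarray, sigma: np.ndarray) -> float:
    """tr(rho (log rho - log sigma))."""
    return float(np.trace(rho @ (matrix_log(rho) - matrix_log(sigma))))

def classify(s_p: float, s_n: float) -> int:
    """Decision rule as stated in the text: +1 if S_p > S_n, otherwise -1."""
    return 1 if s_p > s_n else -1

# Hypothetical 2 x 2 density matrices
rho_text = np.diag([0.7, 0.3])
rho_psd = np.diag([0.6, 0.4])
rho_nsd = np.diag([0.2, 0.8])

s_p = relative_entropy(rho_text, rho_psd)
s_n = relative_entropy(rho_text, rho_nsd)
```

In this toy case `rho_text` is closer to `rho_psd` than to `rho_nsd`, so `s_p` is the smaller of the two values.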
The emotion classification result of each subjective text is compared with its emotion label to compute the classification accuracy. The method is compared with the bag-of-words model, the sentence embedding model, the pointwise mutual information-information retrieval algorithm and the quantum language model, and the accuracies are plotted as a histogram. As shown in FIG. 4, the method of the invention clearly improves the performance of text emotion analysis.
The technical means disclosed by the scheme of the invention is not limited to the technical means disclosed by the embodiment, and also comprises the technical scheme formed by any combination of the technical features. It should be noted that modifications and adaptations to the invention may occur to one skilled in the art without departing from the principles of the present invention and are intended to be within the scope of the present invention.
Claims (7)
1. An unsupervised text emotion analysis method based on quantum theory, comprising the following steps:
(1): creating two emotion dictionaries, namely a positive emotion dictionary PSD and a negative emotion dictionary NSD, wherein the positive emotion dictionary contains words with positive emotion polarities, and the negative emotion dictionary contains words with negative emotion polarities;
(2): preprocessing texts in the positive emotion dictionary PSD, the negative emotion dictionary NSD and the corpus;
(3): constructing a quantum text representation model, and respectively extracting features of the preprocessed positive emotion dictionary PSD, the preprocessed negative emotion dictionary NSD and the text to construct a positive emotion dictionary density matrix rho PSD Negative emotion dictionary density matrix ρ NSD Text density matrix ρ text The method comprises the following steps:
the first step: respectively obtaining PSD, NSD and word vector of word in each textAnd then normalizing:
and a second step of: based on vector outer product operation, a positive emotion dictionary PSD, a negative emotion dictionary NSD and projection matrixes of each word in a text are constructed, and the projection matrixes of all words in the positive emotion dictionary PSD are combined together to form the positive emotion dictionaryEmotion projection sequenceProjection matrixes of all words in the negative emotion dictionary are combined into a negative emotion projection sequence +.>And the projection matrices of all words in each text are combined into a text projection sequenceWhere r represents the number of words of the positive emotion dictionary PSD, k represents the number of words of the negative emotion dictionary NSD, and t represents the number of words contained in each text;
the third step: after obtaining the respective projection sequences $\Pi_{PSD}$, $\Pi_{NSD}$ and $\Pi_{text}$ of the positive emotion dictionary, the negative emotion dictionary and the text, formulating a likelihood function by the maximum likelihood estimation (MLE) method, and training, respectively, the positive emotion dictionary density matrix $\rho_{PSD}$, the negative emotion dictionary density matrix $\rho_{NSD}$ and the text density matrix $\rho_{text}$;
(4): using the quantum relative entropy algorithm to calculate a positive similarity score $S_p$ between the text density matrix $\rho_{text}$ and the positive emotion dictionary density matrix $\rho_{PSD}$, and a negative similarity score $S_n$ between the text density matrix $\rho_{text}$ and the negative emotion dictionary density matrix $\rho_{NSD}$;
(5): comparing the positive similarity score with the negative similarity score: if $S_p > S_n$, the emotion type of the text is positive; otherwise, it is negative; the emotion classification result of each text is thereby obtained.
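As an illustration of the first two sub-steps of step (3), the following minimal sketch normalizes word vectors and forms their outer products. It assumes NumPy and real-valued word vectors; the function names `projector` and `projection_sequence` and the toy vectors are hypothetical and do not come from the claims.

```python
import numpy as np

def projector(vec):
    """Return the rank-one projection matrix |w><w| of a real word vector."""
    v = np.asarray(vec, dtype=float)
    norm = np.linalg.norm(v)
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    unit = v / norm              # first sub-step: normalize to unit length
    return np.outer(unit, unit)  # second sub-step: outer product

def projection_sequence(word_vectors):
    """Collect the projection matrices of all words of a dictionary or text."""
    return [projector(v) for v in word_vectors]

# Toy 3-dimensional "word vectors" (hypothetical values, for illustration only)
toy_vectors = [[3.0, 4.0, 0.0], [1.0, 1.0, 1.0]]
seq = projection_sequence(toy_vectors)
```

Each matrix produced this way is symmetric, has unit trace and is idempotent, which is why $tr(\Pi_i \rho)$ can be read as a probability once a density matrix $\rho$ is available.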
2. The method of unsupervised text emotion analysis of claim 1, wherein in step (1), the method of creating positive emotion dictionary PSD and negative emotion dictionary NSD is as follows:
the first step: selecting M groups of seed word pairs with opposite polarities to respectively form an initial positive emotion dictionary PSD and a negative emotion dictionary NSD;
the second step: selecting a corpus, and extracting the adjectives and adverbs in the corpus as candidate emotion words $W_{hx}$ by means of a part-of-speech tagger based on a hidden Markov model; the tagger labels each word $w_i$ of a sentence in the corpus with a part of speech $t_i$, assuming that each part of speech $t_i$ depends only on the part of speech $t_{i-1}$ of the preceding word, i.e. $P(t_i \mid t_{i-1})$, and that the probability of each word $w_i$ depends only on its part of speech $t_i$, i.e. $P(w_i \mid t_i)$; for a sentence of n words, the part-of-speech tags that maximize the joint probability distribution are then selected as the parts of speech of the words:
$$\hat{t}_{1:n} = \arg\max_{t_1, \dots, t_n} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1});$$
the third step: using the pointwise mutual information-information retrieval algorithm PMI-IR to calculate the semantic association degree between each candidate emotion word $W_{hx}$ and all seed words in the positive emotion dictionary PSD and the negative emotion dictionary NSD, and taking it as the emotion score of the candidate emotion word;
the fourth step: for a given candidate emotion word $W_{hx}$, if its emotion score $Score(W_{hx})$ is greater than 0, the word is a positive emotion word; if $Score(W_{hx})$ is less than 0, the word is a negative emotion word; according to this emotion attribute, the candidate emotion word $W_{hx}$ is added to the corresponding emotion dictionary.
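The tagging rule in the second step of claim 2 can be illustrated with a small decoder. The claim only states that the tags maximizing the joint probability are selected; using the Viterbi algorithm for that search is an assumption here, as are the function name `viterbi`, the probability floor, and the toy tag set and probabilities.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Most probable tag sequence under a bigram HMM (log-space Viterbi)."""
    def lp(p):
        # Floor probabilities so that unseen events do not produce log(0)
        return math.log(max(p, floor))

    # best[i][t]: best log-probability of any tag path ending in tag t at word i
    best = [{t: lp(start_p.get(t, 0.0)) + lp(emit_p[t].get(words[0], 0.0))
             for t in tags}]
    back = []
    for word in words[1:]:
        scores, pointers = {}, {}
        for t in tags:
            prev = max(tags, key=lambda s: best[-1][s] + lp(trans_p[s].get(t, 0.0)))
            scores[t] = (best[-1][prev] + lp(trans_p[prev].get(t, 0.0))
                         + lp(emit_p[t].get(word, 0.0)))
            pointers[t] = prev
        best.append(scores)
        back.append(pointers)

    # Follow the back-pointers from the best final tag
    path = [max(tags, key=lambda t: best[-1][t])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return list(reversed(path))

# Hypothetical toy model; real probabilities would be estimated from a tagged corpus
TAGS = ["ADV", "ADJ", "NOUN"]
START = {"ADV": 0.3, "ADJ": 0.3, "NOUN": 0.4}
TRANS = {
    "ADV": {"ADV": 0.1, "ADJ": 0.8, "NOUN": 0.1},
    "ADJ": {"ADV": 0.05, "ADJ": 0.15, "NOUN": 0.8},
    "NOUN": {"ADV": 0.3, "ADJ": 0.3, "NOUN": 0.4},
}
EMIT = {
    "ADV": {"very": 0.9, "good": 0.1},
    "ADJ": {"good": 0.8, "very": 0.1, "movie": 0.1},
    "NOUN": {"movie": 0.9, "good": 0.1},
}

sentence = ["very", "good", "movie"]
tags_out = viterbi(sentence, TAGS, START, TRANS, EMIT)
# Adjectives and adverbs become candidate emotion words
candidates = [w for w, t in zip(sentence, tags_out) if t in {"ADJ", "ADV"}]
```

With this toy model the adverb and the adjective of the sentence are kept as candidate emotion words and the noun is discarded.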
3. The method for unsupervised text emotion analysis according to claim 2, wherein in the third step, the semantic association degree is calculated as follows:
$$Score(W_{hx}) = \sum_{Seed \in PSD} PMI(W_{hx}, Seed) - \sum_{Seed \in NSD} PMI(W_{hx}, Seed)$$
wherein $W_{hx}$ represents a candidate emotion word; Seed represents a seed word in the respective emotion dictionary; $PMI(W_{hx}, Seed)$ measures the probability that the candidate emotion word $W_{hx}$ co-occurs with the seed word, a larger probability indicating a closer relation and a higher degree of association; and $Score(W_{hx})$ is the emotion score of the candidate emotion word.
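A minimal sketch of a PMI-based emotion score follows. PMI-IR in its original form estimates probabilities from search-engine hit counts; this sketch instead counts co-occurrence within the documents of a small in-memory corpus, which is an assumption, as are the additive smoothing constant and the names `pmi` and `emotion_score`. The score is taken here as the summed PMI with the positive seeds minus the summed PMI with the negative seeds, which is consistent with the sign rule of claim 2 but is an assumed reading.

```python
import math

def pmi(word_a, word_b, docs, smoothing=0.01):
    """Pointwise mutual information estimated from document co-occurrence."""
    n = len(docs)
    count_a = sum(1 for d in docs if word_a in d)
    count_b = sum(1 for d in docs if word_b in d)
    count_ab = sum(1 for d in docs if word_a in d and word_b in d)
    # Additive smoothing (an assumption) keeps the logarithm finite
    p_a = (count_a + smoothing) / n
    p_b = (count_b + smoothing) / n
    p_ab = (count_ab + smoothing) / n
    return math.log2(p_ab / (p_a * p_b))

def emotion_score(candidate, pos_seeds, neg_seeds, docs):
    """Summed PMI with positive seeds minus summed PMI with negative seeds."""
    pos = sum(pmi(candidate, s, docs) for s in pos_seeds)
    neg = sum(pmi(candidate, s, docs) for s in neg_seeds)
    return pos - neg

# Hypothetical toy corpus: each document is a set of tokens
docs = [
    {"superb", "good", "film"},
    {"superb", "excellent"},
    {"dreadful", "bad", "plot"},
    {"dreadful", "poor"},
    {"good", "plot"},
    {"bad", "film"},
]
pos_seeds = ["good", "excellent"]
neg_seeds = ["bad", "poor"]

score_superb = emotion_score("superb", pos_seeds, neg_seeds, docs)
score_dreadful = emotion_score("dreadful", pos_seeds, neg_seeds, docs)
```

In this toy corpus "superb" only co-occurs with positive seeds and receives a positive score, while "dreadful" only co-occurs with negative seeds and receives a negative one.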
4. The method of unsupervised text emotion analysis of claim 1, wherein preprocessing the text in the positive emotion dictionary PSD, the negative emotion dictionary NSD and the corpus in step (2) comprises: correcting spelling errors, removing illegal characters of each dictionary and text, and removing useless words including stop words and punctuation marks based on an English standard stop word list.
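The cleaning part of this preprocessing can be sketched as below. Spelling correction is left out, lowercasing is an added assumption, and the short `STOP_WORDS` set is only a hypothetical stand-in for the English standard stop-word list named in the claim.

```python
import re

# Hypothetical, deliberately tiny stand-in for a standard English stop-word list
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "and", "or", "of", "to", "in", "it", "this"}

def preprocess(text, stop_words=STOP_WORDS):
    """Lowercase, drop non-letter characters and punctuation, remove stop words."""
    lowered = text.lower()
    # Anything that is not an ASCII letter or whitespace is treated as illegal
    cleaned = re.sub(r"[^a-z\s]", " ", lowered)
    return [tok for tok in cleaned.split() if tok not in stop_words]

tokens = preprocess("This movie is GREAT!!! A must-see, 10/10.")
```

Replacing illegal characters with a space rather than deleting them keeps hyphenated words such as "must-see" from being fused into a single token.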
5. The method of claim 1, wherein in step (3), the GloVe tool is used to obtain the word vectors of the words in the PSD, the NSD and each text, respectively.
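Pretrained GloVe vectors are commonly distributed as plain text, one word per line followed by its space-separated components. A minimal reader for that layout might look as follows; the name `load_glove`, the optional vocabulary filter and the inline three-dimensional sample (used here instead of a real vector file) are assumptions made for illustration.

```python
import io

def load_glove(handle, vocabulary=None):
    """Read 'word v1 v2 ... vd' lines into a dict mapping word -> list of floats."""
    vectors = {}
    for line in handle:
        parts = line.rstrip().split(" ")
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        word, values = parts[0], parts[1:]
        if vocabulary is not None and word not in vocabulary:
            continue  # keep only words that occur in the dictionaries or texts
        vectors[word] = [float(v) for v in values]
    return vectors

# Inline sample in the same layout (hypothetical values)
sample = io.StringIO("good 0.1 0.2 0.3\nbad -0.1 -0.2 -0.3\nfilm 0.0 0.5 0.5\n")
vectors = load_glove(sample, vocabulary={"good", "bad"})
```

Filtering by vocabulary while reading avoids holding the full embedding table in memory when only the dictionary and corpus words are needed.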
6. The method of unsupervised text emotion analysis of claim 1, wherein in the third step of step (3), the training method for the positive emotion projection sequence $\Pi_{PSD}$ is as follows:
the likelihood function $\mathcal{L}(\rho_{PSD})$ is defined as:
$$\mathcal{L}(\rho_{PSD}) = \prod_{i=1}^{r} tr(\Pi_i \rho_{PSD})$$
wherein $\Pi_i$ is the projection matrix of the i-th word in the positive emotion projection sequence $\Pi_{PSD}$, $\rho_{PSD}$ is the positive emotion dictionary density matrix, tr denotes the trace of a matrix, $tr(\Pi_i \rho_{PSD})$ represents the probability that word $w_i$ occurs, and the likelihood function $\mathcal{L}(\rho_{PSD})$ represents the joint probability that all words in the positive emotion dictionary occur together;
the objective function $F(\rho_{PSD})$ is defined as:
$$F(\rho_{PSD}) = \max_{\rho_{PSD}} \sum_{i=1}^{r} \log tr(\Pi_i \rho_{PSD})$$
wherein $F(\rho_{PSD})$ represents solving for the maximum value of the joint probability that all words in the positive emotion dictionary occur;
using a global convergence algorithm, and by defining an iteration direction $D_k$, $\rho_{PSD}$ and the objective function $F(\rho_{PSD})$ are updated iteratively until $F(\rho_{PSD})$ reaches its maximum value, and the positive emotion dictionary density matrix $\rho_{PSD}$ is output;
the negative emotion dictionary density matrix $\rho_{NSD}$ and the text density matrix $\rho_{text}$ are obtained by the same training method.
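The training loop can be sketched with one iteration that is commonly used for this kind of log-likelihood, the "RρR" update from maximum-likelihood quantum state tomography, here with a fixed dilution step. The claim does not spell out the update rule beyond the iteration direction $D_k$, so this update is a substituted technique rather than the claimed one; NumPy, the function names, the step size and the toy projectors are likewise assumptions.

```python
import numpy as np

def log_likelihood(rho, projectors):
    """Sum of log tr(Pi_i rho) over the projection sequence."""
    return float(sum(np.log(np.trace(p @ rho)) for p in projectors))

def train_density_matrix(projectors, step=1.0, max_iter=500, tol=1e-10):
    """Estimate a density matrix with a diluted R-rho-R iteration (assumed update)."""
    dim = projectors[0].shape[0]
    rho = np.eye(dim) / dim  # start from the maximally mixed state
    prev = log_likelihood(rho, projectors)
    for _ in range(max_iter):
        # R(rho) = (1/n) * sum_i Pi_i / tr(Pi_i rho)
        r = sum(p / np.trace(p @ rho) for p in projectors) / len(projectors)
        m = (np.eye(dim) + step * r) / (1.0 + step)
        rho = m @ rho @ m
        rho = rho / np.trace(rho)  # renormalize to unit trace
        cur = log_likelihood(rho, projectors)
        if abs(cur - prev) < tol:
            break
        prev = cur
    return rho

# Toy projection sequence: the first "word" occurs twice, the second once
toy = [np.diag([1.0, 0.0]), np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
rho_toy = train_density_matrix(toy)
```

For this toy sequence the estimate approaches diag(2/3, 1/3), that is, the relative frequencies of the two orthogonal words. The update keeps the matrix symmetric and positive semidefinite with unit trace, so no explicit projection onto the set of density matrices is needed.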
7. The method of unsupervised text emotion analysis according to claim 1, wherein in step (4), the quantum relative entropy is calculated as follows:
$$S_p = tr(\rho_{text}(\log \rho_{text} - \log \rho_{PSD}))$$
$$S_n = tr(\rho_{text}(\log \rho_{text} - \log \rho_{NSD}))$$
wherein $S_p, S_n \geq 0$; $S_p = 0$ if and only if $\rho_{text} = \rho_{PSD}$, and $S_n = 0$ if and only if $\rho_{text} = \rho_{NSD}$.
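The two scores can be computed numerically as in the following sketch, which assumes NumPy and real symmetric density matrices. The matrix logarithm is taken through an eigendecomposition, and eigenvalues are clipped at a small `eps` so that rank-deficient matrices do not produce log(0); the clipping and the function names are assumptions. The sketch only computes the scores and does not take a position on how the comparison in step (5) is oriented.

```python
import math
import numpy as np

def matrix_log(rho, eps=1e-12):
    """Logarithm of a symmetric positive semidefinite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(rho)
    vals = np.clip(vals, eps, None)  # guard against log(0) for zero eigenvalues
    return (vecs * np.log(vals)) @ vecs.T

def relative_entropy(rho, sigma):
    """Quantum relative entropy tr(rho (log rho - log sigma))."""
    return float(np.trace(rho @ (matrix_log(rho) - matrix_log(sigma))))

# Hypothetical 2x2 density matrices for illustration
rho_text = np.diag([0.7, 0.3])
rho_psd = np.diag([0.5, 0.5])
rho_nsd = np.diag([0.2, 0.8])

s_p = relative_entropy(rho_text, rho_psd)
s_n = relative_entropy(rho_text, rho_nsd)
s_self = relative_entropy(rho_text, rho_text)
```

Because the example matrices are diagonal, the result reduces to the classical Kullback-Leibler divergence between their diagonals, which gives a simple way to sanity-check the implementation.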
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110113463.9A CN112905736B (en) | 2021-01-27 | 2021-01-27 | Quantum theory-based unsupervised text emotion analysis method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110113463.9A CN112905736B (en) | 2021-01-27 | 2021-01-27 | Quantum theory-based unsupervised text emotion analysis method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112905736A (en) | 2021-06-04 |
CN112905736B (en) | 2023-09-19 |
Family
ID=76119050
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110113463.9A Active CN112905736B (en) | 2021-01-27 | 2021-01-27 | Quantum theory-based unsupervised text emotion analysis method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112905736B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113434646A (en) * | 2021-06-08 | 2021-09-24 | 天津大学 | Question-answering task matching model and method based on quantum measurement and self-attention mechanism |
WO2023061441A1 (en) * | 2021-10-13 | 2023-04-20 | 合肥本源量子计算科技有限责任公司 | Text quantum circuit determination method, text classification method, and related apparatus |
CN114492417A (en) * | 2022-02-07 | 2022-05-13 | 北京妙医佳健康科技集团有限公司 | Interpretable deep learning method, interpretable deep learning device, computer and medium |
CN115860989B (en) * | 2022-11-29 | 2024-05-14 | 广州明动软件股份有限公司 | Administrative law enforcement electronic document delivery method and system based on administrative law enforcement and case handling platform |
Citations (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103544246A (en) * | 2013-10-10 | 2014-01-29 | 清华大学 | Method and system for constructing multi-emotion dictionary for internet |
CN103995803A (en) * | 2014-04-25 | 2014-08-20 | 西北工业大学 | Fine granularity text sentiment analysis method |
CN104216873A (en) * | 2014-08-27 | 2014-12-17 | 华中师范大学 | Method for analyzing network left word emotion fluctuation characteristics of emotional handicap sufferer |
CN104317965A (en) * | 2014-11-14 | 2015-01-28 | 南京理工大学 | Establishment method of emotion dictionary based on linguistic data |
CN104516947A (en) * | 2014-12-03 | 2015-04-15 | 浙江工业大学 | Chinese microblog emotion analysis method fused with dominant and recessive characters |
CN104951548A (en) * | 2015-06-24 | 2015-09-30 | 烟台中科网络技术研究所 | Method and system for calculating negative public opinion index |
CN105930368A (en) * | 2016-04-13 | 2016-09-07 | 深圳大学 | Emotion classification method and system |
CN107357837A (en) * | 2017-06-22 | 2017-11-17 | 华南师范大学 | The electric business excavated based on order-preserving submatrix and Frequent episodes comments on sensibility classification method |
CN107491531A (en) * | 2017-08-18 | 2017-12-19 | 华南师范大学 | Chinese network comment sensibility classification method based on integrated study framework |
CN107832663A (en) * | 2017-09-30 | 2018-03-23 | 天津大学 | A kind of multi-modal sentiment analysis method based on quantum theory |
CN107908635A (en) * | 2017-09-26 | 2018-04-13 | 百度在线网络技术(北京)有限公司 | Establish textual classification model and the method, apparatus of text classification |
CN108596637A (en) * | 2018-04-24 | 2018-09-28 | 北京航空航天大学 | A kind of electric business service problem discovery system |
CN109101478A (en) * | 2018-06-04 | 2018-12-28 | 东南大学 | A kind of Aspect grade sentiment analysis method towards electric business comment text |
CN110046671A (en) * | 2019-04-24 | 2019-07-23 | 吉林大学 | A kind of file classification method based on capsule network |
CN110287319A (en) * | 2019-06-13 | 2019-09-27 | 南京航空航天大学 | Students' evaluation text analyzing method based on sentiment analysis technology |
CN110598207A (en) * | 2019-08-14 | 2019-12-20 | 华南师范大学 | Word vector obtaining method and device and storage medium |
CN111191463A (en) * | 2019-12-30 | 2020-05-22 | 杭州远传新业科技有限公司 | Emotion analysis method and device, electronic equipment and storage medium |
CN111897964A (en) * | 2020-08-12 | 2020-11-06 | 腾讯科技(深圳)有限公司 | Text classification model training method, device, equipment and storage medium |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9275041B2 (en) * | 2011-10-24 | 2016-03-01 | Hewlett Packard Enterprise Development Lp | Performing sentiment analysis on microblogging data, including identifying a new opinion term therein |
US9996504B2 (en) * | 2013-07-08 | 2018-06-12 | Amazon Technologies, Inc. | System and method for classifying text sentiment classes based on past examples |
Non-Patent Citations (5)
Title |
---|
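A quantum-inspired multimodal sentiment analysis framework; ZHANG Y et al.; Theoretical Computer Science; pp. 21-40 *
Unsupervised Sentiment Analysis of Twitter Posts Using Density Matrix Representation; Yazhou Zhang; European Conference on Information Retrieval, ECIR 2018: Advances in Information Retrieval; pp. 316-329 *
Network rumor identification method based on sentiment analysis; Shou Huanrong; Deng Shuqing; Xu Jian; Data Analysis and Knowledge Discovery (07); pp. 48-55 *
Evolution analysis of domestic and international sentiment analysis research based on spatio-temporal dimensions; Zhao Rongying; Zhang Yang; Information Science (10); pp. 173-179 *
Sentiment lexicon construction method based on label propagation; Zhang Pu; Wang Junxia; Wang Yinghao; Computer Engineering (05); pp. 174-179 *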
Also Published As
Publication number | Publication date |
---|---|
CN112905736A (en) | 2021-06-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112905736B (en) | Quantum theory-based unsupervised text emotion analysis method | |
CN108628823B (en) | Named entity recognition method combining attention mechanism and multi-task collaborative training | |
CN106776581B (en) | Subjective text emotion analysis method based on deep learning | |
CN108363743B (en) | Intelligent problem generation method and device and computer readable storage medium | |
CN110245229B (en) | Deep learning theme emotion classification method based on data enhancement | |
CN111209401A (en) | System and method for classifying and processing sentiment polarity of online public opinion text information | |
CN112183094B (en) | Chinese grammar debugging method and system based on multiple text features | |
CN111125333B (en) | Generation type knowledge question-answering method based on expression learning and multi-layer covering mechanism | |
CN110119443B (en) | Emotion analysis method for recommendation service | |
CN112818698B (en) | Fine-grained user comment sentiment analysis method based on dual-channel model | |
CN113268576B (en) | Deep learning-based department semantic information extraction method and device | |
Sun et al. | VCWE: visual character-enhanced word embeddings | |
CN115438154A (en) | Chinese automatic speech recognition text restoration method and system based on representation learning | |
CN115759119B (en) | Financial text emotion analysis method, system, medium and equipment | |
CN116910272B (en) | Academic knowledge graph completion method based on pre-training model T5 | |
CN115034218A (en) | Chinese grammar error diagnosis method based on multi-stage training and editing level voting | |
CN113961706A (en) | Accurate text representation method based on neural network self-attention mechanism | |
CN114154504A (en) | Chinese named entity recognition algorithm based on multi-information enhancement | |
CN113535897A (en) | Fine-grained emotion analysis method based on syntactic relation and opinion word distribution | |
CN115238693A (en) | Chinese named entity recognition method based on multi-word segmentation and multi-layer bidirectional long-short term memory | |
Nugraha et al. | Typographic-based data augmentation to improve a question retrieval in short dialogue system | |
CN111666374A (en) | Method for integrating additional knowledge information into deep language model | |
CN117973372A (en) | Chinese grammar error correction method based on pinyin constraint | |
CN116049349B (en) | Small sample intention recognition method based on multi-level attention and hierarchical category characteristics | |
CN109960782A (en) | A kind of Tibetan language segmenting method and device based on deep neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||