CN112905736B - Quantum theory-based unsupervised text emotion analysis method - Google Patents


Info

Publication number
CN112905736B
Authority
CN
China
Prior art keywords
emotion
text
psd
dictionary
nsd
Legal status
Active
Application number
CN202110113463.9A
Other languages
Chinese (zh)
Other versions
CN112905736A (en)
Inventor
张亚洲
马军霞
崔建涛
李璞
朱少林
Current Assignee
Zhengzhou University of Light Industry
Original Assignee
Zhengzhou University of Light Industry
Application filed by Zhengzhou University of Light Industry
Priority to CN202110113463.9A
Publication of CN112905736A
Application granted
Publication of CN112905736B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3335 Syntactic pre-processing, e.g. stopword elimination, stemming
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G06F16/3346 Query execution using probabilistic model
    • G06F16/35 Clustering; Classification
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G06F40/30 Semantic analysis
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Algebra (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to an unsupervised text emotion analysis method based on quantum theory, comprising the following steps: creating two emotion dictionaries, namely a positive emotion dictionary PSD and a negative emotion dictionary NSD; preprocessing the texts in the positive emotion dictionary PSD, the negative emotion dictionary NSD and the corpus; constructing a quantum text representation model and extracting features from the preprocessed PSD, the preprocessed NSD and each text to construct a positive emotion dictionary density matrix $\rho_{PSD}$, a negative emotion dictionary density matrix $\rho_{NSD}$ and a text density matrix $\rho_{text}$; and obtaining the emotion classification result of each text by using a quantum relative entropy algorithm.

Description

Quantum theory-based unsupervised text emotion analysis method
Technical Field
The invention relates to the technical field of text emotion classification, in particular to an unsupervised text emotion analysis method.
Background
The Internet has by now penetrated every aspect of social, political and economic life and affects people's daily lives. In the Internet age, social platforms have sprung up like bamboo shoots after a spring rain; they have broken through the closed social patterns of the past, provided a broader platform for open interaction between users, and brought great convenience to everyday life. Today, more and more users like to publish their own attitudes and comments on social platforms (such as microblogs and WeChat), and tens of thousands of terabytes of content appear on these platforms every day, making them one of the main sources of information in people's daily lives. This information not only reports objective facts but also carries a large amount of subjective emotional expression. Mining and identifying the emotional information it contains has important scientific and economic value in fields such as public opinion analysis, marketing and investment forecasting. The invention mainly studies the emotion of the posts and blogs most commonly found on social platforms, that is, text emotion analysis technology.
One core task of text emotion analysis is text representation. Text representation is a method of representing the semantic information contained in a text string as real-valued vectors that a computer can process, and these vectors are required to have strong expressive and discriminative power. Vector-based text representation methods, such as one-hot encoding, term frequency-inverse document frequency (TF-IDF) and word embeddings, therefore dominate, and their performance has been thoroughly verified on major datasets. In recent years, the field of information retrieval has produced a series of outstanding results based on quantum probability theory, showing that quantum probability theory can serve as an extended mathematical framework for tasks such as text representation and document ranking. The most representative of these is the quantum language model proposed by Sordoni et al. for classical information retrieval tasks. As an extension of the classical language model, the quantum language model aims to address term dependence and achieves good results.
The text representation problem in emotion analysis typically concerns long, document-level reviews, such as movie reviews and product reviews. Texts of this kind generally have complex semantic relationships, frequent interactions between terms and deep contextual dependencies, and therefore require a stronger representation learning model than information retrieval tasks do. The standard quantum language model builds its projection operators from one-hot encodings; for long texts this easily leads to the curse of dimensionality, and training the resulting high-dimensional density matrix exposes the model's failure to converge. Compared with vector-based representation methods, however, a density matrix in quantum theory can encode more semantic information and captures second-order correlations between word vectors. Combining quantum theory with density matrices is therefore a valuable direction for developing new text representation models.
Disclosure of Invention
The technical problem to be solved by the invention is to overcome the shortcomings of the prior art by providing an unsupervised quantum text emotion analysis method. In the method, two emotion dictionaries, one positive and one negative, are constructed; each emotion dictionary and each subjective document is represented as a density matrix; the quantum relative entropy is then used to compute a similarity score between each subjective document and each of the positive and negative emotion dictionaries, and the emotion classification result is obtained by comparing the similarity scores. The object of the invention is achieved by the following technical solution:
An unsupervised text emotion analysis method based on quantum theory comprises the following steps:
(1): creating two emotion dictionaries, namely a positive emotion dictionary PSD and a negative emotion dictionary NSD, wherein the positive emotion dictionary contains words with positive emotion polarities, and the negative emotion dictionary contains words with negative emotion polarities;
(2): preprocessing texts in the positive emotion dictionary PSD, the negative emotion dictionary NSD and the corpus;
(3): constructing a quantum text representation model, and extracting features from the preprocessed positive emotion dictionary PSD, the preprocessed negative emotion dictionary NSD and each text to construct a positive emotion dictionary density matrix $\rho_{PSD}$, a negative emotion dictionary density matrix $\rho_{NSD}$ and a text density matrix $\rho_{text}$, specifically as follows:
the first step: obtaining the word vector $\vec{w}_i$ of each word in PSD, NSD and each text, and then normalizing it: $|w_i\rangle = \vec{w}_i / \|\vec{w}_i\|$;
the second step: based on the vector outer product, constructing the projection matrix $\Pi_i = |w_i\rangle\langle w_i|$ of each word in the positive emotion dictionary PSD, the negative emotion dictionary NSD and the text; combining the projection matrices of all words in PSD into a positive emotion projection sequence $\Pi_{PSD} = \{\Pi_1, \Pi_2, \ldots, \Pi_r\}$, the projection matrices of all words in NSD into a negative emotion projection sequence $\Pi_{NSD} = \{\Pi_1, \Pi_2, \ldots, \Pi_k\}$, and the projection matrices of all words in each text into a text projection sequence $\Pi_{text} = \{\Pi_1, \Pi_2, \ldots, \Pi_t\}$, where r is the number of words in PSD, k is the number of words in NSD, and t is the number of words in each text;
the third step: after obtaining the projection sequences $\Pi_{PSD}$, $\Pi_{NSD}$ and $\Pi_{text}$, formulating a likelihood function $\mathcal{L}_{\Pi}(\rho)$ by maximum likelihood estimation (MLE), and training the positive emotion dictionary density matrix $\rho_{PSD}$, the negative emotion dictionary density matrix $\rho_{NSD}$ and the text density matrix $\rho_{text}$ respectively;
(4): using the quantum relative entropy algorithm, calculating the positive similarity score $S_p$ between the text density matrix $\rho_{text}$ and the positive emotion dictionary density matrix $\rho_{PSD}$, and the negative similarity score $S_n$ between $\rho_{text}$ and the negative emotion dictionary density matrix $\rho_{NSD}$;
(5): comparing the positive similarity score with the negative similarity score: if $S_p > S_n$, the text is classified as positive, otherwise as negative, finally yielding the emotion classification result of each text.
Further, in the step (1), the method for creating the positive emotion dictionary PSD and the negative emotion dictionary NSD is as follows:
the first step: selecting M groups of seed word pairs with opposite polarities to respectively form an initial positive emotion dictionary PSD and a negative emotion dictionary NSD;
the second step: selecting a corpus and extracting its adjectives and adverbs with a part-of-speech tagger based on a hidden Markov model, these words serving as candidate emotion words $W_{hx}$. The part-of-speech tagger assigns a part of speech $t_i$ to each word $w_i$ of each sentence in the corpus, assuming that each part of speech $t_i$ depends only on the part of speech $t_{i-1}$ of the previous word, i.e. $P(t_i \mid t_{i-1})$, and that each word $w_i$ depends only on its own part of speech $t_i$, i.e. $P(w_i \mid t_i)$; the tag sequence that maximizes the joint probability is then selected as the parts of speech of the words: $\hat{t}_1, \ldots, \hat{t}_n = \arg\max_{t_1, \ldots, t_n} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})$, with $t_0$ a sentence-start symbol;
the third step: using the pointwise mutual information-information retrieval algorithm PMI-IR to calculate the semantic association between each candidate emotion word $W_{hx}$ and all seed words in the positive emotion dictionary PSD and the negative emotion dictionary NSD, which serves as the emotion score of the candidate emotion word;
the fourth step: for a candidate emotion word $W_{hx}$, if its emotion score $Score(W_{hx})$ is greater than 0, the word is a positive emotion word; if $Score(W_{hx})$ is less than 0, it is a negative emotion word; according to this emotion attribute, $W_{hx}$ is added to the corresponding emotion dictionary.
In the third step, the semantic association may be calculated as:
$Score(W_{hx}) = \sum_{seed \in PSD} PMI(W_{hx}, seed) - \sum_{seed \in NSD} PMI(W_{hx}, seed)$, with $PMI(W_{hx}, seed) = \log_2 \frac{P(W_{hx}, seed)}{P(W_{hx})\,P(seed)}$,
where $W_{hx}$ is a candidate emotion word, seed is a seed word in each emotion dictionary, $PMI(W_{hx}, seed)$ measures the statistical probability that $W_{hx}$ co-occurs with the seed word (the greater this probability, the more closely the two are related and the higher their association), and $Score(W_{hx})$ is the emotion score of the candidate emotion word.
In step (2), preprocessing the texts in the positive emotion dictionary PSD, the negative emotion dictionary NSD and the corpus should include: correcting spelling errors, removing illegal characters from each dictionary and text, and removing useless words, including stop words and punctuation marks, based on a standard English stop-word list.
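As a rough illustration of the preprocessing in step (2), the Python sketch below lower-cases a text, replaces every non-alphabetic character with a space (which also removes punctuation) and drops stop words. The tiny STOP_WORDS set is a hypothetical stand-in for the standard English stop-word list mentioned above, and spelling correction is not attempted.

```python
import re

# Hypothetical, abbreviated stand-in for a standard English stop-word list.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "it", "of", "on",
              "that", "the", "this", "to", "was"}


def preprocess(text: str) -> list:
    """Lower-case the text, drop illegal characters and punctuation, remove stop words."""
    # Anything other than ASCII letters, apostrophes and whitespace becomes a space.
    cleaned = re.sub(r"[^a-z'\s]", " ", text.lower())
    return [token for token in cleaned.split() if token not in STOP_WORDS]


example = preprocess("The debate was AMAZING!!! @user #2008")
# example == ["debate", "amazing", "user"]
```

A real pipeline would swap in a full stop-word list and a spelling corrector; the token filter itself stays the same.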
In step (3), the GloVe tool can be used to obtain the word vectors $\vec{w}_i$ of the words in PSD, NSD and each text.
In the third step of step (3), the positive emotion projection sequence $\Pi_{PSD}$ is trained as follows:
the likelihood function $\mathcal{L}_{\Pi_{PSD}}(\rho_{PSD})$ is defined as:
$\mathcal{L}_{\Pi_{PSD}}(\rho_{PSD}) = \prod_{i=1}^{r} tr(\Pi_i \rho_{PSD})$
where $\Pi_i$ is the projection matrix of the ith word in the positive emotion projection sequence $\Pi_{PSD}$, $\rho_{PSD}$ is the density matrix of the positive emotion dictionary, tr is the matrix trace, $tr(\Pi_i \rho_{PSD})$ is the probability of word $w_i$ occurring, and the likelihood function $\mathcal{L}_{\Pi_{PSD}}(\rho_{PSD})$ is the joint probability that all words in the positive emotion dictionary co-occur.
The objective function $F(\rho_{PSD})$ is defined as:
$F(\rho_{PSD}) = \max_{\rho_{PSD}} \log \mathcal{L}_{\Pi_{PSD}}(\rho_{PSD}) = \max_{\rho_{PSD}} \sum_{i=1}^{r} \log tr(\Pi_i \rho_{PSD})$, subject to $tr(\rho_{PSD}) = 1$ and $\rho_{PSD} \succeq 0$; $F(\rho_{PSD})$ represents solving for the maximum of the joint probability of all words in the positive emotion dictionary.
A global convergence algorithm is used: it defines an iteration direction $D_k$ and repeatedly updates $\rho_{PSD}$ and the objective function $F(\rho_{PSD})$ until $F(\rho_{PSD})$ reaches its maximum, and then outputs the positive emotion dictionary density matrix $\rho_{PSD}$.
The negative dictionary density matrix $\rho_{NSD}$ and the text density matrix $\rho_{text}$ are obtained with the same training method.
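A short derivation, not stated in the text but following directly from the objective above, shows what the gradient $\nabla F$ used by iterative solvers looks like. It writes ρ for $\rho_{PSD}$ and uses the fact that each $\Pi_i$ is symmetric, so that $\partial\, tr(\Pi_i\rho)/\partial\rho = \Pi_i$:

```latex
\begin{aligned}
F(\rho) &= \sum_{i=1}^{r} \log tr(\Pi_i \rho), \\
\nabla F(\rho) &= \sum_{i=1}^{r} \frac{\Pi_i}{tr(\Pi_i \rho)} =: R(\rho), \\
tr\big(\rho\, R(\rho)\big) &= \sum_{i=1}^{r} \frac{tr(\Pi_i \rho)}{tr(\Pi_i \rho)} = r.
\end{aligned}
```

For a full-rank maximizer $\rho^{*}$, the Lagrange condition for the constraint $tr(\rho) = 1$ gives $R(\rho^{*}) = \lambda I$; multiplying by $\rho^{*}$ and taking the trace yields $\lambda = r$, so $R(\rho^{*})\rho^{*} = r\rho^{*}$. This is offered only as background for iterative updates, not as the definition of the iteration direction $D_k$.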
In step (4), the quantum relative entropy may be calculated as:
$S_p = tr(\rho_{text}(\log\rho_{text} - \log\rho_{PSD}))$
$S_n = tr(\rho_{text}(\log\rho_{text} - \log\rho_{NSD}))$
where $S_p, S_n \ge 0$; $S_p = 0$ if and only if $\rho_{text} = \rho_{PSD}$, and $S_n = 0$ if and only if $\rho_{text} = \rho_{NSD}$.
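As a small worked example with hypothetical 2 × 2 matrices: when $\rho_{text}$ and $\rho_{PSD}$ are both diagonal they commute, and the quantum relative entropy reduces to the classical Kullback-Leibler divergence between their diagonals. Taking $\rho_{text} = \mathrm{diag}(0.5, 0.5)$ and $\rho_{PSD} = \mathrm{diag}(0.9, 0.1)$ with natural logarithms:

```latex
\begin{aligned}
S_p &= tr\big(\rho_{text}(\log\rho_{text} - \log\rho_{PSD})\big) \\
    &= 0.5 \ln\frac{0.5}{0.9} + 0.5 \ln\frac{0.5}{0.1} \\
    &= 0.5\,(-0.5878) + 0.5\,(1.6094) \approx 0.511.
\end{aligned}
```

If $\rho_{text}$ were equal to $\rho_{PSD}$, both logarithm terms would vanish and $S_p$ would be 0, consistent with the equality condition.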
The beneficial effects of the invention are as follows:
(1) A high-quality positive emotion dictionary and a high-quality negative emotion dictionary are constructed, representing two basic human emotions;
(2) Based on quantum probability theory, text features are extracted and density matrices are constructed, encoding term semantics and probability distribution information;
(3) Based on quantum relative entropy, the similarity between density matrices is calculated, so emotion classification can be completed without supervision; the method responds quickly, adapts well across domains and is highly accurate.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a polarity distribution diagram of emotion words in an emotion dictionary;
FIG. 3 is a quantum text representation model flow diagram;
FIG. 4 is a bar chart comparing the experimental results of different emotion analysis methods.
Detailed Description
The technical solution of the invention is described in further detail below with reference to the accompanying drawings, but the scope of protection of the invention is not limited to the following description. FIG. 1 shows the flow of the quantum theory-based unsupervised text emotion analysis method; FIG. 2 shows the emotion polarity distribution of the words in the emotion dictionaries; FIG. 3 shows the flow of the quantum text representation model; and FIG. 4 shows the experimental comparison of emotion classification results between the different methods. The specific steps are as follows:
(1): Based on 7 groups of seed words and the OMD (The Obama-McCain Debate) English corpus, two emotion dictionaries, a positive sentiment dictionary (PSD) and a negative sentiment dictionary (NSD), are created manually as follows:
The first step: 7 pairs of seed words with opposite polarities are selected manually, namely 'positive/negative', 'good/bad', 'love/hate', 'excellent/poor', 'amazing/shit', 'nice/terrible' and 'awesome/crap'. The initial positive emotion dictionary is thus PSD = (positive, good, love, excellent, amazing, nice, awesome), and the initial negative emotion dictionary is NSD = (negative, bad, hate, poor, shit, terrible, crap).
The second step: a hidden Markov model (HMM) part-of-speech tagger extracts a total of 855 adjectives and adverbs from the corpus, and these words, for example awesome, thankful, dirty, dumb and terrible, are used as candidate emotion words. The calculation proceeds as follows: the HMM part-of-speech tagger assigns each word $w_i$ in the text a part of speech $t_i$ (e.g., adjective, verb, adverb). It assumes that each part of speech $t_i$ depends only on the part of speech $t_{i-1}$ of the previous word, i.e. $P(t_i \mid t_{i-1})$, and that each word $w_i$ depends only on its part of speech $t_i$, i.e. $P(w_i \mid t_i)$, and selects the tag sequence that maximizes the joint probability as the parts of speech of the words: $\hat{t}_1, \ldots, \hat{t}_n = \arg\max_{t_1, \ldots, t_n} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})$. The occurrence frequency of each word is counted from the corpus; once the three HMM parameters (initial, transition and emission probabilities) have been obtained, the part of speech of each word is computed, completing the tagging process.
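The HMM tagging described above can be sketched in Python as follows: the three parameters are estimated by counting over a tagged corpus, and the Viterbi algorithm picks the tag sequence that maximizes $\prod_i P(w_i \mid t_i) P(t_i \mid t_{i-1})$. This is a minimal sketch, assuming a small hand-tagged training corpus; the tag names (DT, NN, VB, JJ, RB, PR), the toy corpus and the use of a large negative log-score in place of proper smoothing for unseen events are all illustrative assumptions, not details taken from the text.

```python
import math
from collections import Counter, defaultdict


def normalise(counter):
    """Turn raw counts into relative frequencies."""
    total = sum(counter.values())
    return {key: value / total for key, value in counter.items()}


def estimate_hmm(tagged_sentences):
    """Count-based estimates of the initial, transition and emission probabilities."""
    start, trans, emit = Counter(), defaultdict(Counter), defaultdict(Counter)
    for sentence in tagged_sentences:
        prev = None
        for word, tag in sentence:
            if prev is None:
                start[tag] += 1          # P(t_1)
            else:
                trans[prev][tag] += 1    # P(t_i | t_{i-1})
            emit[tag][word] += 1         # P(w_i | t_i)
            prev = tag
    return (normalise(start),
            {tag: normalise(c) for tag, c in trans.items()},
            {tag: normalise(c) for tag, c in emit.items()})


def log_p(p):
    # Unseen events get a very low log-score instead of log(0) (assumption: no smoothing).
    return math.log(p) if p > 0 else -1e9


def viterbi(words, start, trans, emit):
    """Return the tag sequence maximising prod_i P(w_i | t_i) P(t_i | t_{i-1})."""
    tags = list(emit)
    # best[t] = (log-score of the best path ending in tag t, that path)
    best = {t: (log_p(start.get(t, 0)) + log_p(emit[t].get(words[0], 0)), [t]) for t in tags}
    for word in words[1:]:
        new_best = {}
        for t in tags:
            score, path = max((best[s][0] + log_p(trans.get(s, {}).get(t, 0)), best[s][1])
                              for s in tags)
            new_best[t] = (score + log_p(emit[t].get(word, 0)), path + [t])
        best = new_best
    return max(best.values())[1]


# Toy tagged corpus (hypothetical).
corpus = [
    [("the", "DT"), ("debate", "NN"), ("was", "VB"), ("great", "JJ")],
    [("a", "DT"), ("terrible", "JJ"), ("answer", "NN")],
    [("it", "PR"), ("was", "VB"), ("really", "RB"), ("bad", "JJ")],
]
start, trans, emit = estimate_hmm(corpus)
sentence = ["the", "answer", "was", "great"]
predicted = viterbi(sentence, start, trans, emit)
# Adjectives and adverbs become candidate emotion words.
candidates = [w for w, t in zip(sentence, predicted) if t in {"JJ", "RB"}]
```

Working in log space avoids underflow when the product runs over long sentences; a production tagger would also smooth the emission probabilities for unseen words.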
The third step: the pointwise mutual information-information retrieval (PMI-IR) method is used to calculate the semantic association between each candidate emotion word $W_{hx}$ and all seed words in the positive emotion dictionary PSD and the negative emotion dictionary NSD, which serves as the emotion score of the candidate emotion word. The semantic association is calculated as $Score(W_{hx}) = \sum_{seed \in PSD} PMI(W_{hx}, seed) - \sum_{seed \in NSD} PMI(W_{hx}, seed)$, with $PMI(W_{hx}, seed) = \log_2 \frac{P(W_{hx}, seed)}{P(W_{hx})\,P(seed)}$, where $W_{hx}$ is a candidate emotion word, seed is a seed word in each emotion dictionary, and $PMI(W_{hx}, seed)$ measures the statistical probability that $W_{hx}$ co-occurs with the seed word: the greater this probability, the more closely the two are related and the higher their association. $Score(W_{hx})$ is the emotion score of the candidate emotion word.
The fourth step: if the emotion score $Score(W_{hx})$ is greater than 0, the word is a positive emotion word; if it is less than 0, the word is a negative emotion word; the positive and negative emotion words are added to the corresponding emotion dictionaries. In the end, the positive emotion dictionary PSD contains 150 positive emotion words, e.g. best, healthy, amazing and beautiful, while the negative emotion dictionary NSD contains 152 negative emotion words, e.g. fake, bloody, weird, offensively and sad.
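The PMI-IR scoring and the sign rule of the third and fourth steps can be sketched as follows. This is a minimal sketch, assuming document-level co-occurrence counts over a corpus given as sets of tokens, and assuming the score is the sum of PMI values with the PSD seeds minus the sum with the NSD seeds. The 0.01 added to the joint count (to avoid log 0 when a word never co-occurs with a seed) and the toy documents are illustrative assumptions.

```python
import math


def pmi(word, seed, docs, eps=0.01):
    """Document-level PMI: log2(P(word, seed) / (P(word) * P(seed)))."""
    n = len(docs)
    n_word = sum(word in d for d in docs)
    n_seed = sum(seed in d for d in docs)
    if n_word == 0 or n_seed == 0:
        return 0.0  # no evidence either way
    # eps smooths a zero co-occurrence count (an assumption, not from the text).
    n_both = sum(word in d and seed in d for d in docs) + eps
    return math.log2((n_both / n) / ((n_word / n) * (n_seed / n)))


def emotion_score(word, psd, nsd, docs):
    """Score(W) = sum of PMI with positive seeds - sum of PMI with negative seeds."""
    return (sum(pmi(word, s, docs) for s in psd)
            - sum(pmi(word, s, docs) for s in nsd))


def expand_dictionaries(candidates, psd, nsd, docs):
    """Add each candidate to PSD if its score is > 0, to NSD if it is < 0."""
    new_psd, new_nsd = set(psd), set(nsd)
    for word in candidates:
        score = emotion_score(word, psd, nsd, docs)  # scored against the seed words only
        if score > 0:
            new_psd.add(word)
        elif score < 0:
            new_nsd.add(word)
    return new_psd, new_nsd


# Toy corpus: each document is a set of tokens (hypothetical).
docs = [
    {"good", "thankful", "debate"},
    {"great", "good", "thankful"},
    {"bad", "dirty", "answer"},
    {"bad", "dirty", "awful"},
    {"debate", "answer"},
]
psd, nsd = expand_dictionaries(["thankful", "dirty"], {"good"}, {"bad"}, docs)
```

Words with a score of exactly 0 are left out of both dictionaries, matching the strict inequalities in the fourth step.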
(2): The 1928 documents of the OMD text corpus, together with the positive emotion dictionary PSD and the negative emotion dictionary NSD, are preprocessed with the Python Natural Language Toolkit: spelling errors are corrected, illegal characters (such as …) are removed, and so on. After preprocessing, the OMD corpus contains 1906 documents in total.
(3): A quantum text representation model is trained: features are extracted from the positive and negative emotion dictionaries and from each text to construct the positive dictionary density matrix $\rho_{PSD}$, the negative dictionary density matrix $\rho_{NSD}$ and the text density matrix $\rho_{text}$, all of which are L × L matrices, where L is the dimension of each word vector. Each dictionary or text is represented as $D = \{w_1, w_2, \ldots, w_t\}$, where t is the number of words in the dictionary or text, as shown in FIG. 3. The steps are as follows:
The first step: the GloVe tool is used to obtain the 300-dimensional word vector $\vec{w}_i$ of each word in the positive emotion dictionary PSD, the negative emotion dictionary NSD and each text, and each vector is normalized to obtain $|w_i\rangle = \vec{w}_i / \|\vec{w}_i\|$.
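Loading and normalizing the GloVe vectors might look like the sketch below. It assumes the usual plain-text GloVe format (one word per line followed by its space-separated components). The file name in the comment is hypothetical, since which pre-trained GloVe release was used is not stated, and the three-dimensional toy input merely stands in for the 300-dimensional vectors.

```python
import io
import math


def load_glove(lines, vocab, dim=300):
    """Read GloVe-style 'word v1 ... v_dim' lines, keeping only words in vocab."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        # The last `dim` fields are the vector; anything before them is the token.
        word = " ".join(parts[:-dim])
        if word in vocab:
            vectors[word] = [float(x) for x in parts[-dim:]]
    return vectors


def normalise(vec):
    """Scale a vector to unit Euclidean length, |w> = w / ||w||."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


# With a real file one would pass e.g. open("glove.6B.300d.txt", encoding="utf-8")
# (hypothetical path) instead of this in-memory toy file.
toy_file = io.StringIO("good 3.0 4.0 0.0\nbad 0.0 0.0 2.0\nthe 1.0 1.0 1.0\n")
raw = load_glove(toy_file, vocab={"good", "bad"}, dim=3)
unit = {word: normalise(vec) for word, vec in raw.items()}
```

Filtering by vocabulary while reading keeps memory small, since the full GloVe files are large and only the dictionary and corpus words are needed.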
The second step: based on the vector outer product, the projection matrix of each word $w_i$ in the emotion dictionaries and in each text is constructed as $\Pi_i = |w_i\rangle\langle w_i|$. Each projection matrix $\Pi_i$ is a 300 × 300 matrix.
The projection matrices of all words in the positive emotion dictionary are then combined into a positive emotion projection sequence $\Pi_{PSD} = \{\Pi_1, \Pi_2, \ldots, \Pi_r\}$, the projection matrices of all words in the negative emotion dictionary into a negative emotion projection sequence $\Pi_{NSD} = \{\Pi_1, \Pi_2, \ldots, \Pi_k\}$, and the projection matrices of all words in each text into a text projection sequence $\Pi_{text} = \{\Pi_1, \Pi_2, \ldots, \Pi_t\}$, where r is the number of words in the positive emotion dictionary, i.e. 150; k is the number of words in the negative emotion dictionary, i.e. 152; and t is the number of words in each text.
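The outer-product construction and the grouping into projection sequences can be sketched with NumPy as follows. This assumes the word vectors are already available as lists of floats (for example from a GloVe loader); the function name projection_sequence is hypothetical, and the 2-dimensional vectors are toy data in place of 300-dimensional ones.

```python
import numpy as np


def projection_sequence(vectors):
    """Map word vectors to the projectors Pi_i = |w_i><w_i| (one per word, in order)."""
    sequence = []
    for v in vectors:
        w = np.asarray(v, dtype=float)
        w = w / np.linalg.norm(w)        # unit length, so Pi_i is a rank-one projector
        sequence.append(np.outer(w, w))  # L x L symmetric matrix
    return sequence


# Toy 2-dimensional "word vectors" standing in for the dictionary words.
pi_psd = projection_sequence([[3.0, 4.0], [0.0, 2.0]])
```

Because each $\Pi_i$ is idempotent with trace 1, $tr(\Pi_i\rho)$ lies in [0, 1] for any density matrix ρ, which is what allows it to be read as a word probability.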
The third step: after obtaining the projection sequences $\Pi_{PSD}$, $\Pi_{NSD}$ and $\Pi_{text}$ of the positive dictionary, the negative dictionary and the text, a likelihood function $\mathcal{L}_{\Pi}(\rho)$ is formulated by maximum likelihood estimation (MLE) (the likelihood function is the probability of observing the document), and training of the density matrices begins. The likelihood function is defined as:
$\mathcal{L}_{\Pi}(\rho) = \prod_{i=1}^{n} tr(\Pi_i \rho)$
where $\Pi_i$ is the projection matrix of the ith word in each projection sequence $\Pi \in \{\Pi_{PSD}, \Pi_{NSD}, \Pi_{text}\}$; $n \in \{r, k, t\}$ is the number of words in the corresponding projection sequence; ρ is the density matrix, $\rho \in \{\rho_{PSD}, \rho_{NSD}, \rho_{text}\}$; and tr is the matrix trace. $tr(\Pi_i \rho)$ is the probability of word $w_i$ occurring, and the likelihood function $\mathcal{L}_{\Pi}(\rho)$ represents the joint probability of all words in the positive emotion dictionary, the negative emotion dictionary and the text, respectively.
Because the logarithm is monotonic, taking the logarithm of the likelihood function $\mathcal{L}_{\Pi}(\rho)$ does not change where its maximum lies, so the objective function F(ρ) can be defined as:
$F(\rho) = \max_{\rho} \log \mathcal{L}_{\Pi}(\rho) = \max_{\rho} \sum_{i=1}^{n} \log tr(\Pi_i \rho)$, subject to $tr(\rho) = 1$ and $\rho \succeq 0$,
where $F(\rho) \in \{F(\rho_{PSD}), F(\rho_{NSD}), F(\rho_{text})\}$ solves for the maximum of the joint probability that all words in the positive emotion dictionary, the negative emotion dictionary and the text, respectively, co-occur.
The fourth step: a global convergence algorithm is applied. It defines an iteration direction $D_k$ and repeatedly updates the value of ρ and of the objective function F(ρ) until the maximum of F(ρ) is obtained, and then outputs the positive dictionary density matrix $\rho_{PSD}$, the negative dictionary density matrix $\rho_{NSD}$ and the text density matrix $\rho_{text}$. The update rule for the kth iteration of the density matrix ρ is $\rho_{k+1} = \rho_k + t_k D_k$, where $t_k \in [0, 1]$ is the step size, which controls how much the objective function F(ρ) changes in the kth iteration.
The iteration direction $D_k$ is built from two basic directions, a vertical one and a horizontal one, and is controlled by both simultaneously; $q(t_k)$ denotes the overall iteration direction, and $\nabla F(\rho_k)$ denotes the gradient direction of the objective function at the kth iteration.
The definitions of these directions involve the frequency of each word.
To demonstrate the robustness of the global convergence algorithm, a diagonal matrix $\rho_0$ is randomly initialized at the start of the iteration; it satisfies all properties of a density matrix, e.g. $\rho_0 \succeq 0$. The iteration terminates when the change in the value of the objective function between successive iterations is within 0.0001, yielding the final density matrix $\rho \in \{\rho_{PSD}, \rho_{NSD}, \rho_{text}\}$.
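The update $\rho_{k+1} = \rho_k + t_k D_k$ with random diagonal initialization and the 0.0001 stopping rule can be sketched as below. The exact vertical and horizontal directions of the global convergence algorithm are not reproduced here; instead, as a plainly swapped-in assumption, the sketch uses the RρR-style direction $D_k = R\rho_k R / tr(R\rho_k R) - \rho_k$ with $R = \sum_i \Pi_i / tr(\Pi_i \rho_k)$, and chooses $t_k \in [0, 1]$ by halving until F does not decrease. Because $\rho_k + t_k D_k$ is then a convex combination of two density matrices, every iterate stays a valid density matrix.

```python
import numpy as np


def log_likelihood(rho, projectors):
    """F(rho) = sum_i log tr(Pi_i rho)."""
    probs = np.array([np.trace(P @ rho) for P in projectors])
    return float(np.sum(np.log(np.maximum(probs, 1e-300))))


def train_density_matrix(projectors, tol=1e-4, max_iter=500, seed=0):
    """Maximise F(rho) by rho_{k+1} = rho_k + t_k D_k (assumed RrhoR-style direction)."""
    dim = projectors[0].shape[0]
    rng = np.random.default_rng(seed)
    diag = rng.random(dim) + 1e-3            # random positive diagonal ...
    rho = np.diag(diag / diag.sum())         # ... normalised to unit trace
    f_old = log_likelihood(rho, projectors)
    for _ in range(max_iter):
        # Gradient of F: R(rho) = sum_i Pi_i / tr(Pi_i rho)
        R = sum(P / max(np.trace(P @ rho), 1e-300) for P in projectors)
        target = R @ rho @ R
        direction = target / np.trace(target) - rho
        t = 1.0
        while True:
            candidate = rho + t * direction  # convex combination for t in [0, 1]
            f_new = log_likelihood(candidate, projectors)
            if f_new >= f_old or t < 1e-6:
                break
            t /= 2                           # halve the step until F does not decrease
        if f_new < f_old:
            break                            # no improving step found
        rho = (candidate + candidate.T) / 2  # guard against round-off asymmetry
        converged = abs(f_new - f_old) < tol
        f_old = f_new
        if converged:
            break
    return rho, f_old


# Toy data: word 1 occurs twice, word 2 once, along orthogonal directions.
projectors = [np.diag([1.0, 0.0]), np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
rho, f_max = train_density_matrix(projectors, tol=1e-10)
# For these commuting projectors the maximum-likelihood answer is diag(2/3, 1/3).
```

The step-halving guard matters: the plain RρR update can oscillate without converging on inputs like this toy example, whereas accepting only non-decreasing steps keeps F monotone.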
(4): Using the quantum relative entropy algorithm, the positive similarity score $S_p$ between the text density matrix $\rho_{text}$ and the positive dictionary density matrix $\rho_{PSD}$, and the negative similarity score $S_n$ between $\rho_{text}$ and the negative dictionary density matrix $\rho_{NSD}$, are calculated. Quantum relative entropy is defined as:
$S_p = tr(\rho_{text}(\log\rho_{text} - \log\rho_{PSD}))$
$S_n = tr(\rho_{text}(\log\rho_{text} - \log\rho_{NSD}))$
where $S_p, S_n \ge 0$; $S_p = 0$ if and only if $\rho_{text} = \rho_{PSD}$, and $S_n = 0$ if and only if $\rho_{text} = \rho_{NSD}$.
(5) The positive similarity score $S_p$ is compared with the negative similarity score $S_n$: if $S_p > S_n$, the text is positive (emotion label +1); otherwise it is negative (emotion label -1). This yields the emotion classification result of each text.
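Steps (4) and (5) can be sketched with NumPy by computing matrix logarithms through an eigendecomposition, which is valid because density matrices are symmetric positive semidefinite. Flooring eigenvalues at a small eps is an assumption made here to keep the logarithm finite; where $\rho_{PSD}$ or $\rho_{NSD}$ has a zero eigenvalue on the support of $\rho_{text}$, the true relative entropy is infinite and the floored value is only a large finite stand-in. Since the relative entropy is 0 when the two matrices coincide and grows as they differ, a smaller score means a closer match; the sketch nonetheless reproduces the comparison exactly as step (5) states it.

```python
import numpy as np


def matrix_log(rho, eps=1e-12):
    """Logarithm of a symmetric PSD matrix via eigendecomposition (eigenvalues floored at eps)."""
    vals, vecs = np.linalg.eigh(rho)
    vals = np.clip(vals, eps, None)
    return vecs @ np.diag(np.log(vals)) @ vecs.T


def relative_entropy(rho, sigma):
    """Quantum relative entropy S(rho || sigma) = tr(rho (log rho - log sigma))."""
    return float(np.trace(rho @ (matrix_log(rho) - matrix_log(sigma))))


def classify(rho_text, rho_psd, rho_nsd):
    """Return (label, S_p, S_n) using the rule of step (5): S_p > S_n gives +1, else -1."""
    s_p = relative_entropy(rho_text, rho_psd)
    s_n = relative_entropy(rho_text, rho_nsd)
    return (1 if s_p > s_n else -1), s_p, s_n


# Hypothetical 2 x 2 density matrices.
rho_text = np.diag([0.5, 0.5])
rho_psd = np.diag([0.9, 0.1])
rho_nsd = np.diag([0.2, 0.8])
label, s_p, s_n = classify(rho_text, rho_psd, rho_nsd)
```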
The emotion classification result of each subjective text is compared with its emotion label to test the method and calculate the classification accuracy. The method is compared with a bag-of-words model, a sentence embedding model, the pointwise mutual information-information retrieval (PMI-IR) algorithm and the quantum language model, and the resulting accuracies are compared in a bar chart. As shown in FIG. 4, the method of the invention clearly improves the performance of text emotion analysis.
The technical means disclosed in the solution of the invention are not limited to those disclosed in the above embodiment, but also include technical solutions formed by any combination of the above technical features. It should be noted that those skilled in the art may make improvements and modifications without departing from the principle of the invention, and such improvements and modifications are also regarded as falling within the scope of protection of the invention.

Claims (7)

1. An unsupervised text emotion analysis method based on quantum theory, comprising the following steps:
(1): creating two emotion dictionaries, namely a positive emotion dictionary PSD and a negative emotion dictionary NSD, wherein the positive emotion dictionary contains words with positive emotion polarities, and the negative emotion dictionary contains words with negative emotion polarities;
(2): preprocessing texts in the positive emotion dictionary PSD, the negative emotion dictionary NSD and the corpus;
(3): constructing a quantum text representation model, and extracting features from the preprocessed positive emotion dictionary PSD, the preprocessed negative emotion dictionary NSD and each text to construct a positive emotion dictionary density matrix $\rho_{PSD}$, a negative emotion dictionary density matrix $\rho_{NSD}$ and a text density matrix $\rho_{text}$, specifically as follows:
the first step: obtaining the word vector $\vec{w}_i$ of each word in PSD, NSD and each text, and then normalizing it: $|w_i\rangle = \vec{w}_i / \|\vec{w}_i\|$;
the second step: based on the vector outer product, constructing the projection matrix $\Pi_i = |w_i\rangle\langle w_i|$ of each word in the positive emotion dictionary PSD, the negative emotion dictionary NSD and the text; combining the projection matrices of all words in PSD into a positive emotion projection sequence $\Pi_{PSD} = \{\Pi_1, \Pi_2, \ldots, \Pi_r\}$, the projection matrices of all words in NSD into a negative emotion projection sequence $\Pi_{NSD} = \{\Pi_1, \Pi_2, \ldots, \Pi_k\}$, and the projection matrices of all words in each text into a text projection sequence $\Pi_{text} = \{\Pi_1, \Pi_2, \ldots, \Pi_t\}$, where r is the number of words in PSD, k is the number of words in NSD, and t is the number of words in each text;
the third step: after obtaining the projection sequences $\Pi_{PSD}$, $\Pi_{NSD}$ and $\Pi_{text}$, formulating a likelihood function $\mathcal{L}_{\Pi}(\rho)$ by maximum likelihood estimation (MLE), and training the positive emotion dictionary density matrix $\rho_{PSD}$, the negative emotion dictionary density matrix $\rho_{NSD}$ and the text density matrix $\rho_{text}$ respectively;
(4): using the quantum relative entropy algorithm, calculating the positive similarity score $S_p$ between the text density matrix $\rho_{text}$ and the positive emotion dictionary density matrix $\rho_{PSD}$, and the negative similarity score $S_n$ between $\rho_{text}$ and the negative emotion dictionary density matrix $\rho_{NSD}$;
(5): comparing the positive similarity score with the negative similarity score: if $S_p > S_n$, the text is classified as positive, otherwise as negative, finally yielding the emotion classification result of each text.
2. The unsupervised text emotion analysis method according to claim 1, wherein in step (1), the positive emotion dictionary PSD and the negative emotion dictionary NSD are created as follows:
First step: select M pairs of seed words with opposite polarities to form the initial positive emotion dictionary PSD and negative emotion dictionary NSD, respectively;
Second step: select a corpus and use a part-of-speech tagger based on a hidden Markov model to extract the adjectives and adverbs in the corpus as candidate emotion words $W_{hx}$. The tagger labels each word $w_i$ of a sentence $w_1, \dots, w_n$ in the corpus with a part of speech $t_i$, assuming that each part of speech $t_i$ depends only on the part of speech $t_{i-1}$ of the previous word, i.e. $P(t_i \mid t_{i-1})$, and that each word $w_i$ depends only on its own part of speech $t_i$, i.e. $P(w_i \mid t_i)$. The tag sequence that maximizes the joint probability distribution is then selected as the parts of speech of the words:
$$\hat{t}_1, \dots, \hat{t}_n = \arg\max_{t_1, \dots, t_n} \prod_{i=1}^{n} P(w_i \mid t_i)\, P(t_i \mid t_{i-1})$$
Third step: use the pointwise mutual information and information retrieval algorithm (PMI-IR) to calculate the semantic association between each candidate emotion word $W_{hx}$ and all seed words in the positive emotion dictionary PSD and the negative emotion dictionary NSD, and take it as the emotion score of the candidate emotion word;
Fourth step: for a candidate emotion word $W_{hx}$, if its emotion score $Score(W_{hx})$ is greater than 0, the word is a positive emotion word; if $Score(W_{hx})$ is less than 0, it is a negative emotion word. According to this emotion attribute, the candidate emotion word $W_{hx}$ is added to the corresponding emotion dictionary.
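The claim does not say how the maximizing tag sequence is found; the Viterbi algorithm is the standard dynamic-programming decoder for this kind of HMM. A minimal sketch, assuming hand-set toy probabilities (the tag names `DT`, `ADJ`, `NN` and every number are hypothetical; a real tagger would estimate them from a tagged corpus):

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the tag sequence maximizing prod_i P(w_i | t_i) * P(t_i | t_{i-1}).

    Probabilities are nested dicts; unseen events get a small floor so log() stays finite.
    """
    def lg(p):
        return math.log(max(p, floor))

    # best[t] = (log-probability of the best path ending in tag t, that path)
    best = {t: (lg(start_p.get(t, 0.0)) + lg(emit_p[t].get(words[0], 0.0)), [t]) for t in tags}
    for w in words[1:]:
        best = {
            t: max(
                (best[prev][0] + lg(trans_p[prev].get(t, 0.0)) + lg(emit_p[t].get(w, 0.0)),
                 best[prev][1] + [t])
                for prev in tags
            )
            for t in tags
        }
    return max(best.values())[1]

# Hypothetical toy model.
TAGS = ["DT", "ADJ", "NN"]
start = {"DT": 0.8, "ADJ": 0.1, "NN": 0.1}
trans = {"DT": {"ADJ": 0.5, "NN": 0.5}, "ADJ": {"NN": 0.9, "ADJ": 0.1}, "NN": {"NN": 0.5, "DT": 0.5}}
emit = {"DT": {"the": 1.0}, "ADJ": {"good": 0.9, "movie": 0.1}, "NN": {"movie": 0.8, "good": 0.2}}

sentence = ["the", "good", "movie"]
tagged = viterbi(sentence, TAGS, start, trans, emit)
# Keep adjectives and adverbs as candidate emotion words W_hx.
candidates = [w for w, t in zip(sentence, tagged) if t in ("ADJ", "ADV")]
```

Working in log space turns the product of probabilities into a sum, which avoids underflow on long sentences without changing which sequence wins.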
3. The unsupervised text emotion analysis method of claim 2, wherein in the third step the semantic association degree is calculated as follows:
where $W_{hx}$ denotes the candidate emotion word, seed denotes a seed word in each emotion dictionary, $PMI(W_{hx}, seed)$ measures how likely the candidate emotion word $W_{hx}$ is to co-occur with the seed word (the higher this probability, the more closely the two words are related and the higher their degree of association), and $Score(W_{hx})$ is the emotion score of the candidate emotion word.
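Because the formula itself is not reproduced above, the sketch below assumes the usual Turney-style form of PMI-IR: $PMI(w, s) = \log_2 \frac{P(w, s)}{P(w)P(s)}$, and $Score(w)$ is the sum of PMI values with the PSD seeds minus the sum with the NSD seeds. Hit counts come from document-level co-occurrence in a toy corpus rather than search-engine queries, and the smoothing constant 0.01 is likewise an assumption. The hypothetical `assign` helper applies the fourth-step rule of claim 2.

```python
import math

def pmi(word, seed, docs, smoothing=0.01):
    """log2 of P(word, seed) / (P(word) P(seed)), estimated from document co-occurrence counts."""
    n = len(docs)
    hits_w = sum(word in d for d in docs)
    hits_s = sum(seed in d for d in docs)
    hits_ws = sum(word in d and seed in d for d in docs)
    # Additive smoothing keeps the log finite when a pair never co-occurs.
    return math.log2((hits_ws + smoothing) * n / ((hits_w + smoothing) * (hits_s + smoothing)))

def emotion_score(word, pos_seeds, neg_seeds, docs):
    """Assumed score: association with positive seeds minus association with negative seeds."""
    return sum(pmi(word, s, docs) for s in pos_seeds) - sum(pmi(word, s, docs) for s in neg_seeds)

def assign(word, score, psd, nsd):
    """Add the candidate to PSD if score > 0, to NSD if score < 0, otherwise to neither."""
    if score > 0:
        psd.add(word)
    elif score < 0:
        nsd.add(word)

# Toy corpus: each document is a set of tokens (hypothetical data).
docs = [{"superb", "good"}, {"superb", "good", "plot"}, {"awful", "bad"},
        {"awful", "bad", "plot"}, {"plot"}]
psd, nsd = {"good"}, {"bad"}
for cand in ["superb", "awful", "plot"]:
    assign(cand, emotion_score(cand, ["good"], ["bad"], docs), psd, nsd)
```

A word that co-occurs equally with both seed sets, such as `plot` here, scores exactly 0 and is added to neither dictionary, matching the strict inequalities of the fourth step.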
4. The unsupervised text emotion analysis method of claim 1, wherein preprocessing the texts in the positive emotion dictionary PSD, the negative emotion dictionary NSD and the corpus in step (2) comprises: correcting spelling errors; removing illegal characters from each dictionary and text; and removing useless words, including stop words and punctuation marks, based on a standard English stop-word list.
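A rough sketch of the character and stop-word filtering in claim 4, assuming that "illegal characters" means anything outside Python's `string.printable` set and using a small hypothetical stop-word excerpt in place of a full standard English list. Spelling correction is left out.

```python
import string

# Hypothetical excerpt; a real run would load a complete standard English stop-word list.
STOP_WORDS = {"a", "an", "the", "is", "was", "and", "of", "to", "this", "it"}

def preprocess(text):
    """Drop non-printable characters, lower-case, strip punctuation, then remove stop words."""
    text = "".join(ch for ch in text if ch in string.printable)  # assumed meaning of "illegal"
    text = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in text.split() if tok not in STOP_WORDS]

tokens = preprocess("The movie was GREAT!!! \u00a9")
```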
5. The unsupervised text emotion analysis method of claim 1, wherein in step (3) the GloVe tool is used to obtain the word vectors of the words in the PSD, the NSD and each text, respectively.
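GloVe distributes its pretrained vectors as plain text, one word per line followed by its space-separated components. A minimal loader sketch (the function name `load_glove` and the sample values are hypothetical) that keeps only the vocabulary of the dictionaries and texts:

```python
def load_glove(lines, vocabulary):
    """Keep vectors for words in `vocabulary` from GloVe text-format lines ("word v1 ... vd")."""
    vectors = {}
    for line in lines:
        word, *values = line.rstrip().split(" ")
        if word in vocabulary:
            vectors[word] = [float(v) for v in values]
    return vectors

# In practice `lines` would be an open file handle on a downloaded GloVe .txt file;
# here a three-line in-memory sample with made-up 3-dimensional values is used instead.
sample = "good 0.1 0.2 0.3\nbad -0.1 -0.2 -0.3\nthe 0.0 0.5 0.0\n"
vectors = load_glove(sample.splitlines(), {"good", "bad", "unseen"})
```

Words missing from the GloVe file (here `unseen`) simply get no vector; the claims do not say how such out-of-vocabulary words are handled.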
6. The unsupervised text emotion analysis method of claim 1, wherein in the third step of step (3) the training method for the positive emotion projection sequence $\Pi_{PSD}$ comprises the following steps:
The likelihood function $\mathcal{L}(\rho_{PSD})$ is defined as follows:
$$\mathcal{L}(\rho_{PSD}) = \prod_{i=1}^{r} \operatorname{tr}(\Pi_i \rho_{PSD})$$
where $\Pi_i$ is the projection matrix of the $i$-th word in the positive emotion projection sequence $\Pi_{PSD}$, $\rho_{PSD}$ is the positive emotion dictionary density matrix, tr is the matrix trace operation, $\operatorname{tr}(\Pi_i \rho_{PSD})$ is the probability that word $w_i$ occurs, and the likelihood function $\mathcal{L}(\rho_{PSD})$ is the joint probability that all words in the positive emotion dictionary occur;
The objective function $F(\rho_{PSD})$ is defined as follows:
where $F(\rho_{PSD})$ expresses the maximization of the joint probability that all words in the positive emotion dictionary occur;
A globally convergent algorithm is used: an iteration direction $D_k$ is defined, and $\rho_{PSD}$ and the objective function $F(\rho_{PSD})$ are updated iteratively until $F(\rho_{PSD})$ reaches its maximum, at which point the positive emotion dictionary density matrix $\rho_{PSD}$ is output.
The negative emotion dictionary density matrix $\rho_{NSD}$ and the text density matrix $\rho_{text}$ are obtained with the same training method.
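The claim names a globally convergent algorithm with iteration direction $D_k$ but does not spell it out, so the sketch below swaps in a different, well-known maximizer of the same objective: the $R\rho R$ fixed-point iteration from quantum state tomography, $\rho \leftarrow R\rho R / \operatorname{tr}(R\rho R)$ with $R = \sum_i \Pi_i / \operatorname{tr}(\Pi_i \rho)$. It targets $\sum_i \log \operatorname{tr}(\Pi_i \rho)$, the logarithm of the claim's likelihood, which has the same maximizer as the product. Plain $R\rho R$ is not guaranteed to raise the likelihood at every step, and the iteration count, the floor `eps` and the two toy word vectors are assumptions.

```python
import numpy as np

def log_likelihood(projectors, rho):
    """Sum of log tr(Pi_i rho): the log of the joint probability of all words."""
    return float(sum(np.log(np.trace(p @ rho).real) for p in projectors))

def train_density_matrix(projectors, iterations=200, eps=1e-12):
    """MLE density matrix via the R-rho-R iteration (a stand-in for the claim's algorithm)."""
    dim = projectors[0].shape[0]
    rho = np.eye(dim) / dim  # start from the maximally mixed state
    for _ in range(iterations):
        probs = [max(np.trace(p @ rho).real, eps) for p in projectors]
        r = sum(p / q for p, q in zip(projectors, probs))
        rho = r @ rho @ r                # stays Hermitian and positive semidefinite
        rho = (rho + rho.conj().T) / 2   # remove round-off asymmetry
        rho /= np.trace(rho).real        # renormalize to unit trace
    return rho

# Two hypothetical unit word vectors in a 3-dimensional embedding space.
a = np.array([1.0, 0.0, 0.0])
b = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)
pis = [np.outer(a, a), np.outer(b, b)]
rho_psd = train_density_matrix(pis)
```

Because $R\rho R$ is always positive semidefinite, renormalizing the trace is enough to keep every iterate a valid density matrix, so no explicit projection back onto the constraint set is needed.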
7. The unsupervised text emotion analysis method of claim 1, wherein in the third step of step (3) the quantum relative entropy is calculated as follows:
$$S_p = \operatorname{tr}\big(\rho_{text}(\log\rho_{text} - \log\rho_{PSD})\big)$$
$$S_n = \operatorname{tr}\big(\rho_{text}(\log\rho_{text} - \log\rho_{NSD})\big)$$
where $S_p, S_n \ge 0$, with $S_p = 0$ if and only if $\rho_{text} = \rho_{PSD}$, and $S_n = 0$ if and only if $\rho_{text} = \rho_{NSD}$.
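A sketch of the two scores and the step (5) comparison, computing the matrix logarithm by eigendecomposition. Clipping eigenvalues to a small `eps` is an assumption made here because the logarithm of a rank-deficient density matrix is undefined; the diagonal toy matrices are hypothetical. Since the relative entropy is zero exactly when the matrices coincide, a smaller score indicates a closer match; `classify` nonetheless applies the step (5) rule as worded ($S_p > S_n$ means positive) and returns both scores so the comparison can be inspected.

```python
import numpy as np

def matrix_log(rho, eps=1e-12):
    """Logarithm of a Hermitian positive semidefinite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(rho)
    vals = np.clip(vals, eps, None)  # clip zero/negative round-off eigenvalues so log is finite
    return vecs @ np.diag(np.log(vals)) @ vecs.conj().T

def relative_entropy(rho_text, sigma):
    """S = tr(rho_text (log rho_text - log sigma)), as in claim 7."""
    return float(np.trace(rho_text @ (matrix_log(rho_text) - matrix_log(sigma))).real)

def classify(rho_text, rho_psd, rho_nsd):
    s_p = relative_entropy(rho_text, rho_psd)
    s_n = relative_entropy(rho_text, rho_nsd)
    # Decision rule taken literally from step (5): positive iff S_p > S_n.
    return ("positive" if s_p > s_n else "negative"), s_p, s_n

# Hypothetical 2x2 density matrices.
rho_text = np.diag([0.8, 0.2])
rho_psd = np.diag([0.9, 0.1])
rho_nsd = np.diag([0.1, 0.9])
label, s_p, s_n = classify(rho_text, rho_psd, rho_nsd)
```

For these diagonal matrices the scores reduce to classical Kullback-Leibler divergences, $S_p = 0.8\ln\frac{0.8}{0.9} + 0.2\ln\frac{0.2}{0.1} \approx 0.0444$, which gives an easy hand check.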
CN202110113463.9A 2021-01-27 2021-01-27 Quantum theory-based unsupervised text emotion analysis method Active CN112905736B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110113463.9A CN112905736B (en) 2021-01-27 2021-01-27 Quantum theory-based unsupervised text emotion analysis method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110113463.9A CN112905736B (en) 2021-01-27 2021-01-27 Quantum theory-based unsupervised text emotion analysis method

Publications (2)

Publication Number Publication Date
CN112905736A CN112905736A (en) 2021-06-04
CN112905736B CN112905736B (en) 2023-09-19

Family

ID=76119050

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110113463.9A Active CN112905736B (en) 2021-01-27 2021-01-27 Quantum theory-based unsupervised text emotion analysis method

Country Status (1)

Country Link
CN (1) CN112905736B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113434646A (en) * 2021-06-08 2021-09-24 Tianjin University Question-answering task matching model and method based on quantum measurement and self-attention mechanism
WO2023061441A1 (en) * 2021-10-13 2023-04-20 Origin Quantum Computing Technology (Hefei) Co., Ltd. Text quantum circuit determination method, text classification method, and related apparatus
CN114492417A (en) * 2022-02-07 2022-05-13 Beijing Miaoyijia Health Technology Group Co., Ltd. Interpretable deep learning method, interpretable deep learning device, computer and medium
CN115860989B (en) * 2022-11-29 2024-05-14 Guangzhou Mingdong Software Co., Ltd. Administrative law enforcement electronic document delivery method and system based on administrative law enforcement and case handling platform

Citations (18)

Publication number Priority date Publication date Assignee Title
CN103544246A (en) * 2013-10-10 2014-01-29 Tsinghua University Method and system for constructing a multi-emotion dictionary for the Internet
CN103995803A (en) * 2014-04-25 2014-08-20 Northwestern Polytechnical University Fine-grained text sentiment analysis method
CN104216873A (en) * 2014-08-27 2014-12-17 Central China Normal University Method for analyzing emotional fluctuation characteristics of online messages left by people with emotional disorders
CN104317965A (en) * 2014-11-14 2015-01-28 Nanjing University of Science and Technology Corpus-based method for building an emotion dictionary
CN104516947A (en) * 2014-12-03 2015-04-15 Zhejiang University of Technology Chinese microblog emotion analysis method fusing explicit and implicit features
CN104951548A (en) * 2015-06-24 2015-09-30 Yantai Zhongke Network Technology Research Institute Method and system for calculating a negative public opinion index
CN105930368A (en) * 2016-04-13 2016-09-07 Shenzhen University Emotion classification method and system
CN107357837A (en) * 2017-06-22 2017-11-17 South China Normal University E-commerce review sentiment classification method based on order-preserving submatrix and frequent sequence mining
CN107491531A (en) * 2017-08-18 2017-12-19 South China Normal University Chinese online review sentiment classification method based on an ensemble learning framework
CN107832663A (en) * 2017-09-30 2018-03-23 Tianjin University Multimodal sentiment analysis method based on quantum theory
CN107908635A (en) * 2017-09-26 2018-04-13 Baidu Online Network Technology (Beijing) Co., Ltd. Method and apparatus for establishing a text classification model and for text classification
CN108596637A (en) * 2018-04-24 2018-09-28 Beihang University E-commerce service problem discovery system
CN109101478A (en) * 2018-06-04 2018-12-28 Southeast University Aspect-level sentiment analysis method for e-commerce review text
CN110046671A (en) * 2019-04-24 2019-07-23 Jilin University Text classification method based on capsule network
CN110287319A (en) * 2019-06-13 2019-09-27 Nanjing University of Aeronautics and Astronautics Student evaluation text analysis method based on sentiment analysis technology
CN110598207A (en) * 2019-08-14 2019-12-20 South China Normal University Word vector obtaining method and device and storage medium
CN111191463A (en) * 2019-12-30 2020-05-22 Hangzhou Yuanchuan Xinye Technology Co., Ltd. Emotion analysis method and device, electronic equipment and storage medium
CN111897964A (en) * 2020-08-12 2020-11-06 Tencent Technology (Shenzhen) Co., Ltd. Text classification model training method, device, equipment and storage medium

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US9275041B2 (en) * 2011-10-24 2016-03-01 Hewlett Packard Enterprise Development Lp Performing sentiment analysis on microblogging data, including identifying a new opinion term therein
US9996504B2 (en) * 2013-07-08 2018-06-12 Amazon Technologies, Inc. System and method for classifying text sentiment classes based on past examples

Non-Patent Citations (5)

Title
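A quantum-inspired multimodal sentiment analysis framework; ZHANG Y et al.; Theoretical Computer Science; pp. 21-40 *
Unsupervised Sentiment Analysis of Twitter Posts Using Density Matrix Representation; Yazhou Zhang; European Conference on Information Retrieval (ECIR 2018): Advances in Information Retrieval; pp. 316-329 *
Network rumor identification method based on sentiment analysis; 首欢容; 邓淑卿; 徐健; Data Analysis and Knowledge Discovery (07); pp. 48-55 *
Evolution analysis of domestic and international sentiment analysis research based on spatio-temporal dimensions; 赵蓉英; 张扬; Information Science (10); pp. 173-179 *
Sentiment dictionary construction method based on label propagation; 张璞; 王俊霞; 王英豪; Computer Engineering (05); pp. 174-179 *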

Also Published As

Publication number Publication date
CN112905736A (en) 2021-06-04

Similar Documents

Publication Publication Date Title
CN112905736B (en) Quantum theory-based unsupervised text emotion analysis method
CN108628823B (en) Named entity recognition method combining attention mechanism and multi-task collaborative training
CN106776581B (en) Subjective text emotion analysis method based on deep learning
CN108363743B (en) Intelligent problem generation method and device and computer readable storage medium
CN110245229B (en) Deep learning theme emotion classification method based on data enhancement
CN111209401A (en) System and method for classifying and processing sentiment polarity of online public opinion text information
CN112183094B (en) Chinese grammar debugging method and system based on multiple text features
CN111125333B (en) Generation type knowledge question-answering method based on expression learning and multi-layer covering mechanism
CN110119443B (en) Emotion analysis method for recommendation service
CN112818698B (en) Fine-grained user comment sentiment analysis method based on dual-channel model
CN113268576B (en) Deep learning-based department semantic information extraction method and device
Sun et al. VCWE: visual character-enhanced word embeddings
CN115438154A (en) Chinese automatic speech recognition text restoration method and system based on representation learning
CN115759119B (en) Financial text emotion analysis method, system, medium and equipment
CN116910272B (en) Academic knowledge graph completion method based on pre-training model T5
CN115034218A (en) Chinese grammar error diagnosis method based on multi-stage training and editing level voting
CN113961706A (en) Accurate text representation method based on neural network self-attention mechanism
CN114154504A (en) Chinese named entity recognition algorithm based on multi-information enhancement
CN113535897A (en) Fine-grained emotion analysis method based on syntactic relation and opinion word distribution
CN115238693A (en) Chinese named entity recognition method based on multi-word segmentation and multi-layer bidirectional long-short term memory
Nugraha et al. Typographic-based data augmentation to improve a question retrieval in short dialogue system
CN111666374A (en) Method for integrating additional knowledge information into deep language model
CN117973372A (en) Chinese grammar error correction method based on pinyin constraint
CN116049349B (en) Small sample intention recognition method based on multi-level attention and hierarchical category characteristics
CN109960782A Tibetan word segmentation method and device based on deep neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant