CN116108840A - Text fine granularity emotion analysis method, system, medium and computing device - Google Patents

Text fine granularity emotion analysis method, system, medium and computing device

Info

Publication number
CN116108840A
CN116108840A
Authority
CN
China
Prior art keywords
text
emotion analysis
emotion
model
data set
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310124542.9A
Other languages
Chinese (zh)
Inventor
张丽
李志惠
郭婷
程同琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology
Priority to CN202310124542.9A
Publication of CN116108840A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a text fine granularity emotion analysis method, system, medium and computing device. The method comprises the following steps: performing word vectorization on the comment texts in a preprocessed data set by adopting a BERT model; extracting global and local features of the comment text word vectors with a dual-channel BiLSTM+Attention and BiGRU+Attention model, and training to obtain a neural network model for emotion analysis; performing topic extraction on the preprocessed data set by adopting an LDA topic model to obtain topic-attribute words; screening short sentences containing attribute words from the data set and labeling them with the corresponding topics; and inputting the topic-labeled short sentence set into the trained neural network model to obtain the emotion tendency of each topic. The fine granularity emotion analysis method based on BERT+BiLSTM+BiGRU+LDA can effectively improve the accuracy of text fine granularity emotion analysis.

Description

Text fine granularity emotion analysis method, system, medium and computing device
Technical Field
The invention belongs to the technical field of natural language processing and relates to a text fine granularity emotion analysis method, in particular to a text fine granularity emotion analysis method, system, medium and computing device based on BERT+BiLSTM+BiGRU+LDA.
Background
With the rapid development of technology and the continuous innovation of information technology, the Internet plays an increasingly important role in people's daily work, study and life, and has greatly changed how people live. As mobile network devices have developed, production, life and work have become increasingly intelligent, and the number of App software products of all kinds keeps growing to cover every aspect of daily life. Users form various impressions while using this software and express them by posting online comments about it. Such text comments generally contain emotion tendencies towards multiple aspects; taking App software as an example, a comment may evaluate several aspects such as functions and page style, and a reviewer may be positive about the functions but negative about the page style. Analyzing the emotion tendency of a text comment towards each aspect, i.e. fine granularity emotion analysis of text, is a difficult problem in the field of natural language processing.
Emotion analysis mainly targets texts that carry subjective emotion tendencies, and predicts the emotion tendency of a text by preprocessing it and by analyzing and summarizing its semantic information. According to the granularity of analysis, it can be divided into document-level, sentence-level and aspect-level emotion analysis, where aspect-level emotion analysis is also called fine granularity emotion analysis. Fine granularity emotion analysis mainly consists of two steps: extracting and identifying the topics (aspects) described in the text, and analyzing the emotion tendency towards each topic. Topic extraction generally adopts machine learning methods such as the PageRank algorithm, the LDA topic model or HowNet-based text clustering. Methods for analyzing the emotion tendency towards each topic fall mainly into three categories: emotion-dictionary-based, machine-learning-based and deep-learning-based. Text emotion analysis based on an emotion dictionary depends mainly on the construction of the dictionary, which must be continuously updated and maintained to guarantee its quality; traditional machine learning methods perform emotion analysis mainly through classifier models such as naive Bayes, support vector machines and K-nearest neighbors.
In recent years, deep learning techniques have gradually been applied to natural language research: convolutional neural networks, long short-term memory networks, attention mechanisms and the like are widely used in the field, and especially in emotion analysis. These techniques can learn deep textual information and thus improve emotion classification accuracy to a certain extent. In fine granularity emotion analysis, however, deep learning methods are still applied relatively rarely, and accuracy can be further improved.
Disclosure of Invention
Aiming at the defects existing in the prior art, the invention provides a text fine granularity emotion analysis method, a system, a medium and computing equipment.
The invention discloses a text fine granularity emotion analysis method, which comprises the following steps:
acquiring a comment text data set;
preprocessing the comment text data set, wherein the preprocessing comprises data cleaning and data labeling;
performing word vectorization on comment texts in the preprocessed data set by adopting the BERT model to obtain comment text word vectors;
inputting the comment text word vectors into a BiLSTM+Attention model for coarse-granularity emotion analysis, extracting global feature information of the text, and optimizing the global features through an attention mechanism;
inputting the comment text word vectors into a BiGRU+Attention model for coarse-granularity emotion analysis, extracting local feature information of the text, and optimizing the local features through an attention mechanism;
fusing the optimized global features and the optimized local features to obtain final text emotion feature representation; outputting the fused characteristic representation through a full-connection layer and a softmax activation function, and predicting the emotion tendency of the whole sentence; finally training to obtain a neural network model for emotion analysis;
performing topic extraction on the preprocessed data set by adopting an LDA topic model to obtain topic-attribute words; screening short sentences containing attribute words in the data set, and labeling corresponding topics;
inputting the short sentence set marked with the theme into a trained neural network model for fine granularity emotion analysis to obtain emotion tendencies of all the themes.
As a further improvement of the invention, web crawler technology is adopted to acquire the comment text data set.
As a further improvement of the present invention, the data cleansing includes removing irrelevant text and duplicate comment text, the irrelevant text including but not limited to abbreviations, emoticons, repeated punctuation marks and ambiguous sentences;
the method for labeling the data comprises, but is not limited to, labeling comments according to their scoring information, with 0-2 labeled negative, 3 labeled neutral and 4-5 labeled positive.
As a further improvement of the invention, the comment text word vectors are input into the BiLSTM+Attention model for coarse-grained emotion analysis, global feature information of the text is extracted, and the global features are optimized through an attention mechanism; comprising the following steps:
calculating the states of a forgetting gate, a memory gate and a temporary cell at the current moment based on the hidden state at the previous moment and the comment text word vector at the current moment;
calculating the cell state at the current moment based on the forgetting gate, the memory gate and the temporary cell state at the current moment and the cell state at the last moment;
calculating the hidden state at the current moment based on the hidden state at the previous moment, the input word at the current moment and the cell state at the current moment;
splicing hidden states obtained by calculating the BiLSTM forward sequence and the backward sequence to obtain global characteristic information;
and inputting the global feature information into an attention layer, and optimizing the global feature through an attention mechanism to obtain the optimized global feature.
As a further improvement of the invention, the comment text word vectors are input into the BiGRU+Attention model for coarse-grained emotion analysis, local feature information of the text is extracted, and the local features are optimized through an attention mechanism; comprising the following steps:
inputting comment text word vectors into BiGRU, wherein the BiGRU consists of a forward GRU sequence and a backward GRU sequence;
splicing the hidden states obtained by calculating the BiGRU forward sequence and the BiGRU backward sequence to obtain local characteristic information;
and inputting the local feature information into an attention layer, and optimizing the local feature through an attention mechanism to obtain the optimized local feature.
As a further improvement of the invention, the optimized global features and the optimized local features are fused to obtain the final text emotion feature representation; outputting the fused characteristic representation through a full-connection layer and a softmax activation function, and predicting the emotion tendency of the whole sentence; finally training to obtain a neural network model for emotion analysis; comprising the following steps:
fusing the optimized global features and the optimized local features in a row vector splicing mode to obtain final text emotion feature representation;
inputting the fused characteristic representation into a full connection layer, and mapping the characteristic representation into a vector with the size of 3;
inputting the output of the full connection layer into a Softmax activation function, performing emotion classification calculation through the Softmax layer, and outputting an emotion classification result;
and training based on the emotion classification result to obtain a neural network model for emotion analysis.
As a further improvement of the invention, topic extraction is performed on the preprocessed data set by adopting the LDA topic model to obtain topic-attribute words; short sentences containing attribute words are screened from the data set and labeled with the corresponding topics; comprising the following steps:
word segmentation is carried out on the preprocessed data set;
inputting the segmented text into an LDA topic model, extracting topics according to the number of preset topics, and finally obtaining N topics and M attribute words related to the N topics;
screening short sentences with the attribute words according to the obtained attribute words, and marking corresponding topics on the short sentences according to the attribute words; when a plurality of attribute words are screened out from one sentence of comments, the comments need to be segmented to obtain a plurality of short sentences with the attribute words.
The invention also discloses a text fine granularity emotion analysis system, which comprises:
the acquisition module is used for acquiring the comment text data set;
the preprocessing module is used for preprocessing the comment text data set, and the preprocessing comprises data cleaning and data labeling;
the word vectorization module is used for carrying out word vectorization on comment texts in the preprocessed data set by adopting the BERT model to obtain comment text word vectors;
the global feature extraction module is used for inputting the comment text word vectors into the BiLSTM+Attention model for coarse-granularity emotion analysis, extracting global feature information of the text and optimizing the global features through an attention mechanism;
the local feature extraction module is used for inputting the comment text word vectors into the BiGRU+Attention model for coarse-granularity emotion analysis, extracting local feature information of the text and optimizing the local features through an attention mechanism;
the training module is used for fusing the optimized global features and the optimized local features to obtain final text emotion feature representation; outputting the fused characteristic representation through a full-connection layer and a softmax activation function, and predicting the emotion tendency of the whole sentence; finally training to obtain a neural network model for emotion analysis;
the topic extraction module is used for extracting topics from the preprocessed data set by adopting the LDA topic model to obtain topic-attribute words; screening short sentences containing attribute words in the data set, and labeling corresponding topics;
the analysis module is used for inputting the short sentence set marked with the theme into the trained neural network model for carrying out fine granularity emotion analysis to obtain emotion tendencies of all the themes.
The invention also discloses a computer readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the text fine granularity emotion analysis method described above.
The invention also discloses a computing device comprising: one or more memories storing executable instructions; and the one or more processors are used for executing the executable instructions to realize the text fine granularity emotion analysis method.
Compared with the prior art, the invention has the beneficial effects that:
The invention introduces related deep learning techniques for fine granularity emotion analysis. For example, the BERT model is adopted for text word vectorization, so the generated word vectors contain deep semantic information and also fuse context information; attention mechanisms are added after the BiLSTM model and the BiGRU model respectively, so that when the information of each word is fused, more important words receive higher weights and contribute more information; and the fine granularity emotion analysis method based on BERT+BiLSTM+BiGRU+LDA can effectively improve the accuracy of text fine granularity emotion analysis.
Drawings
FIG. 1 is a flow chart of a text fine granularity emotion analysis method disclosed in an embodiment of the present invention;
FIG. 2 is a block diagram of the BERT model according to an embodiment of the present invention;
FIG. 3 is a block diagram of an individual LSTM neuron as disclosed in an embodiment of the present invention;
FIG. 4 is a block diagram of BiLSTM as disclosed in an embodiment of the present invention;
FIG. 5 is a schematic diagram of a GRU neuron structure according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a BiGRU network structure according to an embodiment of the invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention is described in further detail below with reference to the attached drawing figures:
as shown in FIG. 1, the invention provides a text fine granularity emotion analysis method based on Bert+BiLSTM+BiGRU+LDA, which comprises the following steps:
step 1, acquiring a comment text data set;
specific:
and crawling comment contents of an App in the software application store by adopting a crawler technology, and manually constructing a comment text data set, wherein each comment text is used as a sentence or short sentence.
Step 2, preprocessing the comment text data set, wherein the preprocessing comprises data cleaning and data labeling;
specific:
the data cleaning includes removing irrelevant text and duplicate comment texts. The raw data obtained by the crawler can contain a great deal of irrelevant text, such as abbreviations, emoticons, repeated punctuation marks, sentences with unclear meaning, and comment entries that contain only a star rating and no text. Such text carries no emotion tendency or useful information and severely interferes with text representation, making accurate emotion analysis difficult, so it is removed. In addition, some comments in the data set are duplicated, and the duplicate comments are removed as well;
the data annotation includes, but is not limited to, labeling comments according to their scoring information, e.g. scores of 0-2 are labeled negative, 3 neutral and 4-5 positive.
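As an illustration of this labeling rule, the following is a minimal Python sketch; the pandas schema and column names are assumptions, since the patent does not specify a data format:

```python
import pandas as pd

def label_from_rating(score: int) -> int:
    """Map a 0-5 star rating to a sentiment label: 0 = negative, 1 = neutral, 2 = positive."""
    if score <= 2:
        return 0
    if score == 3:
        return 1
    return 2

# Hypothetical column names; the patent does not specify a data schema.
df = pd.DataFrame({"text": ["很好用", "一般般", "太差了"], "rating": [5, 3, 1]})
df = df.drop_duplicates(subset="text")             # remove duplicate comments
df = df[df["text"].str.strip().astype(bool)]       # drop empty / rating-only rows
df["label"] = df["rating"].apply(label_from_rating)
```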
Step 3, performing word vectorization on the comment texts in the preprocessed data set by adopting the BERT model to obtain comment text word vectors; wherein:
BERT is a word vector pre-training model based on a bidirectional Transformer encoder structure. The input embedding of the BERT model consists of three embeddings: position embedding, segment embedding and token embedding; since BERT can take sentence pairs as input, the segment embedding indicates whether a token belongs to the first or the second sentence, and the position embedding represents the position of a token in the sentence. BERT is pre-trained mainly on two tasks, masked language modeling and next sentence prediction, which together ensure that BERT extracts dynamic, deep and genuinely contextual information from text, realizing text vectorization and semantic information extraction. The input sequence of the BERT model is formed from the position, token and segment embeddings; the input is each comment in the text data set, and the output is the word vectors computed after the characters and words of the comment text are fused with global semantic information. As shown in fig. 2, X1, X2, …, Xn is the output word vector sequence of the BERT model.
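The patent does not name a specific BERT implementation; the following is a minimal sketch using the HuggingFace transformers library with the bert-base-chinese checkpoint (both assumptions), showing how one contextual word vector per token can be obtained:

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")

comment = "这个软件的页面样式很漂亮，但是功能太少了。"
inputs = tokenizer(comment, return_tensors="pt", truncation=True, max_length=128)

with torch.no_grad():
    outputs = bert(**inputs)

# One contextual vector per token, shape (1, seq_len, 768)
word_vectors = outputs.last_hidden_state
```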
Step 4, taking the word vectors from step 3 as the input of a BiLSTM model, extracting global feature information of the text through the BiLSTM model, connecting an attention layer after the BiLSTM model, and assigning larger weight scores to emotion words;
the method specifically comprises the following steps:
Step 41: since the LSTM model can only read text in a single direction and cannot effectively capture context information, the invention uses the BiLSTM model, i.e. a bidirectional long short-term memory network composed of a forward LSTM and a backward LSTM that are trained on the forward sequence and the backward sequence respectively, which effectively handles this sequence problem. The LSTM cell structure diagram and the BiLSTM network structure diagram are shown in figures 3 and 4.
The specific algorithm formulas and analysis are as follows:

Calculation of the forget gate: $f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$. The input contains the hidden state $h_{t-1}$ at the previous moment and the input word $x_t$ at the current moment, and the calculation of the forget gate yields its value $f_t$, where $W_f$ and $b_f$ are a weight matrix and a bias vector respectively.

Calculation of the memory gate: $i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$ and $\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$. The input contains $h_{t-1}$ and $x_t$, and the calculation of the memory gate yields its value $i_t$ and the temporary cell state $\tilde{C}_t$, where $W_i$, $W_C$ and $b_i$, $b_C$ are weight matrices and bias vectors respectively.

Calculation of the cell state at the current moment: $C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$. The input contains the memory gate value $i_t$, the forget gate value $f_t$, the temporary cell state $\tilde{C}_t$ and the cell state $C_{t-1}$ at the previous moment; the output is the cell state $C_t$ at the current moment.

Calculation of the output gate and the hidden state at the current moment: $o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$ and $h_t = o_t * \tanh(C_t)$. The input contains $h_{t-1}$, $x_t$ and $C_t$; the output is the output gate value $o_t$ and the hidden state $h_t$, where $W_o$ and $b_o$ are a weight matrix and a bias vector respectively.

For the bidirectional long short-term memory network model, training on the forward sequence and the backward sequence yields $h_L = \overrightarrow{LSTM}(x_t)$ and $h_R = \overleftarrow{LSTM}(x_t)$ respectively, and splicing $h_L$ and $h_R$ gives the output data $h_t = [h_L; h_R]$, $t = 1, 2, \ldots, n$.
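A minimal PyTorch sketch of such a BiLSTM channel follows; the embedding and hidden dimensions are assumptions, since the patent does not specify hyperparameters:

```python
import torch
import torch.nn as nn

class BiLSTMEncoder(nn.Module):
    """Bidirectional LSTM: the forward and backward hidden states are
    spliced at every step, matching h_t = [h_L; h_R] above."""
    def __init__(self, embed_dim: int = 768, hidden_dim: int = 128):
        super().__init__()
        self.bilstm = nn.LSTM(embed_dim, hidden_dim,
                              batch_first=True, bidirectional=True)

    def forward(self, word_vectors: torch.Tensor) -> torch.Tensor:
        # word_vectors: (batch, seq_len, embed_dim), e.g. BERT output
        outputs, _ = self.bilstm(word_vectors)
        return outputs  # (batch, seq_len, 2 * hidden_dim)
```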
Step 42, taking the output of step 41 as the input of the attention layer. Under the action of the global attention mechanism, important sentences in the whole comment text are assigned larger weight scores, which highlights their importance in the text sequence and further increases classification accuracy.
The specific algorithm formulas and analysis are as follows:

$u_t = \tanh(W_s h_t + b_s)$

$\alpha_t = \frac{\exp(u_t^\top u_s)}{\sum_k \exp(u_k^\top u_s)}$

$v = \sum_t \alpha_t h_t$

where $W_s$ and $b_s$ are the weight matrix and bias vector of the global attention mechanism of the attention layer, and $h_t$ is the output of the BiLSTM network. $u_t$ expresses the relatedness of each element of the sequence $h_t$ to the sequence, $\alpha_t$ is the attention score of the global features, and $u_s$ is an initialized training parameter; $v$ is the feature vector obtained by the attention layer under the action of the attention mechanism, i.e. the output of the attention layer.
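A minimal PyTorch sketch of an attention layer matching these formulas, with $W_s$, $b_s$ realized as a linear projection and $u_s$ as a learned context vector (the module structure is an assumption, not the patent's code):

```python
import torch
import torch.nn as nn

class AttentionLayer(nn.Module):
    """Additive attention matching the formulas above:
    u_t = tanh(W h_t + b), alpha = softmax(u_t . u), v = sum_t alpha_t h_t."""
    def __init__(self, dim: int):
        super().__init__()
        self.proj = nn.Linear(dim, dim)                 # W_s and b_s
        self.context = nn.Parameter(torch.randn(dim))   # u_s, learned during training

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, seq_len, dim), e.g. the BiLSTM outputs
        u = torch.tanh(self.proj(h))                        # (batch, seq_len, dim)
        scores = u @ self.context                           # (batch, seq_len)
        alpha = torch.softmax(scores, dim=1).unsqueeze(-1)  # (batch, seq_len, 1)
        return (alpha * h).sum(dim=1)                       # (batch, dim)
```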
Step 5, taking the word vectors from step 3 as the input of a BiGRU model, extracting local feature information of the text through the BiGRU model, connecting an attention layer after the BiGRU model, capturing the words in each single sentence that contribute more to the emotion semantics, and assigning larger weight scores to these emotion words;
the method specifically comprises the following steps:
step 51 and GRU are deep learning network models for improving LSTM, which can solve the problem of long-term dependence existing in the cyclic neural network, and is more convenient for calculation and implementation than LSTM, and the internal structure is simpler than LSTM. As shown in fig. 5, the algorithm and resolution of the GRU network update is as follows:
$r_t = \sigma(W_r \cdot [h_{t-1}, x_t])$

$z_t = \sigma(W_z \cdot [h_{t-1}, x_t])$

$\tilde{h}_t = \tanh(W \cdot [r_t * h_{t-1}, x_t])$

$h_t = (1 - z_t) * h_{t-1} + z_t * \tilde{h}_t$

where $r_t$ and $z_t$ are the reset gate and the update gate at time $t$, $h_{t-1}$ is the hidden state at time $t-1$, $\tilde{h}_t$ is the candidate activation state at time $t$, $h_t$ is the activation state at time $t$, $W_r$, $W_z$ and $W$ are the corresponding weight matrices, and $\sigma$ denotes the sigmoid activation function. The update gate determines how much historical information is forgotten and how much newly received information is kept at the current moment; the reset gate determines how much of the historical information enters the candidate state.

As shown in fig. 6, the BiGRU consists of a forward GRU sequence and a backward GRU sequence, and can make full use of context information. The output state of the BiGRU network model at time $t$ is the splice of the outputs of the forward GRU network and the backward GRU network:

$\overrightarrow{h_t} = \overrightarrow{GRU}(x_t, \overrightarrow{h}_{t-1})$

$\overleftarrow{h_t} = \overleftarrow{GRU}(x_t, \overleftarrow{h}_{t+1})$

$h_t = [\overrightarrow{h_t}; \overleftarrow{h_t}], \quad t = 1, 2, \ldots, T$

where $x_t$ is the input at the current time and $T$ is the length of the time sequence; the output of the BiGRU is obtained by splicing the output of the forward GRU and the output of the backward GRU.
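The BiGRU channel mirrors the BiLSTM channel; a minimal PyTorch sketch under the same assumed dimensions:

```python
import torch
import torch.nn as nn

class BiGRUEncoder(nn.Module):
    """Bidirectional GRU: the output at each step splices the forward and
    backward hidden states, matching the BiGRU formulas above."""
    def __init__(self, embed_dim: int = 768, hidden_dim: int = 128):
        super().__init__()
        self.bigru = nn.GRU(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)

    def forward(self, word_vectors: torch.Tensor) -> torch.Tensor:
        outputs, _ = self.bigru(word_vectors)
        return outputs  # (batch, seq_len, 2 * hidden_dim)
```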
Step 52, taking the output of step 51 as the input of the attention layer. Under the action of the local attention mechanism, the words in a single sentence with larger emotion-semantic contributions are captured and assigned larger weight scores, which highlights their importance in the text sequence and thus increases classification accuracy.
The specific algorithm formulas and analysis are as follows:

$u_t = \tanh(W_w h_t + b_w)$

$\alpha_t = \frac{\exp(u_t^\top u_w)}{\sum_k \exp(u_k^\top u_w)}$

$v = \sum_t \alpha_t h_t$

where $W_w$ and $b_w$ are the weight matrix and bias vector of the local attention mechanism of the attention layer, and $h_t$ is the output of the BiGRU network. $u_t$ expresses the relatedness of each element of the sequence $h_t$ to the sequence, $\alpha_t$ is the attention score of the local features, and $u_w$ is an initialized training parameter; $v$ is the feature vector obtained by the attention layer under the action of the attention mechanism, i.e. the output of the attention layer.
Step 6, fusing the optimized global features and the optimized local features to obtain final text emotion feature representation; outputting the fused characteristic representation through a full-connection layer and a softmax activation function, and predicting the emotion tendency of the whole sentence; finally training to obtain a neural network model for emotion analysis;
the method specifically comprises the following steps:
Step 61: to facilitate model computation, the features of the two channels are fused by row-vector splicing. A $(r_s + r_e) \times c$ matrix $V = [V_s; V_e]$ is constructed, where $V$ is the final emotion feature, $V_s$ is the output of the BiLSTM channel with its attention layer, $V_e$ is the output of the BiGRU channel with its attention layer, $r_s$ and $r_e$ are the numbers of rows of $V_s$ and $V_e$ respectively, and $c$ is the number of columns of $V_s$ (equivalently, of $V_e$).
Step 62, the output of step 61 is input into a fully connected layer and mapped to a vector of size 3; the final output is then obtained through a softmax activation function as a predicted probability distribution, i.e. the probabilities that the emotion tendency of the input sentence is positive, neutral or negative;
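A minimal PyTorch sketch of the fusion and classification head of steps 61-62 (the dimensions are assumptions):

```python
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    """Splices the two attention-layer outputs and maps the fused feature to
    3 emotion classes (positive / neutral / negative), as in steps 61-62."""
    def __init__(self, channel_dim: int = 256, num_classes: int = 3):
        super().__init__()
        self.fc = nn.Linear(2 * channel_dim, num_classes)

    def forward(self, v_s: torch.Tensor, v_e: torch.Tensor) -> torch.Tensor:
        v = torch.cat([v_s, v_e], dim=-1)   # feature fusion by splicing
        return torch.softmax(self.fc(v), dim=-1)
```

In practice, raw logits rather than softmax outputs would typically be passed to nn.CrossEntropyLoss during training, since that loss applies log-softmax internally.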
and 63, training based on the emotion classification result to obtain a neural network model for emotion analysis.
Step 7, performing topic extraction on the data set preprocessed in step 2 by adopting an LDA topic model to obtain topic-attribute words; the clustered attribute words are words describing the same topic, and a short sentence containing an attribute word comments on the corresponding topic, so the short sentences containing attribute words are screened out and labeled with the corresponding topics;
the method specifically comprises the following steps:
step 71, the LDA topic model is mainly used to infer the topic distribution of the document, and can be used to identify potential topic information in the text. The LDA model considers that topics can be represented by a vocabulary distribution, and articles can be represented by a topic distribution. According to the LDA model, a certain article is required to be generated, the distribution of topics and words is required to be determined, the distribution of documents and topics is determined, then a topic is randomly generated according to the distribution of the documents and the topics, a word is randomly generated according to the topic through the distribution of the topics and the words, and the process of the word generated before is repeated until a complete document is generated. The distribution of the documents and the topics is that of the documents is sampled from the Dirichlet distribution alpha, and the distribution of the topics and the vocabulary is that of the vocabulary corresponding to the topics is sampled from the Dirichlet distribution beta.
The data set preprocessed in step 2 is segmented with the jieba word segmentation tool, and the processed text is input into the LDA topic model with the number of topics generated after clustering set to 5. Five topics are finally obtained together with the attribute words under each topic; the 10 most frequent attribute words under each topic are selected, a suitable subject word is inferred from these 10 attribute words, and finally 5 subject words are obtained, each with 10 related attribute words under it.
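A minimal sketch of this step using jieba and the gensim LDA implementation (the use of gensim and the placeholder comments are assumptions; the patent only names jieba and LDA):

```python
import jieba
from gensim import corpora
from gensim.models import LdaModel

comments = ["这个软件功能很全", "页面样式不好看", "功能更新太慢了"]  # placeholder data
texts = [list(jieba.cut(c)) for c in comments]

dictionary = corpora.Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]

lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=5, passes=10)

# Top-10 attribute words under each topic, as in step 71
for topic_id in range(5):
    print(topic_id, lda.show_topic(topic_id, topn=10))
```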
Step 72, screening out the short sentences containing the attribute words obtained in step 71; since several attribute words may appear in a single comment, the comment is segmented to obtain multiple short sentences each containing an attribute word, and each short sentence is labeled with the corresponding topic according to its attribute word.
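A minimal sketch of this screening and labeling step; the punctuation-based clause splitting and the topic-to-attribute-word mapping are illustrative assumptions, since the patent does not specify the segmentation rule:

```python
import re

# Hypothetical topic -> attribute-word mapping produced by the LDA step
topic_words = {"function": ["功能"], "page_style": ["页面", "样式"]}

def label_clauses(comment: str):
    """Split a comment on punctuation and tag every clause with each topic
    whose attribute words it contains."""
    clauses = [c for c in re.split(r"[，。！？；,.!?;]", comment) if c.strip()]
    labeled = []
    for clause in clauses:
        for topic, words in topic_words.items():
            if any(w in clause for w in words):
                labeled.append((topic, clause))
    return labeled

print(label_clauses("功能很强大，但是页面样式不好看。"))
# [('function', '功能很强大'), ('page_style', '但是页面样式不好看')]
```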
Step 8, inputting the topic-labeled short sentence set obtained in step 7 into the model trained in step 6 and performing emotion analysis to obtain the emotion tendency of each topic, thereby realizing fine granularity emotion analysis. For each comment, the emotion tendencies towards the several topics it describes are obtained from the short sentences produced by segmentation. By counting the emotion tendencies of all topics over the whole short sentence set, the evaluation of every topic and aspect of the App can be obtained, yielding a comprehensive and systematic evaluation of the App.
Step 9, experimental analysis
To verify the performance of the model, an emotion classification experiment is carried out on the Tan Songbo hotel review data set, a typical data set for emotion analysis tasks, and the model is compared with other baseline models. The emotion classification effect of the model is evaluated with the common evaluation indexes: loss, accuracy Acc, precision Pre and the comprehensive evaluation index F1, where the F1 value combines precision and recall. Comparing the indexes obtained in the comparative tests verifies the effectiveness of the method and provides data support for the model's fine granularity emotion analysis effect.
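These indexes can be computed, for example, with scikit-learn; a minimal sketch with placeholder labels (the patent does not state how the metrics were implemented):

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score

y_true = [2, 0, 1, 2, 0]   # placeholder gold labels (0=negative, 1=neutral, 2=positive)
y_pred = [2, 0, 2, 2, 0]   # placeholder model predictions

acc = accuracy_score(y_true, y_pred)
pre = precision_score(y_true, y_pred, average="macro", zero_division=0)
f1 = f1_score(y_true, y_pred, average="macro")
print(f"Acc={acc:.3f}  Pre={pre:.3f}  F1={f1:.3f}")
```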
The experimental results are shown in Table 1:
TABLE 1 (loss, Acc, Pre and F1 of the proposed model and the baseline models; the table is reproduced as an image in the original publication and its values are not recoverable here)
The invention also provides a text fine granularity emotion analysis system, which comprises:
the acquisition module is used for realizing the step 1;
the preprocessing module is used for realizing the step 2;
the word vectorization module is used for realizing the step 3;
the global feature extraction module is used for realizing the step 4;
the local feature extraction module is used for realizing the step 5;
the training module is used for realizing the step 6;
the theme extraction module is used for realizing the step 7;
and the analysis module is used for realizing the step 8.
The present invention also provides a computer readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the text fine granularity emotion analysis method described above.
The present invention also provides a computing device comprising: one or more memories storing executable instructions; one or more processors executing executable instructions for implementing the text fine granularity emotion analysis method described above.
The invention has the advantages that:
according to the method, the word vector containing deep semantic information is obtained through BERT word vectorization, then the word vectorized text is input into a dual-channel model formed by BiLSTM and BiGRU, global feature information of comments is extracted through a BiLSTM channel and an attention layer, local feature information of comments is extracted through a BiGRU channel and an attention layer, the feature information extracted through the two channels is fused, feature information of comment texts can be fully extracted, the prediction effect of emotion tendency is effectively improved, topic clustering is conducted through an LDA topic model, topics described by texts can be obtained, emotion analysis can be conducted on granularity of topic levels, and finally accuracy of fine-granularity emotion analysis of the texts is improved to a certain extent through the deep learning model with the front edge of Bert+BiLSTM+BiGRU.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A method for text fine granularity emotion analysis, comprising:
acquiring a comment text data set;
preprocessing the comment text data set, wherein the preprocessing comprises data cleaning and data labeling;
performing word vectorization on comment texts in the preprocessed data set by adopting a Bert model to obtain comment text word vectors;
inputting comment text word vectors into a BiLSTM+Attention model for coarse-granularity emotion analysis, extracting global feature information of the text, and optimizing global features through an attention mechanism;
inputting comment text word vectors into a BiGRU+Attention model for coarse-granularity emotion analysis, extracting local feature information of the text, and optimizing local features through an attention mechanism;
fusing the optimized global features and the optimized local features to obtain final text emotion feature representation; outputting the fused characteristic representation through a full-connection layer and a softmax activation function, and predicting the emotion tendency of the whole sentence; finally training to obtain a neural network model for emotion analysis;
performing topic extraction on the preprocessed data set by adopting an LDA topic model to obtain topic-attribute words; screening short sentences containing attribute words in the data set, and labeling corresponding topics;
inputting the short sentence set marked with the theme into a trained neural network model for fine granularity emotion analysis to obtain emotion tendencies of all the themes.
2. The text fine granularity emotion analysis method of claim 1, wherein web crawler technology is employed to obtain a comment text dataset.
3. The text fine granularity emotion analysis method of claim 1, wherein said data cleansing comprises removing irrelevant text and duplicate comment text, said irrelevant text including but not limited to abbreviations, emoticons, repeated punctuation and ambiguous sentences;
the method for labeling the data comprises, but is not limited to, labeling comments according to their scoring information, with 0-2 labeled negative, 3 labeled neutral and 4-5 labeled positive.
4. The text fine granularity emotion analysis method of claim 1, wherein comment text word vectors are input into a BiLSTM+Attention model for coarse-granularity emotion analysis, global feature information of the text is extracted, and global features are optimized through an attention mechanism; comprising the following steps:
calculating the states of a forgetting gate, a memory gate and a temporary cell at the current moment based on the hidden state at the previous moment and the comment text word vector at the current moment;
calculating the cell state at the current moment based on the forgetting gate, the memory gate and the temporary cell state at the current moment and the cell state at the last moment;
calculating the hidden state at the current moment based on the hidden state at the previous moment, the input word at the current moment and the cell state at the current moment;
splicing hidden states obtained by calculating the BiLSTM forward sequence and the backward sequence to obtain global characteristic information;
and inputting the global feature information into an attention layer, and optimizing the global feature through an attention mechanism to obtain the optimized global feature.
5. The text fine granularity emotion analysis method of claim 1, wherein comment text word vectors are input into a BiGRU+Attention model for coarse-granularity emotion analysis, local feature information of the text is extracted, and local features are optimized through an attention mechanism; comprising the following steps:
inputting comment text word vectors into BiGRU, wherein the BiGRU consists of a forward GRU sequence and a backward GRU sequence;
splicing the hidden states obtained by calculating the BiGRU forward sequence and the BiGRU backward sequence to obtain local characteristic information;
and inputting the local feature information into an attention layer, and optimizing the local feature through an attention mechanism to obtain the optimized local feature.
6. The text fine granularity emotion analysis method of claim 1, wherein the optimized global features and local features are fused to obtain a final text emotion feature representation; outputting the fused characteristic representation through a full-connection layer and a softmax activation function, and predicting the emotion tendency of the whole sentence; finally training to obtain a neural network model for emotion analysis; comprising the following steps:
fusing the optimized global features and the optimized local features in a row vector splicing mode to obtain final text emotion feature representation;
inputting the fused characteristic representation into a full connection layer, and mapping the characteristic representation into a vector with the size of 3;
inputting the output of the full connection layer into a Softmax activation function, performing emotion classification calculation through the Softmax layer, and outputting an emotion classification result;
and training based on the emotion classification result to obtain a neural network model for emotion analysis.
7. The text fine granularity emotion analysis method of claim 1, wherein topic extraction is performed on the preprocessed data set by using an LDA topic model to obtain topic-attribute words; short sentences containing attribute words are screened from the data set and labeled with the corresponding topics; comprising the following steps:
word segmentation is carried out on the preprocessed data set;
inputting the segmented text into an LDA topic model, extracting topics according to the number of preset topics, and finally obtaining N topics and M attribute words related to the N topics;
screening short sentences with the attribute words according to the obtained attribute words, and marking corresponding topics on the short sentences according to the attribute words; when a plurality of attribute words are screened out from one sentence of comments, the comments need to be segmented to obtain a plurality of short sentences with the attribute words.
8. A text fine granularity emotion analysis system that implements the text fine granularity emotion analysis method of any one of claims 1 to 7, comprising:
the acquisition module is used for acquiring the comment text data set;
the preprocessing module is used for preprocessing the comment text data set, and the preprocessing comprises data cleaning and data labeling;
the word vectorization module is used for carrying out word vectorization on comment texts in the preprocessed data set by adopting the BERT model to obtain comment text word vectors;
the global feature extraction module is used for inputting comment text word vectors into a BiLSTM+Attention model for coarse-granularity emotion analysis, extracting global feature information of the text and optimizing global features through an attention mechanism;
the local feature extraction module is used for inputting comment text word vectors into the BiGRU+Attention model for coarse-granularity emotion analysis, extracting local feature information of the text and optimizing the local features through an attention mechanism;
the training module is used for fusing the optimized global features and the optimized local features to obtain final text emotion feature representation; outputting the fused characteristic representation through a full-connection layer and a softmax activation function, and predicting the emotion tendency of the whole sentence; finally training to obtain a neural network model for emotion analysis;
the topic extraction module is used for extracting topics from the preprocessed data set by adopting the LDA topic model to obtain topic-attribute words; screening short sentences containing attribute words in the data set, and labeling corresponding topics;
the analysis module is used for inputting the short sentence set marked with the theme into the trained neural network model for carrying out fine granularity emotion analysis to obtain emotion tendencies of all the themes.
9. A computer readable storage medium having stored thereon executable instructions which when executed by a processor cause the processor to perform the text fine granularity emotion analysis method of any of claims 1-7.
10. A computing device, comprising: one or more memories storing executable instructions; one or more processors executing the executable instructions to implement the text fine granularity emotion analysis method of any of claims 1-7.
CN202310124542.9A 2023-02-16 2023-02-16 Text fine granularity emotion analysis method, system, medium and computing device Pending CN116108840A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310124542.9A CN116108840A (en) 2023-02-16 2023-02-16 Text fine granularity emotion analysis method, system, medium and computing device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310124542.9A CN116108840A (en) 2023-02-16 2023-02-16 Text fine granularity emotion analysis method, system, medium and computing device

Publications (1)

Publication Number Publication Date
CN116108840A true CN116108840A (en) 2023-05-12

Family

ID=86265330

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310124542.9A Pending CN116108840A (en) 2023-02-16 2023-02-16 Text fine granularity emotion analysis method, system, medium and computing device

Country Status (1)

Country Link
CN (1) CN116108840A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116306677A (en) * 2023-05-19 2023-06-23 大汉软件股份有限公司 Emotion analysis method, system and equipment based on enhancement of neural topic model
CN116306677B (en) * 2023-05-19 2024-01-26 大汉软件股份有限公司 Emotion analysis method, system and equipment based on enhancement of neural topic model

Similar Documents

Publication Publication Date Title
Arora et al. Character level embedding with deep convolutional neural network for text normalization of unstructured data for Twitter sentiment analysis
CN111966917B (en) Event detection and summarization method based on pre-training language model
Gong et al. Natural language inference over interaction space
Li et al. Context-aware emotion cause analysis with multi-attention-based neural network
CN107133213B (en) Method and system for automatically extracting text abstract based on algorithm
Badjatiya et al. Attention-based neural text segmentation
CN111444726A (en) Method and device for extracting Chinese semantic information of long-time and short-time memory network based on bidirectional lattice structure
CN111291188B (en) Intelligent information extraction method and system
CN111858944A (en) Entity aspect level emotion analysis method based on attention mechanism
Xing et al. A convolutional neural network for aspect-level sentiment classification
Liu et al. R-trans: RNN transformer network for Chinese machine reading comprehension
Jafariakinabad et al. Style-aware neural model with application in authorship attribution
Banik et al. Gru based named entity recognition system for bangla online newspapers
CN115578137A (en) Agricultural product future price prediction method and system based on text mining and deep learning model
CN111339772B (en) Russian text emotion analysis method, electronic device and storage medium
CN111145914B (en) Method and device for determining text entity of lung cancer clinical disease seed bank
CN114722176A (en) Intelligent question answering method, device, medium and electronic equipment
Yao Attention-based BiLSTM neural networks for sentiment classification of short texts
CN116108840A (en) Text fine granularity emotion analysis method, system, medium and computing device
CN114265936A (en) Method for realizing text mining of science and technology project
CN110674293B (en) Text classification method based on semantic migration
CN112507717A (en) Medical field entity classification method fusing entity keyword features
Basri et al. A deep learning based sentiment analysis on bang-lish disclosure
Preetham et al. Comparative Analysis of Research Papers Categorization using LDA and NMF Approaches
CN115906824A (en) Text fine-grained emotion analysis method, system, medium and computing equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination