CN112434164A - Network public opinion analysis method and system considering topic discovery and emotion analysis - Google Patents

Network public opinion analysis method and system considering topic discovery and emotion analysis

Info

Publication number
CN112434164A
Authority
CN
China
Prior art keywords
emotion
topic
word
model
turning
Legal status
Granted
Application number
CN202011397734.XA
Other languages
Chinese (zh)
Other versions
CN112434164B (en)
Inventor
曲宇航
赵昕禹
惠维
Current Assignee
Xian Jiaotong University
Original Assignee
Xian Jiaotong University
Application filed by Xian Jiaotong University
Priority to CN202011397734.XA
Publication of CN112434164A
Application granted
Publication of CN112434164B
Legal status: Active (current)
Anticipated expiration


Classifications

    • G06F16/355 Class or cluster creation or modification
    • G06F40/216 Parsing using statistical methods
    • G06F40/247 Thesauruses; Synonyms
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/30 Semantic analysis
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Probability & Statistics with Applications (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a network public opinion analysis method and system that combine topic discovery with emotion analysis. The method comprises: step 1, obtaining the topics of the network public opinion texts to be analyzed using either a trained BiMPM topic discovery model based on ELMo word vectors or a trained LLDA topic model; and step 2, performing sentiment analysis on the topics obtained in step 1 based on a vector space model. To suit the characteristics of different public opinion carriers, the invention uses the ELMo-based BiMPM topic discovery model to discover topics in long texts and LLDA to discover topics in short texts. Large-scale manually labeled corpora are introduced for training and modeling, producing trained models that perform well on metrics such as accuracy and recall and thereby achieving efficient topic discovery.

Description

Network public opinion analysis method and system considering topic discovery and emotion analysis
Technical Field
The invention belongs to the technical field of data mining, relates to the field of topic models and clustering methods, and particularly relates to a network public opinion analysis method and system considering topic discovery and emotion analysis.
Background
Network public opinion is the sum of the attitudes and viewpoints that the public expresses on online platforms about trending social events, and it carries strong emotional tendencies. As the space for free expression keeps expanding and speech increasingly moves online, socially sensitive topics are easily triggered; once intense public emotion is provoked, it is difficult to defuse in a short time. A public opinion analysis system is a software system that combines search engine technology, text processing, knowledge management, natural language processing, and an SMS (mobile short message) platform. By automatically collecting, extracting, classifying, and clustering massive amounts of Internet information and by monitoring and focusing on topics, it meets users' needs for network public opinion monitoring, tracking of trending-event topics, and similar tasks.
Existing public opinion analysis mainly consists of topic discovery and emotional tendency analysis. Emotional tendency analysis methods are either semantic analysis based on an emotion dictionary or classification based on features. An emotion dictionary is a manually compiled categorization of emotion words. The dictionary-based method identifies evaluative words and phrases in network public opinion data, obtains their emotion weights through polarity processing, and computes the similarity between known emotion words and candidate words to predict the emotional tendency of the candidates; it is mostly used to extract evaluative content from public opinion data and to judge its polarity. Feature-based emotion analysis instead uses machine learning: it builds a corpus, selects a large set of informative features, and performs statistical emotion classification. The two methods can also be combined: based on an emotion vector space model, an emotion dictionary is built to recognize emotion words and their polarity, and a machine learning method is added for emotion classification statistics, improving the accuracy of the overall emotion weight calculation.
In summary, existing public opinion analysis methods have the following defects: hot topics are not clearly identified, and emotion analysis accuracy is low.
Disclosure of Invention
The invention aims to provide a network public opinion analysis method and system considering topic discovery and emotion analysis so as to solve one or more technical problems. The invention can realize efficient topic discovery and can provide more accurate emotion analysis for users.
In order to achieve the purpose, the invention adopts the following technical scheme:
The invention discloses a network public opinion analysis method considering topic discovery and emotion analysis, comprising the following steps:
step 1, obtaining the topics of the network public opinion texts to be analyzed using a trained BiMPM topic discovery model based on ELMo word vectors or a trained LLDA topic model;
step 2, performing sentiment analysis on the topics obtained in step 1 based on a vector space model, wherein the vector space model is constructed by the following steps:
step 2.1, dividing the emotion words under different topics at a fine granularity on the basis of the existing HowNet emotion dictionary;
step 2.2, processing adversative (turning) sentence patterns and exclamation words with the HowNet emotion dictionary under each topic label, and correcting the weights of the emotion words;
step 2.3, introducing the topic label set from the topic discovery process, treating each sentence in the public opinion comment data as a combination of emotion words according to the "bag of words" principle of topic models, counting the emotion words in the combination, and identifying the emotional polarity of the sentence.
In a further improvement of the invention, in step 1, obtaining the trained ELMo-based BiMPM topic discovery model and the trained LLDA topic model comprises:
step 1.1, collecting a plurality of public opinion texts to construct a training set, and dividing the training set into a long-text training set and a short-text training set according to a preset text length threshold, wherein all training texts are manually labeled with topic classes and grouped by the labeled class;
step 1.2, training the ELMo-based BiMPM topic discovery model on the long-text training set to obtain the trained BiMPM topic discovery model, and training the LLDA topic model on the short-text training set to obtain the trained LLDA topic model.
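Step 1.1 only says that the training set is divided by "a preset text length threshold". A minimal Python sketch of that division follows; the threshold value of 140 characters, measuring length in characters, treating a text exactly at the threshold as long, and the name `split_by_length` are all illustrative assumptions, not details from the patent.

```python
from typing import List, Tuple

Sample = Tuple[str, str]  # (text, manually labeled topic class)

# Hypothetical threshold; the patent does not publish its value.
LENGTH_THRESHOLD = 140  # measured in characters in this sketch


def split_by_length(samples: List[Sample],
                    threshold: int = LENGTH_THRESHOLD) -> Tuple[List[Sample], List[Sample]]:
    """Split labeled samples into a long-text set (for BiMPM) and a short-text set (for LLDA)."""
    long_set: List[Sample] = []
    short_set: List[Sample] = []
    for text, label in samples:
        # Texts at or above the threshold count as long texts in this sketch.
        if len(text) >= threshold:
            long_set.append((text, label))
        else:
            short_set.append((text, label))
    return long_set, short_set


long_set, short_set = split_by_length([("政" * 300, "politics"), ("油价又涨了", "economy")])
print(len(long_set), len(short_set))  # 1 1
```

Keeping the topic label attached to each text means both training sets stay "classified according to the labeled topic classification" after the split.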
In a further improvement, in step 1.1, collecting the public opinion texts and constructing the training set comprises:
for HTTP communication, collecting data with a combination of Fiddler, HttpClient, and Jsoup; for social media platforms, collecting data through their open APIs using the OAuth authorization mechanism;
performing Chinese word segmentation and text tagging on the raw public opinion texts with the IKAnalyzer segmenter, and removing redundant text without discriminative value through stop-word removal.
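IKAnalyzer is a Java segmenter, so the sketch below assumes segmentation has already produced a token list and shows only the stop-word removal step in Python. The stop-word set and the rule that drops punctuation-only tokens are hypothetical choices for illustration.

```python
from typing import Iterable, List, Set

# Hypothetical stop-word list; a production system would load a full list from a file.
STOP_WORDS: Set[str] = {"的", "了", "是", "在", "啊", "转发"}


def remove_stop_words(tokens: Iterable[str], stop_words: Set[str] = STOP_WORDS) -> List[str]:
    """Keep only tokens that are not stop words and contain at least one letter or digit."""
    kept: List[str] = []
    for tok in tokens:
        tok = tok.strip()
        if not tok or tok in stop_words:
            continue
        # Drop punctuation-only tokens such as "，" or "!!" (CJK ideographs count as letters).
        if not any(ch.isalnum() for ch in tok):
            continue
        kept.append(tok)
    return kept


print(remove_stop_words(["政府", "的", "新", "政策", "，", "转发"]))  # ['政府', '新', '政策']
```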
In a further improvement, in step 1.2, training the ELMo-based BiMPM topic discovery model comprises:
(1) extracting the title tokens and content keywords of each text in the long-text training set, and concatenating the two fields;
(2) pre-training the ELMo language model on the training set and fine-tuning it on training texts of the different topics so that it adapts to the target corpus;
(3) constructing the BiMPM topic discovery model and adding the ELMo word vectors to the model's sentence representation layer;
(4) splitting the processed topic texts into train, dev, and test data sets;
(5) iteratively training the model until it performs best on the dev data set, which completes the training.
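Steps (1) and (4) are plain data preparation. The Python sketch below joins the two fields with spaces and shuffles with a fixed seed before cutting the data; the 8:1:1 ratio, the seed, and both function names are assumptions, since the patent gives no split proportions.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def splice_fields(title_tokens: Sequence[str], content_keywords: Sequence[str]) -> str:
    """Step (1): concatenate title tokens and content keywords into one input string."""
    return " ".join([*title_tokens, *content_keywords])


def train_dev_test_split(items: Sequence[T],
                         ratios: Tuple[float, float, float] = (0.8, 0.1, 0.1),
                         seed: int = 42) -> Tuple[List[T], List[T], List[T]]:
    """Step (4): shuffle reproducibly and cut into train/dev/test portions."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * ratios[0])
    n_dev = int(len(shuffled) * ratios[1])
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_dev],
            shuffled[n_train + n_dev:])


train, dev, test = train_dev_test_split(list(range(100)))
print(len(train), len(dev), len(test))  # 80 10 10
```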
In a further improvement, in step 1.2, training the LLDA topic model comprises:
a. preprocessing the short-text data from step 1 according to the prior probability;
b. sampling from a Dirichlet distribution to generate the topic distribution of the document, sampling the topic of the j-th word of the document from the multinomial distribution over topics, and determining the topic classification label;
c. sampling from a Dirichlet distribution to generate the word distribution of each topic;
d. sampling the word from the multinomial distribution over words;
e. repeating steps a to d on multiple groups of labeled documents to train the LLDA topic model;
f. debugging, evaluating, and retraining on the training results to obtain the trained LLDA topic model.
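Steps b to d describe the Labeled LDA generative story. The Python sketch below simulates it under stated assumptions: Dirichlet draws are made by normalizing Gamma samples, the hyperparameters `alpha` and `beta` and the toy vocabulary are hypothetical, and, as in Labeled LDA, a document's topic distribution covers only that document's own labels.

```python
import random
from typing import Dict, List, Sequence, Tuple


def sample_dirichlet(alpha: Sequence[float], rng: random.Random) -> List[float]:
    """Draw from Dirichlet(alpha) by normalizing independent Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [x / total for x in draws]


def topic_word_distributions(labels: Sequence[str], vocab: Sequence[str], beta: float,
                             rng: random.Random) -> Dict[str, Dict[str, float]]:
    """Step c: one word distribution per topic label, drawn from Dirichlet(beta)."""
    return {k: dict(zip(vocab, sample_dirichlet([beta] * len(vocab), rng))) for k in labels}


def generate_document(doc_labels: Sequence[str], phi: Dict[str, Dict[str, float]],
                      n_words: int, alpha: float,
                      rng: random.Random) -> Tuple[List[str], List[str]]:
    """Steps b and d: topics are drawn only from the document's own labels."""
    theta = sample_dirichlet([alpha] * len(doc_labels), rng)  # document-topic distribution
    words: List[str] = []
    topics: List[str] = []
    for _ in range(n_words):
        z = rng.choices(doc_labels, weights=theta)[0]         # topic of the j-th word
        vocab, probs = zip(*phi[z].items())
        words.append(rng.choices(vocab, weights=probs)[0])    # word drawn from topic z
        topics.append(z)
    return words, topics


rng = random.Random(7)
vocab = ["war", "army", "stock", "market", "match", "goal"]
phi = topic_word_distributions(["military", "economy", "sports"], vocab, beta=0.5, rng=rng)
words, topics = generate_document(["military", "economy"], phi, n_words=10, alpha=1.0, rng=rng)
```

Training (step e) runs this story in reverse, inferring the topic of each observed word from labeled documents.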
In a further improvement, step 2.1 specifically comprises introducing the 8 topic classification labels into the HowNet emotion dictionary, namely politics, military, economy, society, culture/sports/entertainment, science and technology, religion, and other. Under each topic label, each emotion word is labeled with an 8-dimensional emotion vector over surprise, delight, love, expectation, anxiety, anger, sadness, and hatred.
For example, under the military label the emotion word "attack" carries the strongly negative attitude of a war caused by an attack; its emotion vector is (0, 0, 0, 0, 0, 0, 0.8, 0.9), i.e., sadness 0.8 and hatred 0.9. Under the culture/sports/entertainment label, however, "attack" describes a technique in a sports contest and carries only a mild negative attitude; its vector is (0, 0, 0, 0, 0, 0.5, 0, 0), i.e., anger 0.5.
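The topic-dependent lookup in the "attack" example can be pictured as a dictionary keyed by (topic label, word). The Python sketch below hard-codes only those two entries, read as 8-dimensional vectors in the order surprise, delight, love, expectation, anxiety, anger, sadness, hatred, so sadness and hatred are the last two positions and anger is the sixth. The key names and the zero-vector fallback for unknown pairs are assumptions.

```python
from typing import Dict, Tuple

EMOTIONS = ("surprise", "delight", "love", "expectation",
            "anxiety", "anger", "sadness", "hatred")

# Hypothetical fragment of a topic-labeled emotion dictionary: (topic, word) -> vector.
TOPIC_EMOTION_DICT: Dict[Tuple[str, str], Tuple[float, ...]] = {
    ("military", "attack"): (0, 0, 0, 0, 0, 0, 0.8, 0.9),
    ("culture_sports_entertainment", "attack"): (0, 0, 0, 0, 0, 0.5, 0, 0),
}


def emotion_vector(topic: str, word: str) -> Dict[str, float]:
    """Look up a word's emotion vector under a topic label; unknown pairs map to zeros."""
    vec = TOPIC_EMOTION_DICT.get((topic, word), (0.0,) * len(EMOTIONS))
    return {name: float(v) for name, v in zip(EMOTIONS, vec)}


print(emotion_vector("military", "attack")["hatred"])  # 0.9
```

The same word therefore yields different vectors depending on which topic the topic discovery stage assigned to the text.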
In a further improvement, step 2.2 specifically comprises:
1) degree adverb processing: subdividing HowNet's degree-level word list into graded sets such as an "extreme degree" word set, and assigning different emotion weights according to the emotional coloring each word expresses;
2) negation word processing: presetting a distance threshold d for each emotion word using a negation-word association algorithm; within the text region of each emotion word, counting each token as one unit of distance and, starting from the emotion word (as found in the HowNet emotion dictionary under the topic label), backtracking toward the beginning of the sentence to search for negation words; each time a negation word is found, adding a negation label "not_" to the current emotion word, until the backtracking distance reaches or exceeds the preset threshold d, overlaps the previous emotion word, or reaches the beginning of the sentence;
introducing a sliding window whose size is determined by the punctuation marks within the clause; if an odd number of negation words appears in the window containing the emotion word, negating the original emotion word;
3) adversative (turning) sentence processing: recognizing adversative sentences by their tail using adversative conjunctions. During processing, first search for an adversative prefix; if none exists, treat the sentence as an ordinary statement. If a prefix exists but no adversative suffix exists, delete the clause containing the prefix. If both prefix and suffix exist and the suffix clause contains emotion words from the HowNet emotion dictionary under the topic label, delete the clauses from the prefix up to the suffix; if the suffix clause contains no such emotion words, negate the clause containing the prefix;
4) exclamation word processing: subdividing the existing exclamation words at multiple scales in combination with the HowNet emotion dictionary under the topic label, with weights ranging from -1.0 to 1.0.
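The negation rule in item 2) can be sketched as a backward scan from each emotion word. This Python version reads the stop rule literally: it inspects tokens at distance 1 to d-1 and stops once the distance reaches d, at a previous emotion word, at clause punctuation (treated as the sliding-window boundary), or at the start of the sentence. The negation list, punctuation set, default `d`, and function names are hypothetical.

```python
from typing import List, Set, Tuple

# Hypothetical resources for illustration only.
NEGATION_WORDS: Set[str] = {"不", "没", "没有", "并非", "未"}
CLAUSE_PUNCT: Set[str] = {"，", "。", "！", "？", "；", ",", ".", "!", "?", ";"}


def tag_negations(tokens: List[str], emotion_words: Set[str], d: int = 4) -> List[Tuple[int, str]]:
    """Prefix each emotion word with one 'not_' per negation word found while backtracking."""
    tagged: List[Tuple[int, str]] = []
    for i, tok in enumerate(tokens):
        if tok not in emotion_words:
            continue
        n_neg = 0
        j = i - 1
        while j >= 0 and i - j < d:          # each token counts as distance 1
            prev = tokens[j]
            if prev in emotion_words or prev in CLAUSE_PUNCT:
                break                        # overlaps a previous emotion word or leaves the window
            if prev in NEGATION_WORDS:
                n_neg += 1
            j -= 1
        tagged.append((i, "not_" * n_neg + tok))
    return tagged


def is_negated(tagged_word: str) -> bool:
    """An odd number of 'not_' labels flips the polarity; an even number cancels out."""
    count = 0
    while tagged_word.startswith("not_", 4 * count):
        count += 1
    return count % 2 == 1


tokens = ["我", "不", "是", "不", "喜欢", "它"]
print(tag_negations(tokens, {"喜欢"}, d=5))  # [(4, 'not_not_喜欢')]
```

With a window wide enough to reach both negations, the double negative cancels out, which is the behavior the odd-count rule is meant to capture.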
In a further improvement, step 2.3 specifically comprises expressing the vector space model of a sentence as the set of its emotion words

$S = \{w_1, w_2, \ldots, w_j, \ldots, w_n\}$

where $w_j$ is the $j$-th emotion word or phrase and $n$ is the number of emotion words; each emotion word $w_j$ in the set is assigned an emotion vector.
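Under the bag-of-words view, a sentence is just a multiset of emotion words, each carrying its emotion vector. The Python sketch below counts the emotion words and adds count times vector per word; summation as the combining rule, the toy lexicon, and the "dominant emotion" helper are assumptions for illustration, not formulas from the patent.

```python
from collections import Counter
from typing import Dict, List, Sequence

EMOTIONS = ("surprise", "delight", "love", "expectation",
            "anxiety", "anger", "sadness", "hatred")


def sentence_vector(tokens: Sequence[str], lexicon: Dict[str, Sequence[float]]) -> List[float]:
    """Treat the sentence as a bag of emotion words and add up their emotion vectors."""
    counts = Counter(t for t in tokens if t in lexicon)  # word order is ignored
    total = [0.0] * len(EMOTIONS)
    for word, c in counts.items():
        for k, v in enumerate(lexicon[word]):
            total[k] += c * v
    return total


def dominant_emotion(vec: Sequence[float]) -> str:
    """Name of the dimension with the largest weight."""
    return EMOTIONS[max(range(len(vec)), key=lambda k: vec[k])]


lexicon = {"愤怒": (0, 0, 0, 0, 0, 0.9, 0, 0), "期待": (0, 0, 0, 0.6, 0, 0, 0, 0)}
vec = sentence_vector(["愤怒", "的", "愤怒", "期待"], lexicon)
print(dominant_emotion(vec))  # anger
```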
In a further improvement, in step 2, public opinion emotion analysis specifically comprises:
a) splitting the public opinion comment content into sentences and applying the adversative sentence rules and the adversative word list to obtain the sentence sequence $L_s = \langle s_1, s_2, s_3, \ldots, s_k, \ldots, s_n \rangle$;
b) segmenting each sentence $s_k$ in $L_s$ into words to obtain the token sequence $L_t = \langle t_1, t_2, t_3, \ldots, t_m \rangle$;
c) constructing the emotion vector space model of each emotion word in the sentence: traversing the fine-grained HowNet emotion dictionary with topic labels, looking up in turn the initial emotion weights of all emotion words in $s_k$, classifying their emotions, and recording the emotion weights;
d) performing negation processing on each emotion word in $s_k$ and adding negation labels, to obtain the labeled emotion word list F of $s_k$;
e) for each emotion word in F with emotion weight p, if the number of "not_" labels in front of it is odd, querying the negation dictionary and correcting the emotion value to p × n, where n is the weight of the negation word;
f) applying degree adverb and exclamation word processing to each emotion word in F, further correcting its weight;
g) computing the emotion value of sentence $s_k$ from the finally corrected weights of its emotion words, denoted $P_k$;
h) computing the emotion value of the public opinion comment content from the sentence values $P_k$:
[Formula: emotion value of the comment content, aggregated over the sentence values $P_k$]
and finally obtaining the 8-dimensional emotion weights of the public opinion comment content.
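Steps a) to h) can be strung together as a small pipeline. The Python sketch below is illustrative only: it assumes pre-segmented input separated by spaces (standing in for step b), omits the adversative rules in step a), uses hypothetical lexicon, negation and degree weights, takes n in p × n as the weight of the nearest negation word, and sums emotion vectors for both $P_k$ and the final comment value because the aggregation formula is not given in the text.

```python
import re
from typing import Dict, List, Sequence

DIM = 8  # surprise, delight, love, expectation, anxiety, anger, sadness, hatred

# Hypothetical resources; the patent's topic-labeled dictionaries are not reproduced here.
LEXICON: Dict[str, List[float]] = {
    "喜欢": [0, 0.6, 0.8, 0, 0, 0, 0, 0],
    "失望": [0, 0, 0, 0, 0.3, 0, 0.7, 0],
}
NEGATION_WEIGHTS: Dict[str, float] = {"不": -1.0, "没有": -0.8}
DEGREE_WEIGHTS: Dict[str, float] = {"非常": 1.5, "有点": 0.7}


def split_sentences(text: str) -> List[str]:
    """Step a): split at sentence-final punctuation (adversative rules omitted here)."""
    return [s.strip() for s in re.split(r"[。！？!?]", text) if s.strip()]


def sentence_value(tokens: Sequence[str], d: int = 3) -> List[float]:
    """Steps c) to g) for one pre-segmented sentence; returns P_k as an 8-dim vector."""
    total = [0.0] * DIM
    for i, tok in enumerate(tokens):
        if tok not in LEXICON:
            continue
        p = list(LEXICON[tok])                       # c) initial emotion weights
        negations: List[float] = []
        degree = 1.0
        j = i - 1
        while j >= 0 and i - j < d and tokens[j] not in LEXICON:  # d) backtrack
            if tokens[j] in NEGATION_WEIGHTS:
                negations.append(NEGATION_WEIGHTS[tokens[j]])
            elif tokens[j] in DEGREE_WEIGHTS:
                degree *= DEGREE_WEIGHTS[tokens[j]]
            j -= 1
        if len(negations) % 2 == 1:                  # e) odd number of not_ labels
            p = [x * negations[0] for x in p]        # p × n, n = nearest negation weight
        p = [x * degree for x in p]                  # f) degree adverb correction
        total = [a + b for a, b in zip(total, p)]    # g) assumption: P_k is a sum
    return total


def comment_value(text: str) -> List[float]:
    """Step h) under the assumption that sentence values are summed."""
    vectors = [sentence_value(s.split()) for s in split_sentences(text)]  # b) whitespace tokens
    return [sum(v[k] for v in vectors) for k in range(DIM)]


print([round(x, 2) for x in comment_value("我 非常 喜欢 。 我 不 失望 ！")])
```

Negative components after negation are intended: the patent's weights range from -1.0 to 1.0, so a negated emotion pulls its dimension below zero rather than being discarded.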
The invention also discloses a network public opinion emotion analysis system based on an emotion vector space model, comprising:
a topic acquisition module, configured to obtain the topics of the network public opinion texts to be analyzed using a trained BiMPM topic discovery model based on ELMo word vectors or a trained LLDA topic model;
an emotion analysis module, configured to perform emotion analysis on the topics obtained by the topic acquisition module based on the vector space model, wherein the vector space model is constructed by the following steps:
step 2.1, dividing the emotion words under different topics at a fine granularity on the basis of the existing HowNet emotion dictionary;
step 2.2, processing adversative (turning) sentence patterns and exclamation words with the HowNet emotion dictionary under each topic label, and correcting the weights of the emotion words;
step 2.3, introducing the topic label set from the topic discovery process, treating each sentence in the public opinion comment data as a combination of emotion words according to the "bag of words" principle of topic models, counting the emotion words in the combination, and identifying the emotional polarity of the sentence.
Compared with the prior art, the invention has the following beneficial effects:
The invention provides a method that clusters emotion vectors in public opinion information on the basis of topic models to discover topics and emotional tendencies. Building on a detailed study of topic model algorithms, it provides a network public opinion analysis method based on an emotion vector space model. To suit the characteristics of different public opinion carriers, the invention uses LLDA (Labeled-LDA, i.e., latent Dirichlet allocation with additional category labels) and a BiMPM topic discovery model based on ELMo word vectors to discover topics in short and long texts, respectively. Large-scale manually labeled corpora are introduced for training and modeling, producing trained models that perform well on metrics such as accuracy and recall and achieving efficient topic discovery. For emotion analysis based on the emotion vector space model, the emotion words in the HowNet emotion dictionary under each topic label are manually divided at a fine granularity to construct the emotion vector space model; according to the emotion distribution, a BiMPM model is used for emotion word classification statistics, achieving fine-grained emotional tendency analysis and giving users more accurate emotion analysis. In summary, the established topic models analyze the hot topics in texts better, and using the BiMPM model for long texts and the LLDA model for short texts avoids the limitations of either model alone and improves the accuracy of topic discovery. By dividing emotional tendencies more finely on the basis of the existing emotion dictionary instead of a simple positive/negative split, the invention improves the ability to analyze emotional tendencies correctly in different contexts.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below; it is obvious that the drawings in the following description are some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
Fig. 1 is a schematic flowchart of an internet public opinion analysis method based on an emotion vector space model according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a BiMPM model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the LLDA model in an embodiment of the present invention.
Detailed Description
In order to make the purpose, technical effect and technical solution of the embodiments of the present invention clearer, the following clearly and completely describes the technical solution of the embodiments of the present invention with reference to the drawings in the embodiments of the present invention; it is to be understood that the described embodiments are only some of the embodiments of the present invention. Other embodiments, which can be derived by one of ordinary skill in the art from the disclosed embodiments without inventive faculty, are intended to be within the scope of the invention.
Referring to fig. 1, a method for internet public opinion analysis based on emotion vector space model according to an embodiment of the present invention includes the following steps:
Step 1, text preprocessing:
1.1) First, data are collected with a combination of Fiddler, HttpClient, and Jsoup for HTTP communication, and through open APIs with the OAuth authorization mechanism for social media platforms.
1.2) The raw data undergo Chinese word segmentation, text tagging, stop-word removal, and similar operations with the IKAnalyzer segmenter, which removes redundant text without discriminative value from the original text and so improves the accuracy of topic discovery by the topic model.
Referring to fig. 2, in step 2, the long text topic discovery method of the BiMPM topic discovery model based on the ELMo word vector includes:
(1) extracting title participles and content keywords of each text in the long text training set, and splicing the two fields;
(2) pre-training the ELMo language model on the training set and fine-tuning it on training texts of the different topics so that it adapts to the target corpus;
(3) constructing a BiMPM topic discovery model, and adding ELMo word vectors into a sentence representation layer of the model;
(4) segmenting the processed topic text into: a train data set, a dev data set and a test data set;
(5) iteratively training the model until it performs best on the dev data set, which completes the training.
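Step (5) amounts to model selection on the dev set. A common way to implement "train until optimal on dev" is early stopping; the Python sketch below assumes that approach, with a hypothetical `patience` of 3 epochs and caller-supplied training and evaluation callbacks standing in for the actual BiMPM trainer.

```python
from typing import Callable, Tuple


def train_until_best_on_dev(train_one_epoch: Callable[[int], None],
                            evaluate_on_dev: Callable[[], float],
                            max_epochs: int = 50, patience: int = 3) -> Tuple[int, float]:
    """Train epoch by epoch and stop after `patience` epochs without dev improvement."""
    best_epoch, best_score, stale = -1, float("-inf"), 0
    for epoch in range(max_epochs):
        train_one_epoch(epoch)
        score = evaluate_on_dev()
        if score > best_score:
            best_epoch, best_score, stale = epoch, score, 0
            # A real trainer would save a model checkpoint here.
        else:
            stale += 1
            if stale >= patience:
                break
    return best_epoch, best_score


dev_scores = iter([0.50, 0.60, 0.70, 0.65, 0.66, 0.64, 0.90])
print(train_until_best_on_dev(lambda epoch: None, lambda: next(dev_scores)))  # (2, 0.7)
```

The small patience stops training before the late 0.90 score; a larger patience trades training time for a better chance of catching such late improvements.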
Referring to fig. 3, in step 3, the short-text topic discovery method based on the LLDA topic model comprises:
To address the sparsity of short-text data, the invention uses a topic discovery algorithm based on the LLDA topic model. The algorithm comprises the following steps:
a. preprocessing the short-text data according to the prior probability;
b. sampling the topic distribution of the document from a Dirichlet distribution governed by a hyperparameter; sampling the topic of the j-th word of the document from the multinomial distribution over topics and determining the topic classification label, with emphasis on the specific labels set for the document;
c. sampling the word distribution of each topic from a Dirichlet distribution, i.e., the Dirichlet distribution serves as the prior from which the parameters of the word distribution are drawn;
d. sampling the word from the multinomial distribution over words;
e. repeating steps a to d on multiple groups of labeled documents to train the LLDA model;
f. debugging, evaluating, and retraining on the training results to finally produce a better LLDA model;
g. repeating steps b to f.
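Step e trains LLDA on labeled documents without specifying the inference method. Collapsed Gibbs sampling is one standard choice for Labeled LDA, and the Python sketch below assumes it; the hyperparameters `alpha` and `beta`, the iteration count, and the toy documents are illustrative. The key LLDA property is that a token may only be assigned to one of its own document's labels.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Doc = Tuple[Sequence[str], Sequence[str]]  # (tokens, topic labels)


def train_llda(docs: Sequence[Doc], alpha: float = 0.1, beta: float = 0.01,
               iterations: int = 100,
               seed: int = 0) -> Tuple[Dict[str, Dict[str, float]], List[List[str]]]:
    """Collapsed Gibbs sampling for Labeled LDA; returns (phi, per-token assignments)."""
    rng = random.Random(seed)
    vocab = sorted({w for tokens, _ in docs for w in tokens})
    V = len(vocab)
    n_dk = [defaultdict(int) for _ in docs]          # document-topic counts
    n_kw = defaultdict(lambda: defaultdict(int))     # topic-word counts
    n_k = defaultdict(int)                           # tokens per topic
    z: List[List[str]] = []
    for d, (tokens, labels) in enumerate(docs):      # random initialization within labels
        z_d = []
        for w in tokens:
            k = rng.choice(list(labels))
            z_d.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(z_d)
    for _ in range(iterations):
        for d, (tokens, labels) in enumerate(docs):
            candidates = list(labels)
            for i, w in enumerate(tokens):
                k = z[d][i]
                n_dk[d][k] -= 1                      # remove the token's current assignment
                n_kw[k][w] -= 1
                n_k[k] -= 1
                weights = [(n_dk[d][lab] + alpha) * (n_kw[lab][w] + beta) / (n_k[lab] + V * beta)
                           for lab in candidates]
                k = rng.choices(candidates, weights=weights)[0]
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    all_labels = sorted({lab for _, labels in docs for lab in labels})
    phi = {lab: {w: (n_kw[lab][w] + beta) / (n_k[lab] + V * beta) for w in vocab}
           for lab in all_labels}
    return phi, z


docs = [
    (["war", "army", "war", "attack"], ["military"]),
    (["stock", "market", "stock", "price"], ["economy"]),
    (["war", "stock", "market", "army", "war"], ["military", "economy"]),
]
phi, z = train_llda(docs)
```

Single-label documents are assigned deterministically, while multi-label documents let the sampler split their words among the allowed topics.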
Step 4, emotion analysis based on the vector space model:
4.1) Emotion word recognition: the 8 topic classification labels are introduced into the construction of the HowNet emotion dictionary, and emotion words are labeled with 8-dimensional vectors over surprise (surprised), delight, love, expectation (expected), anxiety, anger (angry), sadness (sad), and hatred.
4.2) Emotion polarity judgment: because emotion words, including degree adverbs, can carry different emotional tendencies in different contexts, the invention processes the emotion dictionary as follows in combination with the actual topic labels:
a. Degree adverb processing. The degree-level words in the HowNet dictionary are manually subdivided into graded sets such as an "extreme degree" word set, and different emotion weights are assigned according to the emotional coloring each word expresses.
b. Negation word processing. A distance threshold d is preset for each emotion word using a negation-word association algorithm. Within the text region of each emotion word, each token counts as one unit of distance; starting from the emotion word (as found in the HowNet emotion dictionary under the topic label), the algorithm backtracks toward the beginning of the sentence searching for negation words, adding a negation label "not_" to the current emotion word for each one found, until the backtracking distance reaches or exceeds the preset threshold d, overlaps the previous emotion word, or reaches the beginning of the sentence.
c. Adversative sentence processing. Adversative sentences are recognized by their tail using adversative conjunctions.
d. Exclamation word processing. The exclamation words are subdivided at multiple scales in combination with the manually constructed HowNet emotion dictionary under the topic labels, and their weights are superposed in the weight calculation to strengthen the emotional tendency of the comment content.
4.3) Emotion weight calculation: emotion weights are calculated under the different topic labels. First, an emotion vector space is constructed through emotion word recognition, and initial emotion word weights are assigned. Second, emotion classification statistics are computed according to the emotion distribution characteristics, and emotion weights are calculated. Finally, the emotion weights are corrected through emotion polarity processing.
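The degree-adverb and exclamation corrections in step 4.2 can be sketched as a scale-then-add rule. The Python version below assumes that degree adverbs multiply the weight and that exclamation weights are added ("superposed") afterwards; the word lists and their weights are hypothetical, with exclamation weights kept inside the stated -1.0 to 1.0 range.

```python
from typing import Dict, Optional

# Hypothetical weights; the patent assigns them manually per topic label.
DEGREE_WEIGHTS: Dict[str, float] = {"极其": 2.0, "非常": 1.5, "比较": 1.2, "有点": 0.7}
EXCLAMATION_WEIGHTS: Dict[str, float] = {"唉": -0.3, "哇": 0.2, "哈哈": 0.4}  # within [-1.0, 1.0]


def correct_weight(base: float, degree_word: Optional[str] = None,
                   exclamation: Optional[str] = None) -> float:
    """Scale an emotion weight by its degree adverb, then superpose (add) the exclamation weight."""
    weight = base * DEGREE_WEIGHTS.get(degree_word, 1.0) if degree_word else base
    if exclamation:
        weight += EXCLAMATION_WEIGHTS.get(exclamation, 0.0)
    return weight


print(correct_weight(0.5, degree_word="非常"))  # 0.75
```

Multiplying for degree adverbs preserves the sign of the emotion, while adding the exclamation weight lets a sigh such as "唉" push an already negative weight further down.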
The embodiment of the invention specifically comprises the following steps:
Emotion word recognition. The existing HowNet emotion dictionary only divides word sentiment into positive and negative, which is not enough for emotion analysis, so the invention manually divides the emotion words under different topics at a finer granularity on the basis of the existing HowNet emotion dictionary. The 8 topic classification labels, namely politics, military, economy, society, culture/sports/entertainment, science and technology, religion, and other, are introduced into the HowNet emotion dictionary; under each topic label, each emotion word is labeled with an 8-dimensional emotion vector over surprise, delight, love, expectation, anxiety, anger, sadness, and hatred.
The initial emotion weights labeled under the 8 topic labels range from -1.0 to 1.0. For example, under the military label the emotion word "attack" carries the strongly negative attitude of a war caused by an attack; its emotion vector is (0, 0, 0, 0, 0, 0, 0.8, 0.9), i.e., sadness 0.8 and hatred 0.9. Under the culture/sports/entertainment label, however, "attack" describes a technique in a sports contest and carries only a mild negative attitude; its vector is (0, 0, 0, 0, 0, 0.5, 0, 0), i.e., anger 0.5.
The invention combines the actual topic labels, processes turning sentence patterns, exclamation words and the like more accurately by combining the knowledge network emotion dictionaries under the topic labels, and corrects the weight of emotion words. The processing steps include:
(1) and degree adverb processing, comprising: subdividing the known net degree level word dictionary into a 'extreme degree word set'; the step of subdivision comprises the steps of endowing different emotion weight values according to the emotion colors expressed by different vocabularies;
(2) negative word processing, comprising: setting a distance threshold value d for each emotion vocabulary in advance by using an algorithm of related negative words; in a text area where each emotional vocabulary is located, setting token of each participle as a unit 1, tracing back forward by taking vocabularies in a knowledge network emotion dictionary under a topic label as a reference, continuously searching for a negative word, and adding a negative label 'not _' to the current emotional vocabulary when finding a negative word until the backtracking distance is more than or equal to a preset threshold value d or is overlapped with the previous emotional word or reaches the beginning of a text sentence; meanwhile, a 'sliding window' concept is introduced, the size of a window takes punctuation marks in the clauses as a judgment basis, if odd negative words appear in a window where the emotion words are located, negation operation is carried out on the original emotion words, and specifically, the weight of the original emotion words is multiplied by the weight of the negative words.
(3) Adversative (turning) sentence processing: adversative sentences are recognized from the sentence tail using adversative conjunctions. The method first checks whether an adversative prefix (a leading conjunction such as "although") is present. If there is none, the sentence is processed as an ordinary statement. If a prefix is present but no adversative suffix (a trailing conjunction such as "but") follows, the clause containing the prefix is deleted. If both the prefix and the suffix are present and the suffix clause contains emotion words from the HowNet emotion dictionary under the topic label, the clauses from the prefix up to the suffix are deleted. If the suffix clause contains no such emotion words, the clause containing the prefix is negated. A specific sentence rule is "turn prefix.
(4) Exclamation processing: the existing 85 exclamation words are subdivided at multiple scales in combination with the HowNet emotion dictionary under the topic label, with weights ranging from -1.0 to 1.0. These weights are superposed when the emotion weights are calculated, which strengthens the emotional tendency of the text.
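The negation rule in item (2) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation:
- The negation dictionary and its weights are hypothetical, and English tokens stand in for segmented Chinese tokens.
- Backtracking is taken to cover at most d tokens.
- The clause-level "sliding window" is approximated by stopping at punctuation.
- When several negation words are found, the weight n of the nearest one is used.

```python
from typing import Dict, List, Sequence

# Hypothetical negation dictionary: negation word -> weight n used in p * n.
# English tokens stand in for segmented Chinese tokens.
NEGATION_WEIGHTS: Dict[str, float] = {"not": -1.0, "never": -1.0, "hardly": -0.5}
# Punctuation that bounds the clause-level "sliding window".
CLAUSE_BOUNDARIES = {",", ";", ".", "!", "?"}


def negation_words(tokens: Sequence[str], emotion_positions: Sequence[int],
                   d: int = 3) -> Dict[int, List[str]]:
    """For each emotion word, backtrack up to d tokens and collect negation words.

    Each token counts as distance 1. Backtracking also stops at the previous
    emotion word, at a clause boundary, or at the start of the sentence.
    """
    emotion_set = set(emotion_positions)
    found: Dict[int, List[str]] = {}
    for pos in emotion_positions:
        negs: List[str] = []
        j = pos - 1
        while j >= 0 and pos - j <= d:
            tok = tokens[j]
            if j in emotion_set or tok in CLAUSE_BOUNDARIES:
                break
            if tok in NEGATION_WEIGHTS:
                negs.append(tok)  # one "not_" tag per negation word found
            j -= 1
        found[pos] = negs
    return found


def tag(word: str, negs: List[str]) -> str:
    """Prefix the emotion word with one "not_" tag per negation word."""
    return "not_" * len(negs) + word


def corrected_weight(p: float, negs: List[str]) -> float:
    """Odd number of not_ tags: return p * n, using the nearest negation word's weight."""
    if len(negs) % 2 == 1:
        return p * NEGATION_WEIGHTS[negs[0]]
    return p
```

With tokens `["i", "do", "not", "like", "it"]` and the emotion word at index 3, the sketch tags `not_like` and turns a weight of 0.6 into -0.6. A double negation ("never not like") leaves the weight unchanged.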
Emotion weight calculation: the topic label set from the topic discovery process is introduced. Following the "bag of words" principle of topic models, each sentence in the public opinion comment data is treated as a combination of emotion words, and those emotion words are counted to identify the emotion polarity of the sentence. The vector space model is expressed as
[Formula image BDA0002815154320000111]
where w_j is the j-th emotion word or phrase and n is the number of such words. Each emotion word w_j is assigned an emotion vector
[Formula image BDA0002815154320000112]
An LDA topic model is then applied to the public opinion text to build the emotion vector space model. The text-topic-word probability distribution is obtained from the hidden topics, namely the topic t_m with a large emotion weight.
The public opinion emotion analysis method in this embodiment of the invention comprises the following steps:
a) Split the comment content of the public opinion data into sentences and process them according to the adversative sentence rules and the adversative word list, obtaining the sentence sequence L_s = <s_1, s_2, s_3, …, s_n>;
b) Segment each sentence s_k in L_s into words, obtaining the word sequence L_t = <t_1, t_2, t_3, …, t_m>;
c) Build the emotion vector space model of the emotion words in the sentence
[Formula image BDA0002815154320000113]
then traverse the fine-grained division of the HowNet emotion dictionary with topic labels, look up the initial emotion weights of all emotion words in s_k in turn, classify them with the LDA topic model, and record the emotion weights;
d) Apply negation word processing to each emotion word in s_k and add negation tags, obtaining the tagged emotion word list F of s_k;
e) For each emotion word in F with emotion weight p, if an odd number of not_ tags precedes it, look up the negation dictionary and correct the emotion value to p × n, where n is the weight of the negation word;
f) For each emotion word in F, apply degree adverb and exclamation processing to correct its weight further;
g) From the final corrected weights of its emotion words, compute the emotion value of sentence s_k, denoted P_k;
h) Compute the emotion value of the public opinion comment content
[Formula image BDA0002815154320000114]
and finally output the 8-dimensional emotion weight of the comment content.
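Step a) applies the adversative rule of item (3) before segmentation. Below is a minimal Python sketch of that rule under these assumptions:
- Each sentence is already tokenized.
- Clauses are delimited by commas and semicolons.
- "Deleting from the prefix to the suffix" removes every clause before the suffix clause and keeps the suffix clause.
- The conjunction lists are hypothetical English stand-ins for the patent's adversative word list.

```python
from typing import List, Optional, Sequence, Set, Tuple

# Hypothetical adversative conjunctions (English stand-ins for Chinese words).
PREFIXES: Set[str] = {"although", "though"}
SUFFIXES: Set[str] = {"but", "however", "yet"}
CLAUSE_MARKS = {",", ";"}

Clause = List[str]


def split_clauses(tokens: Sequence[str]) -> List[Clause]:
    """Split a tokenized sentence into clauses at commas and semicolons."""
    clauses: List[Clause] = [[]]
    for tok in tokens:
        if tok in CLAUSE_MARKS:
            clauses.append([])
        else:
            clauses[-1].append(tok)
    return [c for c in clauses if c]


def _first(clauses: Sequence[Clause], words: Set[str], start: int = 0) -> Optional[int]:
    """Index of the first clause at or after `start` containing one of `words`."""
    return next((i for i in range(start, len(clauses)) if words & set(clauses[i])), None)


def apply_adversative_rule(clauses: Sequence[Clause],
                           emotion_words: Set[str]) -> List[Tuple[Clause, int]]:
    """Return (clause, sign) pairs; sign -1 means the clause's polarity is negated."""
    pre = _first(clauses, PREFIXES)
    if pre is None:  # no prefix: ordinary statement
        return [(list(c), 1) for c in clauses]
    suf = _first(clauses, SUFFIXES, pre + 1)
    if suf is None:  # prefix without suffix: drop the prefix clause
        return [(list(c), 1) for i, c in enumerate(clauses) if i != pre]
    if emotion_words & set(clauses[suf]):  # emotional suffix clause: drop prefix..suffix
        return [(list(c), 1) for i, c in enumerate(clauses) if not pre <= i < suf]
    # Suffix clause carries no emotion words: negate the prefix clause.
    return [(list(c), -1 if i == pre else 1) for i, c in enumerate(clauses)]
```

For "although the price is high, but the quality is great", the sketch keeps only the "but" clause when "great" is an emotion word. It negates the "although" clause when the "but" clause carries no emotion word.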
In summary, to address the shortcomings of the prior art, the invention provides an online public opinion analysis method based on an emotion vector space model, developed through detailed study of topic model algorithms. Different public opinion carriers produce texts of different lengths. Topic discovery therefore uses LLDA (Labeled-LDA, latent Dirichlet allocation with attached category labels) and a BiMPM topic discovery model based on ELMo word vectors. The models are trained on large-scale, manually labeled corpora, producing models that perform well on metrics such as accuracy and recall and thus enable efficient topic discovery. For emotion analysis, the emotion words of the HowNet emotion dictionary under each topic label are manually divided at a fine granularity, and an emotion vector space model is built from them. The emotion words are then classified and counted with an LDA topic model according to their emotion distribution, which yields fine-grained emotion tendency analysis and more accurate results for users. Compared with the prior art, the public opinion analysis method based on the emotion vector space model in this embodiment has the following advantages:
1. The topic models analyze the hot topics in the text more effectively. The BiMPM model is used for long texts and the LLDA model for short texts, which avoids the limitations of each model and improves topic discovery accuracy.
2. Emotional tendency is divided more finely on top of the existing emotion dictionary, rather than by a simple positive/negative split. This improves the ability to analyze emotional tendency correctly in different contexts.
The invention further provides an online public opinion emotion analysis system based on the emotion vector space model, comprising:
a topic acquisition module, configured to obtain the topics of the network public opinion text to be analyzed using the trained BiMPM model or LLDA topic model;
an emotion analysis module, configured to perform emotion analysis on the topics obtained by the topic acquisition module using the vector space model, wherein the vector space model is constructed by the following steps:
step 2.1, dividing the emotion words under different topics at a fine granularity on the basis of the existing HowNet emotion dictionary;
step 2.2, processing adversative sentence patterns and exclamations with the HowNet emotion dictionary under the topic labels, and correcting the weights of the emotion words;
step 2.3, introducing the topic label set from the topic discovery process, treating each sentence in the public opinion comment data as a combination of emotion words according to the "bag of words" principle of topic models, counting the emotion words in the combination, and identifying the emotion polarity of the sentence.
In the system of this embodiment, text preprocessing includes:
1.1) Data collection: for HTTP communication, data are collected with a combination of Fiddler, HttpClient and Jsoup. For social media platforms, data are collected through their open APIs using the OAuth authorization mechanism.
1.2) The raw data are processed with the IKAnalyzer segmenter for Chinese word segmentation, text labeling, stop-word removal and similar operations. These operations remove redundant text that has no identifying value and so improve the accuracy of topic discovery by the topic model. IKAnalyzer is based on the forward maximum matching algorithm. Consecutive characters of the text to be segmented are matched against the word list from left to right, and scanning continues after each successful match. Once the next scan no longer yields a word, or a prefix of a word, in the word list, the scan ends and the word is cut out.
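The matching procedure described for IKAnalyzer can be illustrated with a short Python sketch of greedy left-to-right longest matching, driven by the set of all word prefixes. It illustrates the described principle only; it is not IKAnalyzer's own (Java) code, and the sample dictionary is hypothetical.

```python
from typing import Iterable, List, Set


def build_prefixes(words: Iterable[str]) -> Set[str]:
    """All non-empty prefixes of every dictionary word."""
    prefixes: Set[str] = set()
    for w in words:
        for k in range(1, len(w) + 1):
            prefixes.add(w[:k])
    return prefixes


def forward_max_match(text: str, words: Set[str]) -> List[str]:
    """Segment `text` left to right, always taking the longest dictionary word."""
    prefixes = build_prefixes(words)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        best = i + 1  # fall back to a single character if nothing matches
        j = i + 1
        # Keep extending while text[i:j] is still a word or a prefix of a word.
        while j <= len(text) and text[i:j] in prefixes:
            if text[i:j] in words:
                best = j  # remember the longest complete word seen so far
            j += 1
        tokens.append(text[i:best])
        i = best
    return tokens
```

With the dictionary `{"网络", "舆情", "网络舆情", "分析"}`, the string "网络舆情分析" ("network public opinion analysis") is cut into "网络舆情" and "分析": the longer word wins over "网络".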
In the system of this embodiment, 2) long-text topic discovery uses a BiMPM topic discovery model based on ELMo word vectors:
(1) Extract the title segmentation and content keywords of each text in the long-text training set, and concatenate the two fields;
(2) Pre-train an ELMo language model on the training set and fine-tune it on training sets of texts on different topics, so that it adapts to the preset corpus environment;
(3) Build the BiMPM topic discovery model and add the ELMo word vectors to its sentence representation layer;
(4) Split the processed topic texts into a train set, a dev set and a test set;
(5) Train the model iteratively until it performs best on the dev set, then stop training.
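Steps (1), (4) and (5) are mostly bookkeeping around the model, and they can be sketched in Python as below. The split ratios, the early-stopping patience and the `train_epoch`/`dev_score` callables are hypothetical. The BiMPM network and the ELMo embeddings themselves are not shown.

```python
import random
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[str, str]  # (spliced text, topic label)


def splice_fields(title_tokens: Sequence[str], keywords: Sequence[str]) -> str:
    """Step (1): concatenate the title segmentation and the content keywords."""
    return " ".join([*title_tokens, *keywords])


def split_dataset(samples: Sequence[Sample],
                  ratios: Tuple[float, float, float] = (0.8, 0.1, 0.1),
                  seed: int = 0) -> Tuple[List[Sample], List[Sample], List[Sample]]:
    """Step (4): shuffle, then split into train/dev/test (ratios are an assumption)."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    n_train = int(len(items) * ratios[0])
    n_dev = int(len(items) * ratios[1])
    return items[:n_train], items[n_train:n_train + n_dev], items[n_train + n_dev:]


def train_until_best_on_dev(train_epoch: Callable[[int], None],
                            dev_score: Callable[[], float],
                            max_epochs: int = 50,
                            patience: int = 3) -> Tuple[int, float]:
    """Step (5): train epoch by epoch and keep the epoch that scores best on dev.

    Stopping after `patience` epochs without improvement is an assumption.
    """
    best_epoch, best_score, stale = -1, float("-inf"), 0
    for epoch in range(max_epochs):
        train_epoch(epoch)
        score = dev_score()
        if score > best_score:
            best_epoch, best_score, stale = epoch, score, 0
        else:
            stale += 1
            if stale >= patience:
                break
    return best_epoch, best_score
```

In a real training run, `train_epoch` would update the BiMPM model and save a checkpoint, so that the weights from `best_epoch` can be restored before evaluating on the test set.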
In the system of this embodiment, 3) short-text topic discovery is based on the LLDA topic model:
To address the sparsity of short-text data, the invention uses a topic discovery algorithm based on the LLDA topic model. The algorithm comprises the following steps:
a. Preprocess the short-text data from step 1) according to the prior probability;
b. Sample the topic distribution of a document from a Dirichlet distribution with hyperparameters. Then sample the topic of the j-th word of the document from the multinomial distribution over topics; this determines the topic classification label and emphasizes the setting of a specific label;
c. Sample the word distribution of each topic from a Dirichlet distribution; in other words, the Dirichlet distribution serves as the prior from which the word distribution is generated;
d. Sample each word from the multinomial distribution over words;
e. Repeat steps a to d over multiple groups of labeled documents to train the LLDA model;
f. Debug, evaluate and retrain on the training results, finally obtaining a better LLDA model;
g. Repeat steps b to f.
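Steps b to d describe the generative story of Labeled LDA. The NumPy sketch below samples a toy corpus from that story, assuming symmetric Dirichlet priors with hypothetical values for alpha and beta. Each document may only draw topics from its own label set, which is what distinguishes Labeled LDA from plain LDA. Training (step e) would invert this process, for example with collapsed Gibbs sampling; that part is not shown.

```python
import numpy as np


def generate_labeled_corpus(doc_labels, n_topics, vocab_size, doc_len,
                            alpha=0.5, beta=0.1, seed=0):
    """Sample a toy corpus from the Labeled-LDA generative process.

    doc_labels: one list of topic/label indices per document; a document may
    only draw topics from its own labels.
    """
    rng = np.random.default_rng(seed)
    # Step c: a word distribution phi_k ~ Dirichlet(beta) for every topic k.
    phi = rng.dirichlet(np.full(vocab_size, beta), size=n_topics)
    docs, assignments = [], []
    for labels in doc_labels:
        labels = np.asarray(labels)
        # Step b: theta_d ~ Dirichlet(alpha), restricted to this document's labels.
        theta = rng.dirichlet(np.full(len(labels), alpha))
        # Topic of the j-th word: z_j ~ Multinomial(theta_d), mapped back to label ids.
        z = labels[rng.choice(len(labels), size=doc_len, p=theta)]
        # Step d: word w_j ~ Multinomial(phi_{z_j}).
        words = np.array([rng.choice(vocab_size, p=phi[k]) for k in z])
        docs.append(words)
        assignments.append(z)
    return phi, docs, assignments
```

Calling `generate_labeled_corpus([[0, 2], [1]], n_topics=3, vocab_size=20, doc_len=15)` yields a first document whose words come only from topics 0 and 2, and a second document drawn entirely from topic 1.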
In the system of this embodiment, 4) the emotion analysis method based on the vector space model comprises:
4.1) Emotion word recognition: the 8 types of topic classification labels (politics, military, economy, society, and so on) are introduced into the construction of the HowNet emotion dictionary. Each word is labeled with an 8-dimensional label vector over surprise (surprised), joy (delight), love (love), expectation (expected), anxiety (anxiety), anger (angry), sadness (sad) and hate (hatred).
4.2) Emotion polarity judgment: degree adverbs and other emotion-bearing words can carry different emotional tendencies in different contexts. The invention therefore processes the emotion dictionary in combination with the actual topic labels, as follows:
a. Degree adverb processing. The degree-level words in the HowNet dictionary are manually subdivided into an "extreme degree word set", and each word is assigned its own emotion weight according to the emotional coloring it expresses.
b. Negation word processing. A distance threshold d is set in advance for each emotion word using the negation word algorithm. Within the text region of each emotion word, each token counts as one unit of distance. Starting from a word in the HowNet emotion dictionary under the topic label, the algorithm backtracks in search of negation words and adds the negation tag not_ to the current emotion word for each one found. Backtracking stops when the distance exceeds the preset threshold d, when it reaches the previous emotion word, or when it reaches the beginning of the sentence.
c. Adversative sentence processing. Adversative sentences are recognized from the sentence tail using adversative conjunctions.
d. Exclamation processing. The exclamation words are subdivided at multiple scales using the manually constructed HowNet emotion dictionary under the topic labels. Their weights are superposed when the emotion weights are calculated, which strengthens the emotional tendency of the comment content.
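Items a and d both adjust a weight that has already been looked up. Below is a minimal Python sketch under these assumptions:
- A degree adverb scales the weight of the emotion word that follows it multiplicatively.
- Exclamation weights are added ("superposed").
- The result is clipped back into the lexicon's -1.0 to 1.0 range.
- All weights shown are hypothetical.

```python
from typing import Dict, Optional, Sequence

# Hypothetical weights; the patent assigns them manually per word.
DEGREE_WEIGHTS: Dict[str, float] = {"extremely": 2.0, "very": 1.5, "slightly": 0.5}
EXCLAMATION_WEIGHTS: Dict[str, float] = {"wow": 0.3, "alas": -0.3}


def correct_weight(p: float, degree_word: Optional[str] = None,
                   exclamations: Sequence[str] = ()) -> float:
    """Scale p by a preceding degree adverb, then superpose exclamation weights."""
    if degree_word is not None and degree_word in DEGREE_WEIGHTS:
        p *= DEGREE_WEIGHTS[degree_word]
    for word in exclamations:
        p += EXCLAMATION_WEIGHTS.get(word, 0.0)
    # Clipping back into [-1.0, 1.0] is an assumption, not stated in the patent.
    return max(-1.0, min(1.0, p))
```

Under these assumptions, "very" raises a weight of 0.4 to 0.6, and an exclamation such as "wow" adds 0.3 on top of whatever weight remains.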
4.3) Emotion weight calculation: emotion weights are calculated under each topic label. First, an emotion vector space is constructed through emotion word identification, and initial weights are assigned to the emotion words. Second, emotion classification statistics are computed according to the emotion distribution, and the emotion weights are calculated. Finally, the emotion weights are further corrected through emotion polarity processing. The vector space model is expressed as
[Formula image BDA0002815154320000151]
where w_j is the j-th emotion word or phrase and n is the number of such words. Each emotion word w_j is assigned an emotion vector
[Formula image BDA0002815154320000152]
An LDA topic model is applied to the text of the public opinion comment content to build the emotion vector space model. The text-topic-word probability distribution is obtained from the hidden topics T, namely the topic T_m with a large emotion weight, and the formula then becomes
[Formula image BDA0002815154320000153]
The emotion weight is calculated as follows:
[Formula image BDA0002815154320000154]
where
[Formula image BDA0002815154320000155]
is the weighted sum of the weights of all emotion words in the sentence.
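The formula images are not reproduced, but the text states that a sentence's weight is a weighted sum over its emotion words. The Python sketch below computes such an 8-dimensional sentence vector P_k. It then sums the sentence vectors to obtain the comment-level vector; that document-level combination is an assumption, as are the per-word weights (default 1.0) and the hypothetical `dominant_emotion` helper.

```python
from typing import List, Optional, Sequence

EMOTIONS = ("surprise", "joy", "love", "expectation",
            "anxiety", "anger", "sadness", "hate")


def sentence_vector(word_vectors: Sequence[Sequence[float]],
                    weights: Optional[Sequence[float]] = None) -> List[float]:
    """P_k: weighted sum of the corrected 8-dim vectors of the sentence's emotion words."""
    if weights is None:
        weights = [1.0] * len(word_vectors)
    if len(weights) != len(word_vectors):
        raise ValueError("one weight per emotion word is required")
    total = [0.0] * len(EMOTIONS)
    for lam, vec in zip(weights, word_vectors):
        for i, v in enumerate(vec):
            total[i] += lam * v
    return total


def comment_vector(sentence_vectors: Sequence[Sequence[float]]) -> List[float]:
    """Comment-level 8-dim emotion weight as the plain sum of the P_k (assumption)."""
    return sentence_vector(sentence_vectors)


def dominant_emotion(vec: Sequence[float]) -> str:
    """Name of the emotion dimension with the largest weight."""
    return EMOTIONS[max(range(len(vec)), key=lambda i: vec[i])]
```

Feeding the two "attack" vectors from the lexicon example into `sentence_vector` gives [0, 0, 0, 0, 0, 0.5, 0.8, 0.9], whose dominant emotion is "hate".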
In summary, the invention discloses a network public opinion analysis method and system based on an emotion vector space model. The main algorithm has two parts. The first is topic discovery, using a BiMPM topic discovery model based on ELMo word vectors together with latent Dirichlet allocation with attached class labels (Labeled-LDA, LLDA). The second is emotion tendency analysis based on the vector space model. The technical solution is as follows. The original text is first preprocessed by Chinese word segmentation, text labeling, stop-word removal and similar operations, and topic discovery is then performed with the BiMPM topic discovery model and the LLDA topic model. Next, 8 topic classes and 8 types of emotion tendency labels are introduced into the HowNet emotion dictionary. An emotion vector space model is built from the emotion words in the text to be analyzed and the 8-dimensional emotion weights are calculated; the emotion tendency of the public opinion data is obtained from these weights.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Although the present invention has been described in detail with reference to the above embodiments, those skilled in the art can make modifications and equivalents to the embodiments of the present invention without departing from the spirit and scope of the present invention, which is set forth in the claims of the present application.

Claims (10)

1. A network public opinion analysis method taking topic discovery and emotion analysis into account, characterized by comprising the following steps:
step 1, obtaining the topics of the network public opinion text to be analyzed using a trained BiMPM topic discovery model based on ELMo word vectors or a trained LLDA topic model;
step 2, performing emotion analysis on the topics obtained in step 1 using a vector space model, wherein the vector space model is constructed by the following steps:
step 2.1, dividing the emotion words under different topics at a fine granularity on the basis of the existing HowNet emotion dictionary;
step 2.2, processing adversative sentence patterns and exclamations with the HowNet emotion dictionary under the topic labels, and correcting the weights of the emotion words;
step 2.3, introducing the topic label set from the topic discovery process, treating each sentence in the public opinion comment data as a combination of emotion words according to the "bag of words" principle of topic models, counting the emotion words in the combination, and identifying the emotion polarity of the sentence.
2. The network public opinion analysis method taking topic discovery and emotion analysis into account according to claim 1, wherein, in step 1, obtaining the trained BiMPM topic discovery model based on ELMo word vectors and the trained LLDA topic model comprises:
step 1.1, collecting a plurality of public opinion texts to construct a training set, and dividing the training set into a long-text training set and a short-text training set according to a preset text length threshold, wherein all texts in the training sets are manually labeled with topic classes and grouped by the labeled class;
step 1.2, training the BiMPM topic discovery model based on ELMo word vectors on the long-text training set to obtain the trained BiMPM topic discovery model based on ELMo word vectors, and training the LLDA topic model on the short-text training set to obtain the trained LLDA topic model.
3. The network public opinion analysis method taking topic discovery and emotion analysis into account according to claim 2, wherein, in step 1.1, collecting the plurality of public opinion texts and constructing the training set comprises:
for HTTP communication, collecting data with a combination of Fiddler, HttpClient and Jsoup; for social media platforms, collecting data through their open APIs using the OAuth authorization mechanism;
performing Chinese word segmentation and text labeling on the raw public opinion text with the IKAnalyzer segmenter, and removing redundant text with no identifying value through stop-word removal.
4. The network public opinion analysis method taking topic discovery and emotion analysis into account according to claim 2, wherein, in step 1.2, training the BiMPM topic discovery model based on ELMo word vectors comprises:
(1) extracting the title segmentation and content keywords of each text in the long-text training set, and concatenating the two fields;
(2) pre-training an ELMo language model on the training set and fine-tuning it on training sets of texts on different topics, so that it adapts to the preset corpus environment;
(3) building the BiMPM topic discovery model and adding the ELMo word vectors to its sentence representation layer;
(4) splitting the processed topic texts into a train set, a dev set and a test set;
(5) training the model iteratively until it performs best on the dev set, and then ending the training.
5. The network public opinion analysis method taking topic discovery and emotion analysis into account according to claim 2, wherein, in step 1.2, training the LLDA topic model comprises:
a. preprocessing the short-text data from step 1 according to the prior probability;
b. sampling the topic distribution of the document from a Dirichlet distribution, sampling the topic of the j-th word of the document from the multinomial distribution over topics, and determining the topic classification label;
c. sampling the word distribution of each topic from a Dirichlet distribution;
d. sampling each word from the multinomial distribution over words;
e. repeating steps a to d over multiple groups of labeled documents to train the LLDA topic model;
f. debugging, evaluating and retraining on the training results to obtain the trained LLDA topic model.
6. The network public opinion analysis method taking topic discovery and emotion analysis into account according to claim 1, wherein step 2.1 comprises: introducing 8 types of topic classification labels into the HowNet emotion dictionary, namely politics, military, economy, society, culture and sports entertainment, science and technology, religion, and others; and labeling the words under each class of label with an 8-dimensional label vector over surprise, joy, love, expectation, anxiety, anger, sadness and hate
[Formula image FDA0002815154310000031]
wherein the vectors of the same word under different labels are different.
7. The network public opinion analysis method taking topic discovery and emotion analysis into account according to claim 6, wherein step 2.2 specifically comprises:
1) degree adverb processing, comprising: subdividing the HowNet degree-level word dictionary into an extreme degree word set, wherein the subdividing comprises assigning different emotion weights according to the emotional coloring expressed by different words;
2) negation word processing, comprising: setting in advance a distance threshold d for each emotion word using the negation word algorithm; within the text region of each emotion word, counting each token as one unit, and backtracking from the word in the HowNet emotion dictionary under the topic label to search for negation words; adding the negation tag not_ to the current emotion word for each negation word found, until the backtracking distance is greater than or equal to the preset threshold d, the previous emotion word is reached, or the beginning of the sentence is reached;
introducing a sliding window whose size is determined by the punctuation marks within the clause; if an odd number of negation words appears in the window containing the emotion word, negating the original emotion word;
3) adversative sentence processing, comprising: recognizing adversative sentences from the sentence tail using adversative conjunctions; during processing, checking whether an adversative prefix exists and, if not, processing the sentence as an ordinary statement; if the prefix exists but no adversative suffix exists, deleting the clause containing the prefix; if both the prefix and the suffix exist and the clause of the suffix contains emotion words from the HowNet emotion dictionary under the topic labels, deleting the clauses from the prefix to the suffix; if the clause of the suffix contains no emotion words from the HowNet emotion dictionary under the topic labels, negating the clause containing the prefix;
4) exclamation processing, comprising: subdividing the existing exclamation words at multiple scales in combination with the HowNet emotion dictionary under the topic labels, with weights ranging from -1.0 to 1.0.
8. The network public opinion analysis method taking topic discovery and emotion analysis into account according to claim 7, wherein step 2.3 comprises:
expressing the vector space model as
[Formula image FDA0002815154310000041]
where w_j is the j-th emotion word or phrase and n is the number of such words, and each emotion word w_j is assigned an emotion vector.
9. The network public opinion analysis method taking topic discovery and emotion analysis into account according to claim 8, wherein the emotion analysis of the public opinion in step 2 specifically comprises:
a) splitting the comment content of the public opinion data into sentences and processing them according to the adversative sentence rules and the adversative word list, obtaining the sentence sequence L_s = <s_1, s_2, s_3, …, s_k, …, s_n>;
b) segmenting each sentence s_k in L_s into words, obtaining the word sequence L_t = <t_1, t_2, t_3, …, t_m>;
c) building the emotion vector space model of the emotion words in the sentence
[Formula image FDA0002815154310000042]
traversing the fine-grained division of the HowNet emotion dictionary with topic labels, looking up in turn the initial emotion weights of all emotion words in s_k, classifying the emotions, and recording the emotion weights;
d) applying negation word processing to each emotion word in s_k and adding negation tags, obtaining the tagged emotion word list F of s_k;
e) for each emotion word in the emotion word list F with emotion weight p, if the number of not_ tags preceding it is odd, looking up the negation dictionary and correcting the emotion value to p × n;
f) for each emotion word in the list F, applying degree adverb and exclamation processing to further correct its weight;
g) computing the emotion value of sentence s_k from the final corrected weights of its emotion words, denoted P_k;
h) computing the emotion value of the public opinion comment content
[Formula image FDA0002815154310000043]
and finally obtaining the 8-dimensional emotion weight of the public opinion comment content.
10. A network public opinion analysis system taking topic discovery and emotion analysis into account, characterized by comprising:
a topic acquisition module, configured to obtain the topics of the network public opinion text to be analyzed using a trained BiMPM topic discovery model based on ELMo word vectors or a trained LLDA topic model;
an emotion analysis module, configured to perform emotion analysis on the topics obtained by the topic acquisition module using the vector space model, wherein the vector space model is constructed by the following steps:
step 2.1, dividing the emotion words under different topics at a fine granularity on the basis of the existing HowNet emotion dictionary;
step 2.2, processing adversative sentence patterns and exclamations with the HowNet emotion dictionary under the topic labels, and correcting the weights of the emotion words;
step 2.3, introducing the topic label set from the topic discovery process, treating each sentence in the public opinion comment data as a combination of emotion words according to the "bag of words" principle of topic models, counting the emotion words in the combination, and identifying the emotion polarity of the sentence.
CN202011397734.XA 2020-12-03 2020-12-03 Network public opinion analysis method and system taking topic discovery and emotion analysis into consideration Active CN112434164B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011397734.XA CN112434164B (en) 2020-12-03 2020-12-03 Network public opinion analysis method and system taking topic discovery and emotion analysis into consideration

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011397734.XA CN112434164B (en) 2020-12-03 2020-12-03 Network public opinion analysis method and system taking topic discovery and emotion analysis into consideration

Publications (2)

Publication Number Publication Date
CN112434164A true CN112434164A (en) 2021-03-02
CN112434164B CN112434164B (en) 2023-04-28

Family

ID=74690831

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011397734.XA Active CN112434164B (en) 2020-12-03 2020-12-03 Network public opinion analysis method and system taking topic discovery and emotion analysis into consideration

Country Status (1)

Country Link
CN (1) CN112434164B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113569008A (en) * 2021-07-20 2021-10-29 南京市栖霞区民政事务服务中心 Big data analysis method and system based on community management data
CN114091469A (en) * 2021-11-23 2022-02-25 杭州萝卜智能技术有限公司 Sample expansion based network public opinion analysis method
CN114237460A (en) * 2021-10-14 2022-03-25 北京淘友天下科技发展有限公司 Label display method, device, terminal, storage medium and computer program product
CN117877738A (en) * 2024-03-13 2024-04-12 简阳市人民医院 COPD patient venous thrombosis prevention system based on knowledge-based health education mode

Citations (5)

Publication number Priority date Publication date Assignee Title
CN108563638A (en) * 2018-04-13 2018-09-21 武汉大学 A kind of microblog emotional analysis method based on topic identification and integrated study
CN109684646A (en) * 2019-01-15 2019-04-26 江苏大学 A kind of microblog topic sentiment analysis method based on topic influence
CN110516067A (en) * 2019-08-23 2019-11-29 北京工商大学 Public sentiment monitoring method, system and storage medium based on topic detection
US20200019611A1 (en) * 2018-07-12 2020-01-16 Samsung Electronics Co., Ltd. Topic models with sentiment priors based on distributed representations
CN111143549A (en) * 2019-06-20 2020-05-12 东华大学 Topic-based method for public opinion sentiment evolution

Non-Patent Citations (2)

Title
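ZHIGUO WANG: "Bilateral Multi-Perspective Matching for Natural Language Sentences", arXiv *
杨云 (Yang Yun): "Applied research on topic identification and sentiment orientation analysis of online public opinion" (网络舆情话题识别及情感倾向分析的应用研究), China Master's Theses Full-text Database (中国优秀硕士学位论文全文数据库) *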

Cited By (6)

Publication number Priority date Publication date Assignee Title
CN113569008A (en) * 2021-07-20 2021-10-29 南京市栖霞区民政事务服务中心 Big data analysis method and system based on community management data
CN114237460A (en) * 2021-10-14 2022-03-25 北京淘友天下科技发展有限公司 Label display method, device, terminal, storage medium and computer program product
CN114237460B (en) * 2021-10-14 2024-01-30 北京淘友天下科技发展有限公司 Label display method, device, terminal, storage medium and computer program product
CN114091469A (en) * 2021-11-23 2022-02-25 杭州萝卜智能技术有限公司 Sample expansion based network public opinion analysis method
CN117877738A (en) * 2024-03-13 2024-04-12 简阳市人民医院 COPD patient venous thrombosis prevention system based on knowledge-based health education mode
CN117877738B (en) * 2024-03-13 2024-05-07 简阳市人民医院 COPD patient venous thrombosis prevention system based on knowledge-based health education mode

Also Published As

Publication number Publication date
CN112434164B (en) 2023-04-28

Similar Documents

Publication Publication Date Title
Devika et al. Sentiment analysis: a comparative study on different approaches
CN107451126B (en) Method and system for screening similar meaning words
CN108536870B (en) Text emotion classification method fusing emotional features and semantic features
CN112434164B (en) Network public opinion analysis method and system taking topic discovery and emotion analysis into consideration
Tang et al. Multi-label patent categorization with non-local attention-based graph convolutional network
Wahid et al. Cricket sentiment analysis from Bangla text using recurrent neural network with long short term memory model
CN110807084A (en) Attention mechanism-based patent term relationship extraction method for Bi-LSTM and keyword strategy
CN109885675B (en) Text subtopic discovery method based on improved LDA
CN112613582B (en) Deep learning hybrid model-based dispute focus detection method and device
CN113505200B (en) Sentence-level Chinese event detection method combined with document key information
CN109446423B (en) System and method for judging sentiment of news and texts
CN115952292B (en) Multi-label classification method, apparatus and computer readable medium
Patel et al. Dynamic lexicon generation for natural scene images
CN111159342A (en) Park text comment emotion scoring method based on machine learning
CN114491062B (en) Short text classification method integrating knowledge graph and topic model
CN114756675A (en) Text classification method, related equipment and readable storage medium
CN114722176A (en) Intelligent question answering method, device, medium and electronic equipment
Dhar et al. Bengali news headline categorization using optimized machine learning pipeline
CN114493783A (en) Commodity matching method based on double retrieval mechanism
CN113486143A (en) User portrait generation method based on multi-level text representation and model fusion
CN112528653A (en) Short text entity identification method and system
CN115906824A (en) Text fine-grained emotion analysis method, system, medium and computing equipment
Liu et al. Suggestion mining from online reviews using random multimodel deep learning
CN107729509A Document similarity determination method based on implicit high-dimensional distributed feature representation
CN113516202A (en) Webpage accurate classification method for CBL feature extraction and denoising

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant