CN112434164A

CN112434164A - Network public opinion analysis method and system considering topic discovery and emotion analysis

Info

Publication number: CN112434164A
Application number: CN202011397734.XA
Authority: CN
Inventors: 曲宇航; 赵昕禹; 惠维
Original assignee: Xian Jiaotong University
Current assignee: Xian Jiaotong University
Priority date: 2020-12-03
Filing date: 2020-12-03
Publication date: 2021-03-02
Anticipated expiration: 2040-12-03
Also published as: CN112434164B

Abstract

The invention discloses a network public opinion analysis method and system considering topic discovery and emotion analysis. The method comprises the following steps: step 1, obtaining the topics of the network public opinion texts to be analyzed based on a trained BiMPM topic discovery model based on ELMo word vectors or a trained LLDA topic model; and step 2, carrying out emotion analysis on the topics acquired in step 1 based on a vector space model. To suit the characteristics of different public opinion carriers, the invention uses an LLDA topic model and a BiMPM topic discovery model based on ELMo word vectors to discover the topics of long and short texts. Large-scale manually annotated corpora are introduced for training and modeling, yielding a trained model with excellent accuracy, recall and other performance indexes, and thereby achieving efficient topic discovery.

Description

Network public opinion analysis method and system considering topic discovery and emotion analysis

Technical Field

The invention belongs to the technical field of data mining, relates to the field of topic models and clustering methods, and particularly relates to a network public opinion analysis method and system considering topic discovery and emotion analysis.

Background

Network public opinion is the sum of the attitudes and viewpoints that the public expresses on network platforms about trending social events, and it carries a strong emotional tendency. As the space for free expression keeps expanding and its content is digitized, socially sensitive topics are easily triggered, and once intense social emotion is aroused it is difficult to defuse in a short time. A public opinion analysis system is a software system that combines search engine technology, text processing technology, knowledge management methods, natural language processing and a mobile phone short message platform. By automatically acquiring, extracting, classifying and clustering massive internet information, and by monitoring and focusing on topics, it meets users' needs for network public opinion monitoring, hot event topic tracking and the like.

Existing public opinion analysis mainly comprises topic discovery and emotional tendency analysis. Emotional tendency analysis methods include semantic analysis based on an emotion dictionary and methods based on feature classification. An emotion dictionary is a dictionary in which emotion words have been manually categorized. The emotion dictionary method identifies evaluative words and phrases in the network public opinion data, obtains emotion weights through polarity processing, and calculates the similarity between emotion words and candidate words, so as to predict the emotional tendency of the viewpoint expressed by the candidate words. This method is mostly used for extracting evaluative content from network public opinion and judging its polarity. Emotion analysis based on feature classification uses machine learning: a corpus is constructed and valuable features are selected at scale to perform statistical emotion classification. The two methods can also be combined: based on an emotion vector space model, an emotion dictionary is built, emotion words and their polarity are recognized, and a machine learning method is added for statistical emotion classification, which improves the accuracy of the overall emotion weight calculation.

In summary, the existing public opinion analysis methods have the following defects: hot topics are not discovered clearly, and the accuracy of emotion analysis is low.

Disclosure of Invention

The invention aims to provide a network public opinion analysis method and system considering topic discovery and emotion analysis so as to solve one or more technical problems. The invention can realize efficient topic discovery and can provide more accurate emotion analysis for users.

In order to achieve the purpose, the invention adopts the following technical scheme:

The invention discloses a network public opinion analysis method considering topic discovery and emotion analysis, which comprises the following steps:

step 1, obtaining the topics of the network public opinion texts to be analyzed based on a trained BiMPM topic discovery model based on ELMo word vectors or a trained LLDA topic model;

step 2, carrying out emotion analysis on the topics acquired in step 1 based on a vector space model; wherein the vector space model is constructed by the steps of:

step 2.1, performing fine-grained division of the emotion vocabulary under different topics on the basis of the existing HowNet emotion dictionary;

step 2.2, processing turning (adversative) sentence patterns and exclamation words in combination with the HowNet emotion dictionary under the topic label, and correcting the weights of the emotion words;

step 2.3, introducing the topic label set from the topic discovery process, regarding each sentence in the public opinion comment data as a combination of emotion words according to the 'bag of words' principle of the topic model, counting the emotion words in the combination, and identifying the emotional polarity of the sentence.

The further improvement of the invention is that in step 1, the step of obtaining the trained BiMPM topic discovery model and LLDA topic model based on the ELMo word vector comprises:

step 1.1, collecting and obtaining a plurality of public opinion texts to construct and obtain a training set; dividing the training set into a long text training set and a short text training set according to a preset text length threshold; wherein, the training sets are all manually labeled with topic classification and classified according to the labeled topic classification;

step 1.2, training a BiMPM topic discovery model based on ELMo word vectors based on the obtained long text training set to obtain a trained BiMPM topic discovery model based on the ELMo word vectors; and training the LLDA topic model based on the obtained short text training set to obtain the trained LLDA topic model.
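
The division in steps 1.1 and 1.2 can be illustrated with a minimal Python sketch. The threshold of 140 characters and the names `LabeledText` and `split_by_length` are assumptions made for this example only; the source states only that a preset text length threshold is used.

```python
from typing import List, NamedTuple, Tuple


class LabeledText(NamedTuple):
    text: str
    topic: str  # manually annotated topic label


def split_by_length(
    corpus: List[LabeledText], threshold: int = 140  # hypothetical threshold
) -> Tuple[List[LabeledText], List[LabeledText]]:
    """Split a labeled corpus into (long_set, short_set) by character count."""
    long_set: List[LabeledText] = []
    short_set: List[LabeledText] = []
    for item in corpus:
        # Texts at or above the threshold go to the long-text (BiMPM) set,
        # the rest go to the short-text (LLDA) set.
        if len(item.text) >= threshold:
            long_set.append(item)
        else:
            short_set.append(item)
    return long_set, short_set


corpus = [
    LabeledText("x" * 200, "military"),
    LabeledText("short comment", "economy"),
]
long_set, short_set = split_by_length(corpus)
```

Under this sketch the long set would be used to train the BiMPM model and the short set to train the LLDA model.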

The invention has the further improvement that in the step 1.1, the step of acquiring a plurality of public opinion texts and constructing a training set comprises the following steps:

for HTTP communication, data are collected by combining Fiddler, HttpClient and Jsoup; for social media platforms, data are collected through open APIs after obtaining authorization via the OAuth mechanism;

the raw public opinion text is subjected to Chinese word segmentation and text tagging by the IKAnalyzer word segmenter, and redundant text information with no identification value in the public opinion text is removed by stop-word removal.

The further improvement of the invention is that in step 1.2, the step of training the BiMPM topic discovery model based on the ELMo word vector comprises:

(1) extracting title participles and content keywords of each text in the long text training set, and splicing the two fields;

(2) pre-training the ELMo word vector language model on a training set, and fine-tuning it on the training set of texts with different topics so that it adapts to the preset corpus environment;

(3) constructing a BiMPM topic discovery model, and adding the ELMo word vectors to the sentence representation layer of the model;

(4) splitting the processed topic texts into a train data set, a dev data set and a test data set;

(5) iteratively training the model until it performs best on the dev data set, at which point training is finished.
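
Steps (4) and (5) can be sketched as below. The 80/10/10 split ratio, the fixed random seed and the function names are assumptions for illustration; the source does not specify split proportions or the dev-set metric.

```python
import random


def split_dataset(examples, dev_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle the examples and split them into train, dev and test lists."""
    items = list(examples)
    random.Random(seed).shuffle(items)
    n_dev = int(len(items) * dev_frac)
    n_test = int(len(items) * test_frac)
    dev = items[:n_dev]
    test = items[n_dev:n_dev + n_test]
    train = items[n_dev + n_test:]
    return train, dev, test


def best_epoch(dev_scores):
    """Return the index of the epoch with the highest dev-set score."""
    return max(range(len(dev_scores)), key=dev_scores.__getitem__)


train, dev, test = split_dataset(range(100))
chosen = best_epoch([0.61, 0.70, 0.68])  # hypothetical dev scores per epoch
```

Selecting the checkpoint by dev-set score keeps the test set untouched until the final evaluation.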

In a further improvement of the present invention, in step 1.2, the step of training the LLDA topic model includes:

a. performing text preprocessing on the short text data in the step 1 according to the prior probability;

b. sampling from a Dirichlet distribution to generate the topic distribution of the document, sampling from the multinomial distribution over topics to generate the topic of the j-th word of the document, and determining the topic classification label;

c. sampling from a Dirichlet distribution to generate the word distribution corresponding to the topic;

d. sampling from the multinomial distribution over words to finally generate the word;

e. repeating steps a to d with multiple groups of labeled documents, and training the LLDA topic model;

f. debugging, evaluating and retraining on the training result to obtain the trained LLDA topic model.
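
The generative steps b to d can be sketched with the Python standard library. This is only a sketch of the sampling story, not of model training or inference. It assumes, as in the usual Labeled LDA formulation, that a document's topics are restricted to its annotated labels; the hyperparameters `alpha` and `beta`, the toy vocabulary and all function names are hypothetical.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw one sample from Dirichlet(alphas) using normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_topic_word_dists(labels, vocab, beta, rng):
    """Step c: draw one word distribution per topic label."""
    return {label: sample_dirichlet([beta] * len(vocab), rng) for label in labels}


def generate_document(doc_labels, topic_word, vocab, n_words, alpha, rng):
    """Steps b and d: draw the document's topic distribution, then each word."""
    # Step b: topic distribution over the document's own labels only.
    theta = sample_dirichlet([alpha] * len(doc_labels), rng)
    pairs = []
    for _ in range(n_words):
        # Step b (continued): topic of the j-th word from the multinomial theta.
        topic = rng.choices(doc_labels, weights=theta, k=1)[0]
        # Step d: the word itself from that topic's multinomial over words.
        word = rng.choices(vocab, weights=topic_word[topic], k=1)[0]
        pairs.append((topic, word))
    return theta, pairs


rng = random.Random(0)
vocab = ["attack", "army", "market", "price"]
labels = ["military", "economy"]
topic_word = sample_topic_word_dists(labels, vocab, beta=0.5, rng=rng)
theta, pairs = generate_document(["military"], topic_word, vocab, 5, alpha=0.5, rng=rng)
```

Training inverts this process: given labeled documents, the topic and word distributions are estimated rather than sampled.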

The invention is further improved in that step 2.1 comprises the following specific steps: 8 classes of topic classification labels are introduced into the HowNet emotion dictionary, namely politics, military, economy, society, culture-sports-entertainment, science and technology, religion, and other; under each class of label, emotion words are annotated with an 8-dimensional label vector over surprise, delight, love, expectation, anxiety, anger, sadness and hate.

For example, under the military label the emotion word "attack" carries the negative emotional attitude of war caused by an attack; the emotion degree is heavy and the vector is (0, 0, 0, 0, 0, 0, 0.8, 0.9), that is, the sadness and hate values are 0.8 and 0.9 respectively. Under the culture-sports-entertainment label, however, the word carries only the negative emotional attitude associated with a technique in sports activities; the emotion degree is light, the vector is marked as (0, 0, 0, 0, 0, 0.5, 0, 0), and the emotion value is 0.5.
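
A topic-dependent dictionary of this kind can be sketched as a lookup keyed by topic label and word. The data structure, the key names and the dimension order (taken from the order in which the eight emotions are listed) are assumptions for illustration; the two vectors follow the "attack" example, with 0.8 and 0.9 placed in the sadness and hate dimensions as the text states.

```python
EMOTIONS = (
    "surprise", "delight", "love", "expectation",
    "anxiety", "anger", "sadness", "hate",
)

# Hypothetical entries: (topic label, word) -> 8-dimensional emotion vector.
TOPIC_EMOTION_DICT = {
    ("military", "attack"): (0, 0, 0, 0, 0, 0, 0.8, 0.9),
    ("culture_sports_entertainment", "attack"): (0, 0, 0, 0, 0, 0.5, 0, 0),
}


def emotion_vector(topic, word):
    """Look up a word's emotion vector under a topic; all zeros if unknown."""
    return TOPIC_EMOTION_DICT.get((topic, word), (0.0,) * len(EMOTIONS))


def dominant_emotion(topic, word):
    """Name of the dimension with the largest absolute weight, or None."""
    vec = emotion_vector(topic, word)
    best = max(range(len(vec)), key=lambda i: abs(vec[i]))
    return EMOTIONS[best] if vec[best] != 0 else None
```

The same surface word thus yields different emotion vectors depending on the topic label produced by topic discovery.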

The invention is further improved in that the step 2.2 comprises the following specific steps:

1) degree adverb processing, comprising: subdividing the HowNet degree-level word dictionary into extreme degree word sets; wherein the subdividing comprises assigning different emotion weights according to the emotional coloring expressed by different words;

2) negative word processing, comprising: presetting a distance threshold d for each emotion word using a negative word association algorithm; within the text region of each emotion word, counting the token of each segmented word as one unit, and backtracking from the emotion word, identified against the HowNet emotion dictionary under the topic label, to search for negative words; each time a negative word is found, adding a negation label 'not_' to the current emotion word, until the backtracking distance is greater than or equal to the preset threshold d, or the previous emotion word is reached, or the beginning of the sentence is reached;

introducing a sliding window whose size is determined by the punctuation marks within the clause; if an odd number of negative words appears in the window containing the emotion word, performing a negation operation on the original emotion word;

3) turning sentence pattern processing, comprising: recognizing turning sentences by their turning conjunctions; during processing, searching for a turning prefix, and if none exists, processing the sentence as an ordinary sentence; if a turning prefix exists but no turning suffix exists, directly deleting the clause containing the turning prefix; if both a turning prefix and a turning suffix exist and the clause of the turning suffix contains an emotion word from the HowNet emotion dictionary under the topic label, deleting the clauses from the turning prefix to the turning suffix; if the clause of the turning suffix contains no emotion word from the HowNet emotion dictionary under the topic label, performing a negation operation on the clause containing the turning prefix;

4) exclamation word processing, comprising: carrying out multi-scale subdivision of the existing exclamation words in combination with the HowNet emotion dictionary under the topic label, wherein the weight division range is -1.0.
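
The negative word rule in item 2) can be sketched as follows. This is one reading of the rule, in which tokens at distances 1 through d before the emotion word are inspected and the scan also stops at clause punctuation (the sliding window boundary). The word lists, English tokens and function names are hypothetical.

```python
CLAUSE_PUNCTUATION = (",", ";", ".", "!", "?")


def count_not_tags(tokens, index, emotion_words, negation_words, d):
    """Count negative words found while backtracking from tokens[index]."""
    tags = 0
    for distance in range(1, d + 1):
        pos = index - distance
        if pos < 0:
            break  # reached the beginning of the sentence
        token = tokens[pos]
        if token in emotion_words or token in CLAUSE_PUNCTUATION:
            break  # reached the previous emotion word or the window boundary
        if token in negation_words:
            tags += 1
    return tags


def tag_emotion_words(tokens, emotion_words, negation_words, d):
    """Return (labeled word, negated?) for every emotion word in the tokens."""
    tagged = []
    for i, token in enumerate(tokens):
        if token in emotion_words:
            n = count_not_tags(tokens, i, emotion_words, negation_words, d)
            # An odd number of negation labels flips the emotion word.
            tagged.append(("not_" * n + token, n % 2 == 1))
    return tagged
```

With this reading, a double negation leaves the emotion word unflipped, while a single negation within the threshold flips it.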

The invention is further improved in that the step 2.3 comprises the following specific steps:

the vector space model is expressed as

In the formula, $w_j$ is the $j$-th emotion word or phrase and $n$ is the number of emotion words; each emotion word $w_j$ is assigned an emotion vector.

The invention has the further improvement that in the step 2, the specific steps of public sentiment emotion analysis comprise:

a) dividing the public opinion comment content into sentences and processing them according to the turning sentence pattern rules and the turning word list, to obtain the sentence sequence $L_s = \langle s_1, s_2, s_3, \ldots, s_k, \ldots, s_n \rangle$;

b) performing word segmentation on each sentence $s_k$ in $L_s$ to obtain the word sequence $L_t = \langle t_1, t_2, t_3, \ldots, t_m \rangle$;

c) constructing the emotion vector space model of each emotion word in the sentence

traversing the fine-grained HowNet emotion dictionary with topic labels, looking up in turn the initial emotion weights of all emotion words in $s_k$, performing emotion classification, and recording the emotion weights;

d) performing negative word processing on each emotion word in $s_k$ and adding negation labels, to obtain the list F of emotion words of $s_k$ with negation labels added;

e) for each emotion word in the list F, whose corresponding emotion weight is $p$: if the number of not_ labels in front of the emotion word is odd, querying the negative word dictionary and correcting the emotion value to $p \times n$;

f) for each emotion word in the list F, performing degree adverb and exclamation word processing, and further correcting the weight of the emotion word;

g) calculating the emotion value of sentence $s_k$, denoted $P_k$, from the finally corrected weights of the emotion words;

h) Calculating the sentiment value of public opinion data comment content

And finally obtaining 8-dimensional emotion weight of public opinion data comment content.
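
Steps e), g) and h) can be sketched as below. The formulas for $P_k$ and for the comment-level value are not reproduced in this text, so the element-wise sum per sentence and the mean over sentences are assumptions for illustration only, as are the default negation weight of -1.0 and the function names.

```python
DIMENSIONS = 8


def correct_weight(vector, not_tags, negation_weight=-1.0):
    """Step e: multiply the weights by the negation weight if not_tags is odd."""
    if not_tags % 2 == 1:
        return [v * negation_weight for v in vector]
    return list(vector)


def sentence_emotion(word_vectors):
    """Step g (assumed form): element-wise sum of corrected word vectors."""
    total = [0.0] * DIMENSIONS
    for vec in word_vectors:
        for i, value in enumerate(vec):
            total[i] += value
    return total


def comment_emotion(sentence_vectors):
    """Step h (assumed form): element-wise mean over the sentence vectors."""
    if not sentence_vectors:
        return [0.0] * DIMENSIONS
    count = len(sentence_vectors)
    return [sum(column) / count for column in zip(*sentence_vectors)]


attack = (0, 0, 0, 0, 0, 0, 0.8, 0.9)
s1 = sentence_emotion([correct_weight(attack, not_tags=1)])
s2 = sentence_emotion([])
comment = comment_emotion([s1, s2])
```

Whatever aggregation is actually used, the result keeps the 8-dimensional layout of the word-level emotion vectors.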

The invention discloses an online public opinion emotion analysis system based on an emotion vector space model, which comprises:

the topic acquisition module is used for acquiring topics of the network public opinion text to be analyzed based on a trained BiMPM topic discovery model or LLDA topic model based on ELMo word vectors;

the emotion analysis module is used for carrying out emotion analysis on the topics acquired by the topic acquisition module based on the vector space model; wherein the vector space model is constructed by the steps of:

Compared with the prior art, the invention has the following beneficial effects:

The invention provides a method that discovers topics and emotional tendencies by clustering the emotion vectors in public opinion information on the basis of a topic model. Building on a detailed study of topic model algorithms, the invention provides a network public opinion analysis method based on an emotion vector space model.

To suit the characteristics of different public opinion carriers, the invention uses LLDA (Labeled-LDA, latent Dirichlet allocation with attached category labels) and a BiMPM topic discovery model based on ELMo word vectors to discover the topics of long and short texts. Large-scale manually annotated corpora are introduced for training and modeling, yielding a trained model with excellent accuracy, recall and other performance indexes, and thereby achieving efficient topic discovery.

Emotion analysis is based on the emotion vector space model: the emotion words in the HowNet emotion dictionary under each topic label are manually divided at a fine granularity to construct the emotion vector space model. According to the emotion distribution, a BiMPM model is adopted to carry out effective classification statistics of the emotion vocabulary, realizing fine-grained emotional tendency analysis and providing more accurate emotion analysis for users.

In conclusion, establishing a topic model allows the hot topics in the text to be analyzed better; using the BiMPM model for long texts and the LLDA model for short texts effectively avoids the limitations of each model and improves the accuracy of topic discovery. The invention divides emotional tendency more finely on the basis of the existing emotion dictionary, avoids dividing emotion into simple positive and negative attitudes, and improves the ability to analyze emotional tendency correctly in different contexts.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below; it is obvious that the drawings in the following description are some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.

Fig. 1 is a schematic flowchart of an internet public opinion analysis method based on an emotion vector space model according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a BiMPM model according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of the LLDA model in an embodiment of the present invention.

Detailed Description

In order to make the purpose, technical effect and technical solution of the embodiments of the present invention clearer, the following clearly and completely describes the technical solution of the embodiments of the present invention with reference to the drawings in the embodiments of the present invention; it is to be understood that the described embodiments are only some of the embodiments of the present invention. Other embodiments, which can be derived by one of ordinary skill in the art from the disclosed embodiments without inventive faculty, are intended to be within the scope of the invention.

Referring to fig. 1, a method for internet public opinion analysis based on emotion vector space model according to an embodiment of the present invention includes the following steps:

step 1, text preprocessing:

1.1) first, for HTTP communication, data are collected by combining Fiddler, HttpClient and Jsoup; for social media platforms, data are collected through open APIs after obtaining authorization via the OAuth mechanism.

1.2) the raw data are subjected to Chinese word segmentation, text tagging, stop-word removal and other operations by the IKAnalyzer word segmenter, removing redundant text information with no identification value from the original text so as to improve the accuracy of topic discovery by the topic model.

Referring to fig. 2, in step 2, the long text topic discovery method of the BiMPM topic discovery model based on the ELMo word vector includes:

Referring to fig. 3, in step 3, the method for discovering short text topics based on the LLDA topic model includes:

To address the sparseness of short text data, the invention uses a topic discovery algorithm based on the LLDA topic model. The algorithm comprises the following steps:

a. performing text preprocessing on the short text data according to the prior probability;

b. sampling from a Dirichlet distribution to generate the topic distribution of the document; that is, the topic distribution is generated by a Dirichlet distribution with a hyperparameter. The topic of the j-th word of the document is then sampled from the multinomial distribution over topics, and the topic classification label is determined, with emphasis on setting a specific label;

c. sampling from a Dirichlet distribution to generate the word distribution corresponding to the topic; in other words, the word distribution is generated by a Dirichlet distribution with its own parameter;

d. sampling from the multinomial distribution over words to finally generate the word;

e. repeating steps a to d with multiple groups of labeled documents, and training the LLDA model;

f. debugging, evaluating and retraining on the training result to finally generate a better LLDA model;

g. repeating steps b to f.

Step 4, emotion analysis based on the vector space model:

4.1) emotion vocabulary recognition: the 8 classes of labels set in topic classification are introduced into the construction of the HowNet emotion dictionary, and emotion words are annotated with 8-dimensional label vectors over surprise (surprised), delight (delight), love (love), expectation (expected), anxiety (anxiety), anger (angry), sadness (sad) and hate (hatred).

4.2) emotion polarity judgment: because degree adverbs express different emotional tendencies in different contexts, the invention processes the emotion dictionary in combination with the actual topic label as follows:

a. Degree adverb processing. The degree-level words in the HowNet dictionary are manually subdivided into extreme degree word sets. Different emotion weights are assigned according to the emotional coloring expressed by different words.

b. Negative word processing. A distance threshold d is preset for each emotion word using a negative word association algorithm. Within the text region of each emotion word, the token of each segmented word counts as one unit; backtracking starts from the emotion word, identified against the HowNet emotion dictionary under the topic label, and negative words are searched for continuously. Each time a negative word is found, a negation label 'not_' is added to the current emotion word, until the backtracking distance exceeds the preset threshold d, or the previous emotion word is reached, or the beginning of the sentence is reached.

c. Turning sentence pattern processing. Turning sentences are recognized by their turning conjunctions.

d. Exclamation word processing. Exclamation words are subdivided at multiple scales in combination with the manually constructed HowNet emotion dictionary under the topic labels, and different weights are superimposed during weight calculation so as to strengthen the emotional tendency of the comment content.

4.3) emotion weight calculation: the emotion weight is calculated under the different topic labels. First, an emotion vector space is constructed through emotion vocabulary recognition, and the emotion words are given initial weights. Second, emotion classification statistics are carried out according to the emotion distribution characteristics, and the emotion weight is calculated. Finally, the emotion weight is further corrected through emotion polarity processing.

The embodiment of the invention specifically comprises the following steps:

Emotion vocabulary recognition. The conventional HowNet emotion dictionary only divides word emotion into positive and negative, which is not sufficient for emotion analysis; the invention therefore manually performs a fine-grained division of the emotion vocabulary under different topics on the basis of the existing HowNet emotion dictionary. 8 classes of topic classification labels are introduced into the HowNet emotion dictionary, namely politics, military, economy, society, culture-sports-entertainment, science and technology, religion, and other; under each class of label, emotion words are annotated with an 8-dimensional label vector over surprise, delight, love, expectation, anxiety, anger, sadness and hate.

The initial emotion weights annotated under the 8 topic labels range from -1.0 to 1.0. For example, under the military label the emotion word "attack" carries the negative emotional attitude of war caused by an attack; the emotion degree is heavy and the vector is (0, 0, 0, 0, 0, 0, 0.8, 0.9), that is, the sadness and hate values are 0.8 and 0.9 respectively. Under the culture-sports-entertainment label, however, the word carries only the negative emotional attitude associated with a technique in sports activities; the emotion degree is light, the vector is marked as (0, 0, 0, 0, 0, 0.5, 0, 0), and the emotion value is 0.5.

In combination with the actual topic labels, the invention processes turning sentence patterns, exclamation words and the like more accurately by using the HowNet emotion dictionary under the topic labels, and corrects the weights of the emotion words. The processing steps include:

(1) degree adverb processing, comprising: subdividing the HowNet degree-level word dictionary into 'extreme degree word sets'; the subdividing comprises assigning different emotion weights according to the emotional coloring expressed by different words;

(2) negative word processing, comprising: presetting a distance threshold d for each emotion word using a negative word association algorithm; within the text region of each emotion word, counting the token of each segmented word as one unit, backtracking from the emotion word, identified against the HowNet emotion dictionary under the topic label, and continuously searching for negative words; each time a negative word is found, adding a negation label 'not_' to the current emotion word, until the backtracking distance is greater than or equal to the preset threshold d, or the previous emotion word is reached, or the beginning of the sentence is reached. In addition, a 'sliding window' is introduced whose size is determined by the punctuation marks within the clause; if an odd number of negative words appears in the window containing the emotion word, a negation operation is performed on the original emotion word, specifically by multiplying the weight of the original emotion word by the weight of the negative word.

(3) turning sentence pattern processing, comprising: recognizing turning sentences by their turning conjunctions. During processing, a turning prefix is searched for; if none exists, the sentence is processed as an ordinary sentence. If a turning prefix exists but no turning suffix exists, the clause containing the turning prefix is directly deleted. If both a turning prefix and a turning suffix exist and the clause of the turning suffix contains an emotion word from the HowNet emotion dictionary under the topic label, the clauses from the turning prefix to the turning suffix are deleted. If the clause of the turning suffix contains no emotion word from the HowNet emotion dictionary under the topic label, a negation operation is performed on the clause containing the turning prefix. A specific sentence rule is "turn prefix.

(4) exclamation word processing, comprising: carrying out multi-scale subdivision of the existing 85 exclamation words in combination with the HowNet emotion dictionary under the topic label, wherein the weight division range is -1.0. The weights are superimposed during weight calculation so as to strengthen the emotional tendency of the text information.
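
The four turning sentence rules in item (3) can be sketched as below. The sketch assumes that the turning suffix appears in a later clause than the turning prefix and that "deleting the clauses from the turning prefix to the turning suffix" keeps the suffix clause itself; the marker words, English tokens and function name are hypothetical.

```python
def process_turning(clauses, prefixes, suffixes, emotion_words):
    """Apply the turning rules; return a list of (clause, negate?) pairs.

    `clauses` is a list of token lists, one per clause of a sentence.
    """
    pre = next(
        (i for i, c in enumerate(clauses) if any(t in prefixes for t in c)), None
    )
    if pre is None:
        # No turning prefix: process as an ordinary sentence.
        return [(c, False) for c in clauses]
    suf = next(
        (i for i, c in enumerate(clauses)
         if i > pre and any(t in suffixes for t in c)),
        None,
    )
    if suf is None:
        # Prefix without suffix: delete the clause containing the prefix.
        return [(c, False) for i, c in enumerate(clauses) if i != pre]
    if any(t in emotion_words for t in clauses[suf]):
        # Suffix clause carries emotion: delete clauses from prefix up to suffix.
        return [(c, False) for i, c in enumerate(clauses) if not pre <= i < suf]
    # Suffix clause has no emotion word: negate the clause with the prefix.
    return [(c, i == pre) for i, c in enumerate(clauses)]


PREFIXES = {"although"}
SUFFIXES = {"but"}
EMOTION = {"good", "terrible"}
kept = process_turning(
    [["although", "food", "good"], ["but", "service", "terrible"]],
    PREFIXES, SUFFIXES, EMOTION,
)
```

In the example the clause after "but" carries an emotion word, so only that clause is kept for weight calculation.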

Emotion weight calculation, comprising: introducing the topic label set from the topic discovery process, regarding each sentence in the public opinion comment data as a combination of emotion words according to the 'bag of words' principle of the topic model, counting the emotion words in the combination, and identifying the emotional polarity of the sentence. The specific vector space model is expressed as

In the formula, $w_j$ is the $j$-th emotion word or phrase and $n$ is the number of emotion words; each emotion word $w_j$ is assigned an emotion vector.

An LDA topic model is introduced for the public opinion text and the emotion vector space model is constructed; the text-topic-word probability distribution is obtained from the latent topics, namely the topics $t_m$ with large emotion weight.

The public sentiment emotion analysis method in the embodiment of the invention comprises the following steps:

a) dividing the public opinion comment content into sentences and processing them according to the turning sentence pattern rules and the turning word list, to obtain the sentence sequence $L_s = \langle s_1, s_2, s_3, \ldots, s_n \rangle$;

then traversing the fine-grained HowNet emotion dictionary with topic labels, looking up in turn the initial emotion weights of all emotion words in $s_k$, performing emotion classification with the LDA topic model, and recording the emotion weights;

e) for each emotion word in the list F, whose corresponding emotion weight is $p$: if the number of not_ labels in front of the emotion word is odd, querying the negative word dictionary and correcting the emotion value to $p \times n$;

h) Calculating the sentiment value of public opinion data comment content

And finally giving out the 8-dimensional emotion weight of the public opinion data comment content.

In summary, to address the defects or shortcomings of the prior art, the invention provides a network public opinion analysis method based on an emotion vector space model, building on a detailed study of topic model algorithms. To suit the characteristics of different public opinion carriers, LLDA (Labeled-LDA, latent Dirichlet allocation with attached category labels) and a BiMPM topic discovery model based on ELMo word vectors are used to discover the topics of long and short texts. Large-scale manually annotated corpora are introduced for training and modeling, yielding a trained model with excellent accuracy, recall and other performance indexes, and thereby achieving efficient topic discovery. Meanwhile, emotion analysis is based on the emotion vector space model: the emotion words in the HowNet emotion dictionary under each topic label are manually divided at a fine granularity, and the emotion vector space model is constructed. According to the emotion distribution, an LDA topic model is adopted to carry out effective classification statistics of the emotion vocabulary, realizing fine-grained emotional tendency analysis and providing more accurate emotion analysis for users.

Compared with the prior art, the public opinion analysis method based on the emotion vector space model provided by the embodiment of the invention has the following advantages: 1. Establishing a topic model allows the hot topics in the text to be analyzed better; using the BiMPM model for long texts and the LLDA model for short texts effectively avoids the limitations of each model and improves the accuracy of topic discovery. 2. Emotional tendency is divided more finely on the basis of the existing emotion dictionary, which avoids dividing emotion into simple positive and negative attitudes and improves the ability to analyze emotional tendency correctly in different contexts.

The invention provides an online public opinion emotion analysis system based on an emotion vector space model, which comprises:

the topic acquisition module is used for acquiring topics of the network public opinion text to be analyzed based on the trained BiMPM model or LLDA topic model;

In the system of the embodiment of the present invention, the text preprocessing includes:

1.2) Chinese word segmentation, text marking, stop-word removal and other operations are performed on the original data with the IKAnalyzer word segmenter, removing redundant text information without discriminative value from the original text so as to improve the accuracy of the topic model's topic discovery. IKAnalyzer works on the forward maximum matching principle: several consecutive characters of the text to be segmented are matched against the word list from left to right; after a successful match the scan continues, and when the scanned string is no longer a word, or the prefix of a word, in the word list, the scan ends and the matched word is split off.
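The forward maximum matching idea described above can be illustrated with a minimal Python sketch. This is not IKAnalyzer's actual implementation; the function names, the toy word list `VOCAB`, and the stop-word list `STOP_WORDS` are all hypothetical and chosen only for illustration.

```python
def forward_max_match(text, vocab, max_len=None):
    """Segment text left to right, always taking the longest word found in vocab."""
    if max_len is None:
        max_len = max((len(word) for word in vocab), default=1)
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary word matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the scan can move on.
            if size == 1 or candidate in vocab:
                tokens.append(candidate)
                i += size
                break
    return tokens


def remove_stop_words(tokens, stop_words):
    """Drop tokens that carry no discriminative value."""
    return [token for token in tokens if token not in stop_words]


# Hypothetical toy word list and stop-word list, for illustration only.
VOCAB = {"网络", "舆情", "分析", "网络舆情", "方法"}
STOP_WORDS = {"的"}

tokens = forward_max_match("网络舆情的分析方法", VOCAB)
filtered = remove_stop_words(tokens, STOP_WORDS)
print(tokens)
print(filtered)
```

Because the longest candidate is tried first, "网络舆情" is kept as one word rather than being split into "网络" and "舆情".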

In the system of the embodiment of the present invention, 2) a long text topic discovery method of a BiMPM topic discovery model based on an ELMo word vector:

In the system of the embodiment of the invention, 3) the method for discovering the short text topic based on the LLDA topic model comprises the following steps:

a. performing text preprocessing on the short text data in the step 1) according to the prior probability;

f. debugging, evaluating and retraining on the training result, finally generating a better LLDA model;

g. repeating steps b to f.

In the system of the embodiment of the invention, 4) the emotion analysis method based on the vector space model comprises the following steps:

4.1) emotion vocabulary recognition: the 8 types of topic classification labels (politics, military, economy, society, etc.) are introduced into the construction of the HowNet emotion dictionary, and each label is annotated with an 8-dimensional label vector of surprise (surprised), joy (delight), love (love), expectation (expected), anxiety (anxiety), anger (angry), sadness (sad) and hate (hatred).

4.3) calculating emotion weight: the emotion weight is calculated under the different topic labels. First, an emotion vector space is constructed through emotion vocabulary recognition, and the emotion vocabulary weights are given initial values. Second, emotion classification statistics are carried out according to the emotion distribution characteristics, and the emotion weight is calculated. Finally, the emotion weight is continuously corrected through emotion polarity processing. The specific vector space model is expressed as

An LDA topic model is introduced for the text of the public opinion comment content and an emotion vector space model is constructed; the text-topic-vocabulary probability distribution is obtained from the hidden topic T, namely the topic T_m with a large emotion weight, and the specific formula is then converted into

The specific calculation formula of the emotion weight is as follows:

In the formula, a weighted summation is carried out over the weights of all emotion words in the sentence.
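Since the formulas themselves are not reproduced in this text, the following Python sketch only illustrates the general idea of steps 4.1) to 4.3) under stated assumptions: an emotion dictionary keyed by topic label, in which the same word can carry different 8-dimensional vectors under different labels, and a sentence score obtained as a weighted sum of the vectors of the emotion words it contains. The dictionary entries, the per-word weights, and the function names are hypothetical, and the polarity correction step is omitted.

```python
EMOTIONS = ("surprise", "joy", "love", "expectation",
            "anxiety", "anger", "sadness", "hate")

# Hypothetical topic-conditioned dictionary: topic label -> word -> 8-dim vector.
# The values are invented for illustration and come from no real lexicon.
EMOTION_DICT = {
    "economy": {
        "surge": (0.6, 0.3, 0.0, 0.1, 0.0, 0.0, 0.0, 0.0),
        "crash": (0.2, 0.0, 0.0, 0.0, 0.5, 0.1, 0.2, 0.0),
    },
    "military": {
        "surge": (0.1, 0.0, 0.0, 0.0, 0.7, 0.1, 0.0, 0.1),
    },
}


def sentence_emotion(tokens, topic, word_weights=None):
    """Weighted sum of the emotion vectors of the dictionary words in a sentence."""
    word_weights = word_weights or {}
    lexicon = EMOTION_DICT.get(topic, {})
    total = [0.0] * len(EMOTIONS)
    for token in tokens:
        vector = lexicon.get(token)
        if vector is None:
            continue  # not an emotion word under this topic label
        weight = word_weights.get(token, 1.0)  # assumed default weight of 1.0
        for k, value in enumerate(vector):
            total[k] += weight * value
    return total


def dominant_emotion(weights):
    """Name of the largest component, or None when no emotion word was found."""
    if not any(weights):
        return None
    return EMOTIONS[max(range(len(weights)), key=weights.__getitem__)]


economy_scores = sentence_emotion(["stocks", "surge", "then", "crash"], "economy")
military_scores = sentence_emotion(["troops", "surge"], "military")
print(dominant_emotion(economy_scores), dominant_emotion(military_scores))
```

The word "surge" contributes mostly surprise under the economy label and mostly anxiety under the military label, which is the effect of making the dictionary topic-dependent.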

In summary, the invention discloses a network public opinion analysis method and system based on an emotion vector space model. The main algorithm comprises two parts: topic discovery, using a BiMPM topic discovery model based on ELMo word vectors and latent Dirichlet allocation with additional class labels (Labeled-LDA, LLDA); and emotion tendency analysis based on a vector space. The technical scheme is as follows: after preprocessing operations such as Chinese word segmentation, text marking and stop-word removal are carried out on the original text, topic discovery is realized with the BiMPM topic discovery model and the LLDA topic model. Then the 8 topic classifications and 8 types of emotional tendency labels are introduced into the HowNet emotion dictionary, the emotion words in the text to be analyzed are assembled into an emotion vector space model, the 8-dimensional emotion weight values are calculated, and the emotional tendency of the public opinion data is obtained from the calculated weight values.
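The overall flow summarized above (preprocessing, then topic discovery with a long-text or short-text model, then topic-conditioned emotion analysis) can be sketched as a small Python skeleton. Every name here is hypothetical, the stages are passed in as plain callables rather than real BiMPM or LLDA models, and the length threshold is an arbitrary placeholder because the text does not state how long and short texts are separated.

```python
# Hypothetical cut-off between short and long text, in tokens.
# The source gives no threshold; 50 is an arbitrary placeholder.
LONG_TEXT_MIN_TOKENS = 50


def analyze(text, preprocess, long_topic_model, short_topic_model,
            emotion_model, long_text_min_tokens=LONG_TEXT_MIN_TOKENS):
    """Run preprocessing, topic discovery, then emotion analysis on one text."""
    tokens = preprocess(text)
    # Route long texts to the BiMPM-style model and short texts to the LLDA-style model.
    if len(tokens) >= long_text_min_tokens:
        topic = long_topic_model(tokens)
    else:
        topic = short_topic_model(tokens)
    # The emotion stage is conditioned on the discovered topic label.
    emotion_weights = emotion_model(tokens, topic)
    return {"topic": topic, "emotion_weights": emotion_weights}


# Stand-in stages so the sketch runs; trained models would replace them.
def demo_preprocess(text):
    return text.split()


def demo_long_topic_model(tokens):
    return "politics"


def demo_short_topic_model(tokens):
    return "economy"


def demo_emotion_model(tokens, topic):
    return [0.0] * 8  # one weight per emotion dimension


short_result = analyze("prices surge again", demo_preprocess,
                       demo_long_topic_model, demo_short_topic_model,
                       demo_emotion_model)
long_result = analyze("word " * 60, demo_preprocess,
                      demo_long_topic_model, demo_short_topic_model,
                      demo_emotion_model)
print(short_result["topic"], long_result["topic"])
```

Passing the stages as arguments keeps the routing logic separate from any particular model implementation.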

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Although the present invention has been described in detail with reference to the above embodiments, those skilled in the art can make modifications and equivalents to the embodiments of the present invention without departing from the spirit and scope of the present invention, which is set forth in the claims of the present application.

Claims

1. A network public opinion analysis method considering topic discovery and emotion analysis is characterized by comprising the following steps:

2. The network public opinion analysis method considering topic discovery and emotion analysis as claimed in claim 1, wherein in step 1, the step of obtaining the trained BiMPM topic discovery model based on ELMo word vectors and the trained LLDA topic model comprises:

3. The network public opinion analysis method considering topic discovery and emotion analysis as claimed in claim 2, wherein in step 1.1, the step of acquiring a plurality of public opinion texts and constructing a training set comprises:

4. The network public opinion analysis method considering topic discovery and emotion analysis as claimed in claim 2, wherein in step 1.2, the step of training the BiMPM topic discovery model based on ELMo word vectors comprises:

5. The network public opinion analysis method considering topic discovery and emotion analysis as claimed in claim 2, wherein in step 1.2, the step of training the LLDA topic model comprises:

6. The network public opinion analysis method considering topic discovery and emotion analysis as claimed in claim 1, wherein step 2.1 comprises the following steps: the 8 types of topic classification labels, namely politics, military, economy, society, culture/sports/entertainment, science and technology, religion, and other, are introduced into the HowNet emotion dictionary; each type of label is annotated with an 8-dimensional label vector of surprise, joy, love, expectation, anxiety, anger, sadness and hate,

wherein the vectors of the same word under different labels are different.

7. The network public opinion analysis method considering topic discovery and emotion analysis as claimed in claim 6, wherein step 2.2 comprises the following specific steps:

8. The network public opinion analysis method considering topic discovery and emotion analysis as claimed in claim 7, wherein step 2.3 comprises the following steps:

the vector space model is expressed as

In the formula, w_j is the j-th emotion vocabulary group and n is the number of groups, wherein each emotion word in w_j is assigned an emotion vector.

9. The method as claimed in claim 8, wherein the step 2 of analyzing the sentiment of the public opinion comprises the following specific steps:

h) Calculating the sentiment value of public opinion data comment content

10. A network public opinion analysis system considering topic discovery and emotion analysis is characterized by comprising: