CN109446423B

CN109446423B - System and method for judging sentiment of news and texts

Info

Publication number: CN109446423B
Application number: CN201811257151.XA
Authority: CN
Inventors: 李敏; 吴家鸣; 赵巍巍
Original assignee: Beijing Jiebao Data Technology Co ltd
Current assignee: Beijing Jiebao Data Technology Co ltd
Priority date: 2018-10-26
Filing date: 2018-10-26
Publication date: 2022-07-01
Anticipated expiration: 2038-10-26
Also published as: CN109446423A

Abstract

The invention discloses a method for judging the emotion of news and texts, which comprises the following steps: preprocessing a news text crawled from the network by removing crawler webpage tags and stop words from the news text; performing a preliminary emotion judgment on the news text by a deep learning method; performing a secondary emotion judgment on the news text by an SVM (support vector machine) method; collecting the positive or negative emotion words in the news text, matching them against a positive or negative emotion database, calculating the proportion of positive or negative emotion words in the news text, and performing a tertiary emotion judgment; and performing weight calculation on the primary, secondary and tertiary emotion judgment results to comprehensively judge the emotion of the news text. By weighting the results of the three emotion judgment methods to reach a comprehensive judgment, the scheme improves the accuracy of emotion judgment for news and texts.

Description

System and method for judging sentiment of news and text

Technical Field

The invention relates to the technical field of artificial intelligence and natural language processing, in particular to a system and a method for judging the emotion of news and texts.

Background

With the rapid development of network technology and network media, mass information such as news, user views, user evaluations, social public opinions, and the like in the network has increased rapidly. Many of these messages carry subjective emotional information, including positive emotions and negative emotions.

In the past, the sentiment of news and texts was determined manually, which requires a great deal of manpower. In today's era of network information explosion, manually judging the tone of news and the emotional tendency of texts lags far behind the rate at which information appears.

Therefore, how to judge the subjective emotion of a large amount of information accurately and quickly without manual work is an urgent and important technical problem for governments, enterprises, institutions, and the like.

In the invention with the patent application number of 201710463295.X, a system for acquiring network news and predicting text emotion is disclosed. The system takes a news text crawled by a network as a training set, marks and trains the emotion of the network news by using an SVM method, and then judges the emotion of the network news.

The above invention uses only one emotion judgment means, which limits the accuracy of its emotion judgment.

Disclosure of Invention

In order to overcome the defects of the prior art, the invention provides a system and a method for judging the emotion of news and texts, which can effectively solve the problems in the background art.

The technical scheme adopted by the invention for solving the technical problems is as follows:

a method for judging the emotion of news and texts is characterized by comprising the following steps:

step 100, preprocessing a news text crawled by a network, merging the title and the content of the news text, and removing a crawler webpage label and stop words in the text;

step 200, preliminarily judging the emotion of the news text by adopting a deep learning method;

step 300, performing secondary emotion judgment on the news text by using an SVM (support vector machine) method;

step 400, collecting the positive or negative emotion words in the news text, matching them against an emotion database, calculating the proportion of positive or negative emotion words in the news text, and performing a tertiary emotion judgment;

step 500, performing weight calculation on the primary emotion judgment result, the secondary emotion judgment result and the tertiary emotion judgment result, and comprehensively judging the emotion of the news text.

Further, in step 200, the specific steps of performing preliminary emotion judgment by using a deep learning method are as follows:

step 201, performing word segmentation processing on the news text without stop words to obtain word sequences of unary words, binary words, ternary words and multiple words;

step 202, comparing the word segmentation elements of the word sequence in the step 201 with an emotion database respectively, rearranging the word sequence according to negative words and degree words, and generating an extended word sequence;

step 203, counting the number of all phrases which represent emotional tendency in the word elements in the extended word sequence, namely counting the total number of words which represent positive tendency, the total number of words which represent negative tendency and the total number of words which represent neutral tendency in news or texts;

and step 204, inputting the emotional tendency words in the extended word sequence into an emotion judgment model for training to obtain a judgment result.

Further, in the step 202, the step of comparing the word segmentation elements with the emotion database to generate the extended word sequence specifically includes:

firstly, classifying the title of the news or text and assigning it to the corresponding theme field;

then, selecting an emotion database in the corresponding theme field;

and finally, matching word segmentation elements in the word sequence with an emotion database respectively, merging the degree words and the negative words adjacent to the positive or negative emotion tendency words together, replacing word elements in the word sequence with standard words close to the word segmentation elements in the emotion database, and re-integrating the word elements into an extended word sequence.

Further, in step 300, the specific method for performing secondary emotion judgment by using the SVM method is as follows:

step 301, extracting the emotional tendency feature words in the news text in order, and classifying them into a positive type, a neutral type and a negative type;

step 302, integrating a plurality of the emotional tendency feature words into a feature dictionary by adopting an IG (information gain) algorithm;

step 303, performing tf/idf calculation on the feature words in the feature dictionary, and adding the tf/idf values of the feature words into the SVM model for training to obtain positive, neutral and negative emotional tendency values.

Further, the calculation formula of the IG algorithm is specifically as follows:

IG＝∑P(i)ln(P(i)/Q(i))；

wherein IG is the information gain, P(i) is the probability distribution of the i-th feature word, and Q(i) is the probability distribution of the emotion classification.

Further, the emotion database comprises a positive emotion dictionary, a negative emotion dictionary, a degree adverb dictionary and a negation word dictionary.

Further, in the step 500, the formula for performing weight calculation on the emotion judgment result specifically is:

E(X)＝∑(p(x)*e(x))；

wherein E(X) is the statistical mathematical expectation of the emotional tendency over the three algorithms, p(x) is the weight of an algorithm, and e(x) is the emotional tendency value of that algorithm.

Further, the weight of the three emotion judgment algorithms is specifically calculated as follows:

firstly, acquiring experimental texts in various different fields and different subjects;

secondly, manually and accurately labeling the emotional tendency of each experimental text, namely whether it has a positive tendency, a negative tendency or a neutral tendency;

thirdly, judging the emotion of the experimental text according to three emotion judgment methods, namely a deep learning method, an SVM method and an emotion database method, and respectively recording the emotion judgment results of the three emotion judgment methods on the experimental text;

and fourthly, comparing, for each of the three emotion judgment methods in turn, its judgment results on all the experimental texts with the manually judged emotion results, and determining the accuracy of each method, wherein the accuracy of a method is its weight.

Compared with the prior art, the invention has the beneficial effects that: the invention combines a deep learning method, an SVM method and an emotion database method, and carries out weight calculation on the emotion judgment methods of the three methods to carry out comprehensive emotion judgment on news and texts, so that the method has a very good effect on the emotion judgment of the news and the texts in practical application, and has very high accuracy.

Drawings

FIG. 1 is a schematic view of the judging process according to the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

As shown in fig. 1, the present invention provides a method for judging the emotion of news and text, which comprises the following steps:

step 100, preprocessing the news text crawled from the network: merging the title and the content of the news text, and removing crawler webpage tags, stop words and stray symbols from the crawled content. The stray symbols include "◆", "▲", "↓", and the like. The html tags of the web pages are also removed. The stop words are specifically conjunctions and adverbs, such as "and", "get", "between", and the like.
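The preprocessing of step 100 can be sketched as follows. This is only a minimal illustration: the function name `preprocess`, the stop-word list and the symbol list are assumptions built from the examples given above, and the whitespace-based tokenization stands in for the Chinese word segmentation that a real system would need.

```python
import html
import re

# Hypothetical lists; the embodiment only names a few examples of each.
STOP_WORDS = {"and", "get", "between"}
JUNK_SYMBOLS = "◆▲↓"

def preprocess(title, content):
    """Merge title and content, strip HTML tags and stray symbols, drop stop words."""
    text = f"{title} {content}"
    text = re.sub(r"<[^>]+>", " ", text)   # remove html tags left by the crawler
    text = html.unescape(text)             # decode entities such as &amp;
    text = text.translate({ord(c): " " for c in JUNK_SYMBOLS})
    tokens = re.findall(r"\w+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]

print(preprocess("Profit up", "<p>◆ Sales and costs &amp; margins</p>"))
# ['profit', 'up', 'sales', 'costs', 'margins']
```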

Step 200, preliminarily judging the emotion of the news text by adopting a deep learning method.

In this step, the specific steps of performing preliminary emotion judgment by using a deep learning method are as follows:

step 201, performing word segmentation processing on the news text from which stop words have been removed to obtain word sequences of unary words, binary words, ternary words and multi-element words. The word segmentation processing divides each sentence of the news text into a plurality of word elements, taking common words as units, so as to form a plurality of groups of word sequences of binary words, ternary words and multi-element words.

Step 202, comparing the word segmentation elements of the word sequence obtained in the above step with an emotion database, and rearranging the word sequence according to negative words and degree words to generate an extended word sequence. With the rapid development of the network society, many new words have appeared that express a certain emotion. To avoid missing emotional tendency words, the word elements of the word sequence are compared with the emotion database: word elements already in the emotion database are left unchanged, while convertible word elements are replaced in the word sequence by the similar words summarized in the emotion database; for example, "chicken ribs" can be converted into "useless". In addition, in Chinese, adding a negative word to a positive-tendency word may turn it into a negative-tendency word, and adding a degree adverb can raise the emotional tendency value. Therefore, in this step, the word elements need to be re-integrated into new elements to generate the extended word sequence.

In step 202, the step of comparing the word segmentation elements with the emotion database to generate an extended word sequence specifically includes:

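firstly, classifying the title of the news or text and assigning it to the corresponding theme field;

then, selecting an emotion database in the corresponding theme field;

and finally, matching the word segmentation elements in the word sequence with the emotion database, merging the degree words and negative words adjacent to positive or negative emotional tendency words with those words, replacing word elements in the word sequence with the standard words in the emotion database that are close to them, and re-integrating the word elements into an extended word sequence.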

In different fields, the emotional tendencies expressed by some words differ greatly, so making the theme of a news text clear facilitates the positive or negative classification of the emotional-tendency word segmentation elements. The word segmentation elements are classified as positive or negative, and finally the word vector sequence is divided again into a positive emotion vector sequence and a negative emotion vector sequence according to that classification.
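The generation of the extended word sequence in step 202 can be sketched as follows. The dictionaries below are hypothetical miniature stand-ins for a theme-specific emotion database, and the sketch assumes that the negative words and degree words to be merged directly precede the emotion word, which the embodiment does not state explicitly.

```python
# Hypothetical miniature emotion database for one theme field.
POSITIVE = {"good", "useful"}
NEGATIVE = {"bad", "useless"}
NEGATIONS = {"not", "never"}
DEGREES = {"very", "extremely"}
SYNONYMS = {"chicken ribs": "useless"}  # new or slang word -> standard word

def extend_sequence(tokens):
    """Normalize tokens, then merge negation/degree words into the emotion word they precede."""
    extended, modifiers = [], []
    for token in tokens:
        token = SYNONYMS.get(token, token)   # replace convertible word elements
        if token in NEGATIONS or token in DEGREES:
            modifiers.append(token)          # hold until the next emotion word
        elif token in POSITIVE or token in NEGATIVE:
            extended.append(" ".join(modifiers + [token]))
            modifiers = []
        else:
            extended.extend(modifiers)       # modifiers did not precede an emotion word
            modifiers = []
            extended.append(token)
    extended.extend(modifiers)               # flush trailing modifiers unchanged
    return extended

print(extend_sequence(["the", "phone", "is", "not", "very", "good"]))
# ['the', 'phone', 'is', 'not very good']
```

Merging "not very good" into a single element lets a later stage score the negation and the degree adverb together with the emotion word instead of treating "good" as a plain positive word.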

Step 203, counting the number of all phrases that represent emotional tendency among the word elements in the extended word sequence, namely counting the total number of words representing a positive tendency, the total number representing a negative tendency and the total number representing a neutral tendency in the news or text. Before the emotional tendency words are counted, the word elements in the extended word sequence are first converted into vectors by a word embedding technique to generate a word vector sequence. Word embedding assigns a vector to each element of the extended word sequence; the vector represents a point in space, and words with close meanings have vectors that are also close. Operations on words can thus be converted into operations on vectors, which are also called tensors in deep learning.

It should be further explained that the vectors assigned to positive emotional tendency words and to negative emotional tendency words differ greatly. The advantages of representing words by vectors are as follows: firstly, the problem of texts of uneven length can be solved, because if each word has a corresponding word vector, then for a text of length N it is only necessary to select the vectors of the corresponding N words and arrange them in the order of the words in the text, the dimensionality of every word vector being the same; secondly, words by themselves cannot form features, whereas a tensor is an abstract quantification computed through the layer-by-layer abstraction of a multilayer neural network; thirdly, a text is composed of words, so the features of the text can be composed from the tensors of its words, and the tensor of the text contains the combined meanings of a plurality of words and can be regarded as feature engineering for the text, which provides a foundation for machine learning text classification.
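The word-vector representation described above can be illustrated with a small lookup table. The three-dimensional vectors here are invented for illustration only; in practice the embeddings would be learned from a corpus, and the names `EMBEDDINGS`, `embed` and `cosine` are assumptions of this sketch.

```python
# Hypothetical 3-dimensional embeddings; real vectors would be learned from data.
EMBEDDINGS = {
    "good": [0.9, 0.1, 0.0],
    "great": [0.8, 0.2, 0.1],
    "bad": [-0.9, 0.1, 0.0],
}
UNKNOWN = [0.0, 0.0, 0.0]

def embed(tokens):
    """Map N tokens to N vectors of equal dimensionality, kept in text order."""
    return [EMBEDDINGS.get(t, UNKNOWN) for t in tokens]

def cosine(u, v):
    """Cosine similarity: near 1 for similar directions, near -1 for opposite ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = (sum(a * a for a in u) ** 0.5) * (sum(b * b for b in v) ** 0.5)
    return dot / norm if norm else 0.0

matrix = embed(["good", "service", "bad"])   # a text of length 3 -> 3 vectors
same = cosine(EMBEDDINGS["good"], EMBEDDINGS["great"])
opposite = cosine(EMBEDDINGS["good"], EMBEDDINGS["bad"])
```

With these toy values, `same` is close to 1 and `opposite` is close to -1, which mirrors the statement that words with close meanings have close vectors while positive and negative tendency words differ greatly.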

Step 204, inputting the emotional tendency words in the extended word sequence into an emotion judgment model for training to obtain a judgment result, wherein the emotion judgment model judges emotion with a deep convolutional neural network that uses the ReLU (rectified linear unit) function as its activation function.
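The embodiment does not specify the network architecture. As a rough illustration of the role ReLU plays, the following sketch runs the forward pass of a single convolution filter over a word vector sequence and max-pools the activated responses. The kernel values are hypothetical placeholders for weights that training would learn, and a real model would stack many such filters and layers.

```python
def relu(x):
    """Rectified linear unit: max(0, x)."""
    return x if x > 0 else 0.0

def conv_relu_maxpool(matrix, kernel, bias=0.0):
    """Slide a (width x D) kernel over an (N x D) matrix of word vectors,
    apply ReLU to each window response, and max-pool over positions."""
    width = len(kernel)
    responses = []
    for start in range(len(matrix) - width + 1):
        window = matrix[start:start + width]
        total = bias + sum(
            w * x
            for k_row, m_row in zip(kernel, window)
            for w, x in zip(k_row, m_row)
        )
        responses.append(relu(total))
    return max(responses) if responses else 0.0

# Hypothetical 2-dimensional word vectors and a filter spanning two words.
word_vectors = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
kernel = [[1.0, 0.0], [0.0, 1.0]]
feature = conv_relu_maxpool(word_vectors, kernel)
```

Here the first window responds with 2.0 and the second with -1.0, which ReLU clips to 0.0, so the pooled feature is 2.0.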

The emotion database in the present embodiment includes a positive emotion dictionary, a negative emotion dictionary, a degree adverb dictionary, and a negation word dictionary.

Step 300, judging the secondary emotion of the news text by utilizing an SVM method;

the specific method for performing secondary emotion judgment by adopting the SVM method comprises the following steps:

(1) extracting the emotional tendency feature words in the news text in order, and classifying them into three types: positive, neutral and negative. Before the feature words are extracted, the theme of the news is determined and the text is segmented into unary, binary, ternary or multi-element words; the emotional tendency feature words are then classified by emotion.

(2) Integrating a plurality of characteristic words in a characteristic dictionary by adopting an IG algorithm of the emotional tendency characteristic words, wherein the calculation formula of the IG algorithm is specifically as follows:

IG＝∑P(i)ln(P(i)/Q(i))；

It should be added that the feature dictionary is a mutable container model that can store objects of any type. In this embodiment, the feature dictionary may include multiple elements, each of which includes a feature word and the information gain of that feature word.
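The formula and the feature dictionary can be sketched as follows. The sketch assumes one possible reading of the symbols: P is the distribution over the positive, neutral and negative classes among texts containing a given feature word, and Q is the overall class distribution, with every Q(i) greater than zero. The function names and the `top_k` cut-off are likewise assumptions.

```python
import math

def information_gain(p, q):
    """IG = sum_i P(i) * ln(P(i) / Q(i)); terms with P(i) == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def build_feature_dictionary(candidates, q, top_k):
    """Keep the top_k candidate words by information gain as {word: gain}."""
    gains = {word: information_gain(p, q) for word, p in candidates.items()}
    ranked = sorted(gains.items(), key=lambda item: item[1], reverse=True)
    return dict(ranked[:top_k])

# Hypothetical class distributions (positive, neutral, negative).
prior = [1 / 3, 1 / 3, 1 / 3]
candidates = {
    "excellent": [0.8, 0.1, 0.1],      # strongly skewed toward positive
    "today": [1 / 3, 1 / 3, 1 / 3],    # carries no emotional information
}
feature_dictionary = build_feature_dictionary(candidates, prior, top_k=1)
```

A word whose class distribution matches the prior has zero gain, so "today" is dropped and only "excellent" remains in the dictionary together with its gain.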

(3) Performing tf/idf calculation on the feature words in the feature dictionary, and adding the tf/idf values of the feature words into an SVM model for training to obtain three types of emotional tendency values: positive, neutral and negative. The tf/idf value is used as the weight of a feature word: tf is the term frequency of the feature word and measures its ability to describe the content of a document, while idf is the inverse document frequency and measures its ability to distinguish between documents. The tf/idf values are combined with the information gains of the feature words through a nonlinear transformation, so that the positive, neutral and negative emotional tendency values are obtained.
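The embodiment does not give the exact tf/idf formula. The sketch below assumes one common variant, relative term frequency multiplied by ln(N/df), and computes one weight vector per document; training the SVM on these vectors and the nonlinear combination with the information gain are not shown.

```python
import math
from collections import Counter

def tf_idf(documents, feature_words):
    """Return one {word: tf * idf} vector per tokenized document."""
    n_docs = len(documents)
    # Document frequency: number of documents containing each feature word.
    df = {w: sum(1 for doc in documents if w in doc) for w in feature_words}
    vectors = []
    for doc in documents:
        counts = Counter(doc)
        vector = {}
        for w in feature_words:
            tf = counts[w] / len(doc) if doc else 0.0
            idf = math.log(n_docs / df[w]) if df[w] else 0.0
            vector[w] = tf * idf
        vectors.append(vector)
    return vectors

# Hypothetical tokenized news texts and feature words.
docs = [["profit", "rises", "profit"], ["loss", "widens"], ["profit", "falls"]]
vectors = tf_idf(docs, ["profit", "loss"])
```

In this toy corpus "loss" appears in only one of three documents, so it receives a larger idf than "profit", which appears in two.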

Step 400, collecting the positive or negative emotion words in the news text, matching them against the positive or negative emotion database, calculating the proportion of positive or negative emotion words in the news text, and performing a tertiary emotion judgment.

In this step, the news or text is segmented to obtain word segmentation sequences, which are integrated according to adjacent degree words and negative words to generate extended word sequences. The emotional tendency words in the extended word sequences are labeled and classified: a negative judgment is made when a corresponding negative emotion word is present, and a non-negative judgment is made otherwise. Finally, the negative and non-negative judgments over the plurality of word segmentation sequences are tallied, and the emotional tendency of the news text is judged by comparing the proportions of negative and non-negative judgments.
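The tally described in this step can be sketched as follows. The lexicon is a hypothetical stand-in for the negative emotion dictionary, and the 0.5 threshold for the overall decision is an assumption, since the embodiment only says that the two proportions are compared.

```python
# Hypothetical negative-emotion lexicon taken from the emotion database.
NEGATIVE_WORDS = {"loss", "fraud", "collapse"}

def lexicon_judgment(segments, threshold=0.5):
    """Judge each word segmentation sequence negative if it contains a negative
    word, then compare the share of negative judgments with the threshold."""
    if not segments:
        return "non-negative", 0.0
    negative = sum(
        1 for seg in segments if any(tok in NEGATIVE_WORDS for tok in seg)
    )
    ratio = negative / len(segments)
    label = "negative" if ratio > threshold else "non-negative"
    return label, ratio

segments = [
    ["company", "reports", "loss"],
    ["fraud", "probe", "opens"],
    ["shares", "steady"],
]
label, ratio = lexicon_judgment(segments)
```

Two of the three sequences contain a negative word, so the text is judged negative.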

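Step 500, performing weight calculation on the primary emotion judgment result, the secondary emotion judgment result and the tertiary emotion judgment result, and comprehensively judging the emotion of the news text.

In this step, the formula for performing weight calculation on the emotion judgment result is specifically: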

E(X)＝∑(p(x)*e(x))；

The weighted average is carried out on the three emotion judgment methods by using the weight method, so that the proportion occupied by the judgment algorithm with low accuracy can be reduced, and the accuracy of emotion judgment is improved.
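As a worked example of the formula E(X)=∑(p(x)*e(x)), the following sketch combines three judgment results. The weights and the encoding of the tendency values in the range from -1 (negative) to 1 (positive) are assumptions chosen for illustration; the embodiment does not state how the tendency values are encoded or whether the weights are normalized.

```python
def combined_sentiment(results):
    """E(X) = sum of p(x) * e(x) over the (weight, value) pairs of the methods."""
    return sum(weight * value for weight, value in results)

# Hypothetical (weight, emotional tendency value) pairs.
results = [
    (0.5, 1.0),    # deep learning method judges positive
    (0.25, 1.0),   # SVM method judges positive
    (0.25, -1.0),  # emotion database method judges negative
]
score = combined_sentiment(results)   # 0.5 + 0.25 - 0.25 = 0.5
```

The dissenting method carries a low weight, so the combined value stays positive at 0.5.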

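It should be added that the weights of the three emotion judgment algorithms are specifically calculated as follows:

firstly, acquiring experimental texts in various different fields and on different subjects;

secondly, manually and accurately labeling the emotional tendency of each experimental text, namely whether it has a positive tendency, a negative tendency or a neutral tendency;

thirdly, judging the emotion of the experimental texts with each of the three emotion judgment methods, namely the deep learning method, the SVM method and the emotion database method, and recording the judgment results of each method;

and fourthly, comparing the judgment results of each method on all the experimental texts with the manually judged emotion results, and determining the accuracy of each method, wherein the accuracy of a method is its weight.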

That is, the weights of the three emotion judgment methods in the present embodiment are obtained through judgment experiments of a large number of texts, and this method also improves the weight accuracy of each emotion judgment method as much as possible.
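Deriving the weights from judgment experiments can be sketched as follows. The labels, the method names and the tiny four-text sample are hypothetical; a real experiment would use a large number of manually labeled texts from different fields.

```python
def accuracy_weights(gold_labels, predictions_by_method):
    """Weight of each method = fraction of experimental texts it labels correctly."""
    total = len(gold_labels)
    weights = {}
    for method, predictions in predictions_by_method.items():
        correct = sum(1 for g, p in zip(gold_labels, predictions) if g == p)
        weights[method] = correct / total if total else 0.0
    return weights

# Hypothetical manual labels and method outputs for four experimental texts.
gold = ["pos", "neg", "neu", "neg"]
predictions = {
    "deep_learning": ["pos", "neg", "neu", "neg"],
    "svm": ["pos", "neg", "pos", "neg"],
    "emotion_database": ["neg", "neg", "neu", "pos"],
}
weights = accuracy_weights(gold, predictions)
# {'deep_learning': 1.0, 'svm': 0.75, 'emotion_database': 0.5}
```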

It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.

Claims

1. A method for judging the emotion of news and texts is characterized by comprising the following steps:

step 200, preliminarily judging the emotion of the news text by adopting a deep learning method;

in step 200, the specific steps of performing preliminary emotion judgment by using a deep learning method are as follows:

then, selecting an emotion database in the corresponding theme field;

finally, matching word segmentation elements in the word sequence with an emotion database respectively, combining degree words and negative words adjacent to the positive or negative emotion tendency words together, replacing word elements in the word sequence with standard words close to the word segmentation elements in the emotion database, and re-integrating the word elements into an extended word sequence;

firstly, comparing word segmentation elements of a word sequence with an emotion database, not changing the word segmentation elements in the emotion database, and replacing the word segmentation elements in a word vector sequence with similar words summarized by the emotion database for convertible word segmentation elements;

step 204, inputting the emotional tendency words in the extended word sequence into an emotion judgment model for training to obtain a judgment result;

2. The method for judging the emotion of news and texts as claimed in claim 1, wherein in step 300, the specific method for performing the secondary emotion judgment by using the SVM method is as follows:

3. The method for judging emotion of news and text according to claim 2, wherein a calculation formula of the IG algorithm is specifically as follows:

IG＝∑P(i)ln(P(i)/Q(i))；

4. The method as claimed in claim 1, wherein the emotion database comprises a positive emotion dictionary, a negative emotion dictionary, a degree adverb dictionary and a negation word dictionary.

5. The method as claimed in claim 1, wherein in the step 500, the formula for performing weight calculation on the emotion judgment result is specifically:

E(X)＝∑(p(x)*e(x))；

wherein E(X) is the statistical mathematical expectation of the comprehensive emotional tendency, p(x) is the weight of a single emotion judgment method, and e(x) is the emotional tendency value of that emotion judgment method.

6. The method for judging the emotion of news and texts as claimed in claim 5, wherein the weights of the three emotion judgment algorithms are specifically calculated by: