CN110990572A - Emotion analysis method based on theme

Info

Publication number: CN110990572A
Application number: CN201911222894.8A
Authority: CN
Inventors: 林希; 陈增和; 温志刚
Original assignee: Shenzhen Housley Technology Co ltd
Current assignee: Shenzhen Housley Technology Co ltd
Priority date: 2019-12-03
Filing date: 2019-12-03
Publication date: 2020-04-10

Abstract

The invention discloses a theme-based emotion analysis method, which belongs to the field of artificial intelligence. The method generates semantic expansion data under a corresponding theme; performs word decomposition on the semantic expansion data and a synonym table to obtain a word segmentation library, and labels the meaning of each word in the library; and combines words according to their meanings to generate semantic second-class participles, collects them into a second-class participle library, and fuses that library with the semantic expansion data to obtain semantic augmentation data. Different corpora are collected for different themes, and data augmentation is then applied to the corpora of the corresponding theme field, yielding richer corpora and making later emotion judgment more accurate. When each corpus item is judged, the emotion factors of the preceding and following corpus items are taken into account, which better matches the emotional context of a human speaker and makes the analyzed emotion more accurate. This solves the technical problems that existing emotion analysis methods cannot be used across fields and that the emotion values they produce are inaccurate.

Description

Emotion analysis method based on theme

Technical Field

The invention relates to the field of artificial intelligence, in particular to an emotion analysis method based on a theme.

Background

With the popularization of the internet, people's lives have changed greatly. The network has gradually become a carrier of all kinds of information in society. In particular, with the continuous development of the Chinese economy, financial products such as stocks and bonds have become hot topics of discussion, and more and more people obtain financial and other economic news and related information through the network. Web text has also become an important source for acquiring information, publishing opinions, and communicating emotions. More and more people like to express their opinions on the web, so the web contains a large amount of text that carries sentiment tendencies.

Algorithms for emotion mining of web texts can be roughly classified into three types: supervised, unsupervised, and semi-supervised emotion mining. The supervised and semi-supervised methods generally have higher classification accuracy, but all of them require manually labeled training corpora to train the text emotion classifier, and obtaining such corpora is time-consuming and labor-intensive. The traditional unsupervised method classifies the emotion of a text by means of an emotion dictionary; it depends strongly on that dictionary, and a good emotion dictionary is difficult to obtain. Therefore, unsupervised emotion classification methods represented by JST, S-LDA, DPLDA and the like have been favored in recent years: they avoid the dictionary dependency of the conventional unsupervised method, achieve a better classification effect, and can also mine the themes of texts well.

Existing emotion analysis methods all determine an emotion dictionary in the conventional way and judge the emotion value of a text according to that dictionary. This judging method is one-dimensional and generally cannot be used across multiple fields; the corpora it judges are incomplete, and the values it produces are not accurate enough.

Disclosure of Invention

The invention aims to provide an emotion analysis method based on a theme, and the emotion analysis method is used for solving the technical problems that the existing emotion analysis method cannot be used in a cross-domain mode and meanwhile judged emotion values are not accurate.

A topic-based sentiment analysis method, comprising the steps of:

Step 1: determining the theme to be analyzed, and generating a synonym table for that theme from an existing Internet corpus on the theme;

Step 2: generating semantic expansion data under the corresponding theme;

Step 3: performing word decomposition on the semantic expansion data and the synonym table to obtain a word segmentation library, and labeling the meaning of each word in the word segmentation library;

Step 4: combining words according to their meanings to generate semantic second-class participles, collecting them into a second-class participle library, and fusing the second-class participle library with the semantic expansion data to obtain semantic augmentation data;

Step 5: inputting the semantic augmentation data into a neural network model for training to obtain a word vector model;

Step 6: assigning emotion values to the words in the semantic augmentation data, decomposing each word into its individual characters, assigning each character its multidirectional emotion values, and collecting the emotion values of the words and the single-character emotion values to obtain an emotion judging library;

Step 7: inputting the sentence text to be analyzed into the word vector model to obtain word vectors;

Step 8: inputting the word vectors into the emotion judging library to obtain a first emotion value based on the word vectors;

Step 9: decomposing the word vectors character by character to obtain a single-character set, and inputting the single-character set into the emotion judging library to obtain a second emotion value based on the characters;

Step 10: calculating the emotion value of the input text from the first emotion value and the second emotion value.

Further, the specific process in step 1 is as follows:

manually entering and determining a theme field that covers a large number of the texts to be analyzed, and acquiring a basic text corpus D in that theme field through the Internet;

performing word segmentation on the corpus D with a word segmentation tool, using a window of size 5 and step length 2, to obtain binary linguistic training data;

carrying out Word2Vec model training on the binary linguistic training data to obtain word vector representations;

calculating the cosine of the included angle between every two word vectors vi and vj as the similarity of the two words, and obtaining a similarity measurement matrix; the specific calculation formula is as follows:

sim(vi, vj) = cos(vi, vj) = (vi · vj) / (|vi| * |vj|)

and obtaining from this measurement the 3 words nearest to the word vi, namely the 3 synonyms of the word vi, so as to obtain a synonym table of the theme field.
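
As an illustrative sketch only, the similarity measurement and the selection of the 3 nearest words could look as follows. The toy two-dimensional vectors and the names cosine_similarity and build_synonym_table are assumptions made for this example; in practice the vectors would come from the trained Word2Vec model.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: a zero vector is treated as similar to nothing
    return dot / (norm_u * norm_v)


def build_synonym_table(vectors, k=3):
    """Map every word to its k most similar other words."""
    table = {}
    for word, vec in vectors.items():
        scored = [
            (cosine_similarity(vec, other_vec), other)
            for other, other_vec in vectors.items()
            if other != word
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        table[word] = [other for _, other in scored[:k]]
    return table


# Hypothetical word vectors, for illustration only.
toy_vectors = {
    "good": [1.0, 0.1],
    "great": [0.9, 0.2],
    "fine": [0.8, 0.3],
    "nice": [0.7, 0.5],
    "bad": [-1.0, 0.1],
}
synonym_table = build_synonym_table(toy_vectors)
print(synonym_table["good"])
```

With these toy vectors the word pointing in the opposite direction ("bad") is ranked last and therefore does not appear among the 3 synonyms of "good".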

Further, the specific process of step 2 is as follows:

randomly selecting a corpus as input, and judging whether the number n of corpus items is greater than or equal to 1000;

if n is less than 1000, directly sampling and outputting the corpus; if n is greater than or equal to 1000, executing the next step;

performing word segmentation on the input corpus to obtain a word segmentation table of the corpus;

generating a random variable N in [A, B, C, D, E] with equal probability: if N = A, generating a new corpus by applying synonym substitution to 3 words in the word segmentation table of the corpus; if N = B, finding a random synonym of a random word in the sentence and inserting it at a random position in the sentence to generate a new corpus; if N = C, randomly selecting two words in the word segmentation table and exchanging their positions to generate a new corpus; if N = D, randomly deleting words from the word segmentation table to generate a new corpus; and if N = E, directly outputting the corpus.
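
The five equally likely operations can be sketched as below. This is a minimal illustration under stated assumptions, not the patented implementation: the function name augment, the synonyms dictionary, the fallback to unchanged output when an operation cannot be applied, and the choice to delete exactly one word in case D are all assumptions, since the text does not fix them.

```python
import random


def augment(tokens, synonyms, rng, n_replace=3):
    """Apply one of five equally likely operations to a token list."""
    tokens = list(tokens)  # work on a copy so the input is not modified
    op = rng.choice(["A", "B", "C", "D", "E"])
    # Positions whose word has at least one known synonym.
    candidates = [i for i, t in enumerate(tokens) if synonyms.get(t)]
    if op == "A" and candidates:
        # Synonym substitution for up to n_replace words.
        for i in rng.sample(candidates, min(n_replace, len(candidates))):
            tokens[i] = rng.choice(synonyms[tokens[i]])
    elif op == "B" and candidates:
        # Insert a random synonym of a random word at a random position.
        word = tokens[rng.choice(candidates)]
        tokens.insert(rng.randrange(len(tokens) + 1), rng.choice(synonyms[word]))
    elif op == "C" and len(tokens) >= 2:
        # Exchange the positions of two randomly selected words.
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    elif op == "D" and len(tokens) >= 2:
        # Randomly delete one word (the count is an assumption).
        del tokens[rng.randrange(len(tokens))]
    # op == "E", or an operation that cannot be applied: output unchanged.
    return op, tokens


# Hypothetical synonym table and segmented sentence.
synonyms = {"good": ["great", "fine", "nice"], "food": ["meal"]}
base = ["the", "food", "is", "good"]
op, new_tokens = augment(base, synonyms, random.Random(0))
print(op, new_tokens)
```

Passing an explicit random.Random instance keeps the augmentation reproducible, which is convenient when comparing corpora generated on different runs.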

Further, the specific process of step 3 is as follows:

performing semantic word segmentation on the semantic expansion data and the synonym table according to a Chinese dictionary, collecting all the segmented words, looking up the specific meaning of each semantic segmented word in the Chinese dictionary, and labeling each semantic segmented word with its specific meaning to form a one-to-one mapping.

Further, the specific process of step 4 is as follows:

searching, on the basis of the meanings of the original semantic segmented words, for vocabulary with similar or identical meanings, and collecting that vocabulary to obtain the second-class participle library; when the second-class participle library is fused with the semantic expansion data, words with similar or identical meanings are grouped together and sorted by degree of emotion to obtain the semantic augmentation data.

Further, the specific process in step 6 is as follows:

assigning emotion values to the words in the semantic augmentation data by using an emotion-dictionary text matching algorithm; then decomposing each word into its characters and gathering identical characters from all the words into one entry, so that a character has multidirectional emotion values, each equal to the emotion value of a word in which the character occurs; and then averaging these values to obtain the emotion value of the single character.
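
A minimal sketch of how such an emotion judging library might be assembled is given below. The function name build_emotion_library and the example words and values are hypothetical; the word-level values are assumed to have already been produced by the dictionary matching step.

```python
from collections import defaultdict


def build_emotion_library(word_values):
    """Return (word-level values, averaged single-character values)."""
    char_samples = defaultdict(list)
    for word, value in word_values.items():
        for ch in word:
            # Each character inherits the value of every word containing it;
            # the collected list plays the role of its multidirectional values.
            char_samples[ch].append(value)
    char_values = {ch: sum(vals) / len(vals) for ch, vals in char_samples.items()}
    return dict(word_values), char_values


# Hypothetical word-level emotion values, for illustration only.
word_values = {"高兴": 1.4, "兴奋": 1.6, "难过": 0.5}
words, chars = build_emotion_library(word_values)
print(chars["兴"])  # mean of the values of the two words containing this character
```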

Further, the specific process of step 8 is as follows:

judging the part of speech of a word; if the word is a positive word, examining the preceding and following words: when the preceding word is a degree adverb, multiplying the emotion value of the word by the weight of the degree adverb; when the preceding word is a negation word or a negative word, subtracting one from the emotion value of the word; when the following word is a negative word, subtracting one from the emotion value of the word; and when the preceding and following words are of other parts of speech, adding the weight of the preceding word to the emotion value of the word;

if the word is a negative word, examining the preceding word: if the preceding word is a degree adverb, multiplying the emotion value of the word by the weight of the degree adverb; if the preceding word is a negative word, subtracting the weight of the preceding word from the emotion value of the word; and if the preceding word is of another part of speech, outputting the first emotion value of the word.
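
The context rules of step 8 can be read as the following sketch. The Token class, the kind labels, and the order in which the rules are tried are assumptions introduced for illustration; the text does not say which rule wins when several apply, and it is assumed here that a negative word preceded by another part of speech keeps its value unchanged.

```python
from dataclasses import dataclass


@dataclass
class Token:
    text: str
    kind: str            # "positive", "negative", "negation", "degree" or "other"
    value: float = 0.0   # emotion value from the emotion judging library
    weight: float = 0.0  # context weight, e.g. the weight of a degree adverb


def first_emotion_value(tokens, i):
    """Context-adjusted emotion value of tokens[i]."""
    tok = tokens[i]
    prev = tokens[i - 1] if i > 0 else None
    nxt = tokens[i + 1] if i + 1 < len(tokens) else None
    value = tok.value
    if tok.kind == "positive":
        if prev is not None and prev.kind == "degree":
            value *= prev.weight          # scale by the degree adverb
        elif prev is not None and prev.kind in ("negation", "negative"):
            value -= 1                    # preceding negation or negative word
        elif nxt is not None and nxt.kind == "negative":
            value -= 1                    # following negative word
        elif prev is not None:
            value += prev.weight          # other parts of speech
    elif tok.kind == "negative":
        if prev is not None and prev.kind == "degree":
            value *= prev.weight
        elif prev is not None and prev.kind == "negative":
            value -= prev.weight
        # otherwise the value is output unchanged (assumption)
    return value


# Hypothetical tokens: a degree adverb followed by a positive word.
very_good = [Token("very", "degree", weight=1.5), Token("good", "positive", value=1.2)]
print(first_emotion_value(very_good, 1))
```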

Further, the specific process of step 9 is as follows: multiplying the single-character emotion value of each character in the word by its character weight to obtain that character's contribution, and adding the contributions of all the characters to obtain the second emotion value, wherein the character weights are in the ratio 7:3 when an action character is combined with a rank character, and in the ratio 2:8 when a degree character is combined with an action character.
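
The weighted sum in step 9 amounts to the short calculation below. The function name second_emotion_value and the single-character values 1.5 and 1.1 are hypothetical; only the 7:3 split is taken from the text.

```python
def second_emotion_value(char_values, weights):
    """Weighted sum of single-character emotion values for one word."""
    if len(char_values) != len(weights):
        raise ValueError("one weight per character is required")
    return sum(v * w for v, w in zip(char_values, weights))


# Hypothetical two-character word with a 7:3 weight split.
second_value = second_emotion_value([1.5, 1.1], [0.7, 0.3])
print(round(second_value, 2))
```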

Further, the specific calculation in step 10 is as follows: the following formulas are used,

K = A*tanh((Q-B)*(V_a-V_c)) - A*tanh((H-B)*(V_a-V_d))

wherein K represents the weight of the first emotion value, A = 0.8, B = 15, Q = 23, H = 40, V_a = 1.3, V_c = 0.95, V_d = 1.05;

V_final = (1-K)*V_d + K*V_a;

V_final is the emotion value.
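
As a worked example, the two formulas can be evaluated directly with the constants listed above. The function names are hypothetical, and reading Q and H as 23 and 40 is an assumption about the constants as printed in the source.

```python
import math


def first_value_weight(A=0.8, B=15, Q=23, H=40, V_a=1.3, V_c=0.95, V_d=1.05):
    """K = A*tanh((Q-B)*(V_a-V_c)) - A*tanh((H-B)*(V_a-V_d))."""
    return A * math.tanh((Q - B) * (V_a - V_c)) - A * math.tanh((H - B) * (V_a - V_d))


def final_emotion_value(K, V_a=1.3, V_d=1.05):
    """V_final = (1-K)*V_d + K*V_a."""
    return (1 - K) * V_d + K * V_a


K = first_value_weight()
V_final = final_emotion_value(K)
print(round(K, 4), round(V_final, 4))
```

Under these assumed constants both tanh terms are close to 1, so K is small (about -0.0059) and V_final stays close to V_d (about 1.0485).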

By adopting the technical scheme, the invention has the following technical effects:

In the emotion analysis method, different corpora are collected for different themes, and data augmentation is then applied to the corpora of the corresponding theme field, so that richer corpora are obtained and later emotion judgment is more accurate. When each corpus item is judged, the emotion factors of the preceding and following corpus items are taken into account, which better matches the emotional context of a human speaker, so that the analyzed emotion is more accurate. This solves the technical problems that existing emotion analysis methods cannot be used across fields and that the judged emotion values are inaccurate.

Drawings

FIG. 1 is a flow chart of the method of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, preferred embodiments are given and the present invention is described in further detail. It should be noted, however, that the numerous details set forth in the description are merely for the purpose of providing the reader with a thorough understanding of one or more aspects of the present invention, which may be practiced without these specific details.

As shown in FIG. 1, the emotion analysis method based on subject of the present invention includes the following steps:

Step 1: determining the theme to be analyzed, and generating a synonym table for that theme from an existing Internet corpus on the theme.

In practical application, a synonym table is generally generated in advance from the corpus of each theme field and stored for later use; it is refreshed at a fixed interval, generally every 3 days.

A theme field that covers a large number of the texts to be analyzed is manually entered and determined, and a basic text corpus D in that theme field is acquired through the Internet.

Word segmentation is performed on the corpus D with a word segmentation tool, using a window of size 5 and step length 2, to obtain binary linguistic training data.
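
One possible reading of the window of size 5 and step length 2 is sketched below. The function name sliding_windows and the handling of short inputs are assumptions; how the windows are turned into training data is not detailed in the text, and trailing tokens that do not fill a complete window are simply dropped here.

```python
def sliding_windows(tokens, size=5, step=2):
    """Return consecutive windows of `size` tokens, advancing by `step`."""
    if len(tokens) <= size:
        return [list(tokens)]  # assumption: a short input forms a single window
    return [list(tokens[i:i + size]) for i in range(0, len(tokens) - size + 1, step)]


# Placeholder tokens standing in for a segmented sentence.
tokens = ["w1", "w2", "w3", "w4", "w5", "w6", "w7", "w8", "w9"]
windows = sliding_windows(tokens)
for window in windows:
    print(window)
```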

Step 2: generating semantic expansion data under the corresponding theme. A corpus is randomly selected as input, and it is judged whether the number n of corpus items is greater than or equal to 1000. The threshold may be raised or lowered; in general, the analysis results are more accurate once the number of corpus items reaches 1000.

If n is less than 1000, the corpus is sampled and output directly; if n is greater than or equal to 1000, the next step is executed.

Word segmentation is performed on the input corpus to obtain a word segmentation table of the corpus.

Step 3: performing word decomposition on the semantic expansion data and the synonym table to obtain a word segmentation library, and labeling the meaning of each word in the word segmentation library. Semantic word segmentation is performed on the semantic expansion data and the synonym table according to a Chinese dictionary, all the segmented words are collected, the specific meaning of each semantic segmented word is looked up in the Chinese dictionary, and each semantic segmented word is labeled with its specific meaning to form a one-to-one mapping.

Step 4: combining words according to their meanings to generate semantic second-class participles, collecting them into a second-class participle library, and fusing the second-class participle library with the semantic expansion data to obtain semantic augmentation data. Vocabulary with similar or identical meanings is searched for on the basis of the meanings of the original semantic segmented words and collected to obtain the second-class participle library; when the second-class participle library is fused with the semantic expansion data, words with similar or identical meanings are grouped together and sorted by degree of emotion to obtain the semantic augmentation data.

Step 5: inputting the semantic augmentation data into the neural network model for training to obtain a word vector model.

The basic building blocks of neural networks are neurons, and the mathematical neuron models correspond to biological nerve cells. In other words, the artificial neural network theory describes biological cells in the objective world with an abstract mathematical model of neurons.

Clearly, biological nerve cells are the material basis and origin of neural network theory. The mathematical description of a neuron must therefore be based on the objective behavioral characteristics of biological nerve cells, so it is important to understand that behavior.

The topology of a neural network is likewise based on the way nerve cells interconnect in biological anatomy, so it is also important to understand how nerve cells interact.

Neurons are the basic elements of neural networks; only by understanding neurons can the nature of a neural network be understood. In this section, the biological anatomy of neurons, the way they process and transmit information, how they function, and their mathematical models are described.

Step 6: assigning emotion values to the words in the semantic augmentation data, decomposing each word into its individual characters, assigning each character its multidirectional emotion values, and collecting the emotion values of the words and the single-character emotion values to obtain an emotion judging library.

Step 7: inputting the sentence text to be analyzed into the word vector model to obtain word vectors. The sentence text is obtained by crawling data directly from a corresponding comment platform and feeding the crawled data into the word vector model; the theme being crawled is generally determined by manually defining or labeling the main field that the platform serves.

Step 8: inputting the word vectors into the emotion judging library to obtain a first emotion value based on the word vectors.

The part of speech of the word is judged. If the word is a positive word, the preceding and following words are examined: when the preceding word is a degree adverb, the emotion value of the word is multiplied by the weight of the degree adverb; when the preceding word is a negation word or a negative word, one is subtracted from the emotion value of the word; when the following word is a negative word, one is subtracted from the emotion value of the word; and when the preceding and following words are of other parts of speech, the weight of the preceding word is added to the emotion value of the word.

Step 9: decomposing the word vectors character by character to obtain a single-character set, and inputting the single-character set into the emotion judging library to obtain a second emotion value based on the characters. The single-character emotion value of each character in the word is multiplied by its character weight to obtain that character's contribution, and the contributions of all the characters are added to obtain the second emotion value, wherein the character weights are in the ratio 7:3 when an action character is combined with a rank character, and in the ratio 2:8 when a degree character is combined with an action character. For example, after a word meaning "happy" is decomposed into its two characters, the emotion value of one character accounts for seventy percent and that of the other for thirty percent. Likewise, after a word meaning "very" is decomposed into its two characters, the emotion value of one character accounts for twenty percent and that of the other for eighty percent.

Step 10: calculating the emotion value of the input text from the first emotion value and the second emotion value. The following formulas are used,

K = A*tanh((Q-B)*(V_a-V_c)) - A*tanh((H-B)*(V_a-V_d))

V_final = (1-K)*V_d + K*V_a.

V_final is the emotion value to be analyzed. A value of 1 is neutral: no particular attitude is expressed and the text is a factual, middle-of-the-road comment. For example, a statement of where a restaurant is located is a neutral description that makes no evaluation. A value greater than 1 is positive, and the larger the value, the more positive the emotion; a value less than 1 is negative, and the smaller the value, the more negative the emotion.
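
The interpretation of the final value can be expressed as a small helper. The function name describe_emotion and the tolerance used to decide that a value equals 1 are assumptions made for illustration.

```python
def describe_emotion(v_final, tol=1e-9):
    """Classify an emotion value relative to the neutral value 1."""
    if abs(v_final - 1.0) <= tol:
        return "neutral"
    return "positive" if v_final > 1.0 else "negative"


for value in (1.0, 1.3, 0.7):
    print(value, describe_emotion(value))
```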

The foregoing is only a preferred embodiment of the present invention, and it should be noted that those skilled in the art can make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be construed as the protection scope of the present invention.

Claims

1. A sentiment analysis method based on a theme is characterized by comprising the following steps:

step 2: generating semantic expansion data under corresponding topics;

2. The emotion analysis method based on subject matter as claimed in claim 1, wherein: the specific process in the step 1 is as follows:

3. The emotion analysis method based on subject matter as claimed in claim 1, wherein: the specific process of the step 2 is as follows:

4. The emotion analysis method based on subject matter as claimed in claim 1, wherein: the specific process of the step 3 is as follows:

5. The emotion analysis method based on subject matter as claimed in claim 1, wherein: the specific process of the step 4 is as follows:

6. The emotion analysis method based on subject matter as claimed in claim 1, wherein: the specific process in the step 6 is as follows:

7. The emotion analysis method based on subject matter as claimed in claim 1, wherein: the specific process of the step 8 is as follows:

8. The emotion analysis method based on subject matter as claimed in claim 1, wherein: the specific process of the step 9 is as follows: multiplying the single-character emotion value of each character in the word by its character weight to obtain that character's contribution, and adding the contributions of all the characters to obtain the second emotion value, wherein the character weights are in the ratio 7:3 when an action character is combined with a rank character, and in the ratio 2:8 when a degree character is combined with an action character.

9. The emotion analysis method based on subject matter as claimed in claim 1, wherein: the specific calculation in the step 10 is as follows: the following formulas are used,

K = A*tanh((Q-B)*(V_a-V_c)) - A*tanh((H-B)*(V_a-V_d))

V_final = (1-K)*V_d + K*V_a;

V_final is the emotion value.