CN112507071A - Network platform short text mixed emotion classification method based on novel emotion dictionary


Info

Publication number: CN112507071A
Application number: CN202011408818.9A
Authority: CN (China)
Prior art keywords: emotion, word, dictionary, network platform, weight
Legal status: Granted; Active
Other languages: Chinese (zh)
Other versions: CN112507071B (en)
Inventors: 徐小龙, 黄寄
Current assignee: Nanjing University of Posts and Telecommunications
Original assignee: Nanjing University of Posts and Telecommunications
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN202011408818.9A; application granted and published as CN112507071B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 Querying
    • G06F 16/3331 Query processing
    • G06F 16/334 Query execution
    • G06F 16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/35 Clustering; Classification
    • G06F 16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/205 Parsing
    • G06F 40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks

Abstract

The invention provides a network platform short text mixed emotion classification method based on a novel emotion dictionary. The method performs word segmentation on the texts of emotion-labeled samples; counts the frequency of each word across all samples of a given emotion; calculates an emotion weight for each word from these word frequencies; and records the emotion weight of each word to obtain a novel emotion dictionary. The novel emotion dictionary is then used to compute, for each sample to be classified, an emotion vector containing the emotion weight of every emotion; this emotion vector is input into a deep learning model as the feature representation of its input layer, and a mixed emotion classification result is finally obtained. By condensing the emotion in a network platform short text into a short emotion vector, the method speeds up the deep learning training used for mixed emotion classification and reduces the memory occupied by the model during training.

Description

Network platform short text mixed emotion classification method based on novel emotion dictionary
Technical Field
The invention relates to a network platform short text mixed emotion classification method based on a novel emotion dictionary, and belongs to the field of artificial intelligence.
Background
Network platform short texts mainly include comments under news applications, forum comments, blog comments, chat room content, and the like. These platforms generate a large amount of text, and web texts in various forms have become a channel for humans to receive information and a means of emotional communication.
Emotion is an important component of human life: it affects behavior, thinking, decision making, and social interaction. Affective computing (emotion calculation) is the task of giving computers the ability to recognize, understand, and express human emotions. It divides into three parts: recognition, expression, and decision. Recognition concerns how to enable a computer to accurately identify human emotions and eliminate the uncertainty and ambiguity of natural language; the recognized carrier can generally be text, voice, facial expression, posture, and so on. Expression concerns how to convey abstract emotion through an information carrier that humans can intuitively understand. Decision refers to how to use emotion mechanisms to make better decisions. Affective computing can calculate the emotional tendency of an emotional subject, and once that tendency is obtained, the subject's behavior or state can be further predicted, for example to make personalized recommendations based on the tendency or to predict the subject's attention.
Emotion classification is the recognition part of affective computing. Its main task is to identify and classify the emotion expressed in words, sentences, or documents to obtain their emotion polarity. Current emotion classification methods mainly fall into three categories. The first is emotion-dictionary-based methods, which segment a text, find words of different parts of speech, and compute their corresponding scores; such methods depend heavily on an emotion dictionary, and manually constructing one, such as the improved WordNet dictionary, is very laborious. The second is machine learning methods based on manually extracted features, which require a large amount of pre-labeled data and human-designed features, and then classify emotion with methods such as support vector machines or naive Bayes, for example an improved KNN model. The third is deep-learning-based methods, which also require a large amount of pre-labeled data but need no hand-crafted features, since the deep learning model extracts features from the data automatically; common models include recurrent neural networks and convolutional neural networks. However, the above methods generally share the following problems:
1. Most research classifies only binary (positive/negative) or ternary emotion, but a document often contains more than one emotion, and several emotions can occur in the same document at the same time.
2. Machine learning methods require manual feature extraction, and the quality of the final model depends on the quality of the manually extracted features, so the process cannot be automated and good features are difficult to obtain.
3. Deep learning methods require a feature representation of the document, but because texts are long, the dimensionality of the representation is huge, leading to slow training and large memory occupation.
In view of the above, it is necessary to provide a short text mixed emotion classification method for a network platform based on a novel emotion dictionary to solve the above problems.
Disclosure of Invention
The invention aims to provide a network platform short text mixed emotion classification method based on a novel emotion dictionary.
In order to achieve the purpose, the invention provides a network platform short text mixed emotion classification method based on a novel emotion dictionary, which comprises the following steps:
step 1, carrying out artificial emotion marking on the collected short texts of the historical network platform, and using the short texts as a training set;
step 2, performing word segmentation processing on each sample in the training set, then calculating the word frequency of each word under each emotion, calculating the emotion weight of each word by using the word frequency of each word under each emotion, and storing each word and the emotion weight thereof in a dictionary in a key value pair mode to form an emotion dictionary;
step 3, accumulating the emotion weight under a certain emotion of all participles of each sample to obtain the sum of the emotion weight of the certain emotion of the sample, and combining the sum of the emotion weight of each emotion of the sample to form an emotion vector of a training set;
step 4, taking the emotion vectors of the training set as the feature representation of the input layer, and using the emotion vectors for the training of the DNN mixed emotion classification model to obtain a trained DNN model for mixed emotion classification;
step 5, performing word segmentation on the new network platform short text, searching each word after word segmentation in an emotion dictionary to obtain emotion weight corresponding to each word, respectively summing the emotion weights of all words under each emotion to obtain the sum of the emotion weights of the new network platform short text under each emotion, and combining the sum to form an emotion vector;
and 6, inputting the emotion vector formed from the new network platform short text into the trained DNN model to obtain a vector containing a probability value for each emotion, and comparing the two largest probability values with a set threshold: if a probability value is greater than or equal to the threshold, the new network platform short text contains the corresponding emotion; otherwise it does not.
As a further improvement of the present invention, the step 2 specifically includes:
step 21, classifying the samples of different emotion labels in the training set to obtain sample sets under n classes of emotion labels, each set being S_i (1 ≤ i ≤ n); setting i = 1; setting an all_words set, initially empty, to record all words that appear; the total number of training samples is N;
step 22, setting the words_i set to empty; letting the k-th sample of the i-th class emotion label training set be S_i^k; setting k = 1; setting count_i as the counter of the i-th class emotion label training set, count_i = 1;
step 23, segmenting the text part of S_i^k into words, and storing the segmentation results in the words_i set and in the all_words set;
step 24, letting k = k + 1 and count_i = count_i + 1, and repeating step 23 until k equals the total number of samples in the i-th class emotion label training set;
step 25, letting i = i + 1, and repeating steps 22 to 24 until i = n;
step 26, extracting a word w from the all_words set (without replacement), and counting in each words_i (1 ≤ i ≤ n) its occurrence count tf_w^i, called the word frequency; n_w is the number of emotions i for which tf_w^i is not 0;
step 27, calculating the emotion weight weight_w^i of the word w under the i-th class emotion label, through to the emotion weight of w under the n-th class emotion;
step 28, if the all_words set is not empty, going to step 26;
step 29, storing the emotion weight of the word w under the i-th emotion as a key-value pair w : weight_w^i in the dictionary weight_i, finally obtaining n dictionaries; the n dictionaries are grouped into the emotion dictionary, and each weight_i is called an emotion page of the emotion dictionary.
As a further improvement of the invention, the emotion weight weight_w^i of the word w under the i-th class emotion label in step 27 is calculated as:

weight_w^i = (tf_w^i / count_i) × log(N / n_w)
as a further improvement of the present invention, the step 3 specifically includes:
step 31, letting the k-th sample of the i-th class emotion label training set be S_i^k, and setting k = 1;
step 32, segmenting the text part of S_i^k, using each segmented word w as a key to query each emotion page weight_i of the emotion dictionary obtained in step 2 and obtain the value weight_w^i corresponding to the key w, then calculating the emotion score score_i of the i-th emotion of S_i^k, and combining the emotion scores of all emotion types to form the emotion vector of the training set V = (score_1, ..., score_n).
As a further improvement of the invention, the emotion score score_i in step 32 is calculated as:

score_i = Σ_{w ∈ S_i^k} weight_w^i
as a further improvement of the present invention, the step 5 specifically includes:
step 51, defining the new network platform short texts as test set samples, the r-th of which is test_r; performing word segmentation on the text of test_r to obtain the word set of sample test_r;
step 52, using each word in the word set as a key to query its corresponding emotion weight in the emotion dictionary obtained in step 2, then using the formula

score_i^r = Σ_{w ∈ test_r} weight_w^i

to obtain the i-th class emotion score score_i^r of the test sample test_r, and combining the emotion scores of each emotion to form the emotion vector V_r = (score_1^r, ..., score_n^r).
As a further improvement of the present invention, the step 6 specifically includes:
step 61, inputting the V_r obtained in step 5 into the trained DNN model to obtain a vector V_p = (p_1, ..., p_n) containing a probability value for each emotion, where p_i (1 ≤ i ≤ n) is the probability that the test sample test_r contains the i-th emotion;
step 62, extracting from V_p = (p_1, ..., p_n) the two emotion probability values with the largest and second largest values, denoted p_j and p_l;
step 63, setting an emotion threshold P and comparing p_j and p_l with P, with the following rules:
if p_j ≥ P and p_l ≥ P, the test sample test_r contains the j-th and the l-th emotions;
if p_j ≥ P and p_l < P, the test sample test_r contains the j-th emotion;
if p_j < P and p_l ≥ P, the test sample test_r contains the l-th emotion;
if p_j < P and p_l < P, the test sample test_r contains no emotion.
The invention has the following beneficial effects. First, deep learning approaches to emotion classification have generally assigned a text to a single emotion class, but a text often contains more than one emotion, and combinations of emotions may occur; the task of classifying the multiple emotions contained in a text is called the mixed emotion classification task, and the present method addresses it directly. Second, solving emotion classification with deep learning requires a feature representation of the text to be classified, and existing feature representation methods usually produce vectors of huge dimensionality, so a great deal of time and memory is consumed during feature representation and training; by condensing each text into a short emotion vector, the present method greatly reduces this cost.
Drawings
FIG. 1 is a flow chart of a mixed emotion classification method of a network platform short text based on a novel emotion dictionary.
FIG. 2 is a coordinate axis diagram of the mixed emotion classification of the present invention.
FIG. 3 is a schematic diagram of the structure of FIG. 1 when the text is converted into an emotion vector.
Fig. 4 is a schematic structural diagram of the DNN model in fig. 1.
Fig. 5 is a schematic diagram showing the structure of the emotion classification determination in fig. 1.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.
The invention realizes a fast and efficient mixed emotion classification task in the network platform short text environment, improves the efficiency of mixed emotion classification, and reduces time and memory costs. It provides a novel emotion dictionary and completes the mixed emotion classification task by combining this dictionary with a deep learning method.
The network platform generates massive text data every day, and this text contains a large amount of emotion. In the past, when deep learning was used for emotion classification, a text was generally classified into a single emotion class; but due to the characteristics of human emotion, a text often contains more than one emotion, and the contained emotions may occur in combination, as illustrated by the emotion mixtures in each quadrant of fig. 2. The task of classifying the multiple emotions contained in a text is called the mixed emotion classification task, which can be solved with a multi-label deep learning classification model. The emotions contained in each collected network platform comment text are labeled manually, and the comments are used as a training set.
Word segmentation is performed on each comment text in the training set to obtain the set of all words in the training set and the set of words for each class of emotion. The word frequency of every word in the training-set word set is calculated under each emotion; the emotion weight of each word is then calculated from its word frequencies and stored with the word in the emotion dictionary as a key-value pair. The emotion weight sums of each emotion of every sample are combined into the emotion vectors of the training set, which are used as the feature representation input to the DNN model for training, yielding a trained mixed emotion classifier.
Word segmentation is performed on a newly obtained network platform text; all resulting words are used as keys to query their values in the constructed emotion dictionary, yielding the emotion weights of the text's words under each emotion. The weights under each emotion are summed to give that emotion's weight sum, and all the sums are combined into an emotion vector. This vector is input into the trained mixed emotion classifier to obtain a vector of predicted probability values for each emotion; the two largest probability values are then compared with a set threshold, and if a value is greater than or equal to the threshold, the new network platform text contains the corresponding emotion; otherwise it does not.
To facilitate understanding of the technical solution of the present invention, some concepts are defined below:
definition 1 mixed emotion classification: the emotion calculation task is used for extracting two or more emotion types in a text according to the emotion contained in the text or a sentence. For example, a text may contain two emotions of anger and fear, and the emotions classified by the classifier should be "anger" and "fear".
Definition 2 feature representation: for network platform short text to be recognized by a computer, the text must be represented in a format the computer can process. The feature representation used in the invention is the emotion vector of the training set constructed with the novel emotion dictionary; this emotion vector serves as the feature representation.
Definition 3 emotion dictionary: the emotion dictionary is a hash set of words or phrases labeled with emotional tendency, generally consisting of word-tendency or phrase-tendency key-value pairs. The emotional tendency value can be a set of discrete values, such as positive, neutral, and negative emotion, or a set of continuous values, such as values in the range -1 to 1, where a value greater than 0 is regarded as positive emotion and a value less than 0 as negative emotion.
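For illustration, Definition 3 can be instantiated as follows (the words and tendency values here are invented examples, not taken from the patent's corpus):

```python
# An emotion dictionary is a hash set of word -> emotional-tendency key-value pairs.

# Discrete emotional tendencies:
discrete_dictionary = {
    "优秀": "positive",   # "excellent"
    "一般": "neutral",    # "average"
    "糟糕": "negative",   # "terrible"
}

# Continuous tendencies in [-1, 1]: a value > 0 is positive, a value < 0 negative.
continuous_dictionary = {"优秀": 0.9, "一般": 0.0, "糟糕": -0.8}

def tendency(word):
    """Map a continuous tendency value to a discrete label."""
    v = continuous_dictionary.get(word, 0.0)
    return "positive" if v > 0 else "negative" if v < 0 else "neutral"
```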
The construction method of the emotion dictionary can be divided into an emotion dictionary construction method based on a dictionary and an emotion dictionary construction method based on a corpus. The construction method of the emotion dictionary based on the dictionary is generally based on the improvement or the expansion of the original emotion dictionary; the method for constructing the emotion dictionary based on the corpus is to perform some processing on words in a corpus to obtain the emotional tendency of the words. General processing methods include calculation of emotional tendency using syntactic relations of words, labeling of emotional tendency using machine learning or deep learning, and the like.
The emotion dictionary used by the invention calculates the emotion weight of each word under each emotion from the word's frequency under each emotion, and stores each key-value pair consisting of a word and its emotion weight under a certain emotion in a dictionary. Calculating and storing the emotion weights of every training-sample word under every emotion forms the emotion dictionary.
Definition 4 word segmentation: Chinese word segmentation takes a continuous text string, which may be a long text such as a paragraph or an article, or a short text such as a sentence or a phrase, and decomposes and recombines it according to a certain standard into substrings whose basic units are characters and words, facilitating subsequent processing and analysis. Since network platform short text is unstructured data, it must be converted into structured data by Chinese word segmentation.
The invention adopts a word segmentation method based on a prefix dictionary. A prefix dictionary is constructed from the words and word frequencies of a known statistical dictionary: each word in the statistical dictionary is traversed, and each of its prefixes is added to the prefix dictionary together with its word frequency. A directed acyclic graph of all possible word combinations in the sentence to be segmented is then generated, and the path with the maximum word frequency is found to obtain the maximum segmentation combination based on word frequency; this combination is the word segmentation result.
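A minimal sketch of this prefix-dictionary segmentation follows (the toy statistical dictionary is invented for illustration; a real system would load a large word-frequency dictionary; log probabilities let the maximum-word-frequency path be found by dynamic programming over the directed acyclic graph):

```python
import math

# Toy statistical dictionary: word -> frequency (invented counts).
FREQ = {"网络": 50, "平台": 40, "网络平台": 30, "短": 20, "文本": 45, "短文本": 25}

def build_prefix_dict(freq):
    """Every prefix of every word gets an entry; prefixes that are
    not themselves dictionary words get frequency 0."""
    pd = {}
    for word in freq:
        for i in range(1, len(word)):
            pd.setdefault(word[:i], 0)
    for word, f in freq.items():
        pd[word] = f
    return pd

def segment(sentence, freq):
    pd = build_prefix_dict(freq)
    total = sum(freq.values())
    n = len(sentence)
    # DAG: for each start index i, the end indices j such that
    # sentence[i:j] is a dictionary word.
    dag = {}
    for i in range(n):
        ends = []
        for j in range(i + 1, n + 1):
            frag = sentence[i:j]
            if frag not in pd:
                break                  # no dictionary word starts with frag
            if pd[frag] > 0:
                ends.append(j)
        dag[i] = ends or [i + 1]       # unknown character: cut it alone
    # Dynamic programming, right to left: route[i] = (best log-prob, next cut).
    route = {n: (0.0, n)}
    for i in range(n - 1, -1, -1):
        route[i] = max(
            (math.log(pd.get(sentence[i:j], 0) or 1) - math.log(total)
             + route[j][0], j)
            for j in dag[i]
        )
    # Follow the maximum-frequency path.
    out, i = [], 0
    while i < n:
        j = route[i][1]
        out.append(sentence[i:j])
        i = j
    return out
```

Design note: this is the same idea used by common Chinese segmenters such as jieba; the break on a missing prefix works because every true word's prefixes are guaranteed to be present in the prefix dictionary.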
Definition 5 emotion weight: the emotion weight weight_w^i of a word w under the i-th emotion represents the emotional intensity of the word for that emotion. Its calculation formula is:

weight_w^i = (tf_w^i / count_i) × log(N / n_w)

where tf_w^i is the word frequency of the word w under the i-th class of emotion; count_i is the total number of samples of the i-th emotion in the training set; N is the total number of samples in the training set; and n_w is the number of emotions in whose samples the word appears.
Definition 6 emotion vector: the emotion vector is an n-dimensional vector, where n is the number of emotion types to be classified. Each dimension is the sum, over all words in the sample, of their emotion weights under that emotion. The vector can be used as a feature representation for deep learning.
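Definitions 5 and 6 can be sketched directly in code. The closed form used here, (tf_w^i / count_i) · log(N / n_w), is an assumed reading of the patent's stated quantities tf_w^i, count_i, N and n_w, not a verified transcription of the original formula image:

```python
import math

def emotion_weight(tf_wi, count_i, N, n_w):
    """Definition 5 (assumed form): weight_w^i = (tf_w^i / count_i) * log(N / n_w)."""
    if tf_wi == 0 or n_w == 0:
        return 0.0
    return (tf_wi / count_i) * math.log(N / n_w)

def emotion_vector(sample_words, pages):
    """Definition 6: pages is a list of n emotion pages (dicts word -> weight_w^i).
    Each dimension is the sum of the sample's word weights under that emotion;
    words missing from a page contribute 0."""
    return [sum(page.get(w, 0.0) for w in sample_words) for page in pages]
```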
The method of the invention performs mixed emotion classification on network platform short texts, obtaining emotion vectors with the novel emotion dictionary. Using these emotion vectors as the feature representation input of the deep learning model greatly reduces the time consumed and the memory occupied during deep learning training.
As shown in fig. 1, taking comment short texts of a microblog network platform as an example, the mixed emotion classification of microblog comments is solved with the following specific operation steps:
step 1: firstly, carrying out artificial emotion marking on collected microblog comment samples, namely historical network platform short texts, and then using the data samples as a training set.
Step 2: as shown in fig. 3, each sample in the training set is subjected to word segmentation, then the word frequency of each word under each emotion is calculated, the emotion weight of the word is calculated by using the word frequency of the word under each emotion, and each word and the emotion weight thereof are stored in a dictionary in a key-value pair manner to form an emotion dictionary, and the specific implementation method is as follows:
step 21: classifying samples of different emotion labels of the training set to obtain nSample sets under emotion-like labels, each set being called Si(i is more than or equal to 1 and less than or equal to N), i is the ith emotion, i is set to 1, an all _ words set is set for recording all appeared words, the all _ words set is set to be empty, and the number of training lumped samples is set to be N;
step 22: let all _ words set equal to null for recording all words appearing in training set, let kth sample of i-th class emotion label training set be
Figure RE-GDA0002930866120000111
Set k equal to 1, set countiSetting count for counter of ith type emotion label training seti=1;
Step 23: order wordsiThe collection is empty, all texts in the sample of the ith emotion are subjected to word segmentation, and word segmentation results are stored in wordsiThe set is also stored in an all _ words set;
step 24: let k equal k +1, counti=counti+1, repeating the step 23 until k equals to the total number of samples in the ith type emotion label training set;
step 25: repeating the steps 22 to 24 until i is equal to n;
step 26: retrieving words w from the all _ words set without putting back, counting words w in wordsi(1. ltoreq. i. ltoreq.n)
Figure RE-GDA0002930866120000112
Called word frequency, nwIs not 0
Figure RE-GDA0002930866120000113
The number of (2);
step 27: calculating the emotion weight of the word w under the ith type of emotion label
Figure RE-GDA0002930866120000114
And the emotional weight of the word under the n-th type emotion; weight of word w under type i emotion label
Figure RE-GDA0002930866120000115
The calculation formula of (2) is as follows:
Figure RE-GDA0002930866120000116
step 28: if the all _ words set is not empty, then go to step 26;
step 29: and (3) adding the emotion weight of the word w under the ith type of emotion to w:
Figure RE-GDA0002930866120000117
the form of the key-value pair is stored in the dictionary weightiFinally, n dictionaries are obtained. Grouping n dictionaries into an emotion dictionary, each weightiAn emotion page called emotion dictionary.
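Steps 21 to 29 can be condensed into the following sketch. Samples are assumed to be already segmented into word lists, and the TF-IDF-style weight formula here, (tf_w^i / count_i) · log(N / n_w), is an assumed reading of the patent's quantities rather than a verified transcription:

```python
import math
from collections import Counter

def build_emotion_dictionary(labeled_samples):
    """labeled_samples: list of (words, label) pairs, label in 0..n-1.
    Returns the n emotion pages: pages[i] maps word w -> weight_w^i."""
    n = max(label for _, label in labeled_samples) + 1
    N = len(labeled_samples)                  # total training samples
    count = [0] * n                           # count_i: samples per emotion
    tf = [Counter() for _ in range(n)]        # tf[i][w]: word frequency under emotion i
    all_words = set()
    for words, label in labeled_samples:
        count[label] += 1
        tf[label].update(words)
        all_words.update(words)
    pages = [{} for _ in range(n)]
    for w in all_words:
        n_w = sum(1 for i in range(n) if tf[i][w] > 0)   # emotions containing w
        for i in range(n):
            if tf[i][w] > 0:
                pages[i][w] = (tf[i][w] / count[i]) * math.log(N / n_w)
    return pages
```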
Step 3: as shown in fig. 3, the emotion weights of all participles of each sample under a certain emotion are accumulated to obtain the sum of the sample's emotion weights for that emotion, and the sums for each emotion are combined to form the emotion vector of the training set.
The specific implementation method of the step 3 is as follows:
step 31: let the kth sample of the ith type emotion label training set be
Figure RE-GDA0002930866120000121
Setting k to 1; step 32: will be provided with
Figure RE-GDA0002930866120000122
Then, each word w after word segmentation is used as each emotion page weight for inquiring the emotion dictionary constructed in the step 2iObtaining the value corresponding to the key w
Figure RE-GDA0002930866120000123
Then, the current sample is calculated by using the following formula
Figure RE-GDA0002930866120000124
Sentiment score of the ith sentiment of (1)
Figure RE-GDA0002930866120000125
Figure RE-GDA0002930866120000126
And combining the emotion scores of each class to form an emotion vector of a training set
Figure RE-GDA0002930866120000127
Step 4: as shown in fig. 4, the emotion vectors of the training set obtained in step 3 are used as the feature representation of the input layer for training the DNN mixed emotion classification model, obtaining a trained DNN model for mixed emotion classification.
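The patent does not fix the DNN's architecture beyond the n-dimensional emotion-vector input layer and a per-emotion probability output; the minimal forward-pass sketch below is therefore illustrative (layer sizes, weights, and the sigmoid output activation are assumptions; sigmoid outputs suit a multi-label task, since each p_i is an independent probability):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense(x, W, b, act):
    """One fully connected layer: act(W x + b)."""
    return [act(sum(wij * xj for wij, xj in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

def dnn_forward(v, layers):
    """v: the n-dimensional emotion vector (the input layer's features).
    layers: list of (W, b, activation) tuples; the last layer should use
    sigmoid so each output p_i is the probability of containing emotion i."""
    for W, b, act in layers:
        v = dense(v, W, b, act)
    return v
```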
Step 5: perform word segmentation on the new network platform short text, look up each segmented word in the emotion dictionary to obtain its corresponding emotion weights, sum the emotion weights of all words under each emotion to obtain the short text's emotion weight sum under that emotion, and combine the sums to form an emotion vector.
The concrete implementation method of the step 5 is as follows:
step 51: setting the short text of the new network platform as a test set sample, and setting the r-th sample in the test set as testrTest will berPerforming word segmentation processing on the text to obtain a sample testrWord sets after word segmentation;
step 52: traversing the word set, and using each traversed word w as a key to inquire the weight of each dictionary page in the emotion dictionaryi(1. ltoreq. i. ltoreq. n)
Figure RE-GDA0002930866120000128
And calculating the emotion score in each type of emotion by using the following formula:
Figure RE-GDA0002930866120000129
obtaining a test sample testrSentiment score of the ith category
Figure RE-GDA0002930866120000131
Then combining the obtained emotion scores under each emotion into an emotion vector
Figure RE-GDA0002930866120000132
Step 6: as shown in FIG. 5, the emotion vector is input into the trained DNN model to obtain a vector containing a probability value for each emotion. The two largest probability values are compared with a set threshold: if such a probability value is greater than or equal to the threshold, the sample contains the corresponding emotion; otherwise it does not.
Step 6 is implemented as follows:
Step 61: input the V_r obtained in step 5 into the trained DNN model to obtain a vector V_p = (p_1, ..., p_n) containing the probability value of each emotion, where p_i (1 ≤ i ≤ n) is the probability that the test sample test_r contains the i-th emotion.
Step 62: take from V_p = (p_1, ..., p_n) the largest and the second-largest probability values, denoted p_j and p_l.
Step 63: set an emotion threshold P and judge p_j and p_l against P with the following rules:
if p_j ≥ P and p_l ≥ P, the test sample test_r contains the j-th and the l-th emotions;
if p_j ≥ P and p_l < P, the test sample test_r contains the j-th emotion;
if p_j < P and p_l ≥ P, the test sample test_r contains the l-th emotion;
if p_j < P and p_l < P, the test sample test_r contains no emotion.
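The top-two thresholding of steps 61–63 can be sketched as below; the function name and the example threshold value are assumptions for illustration.

```python
# Sketch of step 6's decision rule: keep the two largest emotion
# probabilities and accept each only if it reaches the threshold P.
# Name and threshold value are illustrative assumptions.

def decide_emotions(probs, P):
    """probs: per-emotion probability values from the DNN; returns the
    indices of the contained emotions (at most the top two, each >= P)."""
    # Indices of the largest and second-largest probabilities (p_j, p_l).
    top_two = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:2]
    return [i for i in top_two if probs[i] >= P]

labels = decide_emotions([0.05, 0.6, 0.3, 0.05], P=0.25)
```

With P = 0.25 this sample is judged to contain both emotion 1 and emotion 2, which is exactly the mixed-emotion case the threshold rule is designed to admit.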
In summary, the invention provides a network platform short text mixed emotion classification method based on a novel emotion dictionary. Collected network platform comment data are manually labeled and used as a training set; the training-set comments are processed and stored in an emotion dictionary, and the processed data are formed into training-set emotion vectors, which are input as feature representations into a DNN model for training to obtain a trained mixed-emotion classifier. A network platform text to be examined is segmented into words, queried against the constructed emotion dictionary to form the emotion vector of the text to be examined, and this emotion vector is finally input into the trained mixed-emotion classifier to determine which emotions the text contains.
Although the present invention has been described in detail with reference to the preferred embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted for elements thereof without departing from the spirit and scope of the present invention.

Claims (7)

1. A network platform short text mixed emotion classification method based on a novel emotion dictionary is characterized by comprising the following steps:
step 1, manually labeling the collected historical network platform short texts with emotions, and using them as a training set;
step 2, performing word segmentation processing on each sample in the training set, then calculating the word frequency of each word under each emotion, calculating the emotion weight of each word by using the word frequency of each word under each emotion, and storing each word and the emotion weight thereof in a dictionary in a key value pair mode to form an emotion dictionary;
step 3, accumulating the emotion weight under a certain emotion of all participles of each sample to obtain the sum of the emotion weight of the certain emotion of the sample, and combining the sum of the emotion weight of each emotion of the sample to form an emotion vector of a training set;
step 4, taking the emotion vectors of the training set as the feature representation of the input layer, and using the emotion vectors for the training of the DNN mixed emotion classification model to obtain a trained DNN model for mixed emotion classification;
step 5, performing word segmentation on the new network platform short text, searching each word after word segmentation in an emotion dictionary to obtain emotion weight corresponding to each word, respectively summing the emotion weights of all words under each emotion to obtain the sum of the emotion weights of the new network platform short text under each emotion, and combining the sum to form an emotion vector;
and step 6, inputting the emotion vector formed from the new network platform short text into the trained DNN model to obtain a vector containing the probability value of each emotion, and judging the two largest probability values against a set threshold; if such a probability value is greater than or equal to the threshold, the new network platform short text contains the corresponding emotion; otherwise it does not.
2. The network platform short text mixed emotion classification method based on a novel emotion dictionary according to claim 1, wherein the step 2 specifically comprises:
step 21, classifying the samples of different emotion labels in the training set to obtain sample sets under n classes of emotion labels, each set denoted S_i (1 ≤ i ≤ n); setting i = 1; creating an all_words set for recording all words that appear, initialized to empty; and letting the total number of training-set samples be N;
step 22, setting the words_i set to empty; letting the k-th sample of the i-th emotion-label training set be s_k^i; setting k = 1; and setting the counter count_i of the i-th emotion-label training set to count_i = 1;
step 23, segmenting the text part of s_k^i into words, and storing the segmentation results in the words_i set and in the all_words set;
step 24, letting k = k + 1 and count_i = count_i + 1, and repeating step 23 until k equals the total number of samples in the i-th emotion-label training set;
step 25, letting i = i + 1, and repeating steps 22 to 24 until i = n;
step 26, extracting a word w from the all_words set (without replacement), and counting the frequency of occurrence f_i^w of the word w in each words_i (1 ≤ i ≤ n); f_i^w is called the word frequency, and n_w is the number of nonzero f_i^w (1 ≤ i ≤ n);
step 27, calculating the emotion weight weight_i^w of the word w under each emotion label, from the 1st emotion label through the n-th;
step 28, if the all_words set is not empty, returning to step 26;
step 29, storing the emotion weight of the word w under the i-th emotion in the dictionary weight_i in key-value form as w: weight_i^w; n dictionaries are finally obtained, the n dictionaries together constitute the emotion dictionary, and each weight_i is called an emotion page of the emotion dictionary.
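Steps 21–29 above can be sketched as follows. The toy corpus, whitespace tokenization, and the weight formula (simple class-relative frequency) are illustrative stand-ins: the patent defines its own weight formula (given as an image in the original), so the weighting below is an assumption, not the claimed computation.

```python
from collections import Counter

# Sketch of claim-2 steps 21-29: build one weight dictionary ("emotion
# page") per emotion class from per-class word frequencies f_i^w.
# Corpus, tokenization, and weighting are illustrative assumptions.

def build_emotion_dict(labeled_samples, n):
    """labeled_samples: list of (text, label) with labels in 0..n-1;
    returns n dicts, one emotion page per class, mapping word -> weight."""
    words_per_class = [Counter() for _ in range(n)]  # f_i^w counts
    all_words = set()
    for text, label in labeled_samples:
        tokens = text.split()        # stand-in for real word segmentation
        words_per_class[label].update(tokens)
        all_words.update(tokens)
    pages = []
    for i in range(n):
        total = sum(words_per_class[i].values()) or 1
        # Assumed weight: relative frequency of w within class i.
        pages.append({w: words_per_class[i][w] / total for w in all_words})
    return pages

pages = build_emotion_dict([("good good film", 0), ("bad film", 1)], n=2)
```

Every word that appears anywhere ends up as a key on every page (with weight 0 where it never occurs in that class), so lookups in the later scoring steps never miss.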
3. The novel emotion dictionary-based network platform short text mixed emotion classification method as claimed in claim 2, wherein the emotion weight weight_i^w of the word w under the i-th emotion label in step 27 is calculated from the word frequencies f_i^w and the count n_w [calculation formula given as an image in the original document].
4. The novel emotion dictionary-based network platform short text mixed emotion classification method as claimed in claim 2, wherein said step 3 specifically comprises:
step 31, letting the k-th sample of the i-th emotion-label training set be s_k^i, and setting k = 1;
step 32, segmenting the text part of s_k^i into words; querying, with each segmented word w as the key, each emotion page weight_i of the emotion dictionary obtained in step 2 to obtain the value weight_i^w corresponding to the key w; then calculating the emotion score Score_i^k of the i-th emotion of s_k^i; and combining the emotion scores of all emotion classes to form the training-set emotion vector V_k = (Score_1^k, ..., Score_n^k).
5. The novel emotion dictionary-based network platform short text mixed emotion classification method as claimed in claim 4, wherein the emotion score Score_i^k in step 32 is calculated with the formula:

Score_i^k = Σ_{w ∈ s_k^i} weight_i^w
6. The novel emotion dictionary-based network platform short text mixed emotion classification method as claimed in claim 4, wherein said step 5 specifically comprises:
step 51, defining the new network platform short text as a test-set sample, the r-th sample of which is test_r; segmenting the text of test_r to obtain the word set of the sample test_r after segmentation;
step 52, querying, with each word in the word set as the key, the emotion weight corresponding to each word in the emotion dictionary obtained in step 2, then using the formula Score_i^r = Σ_{w ∈ test_r} weight_i^w to obtain the i-th emotion score Score_i^r of the test sample test_r, and combining the emotion scores of each emotion to form the emotion vector V_r = (Score_1^r, ..., Score_n^r).
7. The novel emotion dictionary-based network platform short text mixed emotion classification method as claimed in claim 6, wherein said step 6 specifically comprises:
step 61, inputting the V_r obtained in step 5 into the trained DNN model to obtain a vector V_p = (p_1, ..., p_n) containing the probability value of each emotion, where p_i (1 ≤ i ≤ n) is the probability that the test sample test_r contains the i-th emotion;
step 62, taking from V_p = (p_1, ..., p_n) the largest and the second-largest probability values, denoted p_j and p_l;
step 63, setting an emotion threshold P and judging p_j and p_l against P with the following rules:
if p_j ≥ P and p_l ≥ P, the test sample test_r contains the j-th and the l-th emotions;
if p_j ≥ P and p_l < P, the test sample test_r contains the j-th emotion;
if p_j < P and p_l ≥ P, the test sample test_r contains the l-th emotion;
if p_j < P and p_l < P, the test sample test_r contains no emotion.
CN202011408818.9A 2020-12-03 2020-12-03 Network platform short text mixed emotion classification method based on novel emotion dictionary Active CN112507071B (en)

Publications (2)

Publication Number Publication Date
CN112507071A true CN112507071A (en) 2021-03-16
CN112507071B CN112507071B (en) 2022-10-14



Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant