WO2019080863A1 - Text sentiment classification method, storage medium and computer - Google Patents

Text sentiment classification method, storage medium and computer

Info

Publication number
WO2019080863A1
WO2019080863A1 PCT/CN2018/111607 CN2018111607W WO2019080863A1 WO 2019080863 A1 WO2019080863 A1 WO 2019080863A1 CN 2018111607 W CN2018111607 W CN 2018111607W WO 2019080863 A1 WO2019080863 A1 WO 2019080863A1
Authority
WO
WIPO (PCT)
Prior art keywords
underlying
feature vector
vector
classification
text
Application number
PCT/CN2018/111607
Other languages
French (fr)
Chinese (zh)
Inventor
曾伟波
郑耀松
倪时龙
苏江文
许成功
吕君玉
何天尝
林祥仙
Original Assignee
福建亿榕信息技术有限公司
国家电网有限公司
国网信息通信产业集团有限公司
Application filed by 福建亿榕信息技术有限公司, 国家电网有限公司, 国网信息通信产业集团有限公司

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools

Definitions

  • The present invention relates to the field of machine learning, and in particular to a text sentiment classification method and storage medium.
  • Sentiment classification is mainly used to analyze or predict the sentiment category of a text that carries a sentiment orientation. Categories are generally positive and negative, or positive, negative and neutral. Depending on the granularity of the object studied, sentiment analysis techniques can be roughly divided into three levels: word-level, sentence-level and document-level sentiment analysis.
  • Word-level sentiment classification can be divided into dictionary-based and corpus-based sentiment classification models.
  • The dictionary-based model relies on synonym and antonym relations in existing dictionaries to judge the sentiment orientation of words in a text. Some scholars take words such as "good" and "bad" as benchmark words and then compute the difference between a vocabulary word's mutual information with each benchmark word. Some researchers use HowNet to detect the fuzzy sentiment categories of adjectives in the text, and compute net coverage scores to distinguish adjectives with uncertain sentiment categories from core adjectives whose sentiment category is determined.
  • The corpus-based model identifies the sentiment orientation of words mainly through statistical analysis of existing corpora. Some researchers have proposed a method based on sentiment consistency theory.
  • Sentence-level sentiment classification can be divided into two sub-directions: semantics-based and statistics-based sentiment classification.
  • Semantics-based sentiment classification matches a sentiment dictionary to find the sentiment words in a sentence and then computes the sentiment of the whole sentence from the intensity or polarity of those words.
  • Some scholars have tried to use Rhetorical Structure Theory to determine the sentiment orientation of sentences. First, the sentences are divided into blocks of text elements according to the theory, and each block is assigned a weight according to its importance to the overall sentiment of the document. The sentiment is then predicted from the weighted sentiment score of the sentence as a whole.
  • Statistics-based sentiment analysis relies on machine learning.
  • A model is trained on labeled data with a machine learning algorithm and then used to predict the sentiment orientation of unseen text.
  • Some researchers construct feature vectors from the number of positive and negative sentiment words, negation words, special keywords, part-of-speech tags, emoticons and the like, and use machine learning to classify Twitter data by sentiment orientation.
  • With the rise of deep learning, some researchers use recursive neural networks to combine phrase vectors and word vectors and feed the result into a classifier as features for sentiment orientation analysis; experiments have demonstrated the effectiveness of such methods.
  • The inventors provide a text sentiment classification method comprising the following steps: constructing a sentiment dictionary for the input text, the construction step including part-of-speech selection and base-level feature vector extraction; extracting mid-level features by collecting the word vectors of the training samples in combination with the sentiment dictionary and pooling them to obtain mid-level feature vectors; performing weighted fusion on the base-level and mid-level feature vectors to obtain a fused feature vector; and calculating the classification result based on a base-level feature vector classification model, a mid-level feature vector classification model and a fused feature vector classification model, respectively.
  • The base-level feature vector extraction represents the base-level features with a vector space model in which each dimension is a normalized TF-IDF weight.
  • The base-level feature vector and the mid-level feature vector are combined by weighted fusion, in which the base-level features carry a weight and the weighted vectors are concatenated.
  • Pooling the word vectors comprises: dividing the dimensions of the base-level feature vector evenly into several parts, summing the word vectors in each part, and concatenating the sums in sequential order.
  • A text sentiment classification storage medium stores a computer program that, when executed by a processor, implements the following steps: constructing a sentiment dictionary for the input text, the construction step including part-of-speech selection and base-level feature vector extraction; extracting mid-level features by collecting the word vectors of the training samples in combination with the sentiment dictionary and pooling them to obtain mid-level feature vectors; performing weighted fusion on the base-level and mid-level feature vectors to obtain a fused feature vector; and calculating the classification result based on the base-level feature vector classification model, the mid-level feature vector classification model and the fused feature vector classification model.
  • In the storage medium, the base-level feature vector extraction likewise represents the base-level features with a vector space model in which each dimension is a normalized TF-IDF weight.
  • The base-level feature vector and the mid-level feature vector are likewise combined by weighted fusion.
  • Pooling the word vectors further comprises: dividing the dimensions of the base-level feature vector evenly into several parts, summing the word vectors in each part, and concatenating the sums in sequential order.
  • A computer comprising the storage medium described above.
  • Through learning, the present invention establishes an efficient, stable, low-dimensional sentiment dictionary and continues to use it, combining feature fusion with classifier fusion to improve classification accuracy. Generating the classification result from the base-level, mid-level and fused feature vectors with three classifiers makes the final result more stable and robust.
  • The detailed pooling process also reduces the computational cost of the method.
  • The present invention thus addresses the low efficiency and insufficient accuracy of prior-art text sentiment classification.
  • FIG. 1 is a flowchart of a text sentiment classification method according to an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of the complete process of a text sentiment classification method according to an embodiment of the present invention.
  • FIG. 3 is a diagram of the pooling process according to an embodiment of the present invention.
  • FIG. 4 is a feature fusion diagram according to an embodiment of the present invention.
  • FIG. 1 shows a text sentiment classification method.
  • The method is a sentiment classification model based on the extreme learning machine (ELM).
  • The extreme learning machine is a single-hidden-layer feedforward neural network (SLFN).
  • The network consists of an input layer, a hidden layer and an output layer.
  • The input layer is fully connected to the hidden layer, and the hidden layer is fully connected to the output layer.
  • The method of the invention may begin with step S100.
  • The sentiment dictionary construction step includes part-of-speech selection and base-level feature vector extraction.
  • In some embodiments, the sentiment dictionary construction step comprises two processes: part-of-speech selection and base-level feature selection.
  • Part-of-speech selection: the present invention selects nouns, verbs, adjectives and adverbs together as reference words, and the sentiment dictionary may be the set of reference words of these four parts of speech that appear in all selected materials. Words of different parts of speech are combined to form the latent semantic information of a document, which maximizes the coverage of the sentiment dictionary while retaining the semantic information of the document.
  • Base-level feature vector extraction applies a chi-square-statistic-based feature selection principle to further select the feature words that best represent the sentiment polarity of the text.
  • The selected base-level features are represented with a vector space model, in which each dimension of the vector is the normalized TF-IDF weight.
  • In step S102, mid-level features are extracted: in combination with the sentiment dictionary, the word vectors of the training samples are collected and pooled to obtain the mid-level feature vectors. Specifically, a Skip-gram model can be trained in an unsupervised manner, and the training samples are then fed to the trained model to generate their word vectors.
  • The specific pooling steps are shown in FIG. 3.
  • For each word vector group, all word vectors in the group are summed, so that each group yields a feature vector v(z) whose dimension is also k.
  • The present invention further performs step S104, in which the base-level and mid-level feature vectors are combined by weighted fusion to obtain the fused feature vector, and step S106, in which the classification result is calculated based on the base-level feature vector classification model, the mid-level feature vector classification model and the fused feature vector classification model.
  • Specifically, to classify the sentiment of an input sample, the base-level, mid-level and fused features of the sample are fed into the corresponding trained ELM-based sentiment classification models.
  • The output vectors of the three classification models are summed to obtain the final discriminant vector, and the label corresponding to the largest value in that vector is the final sentiment category.
  • Before step S100, the text may be preprocessed to remove information irrelevant to the task. Preprocessing includes encoding normalization, illegal-character removal, word segmentation with part-of-speech tagging, and stop-word removal.
  • Encoding normalization unifies the text encoding, for example converting all text content to UTF-8; illegal characters can be filtered out by regular-expression matching; word segmentation and part-of-speech tagging are performed with the ICTCLAS Chinese lexical analysis system; and stop-word removal uses a stop-word list to filter out words that appear frequently in text but carry little meaning for sentiment analysis.
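The preprocessing described in the bullets above (encoding normalization, illegal-character removal, segmentation with part-of-speech tagging, stop-word removal) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: ICTCLAS is not called, so the stop-word filter assumes its input is a list of (word, part-of-speech) pairs already produced by a segmenter; the stop-word list, the source encoding `gbk`, and the definition of an "illegal" character (anything other than CJK ideographs, ASCII letters, digits and whitespace) are hypothetical choices.

```python
import re

# Hypothetical stop-word table; a real system would load a full list from file.
STOP_WORDS = {"的", "了", "是", "在", "和"}

# Assumption: everything except CJK ideographs, ASCII letters/digits and
# whitespace counts as an "illegal" character.
ILLEGAL_CHARS = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9\s]")


def normalize_encoding(raw: bytes, source_encoding: str = "gbk") -> str:
    """Decode raw bytes into a Unicode string.

    Python strings are Unicode, so writing the result with encoding="utf-8"
    yields the unified UTF-8 format; undecodable bytes are dropped.
    """
    return raw.decode(source_encoding, errors="ignore")


def remove_illegal_chars(text: str) -> str:
    """Filter out illegal characters with a regular expression."""
    return ILLEGAL_CHARS.sub("", text)


def filter_stop_words(tagged_tokens):
    """Drop stop words from (word, pos) pairs produced by a segmenter/tagger."""
    return [(word, pos) for word, pos in tagged_tokens if word not in STOP_WORDS]


if __name__ == "__main__":
    raw = "这部电影的剧情很好!!".encode("gbk")
    text = remove_illegal_chars(normalize_encoding(raw))
    print(text)  # 这部电影的剧情很好
    # Hypothetical segmenter output for the cleaned text.
    tagged = [("这部", "r"), ("电影", "n"), ("的", "u"), ("剧情", "n"), ("很", "d"), ("好", "a")]
    print(filter_stop_words(tagged))
```

Keeping each stage as a separate function mirrors the order of the steps listed above and lets a real segmenter be slotted in between character cleanup and stop-word filtering.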

Abstract

A text sentiment classification method, a storage medium and a computer. The method comprises the following steps: constructing a sentiment dictionary for input text, the construction step comprising part-of-speech selection and base-level feature vector extraction; extracting mid-level features by acquiring word vectors of training samples in combination with the sentiment dictionary and pooling those word vectors to obtain mid-level feature vectors; performing weighted fusion on the base-level and mid-level feature vectors to obtain fused feature vectors; and calculating a classification result on the basis of a base-level feature vector classification model, a mid-level feature vector classification model and a fused feature vector classification model. The present invention solves the problem that prior-art sentiment classification is not sufficiently efficient and stable.

Description

Text sentiment classification method, storage medium and computer
Cross-reference to related applications
This application is based on and claims priority to Chinese patent application No. 201711012851.8, filed on October 26, 2017, the entire contents of which are incorporated herein by reference.
Technical field
The present invention relates to the field of machine learning, and in particular to a text sentiment classification method and storage medium.
Background
Sentiment classification is mainly used to analyze or predict the sentiment category of a text that carries a sentiment orientation. Categories are generally positive and negative, or positive, negative and neutral. Depending on the granularity of the object studied, sentiment analysis techniques can be roughly divided into three levels: word-level, sentence-level and document-level sentiment analysis.
Word-level sentiment classification can be divided into dictionary-based and corpus-based sentiment classification models. The dictionary-based model relies on synonym and antonym relations in existing dictionaries to judge the sentiment orientation of words in a text. Some scholars take words with an obvious orientation, such as "good" and "bad", as benchmark words and then compute the difference between a vocabulary word's mutual information with each benchmark word. Some scholars use HowNet annotations to detect the fuzzy sentiment categories of adjectives in the text, and compute net coverage scores to distinguish adjectives whose sentiment category is uncertain from core adjectives whose sentiment category is determined. The corpus-based model identifies the sentiment orientation of words mainly through statistical analysis of existing corpora. Some scholars have proposed a method based on sentiment consistency theory: they argue that different conjunctions imply latent semantic relations, so the conjunctions in a corpus can be used to mine the semantic sentiment of out-of-vocabulary words. Other scholars have proposed a method to address the domain dependence of sentiment words: the existing corpus is first used to extract sentiment words and sentiment targets from the text, which are then formed into sentiment collocation pairs; a heuristic algorithm computes the sentiment of each pair, and the results are assembled into a sentiment collocation dictionary. This approach resolves the context dependence of sentiment words to a certain extent.
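The benchmark-word approach described above is commonly formalized as semantic orientation from pointwise mutual information (SO-PMI). The standard formulation is shown below for reference; it is the general technique, not a formula stated in this patent, and the probabilities would be estimated from co-occurrence counts in a corpus.

```latex
\begin{aligned}
\mathrm{PMI}(w_1, w_2) &= \log_2 \frac{p(w_1, w_2)}{p(w_1)\,p(w_2)} \\
\mathrm{SO}(w) &= \mathrm{PMI}(w, \text{good}) - \mathrm{PMI}(w, \text{bad}) \\
               &= \log_2 \frac{p(w, \text{good})\,p(\text{bad})}{p(w, \text{bad})\,p(\text{good})}
\end{aligned}
```

The $p(w)$ terms cancel in the difference. A positive $\mathrm{SO}(w)$ suggests a positive orientation for $w$, and a negative value a negative one.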
Sentence-level sentiment classification can be divided into two sub-directions: semantics-based and statistics-based sentiment classification. Semantics-based sentiment classification matches a sentiment dictionary to find the sentiment words in a sentence and then computes the sentiment of the whole sentence from the sentiment intensity or polarity of those words. Some scholars have tried to use Rhetorical Structure Theory to determine the sentiment orientation of sentences: the sentence is first divided into blocks of text elements according to the theory, each block is assigned a weight according to its importance to the overall sentiment of the document, and the weighted sentiment score of the whole sentence is finally used for sentiment prediction. Statistics-based sentiment analysis is machine-learning-based: a model is trained on already-labeled data with a machine learning algorithm and then used to predict the sentiment orientation of unseen text. Some scholars construct feature vectors from the number of positive and negative sentiment words, negation words, special keywords, part-of-speech tags, emoticons and the like, and use machine learning to classify Twitter data by sentiment orientation. With the rise of deep learning, some scholars have also used recursive neural networks to combine phrase vectors and word vectors and fed them as features into a classifier for sentiment orientation analysis; experiments have demonstrated the effectiveness of such methods.
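The semantics-based, dictionary-matching approach to sentence-level classification described above can be illustrated with a toy scorer. This is a generic sketch of that prior-art idea, not part of the claimed method; the lexicon, its intensity values, and the rule that a negation word flips the polarity of the next sentiment word are all hypothetical.

```python
# Hypothetical sentiment lexicon: word -> signed intensity (positive > 0).
LEXICON = {"好": 1.0, "精彩": 2.0, "差": -1.0, "糟糕": -2.0}
# Hypothetical negation words; a negation flips the sign of the next sentiment word.
NEGATIONS = {"不", "没有"}


def sentence_score(tokens):
    """Sum the signed intensities of the sentiment words found in a sentence."""
    score = 0.0
    negate = False
    for token in tokens:
        if token in NEGATIONS:
            negate = True
        elif token in LEXICON:
            value = LEXICON[token]
            score += -value if negate else value
            negate = False  # negation only affects one sentiment word
    return score


def sentence_polarity(tokens):
    """Map the aggregate score to positive / negative / neutral."""
    score = sentence_score(tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    print(sentence_polarity(["剧情", "很", "精彩"]))  # positive
    print(sentence_polarity(["不", "好"]))  # negative
```

Approaches of this kind depend entirely on lexicon coverage, which is one motivation for the learned, statistics-based methods surveyed next.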
Document-level sentiment classification mainly studies the overall sentiment of document-level texts such as news articles and blogs, with research focused on the semantic information of the text. One proposed method analyzes the evaluative phrases that appear in document-level texts; by analyzing their sentiment orientation, it semi-automatically builds a sentiment dictionary and then uses that dictionary to analyze the overall sentiment of the document. Machine-learning-based sentiment analysis of document-level texts is more common: such methods use resources such as sentiment words and phrases to build a document-level sentiment classification model with the classic support vector machine algorithm. Another class of methods first divides a document into multiple sentences and analyzes the sentiment of each sentence with a maximum entropy algorithm; the sentiment orientation of each sentence is then combined with features such as its position and sentence pattern to form document features, which are fed into a support vector machine to train a document-level sentiment classifier, also with good results.
Summary of the invention
To this end, it is necessary to provide a text sentiment classification method that solves the problem that prior-art sentiment classification is not sufficiently efficient and stable.
To achieve the above object, the inventors provide a text sentiment classification method comprising the following steps: constructing a sentiment dictionary for the input text, the construction step including part-of-speech selection and base-level feature vector extraction; extracting mid-level features by collecting the word vectors of the training samples in combination with the sentiment dictionary and pooling them to obtain mid-level feature vectors; performing weighted fusion on the base-level and mid-level feature vectors to obtain a fused feature vector; and calculating the classification result based on a base-level feature vector classification model, a mid-level feature vector classification model and a fused feature vector classification model, respectively.
In the above solution, the base-level vector extraction specifically represents the base-level features with a vector space model, in which the feature of each dimension is the normalized TF-IDF weight.
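A minimal sketch of this base-level representation: each document becomes a vector over the selected feature words whose entries are TF-IDF weights. The patent states only that the weights are normalized; the term-frequency definition (count divided by document length), the IDF definition log(N/df), and the choice of L2 normalization used here are assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(docs, vocabulary):
    """Map tokenized documents to L2-normalized TF-IDF vectors over `vocabulary`.

    ASSUMPTIONS: tf = count / document length, idf = log(N / df),
    and "normalized" is taken to mean unit L2 norm.
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each word once per doc
    idf = {w: math.log(n_docs / df[w]) if df[w] else 0.0 for w in vocabulary}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        raw = [counts[w] / len(doc) * idf[w] if doc else 0.0 for w in vocabulary]
        norm = math.sqrt(sum(x * x for x in raw))
        # Leave all-zero vectors unchanged instead of dividing by zero.
        vectors.append([x / norm for x in raw] if norm > 0 else raw)
    return vectors


if __name__ == "__main__":
    docs = [["好", "电影"], ["差", "电影"]]
    print(tfidf_vectors(docs, ["好", "差", "电影"]))
```

In the usage example, "电影" occurs in every document, so its IDF (and therefore its weight) is zero, while the polarity words dominate each vector.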
In the above solution, the weighted fusion of the base-level feature vector and the mid-level feature vector is expressed as
Figure PCTCN2018111607-appb-000001
where L denotes the base-level feature vector, M denotes the mid-level feature vector,
Figure PCTCN2018111607-appb-000002
is the weight of the base-level features, and || denotes concatenation.
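The fusion formula itself survives only as an image, so its exact form is not reproduced here. The sketch below shows one plausible reading under a loudly stated assumption: the base-level vector L is scaled by a weight `lam` (standing in for the weight symbol in the image), the mid-level vector M is scaled by `1 - lam`, and the two are concatenated. Whether the original applies `1 - lam` to M is not confirmed by the text.

```python
import numpy as np


def fuse_features(base: np.ndarray, mid: np.ndarray, lam: float = 0.5) -> np.ndarray:
    """Weighted fusion by concatenation: (lam * L) || ((1 - lam) * M).

    ASSUMPTION: the complementary weight (1 - lam) on M is hypothetical;
    the patent text only states that a weight applies to the base-level features.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    return np.concatenate([lam * base, (1.0 - lam) * mid])


if __name__ == "__main__":
    fused = fuse_features(np.array([1.0, 2.0]), np.array([4.0]), lam=0.25)
    print(fused)
```

Concatenation keeps both representations intact, so the fused vector has len(L) + len(M) dimensions and the weight only rebalances their scales.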
In the above solution, pooling the word vectors specifically comprises: dividing the dimensions of the base-level feature vector evenly into several parts, summing the word vectors in each part, and concatenating the sums in sequential order.
A text sentiment classification storage medium stores a computer program that, when executed by a processor, implements the following steps: constructing a sentiment dictionary for the input text, the construction step including part-of-speech selection and base-level feature vector extraction; extracting mid-level features by collecting the word vectors of the training samples in combination with the sentiment dictionary and pooling them to obtain mid-level feature vectors; performing weighted fusion on the base-level and mid-level feature vectors to obtain a fused feature vector; and calculating the classification result based on the base-level feature vector classification model, the mid-level feature vector classification model and the fused feature vector classification model, respectively.
In the above solution, the base-level vector extraction specifically represents the base-level features with a vector space model, in which the feature of each dimension is the normalized TF-IDF weight.
Specifically, the weighted fusion of the base-level feature vector and the mid-level feature vector is expressed as
Figure PCTCN2018111607-appb-000003
where L denotes the base-level feature vector, M denotes the mid-level feature vector,
Figure PCTCN2018111607-appb-000004
is the weight of the base-level features, and || denotes concatenation.
In the above solution, pooling the word vectors further comprises: dividing the dimensions of the base-level feature vector evenly into several parts, summing the word vectors in each part, and concatenating the sums in sequential order.
A computer comprising the storage medium described above.
Unlike the prior art, the present invention establishes through learning an efficient, stable, low-dimensional sentiment dictionary and continues to use it, while combining feature fusion and classifier fusion to improve classification accuracy. Using the base-level, mid-level and fused feature vectors, with three classifiers producing the classification results, makes the final classification result more stable and robust. The detailed pooling process also reduces the computational cost of the method. In summary, the present invention solves the problems of low efficiency and insufficient accuracy in prior-art text sentiment classification.
Brief description of the drawings
FIG. 1 is a flowchart of a text sentiment classification method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the complete process of a text sentiment classification method according to an embodiment of the present invention;
FIG. 3 is a diagram of the pooling process according to an embodiment of the present invention;
FIG. 4 is a feature fusion diagram according to an embodiment of the present invention.
Detailed description
To describe the technical content, structural features, objectives and effects of the technical solution in detail, a detailed explanation is given below with reference to specific embodiments and the accompanying drawings.
Referring to FIG. 1, a text sentiment classification method is shown. The method is a sentiment classification model based on the extreme learning machine (ELM). The ELM is a single-hidden-layer feedforward neural network (Single-hidden Layer Feedforward Neural Network, SLFN) consisting of an input layer, a hidden layer and an output layer; the input layer is fully connected to the hidden layer, and the hidden layer is fully connected to the output layer. The method of the invention may begin with step S100.
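For readers unfamiliar with the extreme learning machine, the following NumPy sketch shows the standard ELM training scheme: input-to-hidden weights and biases are drawn at random and never trained, and only the hidden-to-output weights are solved in closed form with the Moore-Penrose pseudoinverse. The sigmoid activation, the uniform initialization range, the hidden-layer size and the class name `ELM` are illustrative assumptions; the patent does not specify them.

```python
import numpy as np


class ELM:
    """Minimal extreme learning machine classifier (single hidden layer)."""

    def __init__(self, n_hidden: int = 100, seed: int = 0):
        self.n_hidden = n_hidden
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X: np.ndarray) -> np.ndarray:
        # Sigmoid activation of the randomly weighted hidden layer.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X: np.ndarray, y: np.ndarray, n_classes: int) -> "ELM":
        # Input->hidden weights and biases are random and stay fixed.
        self.W = self.rng.uniform(-1.0, 1.0, size=(X.shape[1], self.n_hidden))
        self.b = self.rng.uniform(-1.0, 1.0, size=self.n_hidden)
        H = self._hidden(X)
        T = np.eye(n_classes)[y]  # one-hot target matrix
        # Hidden->output weights: least-squares solution via the pseudoinverse.
        self.beta = np.linalg.pinv(H) @ T
        return self

    def decision_function(self, X: np.ndarray) -> np.ndarray:
        """Output-layer vector for each sample (one score per class)."""
        return self._hidden(X) @ self.beta

    def predict(self, X: np.ndarray) -> np.ndarray:
        return self.decision_function(X).argmax(axis=1)


if __name__ == "__main__":
    X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
    y = np.array([0, 1, 1, 0])
    model = ELM(n_hidden=20).fit(X, y, n_classes=2)
    print(model.predict(X))
```

Because training reduces to one pseudoinverse, an ELM trains quickly, which suits a design like this one where three separate classifiers (base-level, mid-level and fused features) must be trained.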
In step S100, a sentiment dictionary is constructed for the input text; the construction step includes part-of-speech selection and base-level feature vector extraction. In some embodiments, as shown in FIG. 2, sentiment dictionary construction comprises two processes: part-of-speech selection and base-level feature selection. For part-of-speech selection, the present invention selects nouns, verbs, adjectives and adverbs together as reference words, and the sentiment dictionary may be the set of reference words of these four parts of speech that appear in all selected materials. Words of different parts of speech are combined to form the latent semantic information of a document, which maximizes the coverage of the sentiment dictionary while retaining the semantic information of the document. Base-level feature vector extraction applies a chi-square-statistic-based feature selection principle to further select the feature words that best represent the sentiment polarity of the text. The selected base-level features are represented with a vector space model, in which the feature of each dimension of the vector is the normalized TF-IDF weight.
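Part-of-speech selection and chi-square feature selection can be sketched as below. Assumptions: tokens arrive as (word, tag) pairs from a tagger whose tags begin with `n`, `v`, `a` or `d` for nouns, verbs, adjectives and adverbs (as in ICTCLAS-style tag sets); the chi-square score uses the common 2x2 document-count contingency table; and a word's overall score is its maximum over classes. The patent does not state these details, and the function names are hypothetical.

```python
# Assumed tag prefixes: noun, verb, adjective, adverb.
KEPT_POS_PREFIXES = ("n", "v", "a", "d")


def build_reference_words(tagged_docs):
    """Collect every noun, verb, adjective and adverb seen in the corpus."""
    return {
        word
        for doc in tagged_docs
        for word, tag in doc
        if tag.startswith(KEPT_POS_PREFIXES)
    }


def chi_square(term, doc_sets, labels, target):
    """Chi-square statistic of `term` vs. class `target` from document counts."""
    a = b = c = d = 0
    for doc, label in zip(doc_sets, labels):
        has_term = term in doc
        in_class = label == target
        if has_term and in_class:
            a += 1  # term present, document in class
        elif has_term:
            b += 1  # term present, document not in class
        elif in_class:
            c += 1  # term absent, document in class
        else:
            d += 1  # term absent, document not in class
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


def select_features(candidates, docs, labels, k):
    """Keep the k candidate words with the highest max-over-classes chi-square."""
    doc_sets = [set(doc) for doc in docs]
    classes = set(labels)
    scores = {
        word: max(chi_square(word, doc_sets, labels, cls) for cls in classes)
        for word in candidates
    }
    # Sort by descending score, then by word for a deterministic order.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


if __name__ == "__main__":
    tagged = [
        [("电影", "n"), ("好", "a")],
        [("电影", "n"), ("差", "a")],
        [("剧情", "n"), ("好", "a")],
        [("的", "u"), ("差", "a")],
    ]
    labels = ["pos", "neg", "pos", "neg"]
    refs = build_reference_words(tagged)
    docs = [[w for w, _ in doc] for doc in tagged]
    print(select_features(refs, docs, labels, k=2))
```

A word spread evenly across classes (such as "电影" in the example) scores zero, while words concentrated in one class score highest, which is why chi-square suits picking polarity-bearing features.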
S102: Middle-layer feature extraction. In combination with the sentiment dictionary, collect the word vectors of the training samples and pool them to obtain the middle-layer feature vector. Specifically, a Skip-gram model can be trained in an unsupervised manner, and the training samples are then fed into the trained model to produce their word vectors. The specific pooling steps are shown in FIG. 3:
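Skip-gram is trained on (center word, context word) pairs. The sketch below shows how such pairs are formed and how trained vectors might then be looked up for dictionary words; it does not train the model. The `window` size, the `lookup_vectors` helper and the rule of keeping only words that are both in the sentiment dictionary and in the model's vocabulary are illustrative assumptions.

```python
def skipgram_pairs(tokens, window=2):
    """(center, context) pairs that the Skip-gram objective learns from: every word
    is trained to predict each word within `window` positions on either side."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


def lookup_vectors(tokens, embeddings, sentiment_dict):
    """Collect, in text order, the vectors of tokens that are both in the sentiment
    dictionary and known to the trained model (hypothetical helper)."""
    return [embeddings[t] for t in tokens if t in sentiment_dict and t in embeddings]
```

The list returned by `lookup_vectors` is the sequence of t word vectors that the pooling steps below operate on.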
(1) Divide the dimension count of the word vectors into several equal parts, sum the word vectors within each part, and concatenate the sums in order. Suppose a text contains x words, of which t remain after underlying feature extraction. The text is represented as $T = (w_1, w_2, \ldots, w_t)$, where each word $w_i$ is represented by a word vector with k features (k dimensions);
(2) Divide the word vectors of text T equally into N parts, forming N word vector groups with t/N word vectors in each group;
(3) For each word vector group, sum all word vectors in the group, so that each group yields a feature vector $v(z)$, whose dimension is also k;
(4) Concatenate the feature vectors of the N word vector groups to obtain a new feature vector for the whole document: $v(z_1) \,\|\, v(z_2) \,\|\, \cdots \,\|\, v(z_N)$, where $\|$ denotes concatenation.
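Steps (2) to (4) can be sketched as a group-sum pooling function. The source assumes t is divisible by N; the contiguous split used here, in which group sizes otherwise differ by at most one (and empty groups contribute zero vectors when t < N), is an assumption, as is the name `group_sum_pool`.

```python
def group_sum_pool(word_vectors, n_groups):
    """Split t word vectors (each k-dim) into n_groups contiguous groups, sum each
    group element-wise, and concatenate the sums: v(z_1) || ... || v(z_N)."""
    t = len(word_vectors)
    if t == 0:
        raise ValueError("need at least one word vector")
    k = len(word_vectors[0])
    pooled = []
    for g in range(n_groups):
        # Group boundaries; sizes differ by at most one when t % n_groups != 0.
        start, end = g * t // n_groups, (g + 1) * t // n_groups
        group_sum = [0.0] * k
        for vec in word_vectors[start:end]:
            group_sum = [s + v for s, v in zip(group_sum, vec)]
        pooled.extend(group_sum)  # append v(z_g) to the concatenation
    return pooled
```

The result always has N·k dimensions regardless of text length, which gives texts of different lengths a common classifier input size while keeping some word-order information through the group positions.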
As shown in FIG. 1 and FIG. 4, the invention further performs step S104, in which the underlying feature vector and the middle-layer feature vector are fused with weights to obtain a fused feature vector, and step S106, in which classification results are computed by an underlying-feature-vector classification model, a middle-layer-feature-vector classification model and a fused-feature-vector classification model, respectively. In some embodiments, referring to FIG. 2, the sentiment class of an input sample is determined as follows: the underlying features, middle-layer features and fused features of the sample to be classified are fed into the corresponding trained ELM-based sentiment classification models, and the output vectors of the three models are summed to obtain the final discriminant vector. The label corresponding to the largest value in that vector is the final sentiment class.
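The final decision rule (sum the three models' output vectors, then take the label with the largest value) can be sketched directly. The scores in the test are made up, and resolving ties to the earliest label is an assumption; in the patent each output vector would come from one of the three trained ELM models.

```python
def fuse_classifier_outputs(outputs, labels):
    """Sum the output vectors of several classifiers element-wise and return the
    label with the largest summed score (ties go to the earliest label)."""
    n = len(labels)
    if any(len(o) != n for o in outputs):
        raise ValueError("every output vector must have one score per label")
    total = [sum(o[i] for o in outputs) for i in range(n)]  # discriminant vector
    best = max(range(n), key=total.__getitem__)
    return labels[best], total
```

Summing scores lets a confident model outvote two uncertain ones, which is the stabilizing effect the description attributes to classifier fusion.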
Through the above steps, the invention learns an efficient, stable, low-dimensional sentiment dictionary and uses it, together with feature fusion and classifier fusion, to improve classification accuracy. Producing classification results with three classifiers over the underlying, middle-layer and fused feature vectors makes the final result more stable and robust. The pooling process also reduces the computational cost of the method. In summary, the invention addresses the low efficiency and insufficient accuracy of prior-art text sentiment classification.
In some further embodiments, the weighted fusion of the underlying feature vector and the middle-layer feature vector is expressed as
Figure PCTCN2018111607-appb-000005
where L denotes the underlying feature vector, M denotes the middle-layer feature vector, the symbol shown in Figure PCTCN2018111607-appb-000006 is the weight of the underlying feature, and || denotes concatenation. In this way, the proportion in which the underlying and middle-layer feature vectors are combined can be adjusted precisely to the user's needs, and tuning this combination improves the classification accuracy of the model.
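The fusion formula itself is only available as a figure, so the sketch below assumes the common form αL || (1 - α)M, with α the weight of the underlying feature; the patent's exact weighting of M may differ. The function name and the range check on `alpha` are illustrative.

```python
def weighted_fusion(low, mid, alpha):
    """Concatenate the alpha-weighted underlying vector L with the middle-layer
    vector M. Weighting M by (1 - alpha) is an assumption, not from the source."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return [alpha * x for x in low] + [(1.0 - alpha) * x for x in mid]
```

Because the two parts are concatenated rather than added, L and M may have different dimensions; α only rescales their relative contribution to the fused classifier's input.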
In some embodiments, a preprocessing step may be performed before step S100 to remove information irrelevant to the task, including encoding normalization, illegal-character removal, word segmentation with part-of-speech tagging, and stop-word removal. Encoding normalization unifies the text encoding, for example converting all text content to UTF-8; illegal characters can be filtered out by regular-expression matching; word segmentation and part-of-speech tagging use the ICTCLAS Chinese lexical analysis system; and stop-word removal uses a stop-word list to filter out words that occur frequently in text but contribute little to sentiment analysis. Preprocessing makes the segmented text better suited to the classifier and greatly speeds up text recognition by the method of the invention.
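A minimal sketch of three of these preprocessing operations, assuming the text has already been segmented (ICTCLAS segmentation and tagging are not reproduced here). Treating everything other than CJK ideographs, ASCII letters and digits as an "illegal" character is an assumption, as are the function names; the stop-word set would come from a real stop-word table.

```python
import re

# Assumed policy: anything outside CJK ideographs, ASCII letters and digits is "illegal".
_ILLEGAL = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9]+")


def normalize_encoding(raw: bytes, source_encoding: str = "utf-8") -> str:
    """Decode raw bytes into a str; store it with .encode("utf-8") to unify on UTF-8."""
    return raw.decode(source_encoding, errors="replace")


def remove_illegal(text: str) -> str:
    """Replace each run of illegal characters with a space, then trim the ends."""
    return _ILLEGAL.sub(" ", text).strip()


def remove_stopwords(tokens, stopwords):
    """Drop frequent but sentiment-neutral words listed in the stop-word table."""
    return [t for t in tokens if t not in stopwords]
```

For instance, GBK-encoded input decoded with `normalize_encoding(raw, "gbk")` becomes an ordinary string that can be re-encoded as UTF-8, and `remove_illegal` strips emoji and punctuation runs before segmentation.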
A text sentiment classification storage medium stores a computer program which, when executed by a processor, implements the following steps: constructing a sentiment dictionary from the input text, the sentiment dictionary construction step including part-of-speech selection and underlying feature vector extraction; middle-layer feature extraction, in which word vectors of the training samples are collected in combination with the sentiment dictionary and pooled to obtain a middle-layer feature vector; and weighted fusion of the underlying feature vector and the middle-layer feature vector to obtain a fused feature vector, with classification results computed by the underlying-feature-vector, middle-layer-feature-vector and fused-feature-vector classification models, respectively.
Further, the underlying vector extraction specifically expresses the underlying features with a vector space model, in which the feature in each dimension is the normalized TF-IDF weight.
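The vector space representation with normalized TF-IDF weights can be sketched as follows. The source does not fix the TF-IDF variant, so raw term counts, idf = log(N/df) and L2 normalization are assumptions; `vocabulary` would be the chi-square-selected feature words.

```python
import math
from collections import Counter


def tfidf_vectors(docs, vocabulary):
    """One L2-normalised TF-IDF vector per document over `vocabulary`.

    tf = raw count in the document, idf = log(N / df); both are assumed variants.
    Terms absent from every document get weight 0 instead of dividing by zero.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = [tf[t] * math.log(n / df[t]) if df[t] else 0.0 for t in vocabulary]
        norm = math.sqrt(sum(v * v for v in vec))
        vectors.append([v / norm for v in vec] if norm else vec)
    return vectors
```

Normalizing each vector to unit length keeps long and short texts on the same scale, so document length alone does not dominate the underlying features.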
Specifically, the weighted fusion of the underlying feature vector and the middle-layer feature vector is expressed as
Figure PCTCN2018111607-appb-000007
where L denotes the underlying feature vector, M denotes the middle-layer feature vector, the symbol shown in Figure PCTCN2018111607-appb-000008 is the weight of the underlying feature, and || denotes concatenation.
Preferably, the pooling of the word vectors further comprises dividing the number of dimensions of the underlying feature vector into several equal parts, summing the word vectors within each part, and then concatenating the sums in order.
A computer comprising the above storage medium. With the above storage medium and computer, the invention addresses the low efficiency and insufficient accuracy of prior-art text sentiment classification.
It should be noted that although the above embodiments have been described herein, they do not limit the scope of patent protection of the invention. Therefore, changes and modifications made to the embodiments described herein based on the innovative concept of the invention, and equivalent structures or equivalent process transformations made using the contents of the specification and drawings of the invention, whether applied directly or indirectly in other related technical fields, all fall within the scope of patent protection of the invention.

Claims (9)

  1. A text sentiment classification method, comprising the following steps:
    constructing a sentiment dictionary from input text, the sentiment dictionary construction step including part-of-speech selection and underlying feature vector extraction;
    middle-layer feature extraction: collecting word vectors of training samples in combination with the sentiment dictionary, and pooling the word vectors of the training samples to obtain a middle-layer feature vector;
    performing weighted fusion of the underlying feature vector and the middle-layer feature vector to obtain a fused feature vector, and computing classification results based on an underlying-feature-vector classification model, a middle-layer-feature-vector classification model and a fused-feature-vector classification model, respectively.
  2. The text sentiment classification method according to claim 1, wherein the underlying vector extraction comprises expressing the underlying features with a vector space model, in which the feature in each dimension is a normalized TF-IDF weight.
  3. The text sentiment classification method according to claim 1, wherein the weighted fusion of the underlying feature vector and the middle-layer feature vector is expressed as
    Figure PCTCN2018111607-appb-100001
    where L denotes the underlying feature vector, M denotes the middle-layer feature vector, the symbol shown in Figure PCTCN2018111607-appb-100002 is the weight of the underlying feature, and || denotes concatenation.
  4. The text sentiment classification method according to claim 1, wherein the pooling of the word vectors comprises:
    dividing the number of dimensions of the underlying feature vector into several equal parts, summing the word vectors within each part, and then concatenating the sums in order.
  5. A text sentiment classification storage medium storing a computer program which, when executed by a processor, implements the following steps: constructing a sentiment dictionary from input text, the sentiment dictionary construction step including part-of-speech selection and underlying feature vector extraction;
    middle-layer feature extraction: collecting word vectors of training samples in combination with the sentiment dictionary, and pooling the word vectors of the training samples to obtain a middle-layer feature vector;
    performing weighted fusion of the underlying feature vector and the middle-layer feature vector to obtain a fused feature vector, and computing classification results based on an underlying-feature-vector classification model, a middle-layer-feature-vector classification model and a fused-feature-vector classification model, respectively.
  6. The text sentiment classification storage medium according to claim 5, wherein the underlying vector extraction specifically comprises expressing the underlying features with a vector space model, in which the feature in each dimension is a normalized TF-IDF weight.
  7. The text sentiment classification storage medium according to claim 5, wherein the weighted fusion of the underlying feature vector and the middle-layer feature vector is expressed as
    Figure PCTCN2018111607-appb-100003
    where L denotes the underlying feature vector, M denotes the middle-layer feature vector, the symbol shown in Figure PCTCN2018111607-appb-100004 is the weight of the underlying feature, and || denotes concatenation.
  8. The text sentiment classification storage medium according to claim 5, wherein the pooling of the word vectors further comprises dividing the number of dimensions of the underlying feature vector into several equal parts, summing the word vectors within each part, and then concatenating the sums in order.
  9. A computer comprising the storage medium according to any one of claims 5 to 8.
PCT/CN2018/111607 2017-10-26 2018-10-24 Text sentiment classification method, storage medium and computer WO2019080863A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711012851.8 2017-10-26
CN201711012851.8A CN107590134A (en) 2017-10-26 2017-10-26 Text sentiment classification method, storage medium and computer

Publications (1)

Publication Number Publication Date
WO2019080863A1 true WO2019080863A1 (en) 2019-05-02

Family

ID=61043372

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/111607 WO2019080863A1 (en) 2017-10-26 2018-10-24 Text sentiment classification method, storage medium and computer

Country Status (2)

Country Link
CN (1) CN107590134A (en)
WO (1) WO2019080863A1 (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110968697A (en) * 2019-12-20 2020-04-07 合肥讯飞数码科技有限公司 Text classification method, device and equipment and readable storage medium
CN111339305A (en) * 2020-03-20 2020-06-26 北京中科模识科技有限公司 Text classification method and device, electronic equipment and storage medium
CN111708886A (en) * 2020-06-11 2020-09-25 国网天津市电力公司 Public opinion analysis terminal and public opinion text analysis method based on data driving
CN112836051A (en) * 2021-02-19 2021-05-25 太极计算机股份有限公司 Online self-learning court electronic file text classification method
CN113011192A (en) * 2021-03-16 2021-06-22 广东工业大学 Text emotional feature extraction method based on attention causal explanation
CN113239685A (en) * 2021-01-13 2021-08-10 中国科学院计算技术研究所 Public sentiment detection method and system based on dual sentiments
CN113312481A (en) * 2021-05-27 2021-08-27 中国平安人寿保险股份有限公司 Text classification method, device and equipment based on block chain and storage medium
CN117521639A (en) * 2024-01-05 2024-02-06 湖南工商大学 Text detection method combined with academic text structure

Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107590134A (en) * 2017-10-26 2018-01-16 福建亿榕信息技术有限公司 Text sentiment classification method, storage medium and computer
CN108363753B (en) * 2018-01-30 2020-05-19 南京邮电大学 Comment text emotion classification model training and emotion classification method, device and equipment
CN108804417B (en) * 2018-05-21 2022-03-15 山东科技大学 Document-level emotion analysis method based on specific field emotion words
CN109033089B (en) * 2018-09-06 2021-01-26 北京京东尚科信息技术有限公司 Emotion analysis method and device
CN109582963A (en) * 2018-11-29 2019-04-05 福建南威软件有限公司 A kind of archives automatic classification method based on extreme learning machine
CN109783800B (en) * 2018-12-13 2024-04-12 北京百度网讯科技有限公司 Emotion keyword acquisition method, device, equipment and storage medium
CN110362734A (en) * 2019-06-24 2019-10-22 北京百度网讯科技有限公司 Text recognition method, device, equipment and computer readable storage medium
CN110795564B (en) * 2019-11-01 2022-02-22 南京稷图数据科技有限公司 Text classification method lacking negative cases
CN110909167B (en) * 2019-11-29 2022-07-01 重庆邮电大学 Microblog text classification system
CN111159410A (en) * 2019-12-31 2020-05-15 广州广电运通信息科技有限公司 Text emotion classification method, system and device and storage medium
CN111324734B (en) * 2020-02-17 2021-03-02 昆明理工大学 Case microblog comment emotion classification method integrating emotion knowledge
CN111930938A (en) * 2020-07-06 2020-11-13 武汉卓尔数字传媒科技有限公司 Text classification method and device, electronic equipment and storage medium
CN113222772B (en) * 2021-04-08 2023-10-31 合肥工业大学 Native personality dictionary construction method, native personality dictionary construction system, storage medium and electronic equipment
CN113344121B (en) * 2021-06-29 2023-10-27 北京百度网讯科技有限公司 Method for training a sign classification model and sign classification
CN115545573B (en) * 2022-11-30 2023-04-25 山西清众科技股份有限公司 Risk early warning method, device, equipment and storage medium based on social event

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101377769A (en) * 2007-08-29 2009-03-04 中国科学院自动化研究所 Method for representing multiple graininess of text message
CN102789498A (en) * 2012-07-16 2012-11-21 钱钢 Method and system for carrying out sentiment classification on Chinese comment text on basis of ensemble learning
CN106326212A (en) * 2016-08-26 2017-01-11 北京理工大学 Method for analyzing implicit type discourse relation based on hierarchical depth semantics
CN107590134A (en) * 2017-10-26 2018-01-16 福建亿榕信息技术有限公司 Text sentiment classification method, storage medium and computer

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103020712B (en) * 2012-12-28 2015-10-28 东北大学 A kind of distributed sorter of massive micro-blog data and method
CN103729431B (en) * 2013-12-26 2017-01-18 东北大学 Massive microblog data distributed classification device and method with increment and decrement function
CN104820997B (en) * 2015-05-14 2016-12-21 北京理工大学 A kind of method for tracking target based on piecemeal sparse expression Yu HSV Feature Fusion
CN105824922B (en) * 2016-03-16 2019-03-08 重庆邮电大学 A kind of sensibility classification method merging further feature and shallow-layer feature
CN106503049A (en) * 2016-09-22 2017-03-15 南京理工大学 A kind of microblog emotional sorting technique for merging multiple affection resources based on SVM
CN107092596B (en) * 2017-04-24 2020-08-04 重庆邮电大学 Text emotion analysis method based on attention CNNs and CCR
CN107247702A (en) * 2017-05-05 2017-10-13 桂林电子科技大学 A kind of text emotion analysis and processing method and system
CN107291696A (en) * 2017-06-28 2017-10-24 达而观信息科技(上海)有限公司 A kind of comment word sentiment analysis method and system based on deep learning

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110968697A (en) * 2019-12-20 2020-04-07 合肥讯飞数码科技有限公司 Text classification method, device and equipment and readable storage medium
CN110968697B (en) * 2019-12-20 2023-06-16 合肥讯飞数码科技有限公司 Text classification method, apparatus, device and readable storage medium
CN111339305A (en) * 2020-03-20 2020-06-26 北京中科模识科技有限公司 Text classification method and device, electronic equipment and storage medium
CN111339305B (en) * 2020-03-20 2023-04-14 北京中科模识科技有限公司 Text classification method and device, electronic equipment and storage medium
CN111708886A (en) * 2020-06-11 2020-09-25 国网天津市电力公司 Public opinion analysis terminal and public opinion text analysis method based on data driving
CN113239685A (en) * 2021-01-13 2021-08-10 中国科学院计算技术研究所 Public sentiment detection method and system based on dual sentiments
CN113239685B (en) * 2021-01-13 2023-10-31 中国科学院计算技术研究所 Public opinion detection method and system based on double emotions
CN112836051A (en) * 2021-02-19 2021-05-25 太极计算机股份有限公司 Online self-learning court electronic file text classification method
CN112836051B (en) * 2021-02-19 2024-03-26 太极计算机股份有限公司 Online self-learning court electronic file text classification method
CN113011192B (en) * 2021-03-16 2023-09-15 广东工业大学 Text emotion feature extraction method based on attention causal interpretation
CN113011192A (en) * 2021-03-16 2021-06-22 广东工业大学 Text emotional feature extraction method based on attention causal explanation
CN113312481A (en) * 2021-05-27 2021-08-27 中国平安人寿保险股份有限公司 Text classification method, device and equipment based on block chain and storage medium
CN117521639A (en) * 2024-01-05 2024-02-06 湖南工商大学 Text detection method combined with academic text structure
CN117521639B (en) * 2024-01-05 2024-04-02 湖南工商大学 Text detection method combined with academic text structure

Also Published As

Publication number Publication date
CN107590134A (en) 2018-01-16

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18869666

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18869666

Country of ref document: EP

Kind code of ref document: A1