CN111259141A - Social media corpus emotion analysis method based on multi-model fusion - Google Patents
- Publication number
- CN111259141A (application CN202010030785.2A)
- Authority
- CN
- China
- Prior art keywords
- text
- image
- data
- corpus
- emotion
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/55—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses a social media corpus sentiment analysis method based on multi-model fusion. Data are collected from social media with the pyspider crawler framework, and the crawled data set is processed and split into three categories: text only, images only, and both text and images. The invention processes the corpus with a cross-media approach. For the text in the corpus, the SO-PMI algorithm is used to build a sentiment dictionary and to classify the pointwise mutual information as positive, neutral, or negative; because SO-PMI alone handles Chinese words and phrases poorly, a similarity distance between words replaces PMI and a new formula is constructed. For image or video corpora, a joint visual-text modeling method is used to parse the meaning of the image and derive its sentiment. Finally, the text-based result and the vision-based result are fused with weights to obtain the final sentiment analysis result.
Description
Technical Field
The invention belongs to the field of sentiment analysis and relates to a social media corpus sentiment analysis method based on multi-model fusion.
Background Art
In recent years, a large number of social platforms and applications have emerged, such as Weibo, WeChat, and QQ, and they have greatly enriched people's lives. More and more people actively share information and express their opinions and feelings on these platforms, so each platform gradually accumulates a large amount of corpus data such as images, text, and video. Analyzing the emotions hidden in this information can benefit online marketing, crisis public relations, monitoring of public opinion and illegal behavior, and detection of warning signs such as potential depression or suicide risk. Sentiment analysis of platform content classifies a user's corpus into three emotional tendencies: positive, negative, and neutral. Many methods have already achieved good results on single-modality analysis of either images or text. However, single-feature sentiment analysis has clear limitations: platforms with large user bases such as Weibo, Facebook, and Twitter all support posts that combine images and text, and most existing methods cannot analyze such mixed posts comprehensively, which leads to misjudgments. For the diverse corpus data on social platforms, the accuracy and comprehensiveness of sentiment analysis still need to be improved.
The multi-model fusion social media corpus sentiment analysis method of the present invention avoids the shortcomings of single-feature sentiment analysis by analyzing images and text jointly, making the analysis more accurate and more widely applicable. Semantic analysis of social media information over this dual corpus improves the accuracy and comprehensiveness of sentiment analysis.
Summary of the Invention
The purpose of the present invention is to propose a social media corpus sentiment analysis method based on multi-model fusion. The experimental data are obtained from social media with the pyspider crawler framework; the crawled data set is processed and split into three categories: text only, images only, and both text and images. The invention focuses on the case where both text and image information are present, and the corpora of the other two cases serve to verify the robustness of the method. First, the corpus is identified and assigned to one of the three categories above; regardless of the category, each item is processed as a corpus containing both text and image information, which ensures that sentiment analysis remains reasonable for any kind of user corpus and guarantees the robustness of the model. For the text in the corpus, the SO-PMI algorithm (semantic orientation pointwise mutual information) is used to build a sentiment dictionary and to classify the corpus as positive, neutral, or negative; because SO-PMI alone cannot handle Chinese words and phrases flexibly, a similarity distance between words is used instead to build a new sentiment dictionary. Second, for images (including collections of pictures and video frames), a joint visual-text modeling algorithm parses the meaning of the image and derives its emotional tendency. Finally, the text analysis result and the image analysis result are fused with weights to obtain the final sentiment analysis result.
To achieve the above purpose, the technical solution adopted by the present invention is a social media corpus sentiment analysis method based on multi-model fusion, which comprises the following steps:
Step 1: Data preprocessing
The data are obtained from social platforms such as Sina Weibo by a crawler. Irrelevant data such as advertisements are filtered out, and only subjective blog posts written by users are retained. The filtered text is segmented with the jieba tokenizer. The segmented text still contains many meaningless tokens, so, to ease later model training, they are filtered with a stop-word list, namely the Harbin Institute of Technology stop-word list, which yields the preprocessed text. To simplify processing of the picture data, each picture is normalized to 256 x 256 pixels.
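A minimal preprocessing sketch along these lines is shown below; it assumes jieba and Pillow are installed, and the stop-word file name and input paths are hypothetical placeholders for the crawled data.

```python
# A minimal sketch of the text/image preprocessing described above.
# Assumptions: jieba and Pillow are installed; "hit_stopwords.txt" and the
# input paths are placeholders for the crawled data.
import jieba
from PIL import Image

def load_stopwords(path="hit_stopwords.txt"):
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}

def preprocess_text(post, stopwords):
    # Segment the post with jieba, then drop stop words and empty tokens.
    tokens = jieba.lcut(post)
    return [t for t in tokens if t.strip() and t not in stopwords]

def normalize_image(src_path, dst_path, size=(256, 256)):
    # Resize every crawled picture to 256 x 256 pixels.
    Image.open(src_path).convert("RGB").resize(size).save(dst_path)

if __name__ == "__main__":
    sw = load_stopwords()
    print(preprocess_text("今天的天气真好，心情很愉快！", sw))
    normalize_image("raw/post_001.jpg", "clean/post_001.jpg")
```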
Step 2: SO-PMI model training on the text corpus
The words of the text obtained in step 1 are labeled with sentiment, again in three categories: positive, negative, and neutral. 70% of the data are used for model training and 30% for testing and validation. First, the 70% of segmented, stop-word-filtered sentiment words are fed to the Word2vec tool to obtain an expanded sentiment dictionary. The semantic orientation pointwise mutual information algorithm (SO-PMI) then uses the distances between words and the sentiment dictionary to decide which class a word belongs to. Finally, the influence of negation words, degree adverbs, interjections, rhetorical sentences, and emoticons is taken into account, all factors are weighed, and the sentiment tendency of the text content is computed to obtain the classification result.
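A sketch of SO-PMI-style scoring with the similarity-distance substitution follows. It assumes a gensim Word2Vec model trained on the segmented corpus; the seed words, threshold, and function names are illustrative assumptions, not values taken from the patent.

```python
# Sketch of SO-PMI-style scoring where cosine similarity from a Word2Vec
# model stands in for the PMI term, as described above. Seed lists and
# thresholds are illustrative assumptions.
from gensim.models import Word2Vec

POS_SEEDS = ["开心", "高兴", "喜欢"]   # example positive seed words
NEG_SEEDS = ["难过", "讨厌", "生气"]   # example negative seed words

def so_score(model, word):
    """Sum of similarities to positive seeds minus negative seeds."""
    def sim(seed):
        try:
            return model.wv.similarity(word, seed)
        except KeyError:          # out-of-vocabulary word
            return 0.0
    return sum(sim(s) for s in POS_SEEDS) - sum(sim(s) for s in NEG_SEEDS)

def classify_text(model, tokens, eps=0.05):
    score = sum(so_score(model, t) for t in tokens)
    if score > eps:
        return 1      # positive
    if score < -eps:
        return -1     # negative
    return 0          # neutral

# sentences = list of token lists produced by the preprocessing step
# model = Word2Vec(sentences, vector_size=100, window=5, min_count=2)
```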
Step 3: CNN+LSTM model training on the picture data
On top of the picture data set, a textual description of each picture's sentiment is added, and the two modalities together provide a higher-precision, fine-grained classification: a convolutional network (CNN) is used for image classification, a CNN+LSTM is used for text classification, and the two classification results are combined to obtain an interpretation of the emotional meaning of the image. The image branch uses a CNN model composed of convolutional layers and fully connected layers; the text branch uses a deep structured joint embedding method that jointly embeds images and fine-grained visual descriptions. The method learns a compatibility function between images and text and can be seen as an extension of multimodal structured joint embedding. Instead of a bilinear compatibility function, an inner product of the features generated by deep neural encoders is used, maximizing the compatibility between a description and its matching images while minimizing its compatibility with images of other classes. Given data D = {(v_n, t_n, y_n), n = 1, ..., N}, where v ∈ V denotes the visual information, t ∈ T the text, and y ∈ Y the class label, the image and text classifier functions f_v: V → Y and f_t: T → Y are learned by minimizing the empirical risk under the 0-1 loss, and the compatibility function F is defined through learnable encoder functions θ(v) for images and Φ(t) for text, where N is the number of samples, V the image set, T the text set, and Y the label (mapping) space. The three formulas below describe the multi-model fusion method mathematically: formula (1.3) is the image-text compatibility function, where F(v, t) is the compatibility score, θ(v)^T the image encoding, and Φ(t) the text encoding; formula (1.1) classifies an image v by the label y ∈ Y whose text descriptions t maximize the expected compatibility F(v, t); formula (1.2) classifies a text t by the label y ∈ Y whose images v maximize the expected compatibility F(v, t).
f_v(v) = argmax_{y ∈ Y} E_{t ~ T(y)} [F(v, t)]    (1.1)

f_t(t) = argmax_{y ∈ Y} E_{v ~ V(y)} [F(v, t)]    (1.2)

F(v, t) = θ(v)^T Φ(t)    (1.3)
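A small sketch of formulas (1.1)-(1.3) follows. It assumes the image features θ(v) and text features Φ(t) have already been produced by the CNN and CNN+LSTM encoders (random placeholders stand in for them here), so it only illustrates the compatibility score and the argmax classification rules, not the actual encoder training.

```python
# Sketch of the compatibility function (1.3) and the classification rules
# (1.1)/(1.2). The encoder outputs are random placeholders standing in for
# the CNN image encoder theta(v) and the CNN+LSTM text encoder phi(t).
import numpy as np

def compatibility(theta_v, phi_t):
    # F(v, t) = theta(v)^T * phi(t), formula (1.3)
    return float(np.dot(theta_v, phi_t))

def classify_image(theta_v, text_feats_by_class):
    # f_v(v) = argmax_y E_{t~T(y)}[F(v, t)], formula (1.1):
    # average compatibility with each class's text descriptions.
    scores = {y: np.mean([compatibility(theta_v, p) for p in phis])
              for y, phis in text_feats_by_class.items()}
    return max(scores, key=scores.get)

def classify_text(phi_t, image_feats_by_class):
    # f_t(t) = argmax_y E_{v~V(y)}[F(v, t)], formula (1.2)
    scores = {y: np.mean([compatibility(th, phi_t) for th in thetas])
              for y, thetas in image_feats_by_class.items()}
    return max(scores, key=scores.get)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 128
    classes = [1, 0, -1]                       # positive / neutral / negative
    text_bank = {y: [rng.normal(size=d) for _ in range(5)] for y in classes}
    print(classify_image(rng.normal(size=d), text_bank))
```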
Step 4: Multi-model fusion
Steps 2 and 3 yield the sentiment classification results of the two branches, and the final classification is determined by combining the two parts with weights. The final classification result is y = a·m + b·n, where m is the class-distance similarity judged from the plain text and n is the class-distance similarity judged from the text derived from the image; the weights a and b are then solved with the genetic algorithm of the MATLAB toolbox.
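The sketch below illustrates the weighted fusion y = a·m + b·n together with a search for a and b. The patent uses MATLAB's genetic algorithm; here a toy evolutionary loop in Python stands in for it, and the validation arrays M, N_img, and Y_true are placeholders.

```python
# Weighted fusion y = a*m + b*n with a toy evolutionary search for (a, b).
# This is a stand-in for the MATLAB genetic algorithm mentioned in the text;
# M, N_img and Y_true are placeholder validation arrays.
import numpy as np

def fuse(a, b, m, n):
    return a * m + b * n

def to_label(y, eps=0.5):
    # Map the fused score to 1 (positive), -1 (negative) or 0 (neutral).
    return 1 if y > eps else (-1 if y < -eps else 0)

def fitness(a, b, M, N_img, Y_true):
    preds = [to_label(fuse(a, b, m, n)) for m, n in zip(M, N_img)]
    return np.mean(np.array(preds) == np.array(Y_true))   # accuracy

def evolve_weights(M, N_img, Y_true, pop=50, gens=100, seed=0):
    rng = np.random.default_rng(seed)
    best, best_fit = (0.5, 0.5), -1.0
    population = rng.uniform(0, 1, size=(pop, 2))
    for _ in range(gens):
        fits = [fitness(a, b, M, N_img, Y_true) for a, b in population]
        order = np.argsort(fits)[::-1]
        if fits[order[0]] > best_fit:
            best, best_fit = tuple(population[order[0]]), fits[order[0]]
        parents = population[order[: pop // 2]]
        children = parents + rng.normal(0, 0.05, size=parents.shape)  # mutate
        population = np.vstack([parents, children])
    return best, best_fit
```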
Step 5: Final sentiment analysis result
Step 4 yields the values of a and b in y = a·m + b·n. Given the text class similarity and the image-text similarity as input, the fused classification value y is output as 1, -1, or 0, where 1 denotes a positive, -1 a negative, and 0 a neutral classification result.
Compared with the prior art, the technical advantages of the present invention are mainly as follows:
(1) The invention processes the corpus with a cross-media method. For the text in the corpus, the SO-PMI algorithm is used to build a sentiment dictionary and to classify the pointwise mutual information as positive, neutral, or negative. Because this method cannot handle Chinese words and phrases flexibly, a similarity distance between words replaces PMI and a new formula is constructed.
(2) For image or video corpora (a video can be regarded as a collection of images), a joint visual-text modeling method is used to obtain and parse the meaning of the image and thus derive the meaning of the image or video.
(3) Finally, the plain-text analysis result and the visual analysis result are fused with weights to obtain the final sentiment analysis result.
Brief Description of the Drawings
Figure 1 is a sample of the corpus used in the present invention.
Figure 2 is the overall structure diagram of the multi-model fusion social media corpus sentiment analysis.
Figure 3 shows the result after word segmentation is completed in the present invention.
Figure 4 shows the stop-word list.
Figure 5 is a sample obtained after the processing of step 1.
Figure 6 shows the SO-PMI model training process.
Figure 7 is a sub-diagram of training the CNN+LSTM model in the present invention.
Detailed Description of the Embodiments
The present invention is described in detail below with reference to the accompanying drawings and embodiments.
The technical solution adopted by the present invention is a social media corpus sentiment analysis method based on multi-model fusion. The specific analysis process is as follows.
(1) Chinese word segmentation
Chinese word segmentation is the process of recombining a continuous character sequence into a word sequence according to certain rules, dividing the text into individual words as they are understood in Chinese. In the implementation, the jieba segmentation tool is used to segment the text; a segmented sentence is shown in Figure 3, where the sentence has been split into individual words.
(2) Stop-word removal
A normal Chinese paragraph or sentence usually contains special symbols such as commas, periods, and semicolons; once segmentation is complete, these punctuation marks are no longer needed. In addition, sentences contain words that have little influence on their meaning, such as 的, 不仅, 而且, and 了, which are not needed in subsequent steps, so they are deleted during preprocessing.
(3) Word vector construction
From the large amount of data processed in steps (1) and (2), word vectors are extracted with the Word2Vec tool, which reduces the data dimensionality and yields an expanded data dictionary.
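A minimal gensim sketch of this step is shown below, assuming `sentences` is the list of token lists produced by steps (1) and (2); the hyperparameters and file name are illustrative assumptions, not values specified by the patent.

```python
# Train word vectors on the segmented, stop-word-filtered corpus with gensim.
# The hyperparameters are illustrative; `sentences` comes from steps (1)-(2).
from gensim.models import Word2Vec

sentences = [
    ["天气", "真好", "心情", "愉快"],
    ["电影", "太", "失望", "难过"],
]  # placeholder corpus

model = Word2Vec(
    sentences,
    vector_size=100,   # dimensionality of the word vectors
    window=5,          # context window size
    min_count=1,       # keep rare words in this tiny example
    workers=4,
)

# Expand the sentiment dictionary with the nearest neighbours of seed words.
print(model.wv.most_similar("愉快", topn=5))
model.save("weibo_word2vec.model")
```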
(4) Training the SO-PMI model
The expanded sentiment dictionary obtained from the text data processed in steps (1), (2), and (3) is then used by the SO-PMI algorithm, which determines the class of each word from the distances between words, to build the SO-PMI model.
(5) Image normalization
The image data obtained by the crawler vary in size, which makes them complicated to process, so they are normalized according to the selected algorithm and resized to 256 x 256 pixels.
(6) Training the CNN+LSTM model
The image data processed in (5), together with their annotations, are used to train the CNN+LSTM model.
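A compact Keras sketch of such a two-branch model follows, assuming the 256 x 256 images from step (5) and tokenized sentiment descriptions as inputs; the layer sizes, vocabulary size, and sequence length are illustrative assumptions, not parameters specified by the patent.

```python
# Sketch of a two-branch model: a CNN over the 256x256 images and an LSTM
# over the tokenized sentiment descriptions, fused for 3-way classification.
# Layer sizes, vocab size and sequence length are illustrative assumptions.
from tensorflow.keras import layers, models

VOCAB_SIZE, SEQ_LEN, NUM_CLASSES = 20000, 50, 3

# Image branch: convolutional layers followed by fully connected layers.
img_in = layers.Input(shape=(256, 256, 3))
x = layers.Conv2D(32, 3, activation="relu")(img_in)
x = layers.MaxPooling2D()(x)
x = layers.Conv2D(64, 3, activation="relu")(x)
x = layers.MaxPooling2D()(x)
x = layers.Flatten()(x)
x = layers.Dense(128, activation="relu")(x)

# Text branch: embedding + LSTM over the image's sentiment description.
txt_in = layers.Input(shape=(SEQ_LEN,))
t = layers.Embedding(VOCAB_SIZE, 128)(txt_in)
t = layers.LSTM(128)(t)

# Fuse the two branches and classify into positive / neutral / negative.
merged = layers.concatenate([x, t])
out = layers.Dense(NUM_CLASSES, activation="softmax")(merged)

model = models.Model(inputs=[img_in, txt_in], outputs=out)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
# model.fit([images, token_ids], labels, epochs=..., batch_size=...)
```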
(7) Multi-model fusion
The SO-PMI model and the CNN+LSTM model obtained from the training in (4) and (6) each produce a result for image-text input, and the two results are combined with weights to decide the final classification. Experiments with the multi-model fusion social media corpus sentiment analysis method verify its effectiveness and accuracy: compared with a single model and with text-only sentiment analysis, the accuracy improves significantly, which shows that the proposed method achieves higher accuracy for Weibo sentiment analysis.
Claims (2)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010030785.2A CN111259141A (en) | 2020-01-13 | 2020-01-13 | Social media corpus emotion analysis method based on multi-model fusion |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010030785.2A CN111259141A (en) | 2020-01-13 | 2020-01-13 | Social media corpus emotion analysis method based on multi-model fusion |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111259141A true CN111259141A (en) | 2020-06-09 |
Family
ID=70952992
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010030785.2A Pending CN111259141A (en) | 2020-01-13 | 2020-01-13 | Social media corpus emotion analysis method based on multi-model fusion |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111259141A (en) |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111881667A (en) * | 2020-07-24 | 2020-11-03 | 南京烽火星空通信发展有限公司 | Sensitive text auditing method |
CN112070093A (en) * | 2020-09-22 | 2020-12-11 | 网易(杭州)网络有限公司 | Method for generating image classification model, image classification method, device and equipment |
CN112133406A (en) * | 2020-08-25 | 2020-12-25 | 合肥工业大学 | Multimodal emotion guidance method and system based on emotion map, storage medium |
CN112214603A (en) * | 2020-10-26 | 2021-01-12 | Oppo广东移动通信有限公司 | Image-text resource classification method, device, terminal and storage medium |
CN112231535A (en) * | 2020-10-23 | 2021-01-15 | 山东科技大学 | A method, processing device and storage medium for making a multimodal data set in the field of agricultural pests and diseases |
CN112241627A (en) * | 2020-10-09 | 2021-01-19 | 中国科学技术大学 | An information analysis system for negative reports of listed companies based on python text analysis |
CN112396091A (en) * | 2020-10-23 | 2021-02-23 | 西安电子科技大学 | Social media image popularity prediction method, system, storage medium and application |
CN112651448A (en) * | 2020-12-29 | 2021-04-13 | 中山大学 | Multi-modal emotion analysis method for social platform expression package |
CN112669936A (en) * | 2021-01-04 | 2021-04-16 | 上海海事大学 | Social network depression detection method based on texts and images |
CN113157998A (en) * | 2021-02-28 | 2021-07-23 | 江苏匠算天诚信息科技有限公司 | Method, system, device and medium for polling website and judging website type through IP |
CN113222772A (en) * | 2021-04-08 | 2021-08-06 | 合肥工业大学 | Native personality dictionary construction method, system, storage medium and electronic device |
CN114169450A (en) * | 2021-12-10 | 2022-03-11 | 同济大学 | Social media data multi-modal attitude analysis method |
CN115827880A (en) * | 2023-02-10 | 2023-03-21 | 之江实验室 | A business execution method and device based on emotion classification |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105320960A (en) * | 2015-10-14 | 2016-02-10 | 北京航空航天大学 | Voting based classification method for cross-language subjective and objective sentiments |
CN106886580A (en) * | 2017-01-23 | 2017-06-23 | 北京工业大学 | A kind of picture feeling polarities analysis method based on deep learning |
CN107818084A (en) * | 2017-10-11 | 2018-03-20 | 北京众荟信息技术股份有限公司 | A kind of sentiment analysis method for merging comment figure |
CN108388544A (en) * | 2018-02-10 | 2018-08-10 | 桂林电子科技大学 | A kind of picture and text fusion microblog emotional analysis method based on deep learning |
CN108764268A (en) * | 2018-04-02 | 2018-11-06 | 华南理工大学 | A kind of multi-modal emotion identification method of picture and text based on deep learning |
Non-Patent Citations (3)
Title |
---|
Dionysis Goularas et al., "Evaluation of Deep Learning Techniques in Sentiment Analysis from Twitter Data", 2019 International Conference on Deep Learning and Machine Learning in Emerging Applications (Deep-ML) * |
Fen Yang et al., "A Multi-model Fusion Framework based on Deep Learning for Sentiment Classification", 2018 IEEE 22nd International Conference on Computer Supported Cooperative Work in Design (CSCWD) * |
Nan Chen et al., "Advanced Combined LSTM-CNN Model for Twitter Sentiment Analysis", 2018 5th IEEE International Conference on Cloud Computing and Intelligence Systems (CCIS) * |
Cited By (19)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111881667B (en) * | 2020-07-24 | 2023-09-29 | 上海烽烁科技有限公司 | Sensitive text auditing method |
CN111881667A (en) * | 2020-07-24 | 2020-11-03 | 南京烽火星空通信发展有限公司 | Sensitive text auditing method |
CN112133406B (en) * | 2020-08-25 | 2022-11-04 | 合肥工业大学 | Multi-mode emotion guidance method and system based on emotion maps and storage medium |
CN112133406A (en) * | 2020-08-25 | 2020-12-25 | 合肥工业大学 | Multimodal emotion guidance method and system based on emotion map, storage medium |
CN112070093A (en) * | 2020-09-22 | 2020-12-11 | 网易(杭州)网络有限公司 | Method for generating image classification model, image classification method, device and equipment |
CN112241627A (en) * | 2020-10-09 | 2021-01-19 | 中国科学技术大学 | An information analysis system for negative reports of listed companies based on python text analysis |
CN112231535A (en) * | 2020-10-23 | 2021-01-15 | 山东科技大学 | A method, processing device and storage medium for making a multimodal data set in the field of agricultural pests and diseases |
CN112396091A (en) * | 2020-10-23 | 2021-02-23 | 西安电子科技大学 | Social media image popularity prediction method, system, storage medium and application |
CN112231535B (en) * | 2020-10-23 | 2022-11-15 | 山东科技大学 | Method for making multi-modal data set in field of agricultural diseases and insect pests, processing device and storage medium |
CN112396091B (en) * | 2020-10-23 | 2024-02-09 | 西安电子科技大学 | Social media image popularity prediction method, system, storage medium and application |
CN112214603A (en) * | 2020-10-26 | 2021-01-12 | Oppo广东移动通信有限公司 | Image-text resource classification method, device, terminal and storage medium |
CN112651448A (en) * | 2020-12-29 | 2021-04-13 | 中山大学 | Multi-modal emotion analysis method for social platform expression package |
CN112651448B (en) * | 2020-12-29 | 2023-09-15 | 中山大学 | Multi-mode emotion analysis method for social platform expression package |
CN112669936A (en) * | 2021-01-04 | 2021-04-16 | 上海海事大学 | Social network depression detection method based on texts and images |
CN113157998A (en) * | 2021-02-28 | 2021-07-23 | 江苏匠算天诚信息科技有限公司 | Method, system, device and medium for polling website and judging website type through IP |
CN113222772A (en) * | 2021-04-08 | 2021-08-06 | 合肥工业大学 | Native personality dictionary construction method, system, storage medium and electronic device |
CN113222772B (en) * | 2021-04-08 | 2023-10-31 | 合肥工业大学 | Native personality dictionary construction method, native personality dictionary construction system, storage medium and electronic equipment |
CN114169450A (en) * | 2021-12-10 | 2022-03-11 | 同济大学 | Social media data multi-modal attitude analysis method |
CN115827880A (en) * | 2023-02-10 | 2023-03-21 | 之江实验室 | A business execution method and device based on emotion classification |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111259141A (en) | Social media corpus emotion analysis method based on multi-model fusion | |
CN109933664B (en) | An Improved Method for Fine-Grained Sentiment Analysis Based on Sentiment Word Embedding | |
Kumar et al. | Sentiment analysis of multimodal twitter data | |
CN107729309B (en) | A method and device for Chinese semantic analysis based on deep learning | |
JP4148522B2 (en) | Expression detection system, expression detection method, and program | |
CN112860888B (en) | A Bimodal Sentiment Analysis Method Based on Attention Mechanism | |
CN111797898B (en) | Online comment automatic reply method based on deep semantic matching | |
Baroni | Grounding distributional semantics in the visual world | |
CN108009293A (en) | Video tab generation method, device, computer equipment and storage medium | |
US20200134398A1 (en) | Determining intent from multimodal content embedded in a common geometric space | |
CN107862087A (en) | Sentiment analysis method, apparatus and storage medium based on big data and deep learning | |
CN102929860B (en) | Chinese clause emotion polarity distinguishing method based on context | |
CN105740224A (en) | User psychological early warning method and device based on text analysis | |
CN113761377B (en) | False information detection method and device based on attention mechanism multi-feature fusion, electronic equipment and storage medium | |
CN107885883A (en) | A kind of macroeconomy field sentiment analysis method and system based on Social Media | |
CN106547875A (en) | A kind of online incident detection method of the microblogging based on sentiment analysis and label | |
CN112800184B (en) | A sentiment analysis method for short text reviews based on Target-Aspect-Opinion joint extraction | |
Wagle et al. | Explainable AI for multimodal credibility analysis: Case study of online beauty health (mis)-information | |
CN108733652B (en) | Test method for film evaluation emotion tendency analysis based on machine learning | |
CN113254814A (en) | Network course video labeling method and device, electronic equipment and medium | |
CN117591752B (en) | Multi-mode false information detection method, system and storage medium | |
CN113486143A (en) | User portrait generation method based on multi-level text representation and model fusion | |
CN108804416B (en) | Training method for film evaluation emotion tendency analysis based on machine learning | |
Soykan et al. | A comprehensive gold standard and benchmark for comics text detection and recognition | |
Wojatzki et al. | Agree or disagree: Predicting judgments on nuanced assertions |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20200609 |