CN103324662A

CN103324662A - Visual method and equipment for dynamic view evolution of social media event

Info

Publication number: CN103324662A
Application number: CN2013101344331A
Authority: CN
Inventors: 程学旗; 刘盛华; 朱亚涛; 王元卓; 刘悦; 朱文君
Original assignee: Institute of Computing Technology of CAS
Current assignee: Institute of Computing Technology of CAS
Priority date: 2013-04-18
Filing date: 2013-04-18
Publication date: 2013-09-25
Anticipated expiration: 2033-04-18
Also published as: CN103324662B

Abstract

The present invention provides a method for visualizing the evolution of dynamic opinions on social media events. The method first determines the sentiment membership degree and sentiment category of each message in a collected set of social media event messages, then builds the geometric layout of a sentiment visualization graph based on the sentiment categories, and finally colors that layout based on the sentiment membership degrees. The method displays the sentiment changes of events in an information stream intuitively and conveys more sentiment information about how events change and develop, which helps users identify the turning points and outbreak points of events and predict their development trends.

Description

Method and device for visualization of dynamic opinion evolution of social media events

Technical field

The invention belongs to the field of Internet information mining and analysis, and in particular relates to methods for opinion analysis and visualization of information streams organized around social media events.

Background art

With the rapid development of Web 2.0 technology, more and more people can express their opinions and thoughts on the development of events and topics through platforms such as Twitter and Weibo. Sentiment-bearing content reflects how events develop and how people react, and reveals the development trend of events. Sentiment analysis plays a very important role in this process; it covers sentiment classification, opinion extraction and opinion mining, rating prediction, and related tasks. The results of sentiment analysis are usually presented to users as charts, but these do not help users understand how an event develops unless the users themselves read a large amount of unstructured data about the event.

The purpose of visualizing sentiment analysis results is to improve human-computer interaction so that users can better understand the data and discover the development trends and turning points of events. Reference 1 proposes opinion triangles and rings to visualize the different sentiments in hotel reviews. Reference 2 proposes an "opinion block" visualization method based on the different aspects of a product, built on a comprehensive analysis of opinions. Reference 3 uses pixel-based sentiment blocks and high-density geo maps for visualization. Reference 4 proposes a simple directed-path method to describe the temporal relationships among event sentiments. However, most of these visualization methods are based on pie charts, histograms, and the like, and cannot show how the sentiment of an event changes over time or the trend of that change.

Reference list:

Reference 1: Y. Wu, F. Wei, S. Liu, N. Au, W. Cui, H. Zhou, and H. Qu, OpinionSeer: Interactive Visualization of Hotel Customer Feedback, IEEE Trans. 16(6): 1109–1118, 2010.

Reference 2: B. Alper, H. Yang, E. Haber and Kandogan, OpinionBlocks: Visualizing Consumer Reviews, IEEE VisWeek 2011 Workshop on Interactive Visual Text Analytics for Decision Making.

Reference 3: M. Hao, C. Rohrdantz, H. Janetzko, U. Dayal, D. Keim, L. Haug and M. Hsu, Visual Sentiment Analysis on Twitter Data Streams, IEEE Symposium on Visual Analytics Science and Technology, 2011.

Reference 4: D. Das, A. Kolya, A. Ekbal, S. Bandyopadhyay, Temporal analysis of sentiment events: a visual realization and tracking, CICLing'11: Proceedings of the 12th International Conference on Computational Linguistics and Intelligent Text Processing.

Summary of the invention

Therefore, the object of the present invention is to overcome the above defects of the prior art and provide an effective, more intuitive sentiment visualization method that helps users understand how dynamic opinions evolve as an event develops, identify the turning points of the event, and predict its development trend.

The object of the present invention is achieved by the following technical solutions:

In one aspect, the present invention provides a method for visualizing the dynamic opinion evolution of social media events, the method comprising:

Step 1) determining the sentiment membership degree and sentiment category of each message in a collected set of social media event messages, where the sentiment membership degree of a message indicates the probability with which the message belongs to a given sentiment category;

Step 2) based on the sentiment categories of the messages, building the geometric layout of a sentiment visualization graph for the message set, in which the horizontal axis represents the time at which messages were generated and the vertical axis represents the number of messages belonging to each sentiment category;

Step 3) coloring the geometric layout based on the sentiment membership degrees of the messages.

In the above method, step 1) may include:

Step 11) selecting a small portion of the messages from the set of social media event messages for manual sentiment labeling;

Step 12) training a classifier for sentiment classification on the labeled messages;

Step 13) using the trained classifier to classify the remaining messages in the set that were not manually labeled, and determining the sentiment membership degree and sentiment category of each message.

In the above method, step 12) may train the classifier on the text-view features and non-text-view features of the messages, where the text-view feature of a message is represented by a vector of the information-retrieval-based pointwise mutual information (PMI-IR) values of the sentiment words in the message, and the non-text-view features include emoticon features, time features, and punctuation features.

In the above method, in the geometric layout of step 2), the curve function of the bottom layer is S₀, and the curve function of the i-th sentiment category layer is:

$S_i = S_0 + \sum_{j=1}^{i} \rho_j$

where the density function ρ_j is the number of messages belonging to the j-th sentiment category within a unit time period, 1 ≤ i ≤ n; S₀ = 0 or $S_0 = -\frac{1}{2}\sum_{j=1}^{n}\rho_j$; and n is the number of sentiment categories.

In the above method, step 3) may include:

Step 31) establishing a mapping between the sentiment membership degree of a message and the color of the message;

Step 32) on the geometric layout, coloring the messages in each sentiment category layer in the gradient order of the message colors.

In the above method, the sentiment categories are positive, neutral, and negative.

In the above method, step 31) may use the following function to establish the mapping between the sentiment membership degree of a message and its color:

$RGB(t) = \begin{cases} ((1 - n(t)) \times 255,\ 255,\ 0), & p(t) \ge n(t) \\ (255,\ (1 - p(t)) \times 255,\ 0), & p(t) < n(t) \end{cases}$

where t denotes any message in the set of social media event messages, n(t) denotes the membership degree of message t in the negative sentiment category, and p(t) denotes the membership degree of message t in the positive sentiment category.

The above method may further include step 4) annotating the sentiment visualization graph with sentiment labels, where the sentiment labels are sentiment words that appear in the social media event messages.

In the above method, in step 4), the font size of a sentiment label on the visualization graph obtained in step 3) is calculated by the following formula:

F(w) = α × PMI-IR(w) × f(w)

where w denotes any sentiment word to be displayed on the visualization graph, f(w) is the frequency with which w occurs in the messages of each sentiment category, PMI-IR(w) is the information-retrieval-based pointwise mutual information value of w, and α is a scaling factor.
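As an illustration of the font-size formula, the following minimal Python sketch evaluates F(w) for one sentiment word. The function name `label_font_size` and the example inputs are hypothetical. The source does not say how a negative PMI-IR value (which would give a negative size) is handled, so the sketch simply returns the raw product.

```python
def label_font_size(pmi_ir, frequency, alpha=1.0):
    """Evaluate F(w) = alpha * PMI-IR(w) * f(w) for one sentiment word.

    pmi_ir    -- PMI-IR(w), supplied by the caller
    frequency -- f(w), the word's frequency in the messages of a sentiment category
    alpha     -- scaling factor (its value is not given in the source)
    """
    return alpha * pmi_ir * frequency


# Hypothetical inputs: PMI-IR(w) = 2.0, f(w) = 0.5, alpha = 10.0
size = label_font_size(2.0, 0.5, alpha=10.0)
```

A renderer would still have to map this value onto a usable point size, for example by clamping it to a minimum and maximum; that choice is outside what the formula specifies.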

In another aspect, the present invention provides a device for visualizing the dynamic opinion evolution of social media events, comprising:

means for determining the sentiment membership degree and sentiment category of each message in a collected set of social media event messages, where the sentiment membership degree of a message indicates the probability with which the message belongs to a given sentiment category;

means for building, based on the sentiment categories of the messages, the geometric layout of a sentiment visualization graph for the message set, in which the horizontal axis represents the time at which messages were generated and the vertical axis represents the number of messages belonging to each sentiment category;

means for coloring the geometric layout based on the sentiment membership degrees of the messages.

Compared with the prior art, the advantages of the present invention are as follows:

The sentiment changes of events in an information stream can be displayed intuitively, including the intensity and evolution of sentiment and the extraction of opinion labels. More sentiment information can therefore be shown about how events change and develop, which helps users better identify the turning points and outbreak points of events and predict their development trends.

Description of drawings

Embodiments of the present invention are further described below with reference to the accompanying drawings, in which:

Fig. 1 is a schematic flow chart of the visualization method according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of the geometric layout of the visualization method according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of the sentiment analysis visualization view obtained by the visualization method according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of sentiment analysis visualization results obtained with the visualization method according to an embodiment of the present invention;

Fig. 5 is a schematic diagram of sentiment analysis visualization results obtained with a conventional method.

Detailed description

To make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below through specific embodiments with reference to the accompanying drawings. It should be understood that the specific embodiments described here only explain the invention and do not limit it.

One embodiment of the present invention provides a method for visualizing the dynamic opinion evolution of social media events, which intuitively displays the sentiment changes of events in an information stream, including the intensity and evolution of sentiment. Here, a social media event is the stream of tweets, microblog posts, and similar messages about an event topic that appears over continuous time on platforms such as Twitter or Weibo. The method mainly includes the following steps: step 1) perform sentiment classification on the messages in the collected set of social media event messages, that is, determine the sentiment membership degree of each message and the sentiment category to which it belongs; step 2) build the geometric layout of the sentiment visualization graph for the message set; step 3) color the geometric layout so that sentiment intensity is visualized; step 4) annotate the sentiment visualization graph with sentiment labels.

The steps of the method according to one embodiment of the present invention are discussed in more detail below, using as an example set of social media event messages the 41,096 microblog posts about "Obama" from June 2008 to September 2009 in the TREC 2011 microblog data collection.

More specifically, step 1) performs sentiment classification on each message in the set of social media event messages. Classifying a message means computing its degree of membership in each sentiment category (its sentiment membership degree for short) and determining the sentiment category to which the message belongs. The sentiment membership degree of a message is the probability with which the message belongs to a given sentiment category. For example, with three sentiment categories (positive, neutral, and negative), three membership degrees are obtained for a message t: positive p(t), neutral m(t), and negative n(t). They satisfy:

p(t) + m(t) + n(t) = 1.

The sentiment category of message t is the category with the largest of the three membership degrees, i.e. the category that attains max(p(t), m(t), n(t)). For example, if m(t) > p(t) > n(t), message t is a neutral message. Labeling the entire message set manually is time-consuming and laborious, so a machine-learning-based classification method is usually used: a classifier is trained on a labeled data set and then applied to classify every message in the whole set.
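The rule for choosing a category from the three membership degrees can be sketched in a few lines of Python. The function name `sentiment_category`, the tolerance used for the sum check, and the tie-breaking order are assumptions made for this illustration only.

```python
def sentiment_category(p, m, n):
    """Return the category whose membership degree is largest.

    p, m, n are the positive, neutral, and negative membership degrees
    of one message and are expected to sum to 1.
    """
    memberships = {"positive": p, "neutral": m, "negative": n}
    if abs(sum(memberships.values()) - 1.0) > 1e-6:
        raise ValueError("membership degrees must sum to 1")
    # On an exact tie, max() keeps the first key in insertion order (an assumption
    # of this sketch; the source does not define tie-breaking here).
    return max(memberships, key=memberships.get)


category = sentiment_category(0.2, 0.5, 0.3)  # m(t) > n(t) > p(t)
```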

In one embodiment, 2,250 messages may be sampled uniformly from the 41,096 posts above for manual sentiment labeling, for example as positive, neutral, or negative. A classifier is then trained on these labeled posts, and the trained classifier is used to classify the 38,846 posts that were not manually labeled, yielding the sentiment membership degrees of each post at the same time. The classifier may be a typical classification model such as logistic regression, a decision tree, naive Bayes, k-nearest neighbors, a support vector machine (SVM), or a random forest. When training a classification model, the key question is which features are used to represent the samples; training means learning the model parameters from the sample data and the features extracted from it. In this embodiment, the classifier is trained mainly on the text-view features and non-text-view features of the social media event messages.

The text-view feature of a message can be represented by a vector of the PMI-IR values of the sentiment words in the message, where the PMI-IR value of a sentiment word is its weight under Pointwise Mutual Information and Information Retrieval. The non-text-view features of a message include emoticon features, time features, and punctuation features. The steps for obtaining both kinds of features are briefly described below:

Step (11): obtain the text-view feature of a message. The text-view feature is a vector of the PMI-IR values of the sentiment words in the message. The sentiment words in each message can be identified with a sentiment lexicon, whose entries can usually be collected from the well-known WordNet and HowNet. For any sentiment word w in the message's sentiment word set P, PMI-IR is calculated as follows:

$\text{PMI-IR}(w) = \log_2\left[\frac{\text{hits}(w\ \text{NEAR}\ \text{excellent})\ \text{hits}(\text{poor})}{\text{hits}(w\ \text{NEAR}\ \text{poor})\ \text{hits}(\text{excellent})}\right],\quad w \in P$

In the formula above, hits() is the number of results returned for a query sent to the AltaVista search engine. The query w NEAR excellent uses AltaVista's NEAR operator, which requires the returned results to contain both w and excellent, no more than 10 words apart. For example, the WordNet and HowNet sentiment lexicons are used to identify the sentiment words in the 41,096 posts, and the PMI-IR value of each sentiment word is then calculated, giving the text-view feature of each post. For instance, in the posts about Obama from June 2008, the following sentiment words were identified: "incredible beautiful thank amazing moderate good admire adorable liked loser dissed shit fucking gay dead worst kill idiot sock stupid". Calculating the PMI-IR value of each sentiment word gives the results shown in Table 1:

Table 1
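To make the PMI-IR formula concrete, the following Python sketch computes it from four hit counts. The counts are passed in as plain numbers, so no search-engine API is assumed; the function name `pmi_ir`, the example counts, and the `smoothing` term (added here only to avoid division by zero) are assumptions and not part of the formula above.

```python
import math


def pmi_ir(hits_near_excellent, hits_near_poor, hits_excellent, hits_poor,
           smoothing=0.01):
    """PMI-IR(w) = log2[ hits(w NEAR excellent) * hits(poor)
                         / (hits(w NEAR poor) * hits(excellent)) ].

    `smoothing` is added to the two NEAR counts so that a word that never
    co-occurs with one of the reference words does not cause a zero division.
    Pass smoothing=0.0 to evaluate the formula exactly as written.
    """
    numerator = (hits_near_excellent + smoothing) * hits_poor
    denominator = (hits_near_poor + smoothing) * hits_excellent
    return math.log2(numerator / denominator)


# Hypothetical counts: the ratio is (800 * 500) / (100 * 1000) = 4, so log2 gives 2.
score = pmi_ir(800, 100, 1000, 500, smoothing=0.0)
```

A positive score means the word co-occurs more with "excellent" than with "poor"; a negative score means the opposite.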

Step (12): obtain the non-text-view features of a message. These include time features, emoticon features, and punctuation features, which can be obtained as follows:

Time feature. The time feature reflects when a message was posted. People tend to behave differently in the morning and in the evening, at the beginning and at the end of the month, and in spring and in winter, so the posting time is divided into hour-, day-, week-, and month-based features. For example, the posting time of a message is obtained and matched against 15 preset time periods: the part of the day from morning to evening (4 periods), Monday to Sunday (7 periods), and the week of the month (4 weeks per month, 4 periods). Each period that matches is assigned 1 and every other period 0, forming a 15-dimensional 0-1 vector that constitutes the time feature.
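A minimal Python sketch of the 15-dimensional time feature follows. The source fixes only the number of periods (4 + 7 + 4), so the hour boundaries of the four day parts, the folding of days 29 to 31 into the fourth week, and the function name `time_features` are all assumptions of this illustration.

```python
from datetime import datetime


def time_features(posted_at):
    """Return a 15-dimensional 0-1 vector: 4 day parts + 7 weekdays + 4 weeks of month."""
    day_part = [0, 0, 0, 0]
    hour = posted_at.hour
    # Hour boundaries are assumed; the source only states that there are 4 periods.
    if 6 <= hour < 11:
        day_part[0] = 1
    elif 11 <= hour < 14:
        day_part[1] = 1
    elif 14 <= hour < 18:
        day_part[2] = 1
    else:
        day_part[3] = 1

    weekday = [0] * 7
    weekday[posted_at.weekday()] = 1  # datetime.weekday(): Monday is 0, Sunday is 6

    week_of_month = [0] * 4
    # Days 29-31 are folded into the fourth week (an assumption).
    week_of_month[min((posted_at.day - 1) // 7, 3)] = 1

    return day_part + weekday + week_of_month


# Monday 16 June 2008, 09:30: first day part, Monday, third week of the month.
features = time_features(datetime(2008, 6, 16, 9, 30))
```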

Emoticon feature. This feature is derived from an emoticon table in which each emoticon corresponds to a sentiment value: +1 for positive, -1 for negative, and 0 for neutral. For example, based on an emoticon library obtained from Wikipedia (see http://en.wikipedia.org/wiki/List_of_emoticons for its contents), all emoticons are extracted from the message, for example (<_>) and >:[, and each is labeled positive (+1), neutral (0), or negative (-1). The sum of the values of the emoticons contained in a message is used as its emoticon feature; for a message without emoticons the feature value is set to 0. For example, the emoticon feature value of "Barack obama piss off:(" is -1, and that of "I love Obama.:-)" is 1.
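The emoticon feature can be sketched as a lookup-and-sum in Python. The small table `EMOTICON_VALUES` is a hypothetical stand-in for the Wikipedia-derived library mentioned above, and the values assigned to each emoticon are assumptions for this illustration.

```python
# Hypothetical emoticon table: +1 positive, 0 neutral, -1 negative.
# No entry is a substring of another, so substring counting cannot double count.
EMOTICON_VALUES = {
    ":)": 1,
    ":-)": 1,
    ":(": -1,
    ":-(": -1,
    ":|": 0,
}


def emoticon_feature(text):
    """Sum the sentiment values of all emoticons found in the text (0 if none)."""
    return sum(value * text.count(emoticon)
               for emoticon, value in EMOTICON_VALUES.items())


negative_example = emoticon_feature("Barack obama piss off:(")
positive_example = emoticon_feature("I love Obama.:-)")
```

With a larger table, a tokenizer that matches the longest emoticon first would be a safer choice than plain substring counting.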

Punctuation feature. Exclamation marks (!), question marks (?), and their mixtures or repetitions express relatively strong sentiment. The frequency with which each of these punctuation marks occurs in a message can therefore be used as the value of this feature. For example, the punctuation feature can be extracted by scanning the text and counting the ! and ? characters it contains.
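Counting the two punctuation marks is a one-liner per mark in Python. The function name `punctuation_features` and the order of the two counts in the returned list are assumptions of this sketch.

```python
def punctuation_features(text):
    """Return [number of '!', number of '?'] for one message."""
    return [text.count("!"), text.count("?")]


counts = punctuation_features("Really?! Obama won!!")  # three '!' and one '?'
```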

With the methods above, the text-view and non-text-view features of every message can be extracted. In step 1), the individual features of each message are combined into a single feature vector, and a random forest classifier is trained on the 2,250 manually labeled posts. The trained random forest classifier then classifies the 38,846 posts that were not manually labeled based on their feature vectors, and yields the sentiment membership degrees of each post at the same time. For example, for the post "Sisterbies good morning and thank you I think obama is beautiful." the trained classifier gives the membership degrees p(t) = 0.7595, m(t) = 0.1288, and n(t) = 0.1117, so the post is classified as "positive" (p(t) > m(t) > n(t)).

Step 2) Based on the sentiment classification of the messages in the set, build the geometric layout of the sentiment visualization graph for the message set.

For example, the number of messages in each sentiment category is counted and the geometric layout of the sentiment visualization graph is built, with the horizontal axis representing the time at which messages were generated and the vertical axis representing the number of social media messages in each sentiment category. In the example above the sentiment categories are positive, neutral, and negative, so the category layers can be stacked along the vertical axis with the negative category at the bottom, the neutral category in the middle, and the positive category at the top. The three layers may be stacked upward from a horizontal baseline, or arranged symmetrically about the middle layer.

For example, let the curve function of the bottom layer of the visualization graph be S₀; the curve function of the layer for sentiment category i is then:

$S_i = S_0 + \sum_{j=1}^{i} \rho_j \qquad (1)$

where the density function ρ_j is the distribution of the number of messages belonging to sentiment category j, which can also be understood as the number of social media messages in that sentiment layer within a unit time period. Assuming there are n sentiment categories (for example n = 3 for the three categories in the example above), the curve function S_n of the topmost layer of the visualization graph is:

$S_n = S_0 + \sum_{j=1}^{n} \rho_j \qquad (2)$

Broadly speaking, sentiment is usually divided into three categories (n = 3), such as positive, neutral, and negative. It should be understood, however, that the method described above places no particular restriction on n: n may be 2, or sentiment may be divided more finely, for example n = 4, 5, 6, and so on, to make the visualization more detailed. Formulas (1) and (2) show that the curve of each sentiment category i is obtained by accumulating on top of the curves of the categories before it, which produces a visualization in which the sentiment categories are stacked from bottom to top.

In the formulas above, setting S₀ = 0 gives a visualization graph built on a horizontal baseline, and setting S₀ = -S_n gives a spatially symmetric one. Both graphs convey the level of attention through the height (or width) of the graph: the more social media messages there are, the taller the graph, meaning that more people joined the discussion and attention to the event was higher in that time period. However, by watching how the neutral layer moves up and down, the spatially symmetric graph shows the ratio of positive to negative sentiment more directly, which the baseline graph cannot. Therefore, in a preferred embodiment, the bottom-layer curve function S₀ is set as in formula (3) below, which follows from substituting formula (2) into S₀ = -S_n, to obtain a geometric layout that is symmetric with respect to the middle sentiment layer:

$S_0 = -\frac{1}{2} \sum_{j=1}^{n} \rho_j \qquad (3)$
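Formulas (1) to (3) can be sketched for a single time bin as follows. The function name `layer_curves`, the `symmetric` flag, and the example counts are hypothetical; the densities are assumed to be ordered from the bottom layer to the top layer.

```python
def layer_curves(densities, symmetric=True):
    """Return [S_0, S_1, ..., S_n] for one time bin.

    densities -- [rho_1, ..., rho_n], message counts per sentiment category,
                 ordered from the bottom layer to the top layer
    symmetric -- True uses formula (3), S_0 = -1/2 * sum(rho_j);
                 False uses the horizontal baseline S_0 = 0
    """
    s0 = -0.5 * sum(densities) if symmetric else 0.0
    curves = [s0]
    for rho in densities:
        # Formula (1): each curve adds its own density to the curve below it.
        curves.append(curves[-1] + rho)
    return curves


# Hypothetical counts for one time bin: 10 negative, 20 neutral, 30 positive.
symmetric_layout = layer_curves([10, 20, 30])
baseline_layout = layer_curves([10, 20, 30], symmetric=False)
```

In the symmetric case the first and last values are negatives of each other, which is the condition S₀ = -S_n. Applying the function to every time bin yields the boundary curves of the stacked layers.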

图2给出了的根据本发明一个实施例的情感可视化图形的几何布局。如图2所示，横轴表示信息产生的时间，纵轴表示社会媒体信息的数量。该可视化图的几何布局采用对称的形式，中间层是持“中立”观点的人的数量，上下分别为“积极”和“消极”的人的数量，关于中间层是一个对称的关系。每一层的厚度就反映该时间段内参与讨论的人数的多少及其情感的分类情况。 Fig. 2 shows the geometric layout of emotion visualization graphics according to an embodiment of the present invention. As shown in Figure 2, the horizontal axis represents the time when the information was generated, and the vertical axis represents the amount of social media information. The geometric layout of the visualized diagram adopts a symmetrical form. The middle layer is the number of people holding "neutral" views, and the upper and lower layers are the number of "positive" and "negative" people respectively. There is a symmetrical relationship about the middle layer. The thickness of each layer reflects the number of people participating in the discussion and the classification of their emotions during the time period. the

Step 3) Color the geometric layout of the established emotion visualization graph so that emotional intensity is visualized.

So that the visualization not only distinguishes positive, neutral and negative emotions but also conveys their strength, each emotion-category layer in the above graph is colored to visualize emotional intensity. In one embodiment, a mapping function between color and emotional membership is used, so that a color gradient represents the change of emotion and its intensity. The mapping function operates on the RGB color model, in which red, green and blue can be mixed to produce a range of colors; every color is determined by these three components, each taking a value between 0 and 255. The mapping between the color attribute values and the non-negative emotional membership weights of information t is defined as follows:

$RGB(t) = \begin{cases} ((1 - n(t)) \times 255,\ 255,\ 0), & p(t) \geq n(t) \\ (255,\ (1 - p(t)) \times 255,\ 0), & p(t) < n(t) \end{cases} \qquad (4)$

Three cases are of particular note:

RGB(t) = (0, 255, 0) denotes pure green; (a)

RGB(t) = (255, 255, 0) denotes pure yellow; (b)

RGB(t) = (255, 0, 0) denotes pure red. (c)

In formula (4), the larger the first component ((1-n(t))*255), the lighter the green; this indicates that the information is classified as "positive", but with a not very high probability, i.e. p(t) ≥ n(t) and m(t) < 1. The larger the second component ((1-p(t))*255), the lighter the red; this indicates that the information is classified as "negative", but with a not very high probability, i.e. p(t) < n(t). Because positive and negative emotions are opposed, when information t belongs to the positive or the negative emotion category its membership p(t) cannot equal n(t); that is, each such piece of information is either positive or negative. When information t belongs to the neutral emotion category and p(t) equals n(t), m(t) = 1 and p(t) = n(t) = 0 are set directly, and the information is treated as absolutely neutral. Absolute neutrality therefore lies in the middle layer; its color is yellow (case (b) above), with RGB tuple (255, 255, 0).
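As a minimal sketch, formula (4) can be transcribed into Python as follows. The function name `rgb_for` and the rounding of each component to the nearest integer are assumptions made for the example. The code follows the printed formula term by term and does not attempt to reconcile it with the prose description of the gradient.

```python
def rgb_for(p, n):
    """Map memberships to an (R, G, B) tuple per formula (4) as printed.

    p is the membership p(t) in the positive class and n is the
    membership n(t) in the negative class, both assumed to lie in [0, 1].
    Rounding to integers is an assumption; the formula gives real values.
    """
    if p >= n:
        # First branch: red component varies, green stays at 255.
        return (round((1 - n) * 255), 255, 0)
    # Second branch: red stays at 255, green component varies.
    return (255, round((1 - p) * 255), 0)


if __name__ == "__main__":
    # Absolute neutrality as defined in the text: p(t) = n(t) = 0.
    print(rgb_for(0.0, 0.0))
```

For the absolutely neutral case p(t) = n(t) = 0, both branches of the formula agree on (255, 255, 0), so the mapping is continuous where the two cases meet at that point.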

For example, coloring the layers of the geometric layout of Fig. 2 from top to bottom with the above method yields the colored visualization shown in Fig. 3. The top is green, i.e. the positive class, with the green gradually becoming lighter downward: the topmost layer has p(t) = 1.0 and is pure green, followed by information that is classified as positive with progressively smaller probability, i.e. the degree of "positive" becomes weaker and weaker. The middle transitions to neutral yellow: the pure yellow at the very center represents the tweets classified as neutral with probability 1.0, the upper part of the yellow is information classified as neutral with p(t) > n(t), and the lower part is information classified as neutral with p(t) < n(t). The bottom is red, i.e. the negative class, with the red gradually deepening: the information is classified as negative with progressively larger probability, i.e. the degree of "negative" becomes stronger and stronger, down to the lowest layer with n(t) = 1.0, which is pure red.

Step 4) Annotate the established emotion visualization graph with emotion labels.

Emotion words that appear in the social media event information can be displayed on the emotion visualization graph as emotion labels, and the font size of a label can reflect both how frequently the emotion word occurs and its emotional intensity. In the graph, the font size F(w) of an emotion label w is calculated as follows:

F(w)=α*PMI-IR(w)*f(w)

where f(w) is the frequency with which emotion word w occurs in the information of each emotion classification, PMI-IR(w) is the pointwise mutual information value of w based on information retrieval, and α is a scale factor that depends on the scale of the visualization. For example, if the largest font that can be displayed is size 9, α can be set to 9.
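The font-size formula can be illustrated with a short Python sketch. The function name `font_sizes`, the dictionary input format, and the placeholder words and numbers are all hypothetical. The sketch also assumes that PMI-IR(w) and f(w) have been normalized to the range [0, 1], so that α acts as the largest possible font size; the text does not state this normalization.

```python
def font_sizes(labels, alpha=9):
    """Apply F(w) = alpha * PMI-IR(w) * f(w) to every label.

    labels maps each emotion word w to a pair (PMI-IR(w), f(w)).
    Both values are assumed to be normalized to [0, 1].
    """
    return {w: alpha * pmi_ir * freq for w, (pmi_ir, freq) in labels.items()}


if __name__ == "__main__":
    # Hypothetical values chosen only to show the arithmetic.
    example = {"word_a": (1.0, 0.5), "word_b": (0.5, 0.25)}
    for word, size in font_sizes(example).items():
        print(word, size)
```

Because the two factors are multiplied, a word is drawn large only when it is both frequent and strongly emotional; a frequent but weakly emotional word stays small.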

For example, suppose that the emotion labels obtained from processing the June 2008 microblog data set described above are "incredible, beautiful, wrong, idolt". The font size of each label is then computed with the font-size formula (as shown in Table 2), and the resulting emotion visualization graph is shown in Fig. 3.

Table 2

It should be understood that step 4) above is optional. Emotion labels are added to the established emotion visualization graph in order to display more emotional information on it and thereby help users better predict how an event will develop. All emotion words that occur may be displayed on the graph, or only those with a high frequency of occurrence and a large PMI-IR value.

Fig. 4 shows the results of analyzing tweets about Obama from June 2008 to May 2009, and illustrates how emotions about Obama evolved. The width of the stream increases in November 2008, indicating that the number of tweets about Obama reached a peak. This change can be attributed to Obama defeating John McCain on November 5, 2008, and being declared elected as the 44th President of the United States. In addition, in January 2009 the green layer exceeds the red layer, showing that most tweet users posted positive content. Obama delivered his inaugural address at that time, and it can be inferred that the address was inspiring and successful enough that many people supported him.

The visualization was compared with the conventional visualization methods shown in Fig. 5, such as pie charts and histograms. The pie chart in Fig. 5 clearly shows the proportion of each emotion, but it cannot show emotional intensity and does not reflect the rise and fall of emotion. The histogram focuses on how the data change across time series. The color gradient in the histogram can represent emotional intensity, but the histogram faces the same problem as the baseline-based visualization graph: to understand it, the user has to associate emotional changes with time. Moreover, it is hard to see the evolution of emotion over time from a histogram, because the time windows are divided manually. The information provided by pie-chart and histogram visualizations is therefore limited, and offers only a macroscopic view of the emotions around an event. The visualization of the invention shown in Fig. 4 visualizes emotional intensity through changes in the color gradient, and the tweet density function based on the event sequence makes the turning points and burst points of the event, as well as the prediction of the event, more intuitive. The emotion word labels also reflect emotional intensity. Overall, the visualization shown in Fig. 4 shows that people's emotions toward Obama change over time, especially when influential events occur, and that these changes eventually level off, at which point tweets with strong emotion also become rare. The emotion visualization graph according to embodiments of the invention thus helps users better understand how emotional intensity varies at different times and how it evolves over time.

In yet another embodiment of the invention, a visualization device for the dynamic viewpoint evolution of a social media event is also provided. The device comprises: means for determining the emotional membership and emotion classification of information in a collected set of social media event information, where the emotional membership of a piece of information indicates the probability with which it belongs to a given emotion classification; means for establishing, based on the emotion classification of the information, the geometric layout of an emotion visualization graph of the information set, in which the horizontal axis represents the time at which information was generated and the vertical axis represents the quantity of information belonging to each emotion classification; and means for coloring the established geometric layout based on the emotional membership of the information. The device may further comprise means for annotating the established emotion visualization graph with emotion labels.

Although the invention has been described by way of preferred embodiments, it is not limited to the embodiments described herein, and also covers various changes and modifications made without departing from the scope of the invention.

Claims

1. A visualization method of the dynamic viewpoint evolution of a social media event, said method comprising:

Step 1) Determine the emotional membership and emotional classification of the information in the collected social media event information set, where the emotional membership of a piece of information indicates how likely that information is to belong to a given emotional classification;

Step 2) Based on the emotional classification of the information, establish the geometric layout of the emotion visualization graph of the information set, wherein, in the geometric layout, the horizontal axis represents the time at which the information was generated and the vertical axis represents the quantity of information belonging to each emotional classification;

Step 3) Color the established geometric layout based on the sentiment membership of said information.

2. The method according to claim 1, said step 1) comprising:

Step 11) Select a small portion of the information from the social media event information set for manual emotion labeling;

Step 12) Train a classifier for sentiment classification based on the labeled information;

Step 13) Use the trained classifier to classify the remaining information in the social media event information set that has not been manually labeled, and determine the emotional membership and emotional classification of each piece of information.

3. The method according to claim 2, wherein in step 12) the classifier is trained with text view features and non-text view features of the information, the text view features being represented as a vector of the information-retrieval-based pointwise mutual information values of the emotional words in the information, and the non-text view features including emoji features, time features, and punctuation features.

4. The method according to claim 1, wherein in the geometric layout of step 2) the curve function of the bottom layer is S₀, and the curve function of the i-th emotion classification layer is:

$S_i = S_0 + \sum_{j=1}^{i} \rho_j$

where the density function ρ_j is the quantity of information belonging to the j-th emotion classification within a unit time period, 1≤i≤n, S₀ = 0 or $S_0 = -\frac{1}{2} \sum_{j=1}^{n} \rho_j$, and n is the number of emotion classifications.

5. The method according to any one of the preceding claims, said step 3) comprising:

Step 31) Establishing a mapping between the emotional membership of the information and the color of the information;

Step 32) On the established geometric layout, color the information on each emotion classification layer according to the gradient order of the information color.

6. The method of claim 5, the sentiment classification being positive, neutral or negative.

7. The method according to claim 6, wherein step 31) uses the following function to establish the mapping between the emotional membership of the information and the color of the information:

$RGB(t) = \begin{cases} ((1 - n(t)) \times 255,\ 255,\ 0), & p(t) \geq n(t) \\ (255,\ (1 - p(t)) \times 255,\ 0), & p(t) < n(t) \end{cases}$

where t represents any piece of information in the social media event information set, n(t) represents the membership degree of information t in the negative emotion classification, and p(t) represents the membership degree of information t in the positive emotion classification.

8. The method according to claim 5, further comprising step 4) labeling emotional tags on the established emotional visualization graph, the emotional tags being emotional words appearing in the social media event information.

9. The method according to claim 8, wherein in step 4) the font size of the emotion label on the visualization graph obtained in step 3) is calculated with the following formula:

F(w)=α*PMI-IR(w)*f(w)

where w represents any emotional word to be displayed on the visualization graph, f(w) is the frequency with which emotional word w occurs in the information of each emotional classification, PMI-IR(w) is the pointwise mutual information value of emotional word w based on information retrieval, and α is a scaling factor.

10. A visualization device for the evolution of dynamic viewpoints of social media events, said device comprising:

Means for determining the emotional membership and emotional classification of information in the collected social media event information set, where the emotional membership of a piece of information indicates how likely that information is to belong to a given emotional classification;

Means for establishing the geometric layout of the emotion visualization graph of the information set based on the emotional classification of the information, wherein, in the geometric layout, the horizontal axis represents the time at which the information was generated and the vertical axis represents the quantity of information belonging to each emotional classification;

Means for coloring the established geometric layout based on the sentiment membership of said information.