CN104376038A - Position associated text information visualization method based on label cloud - Google Patents

Info

Publication number
CN104376038A
Authority
CN
China
Application number
CN201410466976.8A
Other languages
Chinese (zh)
Inventor
华一新
李响
赵婷
王丽娜
张晶
王培�
Original Assignee
中国人民解放军信息工程大学
Priority date
2014-09-12
Filing date
2014-09-12
Publication date
2015-02-25
Application filed by 中国人民解放军信息工程大学
Priority to CN201410466976.8A
Publication of CN104376038A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/335 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9537 Spatial or temporal dependent retrieval, e.g. spatiotemporal queries

Abstract

The invention relates to a tag-cloud-based method for visualizing location-associated text information and belongs to the field of electronic technology. The method obtains data from a general map and from text information associated with geographic locations, produces a statistical map (cartogram) using point-feature and area-feature generation algorithms, and performs lexical analysis and filtering on large amounts of unstructured text information to extract keywords and their word frequencies. The method filters out irrelevant detail on the general map and keeps only the main information, with the level of detail varying with scale. It applies to both point features and area features. Because the tag cloud does not use the outlines of administrative regions, misunderstandings caused by label positions are avoided. Users can browse text information associated with geographic locations with fewer unnecessary operations, and the method helps them grasp the overall characteristics and trends in large amounts of location-associated text information.

Description

A tag-cloud-based method for visualizing location-associated text information

Technical Field

[0001] The present invention relates to a tag-cloud-based method for visualizing location-associated text information, and belongs to the field of electronic technology.

Background

[0002] Visualization methods centered on geographic information. Traditional geographic information systems (e.g. ArcGIS, SuperMap) visualize text information according to its type. Structured text information is stored in relational tables as attribute information of geographic features; when a geographic feature is clicked, the associated text information is presented as a data table. For unstructured text information an external-link approach is used: the geographic area stores the storage locations of all texts associated with it, and when the area is clicked the text is opened by a corresponding text program (e.g. Notepad, Word). Because a map contains a large number of geographic features, browsing this text information requires frequent zooming, panning and clicking through dialog boxes, which is inconvenient for the user. It is also difficult for users to explore and discover useful information from this form of visualization.

[0003] Visualization methods centered on text information. Encyclopedia software (e.g. Microsoft Encarta, Wikipedia, Baidu Baike) takes an approach quite different from that of geographic information systems: text is the main body, and the geospatial location associated with the text is shown by a map tucked away in a corner, as shown in FIG. 1. This text-centered visualization emphasizes the expression of text information, while its expression of spatial information is too brief.

[0004] Tag-cloud-based visualization methods. As an effective way of expressing non-spatial text information, the tag cloud (Tag Cloud or Word Cloud) first appeared under the term "subconscious file" in Douglas Coupland's book "Microserfs", and came into wide use after it was first applied on the Flickr website, as shown in FIG. 2. Stanley Milgram was the first to apply tag clouds to research on geographic information visualization. Alexandar Jaffe et al. used tag clouds to associate massive collections of geotagged photos with a map and visualize them. Later, building on Jaffe's idea, other researchers used mash-up tools to overlay tags and tag clouds on maps. Michael Stryker et al. took news and scientific literature as their subject and applied tag clouds to geovisualization so that public health conditions could be perceived in time. However, all of these studies simply overlay the tag cloud on the map, or link the tag cloud to the map in a separate window. The most prominent problem with this approach is that the tag cloud conflicts with the annotations already on the map; in addition, a general map contains too much detail that the user does not care about, which easily distracts the user from points and regions of interest. Dinh-Quyen Nguyen omitted the details that users do not care about and designed a map called Taggram, as shown in FIG. 3. It keeps only the area features of national administrative divisions, and places tags from websites such as Flickr inside the corresponding national administrative division, in different fonts and sizes according to their popularity. Taggram nevertheless has two obvious shortcomings: (1) it applies only to area features and cannot handle point features; (2) because Taggram preserves the true shapes of the administrative divisions, the positions of the labels can easily mislead map readers.

[0005] In summary, existing visualization methods centered on geographic information are cumbersome to operate and make it hard to find useful information; visualization methods centered on text information place too much emphasis on expressing text and express spatial information too briefly; and the tag-cloud-based Taggram map applies only to area features, and its label positions can easily mislead readers.

Summary of the Invention

[0006] The object of the present invention is to provide a tag-cloud-based method for visualizing location-associated text information, so as to solve the above problems of current visualization methods.

[0007] To solve the above technical problems, the present invention provides a tag-cloud-based method for visualizing location-associated text information, the method comprising the following steps:

[0008] 1) dividing each geographic location in a general map into discrete points;

[0009] 2) adjusting the discrete points according to point-feature and area-feature generation algorithms, so that they do not overlap one another while the accuracy of their relative positions is maintained;

[0010] 3) performing lexical analysis and filtering on the text information associated with the geographic locations to extract keywords and their word frequencies, and setting a weight for the discrete point of each geographic location according to the word frequencies of that location;

[0011] 4) displaying each discrete unit according to the tag cloud display rules, in accordance with its weight.

[0012] Step 2) is implemented with a cartogram generation algorithm, which rearranges the discrete unit circles in the horizontal and vertical directions according to some attribute value while keeping their positions relatively correct.

[0013] The cartogram algorithm in step 2) is carried out as follows:

[0014] a) placing all of the resulting discrete points on the intersections of a regular grid;

[0015] b) simplifying the distance between two adjacent discrete points along the set direction: of the X-axis and Y-axis distances between the two points, the larger one is kept and adjusted to the standard unit 1, and the smaller one is simplified to 0.
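
The distance simplification in b) can be sketched in a few lines of Python. This is only an illustrative reading of the rule, not the patented implementation: the function name `simplify_offset` is hypothetical, keeping the sign of the dominant axis (so that left/right and up/down order survives) is an assumption, and so is resolving an exact tie in favour of the X axis.

```python
def simplify_offset(p, q):
    """Reduce the offset from point p to point q to one grid step.

    The larger of the X and Y distances becomes one standard unit (with its
    sign kept, which is an assumption); the smaller distance becomes 0.
    """
    dx = q[0] - p[0]
    dy = q[1] - p[1]
    if dx == 0 and dy == 0:
        return (0, 0)  # identical points: nothing to simplify
    if abs(dx) >= abs(dy):  # tie goes to the X axis (assumption)
        return (1 if dx > 0 else -1, 0)
    return (0, 1 if dy > 0 else -1)


# A neighbour that is mostly to the right is treated as "same row, one step right".
print(simplify_offset((0, 0), (5, 2)))
```

A neighbour that lies mostly below, such as `(-1, -4)` relative to the origin, is reduced to `(0, -1)` in the same way.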

[0016] The tag cloud display rules in step 4) include display rules for different scales and display rules for different times.

[0017] The display rules for different scales represent the same object at different scales with several discrete models.

[0018] The display rules for different times are of two kinds. The first is similar to the "sparkclouds" idea: when the user moves the mouse over a keyword, that keyword floats out and is enlarged. The second uses the metaphor of a "waterfall": text that changes over time is laid out like a cascading waterfall, and when the user clicks any model on the map, a "waterfall"-style tag cloud is displayed.

[0019] The beneficial effects of the present invention are as follows. The invention obtains data from a general map and from text information associated with geographic locations, produces a statistical map (cartogram) using point-feature and area-feature generation algorithms, and performs lexical analysis and filtering on large amounts of unstructured text information to extract keywords and their word frequencies. The invention filters out irrelevant detail on the general map and keeps only the main information, with the level of detail varying with scale. It applies to both point features and area features. Because the tag cloud does not use the outlines of administrative regions, misunderstandings caused by label positions are avoided. The method makes it easier for users to browse text information associated with geographic locations, removes some unnecessary operations, and helps users grasp the overall characteristics and trends in large amounts of location-associated text information.

Brief Description of the Drawings

[0020] FIG. 1 is a schematic diagram of an existing text-centered visualization;

[0021] FIG. 2 is a schematic diagram of an application example of an existing tag cloud;

[0022] FIG. 3 is a schematic diagram of an application example of the existing Taggram map;

[0023] FIG. 4 is a flowchart of the tag-cloud-based method for visualizing location-associated text information according to the present invention;

[0024] FIG. 5 shows the implementation flow of the tag cloud map, taking China and its 19 surrounding countries as an example;

[0025] FIG. 6 is a schematic diagram of the simplification of the distance between two points in the cartogram algorithm;

[0026] FIG. 7-a is a schematic diagram of the compression process;

[0027] FIG. 7-b is a schematic diagram of the original positions and the transformed positions;

[0028] FIG. 8-a is a schematic diagram of the original positions of all points;

[0029] FIG. 8-b is a schematic diagram of the positions of all points after adjustment by the cartogram algorithm;

[0030] FIG. 9-a is the original map;

[0031] FIG. 9-b is a schematic diagram after all area features have been converted to point features according to their center positions on the original map;

[0032] FIG. 9-c is a schematic diagram after the cartogram algorithm has been applied to the point features;

[0033] FIG. 9-d is a schematic diagram after points with adjacent locations have been connected by straight lines;

[0034] FIG. 10 is a screenshot of the tool used to obtain microblog information in the embodiment of the present invention;

[0035] FIG. 11 shows the keyword and word-frequency statistics of the text obtained in the embodiment of the present invention;

[0036] FIG. 12 is a schematic diagram of the tag cloud display under the display rules for different scales in the embodiment of the present invention;

[0037] FIG. 13 is a schematic diagram of the time-varying "sparkclouds"-style tag cloud display in the embodiment of the present invention;

[0038] FIG. 14 is a schematic diagram of the time-varying "waterfall"-style tag cloud display in the embodiment of the present invention.

Detailed Description

[0039] Specific embodiments of the present invention are further described below with reference to the accompanying drawings.

[0040] The tag-cloud-based method for visualizing location-associated text information according to the present invention combines text information associated with spatial locations with a map in the form of a tag cloud. As shown in FIG. 4, the method is carried out as follows:

[0041] 1. Obtain data from a general map and from text information associated with geographic locations.

[0042] 2. Use the point-feature cartogram generation algorithm and the area-feature cartogram generation algorithm to obtain a statistical map. A cartogram is a map in which the shapes of objects are exaggerated or shrunk according to some attribute value; it keeps positions relatively correct, deforms shapes based on the attribute, and conveys specific information intuitively. Some of the messages that users post on microblog sites also contain location information, such as a city or a city district. On small-scale maps a city can be treated as a point feature; on large-scale maps a city district is treated as an area feature.

[0043] The cartogram algorithm in the present invention deals with point features and area features. For point features, the first rule of the algorithm is that all points lie on the intersections of a regular grid. This makes browsing easier and gives an orderly visual layout, and it is consistent with the finding in cognitive cartography that people tend to remember the relationships between locations in terms of horizontal or vertical directions. The distance between two adjacent points, whose original distribution density is irregular, is simplified: according to the size of the angle θ between the two points, only the larger of the X-axis and Y-axis distances is kept and adjusted to the standard unit 1, while the smaller is simplified to 0, as shown in FIG. 6. That is, if the horizontal distance between two points is the larger one, the two points can be regarded as lying on the same horizontal line and their vertical distance is simplified to 0. The specific procedure of the algorithm is given below:

[0044] Horizontal compression is performed first, followed by vertical compression. As shown in FIG. 7-a, the direction is from left to right and from top to bottom. Suppose $I_1$ is a set of n points with the same X coordinate, and $I_2$ is the set adjacent to $I_1$ along the X axis. For a point $V_k$ on $I_1$, $\theta_k^{high}$ and $\theta_k^{low}$ are defined as the angles between $V_k$ and the two points on $I_2$ that are nearest to $V_k$ and lie above and below $V_k$ respectively, as shown in FIG. 7-b. Only when all of these angles are not less than the threshold angle $\Theta$ (set to 45°) can the two sets $I_1$ and $I_2$ be compressed to the same X value. Repeating the process along the Y axis completes the vertical compression, so that all points are placed on the intersections of a regular unit grid, as shown in FIGS. 8-a and 8-b.

[0045] $$\min\{\theta_k^{high},\ \theta_k^{low} \mid k = 1, 2, \ldots, n\} \ge \Theta$$
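
The merge test in [0044] and [0045] can be sketched as follows. The source does not say how the angle is measured; this sketch assumes it is the angle between the connecting segment and the X axis, in the range 0° to 90°, so that a steep connection (mostly vertical separation) allows two columns to share one X value. The function names are hypothetical, and treating a neighbour at exactly the same height as an "above" neighbour with angle 0° is a further assumption that blocks merges which would stack two points on one grid node.

```python
import math


def angle_from_x_axis(p, q):
    """Angle in degrees (0..90) between segment p-q and the X axis (assumed definition)."""
    return math.degrees(math.atan2(abs(q[1] - p[1]), abs(q[0] - p[0])))


def nearest_above_below(v, column):
    """Nearest points of `column` at or above v and strictly below v (None if absent)."""
    above = [u for u in column if u[1] >= v[1]]
    below = [u for u in column if u[1] < v[1]]
    nearest = lambda pts: min(pts, key=lambda u: math.dist(u, v)) if pts else None
    return nearest(above), nearest(below)


def can_merge_columns(col1, col2, threshold_deg=45.0):
    """True if every high/low angle from col1 to col2 is at least the threshold."""
    angles = []
    for v in col1:
        for u in nearest_above_below(v, col2):
            if u is not None:
                angles.append(angle_from_x_axis(v, u))
    # Equivalent to min(angles) >= threshold; an empty list imposes no constraint.
    return all(a >= threshold_deg for a in angles)


print(can_merge_columns([(0, 0)], [(1, 5), (1, -5)]))  # steep neighbours
print(can_merge_columns([(0, 0)], [(1, 0.5)]))         # nearly level neighbour
```

In the first call both neighbours sit far above and below the point, so the columns may share an X value; in the second the neighbour is almost level with the point, so the columns stay separate.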

[0046] For area features, all area features are first converted to point features according to their center points; the point-feature cartogram generation algorithm is then applied; finally, points whose locations are adjacent are connected by straight lines, as shown in FIGS. 9-a to 9-d.
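
A minimal sketch of the first and last of these steps is given below. The source only says "center point", so using the area-weighted centroid of a simple polygon is an assumption, as are the function names and the representation of adjacency as a list of name pairs; the cartogram adjustment in between is omitted here.

```python
def polygon_centroid(vertices):
    """Area-weighted centroid of a simple polygon given as [(x, y), ...]."""
    area2 = 0.0  # twice the signed area (shoelace sum)
    cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if area2 == 0:
        raise ValueError("degenerate polygon")
    return (cx / (3.0 * area2), cy / (3.0 * area2))


def adjacency_segments(points, adjacent_pairs):
    """Straight line segments joining the points of adjacent regions."""
    return [(points[a], points[b]) for a, b in adjacent_pairs]


# Two hypothetical square regions that share an edge.
regions = {
    "A": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "B": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
centers = {name: polygon_centroid(poly) for name, poly in regions.items()}
print(adjacency_segments(centers, [("A", "B")]))
```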

[0047] 3. Obtain the posted messages, and perform lexical analysis and filtering to extract keywords and their word frequencies. Many popular microblog sites, such as Sina and Tencent, provide APIs. Through these APIs the messages posted by users are obtained, as shown in FIG. 10, including the time, location, user name, number of followers, number of reposts, number of comments and full text of each post.

[0048] The acquired data is stored in a database in structured form, and any subset of it can be obtained by constructing different SQL statements. For example, to extract the messages posted in Wuhan from 2013-03-14 to 2013-03-17, the SQL statement is as follows:

[0049] select wb_content from weibo_tab where wb_time between '2013-03-14' and '2013-03-17' and wb_address like '武汉'
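
The query in [0049] can be tried out with Python's built-in `sqlite3` module. The table and column names (`weibo_tab`, `wb_content`, `wb_time`, `wb_address`) come from the text; everything else is an assumption made for illustration: the in-memory database, the three placeholder rows, storing `wb_time` as ISO date strings, and wrapping the place name in `%` wildcards so that `LIKE` matches addresses that merely contain it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE weibo_tab (wb_content TEXT, wb_time TEXT, wb_address TEXT)"
)
# Placeholder rows for illustration only; they are not data from the patent.
conn.executemany(
    "INSERT INTO weibo_tab VALUES (?, ?, ?)",
    [
        ("sample post A", "2013-03-15", "湖北省武汉市"),
        ("sample post B", "2013-03-20", "武汉"),
        ("sample post C", "2013-03-15", "北京"),
    ],
)


def posts_between(conn, start, end, place):
    """Return post texts within [start, end] whose address contains `place`."""
    sql = (
        "SELECT wb_content FROM weibo_tab "
        "WHERE wb_time BETWEEN ? AND ? AND wb_address LIKE ?"
    )
    # Parameters are bound rather than concatenated into the SQL string.
    return [row[0] for row in conn.execute(sql, (start, end, f"%{place}%"))]


print(posts_between(conn, "2013-03-14", "2013-03-17", "武汉"))
```

Because the dates are stored as ISO strings, `BETWEEN` compares them lexicographically, which matches chronological order for this format.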

[0050] For large volumes of text data, word segmentation and filtering can be performed with existing tools such as ICTCLAS to obtain the keywords and word-frequency statistics of the text, as shown in FIG. 11.
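
Once a segmenter has split the posts into tokens, the filtering and counting can be sketched with the standard library. The segmentation itself (ICTCLAS in the text) is not reproduced here; the function name, the stop-word set and the sample tokens are all hypothetical.

```python
from collections import Counter


def keyword_frequencies(tokens, stop_words, top_n=None):
    """Count tokens that are not blank and not stop words, most frequent first."""
    counts = Counter(t for t in tokens if t.strip() and t not in stop_words)
    return counts.most_common(top_n)


# Tokens as a segmenter might return them (illustrative values only).
tokens = ["map", "the", "tag", "map", "cloud", "the", "map"]
print(keyword_frequencies(tokens, stop_words={"the"}, top_n=1))
```

The resulting (keyword, frequency) pairs are what step 3) of the method uses to weight each location.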

[0051] 4. Many mature algorithms and tools already exist for generating tag clouds (e.g. Wordle and Tagxedo), so the key to combining the cartogram with the tag cloud lies in the design of the display rules. This embodiment takes two display rules as examples: one for different scales and one for different times.

[0052] 1) Display rules for different scales

[0053] This rule represents the same object at different scales, from the national level to the district level, with several discrete models. This embodiment gives four different discrete models, shown in FIGS. 12-a to 12-d. As the user zooms in on the map and the scale becomes larger, the content shown by the tag cloud becomes more detailed. All cities are displayed first, each city represented by model a; zooming in leads to model b and finally to model c. When the user zooms in further to the city level, the different districts of the city are displayed, represented by model d; in model d, adjacent districts are connected by straight lines. If the user continues to zoom in, the process is repeated for each district.

[0054] The amount of information differs between regions. To show this difference, the standardized information amount of each region is first computed by normalization and then represented with different colors. The calculation is as follows, where M denotes the information amount of a region.

[0055] For each model, keywords are drawn in colors close to the model's fill color, as shown in FIG. 12-d. The experimental parameters of each model are given in Table 1.

[0056] Table 1

[0057]
Model       Maximum level     Minimum level     Max. word frequency count   Label size (px)
Model (a)   1:19 million      1:4.75 million    3                           60
Model (b)   1:4.75 million    1:2.38 million    12                          120
Model (c)   1:2.38 million    1:0.59 million    24                          180
Model (d)   1:0.59 million    1:0.29 million    3                           60
            1:0.29 million    1:0.10 million    12                          120
            1:0.10 million                      24                          180
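
One way to use the parameters of Table 1 is a lookup from the current map scale to a display model. The sketch below is an interpretation, not the patent's code: it reads the flattened table as six scale bands (three of them belonging to model d), treats the third numeric column as a cap on the number of keywords shown, assigns a band boundary to the more detailed band, and lets the last band extend to arbitrarily large scales. The names `DISPLAY_LEVELS` and `display_level` are hypothetical.

```python
# (model, largest scale denominator, smallest scale denominator,
#  word limit, label size in px), as read from Table 1 (interpretation).
DISPLAY_LEVELS = [
    ("a", 19_000_000, 4_750_000, 3, 60),
    ("b", 4_750_000, 2_380_000, 12, 120),
    ("c", 2_380_000, 590_000, 24, 180),
    ("d", 590_000, 290_000, 3, 60),
    ("d", 290_000, 100_000, 12, 120),
    ("d", 100_000, 0, 24, 180),  # lower bound 0 is an assumption
]


def display_level(scale_denominator):
    """Return (model, word limit, label size px) for a scale of 1:scale_denominator."""
    for model, upper, lower, word_limit, size_px in DISPLAY_LEVELS:
        if lower < scale_denominator <= upper:
            return model, word_limit, size_px
    return None  # smaller scale than 1:19 million: nothing defined in the table


print(display_level(10_000_000))  # a national-level view
print(display_level(200_000))     # a district-level view
```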

[0058] 2) Display rules for different times

[0059] This embodiment gives two ways of displaying a temporal tag cloud. The first is similar to the "sparkclouds" idea: when the user moves the mouse over a keyword, it floats out and is enlarged. The line chart beneath the word shows how frequently the keyword appeared over a period of time, as shown in FIG. 13. The second uses the metaphor of a "waterfall": text that changes over time is laid out like a waterfall plunging straight down, as shown in FIG. 14. When the user clicks any model on the map, a "waterfall"-style tag cloud is displayed in the right-hand panel.

[0060] The complete implementation of the flow is described below, taking China and its 19 surrounding countries as an example. FIG. 5 (1) is an ordinary administrative division map. Discrete unit circles are generated at the center point of each administrative division (FIG. 5 (2)). These circles are rearranged in the horizontal and vertical directions so that they do not overlap one another (FIG. 5 (3)) while a certain accuracy of relative position is maintained. The circles are given different diameters according to their weights; in FIG. 5 (4) the weights are computed from the number of characters in the Baidu Baike description of each country. Connections are established between the circles; in FIG. 5 (5), countries that share a land border are connected by straight lines. FIG. 5 (6) to (9) show the display control applied to each circle at different scales: as the tag cloud map is zoomed in, the country names are shown first (FIG. 5 (6)), followed by the country name with 50 tags (FIG. 5 (7)), with 100 tags (FIG. 5 (8)), and with 200 tags (FIG. 5 (9)).
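
The weighting and staged display described in [0060] can be sketched as follows. The source does not state how a weight is turned into a diameter, so the linear scaling between two pixel sizes is an assumption, as are the pixel values, the function names and the sample weights; only the tag counts per stage (name only, 50, 100, 200) are taken from the text.

```python
def circle_diameters(weights, d_min=20.0, d_max=80.0):
    """Map each weight linearly onto a diameter in [d_min, d_max] (assumed rule)."""
    lo, hi = min(weights.values()), max(weights.values())
    span = hi - lo
    return {
        name: d_max if span == 0 else d_min + (w - lo) / span * (d_max - d_min)
        for name, w in weights.items()
    }


# Number of tags shown with the name at each zoom stage, FIG. 5 (6) to (9).
TAGS_PER_STAGE = [0, 50, 100, 200]


def labels_for_stage(name, ranked_tags, stage):
    """Name plus the top tags allowed at the given zoom stage (0-based)."""
    return [name] + ranked_tags[: TAGS_PER_STAGE[stage]]


# Hypothetical weights, e.g. character counts of each region's description.
print(circle_diameters({"A": 10, "B": 30, "C": 20}))
print(labels_for_stage("A", ["t1", "t2", "t3"], 0))
```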

[0061] The present invention is a visualization method that combines text information associated with spatial locations with a map in the form of a tag cloud. The method filters out irrelevant detail on the general map and keeps only the main information, with the level of detail varying with scale. It applies to both point features and area features. Because the tag cloud does not use the outlines of administrative regions, misunderstandings caused by the positions of the tag clouds are avoided. The method makes it easier for users to browse text information associated with geographic locations, removes some unnecessary operations, and helps users grasp the overall characteristics and trends in large amounts of location-associated text information.

Claims (6)

1. A tag-cloud-based method for visualizing location-associated text information, characterized in that the method comprises the following steps: 1) dividing each geographic location in a general map into discrete points; 2) adjusting the discrete points according to point-feature and area-feature generation algorithms, so that they do not overlap one another while the accuracy of their relative positions is maintained; 3) performing lexical analysis and filtering on the text information associated with the geographic locations to extract keywords and their word frequencies, and setting a weight for the discrete point of each geographic location according to the word frequencies of that location; 4) displaying each discrete unit according to the tag cloud display rules, in accordance with its weight.
2. The tag-cloud-based method for visualizing location-associated text information according to claim 1, characterized in that step 2) is implemented with a cartogram generation algorithm, which rearranges the discrete unit circles in the horizontal and vertical directions according to some attribute value while keeping their positions relatively correct.
3. The tag-cloud-based method for visualizing location-associated text information according to claim 2, characterized in that the cartogram algorithm in step 2) is carried out as follows: a) placing all of the resulting discrete points on the intersections of a regular grid; b) simplifying the distance between two adjacent discrete points along the set direction: of the X-axis and Y-axis distances between the two points, the larger one is kept and adjusted to the standard unit 1, and the smaller one is simplified to 0.
4. The tag-cloud-based method for visualizing location-associated text information according to claim 2, characterized in that the tag cloud display rules in step 4) include display rules for different scales and display rules for different times.
5. The tag-cloud-based method for visualizing location-associated text information according to claim 4, characterized in that the display rules for different scales represent the same object at different scales with several discrete models.
6. The tag-cloud-based method for visualizing location-associated text information according to claim 5, characterized in that the display rules for different times are of two kinds: the first is similar to the "sparkclouds" idea, in which, when the user moves the mouse over a keyword, that keyword floats out and is enlarged; the second uses the metaphor of a "waterfall", in which text that changes over time is laid out like a cascading waterfall, and when the user clicks any model on the map, a "waterfall"-style tag cloud is displayed.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410466976.8A CN104376038A (en) 2014-09-12 2014-09-12 Position associated text information visualization method based on label cloud

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410466976.8A CN104376038A (en) 2014-09-12 2014-09-12 Position associated text information visualization method based on label cloud

Publications (1)

Publication Number Publication Date
CN104376038A true CN104376038A (en) 2015-02-25

Family

ID=52554945

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410466976.8A CN104376038A (en) 2014-09-12 2014-09-12 Position associated text information visualization method based on label cloud

Country Status (1)

Country Link
CN (1) CN104376038A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107393410A (en) * 2017-06-29 2017-11-24 网易(杭州)网络有限公司 Method and apparatus of displaying data on map, medium, and computing device

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6542813B1 (en) * 1999-03-23 2003-04-01 Sony International (Europe) Gmbh System and method for automatic managing geolocation information and associated references for geographic information systems
CN101308498A (en) * 2008-07-03 2008-11-19 上海交通大学 Text collection visualized system
US20090327883A1 (en) * 2008-06-27 2009-12-31 Microsoft Corporation Dynamically adapting visualizations

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6542813B1 (en) * 1999-03-23 2003-04-01 Sony International (Europe) Gmbh System and method for automatic managing geolocation information and associated references for geographic information systems
US20090327883A1 (en) * 2008-06-27 2009-12-31 Microsoft Corporation Dynamically adapting visualizations
CN101308498A (en) * 2008-07-03 2008-11-19 上海交通大学 Text collection visualized system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HYUNGEUN J, et al.: "Placegram: A Diagrammatic Map for Personal Geotagged Data Browsing", 《IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS》 *
LEE B, et al.: "SparkClouds: Visualizing Trends in Tag Clouds", 《IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS》 *

Similar Documents

Publication Publication Date Title
Von Landesberger et al. Visual analysis of large graphs: state‐of‐the‐art and future research challenges
EP2612263B1 (en) Sketch-based image search
Roche Geographic Information Science I: Why does a smart city need to be spatially enabled?
CN104063466B (en) Virtual - reality three-dimensional display method and system integration
CN101639847B (en) Electronic map query method, electronic map query system and navigator
Gibin et al. An exploratory cartographic visualisation of London through the Google Maps API
CN102609507A (en) Data visualization system based on Web
CN103049580B (en) A method and apparatus for visualization of hierarchical data
CN102629271B (en) Complex data visualization method and equipment based on stacked tree graph
Mark Ware et al. Automated production of schematic maps for mobile applications
Hussain et al. Scalable visualization of semantic nets using power-law graphs
Schintler et al. Big data for policy analysis: The good, the bad, and the ugly
Panse et al. Visualization of geo-spatial point sets via global shape transformation and local pixel placement
Lohmann et al. Visual analysis of microblog content using time-varying co-occurrence highlighting in tag clouds
CN103383688B (en) Memory database for geocoding / geoprocessing
CN102332056B (en) Information visualization technology-based house property data visualization system
CN101887413B (en) Structure processing method and system of plate type table
KR20140123019A (en) Visual representation of map navigation history
US7091970B2 (en) Mapping display space
Burch et al. Prefix tag clouds
CN102368259A (en) Electronic map data storage and query method, device and system
CN101075249A (en) Data warehouse system and its construction for geographical information system
CN103270509A (en) Methods, apparatuses and computer program products for converting a geographical database into a map tile database
CN103514243B (en) Temporal data management system and data management method spatiotemporal
CN103208225A (en) Tile map manufacturing method and system

Legal Events

Date Code Title Description
C06 Publication
C10 Entry into substantive examination
RJ01