CN106991525A - Big data-driven visual analysis method and system for air quality and resident travel - Google Patents
- Publication number
- CN106991525A (application number CN201710173669.4A)
- Authority
- CN
- China
- Prior art keywords
- poi
- data
- liveness
- air quality
- weighted
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
Landscapes
- Business, Economics & Management (AREA)
- Human Resources & Organizations (AREA)
- Engineering & Computer Science (AREA)
- Strategic Management (AREA)
- Economics (AREA)
- Tourism & Hospitality (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Entrepreneurship & Innovation (AREA)
- Marketing (AREA)
- Development Economics (AREA)
- General Business, Economics & Management (AREA)
- Educational Administration (AREA)
- Physics & Mathematics (AREA)
- Quality & Reliability (AREA)
- Operations Research (AREA)
- Game Theory and Decision Science (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Primary Health Care (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Instructional Devices (AREA)
Abstract
Description
Technical field
The invention relates to a big data-driven visual analysis method and system for air quality and resident travel.
Background
As China's industrialization has progressed, air pollution from industrial emissions, chiefly sulfur oxides (SOx), nitrogen oxides (NOx), ozone (O3), carbon oxides (COx), and particulate matter (particle diameter of at most 10 μm and 2.5 μm), has become increasingly serious and strongly affects people's daily travel and life. Surveys indicate that when air quality is poor, people prefer to stay indoors and cut back on non-essential trips.
With advances in technology, data is collected and stored in large quantities and data volume is growing explosively, so extracting valuable information from this data has become an urgent problem. Faced with large and complex data, traditional data mining and data analysis methods fall short when exploring it. To capture the value contained in the data, a variety of data analysis and mining methods have emerged.
An effective method is therefore needed to address these problems. In recent years visual analytics, the science of analytical reasoning supported by interactive visual interfaces, has offered a new approach to data mining and data analysis. Its interactivity and visual nature have been well received by researchers, and it has gradually become a research hotspot.
Visualization research on air quality and resident travel is therefore important for exploring the relationship between the two. It can provide an important reference for studying residents' travel behavior, and it can also draw the attention of transportation, healthcare, and other departments to air quality. Such research has significant value both in theory and in practical application.
Summary of the invention
To address the problem of analyzing air quality and resident travel, the invention provides a big data-driven visual analysis method and system that helps transportation, healthcare, and other departments analyze air quality and resident travel. It provides a visual analysis system that helps users analyze the characteristics of air quality and of resident travel by displaying air quality bar charts, temperature box plots, stacked graphs and stream graphs of POI weighted activity, calendar heat maps of the POI weighted-activity offset rate, and multi-dimensional bar charts, so that urban air quality and resident travel can be explored. The object of the invention is achieved through the following technical solution: a big data-driven visual analysis method for air quality and resident travel, comprising the following steps:
(1) Reconstruction of the raw air quality data, temperature data, POI data, and taxi-hailing difficulty data: first, each of the four data sets is cleaned and sorted. Cleaning consists mainly of finding and removing abnormal and missing values in each data source; all data is then sorted chronologically by timestamp, which facilitates the later visualization of time-series data. The taxi-hailing difficulty data comprises the geographic coordinates and weights of the taxi-hailing difficulty distribution points. The POI data comprises the geographic coordinates of the POI distribution points and the POI types.
(2) Calculation of POI weighted activity and offset rate: the POI weighted activity reflects the volume of foot traffic around a POI; the offset rate reflects how the POI weighted activity changes.
The POI weighted activity is calculated as follows:
(2.1) Calculate the Euclidean distance between each taxi-hailing difficulty distribution point and each POI distribution point, and check whether it is less than a preset threshold T. If so, the weight of the taxi-hailing difficulty distribution point is taken as the activity weight of that POI.
(2.2) For each POI type, sum the activity of all POIs of that type; the sum is the weighted activity of that POI type.
The offset rate of the POI weighted activity is calculated as follows:
$$\mathrm{Offset}_t = \frac{\mathrm{POIWeight}_t - \mathrm{Aver}_{week,hour}}{\mathrm{POIWeight}_t} - 1$$
where $\mathrm{Aver}_{week,hour}$ is the mean POI weighted activity per hour per week, $\mathrm{POIWeight}_t$ is the POI weighted activity in the current hour, and $\mathrm{Offset}_t$ is the offset rate.
(3) Clustering of same-type POIs: for each taxi-hailing difficulty distribution point, find all POI distribution points whose Euclidean distance to it is at most T, denoted $POI_{didi}$. Group the POI distribution points of the same type within $POI_{didi}$, compute the position of the cluster center, and set the weight of the taxi-hailing difficulty distribution point as the weight of the cluster center. The POI distribution points are clustered with a k-means-based algorithm, and the longitude and latitude of each newly computed cluster center are used as the longitude and latitude of the POI center position.
(4) Visual analysis of air quality and resident travel, specifically:
(4.1) Color visual encoding: because air quality index (AQI) values vary, colors are mapped with a dynamic scheme, that is, the color is adjusted dynamically according to the AQI value:
where $Color_{rect}$ is the fill color of the rectangle.
(4.2) Bar chart and box plot analysis component: each day's air quality index is shown as a rectangle. The rectangles are ordered from left to right by date; the fill color is determined by the scheme in step (4.1) and the height by the AQI. The box plots show the hourly temperatures of each week and are ordered from left to right by week. The upper and lower dashed lines of a box plot represent the upper and lower quarters of the data range, the small rectangle in the middle spans the first to the third quartile, and the horizontal line inside it marks the median.
(4.3) Stream graph and stacked graph analysis component: the horizontal axis of both graphs shows the hours of the specified time range, with one week as the basic tick interval. The vertical axis shows the POI weighted activity. In the stacked graph, area charts of different colors represent different POI types; the layers are stacked on one side of the axis and show how the weighted activity of one or more POI types changes over the specified time range. The stream graph arranges its layers on both sides of the axis and shows the same changes.
(4.4) Scatter plot matrix, GeoMap, and calendar heat map analysis component: the scatter plot matrix extends the scatter plot to higher dimensions and shows air quality, temperature, and POI weighted activity. The calendar heat map presents multi-dimensional data in two dimensions and encodes magnitude as color depth; it shows how the offset rate of the weighted activity of a given POI changes under different air quality and temperature conditions. The GeoMap shows the activity weights and geographic distribution of the same-type POI clusters.
A big data-driven visual analysis system for air quality and resident travel, comprising the following components:
(1) Bar chart and box plot analysis component: each day's air quality index is shown as a rectangle, and the rectangles are ordered from left to right by date. The height of a rectangle is determined by the AQI, and its fill color follows a dynamic mapping scheme, that is, the color is adjusted dynamically according to the AQI value:
where $Color_{rect}$ is the fill color of the rectangle.
The box plots show the hourly temperatures of each week and are ordered from left to right by week. The upper and lower dashed lines of a box plot represent the upper and lower quarters of the data range, the small rectangle in the middle spans the first to the third quartile, and the horizontal line inside it marks the median.
(2) Stream graph and stacked graph analysis component: the horizontal axis of both graphs shows the hours of the specified time range, with one week as the basic tick interval. The vertical axis shows the POI weighted activity. In the stacked graph, area charts of different colors represent different POI types; the layers are stacked on one side of the axis and show how the weighted activity of one or more POI types changes over the specified time range. The stream graph arranges its layers on both sides of the axis and shows the same changes. The POI weighted activity is calculated as follows:
(2.1) Calculate the Euclidean distance between each taxi-hailing difficulty distribution point and each POI distribution point, and check whether it is less than a preset threshold T. If so, the weight of the taxi-hailing difficulty distribution point is taken as the activity weight of that POI.
(2.2) For each POI type, sum the activity of all POIs of that type; the sum is the weighted activity of that POI type.
(3) Scatter plot matrix, GeoMap, and calendar heat map analysis component: the scatter plot matrix extends the scatter plot to higher dimensions and shows air quality, temperature, and POI weighted activity. The calendar heat map presents multi-dimensional data in two dimensions and encodes magnitude as color depth; it shows how the offset rate of the weighted activity of a given POI changes under different air quality and temperature conditions. The GeoMap shows the activity weights and geographic distribution of the same-type POI clusters.
The activity weight of a same-type POI cluster is calculated as follows: for each taxi-hailing difficulty distribution point, find all POI distribution points whose Euclidean distance to it is at most T, denoted $POI_{didi}$. Group the POI distribution points of the same type within $POI_{didi}$, compute the position of the cluster center, and set the weight of the taxi-hailing difficulty distribution point as the weight of the cluster center. The POI distribution points are clustered with a k-means-based algorithm, and the longitude and latitude of each newly computed cluster center are used as the longitude and latitude of the POI center position.
The beneficial effects of the invention are as follows. Unlike conventional air quality visualization, the invention visualizes air quality together with resident travel data. Moving from a global view to a local view and back, users can explore how air quality changes the activity of different urban areas and analyze how it changes residents' travel destinations. Interaction lowers the cost for analysts of using the system and yields an effective presentation. The system reveals multiple patterns in air quality and resident travel at four levels: air quality, temperature, POI weighted activity, and offset rate.
Description of drawings
Figure 1: bar chart and box plot analysis component;
Figure 2: stream graph and stacked graph analysis component;
Figure 3: scatter plot matrix, GeoMap, and calendar heat map analysis component;
Figure 4: dependency diagram of the system front end and back end.
Detailed description
The invention is described in detail below with reference to the embodiments and the accompanying drawings.
The invention is based on the following data.
The air quality data is published by environmental protection administrative departments at or above the prefecture level, or by environmental monitoring stations they authorize, as daily and hourly reports. Hourly reports cover a 1-hour period: on every hour, a real-time report is published for each monitoring station, with indicators including SO2, NO2, O3, CO, PM2.5, and PM10 concentrations. Daily reports give the 24-hour mean concentrations of SO2, NO2, O3, CO, PM2.5, and PM10 for the day.
The atmospheric environment data is published by meteorological administrative departments at or above the prefecture level, or by meteorological monitoring stations they authorize, also as daily and hourly reports. Hourly reports cover a 1-hour period: on every hour, a real-time report is published for each monitoring station, with indicators including air pressure, temperature, humidity, precipitation, and wind force and direction. Daily reports give the 24-hour means of air pressure, temperature, humidity, precipitation, and wind force and direction for the day.
The resident travel data is the taxi-hailing difficulty data provided by the Didi Cangqiong (滴滴苍穹) big data platform, with a 1-hour period: on every hour, the taxi-hailing difficulty at different locations is provided. Each hourly record contains longitude, latitude, and taxi-hailing difficulty.
The POI distribution data is the detailed POI data, comprising POI address, POI name, POI longitude, POI latitude, and POI type.
The big data-driven visual analysis method for air quality and resident travel provided by the invention comprises the following steps:
(1) Reconstruction of the raw air quality data, temperature data, POI data, and taxi-hailing difficulty data: first, each of the four data sets is cleaned and sorted. Cleaning consists mainly of finding and removing abnormal and missing values in each data source; all data is then sorted chronologically by timestamp, which facilitates the later visualization of time-series data. The taxi-hailing difficulty data comprises the geographic coordinates and weights of the taxi-hailing difficulty distribution points. The POI data comprises the geographic coordinates of the POI distribution points and the POI types.
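The cleaning and sorting in step (1) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the record layout (`timestamp`, `value`), the timestamp format, and the plausible-value range used to flag abnormal readings are all assumptions.

```python
from datetime import datetime

def clean_and_sort(records, required=("timestamp", "value"), valid_range=(0.0, 500.0)):
    """Drop records with missing or out-of-range values, then sort by time."""
    low, high = valid_range
    cleaned = []
    for rec in records:
        # Missing values: a required field is absent or None.
        if any(rec.get(key) is None for key in required):
            continue
        # Abnormal values: outside the plausible range (an assumed rule).
        if not low <= rec["value"] <= high:
            continue
        cleaned.append(rec)
    # Sort chronologically by the parsed timestamp (assumed format).
    return sorted(
        cleaned,
        key=lambda r: datetime.strptime(r["timestamp"], "%Y-%m-%d %H:%M"),
    )

raw = [
    {"timestamp": "2017-03-01 02:00", "value": 85.0},
    {"timestamp": "2017-03-01 01:00", "value": None},    # missing value
    {"timestamp": "2017-03-01 00:00", "value": 60.0},
    {"timestamp": "2017-03-01 03:00", "value": -999.0},  # abnormal value
]
tidy = clean_and_sort(raw)
```

Only the two valid records survive, ordered by hour. The same routine would be applied separately to each of the four data sources.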
(2) Calculation of POI weighted activity and offset rate: the POI weighted activity reflects the volume of foot traffic around a POI; the offset rate reflects how the POI weighted activity changes.
The POI weighted activity is calculated as follows:
(2.1) Calculate the Euclidean distance between each taxi-hailing difficulty distribution point and each POI distribution point, and check whether it is less than a preset threshold T, which may be set to 0.5 km. If so, the weight of the taxi-hailing difficulty distribution point is taken as the activity weight of that POI.
(2.2) For each POI type, sum the activity of all POIs of that type; the sum is the weighted activity of that POI type.
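Steps (2.1) and (2.2) can be sketched as below. Several points are assumptions rather than part of the text: coordinates are taken to be longitude and latitude in degrees and are projected to kilometres with an equirectangular approximation before the Euclidean distance is measured, the field names (`lon`, `lat`, `weight`, `type`) are hypothetical, and when several taxi-hailing points fall within T of one POI their weights are summed.

```python
import math
from collections import defaultdict

KM_PER_DEGREE = 111.32  # approximate length of one degree of latitude (assumption)

def planar_distance_km(lon1, lat1, lon2, lat2):
    """Euclidean distance after an equirectangular projection (short ranges only)."""
    mean_lat = math.radians((lat1 + lat2) / 2.0)
    dx = (lon2 - lon1) * KM_PER_DEGREE * math.cos(mean_lat)
    dy = (lat2 - lat1) * KM_PER_DEGREE
    return math.hypot(dx, dy)

def weighted_activity(taxi_points, pois, threshold_km=0.5):
    """Per POI type, sum the taxi-hailing weights found within threshold_km of each POI."""
    totals = defaultdict(float)
    for poi in pois:
        for tp in taxi_points:
            d = planar_distance_km(poi["lon"], poi["lat"], tp["lon"], tp["lat"])
            if d < threshold_km:                      # step (2.1)
                totals[poi["type"]] += tp["weight"]   # step (2.2)
    return dict(totals)

taxi_points = [
    {"lon": 120.150, "lat": 30.280, "weight": 3.0},
    {"lon": 120.200, "lat": 30.300, "weight": 5.0},
]
pois = [
    {"lon": 120.1510, "lat": 30.2810, "type": "restaurant"},
    {"lon": 120.1505, "lat": 30.2795, "type": "restaurant"},
    {"lon": 120.3000, "lat": 30.4000, "type": "school"},
]
activity = weighted_activity(taxi_points, pois)
```

Both restaurants lie within 0.5 km of the first taxi-hailing point, so the restaurant type accumulates that weight twice; the school is several kilometres from either point and receives no activity.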
The offset rate of the POI weighted activity is calculated as follows:
$$\mathrm{Offset}_t = \frac{\mathrm{POIWeight}_t - \mathrm{Aver}_{week,hour}}{\mathrm{POIWeight}_t} - 1$$
where $\mathrm{Aver}_{week,hour}$ is the mean POI weighted activity per hour per week, $\mathrm{POIWeight}_t$ is the POI weighted activity in the current hour, and $\mathrm{Offset}_t$ is the offset rate.
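As a worked illustration, the offset rate can be evaluated exactly as the formula is printed. The function names and sample values below are hypothetical, and the way the historical samples are selected for the average (same hour slot across weeks) is an assumption.

```python
def mean_activity(samples):
    """Mean POI weighted activity over the samples of one hour slot."""
    return sum(samples) / len(samples)

def offset_rate(poi_weight_t, aver_week_hour):
    """Offset rate as printed: (POIWeight_t - Aver) / POIWeight_t - 1."""
    if poi_weight_t == 0:
        raise ValueError("POIWeight_t must be non-zero")
    return (poi_weight_t - aver_week_hour) / poi_weight_t - 1

# Hypothetical activity of one POI type in the same hour slot of earlier weeks.
history = [90.0, 100.0, 110.0]
aver = mean_activity(history)     # 100.0
rate = offset_rate(125.0, aver)   # (125 - 100) / 125 - 1, about -0.8
```

Note that, taken literally, the printed expression simplifies algebraically to $-\mathrm{Aver}_{week,hour} / \mathrm{POIWeight}_t$, so the value is negative whenever both quantities are positive.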
(3) Clustering of same-type POIs: for each taxi-hailing difficulty distribution point, find all POI distribution points whose Euclidean distance to it is at most T, denoted $POI_{didi}$. Group the POI distribution points of the same type within $POI_{didi}$, compute the position of the cluster center, and set the weight of the taxi-hailing difficulty distribution point as the weight of the cluster center. The POI distribution points are clustered with a k-means-based algorithm, and the longitude and latitude of each newly computed cluster center are used as the longitude and latitude of the POI center position.
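The text names k-means but gives no parameters, so the following is only a sketch of plain Lloyd's k-means over (longitude, latitude) pairs of same-type POIs. The number of clusters, the seeding with the first k points, and the iteration cap are assumptions.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means on (lon, lat) tuples, seeded with the first k points."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    centres = list(points[:k])
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centre.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centres[i]))
            clusters[nearest].append(p)
        # Update step: move each centre to the mean of its members.
        new_centres = []
        for i, members in enumerate(clusters):
            if members:
                lon = sum(m[0] for m in members) / len(members)
                lat = sum(m[1] for m in members) / len(members)
                new_centres.append((lon, lat))
            else:
                new_centres.append(centres[i])  # keep an empty cluster's centre
        if new_centres == centres:
            break  # converged
        centres = new_centres
    return centres, clusters

# Hypothetical same-type POI coordinates near two taxi-hailing points.
poi_points = [(120.10, 30.20), (120.30, 30.40), (120.11, 30.21), (120.31, 30.41)]
centres, clusters = kmeans(poi_points, k=2)
```

Each returned centre would then serve as the POI center position, carrying the weight of the associated taxi-hailing difficulty distribution point.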
(4) Visual analysis of air quality and resident travel, specifically:
(4.1) Color visual encoding: because air quality index (AQI) values vary, colors are mapped with a dynamic scheme, that is, the color is adjusted dynamically according to the AQI value:
where $Color_{rect}$ is the fill color of the rectangle.
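The mapping formula itself does not survive in this text, so the sketch below is not the patent's scheme. It shows one possible dynamic mapping under stated assumptions: the AQI is normalized against the minimum and maximum of the currently displayed range and linearly interpolated between two placeholder end colors.

```python
def aqi_to_color(aqi, aqi_min, aqi_max, low=(0, 228, 0), high=(126, 0, 35)):
    """Map an AQI value to a hex fill color by linear interpolation (assumed scheme)."""
    if aqi_max == aqi_min:
        t = 0.0
    else:
        t = (aqi - aqi_min) / (aqi_max - aqi_min)
    t = max(0.0, min(1.0, t))  # clamp values outside the displayed range
    rgb = tuple(round(lo + t * (hi - lo)) for lo, hi in zip(low, high))
    return "#{:02x}{:02x}{:02x}".format(*rgb)

# Hypothetical daily AQI values for the displayed time range.
daily_aqi = [50, 120, 300]
fills = [aqi_to_color(v, min(daily_aqi), max(daily_aqi)) for v in daily_aqi]
```

Because the scale is recomputed from the displayed values, the same AQI can receive a different color when the selected time range changes, which is one reading of "dynamic" here.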
(4.2) Bar chart and box plot analysis component: each day's air quality index is shown as a rectangle. The rectangles are ordered from left to right by date; the fill color is determined by the scheme in step (4.1) and the height by the AQI. The box plots show the hourly temperatures of each week and are ordered from left to right by week. The upper and lower dashed lines of a box plot represent the upper and lower quarters of the data range, the small rectangle in the middle spans the first to the third quartile, and the horizontal line inside it marks the median, as shown in Figure 1.
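The five values each box plot needs can be computed with the standard library, as in this minimal sketch. The quartile convention (`method="inclusive"`) and the sample temperatures are assumptions; the text does not say which convention is used.

```python
import statistics

def box_stats(values):
    """Return the five-number summary used to draw one box plot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),   # end of the lower dashed line
        "q1": q1,             # bottom of the small rectangle
        "median": median,     # horizontal line inside the rectangle
        "q3": q3,             # top of the small rectangle
        "max": max(values),   # end of the upper dashed line
    }

# Hypothetical hourly temperatures (degrees Celsius) sampled from one week.
week_temperatures = [12, 8, 16, 10, 14]
stats = box_stats(week_temperatures)
```

One such summary per week, drawn left to right in week order, gives the sequence of box plots described in the component.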
(4.3) Stream graph and stacked graph analysis component: the horizontal axis of both graphs shows the hours of the specified time range, with one week as the basic tick interval. The vertical axis shows the POI weighted activity. In the stacked graph, area charts of different colors represent different POI types; the layers are stacked on one side of the axis and show how the weighted activity of one or more POI types changes over the specified time range. The stream graph arranges its layers on both sides of the axis and shows the same changes, as shown in Figure 2.
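The difference between the one-sided stacked graph and the two-sided stream graph comes down to the baseline from which the layers are stacked. The sketch below computes the lower and upper edge of each layer; the symmetric baseline (half the total below the axis) is an assumed choice for the stream graph, and the series names and values are hypothetical.

```python
def stack_layers(series, centered=False):
    """Compute (lower, upper) edges per layer and time step.

    series maps a POI type to its weighted activity per hour.
    centered=False stacks from zero (stacked graph);
    centered=True stacks symmetrically around the axis (stream graph).
    """
    names = list(series)
    steps = len(next(iter(series.values())))
    bands = {name: [] for name in names}
    for t in range(steps):
        total = sum(series[name][t] for name in names)
        base = -total / 2.0 if centered else 0.0
        for name in names:
            top = base + series[name][t]
            bands[name].append((base, top))
            base = top  # the next layer starts where this one ends
    return bands

activity = {"restaurant": [2.0, 4.0], "school": [2.0, 2.0]}
stacked = stack_layers(activity)
stream = stack_layers(activity, centered=True)
```

In both layouts the thickness of a layer at a given hour equals that POI type's weighted activity; only the vertical placement differs.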
(4.4) Scatter plot matrix, GeoMap, and calendar heat map analysis component: the scatter plot matrix extends the scatter plot to higher dimensions and shows air quality, temperature, and POI weighted activity. The calendar heat map presents multi-dimensional data in two dimensions and encodes magnitude as color depth; it shows how the offset rate of the weighted activity of a given POI changes under different air quality and temperature conditions. The GeoMap shows the activity weights and geographic distribution of the same-type POI clusters, as shown in Figure 3.
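A calendar heat map places each value in a two-dimensional grid derived from its date. The sketch below shows one common layout, assumed here rather than taken from the text: columns are weeks counted from the Monday of the first week, rows are weekdays, and the cell value (for example an offset rate) would drive the color depth.

```python
from datetime import date

def calendar_cells(daily_values):
    """Turn {date: value} into (week_index, weekday, value) grid cells.

    weekday follows date.weekday(): Monday is 0 and Sunday is 6.
    """
    first = min(daily_values)
    # Ordinal of the Monday that starts the first displayed week.
    start = first.toordinal() - first.weekday()
    cells = []
    for day in sorted(daily_values):
        week_index = (day.toordinal() - start) // 7
        cells.append((week_index, day.weekday(), daily_values[day]))
    return cells

# Hypothetical daily offset rates for one POI type.
offsets = {date(2017, 3, 6): -0.7, date(2017, 3, 1): -0.8}
cells = calendar_cells(offsets)
```

Here 1 March 2017 is a Wednesday in the first displayed week and 6 March 2017 is the Monday of the second, so the two values land in different columns.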
A big data-driven visual analysis system for air quality and resident travel, comprising the following components:
(1) Bar chart and box plot analysis component: each day's air quality index is shown as a rectangle, and the rectangles are ordered from left to right by date. The height of a rectangle is determined by the AQI, and its fill color follows a dynamic mapping scheme, that is, the color is adjusted dynamically according to the AQI value:
where $Color_{rect}$ is the fill color of the rectangle.
The box plots show the hourly temperatures of each week and are ordered from left to right by week. The upper and lower dashed lines of a box plot represent the upper and lower quarters of the data range, the small rectangle in the middle spans the first to the third quartile, and the horizontal line inside it marks the median, as shown in Figure 1.
(2)流图‐堆积图分析组件:堆积图和流图的横坐标是指定时间范围每小时坐标,以每星期为基本刻度。纵坐标是POI带权活跃度值。堆积图中用不同颜色的面积图代表不同类型的POI,堆积图沿坐标轴单侧排列,展示指定时间范围一种或多种POI带权活跃度的变化情况。流图沿坐标双侧排列,展示指定时间范围一种或多种POI带权活跃度的变化情况,如图2所示。POI带权活跃度的计算具体为:(2) Flow chart - Stacked chart analysis component: The abscissa of the stacked chart and stream chart is the hourly coordinate of the specified time range, with each week as the basic scale. The vertical axis is the POI weighted activity value. The stacked chart uses area charts of different colors to represent different types of POIs. The stacked chart is arranged along one side of the coordinate axis to show the changes in the weighted activity of one or more POIs within a specified time range. The flow graph is arranged along both sides of the coordinates, showing the changes in the weighted activity of one or more POIs within a specified time range, as shown in Figure 2. The calculation of POI weighted activity is as follows:
(2.1) Compute the Euclidean distance between each taxi-hailing difficulty distribution point and each POI distribution point, and check whether it is less than the preset threshold T. If it is, the weight of the taxi-hailing difficulty distribution point is assigned as the activity weight of that POI.
(2.2) For each POI type, sum the activity of all POIs of that type; the sum is the weighted activity of that POI type.
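Steps (2.1) and (2.2) can be sketched in a few lines. This is an illustration under stated assumptions, not the patent's code: the function name `weighted_activity` and the tuple layouts are hypothetical, coordinates are assumed to be planar so that the Euclidean distance is meaningful, and a POI that lies near several taxi-hailing difficulty points is assumed to receive the sum of their weights (the source does not say how that case is handled).

```python
from collections import defaultdict
from math import hypot

def weighted_activity(taxi_points, pois, threshold):
    """Sum taxi-difficulty weights per POI type.

    taxi_points: list of (x, y, weight) tuples (hypothetical layout).
    pois: list of (x, y, poi_type) tuples (hypothetical layout).
    threshold: the preset distance threshold T.
    """
    totals = defaultdict(float)
    for px, py, poi_type in pois:
        for tx, ty, weight in taxi_points:
            # Step (2.1): a taxi point closer than T passes its weight to the POI.
            if hypot(px - tx, py - ty) < threshold:
                # Step (2.2): accumulate the activity per POI type.
                totals[poi_type] += weight
    return dict(totals)
```

Running this once per hour of the selected time range gives the per-type series that the stacked graph and stream graph plot on the vertical axis.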
(3) Scatter matrix-GeoMap-calendar heat map analysis component: the scatter matrix extends the scatter plot to higher dimensions and is used to show air quality, temperature, and POI weighted activity. The calendar heat map presents multidimensional data in two dimensions and encodes magnitude as color intensity; it shows how the weighted activity offset rate of the same POI changes under different air quality and temperature conditions. The GeoMap shows the activity weights and geographic distribution of clusters of same-type POIs, as shown in Figure 3.
The activity weight of a same-type POI cluster is calculated as follows: for each taxi-hailing difficulty distribution point, find all POI distribution points whose Euclidean distance to it is less than or equal to T, denoted POI_didi. Within POI_didi, group the POI distribution points of the same type, compute the position of the cluster center, and assign the weight of the taxi-hailing difficulty distribution point to the cluster center. The POI distribution points are clustered with a k-means-based algorithm, and the longitude and latitude of each newly computed cluster center are used as the longitude and latitude of the POI center position.
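The cluster-center computation can be sketched as below. The source names a k-means-based algorithm but gives neither the number of clusters nor the initialization, so this sketch makes a simplifying assumption: each POI type inside the radius forms a single cluster, and its center is the mean of its members, which is the centroid update step of k-means. The function name `cluster_centers` and the tuple layouts are hypothetical.

```python
from collections import defaultdict
from math import hypot

def cluster_centers(taxi_point, pois, threshold):
    """Return {poi_type: (center_lon, center_lat, weight)} for one taxi point.

    taxi_point: (lon, lat, weight) tuple (hypothetical layout).
    pois: list of (lon, lat, poi_type) tuples (hypothetical layout).
    threshold: the preset distance threshold T.
    """
    tx, ty, weight = taxi_point
    groups = defaultdict(list)
    for lon, lat, poi_type in pois:
        # Collect POI_didi: POIs within distance T (inclusive) of the taxi point.
        if hypot(lon - tx, lat - ty) <= threshold:
            groups[poi_type].append((lon, lat))
    centers = {}
    for poi_type, members in groups.items():
        # Centroid of the same-type members (k-means update with one cluster).
        center_lon = sum(p[0] for p in members) / len(members)
        center_lat = sum(p[1] for p in members) / len(members)
        # The taxi point's weight becomes the weight of the cluster center.
        centers[poi_type] = (center_lon, center_lat, weight)
    return centers
```

The resulting centers and weights are what the GeoMap would place on the map for each POI type.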
In the preprocessing stage of the method, the POI weighted activity is computed mainly by accumulating the counts of the different types of POIs around each taxi-hailing difficulty point, which gives a measure of POI weighted activity. The POI weighted activity offset rate measures how far the real-time POI activity deviates from the historical mean of the POI weighted activity. By drawing the bar-box plot, the stacked-stream graph, and the scatter matrix-GeoMap-calendar heat map, and by interacting across these linked views, users gain an important reference for exploring residents' travel behavior. The views can also draw the attention of transportation, medical, and other relevant departments to air quality and provide them with constructive input.
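The offset rate described above can be illustrated with a short sketch. The text in this section does not give the formula, so the relative deviation used here, (current - historical mean) / historical mean, is an assumption, as is the function name `offset_rate`.

```python
from statistics import mean

def offset_rate(current, history):
    """Relative deviation of the current weighted activity from its historical mean.

    current: real-time POI weighted activity (number).
    history: non-empty list of historical weighted activity values.
    The relative form of the deviation is an assumption, not taken from the source.
    """
    baseline = mean(history)
    if baseline == 0:
        # A relative offset is undefined when the historical mean is zero.
        raise ValueError("historical mean is zero; offset rate is undefined")
    return (current - baseline) / baseline
```

A positive value means the POI type is more active than usual under the current air quality and temperature, and a negative value means it is less active; the calendar heat map would encode this value as color intensity.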
The above describes one embodiment of the invention, which presents effective visualization components at several levels. The invention is clearly not limited to this embodiment; it may be implemented with various modifications that do not depart from its basic spirit or exceed the scope of its substance.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710173669.4A CN106991525B (en) | 2017-03-22 | 2017-03-22 | Visual analysis method and system for air quality and residents' travel |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106991525A (en) | 2017-07-28 |
CN106991525B (en) | 2021-06-18 |
Family
ID=59411741
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710173669.4A (Active), CN106991525B (en) | 2017-03-22 | 2017-03-22 | Visual analysis method and system for air quality and residents' travel |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106991525B (en) |
Also Published As
Publication number | Publication date |
---|---|
CN106991525B (en) | 2021-06-18 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |