CN111783832A

CN111783832A - An Interactive Selection Method for Spatio-temporal Data Prediction Models

Info

Publication number: CN111783832A
Application number: CN202010492269.1A
Authority: CN
Inventors: 孙国道; 查梦; 朱素佳; 徐超清; 王磊; 梁荣华
Original assignee: Zhejiang University of Technology ZJUT
Current assignee: Zhejiang University of Technology ZJUT
Priority date: 2020-06-03
Filing date: 2020-06-03
Publication date: 2020-10-16
Anticipated expiration: 2040-06-03
Also published as: CN111783832B

Abstract

An interactive selection method for spatiotemporal data prediction models. The raw data are filtered and cleaned, outliers are removed, and missing values are filled in. The Canopy and K-Means clustering algorithms are then combined to cluster the data into K clusters, and the first K1 cluster regions are extracted and their boundaries computed. After the data of each region undergo differentiated processing, models are trained with the Random Forest, SVM, ARIMA, and LSTM algorithms, and each region's models are used to make predictions. The resulting prediction data are displayed as visual glyphs in each region on the map, after which the layout of the regional glyphs and of the connections between regions is optimized. The interactive exploration components provided by the system help users distinguish the differences between models intuitively, and the glyph design and geographic layout design of the invention let users analyze the prediction output intuitively and in depth.

Description

An Interactive Selection Method for Spatio-temporal Data Prediction Models

Technical Field

The invention relates to an interactive selection method for spatiotemporal data prediction models.

Background Art

With the development of society, large amounts of spatiotemporal data are generated in many fields, for example by tracking industrial production, conducting financial transactions, and monitoring the environment. Analysis of spatiotemporal data typically includes statistical analysis, model recommendation, and optimal prediction, and it matters for many aspects of society. Reasonable predictions drawn from spatiotemporal data help researchers grasp social and technological trends and make better decisions.

The core of spatiotemporal prediction is building a prediction model. Prediction methods for spatiotemporal data fall into two categories: traditional statistical methods and machine learning methods. Traditional statistical methods are usually based on user observation and involve data sampling, data plotting, and curve fitting/parameter estimation; they include the moving average (MA), autoregressive moving average (ARMA), and autoregressive integrated moving average (ARIMA) models. Machine learning methods are mainly based on data classification and involve data decomposition, model training, and model prediction; they include support vector machines (SVM), random forests, and nonparametric regression models. More recently, advanced models such as artificial neural networks (ANN) and long short-term memory networks (LSTM) have emerged, helping people gain insight into spatiotemporal data in different application domains.

Because different types of models suit different application scenarios and each may have specific limitations, researchers have studied model recommendation to help people choose models better. This work faces several challenges. First, it is primarily data-driven: although data transformation and analysis can reveal much interesting information, a great deal of application-scenario information must also be considered, which can be difficult because of uncertainty at various levels. Second, it offers no adequate explanation of the models' predictive performance. Users may want to know why one model works well in a specific situation while others do not, what the strengths of the different models are, and where each applies.

At present, visual analysis systems for spatiotemporal prediction models combine multiple machine learning models with interactive visualizations, allowing users to inspect models and visually compare their prediction performance. Many visualization researchers link model performance metrics such as accuracy and precision to visualization components such as line charts, bar charts, and heatmaps. Through interactive exploration, users can learn about the range of possible predictive models and gain insight into how they differ. However, existing work on multi-model steering and selection focuses on performance metrics without comprehensively considering input data and model outputs, and an insufficient understanding of the relationship between spatiotemporal data and model outputs may lead to incorrect model selection.

To address these problems, we design a visualization framework that combines time-series/geospatial data with the performance of multiple prediction models, which helps machine learning beginners and non-experts choose and understand models. The invention proposes a visual analysis system that lets users interactively examine the similarity of the prediction outputs of spatiotemporal data models. Unlike other work, we map geographic information onto the outputs of the prediction models while also correlating multiple models. Clustering is applied to similarity analysis and, combined with a correlation matrix view, lets users see intuitively how similar different models are. We also design novel glyphs that allow users to analyze the prediction output intuitively and in depth. In addition, we design a multi-level layout, driven by the mouse zoom level, that removes geospatial overlap when users examine the glyphs linked to the models' prediction outputs up close; at each level, a force-directed graph algorithm is applied to the glyphs to avoid collisions. Finally, a timeline view helps users quickly understand the original time series.

Data prediction is a common data analysis method in many fields. However, beginners and non-experts find it difficult to choose a suitable prediction model and to understand the prediction performance and usage scenarios of different models. Existing model recommendation systems recommend the single most suitable prediction model in a data-driven way without adequately explaining its prediction performance, so users have difficulty discovering interesting findings. Users may want to know why one model works well in a specific situation while others do not, what the strengths of the different models are, and where each applies.

Summary of the Invention

Existing work on multi-model steering and selection focuses on performance metrics rather than comprehensively considering input data and model outputs, and an insufficient understanding of the relationship between spatiotemporal data and model outputs may lead to incorrect model selection. To overcome these deficiencies of the prior art, the invention provides an interactive selection method for spatiotemporal data prediction models.

To solve the above technical problems, the invention provides the following technical solution:

An interactive selection method for spatiotemporal data prediction models, which visually interprets and compares spatiotemporal prediction models on the basis of map regions, comprising the following steps:

(1) Obtain the shared-bicycle data; delete records that lie outside the analysis area or have abnormal riding times; check, for every time point on every date in the area, whether the bicycle data are empty and fill empty entries with 0; then build the data set;
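Step (1) can be sketched as follows. This is a minimal illustration only. It assumes a hypothetical record layout `(start, end, lon, lat)`, a rectangular analysis area `bbox`, a "normal ride" window of 1 minute to 3 hours, and hourly time points. None of these values come from the source. Empty hourly slots are filled with 0, which is one reading of the zero-filling rule.

```python
from collections import Counter
from datetime import datetime, timedelta

def clean_and_fill(records, bbox,
                   min_ride=timedelta(minutes=1),   # assumed lower bound of a normal ride
                   max_ride=timedelta(hours=3)):    # assumed upper bound of a normal ride
    """Filter shared-bike records and build a zero-filled hourly count series.

    records: iterable of (start, end, lon, lat) tuples (hypothetical schema).
    bbox: (lon_min, lat_min, lon_max, lat_max) of the analysis area.
    Returns (kept_records, series); series maps each hour to a ride count.
    """
    lon_min, lat_min, lon_max, lat_max = bbox
    kept, counts = [], Counter()
    for start, end, lon, lat in records:
        if not (lon_min <= lon <= lon_max and lat_min <= lat <= lat_max):
            continue  # outside the analysis area
        if not (min_ride <= end - start <= max_ride):
            continue  # abnormal riding time
        kept.append((start, end, lon, lat))
        counts[start.replace(minute=0, second=0, microsecond=0)] += 1
    if not counts:
        return kept, {}
    # Enumerate every hour of every date covered by the data; empty slots get 0.
    hour = min(counts).replace(hour=0)
    last = max(counts).replace(hour=23)
    series = {}
    while hour <= last:
        series[hour] = counts.get(hour, 0)
        hour += timedelta(hours=1)
    return kept, series

# Example: one valid ride, one 30-second ride, one ride outside the area.
example = [
    (datetime(2020, 1, 1, 8, 5), datetime(2020, 1, 1, 8, 20), 120.1, 30.2),
    (datetime(2020, 1, 1, 10, 0), datetime(2020, 1, 1, 10, 0, 30), 120.1, 30.2),
    (datetime(2020, 1, 1, 9, 10), datetime(2020, 1, 1, 9, 30), 121.0, 30.2),
]
kept, series = clean_and_fill(example, bbox=(120.0, 30.0, 120.5, 30.5))
```

Only the first ride survives. The series covers all 24 hours of 1 January 2020, with a single non-zero slot at 08:00.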

(2) Cluster the data set obtained above by combining the Canopy and K-means clustering algorithms, and take the first K1 clusters as the prediction regions; then model and predict the traffic data of these K1 prediction regions. The test results include the predicted data, the real data, and the MAE, RMSE, R-Squared, and 1-R-Squared values of the predicted data;
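The clustering in step (2) might be sketched like this. Canopy supplies the number of clusters K and the initial centers, K-means refines them, and the K1 largest clusters become the prediction regions. Several details are assumptions for illustration: the pure-Python implementation, seeding each canopy with the first remaining point, the tight threshold `t2`, and ranking clusters by point count. The source does not say how the "first K1" clusters are ordered.

```python
import math

def canopy_centers(points, t2):
    """Canopy pre-clustering with tight threshold t2.

    Each canopy is seeded by the first remaining point (the classic algorithm
    picks one at random); points closer than t2 to a seed cannot seed another
    canopy. The loose threshold T1 only governs soft membership and is omitted.
    """
    remaining = list(points)
    centers = []
    while remaining:
        seed = remaining.pop(0)
        centers.append(seed)
        remaining = [p for p in remaining if math.dist(p, seed) >= t2]
    return centers

def kmeans(points, centers, max_iter=100):
    """Lloyd's K-means initialised with the canopy centres (K = number of canopies)."""
    centers = [tuple(c) for c in centers]
    for _ in range(max_iter):
        groups = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
            groups[nearest].append(p)
        # Move each centre to the mean of its group; keep empty groups' centres.
        new_centers = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, groups

def top_regions(points, t2, k1):
    """Cluster with Canopy + K-means and keep the k1 largest clusters (assumed ranking)."""
    centers, groups = kmeans(points, canopy_centers(points, t2))
    ranked = sorted(zip(centers, groups), key=lambda cg: len(cg[1]), reverse=True)
    return ranked[:k1]
```

Using Canopy this way removes the need to guess K up front. The threshold `t2` then becomes the tuning knob, roughly the minimum spacing between region seeds.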

(3) Design a visual glyph that presents the prediction results of the four prediction algorithms for each region. The analysis and display steps are as follows:

(3-1) Use a multi-layer radar chart as the first layer of the regional glyph: take the (1-R-Squared), MAE, and RMSE values obtained above, in counterclockwise order, as the vertices of the radar chart, with each vertex showing the parameter name and value. The four algorithms are shown as four stacked radar-chart layers, each with a different texture: backslash hatching represents Random Forest, vertical lines represent LSTM, horizontal lines represent ARIMA, and forward-slash hatching represents SVM;

(3-2) Use a bar chart as the second layer of the regional glyph: draw bars for the R-Squared, MAE, and RMSE values obtained above in counterclockwise order around the second layer, with each bar's height giving the value and each bar labeled with the parameter name. Different textures distinguish the prediction results of different algorithms (backslash hatching for Random Forest, vertical lines for LSTM, horizontal lines for ARIMA, and forward-slash hatching for SVM), and different gray levels within the same texture distinguish the prediction results of different trained models of the same algorithm;
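The quantities behind the two glyph layers in step (3) can be sketched as below. `evaluate` computes MAE, RMSE, R-Squared, and 1-R-Squared for one prediction run. `radar_vertices` places the first-layer vertices counterclockwise, and `bar_layout` orders the second-layer bars counterclockwise. The following are all hypothetical choices: the function names, starting at 12 o'clock in y-up coordinates, per-axis normalisation by `scale`, the texture names, and the evenly spaced gray levels.

```python
import math

# Texture per algorithm, as described in (3-1)/(3-2); pattern names are illustrative.
TEXTURES = {"Random Forest": "backslash", "LSTM": "vertical", "ARIMA": "horizontal", "SVM": "slash"}

def evaluate(y_true, y_pred):
    """MAE, RMSE, R-Squared and 1-R-Squared of one prediction run."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mean = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    return {"MAE": sum(abs(e) for e in errors) / n,
            "RMSE": math.sqrt(ss_res / n),
            "R2": r2, "1-R2": 1 - r2}

def radar_vertices(m, scale, center=(0.0, 0.0), radius=1.0):
    """First layer: vertices for (1-R2, MAE, RMSE), counterclockwise from the top.

    scale maps each axis to the value drawn at full radius (e.g. the maximum over
    the four algorithms); assumes a y-up coordinate system.
    """
    axes = ("1-R2", "MAE", "RMSE")
    vertices = []
    for i, name in enumerate(axes):
        angle = math.pi / 2 + 2 * math.pi * i / len(axes)
        rho = radius * m[name] / scale[name]
        vertices.append((center[0] + rho * math.cos(angle), center[1] + rho * math.sin(angle)))
    return vertices

def bar_layout(results, order=("R2", "MAE", "RMSE")):
    """Second layer: one bar per (metric, algorithm, trained model), counterclockwise.

    results: {algorithm: [metrics of each trained model]}; gray levels are spread
    evenly between 0.3 and 0.8 (an assumed palette).
    """
    bars = []
    for name in order:
        for algo, runs in results.items():
            for idx, m in enumerate(runs):
                gray = 0.3 + 0.5 * idx / max(len(runs) - 1, 1)
                bars.append({"metric": name, "algorithm": algo, "texture": TEXTURES[algo],
                             "gray": gray, "height": m[name]})
    for i, bar in enumerate(bars):
        bar["angle"] = math.pi / 2 + 2 * math.pi * i / len(bars)
    return bars
```

The radar layer uses 1-R-Squared so that, on all three axes, a smaller polygon means a better model. The bar layer shows R-Squared directly.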

(4) Lay out the regional glyphs on the map: place each glyph initially at the center point of its region, then re-lay out overlapping glyphs with an improved force-directed layout algorithm, as follows:

(4-1) Input the K1 initial node positions, each node having radius r, and compute the distances between nodes; if the distance between two nodes is less than 2r, the two nodes overlap;

(4-2) Compute the relative position of every pair of nodes to obtain the relative-position matrix M. For the nodes {a₁, a₂, …, a_K1}, M is the K1 × K1 matrix M = (m_ij), where m_ij is the label of the position of a_j relative to a_i, defined as follows:

Here, for each node, the region to its upper right is labeled 0, upper left 1, lower left 2, and lower right 3; computing the relative position of each original node with respect to every other node yields a matrix of order K1;

(4-4) Compute the displacement produced by the force between two nodes; the displacement of the two points is given by:

where x is the displacement produced by the force between the two points, Δx and Δy are the differences between their x- and y-coordinates respectively, and k is the force coefficient;

(4-5) For each node, compute the displacements produced by the forces between it and every other node, and sum them to obtain the node's unit displacement;

(4-6) Update the coordinates of the K1 nodes in turn according to their unit displacements;

(4-7) Compute the relative positions of the updated nodes to obtain the relative-position matrix M1 and compare M with M1, i.e., compare the relative positions before and after the update. For any two nodes P₁ and P₂ whose relative position has changed, compute the maximum displacement of the two points from their original relative angle and original coordinates, and update the coordinates of P₂;

(4-8) Repeat steps (4-4) to (4-7), iterating n times until no two nodes overlap;
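Steps (4-1) to (4-8) can be approximated as below. This text does not reproduce the displacement formula of (4-4) or the maximum-displacement rule of (4-7), so the sketch substitutes two assumptions. First, each overlapping pair pushes apart in proportion to the overlap depth, scaled by `k`, with a small padding `pad` so the loop terminates. Second, a move that flips any recorded quadrant label is undone for both nodes rather than clamped. The function names are hypothetical.

```python
import math

def quadrant(a, b):
    """Label of b relative to a: 0 upper right, 1 upper left, 2 lower left, 3 lower right.

    Assumes y grows upward; points on an axis count as right/upper (a tie rule
    the source does not specify).
    """
    dx, dy = b[0] - a[0], b[1] - a[1]
    if dy >= 0:
        return 0 if dx >= 0 else 1
    return 3 if dx >= 0 else 2

def relative_matrix(pts):
    """(4-2): K1 x K1 matrix of quadrant labels, with -1 on the diagonal."""
    return [[quadrant(a, b) if i != j else -1 for j, b in enumerate(pts)]
            for i, a in enumerate(pts)]

def has_overlap(pts, r):
    """(4-1): two nodes of radius r overlap when their centres are closer than 2r."""
    return any(math.dist(pts[i], pts[j]) < 2 * r
               for i in range(len(pts)) for j in range(i + 1, len(pts)))

def layout(nodes, r, k=1.0, n_iter=100, pad=1e-3):
    """Push overlapping glyph nodes apart while preserving their relative quadrants."""
    pts = [[float(x), float(y)] for x, y in nodes]
    m = relative_matrix(pts)
    for _ in range(n_iter):                               # (4-8)
        if not has_overlap(pts, r):
            break
        # (4-4)/(4-5): assumed repulsion, proportional to the overlap depth.
        disp = [[0.0, 0.0] for _ in pts]
        for i, a in enumerate(pts):
            for j, b in enumerate(pts):
                dx, dy = a[0] - b[0], a[1] - b[1]
                d = math.hypot(dx, dy)
                if i == j or d == 0 or d >= 2 * r + pad:
                    continue  # coincident centres are assumed not to occur
                push = k * (2 * r + pad - d) / 2
                disp[i][0] += push * dx / d
                disp[i][1] += push * dy / d
        old = [p[:] for p in pts]
        pts = [[p[0] + u, p[1] + v] for p, (u, v) in zip(pts, disp)]  # (4-6)
        # (4-7), simplified: if a pair's relative position flipped, undo both moves.
        changed = True
        while changed:
            changed = False
            m1 = relative_matrix(pts)
            for i in range(len(pts)):
                for j in range(len(pts)):
                    if m1[i][j] != m[i][j] and (pts[i] != old[i] or pts[j] != old[j]):
                        pts[i], pts[j] = old[i][:], old[j][:]
                        changed = True
    return pts
```

Checking the quadrant matrix after every step keeps each glyph on the same side of its neighbours as its region center, so the geographic arrangement stays readable after the overlap is removed.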

(5) Connect the glyphs of similar regions with lines: compute an attribute vector for each region's node, composed of the evaluation metrics of the prediction algorithms; a clustering algorithm groups similar nodes into one class; line segments then connect the corresponding glyphs of these similar regions and are rendered as Bézier curves for a smoother appearance. The algorithm for connecting glyphs with line segments is as follows:

(5-1) Input a set of coordinate points S = {P₁, P₂, …, P_t};

(5-2) Select the first point P₁ as the initial point p₀;

(5-3) Compute the Euclidean distance from p₀ to every other point in S, select the nearest point p_k, save the segment [p₀, p_k] and the nearest distance dis, and delete p₀ from S;

(5-4) Let p₀ = p_k and repeat step (5-3);

(5-5) After (t-1) iterations, i.e., when only one element remains in S, stop; save the resulting connection route as lines0 and its length as distance0;

(5-6) Select P₂, …, P_t in turn as the initial point p₀ and repeat steps (5-2) to (5-5) to obtain (t-1) connection routes and their lengths; keep the routes whose segments do not intersect one another (shared endpoints do not count as intersections), and choose the shortest of these as the line-segment connection scheme;
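The route search of steps (5-1) to (5-6) could look like the sketch below. It builds a nearest-neighbour chain from every start point, discards any chain whose segments cross (segments that merely share an endpoint are allowed), and keeps the shortest. Two points are interpretations rather than part of the source. The route started from P₁ (lines0) is also kept as a candidate, and a crossing is detected with the standard orientation test. The Bézier smoothing is not shown, and the function names are hypothetical.

```python
import math

def greedy_route(points, start):
    """(5-2)-(5-5): nearest-neighbour chain from points[start]; returns (segments, length)."""
    remaining = list(points)
    current = remaining.pop(start)
    segments, length = [], 0.0
    while remaining:
        nearest = min(remaining, key=lambda p: math.dist(current, p))
        segments.append((current, nearest))
        length += math.dist(current, nearest)
        remaining.remove(nearest)
        current = nearest
    return segments, length

def _orient(a, b, c):
    """Sign of the cross product (b - a) x (c - a): 1 left turn, -1 right turn, 0 collinear."""
    v = (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])
    return (v > 0) - (v < 0)

def _within(a, b, c):
    """True if c, known to be collinear with a-b, lies inside the segment's bounding box."""
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))

def segments_cross(s1, s2):
    """Segment intersection test; a shared endpoint does not count as an intersection."""
    (p1, p2), (q1, q2) = s1, s2
    if {p1, p2} & {q1, q2}:
        return False
    o1, o2 = _orient(p1, p2, q1), _orient(p1, p2, q2)
    o3, o4 = _orient(q1, q2, p1), _orient(q1, q2, p2)
    if o1 != o2 and o3 != o4:
        return True
    return ((o1 == 0 and _within(p1, p2, q1)) or (o2 == 0 and _within(p1, p2, q2))
            or (o3 == 0 and _within(q1, q2, p1)) or (o4 == 0 and _within(q1, q2, p2)))

def connect_similar(points):
    """(5-6): try every start point, drop self-crossing routes, return (length, segments)."""
    candidates = []
    for start in range(len(points)):
        segments, length = greedy_route(points, start)
        if not any(segments_cross(a, b) for i, a in enumerate(segments) for b in segments[i + 1:]):
            candidates.append((length, segments))
    return min(candidates, key=lambda c: c[0]) if candidates else None
```

Points are expected as tuples, because the shared-endpoint check puts them in a set. The search performs t greedy passes, each quadratic in t, which is cheap for the small groups of similar regions drawn on one map.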

(6) Detect collisions between line segments and nodes and re-plan the line paths with a line layout algorithm, implemented as follows:

(6-1) With node radius r, compute the distance d from each node to the line segment; if d < r, the node collides with the segment; otherwise, check the next node;

(6-2) When a node collides with a segment, determine the node's position relative to the segment: if the node is on the left side of the segment, select an anchor point on the right side of the node; otherwise, select an anchor point on the left side of the node;

(6-3) Connect the anchor point to the node center; this line meets the circumference of the dissimilar node at two intersection points. Find the intersection point closest to the line segment, generate a virtual node within a threshold θ of that point, and connect the virtual node to the similar nodes.
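A possible reading of steps (6-1) to (6-3) is sketched below for one segment. It finds the first dissimilar node the segment passes through and chooses the detour side from the sign of a cross product. It then places a virtual node just outside that node's circle, offset by θ, and returns the resulting polyline. The sketch assumes the following: the anchor point lies along the segment's normal through the node center, only the first collision is handled, and the axes are y-up. The names are hypothetical.

```python
import math

def point_segment_distance(c, a, b):
    """(6-1): distance from node centre c to segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len2 = dx * dx + dy * dy
    t = 0.0 if seg_len2 == 0 else ((c[0] - a[0]) * dx + (c[1] - a[1]) * dy) / seg_len2
    t = max(0.0, min(1.0, t))  # clamp the projection onto the segment
    return math.hypot(c[0] - (a[0] + t * dx), c[1] - (a[1] + t * dy))

def reroute(a, b, nodes, r, theta):
    """Detour the segment a-b around the first dissimilar node it collides with.

    Returns the polyline [a, virtual_node, b], or [a, b] if there is no collision.
    The anchor point is taken along the normal of a-b through the node centre, an
    interpretation of (6-2)/(6-3); theta is the offset beyond the circle.
    """
    length = math.dist(a, b)
    if length == 0:
        return [a, b]
    ux, uy = (b[0] - a[0]) / length, (b[1] - a[1]) / length
    for c in nodes:
        if c in (a, b) or point_segment_distance(c, a, b) >= r:
            continue  # endpoint node, or no collision (d >= r)
        cross = ux * (c[1] - a[1]) - uy * (c[0] - a[0])
        # (6-2): node left of a->b (cross > 0, y-up axes): anchor on its right; else on its left.
        nx, ny = (uy, -ux) if cross > 0 else (-uy, ux)
        # (6-3): of the circle intersections c +/- r*n, c + r*n is on the segment's side;
        # the virtual node sits theta beyond it.
        virtual = (c[0] + (r + theta) * nx, c[1] + (r + theta) * ny)
        return [a, virtual, b]
    return [a, b]
```

Detouring on the side where the segment already passes keeps the bend small. Larger θ values trade path length for clearance around the node.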

The technical idea of the invention is an interactive visual analysis system for interactively analyzing, understanding, and visually comparing prediction models and their performance. First, the spatial data are analyzed; next, prediction models are built from the data set and predictions are made; finally, a set of visual glyphs displays the models' prediction results, and novel layout algorithms resolve glyph-glyph collisions and glyph-line-segment collisions, revealing the differences and correlations between prediction models. This helps users understand how the prediction models and their parameters affect the prediction results, and how the results differ between models.

Beneficial effects of the invention: by visually analyzing and comparing prediction models, combining time-series/geospatial data with the performance of multiple prediction models, and comprehensively considering the relationship between spatiotemporal data and model output, the invention provides an interactive visual analysis system that lets users interactively explore the similarity of model prediction outputs on spatiotemporal data and deepen their understanding of model performance. The glyph design and geographic layout design let users analyze the prediction output intuitively and in depth.

Brief Description of the Drawings

FIG. 1 is a flow chart of the invention.

FIG. 2 shows the visual glyph of the invention.

FIG. 3 shows the glyph layout of the invention, where (a) shows the regional glyphs and (b) shows the regional glyphs after layout.

FIG. 4 shows the connection-line layout of the invention, where (a) shows the lines connecting the regional glyphs and (b) shows those lines after layout.

Detailed Description of the Embodiments

The invention is described in detail below with reference to the accompanying drawings and preferred embodiments, which will make its objectives and effects clearer. It should be understood that the specific embodiments described here only explain the invention and do not limit it.

Referring to FIG. 1 to FIG. 4, an interactive selection method for spatiotemporal data prediction models comprises the following steps:

(3) Design a visual glyph that presents the prediction results of the four prediction algorithms for each region (as shown in FIG. 2). The analysis and display steps are as follows:

(3-2) Use a bar chart as the second layer of the regional glyph: draw bars for the R-Squared, MAE, and RMSE values obtained above in counterclockwise order around the second layer, with each bar's height giving the value and each bar labeled with the parameter name. Different textures distinguish the prediction results of different algorithms (backslash hatching for Random Forest, vertical lines for LSTM, horizontal lines for ARIMA, and forward-slash hatching for SVM), and different gray levels within the same texture distinguish the prediction results of different trained models of the same algorithm;

(4) Lay out the regional glyphs on the map: place each glyph initially at the center point of its region, then re-lay out overlapping glyphs with an improved force-directed layout algorithm (as shown in FIG. 3), as follows:

(4-8) Repeat steps (4-4) to (4-7), iterating n times until no two nodes overlap.

(5) Connect the glyphs of similar regions with lines (as shown in FIG. 4(a)): compute an attribute vector for each region's node, composed of the evaluation metrics of the prediction algorithms; a clustering algorithm groups similar nodes into one class; line segments then connect the corresponding glyphs of these similar regions and are rendered as Bézier curves for a smoother appearance. The algorithm for connecting glyphs with line segments is as follows:

(5-6) Select P₂, …, P_t in turn as the initial point p₀ and repeat steps (5-2) to (5-5) to obtain (t-1) connection routes and their lengths; keep the routes whose segments do not intersect one another (shared endpoints do not count as intersections), and choose the shortest of these as the line-segment connection scheme.

(6) Referring to FIG. 4, in a preferred embodiment, a line layout algorithm detects collisions between line segments and nodes and re-plans the line paths; the implementation steps are as follows:

This embodiment provides an interactive visual analysis system for interactively analyzing, understanding, and visually comparing prediction models and their performance. First, the spatial data are analyzed; next, prediction models are built from the data set and predictions are made; finally, a set of visual glyphs displays the models' prediction results, and novel layout algorithms resolve glyph-glyph collisions and glyph-line-segment collisions, revealing the differences and correlations between prediction models. This helps users understand how the prediction models and their parameters affect the prediction results, and how the results differ between models.

Those of ordinary skill in the art will understand that the above are only preferred examples of the invention and do not limit it. Although the invention has been described in detail with reference to the foregoing examples, those skilled in the art may still modify the technical solutions described in them or make equivalent substitutions for some of their technical features. Any modification or equivalent substitution made within the spirit and principles of the invention shall fall within its scope of protection.

Claims

1. An interactive selection method for a spatiotemporal data prediction model, characterized in that the method comprises the following steps:

(1) Obtain the shared-bicycle data; delete records that lie outside the analysis area or have abnormal riding times; check, for every time point on every date in the area, whether the bicycle data are empty and fill empty entries with 0; then build the data set;

(2) Cluster the data set obtained above by combining the Canopy and K-means clustering algorithms, and take the first K1 clusters as the prediction regions; then model and predict the traffic data of these K1 prediction regions. The test results include the predicted data, the real data, and the MAE, RMSE, R-Squared, and 1-R-Squared values of the predicted data;

(3) Present the prediction results of the four prediction algorithms for each region in a visual glyph. The analysis and display steps are as follows:

(3-1) Use a multi-layer radar chart as the first layer of the regional glyph: take the (1-R-Squared), MAE, and RMSE values obtained above, in counterclockwise order, as the vertices of the radar chart, with each vertex showing the parameter name and value. The four algorithms are shown as four stacked radar-chart layers, each with a different texture: backslash hatching represents Random Forest, vertical lines represent LSTM, horizontal lines represent ARIMA, and forward-slash hatching represents SVM;

(3-2) Use a bar chart as the second layer of the regional glyph: draw bars for the R-Squared, MAE, and RMSE values obtained above in counterclockwise order around the second layer, with each bar's height giving the value and each bar labeled with the parameter name. Different textures distinguish the prediction results of different algorithms (backslash hatching for Random Forest, vertical lines for LSTM, horizontal lines for ARIMA, and forward-slash hatching for SVM), and different gray levels within the same texture distinguish the prediction results of different trained models of the same algorithm;

(4) Lay out the regional glyphs on the map: place each glyph initially at the center point of its region, then re-lay out overlapping glyphs with an improved force-directed layout algorithm, as follows:

(4-1) Input the K1 initial node positions, each node having radius r, and compute the distances between nodes; if the distance between two nodes is less than 2r, the two nodes overlap;

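(4-2) Compute the relative position of every pair of nodes to obtain the relative-position matrix M. For the nodes {a₁, a₂, …, a_K1}, M is the K1 × K1 matrix M = (m_ij), where m_ij is the label of the position of a_j relative to a_i, defined as follows: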

Here, for each node, the region to its upper right is labeled 0, upper left 1, lower left 2, and lower right 3; computing the relative position of each original node with respect to every other node yields a matrix of order K1;

(4-4) Compute the displacement produced by the force between two nodes; the displacement of the two points is given by:

where x is the displacement produced by the force between the two points, Δx and Δy are the differences between their x- and y-coordinates respectively, and k is the force coefficient;

(4-5) For each node, compute the displacements produced by the forces between it and every other node, and sum them to obtain the node's unit displacement;

(4-6) Update the coordinates of the K1 nodes in turn according to their unit displacements;

(4-7) Compute the relative positions of the updated nodes to obtain the relative-position matrix M1 and compare M with M1, i.e., compare the relative positions before and after the update. For any two nodes P₁ and P₂ whose relative position has changed, compute the maximum displacement of the two points from their original relative angle and original coordinates, and update the coordinates of P₂;

(4-8) Repeat steps (4-4) to (4-7), iterating n times until no two nodes overlap;

(5) Connect the glyphs of similar regions with lines: compute an attribute vector for each region's node, composed of the evaluation metrics of the prediction algorithms; a clustering algorithm groups similar nodes into one class; line segments then connect the corresponding glyphs of these similar regions and are rendered as Bézier curves for a smoother appearance. The algorithm for connecting glyphs with line segments is as follows:

(5-1) Input a set of coordinate points S = {P₁, P₂, …, P_t};

(5-2) Select the first point P₁ as the initial point p₀;

(5-3) Compute the Euclidean distance from p₀ to every other point in S, select the nearest point p_k, save the segment [p₀, p_k] and the nearest distance dis, and delete p₀ from S;

(5-4) Let p₀ = p_k and repeat step (5-3);

(5-5) After (t-1) iterations, i.e., when only one element remains in S, stop; save the resulting connection route as lines0 and its length as distance0;

(5-6) Select P₂, …, P_t in turn as the initial point p₀ and repeat steps (5-2) to (5-5) to obtain (t-1) connection routes and their lengths; keep the routes whose segments do not intersect one another, and choose the shortest of these as the line-segment connection scheme;

(6) Detect collisions between line segments and nodes and re-plan the line paths with a line layout algorithm, implemented as follows:

(6-1) With node radius r, compute the distance d from each node to the line segment; if d < r, the node collides with the segment; otherwise, check the next node;

(6-2) When a node collides with a segment, determine the node's position relative to the segment: if the node is on the left side of the segment, select an anchor point on the right side of the node; otherwise, select an anchor point on the left side of the node;

(6-3) Connect the anchor point to the node center; this line meets the circumference of the dissimilar node at two intersection points. Find the intersection point closest to the line segment, generate a virtual node within a threshold θ of that point, and connect the virtual node to the similar nodes.