CN117271099B

CN117271099B - Automatic space data analysis scheduling system and method based on rule base

Info

Publication number: CN117271099B
Application number: CN202311552215.XA
Authority: CN
Inventors: 李想
Original assignee: Shandong Normal University
Current assignee: Shandong Normal University
Priority date: 2023-11-21
Filing date: 2023-11-21
Publication date: 2024-01-26
Anticipated expiration: 2043-11-21
Also published as: CN117271099A

Abstract

An automatic scheduling system and method for spatial data analysis based on a rule base, comprising a spatial data input module, a data preprocessing module, a rule definition and matching module, a data processing automatic scheduling and optimization module, an error handling module, and a user interface module. The spatial data input module is used to upload spatial data; the data preprocessing module is used to preprocess spatial data; the rule definition and matching module is used to define rule base rules and to match spatial data to rules; the data processing automatic scheduling and optimization module is used to schedule spatial data processing tasks and to optimize the scheduling process; the error handling module is used to handle scheduling exceptions; and the user interface module is used to provide a user interface. The invention proposes a distributed optimal matching algorithm based on an improved two-way multiplier method to match spatial data to rules, and an automatic scheduling algorithm based on an improved deep Q network to schedule spatial data processing tasks automatically.

Description

An automatic scheduling system and method for spatial data analysis based on a rule base

Technical field

The invention relates to the fields of spatial data processing, optimal matching, and automatic task scheduling, and in particular to an automatic scheduling system and method for spatial data analysis based on a rule base.

Background

Spatial data processing covers the techniques for acquiring, preprocessing, analyzing, storing, and visualizing spatial data. These techniques handle different kinds of spatial data, such as geographic information system (GIS) data, remote sensing data, and geolocation data, in order to carry out spatial data analysis tasks. Spatial data of many types are acquired, including satellite images, sensor data, map data, and GPS trajectory data. Geographic information analysis and spatial data analysis methods, such as geospatial pattern recognition, spatial buffer analysis, and map algebra, are used to derive useful information from the data. Predefined rule bases and models are applied to the spatial data to automate specific analysis tasks, so that users can perform spatial data analysis more easily.

Optimal matching is a method for determining the best-matching rule. Its purpose is to select appropriate rules from the rule base for a spatial data analysis task and to obtain the best result while taking into account the performance and constraints of the different rules. The suitability of each rule is usually assessed against a set of evaluation criteria, including the rule's accuracy, precision, computational efficiency, and memory usage. The rules and models are ranked by these criteria to determine which rule best suits the current analysis task. The choice of optimal matching technique depends on the specific requirements of the problem, the nature of the rule base, and the characteristics of the spatial data analysis task. The goal is to ensure that the rule best suited to the current task is selected and executed, so as to achieve the best analysis result.

Automatic task scheduling is a technique that identifies, plans, and arranges the execution of different spatial data analysis tasks without manual intervention by the user. It usually relies on a set of predefined rules and conditions to do the following. Tasks are identified from condition triggers in the rule base, from task descriptions supplied by the user, or from changes in the data. The execution order and timing of the analysis tasks are planned, which involves priority assignment, task dependency analysis, and resource allocation. Computing resources, storage resources, and data access rights are allocated according to task requirements. Once tasks have been identified, planned, and allocated resources, they are executed automatically. The goal of automatic scheduling is to reduce the user's operational burden and to raise the efficiency and degree of automation of spatial data analysis. Users set up the rule base and task parameters when needed, and tasks then run automatically under the scheduling method. This is particularly useful for large-scale, complex spatial data analysis tasks, where it improves efficiency and accuracy.

Summary of the invention

In view of the above problems, the present invention aims to provide an automatic scheduling method for spatial data analysis based on a rule base.

The object of the present invention is achieved through the following technical solution:

An automatic scheduling method for spatial data analysis based on a rule base comprises a spatial data input module, a data preprocessing module, a rule definition and matching module, a data processing automatic scheduling and optimization module, an error handling module, and a user interface module. The spatial data input module is used to upload spatial data. The data preprocessing module is used to preprocess spatial data. The rule definition and matching module comprises a rule base definition unit and a rule matching unit: the rule base definition unit is used to define rule base rules, and the rule matching unit uses a distributed optimal matching algorithm based on an improved two-way multiplier method to match spatial data optimally to rule base rules. The data processing automatic scheduling and optimization module comprises an automatic scheduling unit and a scheduling optimization unit: the automatic scheduling unit uses an automatic scheduling algorithm based on an improved deep Q network to schedule spatial data processing tasks automatically, and the scheduling optimization unit is used to optimize the automatic scheduling process. The error handling module is used to handle exceptions that occur during scheduling, and the user interface module is used to provide a user interface.

Further, the spatial data input module acquires various types of spatial data through satellite remote sensing, sensors, databases, and Internet data sources, and uploads the spatial data to the rule base.

Further, the data preprocessing module preprocesses spatial data by cleaning and repairing erroneous, missing, and inconsistent information in the data, and by performing data calibration and dimensionality reduction.

Further, the rule base definition unit is used to define and manage rules. Rules consisting of conditions, operations, and data processing steps guide the system in processing and analyzing large-scale spatial data efficiently. The rules are classified by task, analysis type, and data type, so that the appropriate rules can be applied to the spatial data for each application.
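As an illustration only, a rule base organized as described above (rules made of a condition and an operation, classified by task, analysis type, and data type) might be sketched as follows. The class names `Rule` and `RuleBase`, the field names, and the sample rule are all hypothetical and do not come from the patent.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]

@dataclass
class Rule:
    """One rule: a condition that gates it and an operation that processes data."""
    name: str
    task: str            # hypothetical label, e.g. "buffer_analysis"
    analysis_type: str   # hypothetical label, e.g. "vector"
    data_type: str       # hypothetical label, e.g. "gps_track"
    condition: Callable[[Record], bool]
    operation: Callable[[Record], Record]

class RuleBase:
    """Stores rules under a (task, analysis type, data type) classification key."""

    def __init__(self) -> None:
        self._rules: Dict[Tuple[str, str, str], List[Rule]] = {}

    def add(self, rule: Rule) -> None:
        key = (rule.task, rule.analysis_type, rule.data_type)
        self._rules.setdefault(key, []).append(rule)

    def applicable(self, task: str, analysis_type: str, data_type: str,
                   record: Record) -> List[Rule]:
        """Return the rules in the matching class whose condition holds for the record."""
        candidates = self._rules.get((task, analysis_type, data_type), [])
        return [rule for rule in candidates if rule.condition(record)]

rule_base = RuleBase()
rule_base.add(Rule(
    name="buffer_gps",
    task="buffer_analysis",
    analysis_type="vector",
    data_type="gps_track",
    condition=lambda rec: rec["points"] >= 2,        # needs at least two points
    operation=lambda rec: {**rec, "buffered": True},  # placeholder processing step
))
matches = rule_base.applicable("buffer_analysis", "vector", "gps_track", {"points": 5})
```

Keying the store by classification keeps lookup cheap; choosing among several applicable rules is the job of the matching algorithm and is not attempted here.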

Further, the rule matching unit uses a distributed optimal matching algorithm based on the improved two-way multiplier method to match spatial data optimally to rule base rules.

Further, the distributed optimal matching algorithm based on the improved two-way multiplier method is as follows. First, the neighborhood radius is determined adaptively from the distances between spatial data samples. The density of each class of spatial data samples is obtained from the neighborhood radius, and the number of cluster centers is increased step by step. At the same time, a fuzzy clustering validity index is used to judge the current clustering quality. The best number of clusters and the best cluster centers are then selected. Finally, the clustering result is optimized by minimizing the clustering objective function. In detail: the spatial data set of turning points is [formula omitted], whose elements are the densities of the 1st, the 2nd, an intermediate, and the n-th turning point. To keep the algorithm adaptive, the neighborhood radius is determined by the equation [formula omitted], where M is the neighborhood radius and the remaining terms are the densities of two turning points and the Euclidean distance between them. When the data set is clustered into k clusters, the cluster centers are [formula omitted], defined in terms of the selected set of cluster centers and the sum of the distances from a turning point to the cluster centers in that set. A fuzzy clustering validity index is used to measure the clustering quality: [formula omitted], whose terms are the membership degree of a data sample in a class, the m-th cluster center, the h-th cluster center, the distances between these quantities, and the greatest common divisor and least common multiple of two of the terms. The search for the best turning region then starts from the position of the starting point;
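The density step can be illustrated with a short sketch. The detailed description defines the density of a turning point as the number of neighboring turning points within its neighborhood radius; the patent's own equation for the radius is not preserved in this text, so the sketch assumes, purely for illustration, that the radius is the mean pairwise Euclidean distance. The function names and sample points are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, float]

def adaptive_radius(points: Sequence[Point]) -> float:
    """Assumed rule (not the patent's equation): mean pairwise Euclidean distance."""
    pairs = list(combinations(points, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

def densities(points: Sequence[Point], radius: float) -> List[int]:
    """Density of each point = number of other points within `radius` of it."""
    return [
        sum(1 for j, q in enumerate(points) if j != i and math.dist(p, q) <= radius)
        for i, p in enumerate(points)
    ]

# Three turning points close together and one far away.
turning_points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
M = adaptive_radius(turning_points)
rho = densities(turning_points, M)
```

Because the radius is derived from the data rather than fixed, the same code adapts to data sets of different scale; the isolated point ends up with the lowest density, which is what a density-based choice of cluster centers relies on.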

Next, the optimal matching rule is solved. The matching objective function between spatial data and rules is [formula omitted], where R is the matching objective function between spatial data and rules, the operator is a continuous cross-product operation over the rule set, A is the set of spatial data, a is a spatial data item in A, B is the set of rules, b is a rule in B, and the remaining terms are the utility function of spatial data a, the utility function of rule b, and the optimal matching rule. The two-way multiplier method is [formula omitted] together with [formula omitted], where BM is the two-way multiplier objective function, its two terms are the utility function of spatial data a and the utility function of rule b, X and Y are constant matrices, and c is a constant vector. To make the matching objective function R easier to solve, R is rewritten by the two-way multiplier method as an improved matching objective function, [formula omitted]. The improved function is solved with the Lagrange multiplier method, [formula omitted], where L is the Lagrangian and the multiplier term is the Lagrange multiplier. To make the solution more accurate, a dual condition is added, which further improves the objective to [formula omitted], the matching objective function after the dual condition is applied. The optimal matching rule must be found as quickly as possible for spatial data to be matched optimally to the rule base rules, so the iterative process of the algorithm is improved: [formula omitted], where the terms are the iteration count, the continuous cross-product operation at a given iteration, the operator that finds the minimizing matching rule, a parameter that controls the convergence speed, and the matching rules at the respective iterations. Solving this gives [formula omitted];
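The surviving description (two utility functions coupled through constant matrices X and Y and a constant vector c, solved with a Lagrangian, a dual step, and an iteration governed by a convergence parameter) resembles the standard alternating direction method of multipliers (ADMM). Assuming that resemblance, the sketch below shows textbook scaled-form ADMM on a deliberately tiny scalar problem. It is not the patent's improved variant, whose formulas are not preserved here. The quadratic utilities, the constraint a - b = 0 (X = 1, Y = -1, c = 0), and the names `admm_scalar` and `rho` are all illustrative choices.

```python
from typing import Tuple

def admm_scalar(p: float, q: float, rho: float = 1.0,
                iterations: int = 100) -> Tuple[float, float]:
    """Minimize 0.5*(a - p)**2 + 0.5*(b - q)**2 subject to a - b = 0.

    Standard scaled-form ADMM; `rho` is the penalty parameter that
    influences how fast the iterates converge.
    """
    a = b = u = 0.0  # u is the scaled dual variable (multiplier / rho)
    for _ in range(iterations):
        # a-update: closed-form minimizer of 0.5*(a-p)^2 + (rho/2)*(a - b + u)^2
        a = (p + rho * (b - u)) / (1.0 + rho)
        # b-update: closed-form minimizer of 0.5*(b-q)^2 + (rho/2)*(a - b + u)^2
        b = (q + rho * (a + u)) / (1.0 + rho)
        # dual update: accumulate the constraint residual a - b
        u = u + a - b
    return a, b

a_star, b_star = admm_scalar(p=1.0, q=3.0)
```

With the constraint forcing a = b, the constrained minimizer is the midpoint of p and q, so both iterates should settle near 2.0 for the inputs above. In a distributed setting the a-update and b-update would run on separate workers, exchanging only the shared variables.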

Finally, spatial data are matched to rules according to the optimal matching rule obtained above. First, a data set of characteristic turning regions is established and its center point is found. The distance between any two characteristic turning regions is then given by [formula omitted], the Euclidean distance between the two regions. The speed and route of a characteristic turning region are [formulas omitted], where one term is the number of turning points of a specific type in the region. The difference between the first turning region and the next turning region is expressed as the total distance of the turning regions, [formula omitted], which includes the turning difference between the two regions. Finally, the optimal matching rule is converted into the minimum of the total distance over the turning regions, [formula omitted], whose terms are a weight on the route distance and the distance of the turning region. In summary, the distributed optimal matching algorithm based on the improved two-way multiplier method first classifies the data, determining the neighborhood radius adaptively and gradually adding cluster centers according to sample density to obtain the density of each data point region. It then improves the matching objective function twice so that the optimal matching rule can be solved conveniently and accurately.
Finally, it converts the optimal matching rule into the total distance over the turning regions of the spatial data, achieving optimal matching between spatial data and rule base rules.

Further, the automatic scheduling unit uses an automatic scheduling algorithm based on the improved deep Q network to schedule spatial data processing tasks automatically.

Further, the automatic scheduling algorithm based on the improved deep Q network is as follows. The reward function is [formula omitted], defined in terms of the scheduling policy, a normalization factor, the maximum completion time of the scheduled spatial data, and a lower bound on that maximum completion time. The cumulative reward is [formula omitted], where AR is the cumulative reward, its terms are the rewards at times t, t+1, t+2, and t+N together with the discount factor, and n is an integer time index. The Q value in the deep Q network is updated by [formula omitted], whose terms are the Q value of taking an action in state s, the learning rate that controls the update step of the Q value in each iteration, the reward obtained after taking the action in state s, the discount factor, the new state reached after the action, and the best scheduling action under the scheduling policy in the new state. To address the overestimation problem of the deep Q network, the Q value function is split into two independent Q value functions, [formulas omitted]. One function evaluates the best action for automatic scheduling and the other is used to update the Q value; using the two functions alternately addresses the overestimation problem. To make the automatic scheduling process more adaptive, the learning rate is improved with learning rate decay to raise algorithm performance, [formula omitted], where m is the decay factor and the other term is the number of update steps. A further factor is added to control the magnitude of the learning rate decay, [formula omitted]. In summary, the automatic scheduling algorithm based on the improved deep Q network first decomposes the original deep Q value function into two independent deep Q value functions to address overestimation. It then improves the learning rate with learning rate decay and a decay magnitude control factor, so that the automatic scheduling process is adaptive and converges better, achieving automatic scheduling of spatial data processing tasks.
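The two ideas in this paragraph, two independent Q value functions and a decaying learning rate, can be illustrated with a small tabular sketch. It uses standard double Q-learning, where the textbook single-table update is Q(s,a) <- Q(s,a) + alpha * [r + gamma * max_a' Q(s',a') - Q(s,a)] and the cumulative reward is the discounted sum of future rewards. The patent's own formulas are not preserved in this text, so the decay schedule alpha0 / (1 + m * step), the toy two-state environment, and every name below are assumptions made for illustration; the patent applies the idea to a deep network rather than a table.

```python
import random
from typing import List, Tuple

def decayed_lr(alpha0: float, m: float, step: int) -> float:
    """Assumed schedule (not the patent's formula): inverse-time decay."""
    return alpha0 / (1.0 + m * step)

def double_q_learning(n_steps: int = 4000, gamma: float = 0.9, alpha0: float = 0.5,
                      m: float = 0.01, seed: int = 0) -> Tuple[List[List[float]], List[List[float]]]:
    """Tabular double Q-learning on a toy chain: state 0 -> state 1 -> terminal.

    In every state, action 1 yields reward 1.0 and action 0 yields 0.0.
    """
    rng = random.Random(seed)
    n_states, n_actions, terminal = 2, 2, 2
    q1 = [[0.0] * n_actions for _ in range(n_states)]
    q2 = [[0.0] * n_actions for _ in range(n_states)]
    state = 0
    for step in range(n_steps):
        action = rng.randrange(n_actions)        # purely exploratory behavior policy
        reward = 1.0 if action == 1 else 0.0
        next_state = state + 1
        alpha = decayed_lr(alpha0, m, step)
        # Pick at random which table is updated; the other one evaluates.
        updated, evaluator = (q1, q2) if rng.random() < 0.5 else (q2, q1)
        if next_state == terminal:
            target = reward
        else:
            # Select the action with the updated table, evaluate it with the other.
            best = max(range(n_actions), key=lambda a: updated[next_state][a])
            target = reward + gamma * evaluator[next_state][best]
        updated[state][action] += alpha * (target - updated[state][action])
        state = 0 if next_state == terminal else next_state
    return q1, q2

q1, q2 = double_q_learning()
greedy_in_state_1 = max(range(2), key=lambda a: q1[1][a] + q2[1][a])
```

Separating action selection from action evaluation is what reduces the upward bias of taking a maximum over noisy estimates, and the shrinking learning rate makes later updates smaller so the estimates settle.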

Further, the scheduling optimization unit adjusts the execution order of tasks by dynamically scheduling real-time spatial data, and sets up a monitoring system that tracks task performance, resource utilization, and changes in task execution time, thereby optimizing the automatic scheduling process.

Further, the error handling module monitors the entire data analysis and scheduling workflow, together with the input data, to detect potential errors and abnormal conditions. When an error occurs, the module records its type, time, and related information and carries out troubleshooting and problem analysis. In this way it handles and manages the errors and abnormal conditions that arise during spatial data analysis and automatic scheduling.
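A minimal sketch of the recording behavior described for the error handling module is shown below: a task is run under supervision, and if it fails, the error type, time, and related information are stored for later analysis. The names `ErrorRecord`, `ErrorLog`, and `run` are hypothetical and are not taken from the patent.

```python
import traceback
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable, List, Optional

@dataclass
class ErrorRecord:
    error_type: str        # exception class name
    occurred_at: datetime  # UTC time at which the error was caught
    task_id: str           # which scheduled task failed
    details: str           # traceback text for troubleshooting

class ErrorLog:
    """Runs tasks and records any exception instead of letting it stop the scheduler."""

    def __init__(self) -> None:
        self.records: List[ErrorRecord] = []

    def run(self, task_id: str, task: Callable[[], Any]) -> Optional[Any]:
        try:
            return task()
        except Exception as exc:
            self.records.append(ErrorRecord(
                error_type=type(exc).__name__,
                occurred_at=datetime.now(timezone.utc),
                task_id=task_id,
                details=traceback.format_exc(),
            ))
            return None

log = ErrorLog()
ok_result = log.run("task-ok", lambda: 5)
failed_result = log.run("task-bad", lambda: 1 / 0)
```

Returning `None` on failure keeps the example short; a real scheduler would more likely return a status object so that a legitimate `None` result cannot be mistaken for an error.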

Further, the user interface module provides the user with a visual, interactive interface for applying the automatic scheduling method to manage spatial data. The interface lets the user enter the spatial data to be processed and the rule base rules, and set the specific requirements of an analysis task. The user views the results of analysis tasks through the interface, where they are presented as graphics and charts; visualizing the results helps the user understand the spatial data analysis results.

Beneficial effects of the invention: the invention proposes an automatic scheduling method for spatial data analysis based on a rule base. By integrating the spatial data input module, data preprocessing module, rule definition and matching module, data processing automatic scheduling and optimization module, error handling module, and user interface module, it provides an automatic scheduling method for spatial data processing and analysis. The first innovation is a distributed optimal matching algorithm based on the improved two-way multiplier method, which matches spatial data optimally to rule base rules. The algorithm first classifies the data, determining the neighborhood radius adaptively and gradually adding cluster centers according to sample density to obtain the density of each data point region. It then improves the matching objective function twice so that the optimal matching rule can be solved conveniently and accurately. Finally, it converts the optimal matching rule into the total distance over the turning regions of the spatial data. The second innovation is an automatic scheduling algorithm based on the improved deep Q network, which schedules spatial data processing tasks automatically. The algorithm first decomposes the original deep Q value function into two independent deep Q value functions to address the overestimation problem of the deep Q network. It then improves the learning rate with learning rate decay and a decay magnitude control factor, so that the automatic scheduling process is adaptive and converges better. Together these measures improve the effectiveness of rule-base-driven automatic scheduling of spatial data analysis, give the method more comprehensive and accurate technical support, and provide better decision support for safe, scientific, and efficient scheduling. The invention also draws on optimal matching and reinforcement learning algorithms, offering a convenient and efficient scheduling method that can serve as a foundation for other application fields. The integration of spatial data processing, optimal matching, and automatic task scheduling supports multi-domain development, can be applied in multiple industries and fields, opens a new direction for combining these three areas, and contributes application value to the field of spatial data processing.

Description of the drawings

The invention is further described with reference to the accompanying drawing, but the embodiment in the drawing does not limit the invention in any way. A person of ordinary skill in the art can derive other drawings from the following drawing without creative effort.

Figure 1 is a schematic structural diagram of the present invention.

Detailed description of the embodiments

The present invention is further described with reference to the following embodiments.

Preferably, the spatial data input module acquires various types of spatial data, such as geographic information system (GIS) data, meteorological data, terrain data, and population data, through satellite remote sensing, sensors, databases, and Internet data sources, and uploads the spatial data to the rule base.

Preferably, the data preprocessing module preprocesses spatial data by cleaning and repairing erroneous, missing, and inconsistent information in the data, and by performing data calibration and dimensionality reduction, which reduces computational complexity and improves analysis efficiency.

Preferably, the rule base definition unit is used to define and manage rules. Rules consisting of conditions, operations, and data processing steps guide the system in processing and analyzing large-scale spatial data efficiently. The rules are classified by task, analysis type, and data type, so that the appropriate rules can be applied to the spatial data for each application, improving the efficiency of spatial data analysis while ensuring consistency and repeatability.

Preferably, the rule matching unit uses a distributed optimal matching algorithm based on the improved two-way multiplier method to match spatial data optimally to rule base rules.

Specifically, the distributed optimal matching algorithm based on the improved two-way multiplier method is as follows. First, the neighborhood radius is determined adaptively from the distances between spatial data samples. The density of each class of spatial data samples is obtained from the neighborhood radius, and the number of cluster centers is increased step by step. At the same time, a fuzzy clustering validity index is used to judge the current clustering quality. The best number of clusters and the best cluster centers are then selected. Finally, the clustering result is optimized by minimizing the clustering objective function. In detail: the spatial data set of turning points is [formula omitted], where KP is the data set of turning points and its elements are the densities of the 1st, the 2nd, an intermediate, and the n-th turning point. Since the density of a turning point is the number of neighboring turning points within its neighborhood radius, the turning point density depends on the neighborhood radius. To keep the algorithm adaptive, the neighborhood radius is determined by the equation [formula omitted], where M is the neighborhood radius and the remaining terms are the densities of two turning points and the Euclidean distance between them. When KP is clustered, the cluster centers are [formula omitted], defined in terms of the selected set of cluster centers and the sum of the distances from a turning point to the cluster centers in that set. A fuzzy clustering validity index is used to measure the clustering quality: [formula omitted], whose terms are the membership degree of a data sample in a class, the m-th cluster center, the h-th cluster center, the distances between these quantities, and the greatest common divisor and least common multiple of two of the terms. The search for the best turning region then starts from the position of the starting point;

Next, the optimal matching rule is solved. The matching objective function between spatial data and rules is $R$ (equation not reproduced), where $\Pi\Gamma$ is the continuous cross-product operation, $\Gamma$ is the set of rules, $A$ is the set of spatial data, $a$ is a spatial datum in $A$, $B$ is the set of rules, $b$ is a rule in $B$, $f_a$ is the utility function of spatial datum $a$, $g_b$ is the utility function of rule $b$, and $\pi_{ab}$ is the optimal matching rule. The two-way multiplier method is $BM = \min(f_a + g_b)$ subject to $Xa + Yb = c$, where $BM$ is the two-way multiplier objective function, $X$ and $Y$ are constant matrices, and $c$ is a constant vector. To make the matching objective function $R$ easier to solve, the two-way multiplier method is used to improve it to $R'$ (equation not reproduced), the improved matching objective function. $R'$ is solved by the Lagrange multiplier method, that is, $L = R' + \sum_{a \in A} \sum_{b \in B} \lambda \left( f_a(\pi_{ab}) - g_a(\pi_a) \right)$, where $L$ is the Lagrangian and $\lambda$ is the Lagrange multiplier. To make the solution more accurate, a dual condition is added, further improving $R'$ to $R''$ (equation not reproduced). Spatial data and rule base rules can be optimally matched only if the optimal matching rule $\pi_{ab}$ is found as quickly as possible, so the iterative process of the algorithm is also improved (update equations not reproduced), where $k$ is the number of iterations, $\Pi\Gamma(k+1)$ is the continuous cross-product operation at the $(k+1)$-th iteration, the minimization finds the smallest matching rule $\pi_{ab}$, $\eta$ is the parameter that controls the convergence speed, $\pi_{ab}(k)$ is the matching rule at the $k$-th iteration, and $\pi_{ab}(k+1)$ is the matching rule at the $(k+1)$-th iteration; the update is then solved for $\pi_{ab}(k+1)$ (result not reproduced);
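The constrained form stated in claim 1, $BM = \min(f_a + g_b)$ subject to $Xa + Yb = c$, has the same shape as the problem solved by the standard alternating direction method of multipliers (ADMM). The patent's own improved update equations are not reproduced in this text, so the sketch below shows only the textbook scaled-form ADMM iteration on a toy scalar problem. The quadratic utilities, the penalty `rho`, the scalar constraint $a + b = c$ and all names are assumptions chosen for illustration.

```python
def admm_scalar(p, q, c, rho=1.0, iters=200):
    """Minimise 0.5*(a - p)**2 + 0.5*(b - q)**2 subject to a + b = c.

    Textbook scaled-form ADMM; u is the scaled dual variable.
    """
    a = b = u = 0.0
    for _ in range(iters):
        # a-update: minimise f(a) + (rho/2) * (a + b - c + u)**2 in closed form.
        a = (p + rho * (c - b - u)) / (1.0 + rho)
        # b-update: same closed form, using the freshly updated a.
        b = (q + rho * (c - a - u)) / (1.0 + rho)
        # Dual update: accumulate the constraint residual.
        u += a + b - c
    return a, b


a_opt, b_opt = admm_scalar(p=1.0, q=2.0, c=5.0)
```

For this toy problem the exact minimiser can be derived by hand ($a = 2$, $b = 3$), which makes it a convenient check that the alternating updates and the dual step drive the constraint residual to zero.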

Finally, spatial data and rules are optimally matched according to the optimal matching rule $\pi_{ab}(k)$ obtained from the solution. First, a characteristic turning area data set $KR_{type\,i}$ is established and its center point is found. The distance between the turning areas of any feature is then given as the Euclidean distance between characteristic turning areas $i$ and $i+k$ (equation not reproduced). The speed and route of a characteristic turning area are given by further equations (not reproduced), where $x$ is the number of turning points of a specific type in the characteristic turning area. The difference between the first turning area $KR_i$ and the next turning area $KR_{i+1}$ is calculated and expressed as the total distance of the turning area (equation not reproduced), where $\theta$ is the steering difference between $KR_i$ and $KR_{i+1}$. Finally, the optimal matching rule is converted into the minimum of the total distance $D_{total}$ in the turning area, that is, $D_{total} = D_{course}\,\omega_C + D_{distance}\,\omega_D$, where $\omega_C$ is the weight of the route distance and $\omega_D$ is the distance of the turning area. In summary, the distributed optimal matching algorithm based on the improved two-way multiplier method first classifies the data, obtaining the density of each data point area during classification by adaptively determining the neighborhood radius and gradually adding cluster centers based on sample density; it then improves the matching objective function twice so that the optimal matching rule can be solved conveniently and accurately; and finally it converts the optimal matching rule into the total distance in the spatial data turning area, achieving optimal matching of spatial data with rule base rules.
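The last step, which reduces matching to minimising $D_{total} = D_{course}\,\omega_C + D_{distance}\,\omega_D$ as stated in claim 1, can be sketched as follows. The sketch treats both $\omega_C$ and $\omega_D$ as weights and takes the two distance terms as given inputs, because the equations that produce them are not reproduced in this text; the candidate names and numbers are hypothetical.

```python
def total_distance(d_course, d_distance, w_course, w_distance):
    """Weighted total distance D_total = D_course * w_C + D_distance * w_D."""
    return d_course * w_course + d_distance * w_distance


def best_match(candidates, w_course=0.5, w_distance=0.5):
    """Return the candidate rule whose weighted total distance is smallest.

    `candidates` maps a rule name to a (d_course, d_distance) pair.
    """
    return min(
        candidates,
        key=lambda name: total_distance(*candidates[name], w_course, w_distance),
    )


# Hypothetical candidate rules with precomputed distance terms.
candidate_rules = {
    "rule_a": (4.0, 2.0),
    "rule_b": (1.0, 3.0),
    "rule_c": (5.0, 5.0),
}
selected = best_match(candidate_rules)
```

Changing the two weights shifts the trade-off between route distance and turning-area distance, so the selected rule can differ for the same candidates.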

Preferably, the automatic scheduling unit uses an automatic scheduling algorithm based on an improved deep Q network to automatically schedule spatial data processing tasks.

Specifically, the automatic scheduling algorithm of the improved deep Q network is as follows. The reward function is $r(\pi)$ (equation not reproduced), where $\pi$ is the scheduling strategy, $\mu$ is the normalization factor, $T_{max}$ is the maximum completion time of the scheduled spatial data, and the remaining term is the lower limit of that maximum completion time; this reward function is better suited to the scheduling problem for spatial data. The cumulative reward is $AR = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \cdots + \gamma^N r_{t+N} = \sum_{n=0}^{N} \gamma^n r_{t+n}$, where $AR$ is the cumulative reward, $r_t$, $r_{t+1}$, $r_{t+2}$ and $r_{t+N}$ are the rewards at times $t$, $t+1$, $t+2$ and $t+N$, $\gamma$ is the discount factor, and $n$ is an integer time index in $[0, N]$. The Q value in the deep Q network is updated as $Q(s,a) = (1-\alpha)Q(s,a) + \alpha[r + \gamma \max_{a'} Q(s',a')]$, where $Q(s,a)$ is the Q value of taking action $a$ in state $s$, $\alpha$ is the learning rate that controls the update step of the Q value in each iteration, $r$ is the reward obtained after taking action $a$ in state $s$, $\gamma$ is the discount factor, $s'$ is the new state reached after taking action $a$, and $a'$ is the best scheduling action in the scheduling strategy under the new state $s'$. The Q value is updated iteratively so that it better estimates the long-term cumulative reward of taking each action in each state.

To solve the overestimation problem of the deep Q network, $Q(s,a)$ is improved into two independent Q-value functions, namely $Q_1(s,a) = (1-\alpha)Q_1(s,a) + \alpha[r + \gamma \max_{a'} Q_1(s',a')]$ and $Q_2(s,a) = (1-\alpha)Q_2(s,a) + \alpha[r + \gamma \max_{a'} Q_1(s',a')]$, where $Q_1(s,a)$ is the first independent Q-value function and $Q_2(s,a)$ is the second. $Q_1(s,a)$ evaluates the best action for automatic scheduling and $Q_2(s,a)$ is used to update the Q value; using the two independent Q-value functions together addresses the overestimation problem. To make the automatic scheduling process more adaptive, the learning rate $\alpha$ is improved to $\alpha'$ through learning rate decay, that is, $\alpha' = \alpha \cdot e^{-m \cdot epoch}$, where $\alpha'$ is the improved learning rate, $m$ is the decay factor, and $epoch$ is the number of iterative update steps. A factor $factor$ is also added to control the magnitude of the decay, that is, $\alpha'' = \alpha' \cdot factor$, where $\alpha''$ is the learning rate after adding $factor$. In summary, the automatic scheduling algorithm of the improved deep Q network first decomposes the original deep Q-value function into two independent deep Q-value functions to address overestimation, and then applies learning rate decay and a decay magnitude control factor so that the automatic scheduling process is adaptive and converges better, realizing automatic scheduling of spatial data processing tasks.
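The two-function update and the decayed learning rate described above can be sketched in tabular form. This is a minimal illustration, not the patent's network: it uses dictionaries in place of neural networks and the textbook Double Q-learning target, in which one table selects the best next action and the other supplies its value. The state and action names, the numeric values and the function names are hypothetical; only the formulas $\alpha' = \alpha \cdot e^{-m \cdot epoch}$ and $\alpha'' = \alpha' \cdot factor$ come from claim 1.

```python
import math


def decayed_lr(alpha, m, epoch, factor):
    """Learning rate alpha'' = alpha * exp(-m * epoch) * factor."""
    return alpha * math.exp(-m * epoch) * factor


def double_q_update(q_select, q_eval, s, a, r, s_next, actions, alpha, gamma):
    """Update q_select[(s, a)] in place and return the new value.

    q_select picks the best next action; q_eval supplies that action's value
    (textbook Double Q-learning target, used here as an assumption).
    """
    best = max(actions, key=lambda x: q_select.get((s_next, x), 0.0))
    target = r + gamma * q_eval.get((s_next, best), 0.0)
    q_select[(s, a)] = (1 - alpha) * q_select.get((s, a), 0.0) + alpha * target
    return q_select[(s, a)]


# Hypothetical tables for a two-action scheduling decision.
q1 = {("s1", "run_now"): 1.0}
q2 = {("s1", "run_now"): 2.0, ("s1", "defer"): 5.0}
new_value = double_q_update(q1, q2, "s0", "run_now", r=1.0, s_next="s1",
                            actions=["run_now", "defer"], alpha=0.5, gamma=0.9)
```

Because `q1` prefers `run_now` in state `s1`, the target uses `q2`'s value for that action (2.0) rather than `q2`'s own maximum (5.0), which is how the split limits overestimation. A caller would typically alternate which table plays the `q_select` role and pass `decayed_lr(...)` as `alpha` at each step.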

Preferably, the scheduling optimization unit adjusts the execution order of tasks by dynamically scheduling real-time spatial data, and establishes a monitoring system that tracks task execution performance, resource utilization and changes in task execution time, thereby optimizing the automatic scheduling process.

Preferably, the error handling module monitors the entire data analysis and scheduling process, as well as the input data, to detect potential errors and abnormal situations. Once an error occurs, the module records its type, time and related information and carries out troubleshooting and problem analysis, thereby handling and managing the errors and exceptions that arise during spatial data analysis and automatic scheduling. For some known errors and exceptions, the module attempts to handle them automatically: for example, if some data is missing, it is filled in according to predefined rules, and if rule matching fails, the matching rules are adjusted.
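The record-then-repair behavior described for the error handling module might look like the following sketch. It covers only the missing-data case; the record fields, the `defaults` table standing in for the predefined fill rules, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ErrorRecord:
    """One logged error: its type, a description, and when it was seen."""
    kind: str
    detail: str
    time: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


def fill_missing(record, defaults, log):
    """Return a copy of `record` with None fields filled from `defaults`.

    Every missing field is logged, whether or not a fill rule exists for it.
    """
    fixed = dict(record)
    for key, value in record.items():
        if value is None:
            if key in defaults:
                fixed[key] = defaults[key]
                log.append(ErrorRecord("missing_value", f"filled {key!r} from rule"))
            else:
                log.append(ErrorRecord("missing_value", f"no fill rule for {key!r}"))
    return fixed


error_log = []
repaired = fill_missing(
    {"lat": 36.6, "lon": None, "crs": None},  # hypothetical input record
    {"crs": "EPSG:4326"},                     # hypothetical predefined rule
    error_log,
)
```

Fields without a fill rule are left untouched but still logged, so they remain available for the troubleshooting and analysis step instead of being silently dropped.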

Preferably, the user interface module provides a visual, interactive interface so that users can easily apply the automatic scheduling method for spatial data analysis to manage spatial data. It allows users to input the spatial data to be processed and the rule base rules, for example spatial data sets, rule bases and analysis task parameters. Through the interface, users can set the specific requirements of an analysis task, including the required rules, analysis methods and expected results, and can view the results of analysis tasks presented as graphs and charts; visualizing the results helps users better understand the spatial data analysis results.

An automatic scheduling method for spatial data analysis based on a rule base is proposed. By integrating the spatial data input module, the data preprocessing module, the rule definition and matching module, the data processing automatic scheduling and optimization module, the error handling module and the user interface module, it provides an automatic scheduling method for spatial data processing and analysis.

A distributed optimal matching algorithm based on an improved two-way multiplier method is proposed for the automatic scheduling of spatial data processing tasks. The innovation of the present invention here is that the algorithm first classifies the data, obtaining the density of each data point area during classification by adaptively determining the neighborhood radius and gradually adding cluster centers based on sample density; it then improves the matching objective function twice so that the optimal matching rule can be solved conveniently and accurately; and finally it converts the optimal matching rule into the total distance in the spatial data turning area, achieving optimal matching of spatial data with rule base rules.

An automatic scheduling algorithm based on an improved deep Q network is proposed to automatically schedule spatial data processing tasks. The innovation of the present invention here is that the algorithm first decomposes the original deep Q-value function into two independent deep Q-value functions to address the overestimation problem of the deep Q network, and then applies learning rate decay and a decay magnitude control factor so that the automatic scheduling process is adaptive and converges better, realizing automatic scheduling of spatial data processing tasks.

Together these measures effectively improve the performance of the rule-base-based automatic scheduling method for spatial data analysis, give it more comprehensive and accurate technical support, and provide better decision support for a safe, scientific and efficient method of this kind. The present invention also involves optimal matching algorithms and reinforcement learning algorithms, offers a convenient and efficient automatic scheduling method for spatial data analysis based on a rule base, and can consolidate the foundation for the development of other application fields. At a time when spatial data processing, optimal matching and automatic task scheduling are flourishing, their integration lays a solid foundation for the development of multi-field integration, can be applied to multiple industries and fields in the market, provides a new development direction for the integration of spatial data processing, optimal matching and automatic task scheduling, and contributes important application value to the field of spatial data processing technology.

Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention and not to limit its scope of protection. Although the present invention has been described in detail with reference to the preferred embodiments, those of ordinary skill in the art should understand that the technical solutions of the present invention may be modified without departing from their essence and scope.

Claims

1. An automatic scheduling system for spatial data analysis based on a rule base, characterized by comprising a spatial data input module, a data preprocessing module, a rule definition and matching module, a data processing automatic scheduling and optimization module, an error handling module and a user interface module; the spatial data input module is used to upload spatial data; the data preprocessing module is used to preprocess spatial data; the rule definition and matching module comprises a rule base definition unit and a rule matching unit, the rule base definition unit being used to define rule base rules and the rule matching unit using a distributed optimal matching algorithm based on an improved two-way multiplier method to optimally match spatial data with rule base rules; the data processing automatic scheduling and optimization module comprises an automatic scheduling unit and a scheduling optimization unit, the automatic scheduling unit using an automatic scheduling algorithm based on an improved deep Q network to automatically schedule spatial data processing tasks and the scheduling optimization unit being used to optimize the automatic scheduling process; the error handling module is used to handle exceptions that occur during scheduling; and the user interface module is used to provide a user interface;

The distributed optimal matching algorithm based on the improved two-way multiplier method is specifically as follows: first, the neighborhood radius is adaptively determined by calculating the distances between spatial data samples; based on the neighborhood radius, the density of each class of spatial data samples is obtained and cluster centers are added, and the fuzzy clustering validity index is used to judge the current clustering effect; then the optimal number of clusters and the cluster centers are selected; finally, the clustering result is optimized by minimizing the clustering objective function. In detail: the spatial data set of turning points is $KP = \{kp_1, kp_2, \ldots, kp_i, \ldots, kp_n\}$, where $KP$ is the data set of turning points, $kp_1$ is the density of the first turning point, $kp_2$ is the density of the second turning point, $kp_i$ is the density of the $i$-th turning point, and $kp_n$ is the density of the $n$-th turning point; to keep the algorithm adaptive, the neighborhood radius $M$ is determined adaptively by an equation (not reproduced in this text), where $kp_j$ is the density of the $j$-th turning point and $d(kp_i, kp_j)$ is the Euclidean distance between $kp_i$ and $kp_j$; when KP is clustered into $k$ clusters, the cluster center $O_k$ is $O_k = \{kp_i \mid i = \arg\max(B_i \times |M_\varepsilon(kp)|)\}$, where $O = \{O_1, O_2, \ldots, O_{k-1}\}$ is the group of cluster centers already selected and $B_i$ is the sum of the distances between $kp_i$ and the cluster centers in the set $O$; a fuzzy clustering validity index (equation not reproduced) is used to measure the clustering effect, where $U_{m,i}$ is the degree of membership of the $i$-th ($i = 1, 2, \ldots, n$) data sample $kp_i$ in the $m$-th ($m = 1, 2, \ldots, k$) class, $O_m$ is the $m$-th cluster center, $O_h$ is the $h$-th cluster center, $d(O_m, O_h)$ is the distance between $O_m$ and $O_h$, $d(kp_i, O_m)$ is the distance between $kp_i$ and $O_m$, $\max d(O_m, O_h)$ is the greatest common divisor of $O_m$ and $O_h$, and $\min d(O_m, O_h)$ is the least common multiple of $O_m$ and $O_h$; the search for the best turning area then proceeds from the starting point;

Next, the optimal matching rule is solved: the matching objective function between spatial data and rules is $R$ (equation not reproduced), where $\Pi\Gamma$ is the continuous cross-product operation, $\Gamma$ is the set of rules, $A$ is the set of spatial data, $a$ is a spatial datum in $A$, $B$ is the set of rules, $b$ is a rule in $B$, $f_a$ is the utility function of spatial datum $a$, $g_b$ is the utility function of rule $b$, and $\pi_{ab}$ is the optimal matching rule; the two-way multiplier method is $BM = \min(f_a + g_b)$ subject to $Xa + Yb = c$, where $BM$ is the two-way multiplier objective function, $X$ and $Y$ are constant matrices, and $c$ is a constant vector; to make the matching objective function $R$ easier to solve, the two-way multiplier method is used to improve it to $R'$ (equation not reproduced), the improved matching objective function, which is solved by the Lagrange multiplier method, that is, $L = R' + \sum_{a \in A} \sum_{b \in B} \lambda \left( f_a(\pi_{ab}) - g_a(\pi_a) \right)$, where $L$ is the Lagrangian and $\lambda$ is the Lagrange multiplier; to make the solution more accurate, a dual condition is added, further improving $R'$ to $R''$ (equation not reproduced); spatial data and rule base rules can be optimally matched only if the optimal matching rule $\pi_{ab}$ is found as quickly as possible, so the iterative process of the algorithm is improved (update equations not reproduced), where $k$ is the number of iterations, $\Pi\Gamma(k+1)$ is the continuous cross-product operation at the $(k+1)$-th iteration, the minimization finds the smallest matching rule $\pi_{ab}$, $\eta$ is the parameter that controls the convergence speed, $\pi_{ab}(k)$ is the matching rule at the $k$-th iteration, and $\pi_{ab}(k+1)$ is the matching rule at the $(k+1)$-th iteration; the update is then solved for $\pi_{ab}(k+1)$ (result not reproduced);

Finally, spatial data and rules are optimally matched according to the optimal matching rule $\pi_{ab}(k)$ obtained from the solution: first, a characteristic turning area data set $KR_{type\,i}$ is established and its center point is found; the distance between the turning areas of any feature is then given as the Euclidean distance between characteristic turning areas $i$ and $i+k$ (equation not reproduced); the speed and route of a characteristic turning area are given by further equations (not reproduced), where $x$ is the number of turning points of a specific type in the characteristic turning area; the difference between the first turning area $KR_i$ and the next turning area $KR_{i+1}$ is calculated and expressed as the total distance of the turning area (equation not reproduced), where $\theta$ is the steering difference between $KR_i$ and $KR_{i+1}$; finally, the optimal matching rule is converted into the minimum of the total distance $D_{total}$ in the turning area, that is, $D_{total} = D_{course}\,\omega_C + D_{distance}\,\omega_D$, where $\omega_C$ is the weight of the route distance and $\omega_D$ is the distance of the turning area; the distributed optimal matching algorithm based on the improved two-way multiplier method thus first classifies the data, obtaining the density of each data point area during classification by adaptively determining the neighborhood radius and gradually adding cluster centers based on sample density, then improves the matching objective function twice so that the optimal matching rule can be solved conveniently and accurately, and finally converts the optimal matching rule into the total distance in the spatial data turning area, achieving optimal matching of spatial data with rule base rules;

The automatic scheduling unit uses an automatic scheduling algorithm based on an improved deep Q network to automatically schedule spatial data processing tasks; the automatic scheduling algorithm of the improved deep Q network is specifically as follows: the reward function is $r(\pi)$ (equation not reproduced), where $\pi$ is the scheduling strategy, $\mu$ is the normalization factor, $T_{max}$ is the maximum completion time of the scheduled spatial data, and the remaining term is the lower limit of that maximum completion time; the cumulative reward is $AR = r_t + \gamma r_{t+1} + \gamma^2 r_{t+2} + \cdots + \gamma^N r_{t+N} = \sum_{n=0}^{N} \gamma^n r_{t+n}$, where $AR$ is the cumulative reward, $r_t$ is the reward at time $t$, $r_{t+1}$ is the reward at time $t+1$, $r_{t+2}$ is the reward at time $t+2$, $r_{t+N}$ is the reward at time $t+N$, $\gamma$ is the discount factor, and $n$ is an integer time index in $[0, N]$; the Q value in the deep Q network is updated as $Q(s,a) = (1-\alpha)Q(s,a) + \alpha[r + \gamma \max_{a'} Q(s',a')]$, where $Q(s,a)$ is the Q value of taking action $a$ in state $s$, $\alpha$ is the learning rate that controls the update step of the Q value in each iteration, $r$ is the reward obtained after taking action $a$ in state $s$, $\gamma$ is the discount factor, $s'$ is the new state reached after taking action $a$, and $a'$ is the best scheduling action in the scheduling strategy under the new state $s'$; to solve the overestimation problem of the deep Q network, $Q(s,a)$ is improved into two independent Q-value functions, namely $Q_1(s,a) = (1-\alpha)Q_1(s,a) + \alpha[r + \gamma \max_{a'} Q_1(s',a')]$ and $Q_2(s,a) = (1-\alpha)Q_2(s,a) + \alpha[r + \gamma \max_{a'} Q_1(s',a')]$, where $Q_1(s,a)$ is the first independent Q-value function and $Q_2(s,a)$ is the second; $Q_1(s,a)$ evaluates the best action for automatic scheduling and $Q_2(s,a)$ is used to update the Q value, and using the two independent Q-value functions together addresses the overestimation problem of the deep Q network; to make the automatic scheduling process more adaptive, the learning rate $\alpha$ is improved to $\alpha'$ through learning rate decay, that is, $\alpha' = \alpha \cdot e^{-m \cdot epoch}$, where $\alpha'$ is the improved learning rate, $m$ is the decay factor, and $epoch$ is the number of iterative update steps; a factor $factor$ is added to control the magnitude of the decay, that is, $\alpha'' = \alpha' \cdot factor$, where $\alpha''$ is the learning rate after adding $factor$; the automatic scheduling algorithm of the improved deep Q network thus first decomposes the original deep Q-value function into two independent deep Q-value functions to address overestimation, and then applies learning rate decay and a decay magnitude control factor so that the automatic scheduling process is adaptive and converges better, realizing automatic scheduling of spatial data processing tasks.

2. The automatic scheduling system for spatial data analysis based on a rule base according to claim 1, characterized in that the spatial data input module uses satellite remote sensing, sensors, databases and Internet data sources to obtain various types of spatial data and uploads the spatial data to the rule base.

3. The automatic scheduling system for spatial data analysis based on a rule base according to claim 1, characterized in that the data preprocessing module preprocesses the spatial data by cleaning and repairing errors, missing values and inconsistent information in the spatial data while also performing data calibration and data dimensionality reduction.

4. The automatic scheduling system for spatial data analysis based on a rule base according to claim 1, characterized in that the rule base definition unit is used to define and manage rules; rules consisting of conditions, operations and data processing steps guide how the system effectively processes and analyzes large-scale spatial data, and the rules are classified by task, analysis type and data type so that the appropriate rules are applied when processing spatial data.

5. The automatic scheduling system for spatial data analysis based on a rule base according to claim 1, characterized in that the rule matching unit uses the distributed optimal matching algorithm based on the improved two-way multiplier method to optimally match spatial data with rule base rules.

6. The automatic scheduling system for spatial data analysis based on a rule base according to claim 1, characterized in that the scheduling optimization unit adjusts the execution order of tasks by dynamically scheduling real-time spatial data, and establishes a monitoring system that tracks task execution performance, resource utilization and changes in task execution time, thereby optimizing the automatic scheduling process; the error handling module monitors the entire data analysis and scheduling process, as well as the input data, to detect potential errors and abnormal situations; once an error occurs, the error handling module records the type, time and related information of the error and carries out troubleshooting and problem analysis, thereby handling and managing the errors and exceptions that occur during spatial data analysis and automatic scheduling.

7. A method for automatic scheduling of spatial data analysis using the system according to any one of claims 1 to 6.