CN104376365A

CN104376365A - Method for constructing information system running rule libraries on basis of association rule mining

Info

Publication number: CN104376365A
Application number: CN201410708182.8A
Authority: CN
Inventors: 陈龙; 刘嘉华; 何金陵; 康睿; 王琪; 周锁; 盛华
Original assignee: State Grid Corp of China SGCC; State Grid Jiangsu Electric Power Co Ltd; Nari Information and Communication Technology Co; Nanjing NARI Group Corp; Information and Telecommunication Branch of State Grid Jiangsu Electric Power Co Ltd
Current assignee: State Grid Corp of China SGCC; State Grid Jiangsu Electric Power Co Ltd; NARI Group Corp; Nari Information and Communication Technology Co; Information and Telecommunication Branch of State Grid Jiangsu Electric Power Co Ltd
Priority date: 2014-11-28
Filing date: 2014-11-28
Publication date: 2015-02-25
Anticipated expiration: 2034-11-28
Also published as: CN104376365B

Abstract

The invention discloses a method for constructing an information system operation rule base based on association rule mining, comprising the following steps: S01: obtain the network topology of the information system and the dynamic and static monitoring indicators of all devices; S02: generate a network fault tree from the network topology and the devices' dynamic and static monitoring indicators, and generate a basic rule base from the network fault tree; S03: run an association rule mining algorithm on the historical data of the information system to obtain an association rule base; S04: perform inference over the basic rule base and the association rule base to generate an extended rule base; wherein the retrieval priority of the rule bases is: basic rule base > association rule base > extended rule base. Fault tree technology and association rule mining are used to generate the information system operation rule base intelligently, and machine learning is used to optimize the rules. Furthermore, a three-domain structure is designed for the rules, enabling automatic sorting and automatic adjustment of rules.

Description

A Construction Method of Information System Operating Rule Base Based on Association Rules Mining

Technical Field

The invention relates to a method for constructing an information system operation rule base based on association rule mining.

Background

To ensure the safe, stable and effective operation of its information systems, State Grid Corporation of China began building an integrated information operation and maintenance supervision system (hereinafter "IMS") in 2008, covering "integrated network management, desktop management, security management, and operation and maintenance services". Rollout across the whole network was completed in 2011. In 2012 the company completed the deepened application of IMS, centered on six modules: "deepened collection, equipment management, one order and two tickets, alarm center, display center, and green machine room". The system provides real-time monitoring of networks, network devices, hosts, databases, middleware, desktop terminals, security devices and other basic IT equipment, as well as business systems, and gives technical support to information system operation and maintenance across the whole network.

However, the setting and evaluation of operation monitoring rules still have the following deficiencies:

1. Performance monitoring of IT infrastructure and business systems still requires operation and maintenance staff to set monitoring threshold rules based on past experience and specialist knowledge. Such rules cannot adapt to the operating patterns of the IT infrastructure and business systems; in some time periods the fixed threshold rules do not match actual operating conditions, which easily produces false positives and missed alarms.

2. The reasonableness of the configured monitoring rules cannot be assessed: there is no way to verify whether they fit the actual operation of the IT infrastructure and business systems.

3. Rule setting has no self-learning capability, so rules cannot adjust and optimize themselves based on the historical operation of the IT infrastructure and business systems.

Summary of the Invention

In view of the above problems, the invention provides a method for constructing an information system operation rule base based on association rule mining. It uses fault tree technology and association rule mining to generate the rule base intelligently and uses machine learning to optimize the rules. Furthermore, a three-domain structure is designed for the rules, enabling automatic sorting and automatic adjustment of rules.

To achieve the above technical purpose and effect, the invention adopts the following technical solution:

A method for constructing an information system operation rule base based on association rule mining, characterized in that it comprises the following steps:

S01: Obtain the network topology of the information system and the dynamic and static monitoring indicators of all devices;

S02: Generate a network fault tree from the network topology and the devices' dynamic and static monitoring indicators, and generate a basic rule base from the network fault tree;

S03: Run an association rule mining algorithm on the historical data of the information system to obtain an association rule base;

S04: Perform inference over the basic rule base and the association rule base to generate an extended rule base;

wherein the retrieval priority of the rule bases is: basic rule base > association rule base > extended rule base.

Preferably, each rule in the basic rule base has a three-domain structure, comprising:

Rule sequence field: the number of times the rule has executed successfully in actual operation, the number of times it has failed, the rule's final count, and the rule's rank;

Rule identification field: identifies the object the rule belongs to;

Rule body field: the detailed statement of the rule.
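
The three-domain structure can be pictured as a small record type. The following is only a minimal sketch: the class names, attribute names and the example rule are hypothetical, and the patent does not prescribe any particular encoding.

```python
from dataclasses import dataclass


@dataclass
class SequenceField:
    """Rule sequence field: counters and ranking data (names are hypothetical)."""
    successes: int = 0         # times the rule executed successfully
    failures: int = 0          # times the rule execution failed
    final_count: float = 0.0   # final count derived from the two counters
    rank: int = 0              # position of the rule within its rule base


@dataclass
class Rule:
    sequence: SequenceField    # rule sequence field
    owner: str                 # rule identification field: device, subnet or whole network
    body: str                  # rule body field: the detailed statement of the rule


# Hypothetical example rule attached to a single device.
rule = Rule(
    sequence=SequenceField(successes=10, failures=4, final_count=8.0, rank=1),
    owner="device:core-switch-01",
    body="IF cpu_usage > threshold THEN raise_alarm",
)
```

Keeping the owner in its own field is what lets rules be located and regenerated when part of the topology changes.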

Preferably, the system runs the rule sorting algorithm and the rule flow algorithm in real time to determine rule priorities and refresh the rules.

Within each rule base, the priority with which a rule is retrieved is determined by the final count in its rule sequence field. The final count is given by:

F = R - 0.5W

where F is the final count, R is the number of times the rule has executed successfully in actual operation, and W is the number of times it has failed. If machine learning is applied to a failure scenario and the rule is optimized so that the problem is resolved, the corresponding failure count W is reduced by one.
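
As a worked illustration of the final count and of the decrement of W after a repaired failure, consider the sketch below. The function names are hypothetical, and clamping W at zero is an assumption that the text does not state.

```python
def final_count(successes: int, failures: int) -> float:
    """Final count F = R - 0.5 * W."""
    return successes - 0.5 * failures


def after_repair(failures: int) -> int:
    """Reduce W by one once a failure scenario has been learned and resolved.

    Clamping at zero is an assumption made for this sketch.
    """
    return max(failures - 1, 0)


r, w = 10, 4
before = final_count(r, w)   # 10 - 0.5 * 4
w = after_repair(w)          # one failure scenario resolved, W becomes 3
after = final_count(r, w)    # 10 - 0.5 * 3
```

With R = 10 and W = 4 the final count is 8.0; after one failure is resolved it rises to 8.5, so a repaired rule moves up in the retrieval order.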

Preferably, the rule flow algorithm for the association rule base is: during actual operation of the system, a rule that is proven correct even once is moved directly to the basic rule base; a rule that is proven wrong twice is deleted.
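
The flow decision for an association rule can be sketched as a function of two counters. The function name and return labels are hypothetical, and checking promotion before deletion is an assumption: the text does not say which condition wins if both hold.

```python
def route_association_rule(confirmed: int, refuted: int) -> str:
    """Decide what happens to an association rule during operation.

    confirmed: times the rule was proven correct at run time
    refuted:   times the rule was proven wrong at run time
    """
    if confirmed >= 1:
        return "promote"   # proven correct once: move to the basic rule base
    if refuted >= 2:
        return "delete"    # proven wrong twice: remove the rule
    return "keep"          # otherwise it stays in the association rule base
```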

Preferably, the rule flow algorithm for the extended rule base is: validate all rules against historical data, then:

For rules with a success rate of 80% to 100%, apply machine learning on historical data and then move them directly to the basic rule base;

For rules with a success rate of 60% to 80%, apply machine learning on historical data; if the success rate then exceeds 80%, move the rule to the basic rule base; otherwise keep it in the extended rule base, where it undergoes machine learning on operational data until its success rate exceeds 80%;

For rules with a success rate of 50% to 60%, apply machine learning on both historical and operational data; once the success rate exceeds 80%, move the rule to the basic rule base; until then keep it in the extended rule base;

Rules with a success rate below 50% are deleted directly.
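
The four success-rate bands can be sketched as a simple classifier. The function name and the returned labels are hypothetical. The text leaves the band boundaries ambiguous (80% and 60% each appear in two bands), so this sketch assumes that each band includes its lower bound.

```python
def classify_extended_rule(success_rate: float) -> str:
    """Map a success rate in [0, 1], measured on historical data, to an action."""
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must be in [0, 1]")
    if success_rate >= 0.8:
        # learn on historical data, then move to the basic rule base
        return "learn_historical_then_promote"
    if success_rate >= 0.6:
        # learn on historical data; promote only if the rate then exceeds 80%
        return "learn_historical_promote_if_above_80"
    if success_rate >= 0.5:
        # learn on historical and operational data until the rate exceeds 80%
        return "learn_historical_and_operational_until_above_80"
    return "delete"
```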

The invention enables dynamic construction and optimization of the information system operation rule base and can be applied to the company's integrated information operation and maintenance supervision platform. It makes monitoring and alarm rules easier to create and maintain and makes rule matching more efficient, so the rule base adapts quickly to changes in information system objects, operating environments and sources of operating-state data, while meeting the real-time requirements of rule-set matching in large-scale information systems. This greatly improves the practicality of the rule system and raises the quality of monitoring and alarming, security management, behavior auditing and compliance management.

The beneficial effects of the invention are:

1. Partitioned rule base: the rule base has three partitions, storing basic rules, association rules and extended rules respectively. Basic rules have the highest priority, association rules the next, and extended rules the lowest. Partitioning fixes the order in which rules are retrieved, and rules in lower partitions can be upgraded through continuous real-time machine learning, so that rules flow from lower to higher partitions.

2. Three-domain structure of rules: each rule comprises a rule sequence field, a rule identification field and a rule body field. The rule sequence field ranks rules by priority using quantitative measures; the rule identification field identifies the object the rule belongs to, so that the rule base can adapt when the network topology changes; the rule body field stores the body of the rule, that is, its detailed statement.

3. Real-time adaptive threshold adjustment: the system uses historical performance data and operational data to calculate alarm thresholds suited to the alarm requirements of business operation, improving the alarm self-learning capability for the information system. A threshold planning algorithm adjusts alarm thresholds dynamically, reducing the number of events at their source and improving the quality of monitoring alarms.

4. Automated reasonableness analysis of new rules: new rules can be generated automatically by the system or added manually. Each new rule is analyzed for reasonableness against historical data and real-time operational data to determine whether it is usable.

5. Automatic adjustment and optimization of rules: by running the rule sorting algorithm and the rule flow algorithm in real time, the system determines rule priorities and refreshes or upgrades them, keeping the rule base in an optimal state and improving both retrieval efficiency and rule accuracy, and thus system performance.

Brief Description of the Drawings

Fig. 1 is a flow chart of the method of the invention for constructing an information system operation rule base based on association rule mining;

Fig. 2 is a diagram of the three-domain structure of rules in the basic rule base of the invention;

Fig. 3 is a flow chart of the rule flow algorithm for extended rules in partition three of the rule base of the invention.

Detailed Description

The technical solution of the invention is described in further detail below with reference to the drawings and specific embodiments, so that those skilled in the art can better understand and implement the invention; the embodiments given do not limit the invention.

A method for constructing an information system operation rule base based on association rule mining, as shown in Fig. 1, comprises the following steps:

S01: Obtain the network topology of the information system and the dynamic and static monitoring indicators of all devices.

First, the network topology is obtained through topology discovery. Then, for each network device in the topology, the corresponding dynamic and static monitoring indicators are collected. They fall into six categories: network indicators, security indicators, host indicators, database indicators, middleware indicators and business system indicators.

Network indicators include link delay, network device healthy running time, network device status, network device CPU usage, network device memory usage, receive packet loss rate, send packet loss rate, receive error packet rate, send error packet rate, interface receive traffic, interface send traffic, total interface traffic and interface bandwidth utilization. Security indicators include security events, security device status (CPU, memory, etc.) and compliance. Host indicators include host status, healthy running time, CPU usage, memory usage, disk space usage, number of key processes and host configuration information.

Database indicators comprise SqlServer, Oracle and DB2 indicators. SqlServer indicators include SGA hit rate, available cache size, dictionary buffer hit rate, shared buffer hit rate, Redo log buffer hit rate, number of sessions, number of available sessions, transaction response time, tablespace availability, tablespace growth rate and MTS performance. Oracle indicators include number of sessions, number of available sessions, transaction response time, tablespace availability, tablespace growth rate, shared memory usage, shared memory hit rate and rollback segment usage. DB2 indicators include Process availability, buffer pool (Bufferpool) availability, buffer pool hit rate, tablespace availability, tablespace growth rate, sort index (SortsPerTransaction), number of sessions and number of available sessions.

Middleware indicators comprise Weblogic and Websphere indicators. Weblogic indicators include JVM heap free memory, JVM heap total memory, JVM heap usage, total execution time of all Servlet invocations, longest execution time of a single Servlet invocation, average Servlet execution time, Servlet execution count, JDBC Pool maximum capacity, high-water mark of active JDBC Pool connections, high-water mark of waiting JDBC Pool connections, cumulative connections since the JDBC Pool was instantiated, average number of active JDBC Pool connections, average JDBC Pool connection delay, number of leaked JDBC Pool connections, current JDBC Pool capacity, number of failed JDBC Pool reconnections, maximum number of available JDBC Pool connections, maximum number of unavailable JDBC Pool connections, number of JDBC Pool LEAKED connections, number of available connections in the JDBC Pool, number of unavailable connections in the JDBC Pool, JDBC Pool utilization, current number of sessions, maximum number of sessions and session occupancy. Websphere indicators include JVM free memory, JVM total memory, JVM memory usage, average session lifetime, total number of sessions currently being accessed, total number of currently live sessions, JDBC Pool maximum capacity, average number of active JDBC Pool connections, average JDBC Pool connection delay, number of leaked JDBC Pool connections, current JDBC Pool capacity, number of failed JDBC Pool reconnections, maximum number of available JDBC Pool connections, maximum number of unavailable JDBC Pool connections, number of JDBC Pool LEAKED connections, number of available connections in the JDBC Pool, number of unavailable connections in the JDBC Pool and JDBC Pool utilization.

Business system indicators include number of online users, number of daily logged-in users, business system running status, business system interface status and business system healthy running time.

S02: Generate a network fault tree from the network topology and the devices' dynamic and static monitoring indicators, and generate a basic rule base from the network fault tree. The fault tree expresses the relationships among the monitoring indicators and among the network devices concisely and clearly. The thresholds used in the basic rule base are determined by machine learning on historical data and by running the threshold planning algorithm.

A three-domain structure is designed for the basic rules, as shown in Fig. 2, comprising a rule sequence field, a rule identification field and a rule body field.

The rule sequence field stores the number of times the rule has executed successfully in actual operation, the number of times it has failed, the rule's final count and the rule's rank. Its purpose is to make it easy to rank rules by priority and so improve retrieval efficiency.

The rule identification field identifies the object the rule belongs to; for example, a rule may be specific to one network device, or may belong to a subnet or to the whole network. Its purpose is to label every rule: when the network topology changes, the identification field shows which rules must be deleted or modified, and basic rules are regenerated for the changed part of the topology. Rules are thereby added, deleted and modified so that the rule base adapts intelligently to the new network architecture.

The rule body field stores the body of the rule, that is, its detailed statement. A rule here is a production rule: a fixed logical relationship of the kind people use in reasoning and judgment. Productions can generally be expressed in natural language; in fact, the widely used natural-language structures "cause-effect", "condition-conclusion", "premise-operation", "fact-progress" and "situation-behavior" can all be reduced to production form. The basic form of a rule is A→B, or IF A THEN B. A is the premise (antecedent) of the production and states the condition under which the production applies. B is a set of conclusions or operations (consequent) and states the conclusion to draw or the operation to perform when the condition in A is satisfied. Production rule inference can proceed by forward reasoning, backward reasoning or bidirectional reasoning. Each has advantages in different situations, and the choice of inference method should weigh them together.
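
Of the three inference modes, forward reasoning is the simplest to illustrate: start from known facts and keep firing every IF A THEN B rule whose premise is satisfied. The sketch below is illustrative only; the function name, the rule encoding as (premises, conclusion) pairs and the fact names are all hypothetical.

```python
def forward_chain(facts, rules):
    """Apply IF-THEN production rules until no new fact can be derived.

    facts: iterable of known facts
    rules: list of (premises, conclusion) pairs, premises being a frozenset
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # fire the rule when every premise is already known
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


# Hypothetical rules for a small network.
rules = [
    (frozenset({"link_down"}), "switch_unreachable"),
    (frozenset({"switch_unreachable"}), "host_unreachable"),
]
derived = forward_chain({"link_down"}, rules)
```

Backward reasoning would instead start from a suspected conclusion and search for rules whose consequent matches it.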

S03: Run an association rule mining algorithm on the historical data of the information system to obtain an association rule base. Association rules are rules generated by association rule mining and checked against historical data.

Preferably, an improved Apriori algorithm, based on a branch-filtering optimization strategy and a single database scan, is used to mine association rules from historical data. Apriori is a frequent-itemset algorithm for mining association rules and has two stages: finding frequent itemsets, and deriving association rules from them. It searches the data set for itemsets that meet a minimum support and then generates association rules from those frequent itemsets. Apriori is a classic association rule mining algorithm, but it has two drawbacks: finding frequent itemsets generates many candidate sets, which wastes computation and time, and the database must be scanned many times, which seriously reduces efficiency. To address the first problem, a hash table and a bit container are used to filter the candidate sets, reducing the cost of candidate generation. Because the main cost of the classic algorithm lies in generating C1, L1, C2 and L2, filtering out more branches while generating C2 greatly improves efficiency. To address the second problem: the classic algorithm scans the entire database each time it computes a support, and supports are computed very frequently, so the database is scanned frequently and efficiency is low. Instead, a Boolean matrix is maintained that records all transaction information in the database. The matrix can be built with a single scan and contains all the data needed to compute supports, so the database never needs to be scanned again, which greatly improves efficiency.
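
The single-scan idea can be sketched as follows. Each column of the Boolean matrix is stored as a Python integer used as a bitmask, which is an implementation choice for this sketch and not something the patent specifies. Only the Boolean-matrix support counting and the standard Apriori subset pruning are shown; the hash-table and bit-container candidate filtering is not reproduced. The function names, the absolute `min_support` count and the sample transactions are assumptions.

```python
from itertools import combinations


def build_columns(transactions):
    """Scan the data once: bit i of columns[item] is set when
    transaction i contains the item (one column of the Boolean matrix)."""
    columns, count = {}, 0
    for i, transaction in enumerate(transactions):
        for item in transaction:
            columns[item] = columns.get(item, 0) | (1 << i)
        count = i + 1
    return columns, count


def support(itemset, columns, count):
    """Count transactions containing every item, using only the matrix."""
    mask = (1 << count) - 1
    for item in itemset:
        mask &= columns.get(item, 0)
    return bin(mask).count("1")


def frequent_itemsets(transactions, min_support):
    """Level-wise Apriori search; min_support is an absolute count."""
    columns, count = build_columns(transactions)
    level = [frozenset([item]) for item in sorted(columns)
             if support([item], columns, count) >= min_support]
    result, k = {}, 1
    while level:
        for itemset in level:
            result[itemset] = support(itemset, columns, count)
        k += 1
        previous = set(level)
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset of a candidate must be frequent
        level = [c for c in candidates
                 if all(frozenset(s) in previous for s in combinations(c, k - 1))
                 and support(c, columns, count) >= min_support]
    return result


# Hypothetical monitoring events grouped into transactions.
transactions = [
    {"cpu_high", "mem_high", "alarm"},
    {"cpu_high", "alarm"},
    {"mem_high"},
    {"cpu_high", "mem_high", "alarm"},
]
found = frequent_itemsets(transactions, min_support=3)
```

After `build_columns` returns, every support value is computed with bitwise AND on the stored columns, so the original transactions are never read again.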

With the improved Apriori algorithm, association rules can be mined from historical data, and the results, combined with the threshold planning algorithm, are used to generate the association rule base intelligently. Because association rules are mined from historical data and have passed validation against it, their credibility is relatively high. Some uncertainty remains, however, so an association rule must also pass validation against operational data before it can be upgraded to a basic rule.

The thresholds used in the association rule base are determined by machine learning on historical data and by running the threshold planning algorithm.

For both the basic rule base and the association rule base, thresholds are determined by using historical performance data to calculate alarm thresholds suited to the alarm requirements of business operation. This improves the alarm self-learning capability for the information system, optimizes alarm logic, adjusts alarm thresholds dynamically, reduces the number of events at their source, and improves the quality of monitoring alarms.

Preferably, the threshold planning algorithm for a given indicator is:

Statistically analyze the indicator's historical data recorded under normal network operation to determine its maximum, minimum and median, then determine the threshold by the following formula:

$T_i = D_i + \frac{2 (Z_i - X_i)(M_i - D_i)}{3 (D_i - X_i)}$

where T_i is the threshold, D_i is the maximum value of the indicator under normal network operation, X_i is its minimum value under normal network operation, M_i is the indicator's design maximum, and Z_i is its median under normal network operation.

After the rule base is put into operation, every valid value of the indicator observed under normal network operation can enter the calculation in real time, so the indicator's threshold is determined in real time. This adaptive, dynamic modification makes the threshold better fitted to the system and helps improve system performance.
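
A minimal sketch of the threshold calculation is given below. The function name and the sample values are hypothetical, and rejecting the degenerate case where the maximum equals the minimum is an assumption added here, since the formula divides by their difference.

```python
from statistics import median


def plan_threshold(normal_values, design_max):
    """Threshold T = D + 2 (Z - X)(M - D) / (3 (D - X)).

    normal_values: indicator samples taken under normal network operation
    design_max:    M, the design maximum of the indicator
    """
    d = max(normal_values)       # D: maximum under normal operation
    x = min(normal_values)       # X: minimum under normal operation
    z = median(normal_values)    # Z: median under normal operation
    if d == x:
        # assumption: the formula is undefined when all samples are equal
        raise ValueError("need at least two distinct values")
    return d + 2 * (z - x) * (design_max - d) / (3 * (d - x))


# Hypothetical CPU usage samples (percent) with a design maximum of 100.
threshold = plan_threshold([20, 30, 40, 50, 60], design_max=100)
```

For these samples D = 60, X = 20 and Z = 40, so T = 60 + (2 × 20 × 40) / (3 × 40) ≈ 73.3. The threshold always lies between the observed maximum and the design maximum when the median lies between the minimum and the maximum. Re-running the function as new normal-operation samples arrive gives the real-time adjustment described above.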

S04: Perform inference over the basic rule base and the association rule base to generate an extended rule base.

A rule here is a production rule: a fixed logical relationship of the kind people use in reasoning and judgment. The basic form of a rule is A→B, or IF A THEN B. A is the premise (antecedent) of the production and states the condition under which the production applies. B is a set of conclusions or operations (consequent) and states the conclusion to draw or the operation to perform when the condition in A is satisfied. Extended rules can be generated directly by rule inference over the basic rules and association rules. For example, if the basic rule base and the association rule base contain the rules "A→B", "B→C" and "A D", rule inference yields three extended rules: "B→C", "D→B" and "D→C".
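
One simple way to generate extended rules is to chain rules whose consequent matches another rule's antecedent. The sketch below is hypothetical: the function name and the tuple encoding are illustrative, and the input treats the third rule of the example as D→A, which is an assumption made only so that the chaining can be demonstrated.

```python
def infer_extended(rules):
    """Derive new rules by chaining X->Y and Y->Z into X->Z.

    rules: set of (antecedent, consequent) pairs
    Returns only the newly derived pairs.
    """
    known = set(rules)
    derived = set()
    changed = True
    while changed:
        changed = False
        for a, b in list(known):
            for c, d in list(known):
                # chain a->b with b->d, skipping trivial a->a results
                if b == c and a != d and (a, d) not in known:
                    known.add((a, d))
                    derived.add((a, d))
                    changed = True
    return derived


# Assumed input: A->B, B->C and D->A (the last one is an assumption).
extended = infer_extended({("A", "B"), ("B", "C"), ("D", "A")})
```

Under that assumption the derived set is A→C, D→B and D→C. Derived rules still have to pass validation against historical and operational data before promotion.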

扩展规则是由基本规则和关联规则推理出来的，规则的推理本身就存在不确定性，所以拓展规则可信度是最低的，必须经过严格的验证(包括历史数据的验证和运行数据的验证)，才能升级为基本规则。Extended rules are deduced from the basic rules and association rules. The reasoning of the rules itself has uncertainty, so the credibility of the extended rules is the lowest, and must undergo strict verification (including verification of historical data and verification of operating data). , to be upgraded to a basic rule.

在研究信息系统运行监控报警规则库的构造技术的基础上，从监控的类型、数据、来源、告警时间、告警模式、性能数据等方面着手，通过对监控历史数据和相关日常运维工单故障类型的分析，从信息系统繁忙时段、空闲时段等不同时间段出发，结合信息系统的业务时间和业务量，理解业务的潮涨潮落，利用性能历史数据，分析计算出适合业务运行告警要求的告警阈值，提高针对信息系统的告警自学习能力，动态调整告警阈值，做到从事件的源头减少事件量，提高监控告警的质量。On the basis of studying the construction technology of the information system operation monitoring and alarm rule base, starting from the monitoring type, data, source, alarm time, alarm mode, performance data, etc., through monitoring historical data and related daily operation and maintenance work order faults Type analysis, starting from different time periods such as busy periods and idle periods of the information system, combined with the business time and business volume of the information system, to understand the ebb and flow of the business, and use the historical performance data to analyze and calculate the alarm that is suitable for the alarm requirements of the business operation Threshold, improve the alarm self-learning ability of the information system, dynamically adjust the alarm threshold, reduce the amount of events from the source of the event, and improve the quality of monitoring alarms.

The rule base can be divided into three partitions that store different types of rules: for example, zone 1 stores the basic rule base, zone 2 the association rule base, and zone 3 the extended rule base. The retrieval priority of the rule bases is: basic rule base > association rule base > extended rule base. During retrieval, the basic rules in zone 1 are searched first; only if no matching rule is found are the association rules in zone 2 and the extended rules in zone 3 searched. The association rules in zone 2 and the extended rules in zone 3 are automatically adjusted and optimized through machine learning on historical data. In addition, such a rule is retained only if it passes a rationality check against historical data; otherwise it is removed.
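The prioritized lookup across the three partitions can be sketched as follows. This is only an illustration under stated assumptions: each partition is modeled as a dictionary from a premise to a conclusion, and the class name `RuleBase`, the method `lookup` and the sample rule names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class RuleBase:
    """Three partitions, searched in the order basic > association > extended."""
    basic: dict = field(default_factory=dict)
    association: dict = field(default_factory=dict)
    extended: dict = field(default_factory=dict)

    def lookup(self, premise):
        """Return (partition_name, conclusion) from the first partition that matches."""
        partitions = (
            ("basic", self.basic),
            ("association", self.association),
            ("extended", self.extended),
        )
        for name, rules in partitions:
            if premise in rules:
                return name, rules[premise]
        return None  # no partition holds a rule for this premise


# Hypothetical sample rules.
rb = RuleBase(
    basic={"cpu_high": "raise_alarm"},
    association={"cpu_high": "check_process", "disk_full": "clean_logs"},
    extended={"link_down": "reroute"},
)
print(rb.lookup("cpu_high"))   # found in the basic partition first
print(rb.lookup("link_down"))  # falls through to the extended partition
```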

In addition, within each rule base partition a rule sorting algorithm determines rule priority. Specifically, the retrieval priority of a rule is determined by the final count indicator in its rule sequence domain: rules with higher priority are retrieved first and rules with lower priority later, which improves retrieval efficiency. The final count is given by:

F = R - 0.5W
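As a worked illustration of the final count, the sketch below computes F = R - 0.5W for a few rules and sorts them so that higher counts are retrieved first. R and W are taken to be the numbers of successful and failed executions, as defined in claim 6; the rule names and counts are hypothetical sample values.

```python
def final_count(successes, failures):
    """Final count F = R - 0.5 * W for one rule."""
    return successes - 0.5 * failures


# Hypothetical rules with their success (R) and failure (W) counts.
rules = [
    {"name": "r3", "R": 1, "W": 4},
    {"name": "r1", "R": 5, "W": 2},
    {"name": "r2", "R": 3, "W": 0},
]

# Higher final count means higher retrieval priority.
ranked = sorted(rules, key=lambda r: final_count(r["R"], r["W"]), reverse=True)
print([(r["name"], final_count(r["R"], r["W"])) for r in ranked])
# prints [('r1', 4.0), ('r2', 3.0), ('r3', -1.0)]
```

A failure costs half as much as a success gains, so a rule that is right more often than it is wrong keeps a positive count even after occasional failures.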

Checking against historical data and operating data reveals which rules in the rule base are reasonable and which are not. Rationality can also be determined quantitatively, for example through the final count indicator in the rule sequence domain of the three-domain rule structure. After this rationality analysis, rules can be processed further automatically: some rules are verified and meet system requirements; some have only moderate rationality and can be used only after machine learning; and some have poor rationality and may be deleted directly.

Likewise, machine learning on historical data and operating data can continuously improve rule performance, match the rules more closely to the system, and provide suggestions for performance optimization and adjustment. For example, thresholds are not fixed: rules can learn adaptively in real time from system operating data, which improves their rationality.

The rule base design also allows rules to flow from lower-level zones to higher-level zones. Such a flow requires, first, verification of the rule's rationality and, second, machine learning to keep improving that rationality. During actual operation, rules are adjusted and optimized automatically from real-time operating data: the rule sorting algorithm prioritizes and sorts the rules in zones 1, 2 and 3, and the rule flow algorithm upgrades or refreshes the rules in zones 2 and 3.

The rule flow algorithm of the association rule base is as follows: during actual system operation, a rule that is proved correct once is moved directly to the basic rule base; a rule that is proved wrong twice is deleted.
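The flow rule for association rules can be sketched as a small state update. This is an illustration only: the dictionary layout, the `wrong` counter and the function name `record_outcome` are assumptions, and the text does not say how a rule is "proved" correct or wrong, so the outcome is simply passed in as a boolean.

```python
def record_outcome(name, correct, association, basic):
    """Apply one verification outcome to an association rule.

    Returns "promoted", "deleted" or "kept" to describe what happened.
    """
    rule = association[name]
    if correct:
        # One confirmed success is enough to move the rule to the basic rule base.
        basic[name] = association.pop(name)
        return "promoted"
    rule["wrong"] += 1
    if rule["wrong"] >= 2:
        # Two confirmed failures remove the rule altogether.
        del association[name]
        return "deleted"
    return "kept"


# Hypothetical rules in the association partition.
association = {
    "r1": {"body": "cpu_high -> slow_response", "wrong": 0},
    "r2": {"body": "disk_full -> db_error", "wrong": 0},
}
basic = {}

print(record_outcome("r1", True, association, basic))   # promoted
print(record_outcome("r2", False, association, basic))  # kept
print(record_outcome("r2", False, association, basic))  # deleted
```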

As shown in Figure 3, the rule flow algorithm of the extended rule base begins by verifying all rules against historical data.
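The success-rate tiers recited in claim 8 for extended rules can be sketched as a decision function. This is a simplified reading, not the flow of Figure 3 itself: the machine learning step is not modeled and is represented only by the success rate measured after learning, the tier boundaries are assumed to be inclusive at the lower end, and the function name `extended_rule_flow` is hypothetical.

```python
def extended_rule_flow(initial_rate, rate_after_learning):
    """Decide the fate of an extended rule from its success rates (0.0 to 1.0).

    initial_rate: success rate when verified against historical data.
    rate_after_learning: success rate after machine learning (assumed given).
    """
    if initial_rate < 0.5:
        return "delete"
    if initial_rate >= 0.8:
        # Learned on historical data, then moved directly to the basic rule base.
        return "promote"
    # 50% to 80%: the tiers differ in which data is used for learning
    # (historical only, or historical plus running data), but promotion
    # always requires a post-learning success rate above 80%.
    return "promote" if rate_after_learning > 0.8 else "stay"


print(extended_rule_flow(0.90, 0.90))  # promote
print(extended_rule_flow(0.70, 0.85))  # promote
print(extended_rule_flow(0.55, 0.75))  # stay
print(extended_rule_flow(0.40, 0.95))  # delete
```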

Adjusting rule priorities in real time keeps the rule base in an optimal state and improves both retrieval efficiency and rule accuracy, and therefore system performance. Frequently used rules and rules with high rationality should be retrieved earlier, while rarely used rules and rules with low rationality can be retrieved later.

In addition, some operations can be performed manually: system operation and maintenance personnel can directly add and delete rules and modify the relevant attributes of existing rules.

The above are only preferred embodiments of the present invention and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the description and drawings of the present invention, whether applied directly or indirectly in other related technical fields, is likewise included in the scope of patent protection of the present invention.

Claims

1. A method for constructing an information system operating rule base based on association rule mining, characterized in that it comprises the following steps:

S01: Obtain the network topology of the information system and the dynamic monitoring indicators and static monitoring indicators of all equipment;

S02: Generate a network fault tree through the network topology and dynamic and static monitoring indicators of equipment, and generate a basic rule base through the network fault tree;

S03: Execute an association rule mining algorithm on the historical data of the information system to obtain an association rule base;

S04: Perform reasoning over the basic rule base and the association rule base together to generate an extended rule base;

wherein the retrieval priority of the rule bases is: basic rule base > association rule base > extended rule base.

2. The method for constructing an information system operating rule base based on association rule mining according to claim 1, characterized in that each rule in the basic rule base has a three-domain structure comprising:

a rule sequence domain, which records the number of successful executions, the number of failed executions, the final count and the sort order of the rule during actual operation;

a rule identification domain, which identifies the object to which the rule belongs; and

a rule subject domain, which describes the rule in detail.

3. The method for constructing an information system operating rule base based on association rule mining according to claim 1, characterized in that the relevant thresholds in the basic rule base and the association rule base are determined by performing machine learning on historical data and executing a threshold planning algorithm.

4. The method for constructing an information system operating rule base based on association rule mining according to claim 3, characterized in that the threshold planning algorithm is:

Statistically analyze the historical data of the indicator under normal network operation conditions, determine its maximum value, minimum value and median, and then determine the threshold according to the following formula:

T_i = D_i + \frac{2 (Z_i - X_i) (M_i - D_i)}{3 (D_i - X_i)}

In the formula, T_i is the threshold, D_i is the maximum value of the indicator under normal network operation, X_i is the minimum value of the indicator under normal network operation, M_i is the design maximum of the indicator, and Z_i is the median of the indicator under normal network operation.
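A worked example may help make the threshold formula concrete. The sketch below reads the formula as T_i = D_i + 2(Z_i - X_i)(M_i - D_i) / (3(D_i - X_i)); the function name `planned_threshold` and the sample values (a CPU utilization indicator in percent) are hypothetical and are not taken from the patent.

```python
def planned_threshold(d_max, x_min, z_median, m_design):
    """Threshold T = D + 2 (Z - X) (M - D) / (3 (D - X)).

    d_max, x_min, z_median: maximum, minimum and median of the indicator
    observed under normal network operation; m_design: design maximum.
    """
    if d_max <= x_min:
        raise ValueError("observed maximum must exceed observed minimum")
    return d_max + 2 * (z_median - x_min) * (m_design - d_max) / (3 * (d_max - x_min))


# Hypothetical CPU utilization history: min 10%, median 40%, max 70%, design max 100%.
print(planned_threshold(d_max=70, x_min=10, z_median=40, m_design=100))
# prints 80.0
```

Under this reading the threshold always lies between the observed maximum and the design maximum, and it moves closer to the design maximum as the median approaches the observed maximum.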

5. A method for constructing an information system operating rule base based on association rule mining according to claim 1, wherein the system executes the rule sorting algorithm and the rule flow algorithm in real time to determine the priority of the rules and refresh the rules.

6. A method for constructing an information system operating rule base based on association rule mining according to claim 5, wherein, in each rule base, the retrieval priority of a rule is determined by the final count indicator in its rule sequence domain, the final count being given by:

F＝R-0.5W

In the formula, F is the final count, R is the number of successful executions of the rule during actual operation, and W is the number of failed executions; if machine learning is performed on an execution failure scenario so that the relevant rule is optimized and the problem is resolved, the failure count W is correspondingly reduced by one.

7. The method for constructing an information system operating rule base based on association rule mining according to claim 5, characterized in that the rule flow algorithm of the association rule base is:

During the actual operation of the system, as long as the rule is proved to be correct once, it will be directly moved to the basic rule base; if the rule is proved to be wrong twice, the rule will be deleted.

8. The method for constructing an information system operating rule base based on association rule mining according to claim 5, characterized in that the rule flow algorithm of the extended rule base is:

Use historical data to verify all rules. Rules with a success rate of 80% to 100% undergo machine learning on historical data and are then moved directly to the basic rule base. Rules with a success rate of 60% to 80% undergo machine learning on historical data; if the success rate then exceeds 80%, they are moved to the basic rule base, otherwise they stay in the extended rule base and undergo machine learning on running data until the success rate exceeds 80%. Rules with a success rate of 50% to 60% undergo machine learning on both historical data and running data; once the success rate exceeds 80% they are moved to the basic rule base, otherwise they stay in the extended rule base. Rules with a success rate below 50% are deleted directly.

9. The method for constructing an information system operating rule base based on association rule mining according to claim 1, characterized in that, in step S03, an improved Apriori algorithm based on a branch screening optimization strategy and a single database scan technique is used to mine association rules from historical data; wherein the improved Apriori algorithm uses hash tables and bit containers to filter candidate sets, reducing the cost of candidate set generation, and records all business information in the database by maintaining a Boolean matrix.

10. A method for constructing an information system operating rule base based on association rule mining according to claim 1, characterized in that system operation and maintenance personnel can directly add and delete rules and modify the relevant attributes of existing rules.