CN115955388A - A Distributed Cloud Comprehensive Alarm System - Google Patents
- Publication number
- CN115955388A (application number CN202211638775.2A)
- Authority
- CN
- China
- Prior art keywords
- alarm
- distributed cloud
- layer
- processing
- node
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Description
Technical Field
The invention relates to the technical field of cloud computing, and in particular to a distributed cloud comprehensive alarm system.
Background Art
Distributed cloud means that a cloud service provider (CSP) distributes public cloud services to different physical locations, while the CSP remains uniformly responsible for the operation, governance, updating, and evolution of those services. A distributed cloud generally adopts a centralized operation and maintenance (O&M) model. Against this background, the O&M support system generally adopts a two-level architecture: the distributed cloud platforms and a center-side O&M support platform.
In the prior art, the distributed cloud platform is deployed on the customer side and sends monitoring data, alarm data, performance data, and capacity data back to the center side over a dedicated network. The center-side O&M support platform uses the data returned by the distributed cloud platforms to implement the core functions of centralized O&M, including capacity monitoring, performance monitoring, and alarm monitoring of the distributed cloud. Among these, the alarm system is the most important subsystem of the whole O&M support system, so its architecture design is particularly important.
However, traditional alarm systems cannot provide centralized access, analysis, and processing of alarms from the managed distributed cloud platforms. They also make it difficult for the personnel of each specialist O&M queue to discover faults quickly, respond in time, and locate faults accurately, and they cannot shorten service interruptions or provide focused assurance for important customers and important services.
Summary of the Invention
The purpose of the invention is to provide a distributed cloud comprehensive alarm system that solves the problems raised in the background art above.
To achieve this purpose, the invention provides the following technical solution: a distributed cloud comprehensive alarm system composed of an alarm message access layer, a stream processing layer, a data layer, and an application layer;
the alarm message access layer, based on SD-WAN, connects the network link between the distributed cloud platform and the center side;
the stream processing layer, driven by the message stream of the message cluster, implements the core alarm analysis and processing functions for the distributed cloud platform;
the data layer is built on a heterogeneous multi-database system architecture;
the application layer implements the core application scenarios on top of the data layer.
Preferably, alarms from the alarm component in the distributed cloud platform are forwarded by an HTTP request proxy over the SD-WAN network to the access proxy on the center side, and the access proxy then forwards them to the message cluster;
SD-WAN network addresses are planned uniformly across the entire distributed cloud, and policy control enforces one-way request access from the HTTP request proxy of the distributed cloud platform to the HTTP access proxy on the center side;
the message cluster is built on the open-source message middleware Kafka and uses a distributed cluster deployment architecture.
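The one-way access policy described above can be illustrated with a minimal sketch. The subnets and the function name `is_request_allowed` are hypothetical; the application does not specify an address plan or how the SD-WAN policy is expressed, so this only shows the intended direction check.

```python
import ipaddress

# Hypothetical address plan (not from the application): one range for the HTTP
# request proxies on the distributed cloud platforms, one for the center side.
PLATFORM_REQUEST_PROXIES = ipaddress.ip_network("10.10.0.0/16")
CENTER_ACCESS_PROXIES = ipaddress.ip_network("10.0.0.0/24")


def is_request_allowed(src: str, dst: str) -> bool:
    """Allow only requests from a platform request proxy to a center access proxy."""
    source = ipaddress.ip_address(src)
    target = ipaddress.ip_address(dst)
    return source in PLATFORM_REQUEST_PROXIES and target in CENTER_ACCESS_PROXIES
```

Under these assumptions a request from `10.10.3.7` to `10.0.0.5` is allowed, while the reverse direction is rejected, which matches the one-way access the text calls for.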
Preferably, the stream processing layer is built on a Flink distributed cluster and supports high availability, high performance, and scalability; the stream processing layer includes a Source node, a standard stream mapping node, a CMDB association node, an alarm convergence node, an AI processing node, an automatic dispatch node, and a persistence node.
Preferably, the Source node uses the functions provided by the Flink framework to ingest the Kafka message stream;
the standard stream mapping node parses the string stream and maps it to a standard alarm message stream, on which all subsequent processing nodes operate;
the CMDB association node, based on the CMDB library, attaches the configuration attributes of the alarm object for use in subsequent alarm analysis and processing, and updates the alarm status of the alarm object in the in-memory database;
the alarm convergence node is the core component of alarm analysis and processing;
the AI processing node, based on an external AI engine, encapsulates AI algorithms to perform cluster analysis of unknown alarm messages, and associates fault-handling knowledge from a knowledge base;
the automatic dispatch node, based on a rule engine, turns the alarm messages output by the final analysis and processing into work order information, matches the fault-handling queue and personnel, and pushes the work order to the peripheral work order system to achieve automatic dispatch;
the persistence node writes the alarm message records output by the final analysis and processing to the relational database, and writes the original alarm messages to the time series database.
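As an illustration of the first two processing steps after ingestion, the sketch below maps a raw string message to standard alarm records and then attaches CMDB attributes. Everything here is an assumption made for the example: the `Alarm` fields, the JSON key names (`header`, `data`, `platform_code`, `object`, `level`, `text`), and the plain dictionaries that stand in for the CMDB library and the in-memory database.

```python
import json
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Alarm:
    """Hypothetical standard alarm record; the field names are assumptions."""
    platform_code: str
    object_id: str
    level: str
    text: str
    attributes: Dict[str, str] = field(default_factory=dict)


def map_to_standard(raw: str) -> List[Alarm]:
    """Standard stream mapping: parse one raw string message into Alarm records."""
    message = json.loads(raw)
    code = message["header"]["platform_code"]
    return [
        Alarm(code, rec["object"], rec.get("level", "unknown"), rec.get("text", ""))
        for rec in message["data"]
    ]


def associate_cmdb(alarm: Alarm, cmdb: Dict[str, Dict[str, str]],
                   state_cache: Dict[str, str]) -> Alarm:
    """CMDB association: attach configuration attributes and record the latest state."""
    alarm.attributes = dict(cmdb.get(alarm.object_id, {}))
    state_cache[alarm.object_id] = alarm.level  # stand-in for the in-memory database
    return alarm
```

Later nodes would then consume `Alarm` objects only, which is the point of mapping to a standard stream before any analysis.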
Preferably, the alarm convergence node implements the following core functions:
based on the real-time status of alarm objects in the in-memory database and the convergence rules in the rule engine, it merges alarms of different levels raised for the same alarm object, which effectively avoids the peak processing load that an alarm storm would otherwise place on subsequent processing nodes;
it implements rule-based alarm silencing.
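The two convergence functions can be sketched as follows. This is one possible reading, not the application's algorithm: alarms for the same object are merged into a single record that keeps the highest level seen and a count, and a silence rule is modeled as a set of field values that must all match. The severity ordering, field names, and rule format are assumptions.

```python
# Hypothetical severity ordering; the application does not define alarm levels.
SEVERITY = {"info": 0, "warning": 1, "major": 2, "critical": 3}


def is_silenced(alarm, silence_rules):
    """A rule silences an alarm when every field in the rule matches the alarm."""
    return any(
        all(alarm.get(key) == value for key, value in rule.items())
        for rule in silence_rules
    )


def converge(alarms, silence_rules=()):
    """Drop silenced alarms, then merge the rest per alarm object."""
    merged = {}
    for alarm in alarms:
        if is_silenced(alarm, silence_rules):
            continue
        current = merged.get(alarm["object_id"])
        if current is None:
            merged[alarm["object_id"]] = {**alarm, "count": 1}
            continue
        current["count"] += 1
        if SEVERITY.get(alarm["level"], 0) > SEVERITY.get(current["level"], 0):
            current["level"] = alarm["level"]  # keep the most severe level seen
    return list(merged.values())
```

Because downstream nodes receive one record per object instead of one per raw alarm, a burst of repeated alarms for the same object no longer multiplies their workload.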
Preferably, the data layer consists of a CMDB library, an in-memory database, a relational database, and a time series database.
Preferably, the CMDB library uses the open-source graph database ArangoDB and supports the association and matching of alarm object information;
the in-memory database uses a distributed Redis cluster and caches dictionary data and the status data of alarm objects;
the relational database uses an open-source MariaDB high-availability cluster, persists configuration data and alarm processing records, and provides data queries for upper-layer applications;
the time series database uses an InfluxDB distributed cluster, stores the original alarm records, and provides data support for upper-layer application display.
Preferably, the application layer includes the following subsystems: alarm presentation, alarm analysis, rule configuration, and scenario monitoring.
Preferably, alarm presentation serves the O&M personnel of each specialty and provides alarm application views based on the alarm data;
alarm analysis serves the O&M personnel of each specialty and provides fault location and root cause analysis;
rule configuration serves configuration managers and provides front-end configuration of alarm rules;
scenario monitoring serves key-assurance scenarios and provides an alarm monitoring dashboard, network topology, and application topology monitoring for key-assurance objects.
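The split between the two persistent stores can be sketched as below: processed alarm records go to a relational table and raw messages go to a time-ordered store. SQLite and a Python list are used here only as local stand-ins for the MariaDB cluster and the InfluxDB cluster; the table name, columns, and function names are hypothetical.

```python
import sqlite3
import time


def init_schema(relational: sqlite3.Connection) -> None:
    """Create the (hypothetical) table for processed alarm records."""
    relational.execute(
        "CREATE TABLE IF NOT EXISTS alarm_record ("
        "object_id TEXT, level TEXT, merged_count INTEGER)"
    )


def persist(raw_message: str, processed: dict,
            relational: sqlite3.Connection, timeseries: list) -> None:
    """Write the raw message to the time series store and the result to the table."""
    timeseries.append((time.time(), raw_message))  # stand-in for a time series write
    relational.execute(
        "INSERT INTO alarm_record (object_id, level, merged_count) VALUES (?, ?, ?)",
        (processed["object_id"], processed["level"], processed["count"]),
    )
    relational.commit()
```

Keeping the raw messages separate from the processed records lets the application layer query compact results while the original data remains available for display and later analysis.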
Compared with the prior art, the beneficial effects of the invention are as follows:
The distributed cloud comprehensive alarm system proposed by the invention provides centralized access, analysis, and processing of alarms from the managed distributed cloud platforms. It also helps the personnel of each specialist O&M queue discover faults quickly, respond in time, and locate faults accurately, which shortens service interruptions and provides focused assurance for important customers and important services. In addition, the technical architecture of the system supports heterogeneous clouds and offers high performance, high reliability, and scalability.
Brief Description of the Drawings
Fig. 1 is an architecture diagram of the distributed cloud comprehensive alarm system of the invention;
Fig. 2 is a technical architecture diagram of the distributed cloud comprehensive alarm system of the invention;
Fig. 3 is an alarm access architecture diagram of the distributed cloud comprehensive alarm system of the invention;
Fig. 4 is a diagram of the alarm stream processing model of the distributed cloud comprehensive alarm system of the invention.
Detailed Description
To describe the purpose and technical solution of the invention clearly and completely, and to make its advantages easier to understand, the embodiments of the invention are described in further detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described here are only some of the embodiments of the invention, not all of them; they serve only to explain the embodiments of the invention and do not limit them. All other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the scope of protection of the invention.
Embodiment 1
Referring to Fig. 1 to Fig. 4, the invention provides a technical solution: a distributed cloud comprehensive alarm system composed of an alarm message access layer, a stream processing layer, a data layer, and an application layer;
the alarm message access layer, based on SD-WAN, connects the network link between the distributed cloud platform and the center side; alarms from the alarm component in the distributed cloud platform are forwarded by an HTTP request proxy over the SD-WAN network to the access proxy on the center side, and the access proxy then forwards them to the message cluster; SD-WAN network addresses are planned uniformly across the entire distributed cloud, and policy control enforces one-way request access from the HTTP request proxy of the distributed cloud platform to the HTTP access proxy on the center side; the message cluster is built on the open-source message middleware Kafka and uses a distributed cluster deployment architecture;
the stream processing layer, driven by the message stream of the message cluster, implements the core alarm analysis and processing functions for the distributed cloud platform; the stream processing layer is built on a Flink distributed cluster and supports high availability, high performance, and scalability; the stream processing layer includes a Source node, a standard stream mapping node, a CMDB association node, an alarm convergence node, an AI processing node, an automatic dispatch node, and a persistence node; the Source node uses the functions provided by the Flink framework to ingest the Kafka message stream; the standard stream mapping node parses the string stream and maps it to a standard alarm message stream, on which all subsequent processing nodes operate; the CMDB association node, based on the CMDB library, attaches the configuration attributes of the alarm object for use in subsequent alarm analysis and processing, and updates the alarm status of the alarm object in the in-memory database; the alarm convergence node is the core component of alarm analysis and processing; the AI processing node, based on an external AI engine, encapsulates AI algorithms to perform cluster analysis of unknown alarm messages, and associates fault-handling knowledge from a knowledge base; the automatic dispatch node, based on a rule engine, turns the alarm messages output by the final analysis and processing into work order information, matches the fault-handling queue and personnel, and pushes the work order to the peripheral work order system to achieve automatic dispatch; the persistence node writes the alarm message records output by the final analysis and processing to the relational database, and writes the original alarm messages to the time series database; the alarm convergence node implements the following core functions: based on the real-time status of alarm objects in the in-memory database and the convergence rules in the rule engine, it merges alarms of different levels raised for the same alarm object, which effectively avoids the peak processing load that an alarm storm would otherwise place on subsequent processing nodes; and it implements rule-based alarm silencing;
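The automatic dispatch step described in this paragraph can be illustrated with a small rule table. The rule format, queue names, and work order fields below are hypothetical; the application names a rule engine and a peripheral work order system but does not specify either interface.

```python
# Hypothetical dispatch rules: (conditions that must all match, target queue).
DISPATCH_RULES = [
    ({"domain": "network"}, "network-queue"),
    ({"domain": "storage"}, "storage-queue"),
]
DEFAULT_QUEUE = "general-queue"


def match_queue(alarm: dict) -> str:
    """Return the queue of the first rule whose conditions all match the alarm."""
    for conditions, queue in DISPATCH_RULES:
        if all(alarm.get(key) == value for key, value in conditions.items()):
            return queue
    return DEFAULT_QUEUE


def build_work_order(alarm: dict) -> dict:
    """Form the work order that would be pushed to the work order system."""
    return {
        "title": f"[{alarm['level']}] {alarm['object_id']}",
        "queue": match_queue(alarm),
        "alarm": alarm,
    }
```

In this sketch the first matching rule wins and unmatched alarms fall back to a default queue, which is a design choice made for the example rather than something the text prescribes.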
the data layer is built on a heterogeneous multi-database system architecture; the data layer consists of a CMDB library, an in-memory database, a relational database, and a time series database; the CMDB library uses the open-source graph database ArangoDB and supports the association and matching of alarm object information; the in-memory database uses a distributed Redis cluster and caches dictionary data and the status data of alarm objects; the relational database uses an open-source MariaDB high-availability cluster, persists configuration data and alarm processing records, and provides data queries for upper-layer applications; the time series database uses an InfluxDB distributed cluster, stores the original alarm records, and provides data support for upper-layer application display;
the application layer implements the core application scenarios on top of the data layer; the application layer includes the following subsystems: alarm presentation, alarm analysis, rule configuration, and scenario monitoring; alarm presentation serves the O&M personnel of each specialty and provides alarm application views based on the alarm data; alarm analysis serves the O&M personnel of each specialty and provides fault location and root cause analysis; rule configuration serves configuration managers and provides front-end configuration of alarm rules; scenario monitoring serves key-assurance scenarios and provides an alarm monitoring dashboard, network topology, and application topology monitoring for key-assurance objects.
Embodiment 2
An architecture method for the comprehensive alarm system of a distributed cloud platform.
Fig. 1 shows the architecture of the comprehensive alarm system implemented by the invention. The system as a whole is divided into four layers: the alarm message access layer, the stream processing layer, the data layer, and the application layer, as described below.
1. Alarm message access layer
The distributed cloud platform establishes its data return link to the center side over the SD-WAN network. SD-WAN is a software-controlled wide area network that is technically based on VPN tunneling and enables secure and reliable data transmission.
The alarm component of the distributed cloud platform monitors the running platform and raises alarms, and returns the alarm information as messages to the center-side comprehensive alarm system. To allow concurrent access by messages from all distributed clouds, alarm messages are encapsulated according to an internal protocol and use the JSON data format. The structure is as follows:
1) the message header, which contains metadata such as the cloud platform code, the number of alarm records, the message length, and the trigger time;
2) the message data part, which contains the list of alarm message records.
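The two-part message structure above can be sketched as a JSON envelope. The key names (`header`, `data`, `platform_code`, `record_count`, `message_length`, `trigger_time`) are assumptions, since the internal protocol is not published, and `message_length` is taken here to mean the UTF-8 byte length of the serialized data part.

```python
import json


def build_alarm_message(platform_code: str, records: list, trigger_time: str) -> str:
    """Wrap alarm records in a header + data envelope (assumed field names)."""
    data_part = json.dumps(records, ensure_ascii=False, separators=(",", ":"))
    header = {
        "platform_code": platform_code,
        "record_count": len(records),
        "message_length": len(data_part.encode("utf-8")),  # assumed meaning
        "trigger_time": trigger_time,
    }
    return json.dumps({"header": header, "data": records}, ensure_ascii=False)


def parse_alarm_message(raw: str):
    """Parse an envelope and check the record count against the data part."""
    message = json.loads(raw)
    header, records = message["header"], message["data"]
    if header["record_count"] != len(records):
        raise ValueError("record_count does not match the data part")
    return header, records
```

Carrying the platform code and record count in the header lets the center side attribute and sanity-check each batch before any record is processed.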
Alarm messages from the distributed cloud platforms are all delivered to the message cluster of the center-side comprehensive alarm system, where they drive the stream processing layer above to perform analysis and processing.
The message cluster uses the open-source message middleware Kafka in a distributed cluster architecture.
2. Stream processing layer
Fig. 2 shows the technical architecture of the comprehensive alarm system of the distributed cloud platform. The stream processing layer is the core subsystem of the comprehensive alarm system; driven by the message stream of the message cluster, it implements the core alarm analysis and processing functions for the distributed cloud platform.
The stream processing layer is built on a Flink distributed cluster and supports high availability, high performance, and scalability.
The processing node (operator) modules of the stream processing layer logically comprise: access, standardization, object association, convergence, AI operator, automatic dispatch, notification, and persistence.
In actual operation, the framework and its configuration allow each processing node to run concurrently, and the parallelism of individual nodes can be adjusted according to the actual load.
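The idea of configuring parallelism per processing node can be shown with a plain thread pool. This is only an analogy for how a stream framework runs parallel operator instances; the stage names and the numbers in `PARALLELISM` are hypothetical, and a real deployment would set them through the framework's own configuration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical per-node parallelism; heavier stages get more workers.
PARALLELISM = {"standardize": 4, "converge": 2, "persist": 1}


def run_stage(name, func, items):
    """Apply one processing node to a batch with that node's configured parallelism."""
    workers = PARALLELISM.get(name, 1)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in input order, regardless of completion order.
        return list(pool.map(func, items))
```

Adjusting a single entry in `PARALLELISM` changes the concurrency of that stage only, which mirrors tuning one node according to its load without touching the others.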
3. Data layer
Because the comprehensive alarm system of the distributed cloud has multiple data sources and complex processing logic and must deliver high performance, its data layer adopts a multi-database system architecture.
The CMDB library is built on the open-source graph database ArangoDB and supports the association and matching of alarm object information.
The in-memory database uses a distributed Redis cluster and caches dictionary data and the status data of alarm objects.
The relational database uses an open-source MariaDB high-availability cluster, persists configuration data and alarm processing records, and provides data queries for upper-layer applications.
The time series database stores the original alarm records and provides data support for upper-layer application display.
4. Application layer
The application layer implements the core application scenarios on top of the data layer and mainly includes the following subsystems: alarm presentation, alarm analysis, rule configuration, and scenario monitoring.
Alarm presentation serves the O&M personnel of each specialty and provides alarm analysis views based on the alarm data.
Alarm analysis serves the O&M personnel of each specialty and provides fault location and root cause analysis.
Rule configuration serves configuration managers and provides front-end configuration of alarm rules.
Scenario monitoring serves key-assurance scenarios and provides scenario monitoring functions for key-assurance objects, such as an alarm monitoring dashboard, network topology, and application topology.
5. Center-side cluster deployment architecture
The center-side system of the invention is deployed on a Kubernetes container cluster and adopts a high-availability, load-balanced architecture, as follows:
1) the message middleware uses an open-source Kafka cluster architecture;
2) the database uses a MariaDB Galera cluster in a "one primary, two replicas" architecture;
3) the time series database uses an InfluxDB distributed cluster architecture;
4) stream processing uses a Flink distributed cluster architecture, deployed on Kubernetes from Docker images;
5) the core application logic components are deployed as multi-replica Services to achieve high availability and load balancing.
Although embodiments of the invention have been shown and described, a person of ordinary skill in the art will understand that various changes, modifications, substitutions, and variations can be made to these embodiments without departing from the principle and spirit of the invention; the scope of the invention is defined by the appended claims and their equivalents.
Claims (9)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211638775.2A CN115955388A (en) | 2022-12-20 | 2022-12-20 | A Distributed Cloud Comprehensive Alarm System |
Publications (1)
Publication Number | Publication Date |
---|---|
CN115955388A (en) | 2023-04-11 |
Family
ID=87282026
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211638775.2A Pending CN115955388A (en) | 2022-12-20 | 2022-12-20 | A Distributed Cloud Comprehensive Alarm System |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115955388A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20230411 |