CN117424797B - Real-time large concurrent alarm receiving and processing method - Google Patents

Real-time large concurrent alarm receiving and processing method

Info

Publication number
CN117424797B
Authority
CN
China
Prior art keywords
alarm
queue
processing
time
alarms
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311749561.7A
Other languages
Chinese (zh)
Other versions
CN117424797A (en)
Inventor
于进海
马栓祥
童玲
陈华玮
邓华兵
黄耿亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tisson Regaltec Communications Tech Co Ltd
Original Assignee
Tisson Regaltec Communications Tech Co Ltd

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/02 Standardisation; Integration
    • H04L41/0213 Standardised network management protocols, e.g. simple network management protocol [SNMP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0604 Management of faults, events, alarms or notifications using filtering, e.g. reduction of information by using priority, element types, position or time
    • H04L41/0622 Management of faults, events, alarms or notifications using filtering, e.g. reduction of information by using priority, element types, position or time based on time
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0604 Management of faults, events, alarms or notifications using filtering, e.g. reduction of information by using priority, element types, position or time
    • H04L41/0627 Management of faults, events, alarms or notifications using filtering, e.g. reduction of information by using priority, element types, position or time by acting on the notification or alarm source
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/50 Queue scheduling
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/50 Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Abstract

The invention relates to the technical field of network management, and in particular to a real-time large concurrent alarm receiving and processing method aimed mainly at SNMP Trap alarms. Alarm information is received in real time based on a distributed queue and is then sliced a first time by IP address to create an original alarm queue. The original alarm queue is parsed and standardized, and is then sliced a second time according to the IP of the alarm device and the alarm category to form a standardized alarm queue. Processing rules are applied to the standardized alarm queue, and a normalcy analysis is performed according to the degree of congestion during processing, so that alarm characteristics in the congestion queue can be captured. Based on these characteristics, the system generates a third slicing rule and matches the corresponding processing rule. The method reduces the use of traversal and resource locks, speeds up alarm processing, achieves flexible and orderly automatic processing during alarm storms, and prevents important alarms from being overlooked or answered late.

Description

Real-time large concurrent alarm receiving and processing method
Technical Field
The invention relates to the technical field of network management, in particular to a method for receiving and processing real-time large concurrent alarms.
Background
In modern information technology environments, systems and network devices frequently generate large amounts of alarm information. This information is critical to maintaining system security, performance, and reliability. However, as network environments grow more complex and data traffic increases, conventional alarm processing methods face many challenges, especially when processing large-scale concurrent alarms. Common problems include alarm overload, repeated alarms, processing delays, and difficulty in quickly identifying important alarms.
In existing alarm processing technology, a common method is to count alarm information with a counter. This approach relies primarily on counting the number of alarm events to provide a basic alarm management function for a system administrator or automation tool. While it is relatively effective for simple, single-source alarms, it has significant limitations with large-scale concurrent alarms. In particular, counter statistics alone make it difficult to handle and distinguish alarm information from different sources. In complex network environments, alarm information of different types and from different sources requires different processing policies and priorities. The counter approach does not provide enough detail to support such differentiated processing, so important alarms are overlooked or answered late.
Disclosure of Invention
In order to solve the problems, the invention provides a real-time large concurrency alarm receiving and processing method.
In order to achieve the above purpose, the invention adopts the following technical scheme:
a real-time large concurrency alarm receiving and processing method comprises the following steps:
receiving alarm information in real time based on the distributed queue;
performing a first slicing according to the IP of the alarm information to obtain an original alarm queue, performing parsing and standardization on the original alarm queue, and performing a second slicing according to the IP of the alarm device and the alarm category to obtain a standardized alarm queue;
implementing processing rules on the standardized alarm queue, and performing normalcy analysis according to the congestion degree during processing to obtain a congestion queue;
capturing alarm characteristics in the congestion queue, and generating a third slicing according to the alarm characteristics;
and matching the processing rule according to the third slicing.
Further, the processing rules comprise resource association analysis of alarms, repeated alarm compression shielding, analysis of root alarms and automatic notification dispatch of alarms.
Further, the real-time receiving of the alarm information based on the distributed queue includes:
deploying a plurality of receiving nodes, wherein each receiving node receives alarm information from a preset alarm information source; and the receiving nodes send the alarm information to a message queue in time order.
Further, the performing the first slicing according to the IP of the alarm information includes:
extracting IP address information from the alarm information;
distributing alarm information to a plurality of first sub-queues according to the extracted IP address, wherein each first sub-queue corresponds to a group of IP address ranges;
performing time stamp marking on the alarm information in each first sub-queue;
and applying a filtering rule to each first sub-queue according to the time stamp sequence to obtain an original alarm queue.
Further, the parsing and normalizing the original alarm queue includes:
deconstructing and separating elements from each piece of alarm information of the original queue, wherein the elements comprise an alarm generation time stamp, an alarm level identifier, an alarm equipment IP, an alarm category and an alarm content description;
and normalizing the deconstructed elements into the same data format to obtain alarm data.
Further, the performing the second slicing according to the alarm device IP and the alarm category includes:
classifying the alarm data into a plurality of second sub-queues based on the IP of the alarm device and the alarm class;
and grouping the alarm data according to the alarm category in each second sub-queue to obtain a standardized alarm queue.
Further, the processing rule is determined according to the alarm category.
Further, the normalcy analysis includes:
continuously monitoring alarm data in the standardized alarm queue, and identifying the crowded standardized alarm queue in the processing flow;
calculating average processing time and waiting time of each alarm category in a processing flow, and calculating normal congestion degree;
marking a second sub-queue with normal congestion degree higher than a preset value as a congestion queue;
and leaving unmarked any second sub-queue whose normal congestion degree is lower than or equal to the preset value.
Further, the capturing of alarm characteristics in the congestion queue includes:
identifying an alert behavior having repeatability and patterning;
extracting key attributes of alarm data, including IP of alarm information, alarm frequency and alarm mode;
and packaging the key attributes of the alarm data with the common key attributes to generate alarm characteristics.
Further, the generating of the third slicing according to the alarm characteristics includes:
in the second sub-queue, carrying out rule standardization on the alarm characteristics to generate characteristic rules;
and creating a third sub-queue with the characteristic rule as its screening condition, thereby generating the third slicing.
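As an illustration of the feature capture and third slicing steps listed above, the following is a minimal sketch only. The patent does not specify how repetitive behavior is detected, so the repeat threshold `min_repeats`, the field names `device_ip` and `category`, and the function names are all assumptions made for this example.

```python
from collections import Counter

def capture_features(congested_queue, min_repeats=3):
    """Find (device IP, category) pairs that repeat often enough to count as a pattern.

    The min_repeats threshold is a hypothetical parameter, not taken from the patent.
    """
    counts = Counter((a["device_ip"], a["category"]) for a in congested_queue)
    return [
        {"device_ip": ip, "category": cat, "frequency": n}
        for (ip, cat), n in counts.items()
        if n >= min_repeats
    ]

def third_slice(congested_queue, feature):
    """Use one feature rule as the screening condition for a third sub-queue."""
    matched, rest = [], []
    for alarm in congested_queue:
        hit = (alarm["device_ip"] == feature["device_ip"]
               and alarm["category"] == feature["category"])
        (matched if hit else rest).append(alarm)
    return matched, rest

queue = (
    [{"device_ip": "192.168.1.10", "category": "link"}] * 4
    + [{"device_ip": "192.168.1.11", "category": "power"}]
)
features = capture_features(queue)
third_sub_queue, remaining = third_slice(queue, features[0])
```

Alarms matching the feature rule can then be handled by the processing rule matched to the third sub-queue without traversing the rest of the second sub-queue.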
The invention has the following beneficial effects. First, the distributed-queue architecture receives alarm information in real time and processes it efficiently; because no standardization or format conversion is performed at the receiving stage, alarms are less likely to be lost during an alarm storm. Second, the alarm information is sliced a first time by IP address to create original alarm queues, which are then parsed and standardized so that alarm information from different sources can be managed effectively. This prevents the highly concurrent fault alarms of a regional failure from blocking important alarms from other regions. Third, a second slicing by alarm device IP and alarm category produces finer-grained standardized alarm queues, so that the large number of repeated alarms that a particular facility or a particular type of fault sends to the server can be processed quickly. Finally, congestion queues are identified, and features are extracted from the alarms in them to automatically generate a third slicing and match processing rules. This reduces traversal within the second slicing, speeds up processing, achieves flexible and orderly automatic processing during alarm storms, and prevents important alarms from being overlooked or answered late.
Drawings
FIG. 1 is a flow chart of the steps of a method for receiving and processing real-time large concurrent alarms in the present invention.
Fig. 2 is a flowchart illustrating the steps of performing the first slicing according to the IP of the alarm information in the present invention.
Fig. 3 is a step flow chart of step S4 in the present invention.
Detailed Description
Referring to fig. 1-3, the present invention relates to a method for receiving and processing real-time large concurrent alarms;
specifically, referring to fig. 1, the invention provides a method for receiving and processing real-time large concurrent alarms, which comprises the following steps:
S1, receiving alarm information in real time based on a distributed queue;
Step S1 comprises: deploying a plurality of receiving nodes, wherein each receiving node receives alarm information from a preset alarm information source; and the receiving nodes send the alarm information to a message queue in time order.
In some embodiments, the system first deploys a series of receiving nodes distributed over different network locations, each node being specifically responsible for receiving data from a particular preset alert information source. This distributed architecture enables alarm information to be collected at multiple points simultaneously, effectively dispersing processing load and increasing fault tolerance of the system. In the process of receiving the alarm information, each receiving node is also responsible for time sequence ordering of the received alarm data. This means that each piece of alarm information is ordered according to the generated time stamp, so that timeliness and logical continuity of the alarm information are ensured. For example, if one network device fails, resulting in the continuous generation of multiple pieces of alert information, the alert information will be captured and ordered by the corresponding node in the order in which they occur.
By the method, the alarm information can be captured and recorded in real time, the sequence and the integrity of the alarm data can be ensured, and a solid and reliable basis is provided for the subsequent alarm processing steps. The distributed queue-based method shows higher efficiency and accuracy in processing large-scale concurrent alarms, and can better cope with challenges in modern complex network environments compared with the traditional centralized processing method.
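The time-ordered hand-off from receiving nodes to a message queue can be sketched as follows. This is a single-process illustration under stated assumptions, not the patented implementation: the `RawAlarm` type, its field names, and the use of an in-memory list in place of a real distributed message queue are all hypothetical.

```python
import heapq
from dataclasses import dataclass, field

@dataclass(order=True)
class RawAlarm:
    timestamp: float                          # generation time, seconds since epoch
    source_ip: str = field(compare=False)     # excluded from ordering
    payload: str = field(compare=False)       # excluded from ordering

def node_output(received):
    """Each receiving node sorts what it received by generation timestamp."""
    return sorted(received)

def merge_into_queue(*node_streams):
    """Merge already-sorted per-node streams into one time-ordered queue."""
    return list(heapq.merge(*node_streams))

node_a = node_output([
    RawAlarm(3.0, "192.168.1.10", "linkDown"),
    RawAlarm(1.0, "192.168.1.10", "coldStart"),
])
node_b = node_output([RawAlarm(2.0, "192.168.2.7", "authenticationFailure")])
message_queue = merge_into_queue(node_a, node_b)
```

In a real deployment the merge would be replaced by whatever ordering guarantee the chosen message broker provides; the sketch only shows that per-node ordering is enough to produce a globally ordered stream.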
S2, performing a first slicing according to the IP of the alarm information to obtain an original alarm queue, performing parsing and standardization on the original alarm queue, and performing a second slicing according to the IP of the alarm device and the alarm category to obtain a standardized alarm queue; the step S2 comprises the following steps:
S21, extracting IP address information from the alarm information;
It should be noted that step S21 is a key link in the method: it extracts the IP address from the alarm information. In this step, the system analyzes each received piece of alarm information individually to identify and extract the IP address it contains. This is critical to subsequent processing because the IP address is the key identifier used to identify and classify the alarm information.
In some embodiments, the system first scans each piece of alert information and extracts the IP address contained therein. This operation involves parsing the data structure of the alert information to identify the network address portion therein. These extracted IP addresses are then used to initially classify the alert information. For example, if a particular network device fails, all alert information sent from that device will share the same IP address and thus can be categorized into the same category. Furthermore, the extraction of the IP address also facilitates subsequent analysis and response procedures, as it may indicate the source of the alert information, helping to determine the nature and urgency of the alert. By accurately extracting and utilizing the IP address information, step S21 effectively lays a foundation for further alarm processing procedures, ensuring that the alarm information can be correctly identified, classified and processed.
S22, distributing alarm information to a plurality of first sub-queues according to the extracted IP address, wherein each first sub-queue corresponds to a group of IP address ranges;
In step S22, the alarm information is classified according to the IP address extracted from it. The system distributes the alarm information to different first sub-queues according to the extracted IP address. Each first sub-queue corresponds to a specific set of IP address ranges, and this classification mechanism makes management and processing of alarm information more orderly and efficient.
In some embodiments, assuming that multiple network devices are distributed in different IP address segments, when the devices generate alarm information, the system will assign each alarm to a corresponding first sub-queue according to its IP address. In this way, alarms from the same IP address range are categorized into the same queue, facilitating subsequent processing and analysis. For example, all alarms from the 192.168.1.X range may be assigned to one sub-queue, while alarms from the 192.168.2.X range are assigned to another sub-queue. Such IP address-based alert information classification not only improves the efficiency of alert processing, but also facilitates rapid localization and response of alerts of specific origin, especially in the face of a large number of concurrent alerts. In addition, by classifying the alarm information into different queues, the system can more effectively manage the alarm load, avoid the overload problem of a single processing point, and further ensure the stability and reliability of the whole alarm processing system.
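The range-based distribution in step S22 can be sketched with Python's standard `ipaddress` module. The sub-queue names, the `/24` ranges (chosen to mirror the 192.168.1.X and 192.168.2.X example), and the `unmatched` fallback queue are assumptions for illustration.

```python
import ipaddress
from collections import defaultdict

# Hypothetical mapping of first sub-queues to IP address ranges.
SUBQUEUE_RANGES = {
    "subqueue-1": ipaddress.ip_network("192.168.1.0/24"),
    "subqueue-2": ipaddress.ip_network("192.168.2.0/24"),
}

def first_slice(alarms):
    """Distribute alarms to first sub-queues by the range their IP falls in."""
    queues = defaultdict(list)
    for alarm in alarms:
        addr = ipaddress.ip_address(alarm["ip"])
        name = next(
            (n for n, net in SUBQUEUE_RANGES.items() if addr in net),
            "unmatched",  # assumed fallback for IPs outside every range
        )
        queues[name].append(alarm)
    return dict(queues)

queues = first_slice([
    {"ip": "192.168.1.5"},
    {"ip": "192.168.2.9"},
    {"ip": "192.168.1.77"},
    {"ip": "10.0.0.1"},
])
```

Because each alarm is routed by a membership test on its own address, no shared counter or lock across sub-queues is needed at this stage.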
S23, performing time stamp marking on the alarm information in each first sub-queue;
In some embodiments, the system examines each piece of alarm information that enters a first sub-queue and assigns it a timestamp. The timestamp is typically the exact generation time of the alarm information, reflecting the moment at which the alarm event occurred. For example, if a network device fails at 10:00 a.m. and generates an alarm message, the message is stamped 10:00 a.m. In this way, the system records the exact time of each alarm event and processes the alarm information in chronological order. This is particularly important for time-sensitive alarms, such as network security events or equipment failures that require an immediate response. Timestamping helps ensure that alarm information is not overlooked or mishandled because of processing delays or misclassification. It also provides important data for subsequent alarm analysis, such as identifying alarm patterns, predicting potential problems, or optimizing alarm response strategies. In summary, step S23 improves the accuracy and efficiency of alarm processing through accurate timestamping.
S24, applying a filtering rule to each first sub-queue according to the time stamp sequence to obtain an original alarm queue;
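Steps S23 and S24 can be sketched together as follows. The patent does not say what the filtering rule is, so the `drop_heartbeats` predicate below is purely hypothetical, as are the `type` and `timestamp` field names; timestamps are assumed to be UTC.

```python
from datetime import datetime, timezone

def stamp(alarm, generated_at=None):
    """Attach a timestamp; fall back to receipt time when none is supplied."""
    ts = generated_at or datetime.now(timezone.utc)
    return {**alarm, "timestamp": ts}

def build_original_queue(sub_queue, keep):
    """Order a first sub-queue by timestamp, then apply a filtering rule."""
    ordered = sorted(sub_queue, key=lambda a: a["timestamp"])
    return [a for a in ordered if keep(a)]

def drop_heartbeats(alarm):
    """Hypothetical filtering rule: discard periodic heartbeat messages."""
    return alarm["type"] != "heartbeat"

t0 = datetime(2023, 12, 19, 10, 0, tzinfo=timezone.utc)
t1 = datetime(2023, 12, 19, 10, 1, tzinfo=timezone.utc)
t2 = datetime(2023, 12, 19, 10, 2, tzinfo=timezone.utc)
sub_queue = [
    stamp({"type": "linkDown"}, t2),
    stamp({"type": "heartbeat"}, t1),
    stamp({"type": "coldStart"}, t0),
]
original_queue = build_original_queue(sub_queue, drop_heartbeats)
```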
S25, deconstructing and separating elements of each piece of alarm information of the original queue, wherein the elements comprise an alarm generation time stamp, an alarm level identifier, an alarm equipment IP, an alarm category and an alarm content description;
It should be noted that step S25 parses each alarm message in the original alarm queue in depth. The system deconstructs each piece of alarm information, separating and extracting its key elements. The alarm generation timestamp is the exact point in time at which the alarm occurred and is critical to understanding its timeliness and urgency. The alarm level identifier gives the severity of the alarm, which may be severe, medium, or low, and guides subsequent processing priority. The alarm device IP is the network address of the alarm source and is needed to trace the alarm to a specific device and location. The alarm category describes the nature of the alarm, such as whether it indicates a system failure, a security threat, or a performance problem. Finally, the alarm content description provides detailed information about the alarm condition, such as a specific error code or fault description.
This deconstructing and element separation process allows the original, often differently formatted alert information to be converted into a more standardized and structured form. This transformation provides the necessary basis for subsequent alarm processing procedures such as alarm analysis, classification and response. By accurately extracting and processing these key elements, step S25 ensures that the alert information can be effectively understood and processed, improving the overall performance and response capabilities of the alert processing system.
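The element separation in step S25 can be sketched as below. The raw `key=value;...` wire format is an assumption made for the example (real SNMP Trap payloads are structured differently), and the `AlarmElements` type and its field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class AlarmElements:
    generated_at: str   # alarm generation timestamp, still in source format
    level: str          # alarm level identifier
    device_ip: str      # alarm device IP
    category: str       # alarm category
    description: str    # alarm content description

def deconstruct(raw: str) -> AlarmElements:
    """Split an assumed 'key=value;key=value' message into the five elements."""
    fields = dict(part.split("=", 1) for part in raw.split(";") if "=" in part)
    return AlarmElements(
        generated_at=fields.get("time", ""),
        level=fields.get("level", ""),
        device_ip=fields.get("ip", ""),
        category=fields.get("category", ""),
        description=fields.get("desc", ""),
    )

elements = deconstruct(
    "time=2023-12-19 10:00:00;level=major;ip=192.168.1.10;"
    "category=system_failure;desc=fan speed below threshold"
)
```

Missing elements are left as empty strings here so that later standardization can decide how to treat them.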
S26, standardizing the deconstructed elements into the same data format to obtain alarm data;
In particular, the key to standardization is to ensure that alarm information of different sources and types is consistent in data structure and expression. For example, the timestamp is converted to a standard date and time format, the alarm level is converted to a uniform level code, the IP address is represented in a uniform network address format, and the alarm category and content description are converted to text conforming to a predetermined template.
Through this normalization process, alert information that may otherwise be in a variety of formats is converted into consistent and comparable data, which is critical to subsequent data processing and analysis. The standardized data format not only makes the system easier to automatically process the alarm information, but also provides convenience for using various analysis tools, thereby enhancing the processing efficiency and accuracy of the alarm information.
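One way to sketch the standardization in step S26 is shown below. The accepted input time formats, the numeric level codes, the choice of ISO 8601 in UTC as the standard time format, and lower-casing as the category "template" are all assumptions for illustration.

```python
import ipaddress
from datetime import datetime, timezone

# Hypothetical uniform level codes: 1 is most severe.
LEVEL_CODES = {"critical": 1, "severe": 1, "major": 2, "medium": 2, "minor": 3, "low": 3}
# Hypothetical source time formats seen across devices.
TIME_FORMATS = ("%Y-%m-%d %H:%M:%S", "%d/%m/%Y %H:%M", "%Y%m%d%H%M%S")

def normalize_time(text):
    """Convert any accepted source format to ISO 8601, assuming UTC."""
    for fmt in TIME_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue
        return parsed.replace(tzinfo=timezone.utc).isoformat()
    raise ValueError(f"unrecognized timestamp: {text!r}")

def normalize(elements):
    """Bring the five deconstructed elements into one data format."""
    return {
        "generated_at": normalize_time(elements["generated_at"]),
        "level": LEVEL_CODES[elements["level"].strip().lower()],
        "device_ip": str(ipaddress.ip_address(elements["device_ip"].strip())),
        "category": elements["category"].strip().lower(),
        "description": " ".join(elements["description"].split()),
    }

alarm_data = normalize({
    "generated_at": "20231219100000",
    "level": " Major ",
    "device_ip": " 192.168.1.10 ",
    "category": "System_Failure",
    "description": "fan  speed\tlow",
})
```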
S27, classifying alarm data into a plurality of second sub-queues based on the IP of the alarm device and the alarm category;
In some embodiments, the system classifies the standardized alarm data according to the IP of the alarm device from which it originated, rather than merely by IP address range. This means that the system further subdivides alarm data from the same or similar IP address segments to reflect the specifics of different devices or network areas. For example, even if two alarms come from the same IP address range, the system assigns them to different second sub-queues if they originate from different devices or services. Subdividing by alarm device IP makes alarm management more accurate and effective. It helps distinguish alarms from different devices or services, so that alarm response and processing can be targeted at a particular device or network area. This classification also allows different priorities or processing strategies to be adopted for different types of devices or services, improving the flexibility and efficiency of alarm processing.
S28, grouping alarm data according to alarm categories in each second sub-queue to obtain a standardized alarm queue;
Specifically, in each second sub-queue, the alarm information is divided into groups according to its category. Alarm categories may include, but are not limited to, system failures, network security, performance issues, and configuration changes. For example, alarms of different categories from the same IP address segment, such as system fault alarms and security alarms, are placed in different groups. With such category-based grouping, the system can handle and respond to the various types of alarms more efficiently. Alarms of different categories require different processing priorities and response measures. For example, security-related alarms may require emergency processing, while performance-related alarms may only require periodic review. This classification also facilitates subsequent alarm analysis and decision support because it provides a clear view and management path for each category of alarms. System administrators and automated processing tools can identify critical alarms from these groups more quickly and formulate more accurate and efficient response strategies.
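Steps S27 and S28 can be sketched as a two-level grouping. This is one possible reading, assumed for illustration: each second sub-queue is keyed by alarm device IP, and alarms inside it are grouped by category. The field names are hypothetical.

```python
from collections import defaultdict

def second_slice(alarm_data):
    """Build second sub-queues per device IP, each grouped by alarm category."""
    sub_queues = defaultdict(lambda: defaultdict(list))
    for alarm in alarm_data:
        sub_queues[alarm["device_ip"]][alarm["category"]].append(alarm)
    # Convert to plain dicts so missing keys raise instead of silently appearing.
    return {ip: dict(groups) for ip, groups in sub_queues.items()}

standardized = second_slice([
    {"device_ip": "192.168.1.10", "category": "system_failure", "id": 1},
    {"device_ip": "192.168.1.10", "category": "security", "id": 2},
    {"device_ip": "192.168.1.11", "category": "system_failure", "id": 3},
    {"device_ip": "192.168.1.10", "category": "system_failure", "id": 4},
])
```

With this layout, a processing rule chosen by category can be applied to one group without scanning alarms of other categories or devices.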
S3, implementing a processing rule on the standardized alarm queue, and performing normalcy analysis according to the congestion degree during processing to obtain a congestion queue; the step S3 comprises the following steps:
S31, implementing a processing rule on the standardized alarm queue;
the processing rules comprise resource association analysis of alarms, repeated alarm compression shielding, analysis of root alarms and automatic notification dispatch of alarms; the processing rule is determined according to the alarm category;
in some embodiments, the system applies these processing rules to each piece of alarm information in the standardized alarm queue. The scope of processing rules is broad and includes, but is not limited to, resource association analysis of alarms, compression masking of duplicate alarms, in-depth analysis of root alarms, and automatic notification and dispatch of alarms. Each rule is designed for a particular alarm category to ensure the most efficient response and processing. The resource association analysis may help identify potential links between alarms, such as where multiple alarms may originate from the same underlying problem. The repeated alarm compression shielding is used for reducing redundant alarms and improving the processing efficiency. Root alert analysis aims at determining the original problem that caused a series of alerts, while automatic notification and dispatch ensures that the relevant personnel respond to the alert event in time. The implementation of these processing rules is critical to maintaining the stability and security of the network. They not only increase the efficiency of alarm handling, but also enhance the ability to quickly respond to critical problems. By determining specific processing rules according to the alarm categories, step S31 ensures that the alarm information is processed in the most appropriate manner, thereby greatly improving the performance and reliability of the overall alarm management system.
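Selecting processing rules by alarm category, as in step S31, can be sketched with a small rule registry. Only duplicate compression is implemented in any detail; the `notify` function is a placeholder, and the category names, the rule assignments, and the definition of "duplicate" (same device IP, category, and description) are assumptions.

```python
def compress_duplicates(alarms):
    """Collapse alarms sharing device IP, category, and description into one entry."""
    merged = {}
    for alarm in alarms:
        key = (alarm["device_ip"], alarm["category"], alarm["description"])
        if key in merged:
            merged[key]["count"] += 1
        else:
            merged[key] = {**alarm, "count": 1}
    return list(merged.values())

def notify(alarms):
    """Placeholder for automatic notification and dispatch."""
    return [{**a, "dispatched": True} for a in alarms]

# Hypothetical assignment of rules to categories.
RULES_BY_CATEGORY = {
    "security": [notify],
    "performance": [compress_duplicates],
}
DEFAULT_RULES = [compress_duplicates, notify]

def apply_rules(category, alarms):
    """Run the rule chain registered for this category, in order."""
    for rule in RULES_BY_CATEGORY.get(category, DEFAULT_RULES):
        alarms = rule(alarms)
    return alarms

perf = [
    {"device_ip": "192.168.1.10", "category": "performance", "description": "cpu high"},
    {"device_ip": "192.168.1.10", "category": "performance", "description": "cpu high"},
    {"device_ip": "192.168.1.10", "category": "performance", "description": "mem high"},
]
processed = apply_rules("performance", perf)
```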
S32, continuously monitoring alarm data in the standardized alarm queue, and identifying the crowded standardized alarm queue in the processing flow;
It should be noted that the main purpose of this step is to identify standardized alarm queues in which congestion occurs during processing. The system continuously examines the processing state of each standardized alarm queue so that processing bottlenecks or congestion can be discovered and dealt with in time.
The action of continuous monitoring includes tracking the processing progress of each alarm queue, checking the backlog of alarm information and processing speed. The system focuses on queues where the number of alarms increases rapidly or where the processing speed is significantly slower. For example, if a particular alarm category suddenly experiences a large alarm backlog, the system may mark it as a potentially crowded point. This monitoring is dynamic, meaning that the system will continually adjust its observations and analysis based on real-time data.
S33, calculating average processing time and waiting time of each alarm category in the processing flow, and calculating normal congestion degree;
in some embodiments, the system will calculate the average time it spends in the process for each alarm category separately. This includes the entire period from alert generation to final processing. Statistics of processing time may reveal that processing efficiency is low for certain alert categories, while statistics of latency may help identify the length of time that alert information remains in the queue. For example, if a class of alarms is found to have significantly longer latency in the system than other classes, this may indicate that there is a bottleneck to the alarm handling for that class. Based on these statistics, the system then calculates the normal congestion level for each alert category. The normal congestion level is an important performance index, and reflects the processing conditions of various alarms of the alarm processing system under the normal running condition. A high degree of congestion may be indicative of efficiency problems or resource maldistribution in the process flow.
By performing these statistics and calculations, step S33 can not only provide insight into the performance of the alarm handling, but can also provide important data support for system administrators or automated decision engines. Based on these analysis results, corresponding measures can be taken to optimize the process flow, such as adjusting resource allocation, improving processing algorithms, or re-prioritizing alarm processing, to increase the response speed and processing efficiency of the overall system.
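The patent does not give a formula for the normal congestion degree. As a sketch only, the example below assumes it is the ratio of average waiting time to average processing time per alarm category, so that a value well above 1 means alarms spend much longer queued than being handled. The function names and the sample data are hypothetical.

```python
from statistics import mean

def normal_congestion_degree(samples):
    """samples: list of (waiting_seconds, processing_seconds) for one category.

    Assumed definition: average waiting time divided by average processing time.
    """
    avg_wait = mean(w for w, _ in samples)
    avg_proc = mean(p for _, p in samples)
    return avg_wait / avg_proc if avg_proc else float("inf")

def congestion_by_category(timings):
    """Compute the assumed congestion degree for every alarm category."""
    return {cat: normal_congestion_degree(s) for cat, s in timings.items()}

degrees = congestion_by_category({
    "security": [(4.0, 2.0), (8.0, 2.0)],      # avg wait 6 s, avg processing 2 s
    "performance": [(1.0, 2.0), (1.0, 2.0)],   # avg wait 1 s, avg processing 2 s
})
```

Other definitions, such as backlog length divided by processing rate, would fit the description equally well; the ratio is used here only because it needs nothing beyond the two averages the step already computes.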
S34, marking a second sub-queue with the normal congestion degree higher than a preset value as a congestion queue;
Specifically, the system first evaluates the alarm processing status of each second sub-queue according to the normal congestion degree calculated in step S33. The normal congestion degree is a key index of alarm processing efficiency and load, reflecting the backlog and processing speed of the alarms in the queue. Queues whose normal congestion degree exceeds the preset threshold are identified as congestion queues: their alarms may not be processed in time, so they require priority attention.
For example, if a queue for a particular alert class continues to exhibit a higher than normal level of congestion, this may mean that the number of alerts exceeds the current processing capacity, or that the alert processing efficiency for that class is poor. In this case, the queue will be marked as a crowded queue so that a system administrator or automated processing mechanism can take appropriate action.
With this tagging mechanism, step S34 provides important guidance for further optimization of the alarm processing system. The system can manage and optimize the alarm processing flow more pertinently, particularly under the condition of facing a large number of concurrent alarms, thereby ensuring that the alarm information is processed timely and effectively and improving the response capability and stability of the whole system.
S35, leaving unmarked any second sub-queue whose normal congestion degree is lower than or equal to the preset value;
It should be noted that the purpose of this step is to leave undisturbed the alarm processing flows that are already in a good state. For queues whose normal congestion degree is at or below the preset value, alarm processing is efficient, alarm information is not backlogged, and the processing speed meets expectations. In this case, the system determines that the current resource allocation and processing policy are sufficient to handle the alarm load of these queues, so no additional marking or adjustment is required.
For example, if the processing time and waiting time of the queue for a particular alarm category are within acceptable ranges, the processing capacity of that queue matches its alarm load. The system can therefore leave such queues unchanged and take no additional management steps for them. This avoids unnecessary intervention, makes efficient use of system resources, and lets queues that are currently running well keep their processing performance.
In general, step S35 maintains the overall efficiency and balance of the alarm processing system by adopting a "no intervention" strategy for alarm queues that are performing well. This strategy lets the system focus on the queues that actually need optimization and tuning while preserving the stability and efficiency of the others.
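Steps S34 and S35 together amount to a threshold test on each second sub-queue, which might be sketched as below. The queue identifiers, the congestion values, and the preset value of 0.5 are illustrative assumptions; the text only requires that a queue be marked when its normal congestion degree is strictly higher than the preset value and left unmarked otherwise.

```python
def mark_congested(congestion_by_queue, threshold):
    """Return {queue_id: True if marked as a congestion queue, else False}."""
    marked = {}
    for queue_id, congestion in congestion_by_queue.items():
        # S34: strictly higher than the preset value -> congestion queue.
        # S35: lower than or equal to the preset value -> left unmarked.
        marked[queue_id] = congestion > threshold
    return marked


# Hypothetical second sub-queues keyed by "IP range:alarm category".
levels = {
    "10.0.1.0/24:link_down": 0.75,
    "10.0.2.0/24:cpu_high": 0.25,
    "10.0.3.0/24:disk_full": 0.5,
}
flags = mark_congested(levels, threshold=0.5)
congested = [queue_id for queue_id, is_marked in flags.items() if is_marked]
```

Only the marked queues are passed on to step S4; the queue that sits exactly at the preset value stays unmarked, as S35 specifies.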
S4, capturing and acquiring alarm features in the congestion queue, and generating a third sharding according to the alarm features; referring to Fig. 3, step S4 includes:
S41, identifying alarm behaviors that are repetitive and patterned;
Specifically, the task of S41 is to analyze the alarm data in the congestion queue for repetitive patterns or behavioral trends. Such patterns may point to a common source or problem, such as a particular type of equipment failure, a network security threat, or a system configuration error. To perform this task, the system analyzes the alarm content, frequency, and other relevant attributes to determine whether a particular pattern or trend exists. For example, if a certain type of alarm occurs frequently within a short period of time, this may suggest a larger system problem. Likewise, if multiple alarms come from the same device or network segment, this may indicate a problem specific to that area.
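One possible way to detect the repetitive behavior described for S41 is a sliding time window per alarm source and type. The sketch below is an assumption about how this could be done, not the method fixed by the text; the function name `find_repetitive`, the 60-second window, and the minimum count of 3 are hypothetical.

```python
from collections import defaultdict, deque


def find_repetitive(alarms, window_seconds, min_count):
    """alarms: iterable of (timestamp, device_ip, alarm_type), sorted by timestamp.

    Returns the set of (device_ip, alarm_type) pairs that occurred at least
    min_count times within any window of window_seconds.
    """
    recent = defaultdict(deque)
    repetitive = set()
    for ts, ip, kind in alarms:
        window = recent[(ip, kind)]
        window.append(ts)
        # Drop timestamps that have fallen out of the sliding window.
        while window and ts - window[0] > window_seconds:
            window.popleft()
        if len(window) >= min_count:
            repetitive.add((ip, kind))
    return repetitive


alarms = [
    (0, "10.0.1.5", "link_down"),
    (10, "10.0.1.5", "link_down"),
    (20, "10.0.1.5", "link_down"),
    (25, "10.0.2.9", "cpu_high"),
    (400, "10.0.2.9", "cpu_high"),
]
repeats = find_repetitive(alarms, window_seconds=60, min_count=3)
```

Here the three `link_down` alarms from one device within 20 seconds are flagged as repetitive, while the two widely spaced `cpu_high` alarms are not.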
S42, extracting key attributes of alarm data, including IP of alarm information, alarm frequency and alarm mode;
Specifically, the key attributes are as follows. The IP address of the alarm information identifies the source of the alarm, which may be a particular server, network device, or other IT resource; by analyzing the IP address, the system can trace the specific source of the alarm and thereby locate the problem and its solution more accurately. The alarm frequency is how often alarms occur; high-frequency alarms may indicate a persistent or worsening problem, while low-frequency alarms may point to sporadic or intermittent events. The alarm mode covers the temporal pattern in which alarms occur (e.g., concentration in specific time periods), the type pattern (e.g., frequent occurrence of alarms of a specific category), and the like; these patterns reflect potential problem trends such as system vulnerabilities, configuration problems, or security threats.
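The three key attributes of S42 could be extracted roughly as follows. The field names (`ip`, `category`, `timestamp`), the per-minute frequency unit, and the use of the peak hour and dominant category as a stand-in for the alarm mode are all assumptions made for illustration.

```python
from collections import Counter
from datetime import datetime, timezone


def extract_key_attributes(alarms):
    """alarms: list of dicts with 'ip', 'category', 'timestamp' (epoch seconds)."""
    by_ip = {}
    for alarm in alarms:
        by_ip.setdefault(alarm["ip"], []).append(alarm)

    attributes = {}
    for ip, group in by_ip.items():
        times = [a["timestamp"] for a in group]
        # Use at least one minute so a single alarm does not divide by zero.
        span_minutes = max((max(times) - min(times)) / 60.0, 1.0)
        hours = Counter(
            datetime.fromtimestamp(t, tz=timezone.utc).hour for t in times
        )
        categories = Counter(a["category"] for a in group)
        attributes[ip] = {
            "frequency_per_minute": len(group) / span_minutes,
            "peak_hour_utc": hours.most_common(1)[0][0],         # temporal pattern
            "dominant_category": categories.most_common(1)[0][0],  # type pattern
        }
    return attributes


alarms = [
    {"ip": "10.0.1.5", "category": "link_down", "timestamp": 0},
    {"ip": "10.0.1.5", "category": "link_down", "timestamp": 60},
    {"ip": "10.0.1.5", "category": "cpu_high", "timestamp": 120},
]
attrs = extract_key_attributes(alarms)
```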
S43, packaging the key attributes of the alarm data with the common key attributes to generate alarm features;
Specifically, the alarm data is analyzed in detail to find and extract key attributes that recur across different alarm instances, such as a specific IP address, an abnormal access frequency, or an alarm mode of a certain class. For example, if a series of alarms from the same network area exhibit similar abnormal patterns, or multiple alarms all point to the same system failure, these are treated as common key attributes. Next, alarms with the same key attributes are grouped together to form a unified set of alarm features. This is similar to aggregating data points with similar characteristics into one data packet for subsequent analysis and processing. For example, all alarms pointing to the failure of a particular server may be packaged into one feature set so that problems associated with that server can be identified and handled more quickly.
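The packaging in S43 is essentially a group-by on the common key attributes. The sketch below assumes, for illustration only, that the common attributes are the /24 network of the source IP and the alarm category, and that a group needs at least two alarms to count as a feature; `package_features` and the field names are hypothetical.

```python
import ipaddress
from collections import defaultdict


def package_features(alarms, prefix_len=24, min_size=2):
    """Group alarm ids by (network, category); keep groups with shared attributes."""
    groups = defaultdict(list)
    for alarm in alarms:
        # strict=False lets a host address be mapped to its enclosing network.
        network = ipaddress.ip_network(f"{alarm['ip']}/{prefix_len}", strict=False)
        groups[(str(network), alarm["category"])].append(alarm["id"])
    return {key: ids for key, ids in groups.items() if len(ids) >= min_size}


alarms = [
    {"id": 1, "ip": "10.0.1.5", "category": "link_down"},
    {"id": 2, "ip": "10.0.1.77", "category": "link_down"},
    {"id": 3, "ip": "10.0.2.9", "category": "cpu_high"},
]
features = package_features(alarms)
```

Alarms 1 and 2 share a network area and a category, so they are packaged into one feature; alarm 3 has no counterpart and is left out.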
S44, in the second sub-queue, carrying out rule standardization on the alarm characteristics to generate characteristic rules;
Specifically, this step analyzes and converts the set of alarm features generated in step S43 into a set of standardized processing rules. These rules define how certain types of alarms are handled, based on the commonalities and patterns of the alarm features. For example, if a set of alarm features indicates that a certain class of alarms frequently points to the same network security problem, the system creates a specific processing rule based on this pattern, such as automatically forwarding the alarms to a security analysis team or triggering a specific security response protocol.
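A feature rule as described for S44 can be represented as a small record with a matching predicate. The following is a hedged sketch: the `FeatureRule` class, the `(network, category)` shape of a feature, and the action names such as `forward_to_security_team` are assumptions introduced for illustration.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureRule:
    network: str    # e.g. "10.0.1.0/24"
    category: str   # alarm category the rule applies to
    action: str     # hypothetical name of the handling to trigger

    def matches(self, alarm):
        in_network = ipaddress.ip_address(alarm["ip"]) in ipaddress.ip_network(
            self.network
        )
        return in_network and alarm["category"] == self.category


def rules_from_features(features, actions, default_action="manual_review"):
    """Turn (network, category) features into standardized rules."""
    return [
        FeatureRule(network, category, actions.get(category, default_action))
        for network, category in sorted(features)
    ]


rules = rules_from_features(
    {("10.0.1.0/24", "intrusion")},
    actions={"intrusion": "forward_to_security_team"},
)
```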
S45, creating a third sub-queue and generating the third sharding by using the feature rule as a screening condition;
In some embodiments, the third sub-queue is created first. This new queue is dedicated to alarms that satisfy particular feature rules, which were established in the previous steps from the patterns and trends of the alarm data. Once the third sub-queue is created, the system applies the feature rules as screening conditions to further screen and classify the alarm data that has passed through the second sub-queue. For example, if a feature rule targets a particular type of network attack, all alarms that match this pattern are automatically screened into the third sub-queue. In this way, the system can process alarms with common features together, improving processing efficiency and accuracy. By creating and using the third sub-queue, step S45 refines the alarm processing flow into a more focused and efficient sub-process. This not only reduces processing time and increases response speed, but also lets the system locate and resolve specific problems more accurately, improving the performance and reliability of the overall alarm processing system.
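The screening of S45 might look like the sketch below, in which each feature rule is modeled as a plain predicate. Whether matched alarms are moved out of the second sub-queue or merely copied is not stated in the text; this sketch assumes they are moved, and the rule name `ssh_bruteforce` is hypothetical.

```python
from collections import deque


def build_third_sub_queues(second_sub_queue, rules):
    """rules: {rule_name: predicate(alarm) -> bool}.

    Returns (third_queues, remaining): one third sub-queue per rule, plus the
    alarms that matched no rule.
    """
    third = {name: deque() for name in rules}
    remaining = deque()
    for alarm in second_sub_queue:
        for name, predicate in rules.items():
            if predicate(alarm):
                third[name].append(alarm)   # first matching rule wins
                break
        else:
            remaining.append(alarm)
    return third, remaining


rules = {
    "ssh_bruteforce": lambda a: a["category"] == "intrusion"
    and a["ip"].startswith("10.0.1."),
}
queue = [
    {"id": 1, "ip": "10.0.1.5", "category": "intrusion"},
    {"id": 2, "ip": "10.0.2.9", "category": "cpu_high"},
    {"id": 3, "ip": "10.0.1.8", "category": "intrusion"},
]
third, remaining = build_third_sub_queues(queue, rules)
```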
S5, matching processing rules according to the third sharding;
Specifically, the alarms in each third shard are evaluated against a series of predefined processing rules. These processing rules may include resource association analysis of alarms, compression and shielding of repeated alarms, analysis of root alarms, and automatic notification and dispatch of alarms. For example, if the alarms in a third shard mainly relate to network security, the system may automatically forward those alarms to a network security team or trigger certain security response measures. Likewise, if the screened alarms match a known system failure mode, the corresponding processing rules may include automatically executing a troubleshooting program or notifying the relevant technical support team. By precisely matching alarms with the appropriate processing rules, step S5 ensures that each alarm is processed quickly and efficiently. This speeds up the overall response to alarms, improves processing accuracy, and reduces the likelihood of errors or delays.
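The rule matching of S5 can be sketched as a lookup from alarm category to a processing function, in line with claim 5, which states that the processing rules are determined according to the alarm category. The handlers below are simplified stand-ins: `compress_repeats` illustrates repeated alarm compression and `notify_security` illustrates automatic dispatch. All names and the mapping itself are assumptions.

```python
from collections import Counter


def compress_repeats(shard):
    """Collapse alarms with the same (ip, category) into one record with a count."""
    counts = Counter((a["ip"], a["category"]) for a in shard)
    return [
        {"ip": ip, "category": category, "count": n}
        for (ip, category), n in counts.items()
    ]


def notify_security(shard):
    """Stand-in for automatic dispatch: one notification string per alarm."""
    return [f"security-team <- {a['ip']} {a['category']}" for a in shard]


def passthrough(shard):
    """Fallback when no processing rule is registered for the category."""
    return list(shard)


# Hypothetical mapping from alarm category to processing rule.
PROCESSING_RULES = {
    "link_down": compress_repeats,
    "intrusion": notify_security,
}


def process_shard(category, shard):
    rule = PROCESSING_RULES.get(category, passthrough)
    return rule(shard)


shard = [
    {"ip": "10.0.1.5", "category": "link_down"},
    {"ip": "10.0.1.5", "category": "link_down"},
    {"ip": "10.0.1.7", "category": "link_down"},
]
compressed = process_shard("link_down", shard)
```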
The above embodiments are merely illustrative of the preferred embodiments of the present invention and are not intended to limit the scope of the present invention, and various modifications and improvements made by those skilled in the art to the technical solution of the present invention should fall within the scope of protection defined by the claims of the present invention without departing from the spirit of the design of the present invention.

Claims (5)

1. A method for receiving and processing real-time large concurrent alarms, comprising:
receiving alarm information in real time based on the distributed queue;
performing first sharding according to the IP of the alarm information to obtain an original alarm queue, performing parsing and normalization on the original alarm queue, and performing second sharding according to the alarm device IP and the alarm category to obtain a standardized alarm queue;
applying processing rules to the standardized alarm queue, and performing normalcy analysis according to the congestion degree after processing to obtain a congestion queue;
capturing and acquiring alarm features in the congestion queue, and generating a third sharding according to the alarm features;
matching processing rules according to the third sharding;
the first sharding according to the IP of the alarm information comprises:
extracting IP address information from the alarm information;
distributing the alarm information to a plurality of first sub-queues according to the extracted IP address, wherein each first sub-queue corresponds to a range of IP addresses;
performing time stamp marking on the alarm information in each first sub-queue;
applying a filtering rule to each first sub-queue according to the time stamp sequence to obtain an original alarm queue;
the second sharding according to the alarm device IP and the alarm category comprises:
classifying the alarm data into a plurality of second sub-queues based on the IP of the alarm device and the alarm class;
grouping the alarm data according to the alarm category in each second sub-queue to obtain a standardized alarm queue;
the normalcy analysis comprises:
continuously monitoring alarm data in the standardized alarm queue, and identifying congested standardized alarm queues in the processing flow;
calculating the average processing time and waiting time of each alarm category in the processing flow, and calculating the normal congestion degree;
marking a second sub-queue whose normal congestion degree is higher than a preset value as a congestion queue;
leaving unmarked any second sub-queue whose normal congestion degree is lower than or equal to the preset value;
the capturing and acquiring the alarm features in the congestion queue comprises:
identifying an alert behavior having repeatability and patterning;
extracting key attributes of alarm data, including IP of alarm information, alarm frequency and alarm mode;
packaging the key attributes of the alarm data with the common key attributes to generate alarm characteristics;
the generating of the third sharding according to the alarm features comprises:
in the second sub-queue, carrying out rule standardization on the alarm features to generate feature rules;
and creating a third sub-queue and using the feature rules as a screening condition to generate the third sharding.
2. The method for receiving and processing real-time large concurrent alarms according to claim 1, wherein the processing rules include resource association analysis of alarms, repeated alarm compression shielding, analysis of root alarms and automatic notification dispatch of alarms.
3. The method for receiving and processing real-time large concurrent alarms according to claim 1, wherein said receiving alarm information in real time based on the distributed queue comprises:
deploying a plurality of receiving nodes, wherein each receiving node receives alarm information from a preset alarm information source; and each receiving node sends the alarm information to a message queue in chronological order.
4. The method for receiving and processing real-time large concurrent alarms according to claim 1, wherein the parsing and normalizing the original alarm queue comprises:
deconstructing and separating elements from each piece of alarm information of the original queue, wherein the elements comprise an alarm generation time stamp, an alarm level identifier, an alarm equipment IP, an alarm category and an alarm content description;
and normalizing the deconstructed elements into the same data format to obtain alarm data.
5. A method of receiving and processing real-time large concurrent alarms according to claim 2, characterized in that the processing rules are determined according to the alarm category.
CN202311749561.7A 2023-12-19 2023-12-19 Real-time large concurrent alarm receiving and processing method Active CN117424797B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311749561.7A CN117424797B (en) 2023-12-19 2023-12-19 Real-time large concurrent alarm receiving and processing method

Publications (2)

Publication Number Publication Date
CN117424797A CN117424797A (en) 2024-01-19
CN117424797B true CN117424797B (en) 2024-03-01

Family

ID=89523412

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311749561.7A Active CN117424797B (en) 2023-12-19 2023-12-19 Real-time large concurrent alarm receiving and processing method

Country Status (1)

Country Link
CN (1) CN117424797B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105450445A (en) * 2015-11-17 2016-03-30 武汉日电光通信工业有限公司 High-performance alarm processing system under large capacity packet transmission system and method thereof
CN109218097A (en) * 2018-09-19 2019-01-15 山东浪潮云投信息科技有限公司 A kind of warning system and alarm method of cloud platform configurable alert rule
CN112671560A (en) * 2020-12-11 2021-04-16 广东电力通信科技有限公司 High-availability distributed real-time alarm processing method and system
CN113448812A (en) * 2021-07-15 2021-09-28 中国银行股份有限公司 Monitoring alarm method and device under micro-service scene
CN113724100A (en) * 2021-08-27 2021-11-30 广东电网有限责任公司 Power grid monitoring alarm message processing method of distributed cluster
CN114553682A (en) * 2022-02-25 2022-05-27 中国平安人寿保险股份有限公司 Real-time alarm method, system, computer equipment and storage medium

Also Published As

Publication number Publication date
CN117424797A (en) 2024-01-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant