CN116991661A - Problem alarm system and method for software system - Google Patents

Publication number
CN116991661A
Authority
CN
China
Prior art keywords
log data
data
information
alarm
cluster
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310895691.5A
Other languages
Chinese (zh)
Inventor
刘华
于泳洋
刘晓明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhiketong Technology Co ltd
Original Assignee
Beijing Zhiketong Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhiketong Technology Co ltd
Priority to CN202310895691.5A
Publication of CN116991661A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/302 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/32 Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/324 Display of status information
    • G06F11/327 Alarm or error message display

Abstract

The embodiment of the invention discloses a problem alarm system and method for a software system. Log data is collected through Filebeat installed at a client; the log data is distributed to a Storm data analysis cluster through a first Kafka data distribution cluster; the Storm data analysis cluster performs stream computing on the received log data to obtain processed log data; the processed log data is distributed through a second Kafka data distribution cluster to a document-type storage engine for storage; the stored processed log data is processed graphically to judge whether it contains abnormal values, and if so, system problem information is acquired based on the abnormal values and the log data; and alarm prompt information is sent based on the system problem information. The problem alarm method solves the problem that the prior art cannot quickly discover, locate, and resolve faults that occur while a software system is running.

Description

Problem alarm system and method for software system
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a problem alarm system and method for a software system, an electronic device, and a storage medium.
Background
A series of problems can occur while a software system runs online, and huge business losses can result if those problems are not detected in time. The monitoring services of some existing software systems have a single function: they mostly monitor the hardware side of the system, such as CPU, memory, and network, and do not provide combined capabilities such as interface performance monitoring, abnormality monitoring, alarming, and log tracking. As a result, problems cannot be quickly discovered and located, which makes them hard to resolve quickly.
There is a need for a software business monitoring method that can quickly discover and locate problems, thereby facilitating quick solutions to the problems.
Disclosure of Invention
An object of the embodiments of the present invention is to provide a problem alarm system and method for a software system, an electronic device, and a storage medium, so as to solve the problem in the prior art that faults occurring while a software system is running cannot be quickly discovered, located, and resolved.
In order to achieve the above object, an embodiment of the present invention provides a method for alarming a problem in a software system, the method specifically includes:
collecting log data through Filebeat installed at a client;
the log data are transmitted to a first Kafka data distribution cluster, and the log data are distributed to a Storm data analysis cluster through the first Kafka data distribution cluster;
the Storm data analysis cluster performs stream computation processing on the received log data to obtain processed log data;
transmitting the processed log data to a second Kafka data distribution cluster, and distributing the processed log data to a document type storage engine for data storage through the second Kafka data distribution cluster;
carrying out graphical processing on the stored processed log data, judging whether the processed log data has an abnormal value, and if so, acquiring system problem information based on the abnormal value and the log data;
and sending out alarm prompt information based on the system problem information.
Based on the technical scheme, the invention can also be improved as follows:
further, the collecting of log data through Filebeat installed at the client includes:
acquiring user information, and writing the user information and the log collection parameters configured by the user into the Filebeat default configuration file;
and verifying the user information when Filebeat is installed on the client.
Further, the collecting of log data through Filebeat installed at the client further includes:
after Filebeat starts successfully, interacting with the first Kafka data distribution cluster while carrying the user information, so as to transmit data.
Further, the collecting of log data through Filebeat installed at the client further includes:
classifying the log data based on the application scenario, wherein the log data comprises application log data and performance log data;
recording service application information through the application log data, and monitoring service abnormality based on the service application information;
and recording interface access performance information through the performance log data, and monitoring system abnormality based on the performance information.
Further, the graphically processing the stored processed log data, judging whether the processed log data has an abnormal value, and if so, acquiring system problem information based on the abnormal value and the log data further includes:
determining the abnormality code corresponding to each abnormality type, and monitoring the abnormal condition corresponding to each abnormality code, so as to determine the abnormality type corresponding to the system problem.
Further, the sending the alarm prompt information based on the system problem information includes:
configuring alarm rules;
the alarm rules include: starting an alarm when the current per-minute request volume is greater than a first preset value;
starting an early warning when the current system abnormality rate is greater than a second preset value;
starting an early warning when the current business abnormality rate is greater than a third preset value;
starting an early warning when the current average execution time is greater than a fourth preset value;
starting an alarm when the current response time is greater than a fifth preset value;
starting an alarm when the average growth rate of the current per-minute request volume is greater than a sixth preset value;
starting an alarm when the cycle-to-cycle growth rate of the current response time is greater than a seventh preset value;
and starting an alarm when the period-over-period growth rate of the current per-minute request volume is greater than an eighth preset value.
Further, the sending of the alarm prompt information based on the system problem information further includes:
configuring a sending channel of alarm prompt information, wherein the sending channel comprises a short message prompt, a mail prompt and a WeChat prompt;
the alarm prompt information comprises alarm product line information, alarm application name information, alarm method information, alarm value information, alarm description information and trigger time information.
A problem alert system for a software system, comprising:
a Filebeat module, installed at the client and used for collecting log data;
a first Kafka data distribution cluster for distributing the log data to a Storm data analysis cluster;
the Storm data analysis cluster is used for carrying out stream computation processing on the received log data to obtain processed log data;
the second Kafka data distribution cluster is used for distributing the processed log data to a document type storage engine for data storage;
the abnormal value acquisition module is used for carrying out graphic processing on the stored processed log data, judging whether the processed log data has abnormal values or not, and if so, acquiring system problem information based on the abnormal values and the log data;
and the alarm prompt module is used for sending alarm prompt information based on the system problem information.
An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method when the computer program is executed.
A non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method.
The embodiment of the invention has the following advantages:
according to the problem alarm method of the software system, log data is collected through Filebeat installed at a client; the log data is transmitted to a first Kafka data distribution cluster, and is distributed to a Storm data analysis cluster through the first Kafka data distribution cluster; the Storm data analysis cluster performs stream computing on the received log data to obtain processed log data; the processed log data is transmitted to a second Kafka data distribution cluster, and is distributed through the second Kafka data distribution cluster to a document-type storage engine for storage; the stored processed log data is processed graphically to judge whether it has an abnormal value, and if so, system problem information is acquired based on the abnormal value and the log data; and alarm prompt information is sent based on the system problem information, thereby solving the problem in the prior art that faults occurring while a software system is running cannot be quickly discovered, located, and resolved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It will be apparent to those skilled in the art from this disclosure that the drawings described below are merely exemplary and that other embodiments may be derived from the drawings provided without undue effort.
The structures, proportions, sizes, etc. shown in the present specification are shown only for the purposes of illustration and description, and are not intended to limit the scope of the invention, which is defined by the claims, so that any structural modifications, changes in proportions, or adjustments of sizes, which do not affect the efficacy or the achievement of the present invention, should fall within the scope of the invention.
FIG. 1 is a flow chart of a problem alert method of a software system of the present invention;
FIG. 2 is a first architecture diagram of a problem alert system of the software system of the present invention;
FIG. 3 is a performance comparison graph of the software system of the present invention;
FIG. 4 is a flow monitoring diagram of a software system of the present invention;
FIG. 5 is an anomaly monitoring graph of the software system of the present invention;
FIG. 6 is a subdivision anomaly monitoring graph of the software system of the present invention;
FIG. 7 is a schematic diagram of the physical structure of an electronic device according to the present invention.
Wherein the reference numerals are as follows:
a Filebeat module 10, a first Kafka data distribution cluster 20, a Storm data analysis cluster 30, a second Kafka data distribution cluster 40, an abnormal value acquisition module 50, an alarm prompt module 60, an electronic device 70, a processor 701, a memory 702, and a bus 703.
Detailed Description
Other advantages and effects of the present invention will become apparent to those skilled in the art from the following detailed description. The embodiments described are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
Examples
FIG. 1 is a flowchart of an embodiment of the problem alarm method of a software system according to the present invention. As shown in FIG. 1, the problem alarm method provided by the embodiment of the present invention includes the following steps:
S101, collecting log data through Filebeat installed at a client;
specifically, Filebeat is a lightweight shipper for forwarding and centralizing log data. Filebeat monitors the log files or locations that you specify, collects log events, and forwards them to Elasticsearch or Logstash for indexing.
Filebeat works as follows: when Filebeat starts, it starts one or more inputs that look in the locations specified for log data. For each log that Filebeat finds, it starts a harvester. Each harvester reads a single log for new content and sends the new log data to libbeat, which aggregates the events and sends the aggregated data to the output configured for Filebeat.
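The per-file reading behavior described above can be illustrated with a minimal Python sketch. It is not Filebeat's implementation; the `Harvester` class, its `poll` method, and the `demo` helper are hypothetical names introduced only to show the idea of remembering a read offset and forwarding only newly appended, complete lines.

```python
import os
import tempfile


class Harvester:
    """Hypothetical reader: returns only content appended to one log file since the last poll."""

    def __init__(self, path):
        self.path = path
        self.offset = 0  # byte position that has already been forwarded

    def poll(self):
        """Return the complete new lines written since the previous call."""
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            chunk = f.read()
        # Consume only up to the last newline so a half-written line is retried later.
        end = chunk.rfind(b"\n") + 1
        self.offset += end
        return chunk[:end].decode("utf-8").splitlines()


def demo():
    """Write a log in two steps and show what each poll returns."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "app.log")
        with open(path, "w", encoding="utf-8") as f:
            f.write("first\nsecond\npart")
        harvester = Harvester(path)
        batch1 = harvester.poll()
        with open(path, "a", encoding="utf-8") as f:
            f.write("ial\n")
        batch2 = harvester.poll()
        return batch1, batch2
```

In this sketch the unfinished line `part` is held back on the first poll and delivered as `partial` on the second, which is why the offset only advances past complete lines.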
User information is acquired, and the user information and the log collection parameters configured by the user are written into the Filebeat default configuration file;
the user information is verified when Filebeat is installed on the client.
After Filebeat starts successfully, it interacts with the first Kafka data distribution cluster 20 to transmit data.
The log data is classified based on the application scenario, and comprises application log data and performance log data;
service application information is recorded through the application log data, and service abnormality is monitored based on the service application information, which makes it easier for developers to troubleshoot problems and trace logs, and to quickly locate the execution process and find problems through the parameter information recorded in the log;
interface access performance information is recorded through the performance log data, and system abnormality is monitored based on the performance information, which facilitates subsequent performance analysis.
Logs are written through a unified log processing framework, which handles application logs and performance logs automatically, so that service developers do not need to care about implementation details and the service code is not intruded upon. Developers only need to print application logs using a fixed method.
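One way such a non-intrusive framework could look is sketched below in Python. This is an assumption for illustration, not the framework of the method: the `monitored` decorator, the `emit` helper, the `RECORDS` list (a stand-in for the log files that Filebeat would watch), and the `create_order` handler are all hypothetical.

```python
import json
import time
from functools import wraps

RECORDS = []  # hypothetical stand-in for the log files collected by Filebeat


def emit(log_type, **fields):
    """Append one structured record; log_type is 'application' or 'performance'."""
    RECORDS.append(json.dumps({"log_type": log_type, **fields}, sort_keys=True))


def monitored(interface):
    """Wrap a handler so its duration and outcome are logged without changing its body."""
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            status = "ok"
            try:
                return func(*args, **kwargs)
            except Exception as exc:
                status = "error"
                # The application log records what went wrong, for later troubleshooting.
                emit("application", interface=interface, message=str(exc))
                raise
            finally:
                # The performance log records how long the interface call took.
                elapsed_ms = (time.perf_counter() - start) * 1000
                emit("performance", interface=interface, status=status,
                     elapsed_ms=round(elapsed_ms, 3))
        return wrapper
    return decorator


@monitored("order.create")
def create_order(stock):
    """Hypothetical business handler that fails when inventory is insufficient."""
    if stock <= 0:
        raise ValueError("insufficient inventory")
    return "created"
```

The business function contains no logging code of its own; the decorator produces one performance record per call and an additional application record when the call fails.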
Each application service records its own logs, including applicationlog, standard, biglog, performancelog, and nginx logs, under a specific log path on the server, so that the Filebeat on each server can monitor and collect them.
Log path specification: under /home/eyelog/{service_name}, each service creates a root directory, and three subdirectories are placed under each service root directory. applicationlog: the new application log, corresponding to the applicationlog indices in Kibana; biglog: the new-version gateway log, corresponding to the indices beginning with biglog- in Kibana; performancelog: the Qianyan performance log, corresponding to the performancelog index, which has been closed;
the nginx access log is located under /home/wwlogs/ on each server, and the corresponding indices in Kibana begin with nginx-; the old PHP and Node.js logs, under the directory /home/nodeLogs/, fall into two categories: ordinary logs are recorded in files whose names end in -out-0.log, and the corresponding indices in Kibana are the standard-out indices;
PHP and Node.js error logs are recorded in files whose names end in -err-0.log, and the corresponding indices in Kibana begin with standard-errlog-;
the Java framework of the method already encapsulates utility classes for the Qianyan logs. The corresponding indices in Kibana are, respectively:
applicationlog-*; standard-out-*; standard-errlog-*;
the access logs of each site's nginx are recorded, and these logs are stored in the nginx index;
request logs are generally recorded at the gateway layer and at present mainly comprise the gateway log and the gateway log of the Java mobile API; these logs are stored in the biglog index;
when searching logs, first determine which kind of log to search and locate the index that holds it. The development and test environments share one set of indices (test), and the pre-release and production environments share another; select the corresponding index and narrow the query time range as much as possible. Keyword search syntax: { filename }: "keyword". When the query is slow, specify the index precisely: for example, each index is named by the rule applicationlog-{product_line}-{app_name}-{yyyy.mm.dd}.log, so selecting the index for the corresponding product_line and app_name greatly reduces the query scope. Other query conditions, such as time and server host, should also be kept as narrow as possible.
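As a small illustration of narrowing a search to one index, the following Python sketch builds index names under the assumption that the naming rule reads applicationlog-{product_line}-{app_name}-{date}.log with a dotted year.month.day date. The function names and the sample values `hotel` and `booking` are hypothetical.

```python
from datetime import date


def index_name(product_line, app_name, day):
    """Build one daily index name following the assumed naming rule."""
    return f"applicationlog-{product_line}-{app_name}-{day:%Y.%m.%d}.log"


def index_pattern(product_line="*", app_name="*"):
    """Build a wildcard pattern; fixing product_line and app_name narrows the search."""
    return f"applicationlog-{product_line}-{app_name}-*"


# Hypothetical product line and application name, used only for illustration.
exact = index_name("hotel", "booking", date(2023, 6, 18))
narrow = index_pattern("hotel", "booking")
broad = index_pattern()
```

Querying `exact` or `narrow` touches far fewer indices than `broad`, which is the effect the search guidance relies on.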
S102, the log data is transmitted to the first Kafka data distribution cluster 20, and is distributed to the Storm data analysis cluster 30 through the first Kafka data distribution cluster 20;
specifically, Kafka is a high-throughput distributed publish-subscribe messaging system (message engine system) that can handle all the action stream data of consumers on a website. Such actions (web browsing, searching, and other user actions) are a key factor in many social functions on the modern web. Because of throughput requirements, this data is usually handled by processing logs and log aggregation. That is a workable solution for log data and offline analysis systems such as Hadoop, but it is limiting when real-time processing is required. The purpose of Kafka is to unify online and offline message processing through Hadoop's parallel loading mechanism, and also to provide real-time messages through the cluster.
System A sends a message to Kafka (the message engine system), and system B reads the message sent by A from Kafka. Kafka acts as the intermediary.
A messaging system is responsible for transferring data from one application to another, so that applications only need to focus on the data and not on how it is transferred between two or more applications. Distributed messaging is based on reliable message queues and transfers messages asynchronously between client applications and the messaging system. There are two main messaging modes: the point-to-point mode and the publish-subscribe mode. Most messaging systems use the publish-subscribe mode, and Kafka follows the publish-subscribe mode.
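The publish-subscribe idea can be shown with a toy in-memory sketch in Python. It does not use or imitate the Kafka client API; the `Broker` class, the topic name `raw-logs`, and the group names are hypothetical, and the sketch only shows that producers and consumers are decoupled and that each consumer group tracks its own read position.

```python
from collections import defaultdict


class Broker:
    """Toy in-memory intermediary: producers and consumers only know topic names."""

    def __init__(self):
        self.log = defaultdict(list)     # topic -> ordered list of messages
        self.offsets = defaultdict(int)  # (group, topic) -> next index to read

    def publish(self, topic, message):
        """Append a message to the topic; the producer does not know who reads it."""
        self.log[topic].append(message)

    def poll(self, group, topic):
        """Return the messages that the given consumer group has not yet read."""
        start = self.offsets[(group, topic)]
        messages = self.log[topic][start:]
        self.offsets[(group, topic)] = len(self.log[topic])
        return messages


broker = Broker()
broker.publish("raw-logs", "line A")
broker.publish("raw-logs", "line B")
first_read = broker.poll("storm", "raw-logs")   # system B reads what system A sent
second_read = broker.poll("storm", "raw-logs")  # nothing new for the same group
other_group = broker.poll("audit", "raw-logs")  # an independent subscriber sees everything
```

Because each group keeps its own offset, adding a second subscriber does not affect what the first one receives.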
S103, the Storm data analysis cluster 30 performs stream computing on the received log data to obtain processed log data;
specifically, Storm is an open-source distributed computing system for processing real-time data streams. Analyzing data in Storm mainly involves the following steps:
Define the data sources (Spouts). A Spout is the source of a data stream in Storm and can read from any data source, such as Kafka or RabbitMQ. A Spout needs to be defined to read data from the data source.
Define the data processing units (Bolts). Bolts are the main units that process data in Storm. You can define one or more Bolts to process the data received from Spouts. Bolts can perform any operation you need, such as filtering, functions, aggregation, joins, and database interactions.
Define the topology. A topology is a network of Spouts and Bolts that defines how data flows through the system. You need to define a topology to specify which Bolt receives data from which Spout and how data passes between Bolts.
Deploy and execute the topology. Once defined, the topology can be deployed and executed on a Storm cluster. Storm automatically distributes the data and processes it.
Store the processing results in a database as required, or visualize them on a real-time dashboard for further analysis.
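The Spout, Bolt, and topology roles can be sketched in plain Python generators. This is a single-process illustration, not Storm code and not the Storm API; the function names, the JSON log format, and the `interface` and `status` fields are assumptions made for the example.

```python
import json
from collections import Counter


def spout(lines):
    """Source: emit one raw tuple per log line."""
    for line in lines:
        yield line


def parse_bolt(stream):
    """Bolt 1: parse JSON and drop malformed lines (filtering)."""
    for raw in stream:
        try:
            yield json.loads(raw)
        except json.JSONDecodeError:
            continue


def count_bolt(stream):
    """Bolt 2: aggregate request and error counts per interface."""
    totals, errors = Counter(), Counter()
    for event in stream:
        totals[event["interface"]] += 1
        if event.get("status") == "error":
            errors[event["interface"]] += 1
    return {name: {"requests": totals[name], "errors": errors[name]} for name in totals}


def run_topology(lines):
    """Wire spout -> parse_bolt -> count_bolt, mirroring a linear topology."""
    return count_bolt(parse_bolt(spout(lines)))


# Hypothetical log lines, one of which is malformed.
sample = [
    '{"interface": "pay", "status": "ok"}',
    "garbage",
    '{"interface": "pay", "status": "error"}',
    '{"interface": "login", "status": "ok"}',
]
summary = run_topology(sample)
```

In a real cluster each stage would run in parallel on many workers; the sketch only shows how data passes from the source through successive processing units.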
S104, the processed log data is transmitted to the second Kafka data distribution cluster 40, and the processed log data is distributed to the document type storage engine for data storage through the second Kafka data distribution cluster 40.
S105, carrying out graphic processing on the stored processed log data, judging whether the processed log data has abnormal values, and if so, acquiring system problem information based on the abnormal values and the log data.
Specifically, after the computed performance data has been persisted, it can be compared graphically. Showing performance changes visually makes it possible to quickly find the points at which performance changes, which helps drive optimization.
As shown in FIG. 3, the initial performance on June 18 is better than on June 16, but time consumption increases suddenly at 2:00, at the first red circle, indicating that some event at this point must have degraded the performance of the service. At the second red circle, at 3:30, time consumption recovers. It can therefore be concluded that an event affecting performance occurred between 2:00 and 3:30.
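The visual comparison of two days can also be expressed numerically. The Python sketch below is one possible approach, not the method of the invention: it flags the time slots in which the current day's time consumption exceeds a baseline day's by a chosen factor. The function name, the factor of 1.5, and all sample values are hypothetical and are not taken from FIG. 3.

```python
def slow_intervals(baseline, current, factor=1.5):
    """Return (start, end) label pairs where current > factor * baseline.

    Both arguments map "HH:MM" labels to an average time cost; labels missing
    from the baseline are never flagged.
    """
    intervals, start, prev = [], None, None
    for label in sorted(current):
        slow = current[label] > factor * baseline.get(label, float("inf"))
        if slow and start is None:
            start = label  # a degraded stretch begins here
        if not slow and start is not None:
            intervals.append((start, prev))  # the stretch ended at the previous slot
            start = None
        prev = label
    if start is not None:
        intervals.append((start, prev))
    return intervals


# Hypothetical average time cost in milliseconds per half-hour slot.
baseline_day = {"01:30": 100, "02:00": 100, "02:30": 100, "03:00": 100, "03:30": 100}
current_day = {"01:30": 90, "02:00": 300, "02:30": 320, "03:00": 310, "03:30": 95}
degraded = slow_intervals(baseline_day, current_day)
```

With these sample values the degraded stretch runs from the 02:00 slot through the 03:00 slot and ends once the 03:30 slot returns to normal.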
As shown in FIG. 4, traffic monitoring supports comparison across multiple dates. The graphical interface makes traffic changes visible at a glance, so that traffic peaks and valleys can be found quickly, and provides a traffic-dimension reference for locating problems. It also supports traffic forecasting, providing data support during large-scale campaigns so that service capacity can be estimated conveniently.
The system provides monitoring of abnormal values for discovering abnormal changes. Abnormalities are divided into business abnormalities and system abnormalities. A business abnormality is one that needs to be monitored at the business level, such as insufficient inventory or overly frequent logins. A system abnormality is a system-level abnormality, such as a network abnormality or an unavailable service.
Abnormality monitoring makes it possible to quickly discover abnormal changes within a period of time, and combining the abnormal values with the logs makes it possible to quickly locate system problems.
As shown in FIG. 5, both system abnormalities and business abnormalities suddenly increased from 2:06 to 3:36, a period lasting 1.5 hours.
The abnormality code corresponding to each abnormality type is determined, and the abnormal condition corresponding to each abnormality code is monitored, so as to determine the abnormality type corresponding to the system problem.
As shown in FIG. 6, abnormality monitoring can reveal system and business abnormalities, but it cannot show which specific type of abnormality occurred. The abnormality types therefore need to be refined so that abnormal points can be discovered at a finer granularity, and monitoring of subdivided abnormalities is provided for this purpose. Abnormalities can be distinguished by their abnormality codes, so that the abnormal condition corresponding to each abnormality code is monitored.
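Subdividing abnormalities by code amounts to keeping one time series per code. The following Python sketch shows that grouping under stated assumptions: events are dictionaries with a `time` string and a `code` string, and the code values `SYS_NET_TIMEOUT` and `BIZ_STOCK_SHORT` are hypothetical examples, not codes defined by the source.

```python
from collections import Counter, defaultdict


def counts_by_code(events):
    """Group abnormality events into {code: Counter({minute: occurrences})}."""
    series = defaultdict(Counter)
    for event in events:
        minute = event["time"][:16]  # "YYYY-MM-DD HH:MM" prefix of the timestamp
        series[event["code"]][minute] += 1
    return series


def dominant_code(series, minute):
    """Return the abnormality code seen most often in the given minute, or None."""
    ranked = [(counter[minute], code) for code, counter in series.items() if counter[minute]]
    return max(ranked)[1] if ranked else None


# Hypothetical events: two system-level timeouts and one business-level shortage.
events = [
    {"time": "2023-06-18 02:06:10", "code": "SYS_NET_TIMEOUT"},
    {"time": "2023-06-18 02:06:40", "code": "SYS_NET_TIMEOUT"},
    {"time": "2023-06-18 02:06:55", "code": "BIZ_STOCK_SHORT"},
]
series = counts_by_code(events)
top = dominant_code(series, "2023-06-18 02:06")
quiet = dominant_code(series, "2023-06-18 02:07")
```

Plotting each entry of `series` separately gives the per-code view, and `dominant_code` points at the code that contributes most to a spike.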
S106, sending out alarm prompt information based on the system problem information;
specifically, a sending channel for the alarm prompt information is configured, wherein the sending channel comprises a short message (SMS) prompt, a mail prompt, and a WeChat prompt;
alarm rules are configured;
the alarm rules include: starting an alarm when the current per-minute request volume is greater than a first preset value; starting an early warning when the current system abnormality rate is greater than a second preset value; starting an early warning when the current business abnormality rate is greater than a third preset value; starting an early warning when the current average execution time is greater than a fourth preset value; starting an alarm when the current response time is greater than a fifth preset value; starting an alarm when the average growth rate of the current per-minute request volume is greater than a sixth preset value; starting an alarm when the cycle-to-cycle growth rate of the current response time is greater than a seventh preset value; and starting an alarm when the period-over-period growth rate of the current per-minute request volume is greater than an eighth preset value. Preferably, the first to eighth preset values are 150%.
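Threshold rules of this kind can be evaluated with a few lines of Python. The sketch below is a simplified illustration under stated assumptions: the `Rule` dataclass, the metric names, the thresholds other than 150, and the sample metric values are hypothetical, and only three of the eight rules are shown.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    metric: str       # key in the metrics snapshot
    threshold: float  # preset value; exceeding it triggers the rule
    level: str        # "alarm" or "early_warning"


def growth_rate(current, previous):
    """Relative growth in percent versus the comparison period (0 if there is no baseline)."""
    return 0.0 if previous == 0 else (current - previous) / previous * 100


def evaluate(rules, metrics):
    """Return the rules whose metric strictly exceeds its threshold."""
    return [r for r in rules if metrics.get(r.metric, float("-inf")) > r.threshold]


# Hypothetical rule set and metrics snapshot for one minute.
rules = [
    Rule("requests_per_minute", 1000, "alarm"),
    Rule("system_abnormality_rate", 5, "early_warning"),
    Rule("requests_period_over_period_growth", 150, "alarm"),
]
metrics = {
    "requests_per_minute": 1200,
    "system_abnormality_rate": 2,
    # Current minute versus the previous minute.
    "requests_period_over_period_growth": growth_rate(1200, 400),
}
fired = [r.metric for r in evaluate(rules, metrics)]
```

Keeping the rules as data rather than code is one way to support the flexible rule configuration that the scheme describes.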
The alarm prompt information comprises alarm product line information, alarm application name information, alarm method information, alarm value information, alarm description information and trigger time information.
The control interface shows changes and comparisons of traffic, abnormalities, and so on at a glance; however, an alarm capability is needed in order to notice abnormalities quickly when they occur. The scheme provides multi-dimensional monitoring alarm rules covering per-minute request volume, response time, system abnormality rate, business abnormality rate, 500 and 404 abnormality rates, and the corresponding week-over-week and period-over-period ratios, and supports flexible configuration of rules and notification modes. The notification modes include mail, enterprise WeChat, short message (SMS), and the like.
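The alarm prompt information and its delivery through several configured channels can be sketched as follows in Python. The `Alert` dataclass mirrors the six fields listed for the alarm prompt information; the field names, the `dispatch` function, the channel callables, and the sample values are hypothetical, and no real mail, SMS, or WeChat API is called.

```python
from dataclasses import dataclass, asdict


@dataclass
class Alert:
    product_line: str
    app_name: str
    method: str
    value: float
    description: str
    triggered_at: str


def format_alert(alert):
    """Render the six alert fields as one line of text."""
    return " | ".join(f"{key}={value}" for key, value in asdict(alert).items())


def dispatch(alert, channels):
    """Send the rendered alert through every configured channel; return the channel names used."""
    text = format_alert(alert)
    for send in channels.values():
        send(text)
    return sorted(channels)


# Hypothetical channels: each one just appends to an in-memory outbox.
outbox = []
channels = {"mail": outbox.append, "sms": outbox.append}
alert = Alert("hotel", "booking", "createOrder", 180.0,
              "response time above preset value", "2023-06-18 02:06")
used = dispatch(alert, channels)
```

Adding a notification mode then only requires registering another callable in `channels`; the alert content itself stays unchanged.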
According to the problem alarm method of the software system, log data is collected through Filebeat installed at a client; the log data is transmitted to a first Kafka data distribution cluster 20, and is distributed to a Storm data analysis cluster 30 through the first Kafka data distribution cluster 20; the Storm data analysis cluster 30 performs stream computing on the received log data to obtain processed log data; the processed log data is transmitted to a second Kafka data distribution cluster 40, and is distributed through the second Kafka data distribution cluster 40 to a document-type storage engine for storage; the stored processed log data is processed graphically to judge whether it has an abnormal value, and if so, system problem information is acquired based on the abnormal value and the log data; and alarm prompt information is sent based on the system problem information. The method solves the problem in the prior art that faults occurring while a software system is running cannot be quickly discovered, located, and resolved.
FIG. 2 is an architecture diagram of an embodiment of the problem alarm system of a software system according to the present invention. As shown in FIG. 2, the problem alarm system provided by the embodiment of the present invention includes the following components:
a Filebeat module 10, installed at the client and used for collecting log data;
a first Kafka data distribution cluster 20 for distributing said log data to a Storm data analysis cluster 30;
a Storm data analysis cluster 30 for performing a stream computation process on the received log data to obtain processed log data;
a second Kafka data distribution cluster 40 for distributing the processed log data to a document storage engine for data storage;
an abnormal value acquisition module 50, configured to graphically process the stored processed log data, judge whether the processed log data has an abnormal value, and if so, acquire system problem information based on the abnormal value and the log data;
an alarm prompt module 60, configured to send alarm prompt information based on the system problem information.
The Filebeat module 10 is further configured to:
acquire user information, and write the user information and the log collection parameters configured by the user into the Filebeat default configuration file;
and verify the user information when Filebeat is installed on the client.
After Filebeat starts successfully, it interacts with the first Kafka data distribution cluster 20 to transmit data.
The log data is classified based on the application scenario, and comprises application log data and performance log data;
service application information is recorded through the application log data, and service abnormality is monitored based on the service application information;
interface access performance information is recorded through the performance log data, and system abnormality is monitored based on the performance information.
The alarm prompting module 60 is further configured to:
configuring alarm rules;
the alarm rules include: starting an alarm when the current per-minute request volume is greater than a first preset value;
starting an early warning when the current system abnormality rate is greater than a second preset value;
starting an early warning when the current business abnormality rate is greater than a third preset value;
starting an early warning when the current average execution time is greater than a fourth preset value;
starting an alarm when the current response time is greater than a fifth preset value;
starting an alarm when the average growth rate of the current per-minute request volume is greater than a sixth preset value;
starting an alarm when the cycle-to-cycle growth rate of the current response time is greater than a seventh preset value;
and starting an alarm when the period-over-period growth rate of the current per-minute request volume is greater than an eighth preset value.
The abnormality code corresponding to each abnormality type is determined, and the abnormal condition corresponding to each abnormality code is monitored, so as to determine the abnormality type corresponding to the system problem.
The alarm prompting module 60 is further configured to:
configuring a sending channel of alarm prompt information, wherein the sending channel comprises a short message prompt, a mail prompt and a WeChat prompt;
the alarm prompt information comprises alarm product line information, alarm application name information, alarm method information, alarm value information, alarm description information and trigger time information.
According to the problem alarm system of the software system, log data is collected through a Filebeat module 10 installed at a client; the log data is distributed to a Storm data analysis cluster 30 through a first Kafka data distribution cluster 20; the Storm data analysis cluster 30 performs stream computing on the received log data to obtain processed log data; the processed log data is distributed through a second Kafka data distribution cluster 40 to a document-type storage engine for storage; the abnormal value acquisition module 50 graphically processes the stored processed log data, judges whether the processed log data has an abnormal value, and if so, acquires system problem information based on the abnormal value and the log data; and the alarm prompt module 60 sends alarm prompt information based on the system problem information. The problem alarm system solves the problem in the prior art that faults occurring while a software system is running cannot be quickly discovered, located, and resolved.
Fig. 7 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention. As shown in Fig. 7, the electronic device 70 includes: a processor 701, a memory 702, and a bus 703;
wherein the processor 701 and the memory 702 communicate with each other through the bus 703;
the processor 701 is configured to invoke program instructions in the memory 702 to perform the methods provided by the above-described method embodiments, for example, including: collecting log data through a Filebeat installed at a client; transmitting the log data to a first Kafka data distribution cluster 20, and distributing the log data to a Storm data analysis cluster 30 through the first Kafka data distribution cluster 20; performing, by the Storm data analysis cluster 30, stream computation processing on the received log data to obtain processed log data; transmitting the processed log data to a second Kafka data distribution cluster 40, and distributing the processed log data to a document type storage engine for data storage through the second Kafka data distribution cluster 40; graphically processing the stored processed log data, judging whether the processed log data has an abnormal value, and if so, acquiring system problem information based on the abnormal value and the log data; and sending out alarm prompt information based on the system problem information.
The present embodiment provides a non-transitory computer readable storage medium storing computer instructions that cause a computer to perform the methods provided by the above-described method embodiments, for example, including: collecting log data through a Filebeat installed at a client; transmitting the log data to a first Kafka data distribution cluster, and distributing the log data to a Storm data analysis cluster through the first Kafka data distribution cluster; performing, by the Storm data analysis cluster, stream computation processing on the received log data to obtain processed log data; transmitting the processed log data to a second Kafka data distribution cluster, and distributing the processed log data to a document type storage engine for data storage through the second Kafka data distribution cluster; graphically processing the stored processed log data, judging whether the processed log data has an abnormal value, and if so, acquiring system problem information based on the abnormal value and the log data; and sending out alarm prompt information based on the system problem information.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments may be implemented by hardware controlled by program instructions. The program may be stored in a computer readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes various media capable of storing program code, such as ROM, RAM, magnetic disks, or optical disks.
The apparatus embodiments described above are merely illustrative, wherein units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present invention without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on such understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the embodiments or the methods of some parts of the embodiments.
While the invention has been described in detail in the foregoing general description and specific examples, it will be apparent to those skilled in the art that modifications and improvements can be made thereto. Accordingly, such modifications or improvements may be made without departing from the spirit of the invention and are intended to be within the scope of the invention as claimed.

Claims (10)

1. A problem alarm method for a software system, characterized by comprising the following steps:
collecting log data through a Filebeat installed at a client;
the log data are transmitted to a first Kafka data distribution cluster, and the log data are distributed to a Storm data analysis cluster through the first Kafka data distribution cluster;
the Storm data analysis cluster performs stream computation processing on the received log data to obtain processed log data;
transmitting the processed log data to a second Kafka data distribution cluster, and distributing the processed log data to a document type storage engine for data storage through the second Kafka data distribution cluster;
carrying out graphical processing on the stored processed log data, judging whether the processed log data has an abnormal value, and if so, acquiring system problem information based on the abnormal value and the log data;
and sending out alarm prompt information based on the system problem information.
2. The method of claim 1, wherein the collecting log data by a Filebeat installed at a client comprises:
acquiring user information, and writing the user information and log acquisition parameters configured by the user into a Filebeat default configuration file;
and when the Filebeat is installed on the client, verifying the user information.
3. The method for alarming problems in a software system according to claim 1, wherein the collecting log data by a Filebeat installed at a client further comprises:
and after the Filebeat is successfully started, carrying the user information to interact with the first Kafka data distribution cluster so as to carry out data transmission.
4. The method for alarming problems in a software system according to claim 1, wherein the collecting log data by a Filebeat installed at a client further comprises:
grading the log data based on an application scene, wherein the log data comprises application log data and performance log data;
recording service application information through the application log data, and monitoring service abnormality based on the service application information;
and recording interface access performance information through the performance log data, and monitoring system abnormality based on the performance information.
5. The method of claim 4, wherein graphically processing the stored processed log data to determine whether an outlier exists in the processed log data, and if so, obtaining system problem information based on the outlier and the log data, comprises:
and determining the abnormal code corresponding to each abnormal type, and monitoring the abnormal condition corresponding to each abnormal code to determine the abnormal type corresponding to the system problem.
6. The method of claim 1, wherein the sending an alarm prompt message based on the system problem information comprises:
configuring alarm rules;
the alarm rules include: an alarm is triggered when the current per-minute request volume is greater than a first preset value;
an early warning is triggered when the current system abnormality rate is greater than a second preset value;
an early warning is triggered when the current business abnormality rate is greater than a third preset value;
an early warning is triggered when the current average execution time is greater than a fourth preset value;
an alarm is triggered when the current response time is greater than a fifth preset value;
an alarm is triggered when the average growth rate of the current per-minute request volume is greater than a sixth preset value;
an alarm is triggered when the cycle-to-cycle growth rate of the current response time is greater than a seventh preset value;
and an alarm is triggered when the cycle-to-cycle growth rate of the current per-minute request volume is greater than an eighth preset value.
7. The method of claim 6, wherein the sending an alarm prompt message based on the system problem information, further comprises:
configuring a sending channel of alarm prompt information, wherein the sending channel comprises a short message prompt, a mail prompt and a WeChat prompt;
the alarm prompt information comprises alarm product line information, alarm application name information, alarm method information, alarm value information, alarm description information and trigger time information.
8. A problem alert system for a software system, comprising:
a Filebeat module arranged at a client and used for collecting log data;
a first Kafka data distribution cluster for distributing the log data to a Storm data analysis cluster;
the Storm data analysis cluster is used for carrying out stream computation processing on the received log data to obtain processed log data;
the second Kafka data distribution cluster is used for distributing the processed log data to a document type storage engine for data storage;
the abnormal value acquisition module is used for carrying out graphic processing on the stored processed log data, judging whether the processed log data has abnormal values or not, and if so, acquiring system problem information based on the abnormal values and the log data;
and the alarm prompt module is used for sending alarm prompt information based on the system problem information.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 7 when the computer program is executed.
10. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method according to any of claims 1 to 7.
CN202310895691.5A 2023-07-20 2023-07-20 Problem alarm system and method for software system Pending CN116991661A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310895691.5A CN116991661A (en) 2023-07-20 2023-07-20 Problem alarm system and method for software system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310895691.5A CN116991661A (en) 2023-07-20 2023-07-20 Problem alarm system and method for software system

Publications (1)

Publication Number Publication Date
CN116991661A true CN116991661A (en) 2023-11-03

Family

ID=88527710

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310895691.5A Pending CN116991661A (en) 2023-07-20 2023-07-20 Problem alarm system and method for software system

Country Status (1)

Country Link
CN (1) CN116991661A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110309030A (en) * 2019-07-05 2019-10-08 亿玛创新网络(天津)有限公司 Log analysis monitoring system and method based on ELK and Zabbix
CN110347716A (en) * 2019-05-27 2019-10-18 中国平安人寿保险股份有限公司 Daily record data processing method, device, terminal and storage medium
CN113157545A (en) * 2021-05-20 2021-07-23 京东方科技集团股份有限公司 Method, device and equipment for processing service log and storage medium
US20220309053A1 (en) * 2021-06-25 2022-09-29 Apollo Intelligent Connectivity (Beijing) Technology Co., Ltd. Method and apparatus of auditing log, electronic device, and medium
CN116414795A (en) * 2023-04-04 2023-07-11 中国民航信息网络股份有限公司 Ticket data processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination