CN116991661A - Problem alarm system and method for software system - Google Patents
Problem alarm system and method for software system
- Publication number
- CN116991661A, CN202310895691.5A
- Authority
- CN
- China
- Prior art keywords
- log data
- data
- information
- alarm
- cluster
- Prior art date
- Legal status
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3003—Monitoring arrangements specially adapted to the computing system or computing system component being monitored
- G06F11/302—Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3065—Monitoring arrangements determined by the means or processing involved in reporting the monitored data
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/32—Monitoring with visual or acoustical indication of the functioning of the machine
- G06F11/324—Display of status information
- G06F11/327—Alarm or error message display
Abstract
The embodiment of the invention discloses a problem alarm system and method for a software system. Log data is collected by Filebeat installed at a client; the log data is distributed to a Storm data analysis cluster through a first Kafka data distribution cluster; the Storm data analysis cluster performs stream computation on the received log data to obtain processed log data; the processed log data is distributed through a second Kafka data distribution cluster to a document-oriented storage engine for storage; the stored processed log data is processed graphically to determine whether it contains abnormal values, and if so, system problem information is acquired based on the abnormal values and the log data; and alarm prompt information is sent out based on the system problem information. The method solves the problem that the prior art cannot quickly discover, locate and resolve faults that occur while a software system is running.
Description
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a problem alarm system and method for a software system, an electronic device, and a storage medium.
Background
Many problems can occur while a software system is running online, and failing to detect them in time can cause heavy business losses. The monitoring services of some existing software systems offer limited functionality, mostly covering hardware-level metrics such as CPU, memory and network. They do not combine capabilities such as interface performance monitoring, exception monitoring, alarming and log tracing, so problems cannot be found and located quickly, which in turn makes them hard to resolve quickly.
There is a need for a software business monitoring method that can quickly discover and locate problems, thereby facilitating quick solutions to the problems.
Disclosure of Invention
The embodiment of the invention aims to provide a problem alarm system and method for a software system, an electronic device and a storage medium, to solve the problem in the prior art that faults occurring while a software system is running cannot be found, located and resolved quickly.
In order to achieve the above object, an embodiment of the present invention provides a method for alarming a problem in a software system, the method specifically includes:
collecting log data through Filebeat installed at a client;
the log data are transmitted to a first Kafka data distribution cluster, and the log data are distributed to a Storm data analysis cluster through the first Kafka data distribution cluster;
the Storm data analysis cluster performs stream computation processing on the received log data to obtain processed log data;
transmitting the processed log data to a second Kafka data distribution cluster, and distributing the processed log data to a document type storage engine for data storage through the second Kafka data distribution cluster;
carrying out graphical processing on the stored processed log data, judging whether the processed log data has an abnormal value, and if so, acquiring system problem information based on the abnormal value and the log data;
and sending out alarm prompt information based on the system problem information.
Based on the technical scheme, the invention can also be improved as follows:
Further, collecting log data through Filebeat installed at the client includes:
acquiring user information, and writing the user information and the log collection parameters configured by the user into the Filebeat default configuration file;
and verifying the user information when Filebeat is installed on the client.
Further, collecting log data through Filebeat installed at the client further includes:
after Filebeat starts successfully, interacting with the first Kafka data distribution cluster, carrying the user information, to transmit data.
Further, collecting log data through Filebeat installed at the client further includes:
classifying the log data by application scenario, wherein the log data includes application log data and performance log data;
recording service application information through the application log data, and monitoring business abnormalities based on the service application information;
and recording interface-access performance information through the performance log data, and monitoring system abnormalities based on that performance information.
Further, graphically processing the stored processed log data, determining whether the processed log data contains an abnormal value, and if so, acquiring system problem information based on the abnormal value and the log data further includes:
determining the exception code corresponding to each anomaly type, and monitoring the abnormal condition corresponding to each exception code to determine the anomaly type corresponding to the system problem.
Further, sending out alarm prompt information based on the system problem information includes:
configuring alarm rules;
the alarm rules include: an alarm is triggered when the current per-minute request volume exceeds a first preset value;
an early warning is triggered when the current system abnormality rate exceeds a second preset value;
an early warning is triggered when the current business abnormality rate exceeds a third preset value;
an early warning is triggered when the current average execution time exceeds a fourth preset value;
an alarm is triggered when the current response time exceeds a fifth preset value;
an alarm is triggered when the current average growth rate of the per-minute request volume exceeds a sixth preset value;
an alarm is triggered when the period-over-period growth rate of the current response time exceeds a seventh preset value;
and an alarm is triggered when the period-over-period growth rate of the current per-minute request volume exceeds an eighth preset value.
Further, the sending of the alarm prompt information based on the system problem information further includes:
configuring a sending channel of alarm prompt information, wherein the sending channel comprises a short message prompt, a mail prompt and a WeChat prompt;
the alarm prompt information comprises alarm product line information, alarm application name information, alarm method information, alarm value information, alarm description information and trigger time information.
A problem alert system for a software system, comprising:
the Filebeat module, installed at the client, for collecting log data;
a first Kafka data distribution cluster for distributing the log data to a Storm data analysis cluster;
the Storm data analysis cluster is used for carrying out stream computation processing on the received log data to obtain processed log data;
the second Kafka data distribution cluster is used for distributing the processed log data to a document type storage engine for data storage;
the abnormal value acquisition module, used for graphically processing the stored processed log data, determining whether the processed log data contains abnormal values, and if so, acquiring system problem information based on the abnormal values and the log data;
and the alarm prompt module is used for sending alarm prompt information based on the system problem information.
An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the method when the computer program is executed.
A non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method.
The embodiment of the invention has the following advantages:
According to the problem alarm method of the software system, log data is collected through Filebeat installed at a client; the log data is transmitted to a first Kafka data distribution cluster and distributed through it to a Storm data analysis cluster; the Storm data analysis cluster performs stream computation on the received log data to obtain processed log data; the processed log data is transmitted to a second Kafka data distribution cluster and distributed through it to a document-oriented storage engine for storage; the stored processed log data is processed graphically to determine whether it contains an abnormal value, and if so, system problem information is acquired based on the abnormal value and the log data; and alarm prompt information is sent out based on the system problem information. This solves the problem in the prior art that faults occurring while a software system is running cannot be found, located and resolved quickly.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It will be apparent to those skilled in the art from this disclosure that the drawings described below are merely exemplary and that other embodiments may be derived from the drawings provided without undue effort.
The structures, proportions, sizes, etc. shown in the present specification are shown only for the purposes of illustration and description, and are not intended to limit the scope of the invention, which is defined by the claims, so that any structural modifications, changes in proportions, or adjustments of sizes, which do not affect the efficacy or the achievement of the present invention, should fall within the scope of the invention.
FIG. 1 is a flow chart of a problem alert method of a software system of the present invention;
FIG. 2 is a first architecture diagram of a problem alert system of the software system of the present invention;
FIG. 3 is a performance comparison graph of the software system of the present invention;
FIG. 4 is a traffic monitoring diagram of the software system of the present invention;
FIG. 5 is an anomaly monitoring graph of the software system of the present invention;
FIG. 6 is an anomaly subdivision monitoring graph of the software system of the present invention;
FIG. 7 is a schematic diagram of the physical structure of an electronic device according to the present invention.
Wherein the reference numerals are as follows:
Filebeat module 10, first Kafka data distribution cluster 20, Storm data analysis cluster 30, second Kafka data distribution cluster 40, abnormal value acquisition module 50, alarm prompt module 60, electronic device 70, processor 701, memory 702 and bus 703.
Detailed Description
Other advantages and effects of the present invention will become apparent to those skilled in the art from the following detailed description, which describes some, but not all, embodiments of the invention by way of illustration. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
Examples
Fig. 1 is a flowchart of an embodiment of a problem alarm method of a software system according to the present invention, as shown in fig. 1, and the problem alarm method of a software system according to the embodiment of the present invention includes the following steps:
S101, collecting log data through Filebeat installed at the client;
specifically, Filebeat is a lightweight shipper for forwarding and centralizing log data. Filebeat monitors the log files or locations you specify, collects log events, and forwards them to Elasticsearch or Logstash for indexing.
Filebeat works as follows: when Filebeat starts, it starts one or more inputs that look in the locations specified for log data. For each log file it finds, Filebeat starts a harvester. Each harvester reads a single log file for new content and sends the new log data to libbeat, which aggregates the events and sends the aggregated data to the output configured for Filebeat.
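The input, harvester and libbeat flow described above can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not Filebeat's actual implementation: the class names Harvester and Shipper are hypothetical, the output is a plain list standing in for a Kafka topic, and file rotation, registry persistence and back-pressure are ignored.

```python
import os
import tempfile
from dataclasses import dataclass, field


@dataclass
class Harvester:
    """Reads one log file and remembers how far it has read (hypothetical name)."""
    path: str
    offset: int = 0

    def read_new_lines(self):
        # Binary mode so the offset is a byte position.
        with open(self.path, "rb") as f:
            f.seek(self.offset)
            data = f.read()
        # Only consume complete lines; a trailing partial line waits for the next poll.
        last_newline = data.rfind(b"\n")
        if last_newline == -1:
            return []
        complete = data[: last_newline + 1]
        self.offset += len(complete)
        return [line.decode("utf-8") for line in complete.splitlines()]


@dataclass
class Shipper:
    """Input plus publisher: one harvester per file, events batched for the output."""
    paths: list
    output: list  # stand-in for the configured output, e.g. a Kafka topic
    harvesters: dict = field(default_factory=dict)

    def poll(self):
        batch = []
        for path in self.paths:
            if not os.path.exists(path):
                continue
            # Start a harvester the first time a log file is found.
            harvester = self.harvesters.setdefault(path, Harvester(path))
            for line in harvester.read_new_lines():
                batch.append({"source": path, "message": line})
        if batch:
            # Aggregate the events and hand them to the output in one batch.
            self.output.append(batch)
        return len(batch)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = os.path.join(tmp, "app.log")
        with open(log, "w") as f:
            f.write("started\nrequest ok\n")
        sent = []
        print(Shipper([log], sent).poll(), sent)
```

Keeping a byte offset per file is what lets a restarted poll pick up only new content, which mirrors the "reads a single log to obtain new content" behaviour in the text.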
User information is acquired, and the user information and the log collection parameters configured by the user are written into the Filebeat default configuration file;
the user information is verified when Filebeat is installed on the client.
After Filebeat starts successfully, it interacts with the first Kafka data distribution cluster 20, carrying the user information, to transmit data.
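As a sketch of this configuration step, the function below writes user information and collection parameters into a filebeat.yml body. Assumptions: the UserInfo fields, the REGISTERED_USERS table and verify_user are hypothetical stand-ins for whatever user service the platform uses, and the credentials are assumed to be used for SASL authentication against the first Kafka cluster. The YAML keys themselves (filebeat.inputs, paths, fields, output.kafka with hosts, topic, username, password) are standard Filebeat options.

```python
import json
from dataclasses import dataclass


@dataclass
class UserInfo:
    username: str      # hypothetical: account issued to the client by the platform
    password: str
    product_line: str
    app_name: str


# Hypothetical registry standing in for the platform's user service.
REGISTERED_USERS = {"svc-order": "s3cret"}


def verify_user(user: UserInfo) -> bool:
    """Installation-time check that the user is known to the platform."""
    return REGISTERED_USERS.get(user.username) == user.password


def render_filebeat_config(user: UserInfo, log_paths, kafka_hosts, topic) -> str:
    """Render a filebeat.yml body; JSON values are valid YAML flow values."""
    if not verify_user(user):
        raise PermissionError(f"unknown user {user.username!r}")
    lines = [
        "filebeat.inputs:",
        "  - type: log",
        "    enabled: true",
        "    paths: " + json.dumps(list(log_paths)),
        # Tag every event with the owner so downstream indices can be split by it.
        "    fields: " + json.dumps({"product_line": user.product_line,
                                     "app_name": user.app_name}),
        "output.kafka:",
        "  hosts: " + json.dumps(list(kafka_hosts)),
        "  topic: " + json.dumps(topic),
        # The user information travels with every connection to Kafka.
        "  username: " + json.dumps(user.username),
        "  password: " + json.dumps(user.password),
    ]
    return "\n".join(lines) + "\n"
```

Emitting values through json.dumps avoids hand-written YAML quoting, since JSON strings, lists and objects are also valid YAML.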
The log data is classified by application scenario into application log data and performance log data;
the application log data records service application information, on the basis of which business abnormalities are monitored; this makes problem investigation and log tracing easier for developers, who can quickly locate problems from the parameter information and execution flow recorded in the log;
the performance log data records interface-access performance information, on the basis of which system abnormalities are monitored; this also facilitates subsequent performance analysis.
Logs are produced through a unified log-processing framework that handles application logs and performance logs automatically, so service developers do not need to attend to implementation details and the framework is non-invasive to the business code. Developers only need to print application logs through a fixed method.
Each application service records its own logs, including applicationlog, standard, biglog, performancelog and nginx logs, under a specific log path on the server, so that Filebeat on each server can monitor and collect them.
The log path convention is /home/eyelog/{service_name}: each service creates its own root directory containing 3 subdirectories. applicationlog holds the new application log and corresponds to the applicationlog-* index in Kibana; biglog holds the new-version gateway log and corresponds to the biglog-* index in Kibana; performancelog holds the Qianyan performance log and corresponds to the performancelog-* index, which has been closed.
The nginx access logs are located under /home/wwlogs/ on each server, and the corresponding Kibana index is nginx-*. Legacy PHP and Node.js logs are stored under /home/nodeLogs/ and fall into 2 categories: ordinary logs are written to *-out-0.log files, with the Kibana index standard-out-*;
PHP and Node.js error logs are written to *-err-0.log files, with the Kibana index standard-errlog-*.
The Java framework used by this method already encapsulates the Qianyan logging utility classes. The corresponding Kibana indices are:
applicationlog-*;standard-out-*;standard-errlog-*;
Each nginx instance records its access logs, which are stored in the nginx index.
Request logs are generally recorded at the gateway layer and currently consist mainly of the gateway log and the gateway log of the Java mobile API; these logs are stored in the biglog index.
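The path and index conventions above amount to a lookup table. The Python sketch below maps a log file path to the Kibana index pattern it ends up in; the function name is hypothetical, and the rules are taken directly from the directories and index names listed in the text.

```python
from pathlib import PurePosixPath
from typing import Optional

# Subdirectory under /home/eyelog/{service_name}/ -> Kibana index pattern.
EYELOG_SUBDIRS = {
    "applicationlog": "applicationlog-*",
    "biglog": "biglog-*",
    "performancelog": "performancelog-*",  # described in the text as closed
}


def index_pattern_for(path: str) -> Optional[str]:
    """Return the Kibana index pattern for a log path, or None if unknown."""
    p = PurePosixPath(path)
    parts = p.parts  # e.g. ('/', 'home', 'eyelog', service, subdir, file)
    if parts[:3] == ("/", "home", "eyelog") and len(parts) >= 5:
        return EYELOG_SUBDIRS.get(parts[4])
    if parts[:3] == ("/", "home", "wwlogs"):
        return "nginx-*"
    if parts[:3] == ("/", "home", "nodeLogs"):
        # Legacy PHP / Node.js logs are split by file-name suffix.
        if p.name.endswith("-out-0.log"):
            return "standard-out-*"
        if p.name.endswith("-err-0.log"):
            return "standard-errlog-*"
    return None


print(index_pattern_for("/home/eyelog/order-api/biglog/gateway.log"))
```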
When searching logs, first determine which log to search and locate the index it resides in: development and testing commonly use the test set, while pre-release and production are also commonly queried, so select the corresponding index. Shorten the query time range as much as possible and search by keyword in the form {field_name}: "keyword". When a query is slow, target a specific index precisely: each index name follows the rule applicationlog-{product_line}-{app_name}-{yyyy.MM.dd}, so selecting the index for the corresponding product_line and app_name greatly reduces the query scope. Likewise, minimize the other query dimensions, such as time and server host.
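Narrowing a query to concrete daily indices can be sketched as follows, assuming the naming rule applicationlog-{product_line}-{app_name}-{yyyy.MM.dd} given in the text. The helper names are hypothetical, and the query string only mirrors the field: "keyword" form mentioned above.

```python
from datetime import date, timedelta


def daily_indices(product_line: str, app_name: str, start: date, end: date):
    """Concrete daily index names covering [start, end], inclusive."""
    if end < start:
        raise ValueError("end must not be before start")
    names = []
    day = start
    while day <= end:
        names.append(f"applicationlog-{product_line}-{app_name}-{day:%Y.%m.%d}")
        day += timedelta(days=1)
    return names


def keyword_query(field: str, keyword: str) -> str:
    """Build a field: "keyword" query, escaping backslashes and double quotes."""
    escaped = keyword.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}: "{escaped}"'


print(",".join(daily_indices("mall", "order-api", date(2023, 7, 1), date(2023, 7, 2))))
print(keyword_query("message", "timeout"))
```

The resulting names can be joined with commas, since Elasticsearch accepts a comma-separated list of indices in a search request; querying two daily indices instead of applicationlog-* is what keeps the scan small.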
S102, log data are transmitted to the first Kafka data distribution cluster 20, and the log data are distributed to the Storm data analysis cluster 30 through the first Kafka data distribution cluster 20;
Specifically, Kafka is a high-throughput distributed publish-subscribe messaging system (message engine system) that can handle all the action-stream data of users on a website. Such actions (web browsing, searching and other user actions) are a key factor in many social functions on the modern web. Because of throughput requirements, these data are typically handled through log processing and log aggregation. That approach is viable for log data feeding offline analysis systems such as Hadoop, but it falls short when real-time processing is required. Kafka aims to unify online and offline message processing through Hadoop's parallel loading mechanism, and also to provide real-time messages through the cluster.
System A sends a message to Kafka (the message engine system), and system B reads that message from Kafka; Kafka acts as the intermediary.
A messaging system is responsible for transferring data from one application to another, so applications only need to focus on the data rather than on how it is transferred between two or more applications. Distributed messaging relies on reliable message queues to transfer messages asynchronously between client applications and the messaging system. There are two main messaging modes: point-to-point and publish-subscribe. Most messaging systems use the publish-subscribe mode, and Kafka follows the publish-subscribe model.
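To make the publish-subscribe model concrete, here is a toy in-memory broker. It is not the Kafka client API; it only illustrates the property the text relies on: messages are appended to a per-topic log, and each subscriber (modelled, as in Kafka, by a consumer group with its own read offset) receives every message independently of the others. The class and method names are hypothetical.

```python
from collections import defaultdict


class MiniBroker:
    """Toy publish-subscribe broker: an append-only log per topic plus per-group offsets."""

    def __init__(self):
        self.topics = defaultdict(list)   # topic -> list of messages
        self.offsets = defaultdict(int)   # (group, topic) -> next offset to read

    def publish(self, topic, message):
        """System A: append a message and return its offset in the topic log."""
        self.topics[topic].append(message)
        return len(self.topics[topic]) - 1

    def poll(self, group, topic, max_messages=100):
        """System B: read the next batch for this group and advance its offset."""
        start = self.offsets[(group, topic)]
        batch = self.topics[topic][start:start + max_messages]
        self.offsets[(group, topic)] = start + len(batch)
        return batch


broker = MiniBroker()
broker.publish("raw-logs", "line 1")
print(broker.poll("storm", "raw-logs"), broker.poll("archive", "raw-logs"))
```

Because messages are never removed on read, a second group (here "archive") still sees everything the first group ("storm") consumed; in point-to-point delivery, a message would go to only one receiver.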
S103, the Storm data analysis cluster 30 performs stream calculation processing on the received log data to obtain processed log data;
Specifically, Storm is an open-source distributed computing system for processing real-time data streams. Analyzing data in Storm mainly involves the following steps:
Define data sources (Spouts). A Spout is the source of a data stream in Storm and can read from any data source, such as Kafka or RabbitMQ; a Spout needs to be defined to read data from the data source.
Define data processing units (Bolts). Bolts are the main units that process data in Storm. One or more Bolts can be defined to process the data received from Spouts, performing whatever operations are needed, such as filtering, functions, aggregations, joins and database interactions.
Define a topology. A topology is a network of Spouts and Bolts that defines how data flows through the system: it specifies which Bolt receives data from which Spout and how data passes between Bolts.
Deploy and execute the topology. Once defined, the topology can be deployed and executed on a Storm cluster, and Storm automatically distributes and processes the data.
Store the processing results in a database as required, or visualize them on a real-time dashboard for further analysis.
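The Spout, Bolt and topology steps can be illustrated with plain Python generators. This is a single-process sketch, not the Storm API: the JSON log-line format and its field names (ts, app, method, cost_ms, status) are assumptions, and the aggregation chosen (per-minute request count, mean execution time and 5xx count) is one plausible example of the stream computation, not necessarily the one the method uses.

```python
import json
from collections import defaultdict
from typing import Iterable, Iterator


def log_spout(lines: Iterable[str]) -> Iterator[str]:
    """Spout: emits raw tuples; in the described system they would come from Kafka."""
    yield from lines


def parse_bolt(stream: Iterable[str]) -> Iterator[dict]:
    """Bolt 1: parse JSON and drop malformed lines (filtering)."""
    for line in stream:
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue


def minute_aggregate_bolt(stream: Iterable[dict]) -> dict:
    """Bolt 2: per (app, method, minute) request count, mean cost and 5xx count."""
    acc = defaultdict(lambda: {"requests": 0, "total_ms": 0, "errors_5xx": 0})
    for event in stream:
        key = (event["app"], event["method"], event["ts"][:16])  # "YYYY-MM-DDTHH:MM"
        bucket = acc[key]
        bucket["requests"] += 1
        bucket["total_ms"] += event["cost_ms"]
        if event["status"] >= 500:
            bucket["errors_5xx"] += 1
    return {
        key: {"requests": b["requests"],
              "avg_ms": b["total_ms"] / b["requests"],
              "errors_5xx": b["errors_5xx"]}
        for key, b in acc.items()
    }


def run_topology(lines):
    # Topology wiring: spout -> parse_bolt -> minute_aggregate_bolt
    return minute_aggregate_bolt(parse_bolt(log_spout(lines)))
```

In a real Storm deployment each stage would run as many parallel tasks and the aggregation Bolt would need a fields grouping on (app, method) so that one key always lands on the same task; the generator chain hides that distribution.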
S104, the processed log data is transmitted to the second Kafka data distribution cluster 40, and the processed log data is distributed to the document type storage engine for data storage through the second Kafka data distribution cluster 40.
S105, graphically processing the stored processed log data, determining whether the processed log data contains abnormal values, and if so, acquiring system problem information based on the abnormal values and the log data.
Specifically, after the computed performance data is persisted, it can be compared graphically. When performance changes are displayed visually, the points at which performance changed can be found quickly, which helps guide optimization.
As shown in FIG. 3, performance on June 18 is initially better than on June 16, but at 2:00 (the first red circle) the time consumption rises sharply, indicating that an event at that moment degraded the service's performance. At 3:30 (the second red circle) the time consumption recovers. It can therefore be concluded that an event affecting performance occurred between 2:00 and 3:30.
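The reasoning in the FIG. 3 example (compare against a reference day, find where time consumption jumps and where it recovers) can be expressed as a small comparison routine. The function name, the "HH:MM" slot format and the default ratio of 1.5 are assumptions chosen for illustration.

```python
def degraded_windows(current, baseline, ratio=1.5):
    """Return (first_slot, last_slot) ranges where current latency > ratio * baseline.

    current / baseline: dict of "HH:MM" slot -> average latency in ms.
    Slots are compared in sorted order and consecutive flagged slots are merged.
    """
    windows, start, prev = [], None, None
    for slot in sorted(current):
        base = baseline.get(slot)
        flagged = base is not None and base > 0 and current[slot] > ratio * base
        if flagged and start is None:
            start = slot                    # degradation begins
        if not flagged and start is not None:
            windows.append((start, prev))   # recovered at this slot
            start = None
        prev = slot
    if start is not None:
        windows.append((start, prev))       # still degraded at the end of the data
    return windows


baseline = {"01:30": 50, "02:00": 50, "02:30": 50, "03:00": 50, "03:30": 50}
current = {"01:30": 48, "02:00": 120, "02:30": 130, "03:00": 110, "03:30": 52}
print(degraded_windows(current, baseline))
```

With 30-minute slots, the flagged window ("02:00", "03:00") plus the first recovered slot 03:30 brackets the period in which to look for the causing event.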
As shown in FIG. 4, traffic monitoring supports multi-date comparison: traffic changes can be perceived visually through the graphical interface and traffic peaks and valleys found quickly, providing a traffic dimension for problem localization. It also provides data support for traffic forecasting during large-scale events, making it easier to estimate service capacity.
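A minimal sketch of the two traffic views, assuming per-slot request counts are already available from the stored results: finding the peak and valley of one day, and computing the ratio of one day's traffic to a reference day's. Function names are hypothetical.

```python
def peak_and_valley(series):
    """series: list of (slot, request_count). Returns ((peak_slot, n), (valley_slot, n))."""
    if not series:
        raise ValueError("series is empty")
    peak = max(series, key=lambda item: item[1])
    valley = min(series, key=lambda item: item[1])
    return peak, valley


def day_over_day_ratio(today, reference):
    """Per-slot ratio today / reference, skipping slots with no reference traffic."""
    return {slot: today[slot] / reference[slot]
            for slot in today if reference.get(slot)}


print(peak_and_valley([("10:00", 120), ("20:00", 900), ("04:00", 15)]))
```

The ratio view is the numeric counterpart of overlaying two dates on one chart; a ratio well above 1 at the peak slot is the kind of signal used for capacity estimation before a large event.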
The system provides monitoring of abnormal values to find changes in anomalies. Anomalies are classified into business anomalies and system anomalies: business anomalies are those that need to be monitored at the business level, such as insufficient inventory or frequent logins; system anomalies are system-level anomalies, such as network anomalies or service-unavailable anomalies.
By means of the anomaly monitoring, anomaly changes within a period of time can be quickly found, and by means of anomaly values and combination with logs, system problems can be quickly located.
As shown in fig. 5, it can be found that both system anomalies and business anomalies suddenly increased during the 2:06 to 3:36 period and lasted for 1.5 hours.
The exception code corresponding to each anomaly type is determined, and the abnormal condition corresponding to each exception code is monitored to determine the anomaly type corresponding to the system problem.
As shown in FIG. 6, anomaly monitoring reveals system and business anomalies but not their specific types, so the anomaly types need to be refined to discover anomalies at a finer granularity; anomaly subdivision monitoring provides this. Anomalies can be distinguished by their exception codes, so that the abnormal condition corresponding to each exception code is monitored.
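The two levels of anomaly monitoring, coarse counts by anomaly kind (as in FIG. 5) and finer counts by exception code (as in FIG. 6), can be sketched as two groupings over the same events. The ExceptionEvent fields and the example codes are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ExceptionEvent:
    minute: str   # "HH:MM"
    kind: str     # "business" or "system"
    code: str     # hypothetical exception code, e.g. "STOCK_NOT_ENOUGH"


def counts_by_kind(events):
    """Coarse view: per-minute counts of business vs system anomalies."""
    return Counter((e.minute, e.kind) for e in events)


def counts_by_code(events, kind=None):
    """Fine view: counts per exception code, optionally restricted to one kind."""
    return Counter(e.code for e in events if kind is None or e.kind == kind)


events = [
    ExceptionEvent("02:06", "system", "NET_TIMEOUT"),
    ExceptionEvent("02:06", "system", "NET_TIMEOUT"),
    ExceptionEvent("02:07", "business", "STOCK_NOT_ENOUGH"),
]
print(counts_by_code(events, "system").most_common(3))
```

The coarse view tells you that system anomalies spiked; the fine view then tells you which code drove the spike, which is the point where the logs for that code can be pulled.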
S106, sending out alarm prompt information based on the system problem information;
specifically, sending channels for the alarm prompt information are configured, including SMS prompts, email prompts and WeChat prompts;
configuring alarm rules;
the alarm rules include: an alarm is triggered when the current per-minute request volume exceeds a first preset value; an early warning is triggered when the current system abnormality rate exceeds a second preset value; an early warning is triggered when the current business abnormality rate exceeds a third preset value; an early warning is triggered when the current average execution time exceeds a fourth preset value; an alarm is triggered when the current response time exceeds a fifth preset value; an alarm is triggered when the current average growth rate of the per-minute request volume exceeds a sixth preset value; an alarm is triggered when the period-over-period growth rate of the current response time exceeds a seventh preset value; and an alarm is triggered when the period-over-period growth rate of the current per-minute request volume exceeds an eighth preset value. Preferably, the first to eighth preset values are 150%.
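The rule list can be sketched as data plus one evaluation loop. Assumptions: metric names and input fields are hypothetical; period-over-period (ring) growth is computed as (current - previous) / previous; thresholds are plain numbers, so a 150% growth threshold is written as 1.5. The sixth rule (average growth rate of the per-minute request volume) is left out because the text does not define its averaging window.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    metric: str       # key into the metrics dict (hypothetical names)
    threshold: float  # the rule's preset value
    level: str        # "alarm" or "early_warning", as worded in each rule


def growth_rate(current: float, previous: float) -> float:
    """Period-over-period growth: (current - previous) / previous."""
    if previous == 0:
        return float("inf") if current > 0 else 0.0
    return (current - previous) / previous


def build_metrics(cur: dict, prev: dict) -> dict:
    """cur / prev: stats for the current and previous period (hypothetical fields)."""
    requests = cur["requests"]
    return {
        "requests_per_minute": requests,
        "system_error_rate": cur["system_errors"] / requests if requests else 0.0,
        "business_error_rate": cur["business_errors"] / requests if requests else 0.0,
        "avg_execution_ms": cur["avg_execution_ms"],
        "response_ms": cur["response_ms"],
        "response_ms_ring_growth": growth_rate(cur["response_ms"], prev["response_ms"]),
        "requests_ring_growth": growth_rate(requests, prev["requests"]),
    }


def evaluate(rules, metrics):
    """Return (rule, value) for every rule whose metric exceeds its threshold."""
    fired = []
    for rule in rules:
        value = metrics.get(rule.metric)
        if value is not None and value > rule.threshold:
            fired.append((rule, value))
    return fired
```

Keeping rules as data rather than code is what allows the "flexible configuration" of rules: adding or retuning a rule means editing a Rule entry, not the evaluation loop.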
The alarm prompt information comprises alarm product line information, alarm application name information, alarm method information, alarm value information, alarm description information and trigger time information.
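The six message fields and the configurable channels can be sketched as below. The channel senders are stand-ins (any callable that accepts text); real SMS, email or WeChat delivery is not shown, and all names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AlarmMessage:
    product_line: str
    app_name: str
    method: str
    value: str
    description: str
    trigger_time: datetime

    def render(self) -> str:
        """Plain-text body carrying the six fields listed in the text."""
        return "\n".join([
            f"Product line: {self.product_line}",
            f"Application: {self.app_name}",
            f"Method: {self.method}",
            f"Alarm value: {self.value}",
            f"Description: {self.description}",
            f"Triggered at: {self.trigger_time:%Y-%m-%d %H:%M:%S}",
        ])


def dispatch(message, channels, enabled):
    """Send the rendered text to each enabled channel; return the channels used."""
    text = message.render()
    sent = []
    for name in enabled:
        sender = channels.get(name)
        if sender is None:
            continue  # channel enabled in the rule but not configured
        sender(text)
        sent.append(name)
    return sent
```

Separating message rendering from delivery lets one alarm go out over several channels with identical content.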
The console makes the changes and comparisons of traffic, anomalies and so on easy to see, but alarm capability is still needed to perceive anomalies quickly when they occur. The scheme provides multi-dimensional monitoring and alarm rules covering per-minute request volume, response time, system abnormality rate, business abnormality rate, HTTP 500 and 404 error rates, and their week-over-week and period-over-period comparisons, and supports flexible configuration of rules and notification methods. Notification methods include email, WeCom (Enterprise WeChat), SMS and so on.
According to the problem alarm method of the software system, log data is collected through Filebeat installed at a client; the log data is transmitted to the first Kafka data distribution cluster 20 and distributed through it to the Storm data analysis cluster 30; the Storm data analysis cluster 30 performs stream computation on the received log data to obtain processed log data; the processed log data is transmitted to the second Kafka data distribution cluster 40 and distributed through it to a document-oriented storage engine for storage; the stored processed log data is processed graphically to determine whether it contains an abnormal value, and if so, system problem information is acquired based on the abnormal value and the log data; and alarm prompt information is sent out based on the system problem information. The method solves the problem in the prior art that faults occurring while a software system is running cannot be found, located and resolved quickly.
FIG. 2 is an architecture diagram of an embodiment of the problem alarm system of the software system of the present invention; as shown in FIG. 2, the problem alarm system of a software system provided by the embodiment of the invention includes:
the Filebeat module 10, installed at the client, for collecting log data;
a first Kafka data distribution cluster 20 for distributing said log data to a Storm data analysis cluster 30;
a Storm data analysis cluster 30 for performing a stream computation process on the received log data to obtain processed log data;
a second Kafka data distribution cluster 40 for distributing the processed log data to a document storage engine for data storage;
an outlier obtaining module 50, configured to graphically process the stored processed log data, determine whether the processed log data has an outlier, and if yes, obtain system problem information based on the outlier and the log data;
the alarm prompting module 60 is configured to issue alarm prompting information based on the system problem information.
The Filebeat module 10 is further configured to:
acquire user information, and write the user information and the log collection parameters configured by the user into the Filebeat default configuration file;
and verify the user information when Filebeat is installed on the client.
After Filebeat starts successfully, it interacts with the first Kafka data distribution cluster 20 to transmit data.
The log data is classified by application scenario into application log data and performance log data;
service application information is recorded through the application log data, and business abnormalities are monitored based on the service application information;
and interface-access performance information is recorded through the performance log data, and system abnormalities are monitored based on that performance information.
The alarm prompt module 60 is further configured to:
configure alarm rules;
the alarm rules comprising: triggering an alarm when the current per-minute request volume exceeds a first preset value;
triggering an early warning when the current system exception rate exceeds a second preset value;
triggering an early warning when the current business exception rate exceeds a third preset value;
triggering an early warning when the current average execution time exceeds a fourth preset value;
triggering an alarm when the current response time exceeds a fifth preset value;
triggering an alarm when the current average growth rate of the per-minute request volume exceeds a sixth preset value;
triggering an alarm when the period-over-period growth rate of the current response time exceeds a seventh preset value;
and triggering an alarm when the period-over-period growth rate of the current per-minute request volume exceeds an eighth preset value.
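The eight rules are threshold comparisons, three of them on derived growth rates. A minimal sketch of rule evaluation follows. The period-over-period growth rate is computed as (current - previous) / previous. Reading the "average growth rate" of the sixth rule as growth relative to the mean of earlier minutes is an assumption, as are all metric names and the placeholder threshold values, which the patent leaves to configuration.

```python
from dataclasses import dataclass


def growth_rate(current: float, previous: float) -> float:
    """Period-over-period growth: (current - previous) / previous; 0 when there is no baseline."""
    if previous == 0:
        return 0.0
    return (current - previous) / previous


@dataclass
class Rule:
    metric: str       # key into the metrics dict (hypothetical names)
    threshold: float  # the "N-th preset value" (placeholder numbers)
    level: str        # "alarm" or "early_warning"


RULES = [
    Rule("minute_requests", 10_000, "alarm"),               # 1st preset value
    Rule("system_exception_rate", 0.05, "early_warning"),   # 2nd
    Rule("business_exception_rate", 0.10, "early_warning"), # 3rd
    Rule("avg_exec_ms", 500, "early_warning"),              # 4th
    Rule("response_ms", 2_000, "alarm"),                    # 5th
    Rule("request_avg_growth", 1.0, "alarm"),               # 6th (interpretation assumed)
    Rule("response_pop_growth", 0.5, "alarm"),              # 7th
    Rule("request_pop_growth", 1.0, "alarm"),               # 8th
]


def derive_metrics(cur: dict, prev: dict, history_requests: list) -> dict:
    """Add growth-rate metrics derived from the previous minute and earlier history."""
    m = dict(cur)
    m["response_pop_growth"] = growth_rate(cur["response_ms"], prev["response_ms"])
    m["request_pop_growth"] = growth_rate(cur["minute_requests"], prev["minute_requests"])
    baseline = sum(history_requests) / len(history_requests) if history_requests else 0.0
    m["request_avg_growth"] = growth_rate(cur["minute_requests"], baseline)
    return m


def evaluate(metrics: dict, rules=RULES):
    """Return (level, metric, value, threshold) for every rule whose value exceeds its threshold."""
    return [(r.level, r.metric, metrics.get(r.metric, 0), r.threshold)
            for r in rules if metrics.get(r.metric, 0) > r.threshold]
```

For example, a minute with 3000 requests after a minute with 1000 has a period-over-period growth rate of (3000 - 1000) / 1000 = 2.0, which exceeds an eighth preset value of 1.0.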
The exception code corresponding to each exception type is determined, and the abnormal conditions associated with each exception code are monitored so as to determine the exception type of the system problem.
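Mapping exception codes to exception types and counting occurrences per type could be sketched as below. The code table (`E1001` and so on), the type names and the "most frequent type wins" policy are hypothetical; the patent lists no concrete codes and does not say how the type of a system problem is chosen when several types occur.

```python
from collections import Counter

# Hypothetical code table; the patent does not list concrete codes.
EXCEPTION_TYPES = {
    "E1001": "database",
    "E1002": "database",
    "E2001": "network",
    "E3001": "business_validation",
}


def dominant_exception_type(codes):
    """Count occurrences per exception type and return the most frequent one (or None)."""
    per_type = Counter(EXCEPTION_TYPES.get(code, "unknown") for code in codes)
    if not per_type:
        return None, per_type
    return per_type.most_common(1)[0][0], per_type
```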
The alarm prompt module 60 is further configured to:
configure delivery channels for the alarm prompt information, the delivery channels comprising SMS, email, and WeChat notifications;
the alarm prompt information comprising the alarming product line, application name, method, alarm value, alarm description, and trigger time.
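The six fields of the alarm prompt information and the per-channel delivery can be sketched as a message type plus a dispatcher. This is an illustration under assumptions: the real SMS, email and WeChat senders are replaced by caller-supplied callables, and `AlarmMessage`, `render` and `dispatch` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, List


@dataclass
class AlarmMessage:
    product_line: str       # alarming product line
    application: str        # application name
    method: str             # method that triggered the alarm
    value: float            # alarm value
    description: str        # alarm description
    triggered_at: datetime  # trigger time

    def render(self) -> str:
        return (f"[{self.product_line}/{self.application}] {self.method}: "
                f"{self.description} (value={self.value}, "
                f"at {self.triggered_at:%Y-%m-%d %H:%M:%S})")


def dispatch(msg: AlarmMessage, channels: Dict[str, Callable[[str], None]],
             enabled: List[str]) -> List[str]:
    """Send the rendered message on every enabled, configured channel; return those used."""
    text = msg.render()
    sent = []
    for name in enabled:
        send = channels.get(name)
        if send is None:
            continue  # channel enabled but not configured
        send(text)
        sent.append(name)
    return sent
```

Keeping the senders as plain callables lets each channel be configured or disabled independently, which matches the idea of configurable delivery channels.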
In the problem alarm system for a software system described above, log data are collected by the Filebeat module 10 installed on the client; the first Kafka data distribution cluster 20 distributes the log data to the Storm data analysis cluster 30; the Storm data analysis cluster 30 performs stream computation on the received log data to obtain processed log data; the second Kafka data distribution cluster 40 distributes the processed log data to a document-oriented storage engine for storage; the abnormal value acquisition module 50 graphically processes the stored processed log data, determines whether the processed log data contain an abnormal value, and if so, obtains system problem information based on the abnormal value and the log data; and the alarm prompt module 60 issues alarm prompt information based on the system problem information. The system thereby addresses the prior-art problem that faults arising while a software system is running cannot be quickly detected, located, and resolved.
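The description does not state how module 50 decides whether the processed log data contain an abnormal value. As one illustrative choice, not taken from the patent, a z-score test over a per-minute metric series flags points lying more than a chosen number of standard deviations from the mean. The function name and the default threshold of 3.0 are assumptions.

```python
from statistics import mean, pstdev


def zscore_outliers(series, threshold=3.0):
    """Return indices whose |x - mean| / stdev exceeds threshold (population stdev)."""
    if len(series) < 2:
        return []
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        return []  # a constant series has no outliers under this test
    return [i for i, x in enumerate(series) if abs(x - mu) / sigma > threshold]
```

With a population standard deviation, no point in a series of n values can exceed a z-score of sqrt(n - 1), so very short windows cannot trigger a threshold of 3.0; a longer history or a robust statistic such as the median absolute deviation may suit production use better.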
FIG. 7 is a schematic diagram of the physical structure of an electronic device according to an embodiment of the present invention. As shown in FIG. 7, the electronic device 70 comprises a processor 701, a memory 702, and a bus 703;
the processor 701 and the memory 702 communicate with each other via the bus 703;
the processor 701 is configured to invoke program instructions in the memory 702 to perform the methods provided by the method embodiments above, for example: collecting log data through Filebeat installed on a client; transmitting the log data to the first Kafka data distribution cluster 20, which distributes the log data to the Storm data analysis cluster 30; performing, by the Storm data analysis cluster 30, stream computation on the received log data to obtain processed log data; transmitting the processed log data to the second Kafka data distribution cluster 40, which distributes the processed log data to a document-oriented storage engine for storage; graphically processing the stored processed log data, determining whether the processed log data contain an abnormal value, and if so, obtaining system problem information based on the abnormal value and the log data; and issuing alarm prompt information based on the system problem information.
This embodiment provides a non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the methods provided by the method embodiments above, for example: collecting log data through Filebeat installed on a client; transmitting the log data to a first Kafka data distribution cluster, which distributes the log data to a Storm data analysis cluster; performing, by the Storm data analysis cluster, stream computation on the received log data to obtain processed log data; transmitting the processed log data to a second Kafka data distribution cluster, which distributes the processed log data to a document-oriented storage engine for storage; graphically processing the stored processed log data, determining whether the processed log data contain an abnormal value, and if so, obtaining system problem information based on the abnormal value and the log data; and issuing alarm prompt information based on the system problem information.
Those of ordinary skill in the art will appreciate that all or some of the steps of the above method embodiments may be implemented by hardware instructed by a program. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes any medium capable of storing program code, such as ROM, RAM, a magnetic disk, or an optical disc.
The apparatus embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of this embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware. On this understanding, the technical solution above, in essence or in the part contributing to the prior art, may be embodied as a software product stored in a computer-readable storage medium such as ROM/RAM, a magnetic disk, or an optical disc, and including instructions that cause a computer device (a personal computer, a server, a network device, etc.) to execute the methods of the embodiments or of parts of the embodiments.
Although the invention has been described in detail above by way of a general description and specific embodiments, it will be apparent to those skilled in the art that modifications or improvements can be made on this basis. Such modifications or improvements, made without departing from the spirit of the invention, fall within the scope of protection claimed.
Claims (10)
1. A problem alarm method for a software system, characterized by comprising the following steps:
collecting log data through Filebeat installed on a client;
transmitting the log data to a first Kafka data distribution cluster, and distributing the log data to a Storm data analysis cluster through the first Kafka data distribution cluster;
performing, by the Storm data analysis cluster, stream computation on the received log data to obtain processed log data;
transmitting the processed log data to a second Kafka data distribution cluster, and distributing the processed log data to a document-oriented storage engine for storage through the second Kafka data distribution cluster;
graphically processing the stored processed log data, determining whether the processed log data contain an abnormal value, and if so, obtaining system problem information based on the abnormal value and the log data;
and issuing alarm prompt information based on the system problem information.
2. The method according to claim 1, wherein collecting log data through Filebeat installed on a client comprises:
acquiring user information, and writing the user information and the log collection parameters configured by the user into the Filebeat default configuration file;
and verifying the user information when Filebeat is installed on the client.
3. The method according to claim 1, wherein collecting log data through Filebeat installed on a client further comprises:
after Filebeat has started successfully, carrying the user information when interacting with the first Kafka data distribution cluster for data transmission.
4. The method according to claim 1, wherein collecting log data through Filebeat installed on a client further comprises:
classifying the log data by application scenario into application log data and performance log data;
recording business application information through the application log data, and monitoring business exceptions based on the business application information;
and recording interface access performance information through the performance log data, and monitoring system exceptions based on the performance information.
5. The method according to claim 4, wherein graphically processing the stored processed log data, determining whether the processed log data contain an abnormal value, and if so, obtaining system problem information based on the abnormal value and the log data comprises:
determining the exception code corresponding to each exception type, and monitoring the abnormal conditions associated with each exception code to determine the exception type of the system problem.
6. The method according to claim 1, wherein issuing alarm prompt information based on the system problem information comprises:
configuring alarm rules;
the alarm rules comprising: triggering an alarm when the current per-minute request volume exceeds a first preset value;
triggering an early warning when the current system exception rate exceeds a second preset value;
triggering an early warning when the current business exception rate exceeds a third preset value;
triggering an early warning when the current average execution time exceeds a fourth preset value;
triggering an alarm when the current response time exceeds a fifth preset value;
triggering an alarm when the current average growth rate of the per-minute request volume exceeds a sixth preset value;
triggering an alarm when the period-over-period growth rate of the current response time exceeds a seventh preset value;
and triggering an alarm when the period-over-period growth rate of the current per-minute request volume exceeds an eighth preset value.
7. The method according to claim 6, wherein issuing alarm prompt information based on the system problem information further comprises:
configuring delivery channels for the alarm prompt information, the delivery channels comprising SMS, email, and WeChat notifications;
the alarm prompt information comprising the alarming product line, application name, method, alarm value, alarm description, and trigger time.
8. A problem alarm system for a software system, comprising:
a Filebeat module, installed on a client, for collecting log data;
a first Kafka data distribution cluster, for distributing the log data to a Storm data analysis cluster;
the Storm data analysis cluster, for performing stream computation on the received log data to obtain processed log data;
a second Kafka data distribution cluster, for distributing the processed log data to a document-oriented storage engine for storage;
an abnormal value acquisition module, for graphically processing the stored processed log data, determining whether the processed log data contain an abnormal value, and if so, obtaining system problem information based on the abnormal value and the log data;
and an alarm prompt module, for issuing alarm prompt information based on the system problem information.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any one of claims 1 to 7 when the computer program is executed.
10. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method according to any of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310895691.5A CN116991661A (en) | 2023-07-20 | 2023-07-20 | Problem alarm system and method for software system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116991661A (en) | 2023-11-03 |
Family ID: 88527710
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310895691.5A Pending CN116991661A (en) | 2023-07-20 | 2023-07-20 | Problem alarm system and method for software system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116991661A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110309030A (en) * | 2019-07-05 | 2019-10-08 | 亿玛创新网络(天津)有限公司 | Log analysis monitoring system and method based on ELK and Zabbix |
CN110347716A (en) * | 2019-05-27 | 2019-10-18 | 中国平安人寿保险股份有限公司 | Daily record data processing method, device, terminal and storage medium |
CN113157545A (en) * | 2021-05-20 | 2021-07-23 | 京东方科技集团股份有限公司 | Method, device and equipment for processing service log and storage medium |
US20220309053A1 (en) * | 2021-06-25 | 2022-09-29 | Apollo Intelligent Connectivity (Beijing) Technology Co., Ltd. | Method and apparatus of auditing log, electronic device, and medium |
CN116414795A (en) * | 2023-04-04 | 2023-07-11 | 中国民航信息网络股份有限公司 | Ticket data processing method and device |
- 2023-07-20: application CN202310895691.5A filed in CN, published as CN116991661A; status: active, pending
Similar Documents
Publication | Title |
---|---|
CN110661659B (en) | Alarm method, device and system and electronic equipment |
CN108365985A | A kind of cluster management method, device, terminal device and storage medium |
US20190235941A1 | Self-monitor for computing devices of a distributed computing system |
CN110309030A | Log analysis monitoring system and method based on ELK and Zabbix |
CN110888783A | Monitoring method and device of micro-service system and electronic equipment |
CN106940677A | One kind application daily record data alarm method and device |
CN110750426A | Service state monitoring method and device, electronic equipment and readable storage medium |
CN111782477B | Abnormal log monitoring method and device, computer equipment and storage medium |
CN114124655A | Network monitoring method, system, device, computer equipment and storage medium |
CN113220534A | Cluster multi-dimensional anomaly monitoring method, device, equipment and storage medium |
CN116991661A | Problem alarm system and method for software system |
CN116594840A | Log fault acquisition and analysis method, system, equipment and medium based on ELK |
CN116566873A | ELK-based automatic log analysis method, system and storage medium |
CN116431324A | Edge system based on Kafka high concurrency data acquisition and distribution |
NL2030719B1 | Microservice application observability system |
CN114866606A | Micro-service management system |
CN115391286A | Link tracking data management method, device, equipment and storage medium |
CN113765717A | Operation and maintenance management system based on secret-related special computing platform |
Yuan et al. | Design and implementation of accelerator control monitoring system |
CN110896545B | Online charging roaming fault positioning method, related device and storage medium |
CN112260902A | Network equipment monitoring method, device, equipment and storage medium |
CN112131077A | Fault node positioning method and device and database cluster system |
CN114090382B | Health inspection method and device for super-converged cluster |
CN116431872B | Observable system and service observing method based on observable system |
US20240077866A1 | Information management apparatus, information management method, and computer-readable recording medium |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |