CN113254309B - Active early warning system and method for errors of service system

Publication number
CN113254309B
Authority
CN
China
Prior art keywords
log
link
error
current
client
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110571651.6A
Other languages
Chinese (zh)
Other versions
CN113254309A (en
Inventor
叶荔姗
施建安
林斌
孙志伟
林静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Yilianzhong Yihui Technology Co ltd
Original Assignee
Xiamen Yilianzhong Yihui Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Yilianzhong Yihui Technology Co ltd
Priority to CN202110571651.6A
Publication of CN113254309A
Application granted
Publication of CN113254309B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G06F11/3072 Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G06F11/3068 Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data format conversion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/1805 Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815 Journaling file systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Debugging And Monitoring (AREA)
  • Computer And Data Communications (AREA)

Abstract

The embodiment of the invention provides a system and a method for active early warning of service system errors. The system comprises: a client, configured to create a tracking number for each session when the session starts, associate the tracking number with the logs output by the session so that the complete logs of the session across the whole request link are concatenated in order to form a link log, and, on detecting that the service system outputs an error log, extract the complete link log of the current session using the error code position of the current error log as the cut-off point and send the link log to the server; and a server, configured to receive and store the link logs uploaded by the client and concatenated by tracking number, classify errors according to the code position that triggered the error in the link log, and apply different processing logic according to the error grade. The invention can actively discover and report service errors generated by the service system.

Description

Active early warning system and method for errors of service system
Technical Field
The invention relates to the technical field of computers, in particular to a system and a method for actively early warning errors of a service system.
Background
In order to ensure the stable operation of the service system, hardware environment information, software context, service processing information, and the like of the service system need to be monitored to know the instant operation status of the service system, so as to determine whether the system operates stably.
Monitoring and analysis can be divided, by dimension, into instant data display and after-the-fact data collection. Instant data display mainly comprises hardware environment information acquisition, health checks of system access links, and the like. After-the-fact data collection includes solutions such as the ELK log collection platform. Log analysis is a very important part of a monitoring system: by storing and analyzing logs, the operating details of the system can be known. After the system exhibits an error, log analysis can reveal the service parameters, the processing flow, the timing trend, and the final fault information at the fault site.
For a business system, common monitoring and information collecting means mainly include:
Environmental resource monitoring systems: typical representatives are Zabbix and Prometheus. Such systems mainly monitor hardware resources, commonly including CPU utilization, memory utilization, port status, disk utilization, disk occupancy, and the like.
Microservice call-link tracing systems: examples include SkyWalking, which complies with the OpenTracing standard, and Pinpoint, which follows its own approach. Such systems track the link relationships in distributed invocations, making it easier to see the time consumed by each segment of the link and by downstream calls.
Log collection and classification systems: a typical representative is the ELK stack. It mainly collects the log information in disk files to a central end for unified storage and online retrieval.
During operation of a business system, abnormal conditions fall roughly into the following categories:
a service system node is down;
sporadic service exceptions occur in the service system;
the whole service of the business system is unavailable externally.
For the downtime case, the environmental resource monitoring system may detect a sharp drop in resource consumption, such as lower CPU utilization or memory occupancy. The microservice call-link tracing system can show that the number of real-time call requests has dropped sharply, which prompts further checking, but the downtime cannot be discovered immediately. The log collection system cannot respond to the situation or give an early warning, because it has no analysis capability.
For sporadic service exceptions, the environmental resource monitoring system monitors hardware resource usage, and a service exception has no effect on resource usage, so that system cannot detect the problem at all. For the microservice call-link tracing system, a sporadic service failure appears only as one or two interrupted requests or error responses, while the call link itself remains intact, so such a system can neither discover nor locate the failure. For the log collection system, the service error is usually recorded in the log, but the system only stores logs, so the problem is found only if someone reviews the logs manually afterwards; the system itself cannot discover the problem immediately or give an early warning.
For the case where the whole service of the business system is unavailable externally, the service system keeps running and only its external service is unavailable, so hardware consumption is roughly unchanged and the hardware monitoring system cannot find the problem. For the microservice call-link tracing system, the number of requests drops sharply, so an alarm can be raised, but the source of the problem cannot be located. The log collection system only collects logs and is not aware of the situation.
It can be seen that, in the second case, i.e., sporadic service abnormality, the three systems cannot realize timely discovery and effective positioning. The sporadic service abnormality is the most frequent condition, and the sporadic service abnormality does not affect the stability of the service, but affects the user experience of the service. Since the occasional abnormal requests account for a small proportion of the total requests, the effect is poor even if manual backtracking is performed afterwards through the log aggregation system.
Disclosure of Invention
In view of the above, the present invention provides a system and a method for actively warning a business system error to solve the above problems.
The embodiment of the invention provides a business system error active early warning system, which comprises a client used for data acquisition and a server used for analysis and warning, wherein:
the client is used for creating a tracking number for each session when the session starts, associating the tracking number with the logs output by the session so that the complete logs of the session across the whole request link are concatenated in order to form a link log, and, when detecting that the service system outputs an error log, extracting the complete link log of the current session using the error code position of the current error log as the cut-off point and sending the link log to the server;
and the server is used for receiving and storing the link logs uploaded by the client and concatenated by tracking number, performing error classification according to the code position that triggered the error in the link log, and performing different logic processing according to different error grades.
Preferably, the client includes: the log interface agent layer, the hot update layer and the function implementation layer, wherein:
the log interface agent layer is used for presenting itself as a log outputter in the service system, so that the client is integrated into the service system and receives its log output;
the hot update layer is used for providing a transition bridge between the log interface agent layer and the function implementation layer, so that the function implementation layer can be hot-updated at run time;
and the function implementation layer is used for creating a tracking number for each session, concatenating the logs of the session by tracking number so that the link log is stored completely, and extracting the complete link log for uploading when an error log is output.
Preferably, the hot update layer is specifically configured to:
receiving the current latest version number carried in the heartbeat response;
determining whether the latest version number is newer than the local version number;
if so, downloading the jar package file from the specified interface of the server to the specified path of the client; the downloading path takes the root path as a starting point and takes the latest version number as the name of the folder;
creating a URLClassLoader instance to load the jar package file in the folder;
according to the class name in the latest version information, loading the corresponding class with the ClassLoader, and instantiating an object of the class by reflection;
directing interface variables in the log interface proxy layer to the newly instantiated object;
and calling the shutdown method of the original object to close the resource.
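The version check and download-path steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the class name HotUpdatePlanner, its method names, and the assumption that version numbers are dot-separated integers (for example 1.10.0) are all hypothetical, since the text does not specify the version format.

```java
import java.nio.file.Path;

// Hypothetical helper; names and version format are assumptions for illustration.
final class HotUpdatePlanner {

    /** Returns true when latest is strictly newer than local (dot-separated integers assumed). */
    static boolean isNewer(String latest, String local) {
        String[] a = latest.split("\\.");
        String[] b = local.split("\\.");
        int n = Math.max(a.length, b.length);
        for (int i = 0; i < n; i++) {
            // Missing trailing parts count as 0, so "1.2" equals "1.2.0".
            int x = i < a.length ? Integer.parseInt(a[i]) : 0;
            int y = i < b.length ? Integer.parseInt(b[i]) : 0;
            if (x != y) {
                return x > y;
            }
        }
        return false;
    }

    /** Download folder: starts at the root path and is named after the latest version. */
    static Path downloadDir(Path root, String latestVersion) {
        return root.resolve(latestVersion);
    }
}
```

Comparing the parts numerically rather than as strings avoids treating 1.9 as newer than 1.10.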
Preferably, the function implementation layer is specifically configured to:
acquiring current parameters; the current parameters comprise a tracking number and log contents;
acquiring a log object of a current thread;
judging whether the log object exists or not;
if the log object does not exist, initializing a log object by using the tracking number, adding the log content into a queue of the log object, and setting the log object into the current thread;
if yes, judging whether the tracking number in the log object is consistent with the tracking number in the current input parameters;
if not, initializing a log object by using the tracking number, adding the log content into a queue of the log object, and setting the log object into the current thread;
and if they are consistent, adding the log content into the queue of the log object.
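The binding steps above can be sketched with a ThreadLocal, as follows. The class names SessionLog and SessionLogBinder and their methods are hypothetical, and the queue is left unbounded here for brevity.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical log object: one tracking number plus the queue of log contents.
final class SessionLog {
    final String traceId;
    final Deque<String> lines = new ArrayDeque<>();

    SessionLog(String traceId) {
        this.traceId = traceId;
    }
}

// Hypothetical binder that follows the steps listed above.
final class SessionLogBinder {
    private static final ThreadLocal<SessionLog> CURRENT = new ThreadLocal<>();

    /** Current parameters: the tracking number and the log content. */
    static void append(String traceId, String content) {
        SessionLog log = CURRENT.get();
        // No log object yet, or it belongs to another session: start a fresh one.
        if (log == null || !log.traceId.equals(traceId)) {
            log = new SessionLog(traceId);
            CURRENT.set(log);
        }
        log.lines.addLast(content);
    }

    /** Copy of the link log held by the current thread, oldest entry first. */
    static List<String> snapshot() {
        SessionLog log = CURRENT.get();
        return log == null ? new ArrayList<>() : new ArrayList<>(log.lines);
    }

    static void clear() {
        CURRENT.remove();
    }
}
```

Because a mismatching tracking number replaces the stored object, logs left behind by an earlier session on a reused thread are discarded automatically.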
Preferably, the function implementation layer is further configured to:
acquiring and counting the total request number of the service system, and reporting the total request number to the server, so that the server can estimate and analyze the operating condition from the proportion of errors in the total request number; the total request number is counted by an HTTP interceptor that increments a counter by 1 for each incoming request; the reporting of the total request number is triggered by a timed task.
Preferably, the server includes:
the data uniform receiving layer is used for arranging the data sent by the client into a uniform format;
the data pipeline distribution processing coordination layer is used for constructing a data processing pipeline and driving the data through the pipeline;
the processing module layer is used for coordinating the corresponding modules to process the data according to the processing sequence so as to realize the pipeline processing effect; the processing module layer comprises a heartbeat instruction processor, a monitoring instruction processor, a link log classifying and storing processor, a link log regularizing processor and a link log alarm processor.
Preferably, the link log classifying and storing processor is specifically configured to:
aggregating the link logs by error code position and displaying them together; the link log reported by the client carries error code position information, which serves as the error position identifier; the identifier is extracted and written directly to storage, so that subsequent queries can group the information by error position;
storing the link logs in two tables, namely a current unprocessed link log table and a processing record table; the link log reported by the client is stored in the current unprocessed link log table, and once a log in that table has been handled and the result fed back, the log information is moved from the current unprocessed link log table to the processing record table; the two tables are queried separately; in the current unprocessed link log table the link logs are not aggregated in advance, and the grouping shown on the server-side interface is produced by a group query executed at query time.
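The two-table arrangement can be illustrated with an in-memory stand-in, as sketched below. The lists substitute for the database tables, and the names LinkLogRecord, LinkLogStore, queryGrouped and markProcessed are hypothetical; the text does not describe the actual storage engine or schema.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical record: the error code position is stored as its own field on write.
final class LinkLogRecord {
    final String traceId;
    final String errorPosition;
    final String text;

    LinkLogRecord(String traceId, String errorPosition, String text) {
        this.traceId = traceId;
        this.errorPosition = errorPosition;
        this.text = text;
    }
}

// In-memory stand-in for the current unprocessed table and the processing record table.
final class LinkLogStore {
    private final List<LinkLogRecord> unprocessed = new ArrayList<>();
    private final List<LinkLogRecord> processed = new ArrayList<>();

    void save(LinkLogRecord record) {
        unprocessed.add(record); // stored as-is, no aggregation in advance
    }

    /** Grouping happens only at query time, keyed by error code position. */
    Map<String, List<LinkLogRecord>> queryGrouped() {
        Map<String, List<LinkLogRecord>> groups = new LinkedHashMap<>();
        for (LinkLogRecord r : unprocessed) {
            groups.computeIfAbsent(r.errorPosition, k -> new ArrayList<>()).add(r);
        }
        return groups;
    }

    /** Moves every handled log at the given position into the processing records. */
    int markProcessed(String errorPosition) {
        int moved = 0;
        Iterator<LinkLogRecord> it = unprocessed.iterator();
        while (it.hasNext()) {
            LinkLogRecord r = it.next();
            if (r.errorPosition.equals(errorPosition)) {
                it.remove();
                processed.add(r);
                moved++;
            }
        }
        return moved;
    }

    int unprocessedCount() {
        return unprocessed.size();
    }

    int processedCount() {
        return processed.size();
    }
}
```

Keeping handled records out of the unprocessed list means the group query only ever scans logs that still need attention.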
Preferably, the link log regularization processor is specifically configured to:
receiving matched text information input by an operator;
matching the input text information against the link logs in the group aggregated by error code position;
and when the text information is correctly matched, marking the corresponding link log information as the corresponding processing state and processing result.
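A minimal sketch of this batch marking follows, assuming that the text entered by the operator is interpreted as a regular expression (the text only calls it matching text). The names GroupedLinkLog and RegexBatchMarker and the status strings are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical view of one link log inside a group aggregated by error code position.
final class GroupedLinkLog {
    final String text;
    String status = "UNPROCESSED";
    String result = "";

    GroupedLinkLog(String text) {
        this.text = text;
    }
}

final class RegexBatchMarker {
    /** Marks every log in the group whose text matches the pattern; returns how many matched. */
    static int mark(List<GroupedLinkLog> group, String operatorPattern, String status, String result) {
        Pattern pattern = Pattern.compile(operatorPattern);
        int marked = 0;
        for (GroupedLinkLog log : group) {
            if (pattern.matcher(log.text).find()) {
                log.status = status;
                log.result = result;
                marked++;
            }
        }
        return marked;
    }
}
```

Logs in the group that do not match keep their previous state, so an operator can narrow the pattern and retry.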
Preferably, the link log alarm processor is specifically configured to:
selecting and triggering an alarm rule according to an error log registered in a link log and error code position information; wherein, the alarm rule is divided into:
artificially set alarm rules;
alarm rules according to priority;
alarm rules depending on the code location.
The embodiment of the invention also provides a method for actively early warning the errors of the service system, which comprises the following steps:
when each session starts, a client creates a tracking number for the session, and associates the tracking number with a log output by the session, so that complete logs of the session in the whole request link are sequentially connected in series to form a link log;
when the client detects that the service system outputs an error log, taking the error code position of the current error log as the cut-off point, completely extracting the link log of the current session, and sending the link log to the server;
and the server receives and stores the link log which is uploaded by the client and is based on the tracking number in series, performs error classification according to the error triggering code position in the link log, and performs different logic processing according to different error grades.
In summary, the active early warning system and method for the error of the service system in this embodiment have at least one of the following beneficial effects:
(1) It can discover service errors generated by the service system.
(2) It actively reports service errors generated by the service, instead of leaving them to be found by searching afterwards.
(3) It can alert the developers of the specific application on the basis of the service error.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
Fig. 1 is a schematic structural diagram of a service system error active warning system according to a first embodiment of the present invention.
Fig. 2 is a block diagram of a client.
Fig. 3 is a schematic diagram of the operation of the hot update layer.
Fig. 4 is a schematic diagram of the function implementation layer binding the tracking number and the log object.
Fig. 5 is a block diagram of the server.
FIG. 6 is a schematic diagram of an interface for batch processing at the processing module level.
FIG. 7 is a schematic diagram of the operation of the link log alarm processor.
Fig. 8 is a flowchart illustrating a method for actively warning a business system error according to a second embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
For better understanding of the technical solutions of the present invention, the following detailed descriptions of the embodiments of the present invention are provided with reference to the accompanying drawings.
It should be understood that the described embodiments are only some embodiments of the invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in the examples of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
The invention is described in further detail below with reference to the following detailed description and accompanying drawings:
Referring to fig. 1, a first embodiment of the present invention provides a business system error active warning system, which includes a client 100 for data acquisition and a server 200 for analysis and warning, wherein:
the client 100 is configured to create a tracking number for each session when the session starts, and associate the tracking number with the logs output by the session, so that the complete logs of the session across the whole request link are concatenated in order to form a link log, and, when it is detected that the service system outputs an error log, to extract the complete link log of the current session using the error code position of the current error log as the cut-off point and send the link log to the server 200.
In this embodiment, the client 100 is integrated into the service system in the form of an SDK. Most business systems use a logging component to output logs as needed, and when a service error occurs an error log is usually output. The appearance of an error log can therefore serve as the anchor point for locating a service error. From that anchor point, all the log information of the request has to be traced backwards. Logs of different links must be distinguishable from one another so that they are not mixed together when printed. To that end, a tracking number, called traceId, is created at the start of each request, and the traceId stays associated with every log output during the complete processing of that request. With a globally unique traceId, the complete logs of a request link can thus be concatenated one after another (the log output of a request is always first-in, first-out). What has to be done when an error occurs is then clear: extract the complete link log of the current request, that is, all logs created by the request from the moment the traceId was created up to the moment of the error.
Specifically, as shown in fig. 2, in this embodiment, the client 100 includes: a log interface agent layer 110, a hot update layer 120, and a function implementation layer 130, wherein:
the log interface agent layer 110 is configured to decorate itself as a log outputter in the service system, so as to implement access in the service system and output of a log.
In this embodiment, in order to integrate with the service system seamlessly, the client 100 implements the log outputter interface and so presents itself as a log outputter; integration then only requires configuring the corresponding log outputter in the service system. At present, most systems use a logging facade in front of a logging implementation. The facade is generally the SLF4J framework; the outputter interfaces differ between logging implementations, but the principle is the same. Take Logback, the default logging implementation in Spring Boot, as an example.
A custom log outputter is a custom Appender class. According to the Logback interface requirements, a class is defined that inherits from the AppenderBase<ILoggingEvent> base class and implements its append method. When a log needs to be output, Logback calls the append method of this class and passes the log content to it. The actual processing, such as writing to disk or running the client logic, takes place in this method.
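The same hook can be illustrated without any third-party dependency. The sketch below deliberately swaps Logback's AppenderBase for the JDK's java.util.logging.Handler, whose publish method plays the role of append; it is a stand-in for illustration, not the Logback appender described above. The class name CapturingHandler is hypothetical, and the single shared buffer is a simplification of the per-session storage described elsewhere in this document.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;

// JDK stand-in for a custom log outputter: the logging framework calls publish for each log.
final class CapturingHandler extends Handler {
    private final List<String> buffer = new ArrayList<>();
    private final List<List<String>> uploads = new ArrayList<>();

    @Override
    public synchronized void publish(LogRecord record) {
        buffer.add(record.getLevel().getName() + " " + record.getMessage());
        // An error-level log is the anchor point: hand over everything buffered so far.
        if (record.getLevel().intValue() >= Level.SEVERE.intValue()) {
            uploads.add(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    @Override
    public void flush() {
        // Nothing is buffered outside this object.
    }

    @Override
    public void close() {
        // No external resources to release in this sketch.
    }

    /** Batches that a real client would send to the server. */
    synchronized List<List<String>> uploads() {
        return uploads;
    }
}
```

Such a handler could be attached with Logger.addHandler, which corresponds to configuring the outputter in the service system.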
The hot update layer 120 is used for providing a transition bridge between the log interface agent layer and the function implementation layer, so that the function implementation layer can be hot-updated at run time.
In this embodiment, if a function of the client 100 needed updating while it was running, the conventional approach would be to stop the service, update the code and redeploy before the new behavior takes effect. But taking the business service down in order to update the client is unacceptable, so the specific functions of the client need to support hot upgrade while running. The basic principle of hot upgrade is to swap between different concrete implementations of the same interface. Following this idea, an interface for the concrete implementation of the service function is defined first. Since the external entry point is fixed, namely the entry of log content, the interface can be defined in the following form:
Figure BDA0003082797940000101
The interface requires two methods. The dorall method processes the log content, i.e., it carries the actual service execution logic. The shutdown method exists because an implementation may open additional resources, and resources acquired earlier must be released before the implementation is replaced; calling the shutdown method of the original implementation before replacement is the trigger for releasing them. With the interface defined, the remaining problem is how to download and replace an implementation. Java provides URLClassLoader, which can load a jar package from a specified path and load class files from inside it. On this basis, the interface implementation can be packaged into a jar and stored at the remote end together with its version number. The server attaches the latest client version to the heartbeat response sent to the client, and once the client finds that this version is newer than its current version, it downloads the latest interface implementation and performs the replacement.
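A minimal sketch of the replacement mechanism follows. The interface in the original is shown only as a figure, so the names LogFunction, handleLog and HotSwapBridge and the method signatures here are hypothetical; only the shutdown method name and the use of URLClassLoader and reflection come from the text.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical shape of the function interface: one processing method, one shutdown method.
interface LogFunction {
    void handleLog(String traceId, String content);

    void shutdown();
}

// Hypothetical bridge held by the log interface agent layer.
final class HotSwapBridge {
    private final AtomicReference<LogFunction> current;

    HotSwapBridge(LogFunction initial) {
        current = new AtomicReference<>(initial);
    }

    /** Fixed external entry point: always delegates to whichever implementation is current. */
    void handleLog(String traceId, String content) {
        current.get().handleLog(traceId, content);
    }

    /** Loads an implementation class from a downloaded jar and instantiates it by reflection. */
    static LogFunction loadFromJar(Path jar, String className) throws Exception {
        URL[] urls = { jar.toUri().toURL() };
        // The loader is intentionally left open: the new object's classes are resolved through it.
        URLClassLoader loader = new URLClassLoader(urls, LogFunction.class.getClassLoader());
        Class<?> cls = Class.forName(className, true, loader);
        return (LogFunction) cls.getDeclaredConstructor().newInstance();
    }

    /** Points the bridge at the new object, then lets the old one release its resources. */
    void swap(LogFunction next) {
        LogFunction old = current.getAndSet(next);
        old.shutdown();
    }
}
```

Using an AtomicReference keeps the switch visible to all logging threads at once; a log call already in flight on the old object may still finish after shutdown has been requested, which a production version would need to account for.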
And the function implementation layer 130 is configured to create a tracking number for each session, serially connect logs corresponding to the session based on the tracking number, implement complete storage of a link log, and extract the complete link log for uploading when an error log is output.
In this embodiment, at the beginning of each session the current session must first be marked with a tracking number, i.e., a traceId. When logs are output concurrently, the traceId allows the logs of one complete session to be concatenated for reading and analysis, so every log output within a session needs to carry the traceId. This can be implemented with the MDC mechanism provided by the logging framework. In a system, threads are usually reused, whereas a session is a logical concept that is "created" each time a request is made. To trace back the whole link log when a problem occurs, all logs produced in the session must be stored with the current session. To avoid concurrency problems, they are actually stored in thread-local variables. Some sessions start with an explicit signal, such as an HTTP request; for these, an interceptor can use the MDC mechanism to mark the session with a traceId and can also clear the data in the current MDC when the request ends, which keeps the thread-local data clean. In other cases the session is started by the user setting the traceId manually. Since the developer may not clear the traceId explicitly in that case, the session log stored in the current thread must be cleared whenever a new traceId is created.
To trace back the complete log information of the current session when a problem occurs, the logs must be bound to the current session. As shown in fig. 4, the session is a logical concept, while the logs need physical storage. A session normally corresponds to the request being handled by the current thread, so the log content can be appended to the current thread. To avoid an out-of-memory error (OOM) caused by a bug in service code that keeps appending logs, the capacity of the queue holding the logs must be limited. Since the most recent log information is clearly the most useful, the logs should be stored in a first-in, first-out queue.
To store the logs in the current thread, ThreadLocal is used. So that, when a session first creates its traceId, it can be checked whether a traceId already exists for the current thread, the data structure that stores the logs should contain the immutable traceId.
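The capacity-limited queue with an immutable traceId can be sketched as follows. The class name BoundedSessionLog and the choice to drop the oldest entry when the queue is full are assumptions; the text only requires a bounded first-in, first-out queue that favors the latest logs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-thread log holder: immutable traceId plus a bounded FIFO queue.
final class BoundedSessionLog {
    private final String traceId;
    private final int capacity;
    private final ArrayDeque<String> queue = new ArrayDeque<>();

    BoundedSessionLog(String traceId, int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.traceId = traceId;
        this.capacity = capacity;
    }

    String traceId() {
        return traceId;
    }

    /** Appends a log line, evicting the oldest line first when the queue is full. */
    void add(String line) {
        if (queue.size() == capacity) {
            queue.pollFirst();
        }
        queue.addLast(line);
    }

    /** Returns the retained lines in output order, oldest first. */
    List<String> extract() {
        return new ArrayList<>(queue);
    }
}
```

An instance of this class would be the value held in the ThreadLocal, so memory use per thread is capped at the chosen capacity regardless of how many logs a faulty request emits.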
In this embodiment, the function implementation layer 130 is further configured to:
acquiring and counting the total request number of the service system, and reporting the total request number to the server 200, so that the server 200 can estimate and analyze the operating condition from the proportion of errors in the total request number; the total request number is counted by an HTTP interceptor that increments a counter by 1 for each incoming request; the reporting of the total request number is triggered by a timed task.
To understand the overall state of the system, the total request number of the service system needs to be known, so that the operating condition can be estimated and analyzed from the ratio of errors to requests.
The reporting of the total number of requests can be divided into two steps:
statistics of request totals
Reporting of total number of requests
The total number of requests generally refers to the number of accesses made to the system by external systems. An HTTP interceptor can therefore increment a counter by 1 for each incoming request. To support concurrency, the counter needs to be of the AtomicInteger type.
The reporting of the total number of the requests is triggered by the timing task. At this time, the value that can be reported needs to be obtained from the counter, and then, after the reporting is successful, the value is subtracted from the counter.
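The count-then-subtract reporting described above can be sketched as follows. The class name RequestCounter and the IntPredicate used to stand in for the upload call are hypothetical; in a real client the increment would be called from the HTTP interceptor and the report method from the timed task.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntPredicate;

// Hypothetical counter shared by the HTTP interceptor and the timed reporting task.
final class RequestCounter {
    private final AtomicInteger pending = new AtomicInteger();

    /** Called once for each incoming request. */
    void onRequest() {
        pending.incrementAndGet();
    }

    /**
     * Reads the value that can be reported, uploads it, and subtracts it only on success.
     * Returns the number of requests actually reported.
     */
    int report(IntPredicate uploader) {
        int snapshot = pending.get();
        if (snapshot == 0) {
            return 0;
        }
        if (uploader.test(snapshot)) {
            // Subtract rather than reset, so requests counted during the upload are kept.
            pending.addAndGet(-snapshot);
            return snapshot;
        }
        return 0;
    }

    int pending() {
        return pending.get();
    }
}
```

Subtracting the reported value instead of setting the counter to zero is what preserves requests that arrive while the upload is in progress, and a failed upload leaves the count untouched for the next attempt.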
The server 200 is configured to receive and store a link log uploaded by a client and based on the serial connection of the tracking numbers, perform error classification according to a code position triggered by an error in the link log, and perform different logic processing according to different error grades.
In this embodiment, a specific architecture of the server 200 is shown in fig. 5, and includes a data unified receiving layer 210, a data pipeline distribution processing coordination layer 220, and a processing module layer 230;
the data uniform receiving layer 210 is configured to arrange the data sent by the client into a uniform format, so as to facilitate subsequent processing.
The data pipeline distribution processing coordination layer 220 is used for constructing a data processing pipeline and driving data through the pipeline.
The data pipeline distribution processing coordination layer 220 is essentially a pipeline in the chain-of-responsibility pattern. When the system starts, all processors in the system are loaded and sorted by their order, and incoming information is then passed through them in sequence. To support this model, the processor and the pipeline need to be defined separately.
First, a processor is defined as follows:
[Figure BDA0003082797940000131: code listing defining the DataHandler interface; image not reproduced]
Here, DataHandler is the interface definition of a pipeline processor, and data is the business parameter to be processed. The Invoker is the object that completes the pipeline; it is injected by the business system, and the Invoker object represents the next pipeline instance that processes the data. A pipeline is therefore a chain of Invoker calls, so the pipeline can be defined as follows:
interface Pipeline{
void process(Object data);
}
At initialization, the Pipeline needs to scan the processors present in the current system, determine the sequence of the processors from the return value of each processor's order method, generate an Invoker instance for each processor, and link the Invoker instances in series to form the Pipeline. The relationship among Pipeline, Invoker and DataHandler is shown in fig. 6, and based on that relationship the initialization process of the Pipeline is as shown in fig. 7.
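As an illustration of the initialization just described, the following sketch sorts the processors by order and wraps each one in an Invoker that points at the next. Because the original DataHandler listing is not reproduced, the method names handle and invoke, the Invoker interface shape, and the class ChainPipeline are assumptions; only Pipeline.process and the order method are named in the text.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Assumed shape: represents the next stage of the pipeline.
interface Invoker {
    void invoke(Object data);
}

// Assumed shape of the processor interface; only order() is named in the text.
interface DataHandler {
    int order();
    void handle(Object data, Invoker next);
}

interface Pipeline {
    void process(Object data);
}

// Hypothetical implementation of the chain-of-responsibility pipeline.
final class ChainPipeline implements Pipeline {
    private final Invoker head;

    ChainPipeline(List<DataHandler> handlers) {
        List<DataHandler> sorted = new ArrayList<>(handlers);
        sorted.sort(Comparator.comparingInt(DataHandler::order));
        Invoker next = data -> { }; // terminal stage: nothing left to do
        // Build from the tail so that each Invoker knows its successor.
        for (int i = sorted.size() - 1; i >= 0; i--) {
            DataHandler handler = sorted.get(i);
            Invoker downstream = next;
            next = data -> handler.handle(data, downstream);
        }
        head = next;
    }

    @Override
    public void process(Object data) {
        head.invoke(data);
    }
}
```

In this sketch a processor decides whether to continue the chain by calling next.invoke(data); a processor that does not call it stops further processing of that message.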
The processing module layer 230 is used for coordinating the corresponding modules to process data in the processing sequence, so as to realize pipeline processing.
The processing module layer 230 includes a heartbeat instruction processor 231, a monitoring instruction processor 232, a link log classifying and storing processor 233, a link log regularizing processor 234 and a link log alarm processor 235.
Heartbeat instruction processor 231
The heartbeat instruction processor 231 is the simplest processor. When the system starts, it acquires the information of all current applications and application instances, and from this information maintains in memory a complete mapping table of application instances to heartbeat times. When heartbeat information sent by the client 100 is received, the heartbeat state object of the instance is looked up by application name and application instance identifier, and its contents, including the online state and the heartbeat information, are updated. In addition, a background thread periodically scans the online state of all instances; if the time elapsed since an instance's last heartbeat exceeds the heartbeat interval threshold, the state of that instance is updated to offline.
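The mapping table and the periodic offline scan can be sketched as below. This is a simplified illustration under stated assumptions: the class HeartbeatRegistry, its method names, the "app/instance" key format, and the explicit time parameters are all hypothetical and chosen to keep the example deterministic.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory registry; names do not come from the patent.
final class HeartbeatRegistry {
    static final class State {
        volatile long lastBeatMillis;
        volatile boolean online;
    }

    private final Map<String, State> states = new ConcurrentHashMap<>();
    private final long thresholdMillis;

    HeartbeatRegistry(long thresholdMillis) {
        this.thresholdMillis = thresholdMillis;
    }

    // Called at startup for every known application instance.
    void register(String app, String instanceId) {
        states.putIfAbsent(key(app, instanceId), new State());
    }

    // Called when a heartbeat from the client arrives.
    void onHeartbeat(String app, String instanceId, long nowMillis) {
        State s = states.computeIfAbsent(key(app, instanceId), k -> new State());
        s.lastBeatMillis = nowMillis;
        s.online = true;
    }

    // Body of the periodic background scan: mark stale instances offline.
    void sweep(long nowMillis) {
        for (State s : states.values()) {
            if (s.online && nowMillis - s.lastBeatMillis > thresholdMillis) {
                s.online = false;
            }
        }
    }

    boolean isOnline(String app, String instanceId) {
        State s = states.get(key(app, instanceId));
        return s != null && s.online;
    }

    private static String key(String app, String instanceId) {
        return app + "/" + instanceId;
    }
}
```

In a running server, sweep(System.currentTimeMillis()) could be scheduled with ScheduledExecutorService.scheduleAtFixedRate to play the role of the background scanning thread.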
Monitoring instruction processor 232
The implementation logic of the monitoring instruction processor 232 is essentially the same as that of the heartbeat instruction processor 231. It likewise acquires all current applications at system startup to form an application list. Unlike the heartbeat instruction processor 231, the monitoring instruction processor 232 is concerned only with applications, not with instance information. When a monitoring report from the client 100 is received, the corresponding application monitoring state is found in the application list by application name, and the information in it is updated.
Link log classifying and storing processor 233
The client 100 reports error information as it is collected. The server 200 aggregates the reports by error position and displays them in aggregated form, instead of displaying each error link log individually. The main advantages of aggregating by error position before display are:
the distribution of different errors can be distinguished at a macroscopic level by error category
the number of occurrences of each error type can be seen
errors of the same kind are displayed together, which facilitates batch processing
Since the error position is used as the classification mark, it must first be separated from the reported log. The reported log information carries the code position at which the log was output, and this can be used as the error position identifier. Once the identifier has been separated, the log only needs to be written directly to storage, and subsequent queries can group the information by error position.
Error log information is generated continuously over time. If all data were stored in a single table (assuming a relational database as the storage means), query pressure and query latency would keep increasing over time. To solve this problem, the storage of error link logs is divided into two tables:
current unprocessed link log table
processing record table
The link log information reported by the client 100 is stored in the "currently unprocessed link log table", and when the responsible person of the corresponding application feeds back and processes the specific log, the log information is moved from the "currently unprocessed link log table" to the "processing record table".
The two tables are queried separately. In the "current unprocessed link log table", the link logs are not aggregated in advance; the grouping shown on the interface is obtained by a grouped query executed at query time. Such queries put more pressure on the database, but because the amount of data in this table is relatively small and stays at a reasonable level (depending on the processing interval and processing speed), query performance remains within a reasonable range.
In the "processing record table", the data are aggregated in advance, so that a query on this table is a simple lookup. If a record needs to be examined in depth, the details of the processing record are queried, and the link log information belonging to the processing record can be found through the processing record identifier.
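The query-time grouping of the unprocessed table can be illustrated with an in-memory stand-in. The source stores the logs in a relational database, where this would be a grouped query on the error position column; the sketch below only mirrors that logic in Java, and the classes LinkLog and UnprocessedLogQuery, their fields, and the sample position strings are assumptions.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical row of the "current unprocessed link log table".
final class LinkLog {
    final String trackingNumber;
    final String errorPosition; // code position that produced the error log
    final String message;

    LinkLog(String trackingNumber, String errorPosition, String message) {
        this.trackingNumber = trackingNumber;
        this.errorPosition = errorPosition;
        this.message = message;
    }
}

final class UnprocessedLogQuery {
    // Groups the unprocessed logs by error position at query time and counts
    // the occurrences per position; nothing is aggregated in advance.
    static Map<String, Long> countByErrorPosition(List<LinkLog> unprocessed) {
        return unprocessed.stream()
                .collect(Collectors.groupingBy(
                        log -> log.errorPosition,
                        TreeMap::new,
                        Collectors.counting()));
    }
}
```

The per-position counts correspond to the "number of occurrences of each error type" shown on the interface.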
Link log regularization processor 234
During operation of the system, errors of the same kind may accumulate in large numbers. Data errors, abnormal responses from external systems and similar problems typically repeat over a period of time, so that many similar or identical error link logs accumulate. Handling these one by one manually is inefficient, especially for anomalies caused by historical erroneous data, which often number in the thousands. The platform therefore needs to provide a batch processing mode that can process link logs in bulk without mistakenly processing content that does not belong to the intended category. The platform provides batch processing based on regular-expression matching.
A batch processing function is provided on the front-end page, as shown in fig. 6. The operator enters the text to match, and the link logs within the current link log group, aggregated by error position, are matched against it. Each link log whose content matches the pattern is marked with the corresponding processing state and processing result. This achieves rapid batch processing while avoiding the incorrect results that could arise from batch-processing the whole group directly.
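A minimal sketch of this matching step follows. It assumes that the operator's input is a Java regular expression and that a partial match anywhere in the log content is sufficient; the classes PendingLog and RegexBatchProcessor, their fields, and the status strings are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical unprocessed link log entry.
final class PendingLog {
    final String errorPosition;
    final String content;
    String status = "UNPROCESSED";
    String result = "";

    PendingLog(String errorPosition, String content) {
        this.errorPosition = errorPosition;
        this.content = content;
    }
}

final class RegexBatchProcessor {
    // Marks every log in the given error-position group whose content matches
    // the operator's pattern; logs outside the group or not matching the
    // pattern are left untouched. Returns the number of logs marked.
    static int markMatching(List<PendingLog> logs, String errorPosition,
                            String regex, String status, String result) {
        Pattern pattern = Pattern.compile(regex);
        int marked = 0;
        for (PendingLog log : logs) {
            boolean inGroup = log.errorPosition.equals(errorPosition);
            if (inGroup && pattern.matcher(log.content).find()) {
                log.status = status;
                log.result = result;
                marked++;
            }
        }
        return marked;
    }
}
```

Restricting the match to one error-position group and then filtering by pattern is what prevents unrelated logs in the same group from being closed by mistake.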
Link log alarm processor 235
When the central alarm end receives an error link log sent by the client, it selects and triggers an alarm rule according to information such as the error log level of the error link and the error code position. The alarm rules can be divided into:
manually configured alarm rules
alarm rules based on priority
alarm rules based on code position
These three kinds of rules have different priorities; the specific execution flow is shown in fig. 7.
In summary, the active early warning system for service system errors of this embodiment has the following beneficial effects:
(1) It can discover service errors generated by the service system.
(2) It actively reports service errors as they occur, instead of relying on searching for them after the fact.
(3) It can notify the developers responsible for the specific application about the service error.
Referring to fig. 8, a second embodiment of the present invention further provides an active early warning method for service system errors, which includes:
S201, when each session starts, the client creates a tracking number for the session and associates the tracking number with the logs output by the session, so that the complete logs of the session over the whole request link are connected in series to form a link log;
S202, when the client detects that the service system outputs an error log, it takes the error code position of the current error log as the cut-off point, extracts the complete link log of the current session, and sends the link log to the server;
S203, the server receives and stores the link logs uploaded by the client, which are linked in series by tracking number, classifies errors according to the code position that triggered the error in the link log, and applies different processing logic according to the error level.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus and method can be implemented in other ways. The apparatus and method embodiments described above are illustrative only, as the flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, the functional modules in the embodiments of the present invention may be integrated together to form an independent part, or each module may exist separately, or two or more modules may be integrated to form an independent part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention or a part thereof which substantially contributes to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, an electronic device, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes. It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (6)

1. The active early warning system for the errors of the business system is characterized by comprising a client used for data acquisition and a server used for analyzing and warning, wherein:
the client is used for creating a tracking number for each session when the session starts, associating the tracking number with a log output by the session, sequentially and serially connecting complete logs of the session in the whole request link to form a link log, completely extracting the link log of the current session by taking the error code position of the current error log as an interception point when detecting that the service system outputs an error log, and sending the link log to the server;
the server is used for receiving and storing the link logs which are uploaded by the client and are based on the tracking numbers in series, performing error classification according to the error triggering code positions in the link logs, and performing different logic processing according to different error grades;
the server side comprises:
the data uniform receiving layer is used for arranging the data sent by the client into a uniform format;
the data stream distribution processing coordination layer is used for constructing a data processing production line and promoting the data processing of the production line;
the processing module layer is used for coordinating the corresponding modules to process the data according to the processing sequence so as to realize the pipeline processing effect; the processing module layer comprises a heartbeat instruction processor, a monitoring instruction processor, a link log classifying and storing processor, a link log regularizing processor and a link log alarm processor;
the heartbeat instruction processor is used for acquiring information of all current applications and application instances when the system is started, and maintaining a complete mapping table of the application instances and heartbeat time in a memory according to the information of the applications and the application instances;
the monitoring instruction processor is used for acquiring all current applications when the system is started to form an application list, and when a monitoring report reported by the client is received, finding a corresponding application monitoring state from the application list according to an application name and updating information in the application monitoring state;
the link log categorization and storage processor is specifically configured to:
aggregating the link logs by the error code positions and then intensively displaying; the link log reported by the client carries error code position information, and the error code position information is used as an error position identifier; after the identification is separated, the identification is directly written into the storage, and thus, during subsequent query, information is queried in groups according to the error position;
storing the link log through two tables, wherein the two tables comprise a current unprocessed link log table and a processing record table; the link log reported by the client is stored in the current unprocessed link log table, and when the log of the current unprocessed link log table is fed back and processed, the log information is moved into a processing record table from the current unprocessed link log table; wherein the queries of the two tables are separate; in the current unprocessed link log table, the link logs are not aggregated in advance, the grouping effect displayed on the server interface is obtained by grouping and querying immediately during querying, and when a processing record table is queried, data in the table is aggregated in advance;
the link log regularization processor is specifically configured to: receiving matched text information input by an operator; according to the input text information, taking the error code position as the aggregated link log group to carry out link log matching; when the text information is correctly matched, marking the corresponding link log information as a corresponding processing state and a corresponding processing result;
the link log alarm processor is specifically configured to: selecting and triggering an alarm rule according to an error log registered in a link log and error code position information; wherein, the alarm rule comprises: artificially set alarm rules; alarm rules according to priority; alarm rules depending on the code location.
2. The active warning system for business system errors as claimed in claim 1, wherein the client comprises: the log interface agent layer, the hot update layer and the function realization layer, wherein:
the log interface agent layer is used for decorating the log output device in the service system, so that access in the service system and output of logs are realized;
the hot updating layer is used for providing a transition bridge between the log interface agent layer and the function realizing layer so as to realize the hot updating function realizing layer in operation;
and the function realization layer is used for establishing a tracking number for each session, connecting logs corresponding to the session in series based on the tracking number, realizing complete storage of the link logs, and extracting the complete link logs for uploading when the error logs are output.
3. The active warning system for business system errors as claimed in claim 2, wherein the hot update layer is specifically configured to:
receiving the current latest version number carried in the heartbeat response;
judging whether the latest version number is greater than the local version number;
if so, downloading the jar package file from the specified interface of the server to the specified path of the client; the downloading path takes the root path as a starting point and takes the latest version number as the name of the folder;
creating a URLClassLoader instance to load the jar package file in the folder;
according to the class name in the latest version information, loading out the corresponding class by using a ClassLoader, and instantiating the object of the class by reflection;
directing interface variables in the log interface proxy layer to the newly instantiated object;
and calling a shutdown method of the original object to close the resource.
4. The active warning system for business system errors as claimed in claim 2, wherein the function implementation layer is specifically configured to:
acquiring current parameters; the current parameters comprise a tracking number and log contents;
acquiring a log object of a current thread;
judging whether the log object exists or not;
if the log object does not exist, initializing a log object by using the tracking number, adding the log content into a queue of the log object, and setting the log object into the current thread;
if so, judging whether the tracking number in the log object is consistent with the tracking number of the current access parameter;
if not, initializing a log object by using the tracking number, adding the log content into a queue of the log object, and setting the log object into the current thread;
and if the log objects are consistent, adding the log contents into the queue of the log object.
5. The active warning system for business system errors as claimed in claim 4, wherein the function implementation layer is further configured to:
acquiring and counting the total request number of a service system, and reporting the total request number to a server, so that the server can speculate and analyze the operation condition according to the error in the total request number; counting the total number of requests is realized by adding 1 to each request entry through a counter by an interceptor of Http; the reporting of the total number of requests is triggered by the timing task.
6. A business system error active early warning method is characterized by comprising the following steps:
when each session starts, a client creates a tracking number for the session, and associates the tracking number with a log output by the session, so that complete logs of the session in the whole request link are sequentially connected in series to form a link log;
when the client detects that the service system outputs the error log, taking the error code position of the current error log as a cut-off point, completely extracting the link log of the current session, and sending the link log to the server;
the server receives and stores a link log which is uploaded by a client and is based on the tracking number series connection, carries out error classification according to the error triggering code position in the link log, and carries out different logic processing according to different error grades; the server side comprises:
the data uniform receiving layer is used for arranging the data sent by the client into a uniform format;
the data stream distribution processing coordination layer is used for constructing a data processing production line and promoting the data processing of the production line;
the processing module layer is used for coordinating corresponding modules to process data according to a processing sequence so as to realize the pipeline processing effect; the processing module layer comprises a heartbeat instruction processor, a monitoring instruction processor, a link log classifying and storing processor, a link log regularizing processor and a link log alarm processor;
the heartbeat instruction processor is used for acquiring information of all current applications and application instances when the system is started, and maintaining a complete mapping table of the application instances and heartbeat time in a memory according to the information of the applications and the application instances;
the monitoring instruction processor is used for acquiring all current applications when the system is started to form an application list, and when a monitoring report reported by the client is received, finding a corresponding application monitoring state from the application list according to an application name and updating information in the application monitoring state;
the link log categorization and storage processor is specifically configured to:
aggregating the link logs by the error code positions and then intensively displaying; the link log reported by the client carries error code position information, and the error code position information is used as an error position identifier; after the identification is separated, the identification is directly written into the storage, and thus, during subsequent query, information is queried in groups according to the error position;
storing the link log through two tables, wherein the two tables comprise a current unprocessed link log table and a processing record table; the link log reported by the client is stored in the current unprocessed link log table, and when the log of the current unprocessed link log table is fed back and processed, the log information is moved into a processing record table from the current unprocessed link log table; wherein the queries of the two tables are separate; in the current unprocessed link log table, the link logs are not aggregated in advance, the grouping effect displayed on the server interface is obtained by grouping and querying immediately during querying, and when a processing record table is queried, data in the table are aggregated in advance;
the link log regularization processor is specifically configured to: receiving matched text information input by an operator; according to the input text information, taking the error code position as the aggregated link log group to carry out link log matching; when the text information is correctly matched, marking the corresponding link log information as a corresponding processing state and a corresponding processing result;
the link log alarm processor is specifically configured to: selecting and triggering an alarm rule according to an error log registered in a link log and error code position information; wherein, the alarm rule comprises: artificially set alarm rules; alarm rules according to priority; alarm rules depending on the code location.
CN202110571651.6A 2021-05-25 2021-05-25 Active early warning system and method for errors of service system Active CN113254309B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110571651.6A CN113254309B (en) 2021-05-25 2021-05-25 Active early warning system and method for errors of service system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110571651.6A CN113254309B (en) 2021-05-25 2021-05-25 Active early warning system and method for errors of service system

Publications (2)

Publication Number Publication Date
CN113254309A CN113254309A (en) 2021-08-13
CN113254309B true CN113254309B (en) 2022-08-23

Family

ID=77184337

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110571651.6A Active CN113254309B (en) 2021-05-25 2021-05-25 Active early warning system and method for errors of service system

Country Status (1)

Country Link
CN (1) CN113254309B (en)

Also Published As

Publication number Publication date
CN113254309A (en) 2021-08-13


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 361000 one of 504, No. 18, guanri Road, phase II, software park, Xiamen, Fujian

Applicant after: XIAMEN YILIANZHONG YIHUI TECHNOLOGY CO.,LTD.

Address before: Room 504, No.18, guanri Road, phase II, software park, Xiamen City, Fujian Province, 361000

Applicant before: XIAMEN YILIANZHONG YIHUI TECHNOLOGY CO.,LTD.

GR01 Patent grant