CN113138896A - Application running condition monitoring method, device and equipment - Google Patents

Publication number
CN113138896A
Authority
CN
China
Prior art keywords
target
application
running
fault
log
Legal status
Pending
Application number
CN202110448378.8A
Other languages
Chinese (zh)
Inventor
郭相权
任政
郑杰
王启宇
Current Assignee
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202110448378.8A
Publication of CN113138896A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/302 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G06F11/3068 Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data format conversion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/32 Monitoring with visual or acoustical indication of the functioning of the machine
    • G06F11/323 Visualisation of programs or trace data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G06F16/168 Details of user interfaces specifically adapted to file systems, e.g. browsing and visualisation, 2d or 3d GUIs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/1805 Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815 Journaling file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Human Computer Interaction (AREA)
  • Mathematical Physics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The embodiments of this specification provide a method, a device and equipment for monitoring application running conditions, and relate to the technical field of artificial intelligence. The method comprises: determining a general indicator information set for a plurality of target applications; defining a target log format according to the general indicator information set; recording, with a distributed search and analysis engine, the logs generated by the running target applications, the logs being in the target log format; and monitoring the running condition of each target application from the logs it generates by combining the distributed search and analysis engine with an open source visualization platform. The embodiments improve reusability across applications and allow the running condition of each target application to be monitored efficiently through the general indicators in its logs.

Description

Application running condition monitoring method, device and equipment
Technical Field
The embodiment of the specification relates to the technical field of artificial intelligence, in particular to a method, a device and equipment for monitoring application running conditions.
Background
Timely fault discovery is important in application operation and maintenance. While an application runs, its transaction log records the running condition in real time, and traces of a fault can be found in that log. In the prior art, the visual display of application logs and the monitoring alarms are usually customized per application: a personalized log format containing business and technical indicator fields must be defined manually for each application, and the monitoring charts are then customized to that format. As a result, reusability across applications is poor and the cost of onboarding a new application is high, so prior-art solutions cannot efficiently monitor the running conditions of different applications.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiment of the specification provides a method, a device and equipment for monitoring application running conditions, so as to solve the problem that the running conditions of different applications cannot be monitored efficiently in the prior art.
An embodiment of this specification provides a method for monitoring an application running condition, including: determining a general indicator information set for a plurality of target applications; defining a target log format according to the general indicator information set; recording, with a distributed search and analysis engine, the logs generated by the running target applications, the logs being in the target log format; and monitoring the running condition of each target application from the logs it generates by combining the distributed search and analysis engine with an open source visualization platform.
An embodiment of this specification further provides a monitoring device for an application running condition, including: a determining module configured to determine a general indicator information set for a plurality of target applications; a definition module configured to define a target log format according to the general indicator information set; a recording module configured to record, with a distributed search and analysis engine, the logs generated by the running target applications, the logs being in the target log format; and a monitoring module configured to monitor the running condition of each target application from the logs it generates by combining the distributed search and analysis engine with an open source visualization platform.
The embodiment of the present specification further provides an application operation condition monitoring device, which includes a processor and a memory for storing processor-executable instructions, where the processor executes the instructions to implement the steps of the application operation condition monitoring method.
The embodiment of the specification also provides a computer readable storage medium, on which computer instructions are stored, and when the instructions are executed, the steps of the monitoring method for the application running condition are realized.
The embodiments of this specification provide a method for monitoring an application running condition in which a general indicator information set is determined for a plurality of target applications and a target log format is defined according to that set. The general indicators and the log format contained in the logs of different target applications are thereby unified, which improves reusability across applications and reduces the cost of onboarding a new application. Furthermore, a distributed search and analysis engine can record the logs in the target log format generated by the running target applications, and an open source visualization platform can visually display the logs recorded in the engine, so that the running condition of each target application can be monitored efficiently through the general indicators in its logs.
Drawings
The accompanying drawings, which are included to provide a further understanding of the embodiments of the disclosure, are incorporated in and constitute a part of this specification, and are not intended to limit the embodiments of the disclosure. In the drawings:
FIG. 1 is a schematic diagram of the steps of the application running condition monitoring method provided according to an embodiment of this specification;
FIG. 2 is a schematic diagram of the application overall running condition display module provided according to an embodiment of this specification;
FIG. 3 is a schematic diagram of the in-application sub-module running condition display module provided according to an embodiment of this specification;
FIG. 4 is a schematic structural diagram of the application running condition monitoring system provided according to an embodiment of this specification;
FIG. 5 is a schematic structural diagram of the application running condition monitoring apparatus provided according to an embodiment of this specification;
FIG. 6 is a schematic structural diagram of the application running condition monitoring device provided according to an embodiment of this specification.
Detailed Description
The principles and spirit of the embodiments of the present specification will be described with reference to a number of exemplary embodiments. It should be understood that these embodiments are presented merely to enable those skilled in the art to better understand and to implement the embodiments of the present description, and are not intended to limit the scope of the embodiments of the present description in any way. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As will be appreciated by one skilled in the art, implementations of the embodiments of the present description may be embodied as a system, an apparatus, a method, or a computer program product. Therefore, the disclosure of the embodiments of the present specification can be embodied in the following forms: entirely hardware, entirely software (including firmware, resident software, micro-code, etc.), or a combination of hardware and software.
Although the flow described below includes operations that occur in a particular order, it should be appreciated that the processes may include more or fewer operations, which may be performed sequentially or in parallel (e.g., using parallel processors or a multi-threaded environment).
Referring to fig. 1, the present embodiment can provide a method for monitoring an application running condition. The application running condition monitoring method can be used for efficiently monitoring the running conditions of a plurality of different applications. The method for monitoring the application running condition comprises the following steps.
S101: determining a general indicator information set for a plurality of target applications.
In this embodiment, a set of general indicator information for a plurality of target applications may be determined. The target applications may be applications of different industries and types that need to be monitored in real time, and the general index information set may include a plurality of general indexes.
In this embodiment, applications of various industries and types may be surveyed to extract the technical and business indicators that are common to running applications, and the extracted indicators may be screened to obtain a plurality of general indicators. The general indicators may include, but are not limited to, at least one of: log type, log identifier, serial number, application abbreviation, node IP, module name, service name, method name, outlet, institution, return code, resource transfer volume, resource transfer success rate, resource transfer time consumption, timed-out resource transfer volume, and the like. Of course, the manner of determining the general indicators is not limited to the above examples; those skilled in the art may make other modifications in light of the technical spirit of the embodiments of this specification, and such modifications fall within the scope of the embodiments as long as the functions and effects achieved are the same as or similar to those of the embodiments.
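As a minimal sketch, the general indicators listed above could be modeled as a single record type. The class name and field names below are hypothetical English renderings of some of the listed indicators; the specification does not define them.

```python
from dataclasses import dataclass, asdict
from typing import Optional


@dataclass
class GeneralIndicators:
    """Hypothetical record holding the general indicators of one log entry."""
    log_type: str                 # e.g. "transaction", "error", "batch"
    log_id: str                   # log identifier
    serial_number: str            # serial number of the transaction
    app_abbreviation: str         # application abbreviation
    node_ip: str
    module_name: str
    service_name: str
    method_name: str
    return_code: str
    elapsed_ms: Optional[float] = None   # resource transfer time consumption
    outlet: Optional[str] = None         # optional, not every log carries it
    institution: Optional[str] = None    # optional, not every log carries it


record = GeneralIndicators(
    log_type="transaction", log_id="L-0001", serial_number="S-42",
    app_abbreviation="PAY", node_ip="10.0.0.8", module_name="transfer",
    service_name="TransferService", method_name="submit", return_code="0",
    elapsed_ms=12.5,
)
fields = asdict(record)  # plain dict, ready to be serialized into a log line
```

Keeping the indicators in one shared type is what lets every application emit the same fields, which is the precondition for the unified log format defined in the next step.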
S102: defining a target log format according to the general indicator information set.
In this embodiment, log monitoring is one effective means of monitoring each target application. The target log format can therefore be defined according to the general indicator information set, so that the log formats of the target applications are unified and every target application generates logs in the same format.
In this embodiment, logs may be divided into transaction logs, error logs, batch logs, and the like. Transaction logs can be used to compute transaction response times and transaction volume and to analyze the transaction process and the transaction success rate. The response time of each transaction can be derived from its start time and end time, so that changes in response time can be analyzed in real time. Transaction volume can be classified and counted in real time from information such as the transaction messages generated by transactions, which satisfies the need to display an application system's transaction volume in real time. The transaction process can be reconstructed and analyzed from the transaction log with minimal impact on the application system, so that application problems can be located more efficiently from the perspective of the transaction process. The transaction success rate per unit time can be computed in real time from the return information of each transaction in the log.
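The statistics described above can be illustrated with a short sketch. The sample records, the timestamp layout, and the convention that return code "0" means success are all assumptions made for the example.

```python
from datetime import datetime

# Hypothetical transaction records: (start time, end time, return code).
transactions = [
    ("2021-04-25 10:00:00.000", "2021-04-25 10:00:00.120", "0"),
    ("2021-04-25 10:00:01.000", "2021-04-25 10:00:01.480", "0"),
    ("2021-04-25 10:00:02.000", "2021-04-25 10:00:05.000", "E500"),
    ("2021-04-25 10:00:03.000", "2021-04-25 10:00:03.200", "0"),
]

FMT = "%Y-%m-%d %H:%M:%S.%f"


def response_ms(start: str, end: str) -> float:
    """Response time of one transaction in milliseconds."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds() * 1000.0


durations = [response_ms(start, end) for start, end, _ in transactions]
volume = len(transactions)                                   # transaction volume
successes = sum(1 for _, _, code in transactions if code == "0")
success_rate = successes / volume                            # success rate
average_ms = sum(durations) / volume                         # mean response time
```

In practice these figures would be computed per time bucket by the search engine rather than in application code; the sketch only shows which log fields each statistic depends on.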
In this embodiment, the target log format may be defined according to the general indicators in the general indicator information set, and the target log format may contain all of those general indicators. Whether a transaction fails or ends abnormally, the transaction log should carry clear and complete "start" and "end" markers, so that the start and end of each transaction can be found during log analysis and transactions can be told apart. Because a log file may be written by multiple threads or processes, every record between the start and the end of a transaction should also carry a unique transaction identifier, such as a thread ID or process ID. The determined log format may be: <TIMESTAMP> <ThreadID/ProcessID> <Class/File> <Level> [transaction information, which may be start/end/message/exception information, etc.], where TIMESTAMP is the timestamp; ThreadID/ProcessID is the thread or process ID; Class/File is the class or file that performs the transaction; and Level is the log level. Of course, the manner of determining the target log format is not limited to the above example; those skilled in the art may make other modifications according to the general indicators actually determined, in light of the technical spirit of the embodiments of this specification, and such modifications fall within the scope of the embodiments as long as the functions and effects achieved are the same as or similar to those of the embodiments.
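The format above can be made concrete with a small formatter and parser. This is only a sketch: it assumes the angle brackets are placeholders rather than literal characters, that fields are separated by single spaces, and that the transaction information is enclosed in square brackets. The sample values are invented.

```python
import re
from typing import Optional

# Assumed concrete layout: "<date> <time> <thread> <class> <LEVEL> [message]".
LINE_RE = re.compile(
    r"^(?P<timestamp>\S+ \S+) (?P<thread_id>\S+) (?P<source>\S+) "
    r"(?P<level>[A-Z]+) \[(?P<message>.*)\]$"
)


def format_line(timestamp: str, thread_id: str, source: str,
                level: str, message: str) -> str:
    """Render one record in the assumed target log format."""
    return f"{timestamp} {thread_id} {source} {level} [{message}]"


def parse_line(line: str) -> Optional[dict]:
    """Split a record back into its fields; return None if it does not match."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None


start = format_line("2021-04-25 10:00:00.000", "thread-17",
                    "TransferService", "INFO", "start serial=S-42")
end = format_line("2021-04-25 10:00:00.120", "thread-17",
                  "TransferService", "INFO", "end serial=S-42 code=0")
parsed = [parse_line(start), parse_line(end)]

# The start and end records of one transaction share the same thread ID,
# which is how interleaved records from other threads are told apart.
same_transaction = parsed[0]["thread_id"] == parsed[1]["thread_id"]
```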
S103: recording logs generated by running a plurality of target applications by using a distributed search analysis engine; the format of the log generated by running the target applications is a target log format.
In this embodiment, logs generated by running a plurality of target applications may be recorded by using a distributed search analysis engine (Elasticsearch), where a format of the logs generated by running the plurality of target applications may be a target log format.
In this embodiment, the distributed search and analysis engine (Elasticsearch) is a search server based on Lucene. It provides a distributed, multi-user full-text search engine that makes large amounts of data searchable, analyzable, and explorable. Elasticsearch is written in Java, was released as open source under the Apache license, and is a popular enterprise search engine. Lucene is an open source full-text search engine toolkit released by the Apache Software Foundation. Used in cloud computing environments, Elasticsearch offers real-time search and is stable, reliable, fast, and easy to install and use.
In this embodiment, the logs of different applications may be stored separately, and an Elasticsearch distributed log storage and search cluster may be built. In some embodiments, the Fluentd tool may be used to collect the logs generated by the target applications in real time or at intervals; a corresponding application index is created in Elasticsearch according to the application name of each target application, and the logs generated by the running target applications are sent to the Elasticsearch log center for storage by means of HTTP requests, so that the structured application logs are written into the Elasticsearch distributed log storage and search cluster.
In this embodiment, an HTTP request is a request message sent from a client to a server; its request line contains the request method, the identifier of the requested resource, and the protocol version used. Fluentd is an open source log collection system that can collect various logs and convert them into a format convenient for machine processing.
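In the described setup the log collector performs the upload, so the following is only a sketch of what such an HTTP request body could look like when written by hand. It builds a newline-delimited JSON payload for the Elasticsearch `_bulk` endpoint; the `applog-` index prefix and the record fields are hypothetical.

```python
import json


def index_name(app_name: str) -> str:
    """Derive a per-application index name (index names must be lowercase)."""
    return f"applog-{app_name.lower()}"  # "applog-" is a hypothetical convention


def bulk_body(app_name: str, records: list) -> str:
    """Build a newline-delimited JSON body for the `_bulk` endpoint."""
    lines = []
    for record in records:
        # Action line naming the target index, followed by the document itself.
        lines.append(json.dumps({"index": {"_index": index_name(app_name)}}))
        lines.append(json.dumps(record))
    return "\n".join(lines) + "\n"  # the bulk API requires a trailing newline


body = bulk_body("PAY", [
    {"serial_number": "S-42", "return_code": "0", "elapsed_ms": 120},
    {"serial_number": "S-43", "return_code": "E500", "elapsed_ms": 3000},
])
# The body would be sent with POST to the cluster's /_bulk path using
# Content-Type: application/x-ndjson; sending is omitted in this sketch.
```

Deriving the index from the application name keeps each application's logs in its own index, which is what later allows one data source per application on the visualization side.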
S104: monitoring the running condition of each target application from the logs it generates by combining the distributed search and analysis engine with an open source visualization platform.
In this embodiment, the distributed search engine and the open source visualization platform (Grafana) may be combined to monitor the operation condition of each target application based on a log generated by the operation of the target application. The logs generated by the running of the target applications recorded in the distributed search engine can be visually displayed by using an open source visualization platform, so that the running condition of each target application is monitored.
In this embodiment, the open source visualization platform (Grafana) is an open source data visualization tool written in the Go programming language. It supports data monitoring and data statistics and has an alerting function: fault identification rules can be defined visually for important characteristic indicators, and Grafana continuously evaluates those indicators and sends notifications carrying fault prompt information.
In this embodiment, the running condition of a target application indicates whether the application is behaving abnormally while it runs. It may include the real-time variation of the characteristic indicators, that is, those general indicators that are suitable for characterizing the running condition. If the value of a characteristic indicator falls outside its normal range, an abnormality has occurred, and the root cause of the fault needs to be analyzed and notified.
From the above description, it can be seen that the embodiments of the present specification achieve the following technical effects: the method can determine the general index information sets of a plurality of target applications and define the target log format according to the general index information sets, thereby unifying the general indexes and the log formats contained in the logs of different target applications, effectively improving the reusability among the applications and reducing the cost of new application access. Furthermore, a distributed search analysis engine can be used for recording logs in a target log format generated by running of a plurality of target applications, and the logs generated by running of the target applications recorded in the distributed search engine are visually displayed by using an open source visualization platform, so that the running condition of each target application can be efficiently monitored through each general index in the logs.
In one embodiment, recording the logs generated by the running target applications with the distributed search and analysis engine may include: the target applications call, while running, an interface provided by an online transaction monitoring software development kit to generate logs in the target log format; the logs generated by the running target applications are collected with the Fluentd tool; and an index is created for each target application in the distributed search and analysis engine according to its application name. Further, based on the index of each target application, the logs generated by the running target applications may be sent to the distributed search and analysis engine for storage by means of HTTP requests.
In this embodiment, a software development kit (SDK) for online transaction monitoring may be built around the defined target log format. A target application can import the SDK and call the interface it provides to record application logs, so that the logs the application generates are in the unified format. In some embodiments, the online transaction monitoring SDK may also reserve a Map of application-defined fields, so that a target application can add personalized business indicators on top of the general indicators. A Map stores key-value pairs; keys must be unique, while values may repeat.
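The specification does not give the SDK's interface, so the following is a hypothetical stand-in that shows the idea: the SDK fixes the general fields, and a reserved map carries whatever application-specific indicators the caller supplies. The class name, method name, and field names are all invented for illustration, and JSON is used as the serialization only for brevity.

```python
import json
from datetime import datetime, timezone
from typing import Dict, Optional


class TransactionLogger:
    """Hypothetical stand-in for the online transaction monitoring SDK."""

    def __init__(self, app_name: str, module_name: str):
        self.app_name = app_name
        self.module_name = module_name

    def record(self, serial_number: str, return_code: str, elapsed_ms: float,
               custom: Optional[Dict[str, str]] = None) -> str:
        """Serialize one log entry with the general fields plus custom ones."""
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "app_name": self.app_name,
            "module_name": self.module_name,
            "serial_number": serial_number,
            "return_code": return_code,
            "elapsed_ms": elapsed_ms,
            # Reserved map for application-specific indicators:
            # keys are unique, values may repeat.
            "custom": dict(custom or {}),
        }
        return json.dumps(entry)


logger = TransactionLogger("PAY", "transfer")
line = logger.record("S-42", "0", 120.0,
                     custom={"channel": "mobile", "currency": "CNY"})
entry = json.loads(line)
```

Keeping the custom indicators in a separate map means the general fields stay identical across applications, so shared dashboards keep working when one application adds its own fields.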
In this embodiment, the Fluentd tool may be used to collect the logs generated by the running target applications in real time or at intervals; a corresponding application index may be created in Elasticsearch according to the application name of each target application, and the logs may be sent to the Elasticsearch log center for storage by means of HTTP requests, so that the structured application logs are written into the Elasticsearch distributed log storage and search cluster.
In one embodiment, monitoring the running condition of each target application from the logs it generates by combining the distributed search and analysis engine with the open source visualization platform may include: calling an interface of the open source visualization platform to create a monitoring dashboard, fault identification rules, and a fault notification channel for each target application according to its application name and its index in the distributed search and analysis engine. Further, the running conditions of the target applications can be displayed on their monitoring dashboards based on their indexes in the distributed search and analysis engine.
In this embodiment, a highly available cluster of the Grafana data visualization tool can be built, and the index of each target application in the distributed search and analysis engine can serve as the data source for that application's monitoring dashboard. To keep the monitoring dashboards distinguishable, the interface of the open source visualization platform can be called with the application name of each target application to create its monitoring dashboard, fault identification rules, and fault notification channel.
In this embodiment, the fault identification rules may differ between applications because the applications have different characteristics. The fault notification channel may be e-mail, short message, or the like, and may of course be any other possible channel, for example a telephone call; the specific channel can be determined according to the actual situation and is not limited by the embodiments of this specification.
In this embodiment, each target application may define fault identification rules on its characteristic indicators, such as the success rate, the time consumption, and the number of timed-out transactions, so as to define the conditions that trigger a fault alarm. For example, a fault alarm may be raised when the average resource transfer success rate over 5 minutes falls below 80%. The specific rules can be determined according to the actual situation and are not limited by the embodiments of this specification.
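The example rule above (alarm when the average success rate over 5 minutes is below 80%) can be sketched as follows. This is a standalone illustration, not how Grafana evaluates its alert rules; the sample data and the choice to treat an empty window as healthy are assumptions.

```python
from typing import List, Tuple

Sample = Tuple[float, bool]  # (timestamp in seconds, transfer succeeded?)


def average_success_rate(samples: List[Sample], now: float,
                         window_s: float) -> float:
    """Fraction of successful transfers within the last `window_s` seconds."""
    recent = [ok for ts, ok in samples if now - window_s <= ts <= now]
    if not recent:
        return 1.0  # assumption: no traffic in the window is not a fault
    return sum(recent) / len(recent)


def should_alarm(samples: List[Sample], now: float,
                 window_s: float = 300.0, threshold: float = 0.80) -> bool:
    """True when the windowed success rate falls below the threshold."""
    return average_success_rate(samples, now, window_s) < threshold


# The first sample is older than 5 minutes and is ignored.
samples = [(0.0, True), (400.0, True), (450.0, False),
           (500.0, False), (550.0, True), (600.0, True)]
rate = average_success_rate(samples, now=600.0, window_s=300.0)
alarm = should_alarm(samples, now=600.0)
```

Here three of the five transfers inside the window succeeded, so the rate is below the 80% threshold and the rule fires.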
In this embodiment, after the monitoring dashboard, fault identification rules, fault notification channel, and related information of each target application have been configured in the Grafana back end, they can be persisted in a MySQL database for later use. MySQL is an open source relational database management system.
In this embodiment, the Grafana runtime engine may read the logs of each application from the log center of the distributed search and analysis engine, based on the index of each target application in that engine, and generate the corresponding charts for front-end display according to the monitoring dashboard of each target application.
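To make the data flow concrete, the sketch below builds an Elasticsearch search body of the kind a dashboard panel could run against an application's index: it buckets recent log entries by time and averages the elapsed time per bucket. The field names `timestamp` and `elapsed_ms` are hypothetical, and the visualization platform normally generates such queries itself from the panel configuration.

```python
def elapsed_time_query(minutes: int = 5, interval: str = "1m") -> dict:
    """Build a search body that averages elapsed time per time bucket."""
    return {
        "size": 0,  # only the aggregations are needed, not the documents
        "query": {
            "range": {"timestamp": {"gte": f"now-{minutes}m", "lte": "now"}}
        },
        "aggs": {
            "per_interval": {
                "date_histogram": {
                    "field": "timestamp",
                    "fixed_interval": interval,
                },
                "aggs": {
                    "avg_elapsed_ms": {"avg": {"field": "elapsed_ms"}}
                },
            }
        },
    }


query = elapsed_time_query(minutes=5, interval="1m")
histogram = query["aggs"]["per_interval"]["date_histogram"]
```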
In one embodiment, calling the interface of the open source visualization platform to create the monitoring dashboard of each target application according to its application name and its index in the distributed search and analysis engine may include: determining at least one characteristic indicator from the general indicator information set, and calling the interface of the open source visualization platform to create the monitoring dashboard of each target application according to the characteristic indicator, the application name of the target application, and its index in the distributed search and analysis engine. The monitoring dashboard of a target application displays the behavior of each characteristic indicator and includes: an application overall running condition display module, an in-application sub-module running condition display module, an in-sub-module node running condition display module, and a sub-module production/gray-release environment comparison display module.
In this embodiment, not every general indicator is suitable for determining whether an application is abnormal, so at least one characteristic indicator may be selected from the general indicator information set, and the characteristic indicators of different target applications may differ. The characteristic indicators may include, but are not limited to: the resource transfer success rate, the resource transfer time consumption, the number of timed-out resource transfers, and the like. Other indicators, such as the average resource transfer success rate, may also be included; the specific choice can be determined according to the actual situation and is not limited by the embodiments of this specification.
In this embodiment, a one-click onboarding function can be implemented: using the characteristic indicators, the application name of each target application, and its index in the distributed search and analysis engine, the Grafana interface is called to create the application, the user, the data source corresponding to the application's index in the distributed search and analysis engine, the monitoring dashboard, the fault identification rules, the fault notification channel, and so on.
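A rough sketch of such an onboarding step follows. It only assembles the request payloads for creating a data source and a dashboard through Grafana's HTTP API (`/api/datasources` and `/api/dashboards/db`) and does not send them. The naming conventions, the time field name, and the heavily simplified panel definitions are assumptions; fault identification rules and notification channels are omitted.

```python
def onboarding_requests(app_name: str, es_url: str, indicators: list) -> list:
    """Return (method, path, payload) tuples for onboarding one application."""
    index = f"applog-{app_name.lower()}"  # hypothetical index naming convention
    datasource = {
        "name": f"{app_name}-logs",
        "type": "elasticsearch",
        "access": "proxy",
        "url": es_url,
        "database": index,                       # index backing the data source
        "jsonData": {"timeField": "timestamp"},  # assumed time field name
    }
    dashboard = {
        "dashboard": {
            "id": None,  # None asks for a new dashboard to be created
            "title": f"{app_name} running condition",
            # One simplified panel per characteristic indicator.
            "panels": [
                {"id": i + 1, "type": "timeseries", "title": name,
                 "datasource": datasource["name"]}
                for i, name in enumerate(indicators)
            ],
        },
        "overwrite": False,
    }
    return [
        ("POST", "/api/datasources", datasource),
        ("POST", "/api/dashboards/db", dashboard),
    ]


plan = onboarding_requests("PAY", "http://es.example:9200",
                           ["success_rate", "elapsed_ms"])
```

Because everything is derived from the application name and the list of characteristic indicators, onboarding a new application reduces to one call with different arguments, which is the reuse the embodiment is aiming for.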
In this embodiment, target applications can extend the logs with personalized business indicators and can abstract personalized characteristic indicators from those business indicators. The views of the monitoring dashboard and the fault identification rules created for an application can be adjusted online in real time according to the application's own needs. After going live, the application overview and the running condition at each monitoring level can be viewed in real time, and the chart query logic and fault identification rules can be adjusted at any time.
In this embodiment, the monitoring dashboard and the fault identification rules can be adjusted online in real time, and the adjustments take effect once saved. A target application can add and modify monitoring panels, modify the rules for querying and aggregating log indicators, and choose among chart types such as line charts, bar charts, gauge charts, pie charts, and tables for display. The specific choice can be determined according to the actual situation and is not limited by the embodiments of this specification.
In this embodiment, an overall application running condition display module, a sub-module running condition display module, a display module for the running condition of nodes within a sub-module, and a production/gray-release environment comparison display module for sub-modules may be created according to the determined characteristic indicators. In this way, the overview chart of each application can be viewed by application; an application topology graph can be configured according to the calling relationships among the modules of the target application; clicking a module on the topology graph shows the running condition monitoring of that sub-module as well as the comparison between the module and its gray-release counterpart; and clicking the node list of a module shows the running condition monitoring of the nodes in the module. The overall application running condition display module in the monitoring dashboard may be as shown in fig. 2, and the sub-module running condition display module may be as shown in fig. 3. In these figures, Full GC refers to garbage collection that cleans the entire heap space, including the young generation, the old generation, and the permanent generation, and the cause needs to be investigated when Full GC occurs frequently in the system; slow SQL refers to slow-running structured query language statements recorded in the log.
In an embodiment, after the open source visualization platform interface is called to create the monitoring dashboard, the fault identification rule, and the fault notification channel of each target application, the method may further include: selecting an application name corresponding to a target user, and setting the viewing permission of the target user.
In this embodiment, the application names that each user is authorized to view may be set, and the application overview view and the monitoring at each level may be viewed by connecting to Grafana anonymously. The target user may be an operation and maintenance engineer, a developer, or the like corresponding to the target application; this may be determined according to actual conditions and is not limited in the embodiments of the present specification.
In this embodiment, since native Grafana only supports anonymous viewing of a single application, Grafana source code may be modified in some embodiments to enable anonymous viewing of multiple different application monitoring dashboards.
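The permission setting described above amounts to a mapping from users to the application names they may view. A minimal sketch follows; the class and method names are hypothetical, and how the mapping is enforced inside a modified Grafana is not described in this specification.

```python
from typing import Dict, Iterable, List, Set


class ViewPermissions:
    """Tracks which application names each user is allowed to view."""

    def __init__(self) -> None:
        self._apps_by_user: Dict[str, Set[str]] = {}

    def grant(self, user: str, app_names: Iterable[str]) -> None:
        """Add application names to the set the user may view."""
        self._apps_by_user.setdefault(user, set()).update(app_names)

    def can_view(self, user: str, app_name: str) -> bool:
        """Return True only if the application was granted to the user."""
        return app_name in self._apps_by_user.get(user, set())

    def visible(self, user: str, all_apps: Iterable[str]) -> List[str]:
        """Filter a list of application names down to those the user may view."""
        return [app for app in all_apps if self.can_view(user, app)]
```

A user with no grants sees nothing, which is the conservative default for operation and maintenance data.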
In one embodiment, after the running condition of each target application is monitored based on the log generated by the running of the target application, the method may further include: generating a fault prompt sending request when it is determined, based on the log generated by the running of the target application and the fault identification rule, that the target application has a fault; putting the fault prompt sending request into a target queue; and preprocessing the fault prompt sending requests in the target queue. Further, a target processing object and a target fault notification channel corresponding to a target fault prompt sending request in the preprocessed target queue can be determined, where the target fault prompt sending request is a fault prompt sending request that is not a false alarm and whose fault cause has been determined. Fault prompt information corresponding to the target fault prompt sending request is then sent to the target processing object through the target fault notification channel.
In this embodiment, when it is determined based on the fault identification rule that the target application has a fault, the fault identification rule may trigger an alarm and generate a fault prompt sending request, and Grafana may send the fault prompt information through the fault notification channel configured for the application. In some embodiments, the generated fault prompt sending request may also be sent to the alarm processing platform synchronously.
In this embodiment, the fault prompt sending request may be placed in a target queue of the alarm processing platform, and the alarm processing platform may preprocess the fault prompt sending requests in the target queue using multiple threads. In actual operation, there may be more than one target queue.
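A minimal sketch of multi-threaded consumption of a target queue is given below, using Python's standard `queue` and `threading` modules. The function name, the worker count, and the shape of the `preprocess` callback are assumptions for illustration; the specification does not prescribe how the alarm processing platform implements this step.

```python
import queue
import threading
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def preprocess_all(
    requests: Iterable[T], preprocess: Callable[[T], R], worker_count: int = 4
) -> List[R]:
    """Drain a queue of fault prompt sending requests with several worker threads."""
    target_queue: "queue.Queue[T]" = queue.Queue()
    for request in requests:
        target_queue.put(request)

    results: List[R] = []
    results_lock = threading.Lock()

    def worker() -> None:
        while True:
            try:
                item = target_queue.get_nowait()
            except queue.Empty:
                return  # queue drained, worker exits
            try:
                outcome = preprocess(item)
                with results_lock:
                    results.append(outcome)
            finally:
                target_queue.task_done()

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results  # completion order, not submission order
```

Because workers finish in arbitrary order, any step that depends on arrival order (for example repeat-alarm screening) would need its own timestamps rather than relying on the position in the result list.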
In this embodiment, the preprocessing may include data cleansing and may draw on an expert knowledge base of recent and historical alarms for preliminary analysis. For example, an alarm with the same content received for the same application within the last 5 minutes can be screened out as a repeated alarm. Of course, the preprocessing is not limited to the above example; those skilled in the art may make other modifications in light of the technical spirit of the embodiments of the present specification, and such modifications fall within the scope of the embodiments of the present specification as long as the functions and effects they achieve are the same as or similar to those of the embodiments of the present specification.
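The repeated-alarm example can be sketched as a small filter keyed on application name and alarm content. The class name and the use of numeric timestamps in seconds are assumptions; the 5-minute window comes from the example above.

```python
from typing import Dict, Tuple


class RepeatAlarmFilter:
    """Flags an alarm as a repeat if the same application reported the same
    content within the preceding window (300 seconds by default)."""

    def __init__(self, window_seconds: float = 300.0) -> None:
        self._window = window_seconds
        self._last_seen: Dict[Tuple[str, str], float] = {}

    def is_repeat(self, app_name: str, content: str, timestamp: float) -> bool:
        key = (app_name, content)
        last = self._last_seen.get(key)
        # Record every arrival, including suppressed ones, so the window is
        # measured from the most recently received identical alarm.
        self._last_seen[key] = timestamp
        return last is not None and (timestamp - last) <= self._window
```

For instance, identical alarms for one application at 0 s and 120 s yield one delivery and one repeat, while a further identical alarm at 500 s is delivered again because more than 300 s have passed since the previous arrival.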
In this embodiment, the fault prompt sending requests remaining in the preprocessed target queue are all non-false-alarm requests. Therefore, it may further be determined, based on the fault type and fault content corresponding to each request in the preprocessed target queue, whether the corresponding fault cause and suggested handling method can be determined from historical data. If so, the request is a target fault prompt sending request: the target processing object and the target fault notification channel corresponding to it are determined, and the fault prompt information corresponding to it is sent to the target processing object through the target fault notification channel.
In this embodiment, the fault prompt information may include the fault type, the fault content, and the suggested handling method, and the target processing object may be an operation and maintenance engineer or a developer of the target application; this may be determined according to actual conditions and is not limited in the embodiments of the present specification.
In one embodiment, after the fault prompt sending requests in the target queue are preprocessed, the method may further include: transmitting a characteristic fault prompt sending request in the preprocessed target queue to an AIOps intelligent root cause analysis system for comprehensive fault root cause analysis, where the characteristic fault prompt sending request is a fault prompt sending request that is not a false alarm and whose fault cause has not been determined.
In this embodiment, if the fault cause and suggested handling method cannot be determined from historical data for the fault type and fault content of a request in the preprocessed target queue, that request is a characteristic fault prompt sending request, that is, one that is not a false alarm and whose fault cause has not been determined. It may be transmitted to the AIOps intelligent root cause analysis system, which performs comprehensive fault root cause analysis based on logs, system indicators, infrastructure, and the like.
In this embodiment, the AIOps intelligent root cause analysis system applies artificial intelligence to the operation and maintenance field: based on existing operation and maintenance data (logs, monitoring information, application information, and the like), it uses machine learning to address problems that automated operation and maintenance cannot solve. AIOps does not rely on manually specified rules; instead, machine learning algorithms continuously learn from massive operation and maintenance data and continuously refine and summarize rules automatically.
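The split between target requests (cause known from historical data) and characteristic requests (cause unknown, forwarded to AIOps) can be sketched as a lookup followed by a routing decision. The dictionary-based history, its keys, and the field names below are hypothetical; the specification does not say how historical data is stored or matched.

```python
from typing import Any, Dict, Tuple

HistoryKey = Tuple[str, str]  # (fault type, fault content)


def route_request(
    request: Dict[str, str], history: Dict[HistoryKey, Dict[str, str]]
) -> Dict[str, Any]:
    """Decide whether a non-false-alarm request is notified directly or sent to AIOps."""
    key = (request["fault_type"], request["fault_content"])
    known = history.get(key)
    if known is None:
        # Cause not determinable from history: a "characteristic" request.
        return {"route": "aiops", "request": request}
    # Cause known: a "target" request, sent through the configured channel.
    return {
        "route": "notify",
        "channel": known["channel"],
        "recipient": known["recipient"],
        "message": {
            "fault_type": request["fault_type"],
            "fault_content": request["fault_content"],
            "suggested_handling": known["suggested_handling"],
        },
    }
```

Exact-match lookup is the simplest possible stand-in for "determined according to historical data"; a real system would likely match on normalized or templated alarm content.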
In one embodiment, the architecture of the monitoring system for application running conditions may be divided into four layers, as shown in fig. 4:
Acquisition layer: the application (on-cloud or off-cloud) generates logs in a unified format based on the online transaction monitoring SDK; the Fluentd tool obtains the logs generated by the running application in real time or at intervals; a corresponding application index is created in Elasticsearch according to the application name of each target application; and the logs generated by the running application can be sent to the Elasticsearch log center for storage by HTTP request.
Storage layer: an Elasticsearch cluster is built as a distributed log cluster to store the application logs.
Consumption layer: the Grafana back end customizes the application monitoring dashboards, fault identification rules, and fault notification channels and persists them in a MySQL database; the Grafana running engine consumes the application logs from the log center and generates charts for the front end to display; and alarms triggered by monitoring undergo alarm cleansing and AIOps root cause analysis through the alarm processing platform.
Presentation layer: information such as the application overview, node-level monitoring by module, production/gray-release environment monitoring comparison, and the alarm list can be displayed through the one-key access function.
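To make the acquisition-to-storage hand-off concrete, the sketch below builds the body of an HTTP request that writes log records into a per-application Elasticsearch index. It uses the newline-delimited JSON format of Elasticsearch's `_bulk` endpoint, which is one possible choice; the specification only says that logs are sent by HTTP request, and in the described setup the Fluentd tool would perform this step. The index naming scheme `applog-<name>` and the function names are hypothetical.

```python
import json
from typing import Any, Dict, Iterable


def index_name_for(app_name: str) -> str:
    """Derive a per-application index name (hypothetical naming scheme)."""
    return f"applog-{app_name.lower()}"


def build_bulk_body(app_name: str, records: Iterable[Dict[str, Any]]) -> str:
    """Build an NDJSON bulk body: one action line followed by one document line per record."""
    index = index_name_for(app_name)
    lines = []
    for record in records:
        lines.append(json.dumps({"index": {"_index": index}}))
        lines.append(json.dumps(record, ensure_ascii=False))
    if not lines:
        return ""
    # The bulk format requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```

The resulting string would be posted to the cluster's `_bulk` endpoint with an NDJSON content type; authentication, retries, and batching limits are left out of the sketch.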
Based on the same inventive concept, the embodiments of the present specification further provide a monitoring device for application running conditions, as described in the following embodiments. Because the principle by which the monitoring device solves the problem is similar to that of the monitoring method for application running conditions, the implementation of the device can refer to the implementation of the method, and repeated parts are not described again. As used hereinafter, the term "unit" or "module" may be a combination of software and/or hardware that implements a predetermined function. Although the devices described in the embodiments below are preferably implemented in software, an implementation in hardware, or in a combination of software and hardware, is also possible and contemplated. Fig. 5 is a structural block diagram of a monitoring device for application running conditions according to an embodiment of the present specification. As shown in fig. 5, the device may include a determining module 501, a defining module 502, a recording module 503, and a monitoring module 504, which are described below.
A determining module 501, configured to determine a set of universal indicator information of a plurality of target applications;
a definition module 502, which may be configured to define a target log format according to the general index information set;
a recording module 503, which can be used for recording logs generated by running a plurality of target applications by using a distributed search analysis engine; the format of the log generated by running the target applications is a target log format;
a monitoring module 504, which may be configured to monitor the running condition of each target application, based on the log generated by the running of the target application, by combining the distributed search analysis engine with the open source visualization platform.
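The role of the defining module can be illustrated by deriving a fixed field order from a general indicator set and serializing each log record against it. This is a sketch under stated assumptions: the English field names are hypothetical renderings of a few general indicators, JSON is chosen as the serialization only for the example, and the function names do not come from the specification.

```python
import json
from typing import Any, Dict, Iterable, Tuple

# A few general indicators, with hypothetical English field names.
GENERAL_INDICATORS = ("log_type", "app_name", "node_ip", "module_name", "return_code", "elapsed_ms")


def define_log_format(indicators: Iterable[str]) -> Tuple[str, ...]:
    """Fix the ordered, de-duplicated field list that every log line must carry."""
    return tuple(dict.fromkeys(indicators))


def format_log_line(log_format: Tuple[str, ...], values: Dict[str, Any]) -> str:
    """Serialize one record in the target log format, rejecting incomplete records."""
    missing = [field for field in log_format if field not in values]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return json.dumps({field: values[field] for field in log_format})
```

Rejecting records with missing fields at write time keeps every application's log compatible with the shared dashboards and fault identification rules that query those fields.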
The embodiments of the present specification further provide an electronic device based on the monitoring method for application running conditions provided by the embodiments of the present specification. Referring to the schematic structural diagram of the electronic device shown in fig. 6, the electronic device may specifically include an input device 61, a processor 62, and a memory 63. The input device 61 may be specifically configured to input the determined general indicator information set of the plurality of target applications. The processor 62 may be specifically configured to: define a target log format according to the general indicator information set; record logs generated by running the plurality of target applications by using a distributed search analysis engine, where the format of the logs generated by running the target applications is the target log format; and monitor the running condition of each target application, based on the log generated by the running of the target application, by combining the distributed search analysis engine with an open source visualization platform. The memory 63 may be specifically configured to store parameters such as the target log format.
In this embodiment, the input device may be one of the main apparatuses for exchanging information between a user and a computer system. The input device may include a keyboard, a mouse, a camera, a scanner, a light pen, a handwriting input panel, a voice input device, and the like; it is used to input raw data and the programs that process the data into the computer. The input device can also acquire and receive data transmitted by other modules, units, and devices. The processor may be implemented in any suitable way. For example, the processor may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an Application Specific Integrated Circuit (ASIC), a programmable logic controller, an embedded microcontroller, and so forth. The memory may specifically be a memory device used in modern information technology for storing information. The memory may have multiple levels: in a digital system, anything that can store binary data may serve as memory; in an integrated circuit, a circuit that has a storage function but no physical form is also called a memory, such as a RAM or a FIFO; and in a system, a storage device in physical form is also called a memory, such as a memory module or a TF card.
In this embodiment, the functions and effects specifically realized by the electronic device can be understood by reference to the other embodiments and are not described herein again.
The embodiments of the present specification further provide a computer storage medium based on the monitoring method for application running conditions. The computer storage medium stores computer program instructions which, when executed, may implement: determining a general indicator information set of a plurality of target applications; defining a target log format according to the general indicator information set; recording logs generated by running the plurality of target applications by using a distributed search analysis engine, where the format of the logs generated by running the target applications is the target log format; and monitoring the running condition of each target application, based on the log generated by the running of the target application, by combining the distributed search analysis engine with an open source visualization platform.
In this embodiment, the storage medium includes, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Cache (Cache), a Hard Disk Drive (HDD), or a Memory Card (Memory Card). The memory may be used to store computer program instructions. The network communication unit may be an interface for performing network connection communication, which is set in accordance with a standard prescribed by a communication protocol.
In this embodiment, the functions and effects specifically realized by the program instructions stored in the computer storage medium can be understood by reference to the other embodiments and are not described herein again.
It will be apparent to those skilled in the art that the modules or steps of the embodiments of the present specification described above may be implemented by a general purpose computing device. They may be centralized on a single computing device or distributed over a network of multiple computing devices. Alternatively, they may be implemented by program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be performed in an order different from that described herein. They may also be fabricated separately as individual integrated circuit modules, or several of them may be fabricated as a single integrated circuit module. Thus, the embodiments of the present specification are not limited to any specific combination of hardware and software.
Although the embodiments herein provide method steps as described in the above embodiments or flowcharts, more or fewer steps may be included in a method based on conventional or non-inventive effort. For steps that have no logically necessary causal relationship, the order of execution is not limited to that provided by the embodiments of the present specification. When implemented in an actual apparatus or end product, the method can be performed sequentially or in parallel (e.g., in a parallel-processor or multi-threaded processing environment) according to the embodiments or the methods shown in the figures.
It is to be understood that the above description is intended to be illustrative, and not restrictive. Many embodiments and many applications other than the examples provided will be apparent to those of skill in the art upon reading the above description. The scope of embodiments of the present specification should, therefore, be determined not with reference to the above description, but should instead be determined with reference to the appended claims, along with the full scope of equivalents to which such claims are entitled.
The above description presents only preferred embodiments of the present specification and is not intended to limit them; it will be apparent to those skilled in the art that various modifications and variations can be made to the embodiments of the present specification. Any modification, equivalent replacement, improvement, and the like made within the spirit and principle of the embodiments of the present specification shall fall within the protection scope of the embodiments of the present specification.

Claims (11)

1. A method for monitoring application running conditions is characterized by comprising the following steps:
determining a universal index information set of a plurality of target applications;
defining a target log format according to the general index information set;
recording logs generated by running the plurality of target applications by utilizing a distributed search analysis engine; the format of the log generated by running the target applications is the target log format;
and monitoring the running condition of each target application based on the log generated by the running of the target application by combining the distributed search engine and the open source visualization platform.
2. The method of claim 1, wherein logging the plurality of target application runs with a distributed search analysis engine comprises:
the target applications call an interface provided by an online transaction monitoring software development kit to generate a log in a target log format in the running process;
acquiring logs generated by running the multiple target applications by using a Fluentd tool;
establishing indexes of all target applications in the distributed search analysis engine according to application names;
and sending the logs generated by running the target applications to the distributed search analysis engine for storage in an HTTP request mode based on the indexes of the target applications.
3. The method of claim 1, wherein monitoring the running condition of each target application based on the log generated by the running of the target application in combination with the distributed search engine and the open source visualization platform comprises:
calling an open source visual platform interface to establish a monitoring instrument board, a fault identification rule and a fault notification channel of each target application according to the application name of each target application and the index of each target application in the distributed search analysis engine;
and displaying the running conditions of the target applications according to the monitoring instrument panels of the target applications based on the indexes of the target applications in the distributed search analysis engine.
4. The method of claim 3, wherein invoking an open source visualization platform interface to create a monitoring dashboard for each target application according to the application name of each target application and the index of each target application in the distributed search analysis engine comprises:
determining at least one characteristic index from the general index information set;
calling an open source visual platform interface to establish a monitoring instrument board of each target application according to the characteristic index, the application name of each target application and the index of each target application in the distributed search analysis engine; the monitoring instrument board of the target application is used for displaying the running condition of each characteristic index, and comprises: the system comprises an application overall operation condition display module, an application sub-module operation condition display module, a node operation condition display module in a sub-module and a formal gray level environment comparison display module of the sub-module.
5. The method of claim 3, after calling an open source visual platform interface to create a monitoring dashboard, a fault identification rule and a fault notification channel of each target application, further comprising:
selecting an application name corresponding to a target user, and setting the viewing permission of the target user.
6. The method according to claim 3, further comprising, after monitoring the running condition of each target application based on the log generated by the running of the target application:
generating a fault prompt sending request under the condition that the target application is determined to have a fault based on the log generated by the running of the target application and the fault identification rule;
putting the fault prompt sending request into a target queue;
preprocessing a fault prompt sending request in the target queue;
determining a target processing object and a target fault notification channel corresponding to a target fault prompt sending request in the preprocessed target queue; the target fault prompt sending request is a fault prompt sending request which is not a false alarm and whose fault cause has been determined;
and sending fault prompt information corresponding to the target fault prompt sending request to the target processing object based on the target fault notification channel.
7. The method of claim 6, after preprocessing the request for sending the fault indication in the target queue, further comprising:
transmitting the characteristic fault prompt sending request in the preprocessed target queue to an AIOps intelligent root cause analysis system for comprehensive fault root cause analysis; the characteristic fault prompt sending request is a fault prompt sending request which is not a false alarm and whose fault cause has not been determined.
8. The method of claim 1, wherein the general indicators comprise: log type, log identifier, serial number, application abbreviation, node IP, module name, service name, method name, branch outlet, institution, return code, resource transfer amount, resource transfer success rate, resource transfer time consumption, and number of timed-out resource transfers.
9. An application behavior monitoring device, comprising:
the determining module is used for determining a universal index information set of a plurality of target applications;
the definition module is used for defining a target log format according to the general index information set;
the recording module is used for recording logs generated by running the target applications by utilizing a distributed search analysis engine; the format of the log generated by running the target applications is the target log format;
and the monitoring module is used for monitoring the running condition of each target application based on the log generated by the running of the target application by combining the distributed search engine and the open source visualization platform.
10. A monitoring device for application performance comprising a processor and a memory for storing processor-executable instructions, the processor implementing the steps of the method of any one of claims 1 to 8 when executing the instructions.
11. A computer-readable storage medium having stored thereon computer instructions which, when executed, implement the steps of the method of any one of claims 1 to 8.
CN202110448378.8A 2021-04-25 2021-04-25 Application running condition monitoring method, device and equipment Pending CN113138896A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110448378.8A CN113138896A (en) 2021-04-25 2021-04-25 Application running condition monitoring method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110448378.8A CN113138896A (en) 2021-04-25 2021-04-25 Application running condition monitoring method, device and equipment

Publications (1)

Publication Number Publication Date
CN113138896A true CN113138896A (en) 2021-07-20

Family

ID=76811902

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110448378.8A Pending CN113138896A (en) 2021-04-25 2021-04-25 Application running condition monitoring method, device and equipment

Country Status (1)

Country Link
CN (1) CN113138896A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070239799A1 (en) * 2006-03-29 2007-10-11 Anirudh Modi Analyzing log files
CN109800223A (en) * 2018-12-12 2019-05-24 平安科技(深圳)有限公司 Log processing method, device, electronic equipment and storage medium
CN110309130A (en) * 2018-03-21 2019-10-08 中国人民财产保险股份有限公司 A kind of method and device for host performance monitor
CN110990218A (en) * 2019-11-22 2020-04-10 深圳前海环融联易信息科技服务有限公司 Visualization and alarm method and device based on mass logs and computer equipment
CN111459782A (en) * 2020-04-02 2020-07-28 网易(杭州)网络有限公司 Method and device for monitoring business system, cloud platform system and server
CN112380091A (en) * 2020-11-13 2021-02-19 中国人寿保险股份有限公司 Service operation condition monitoring method and device and related equipment
CN112506743A (en) * 2020-12-09 2021-03-16 天津狮拓信息技术有限公司 Log monitoring method and device and server

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117331795A (en) * 2023-12-01 2024-01-02 南京研利科技有限公司 Service index calculation method and system
CN117331795B (en) * 2023-12-01 2024-01-26 南京研利科技有限公司 Service index calculation method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination