WO2023093394A1 - Log-based abnormality monitoring method, system, device and storage medium - Google Patents

Log-based abnormality monitoring method, system, device and storage medium

Info

Publication number
WO2023093394A1
Authority
WO
WIPO (PCT)
Prior art keywords
log
unit
output
information
log unit
Prior art date
Application number
PCT/CN2022/126493
Other languages
English (en)
French (fr)
Inventor
骆旭剑
张宙
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司
Publication of WO2023093394A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G06F11/3072 Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting
    • G06F11/3082 Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting the data filtering being achieved by aggregating or compressing the monitored data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/34 Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present application relates to the technical field of communications, in particular to a log-based abnormality monitoring method, system, device and storage medium.
  • logs are used as a method to record system operation information and are used for troubleshooting and locating faults.
  • The communication network is becoming more and more modular and system networking more and more complex, which disperses log information and makes it harder to use logs to troubleshoot faults.
  • Outputting a large number of low-level logs increases the input/output load and affects the normal operation of the business system. Low-level log output is therefore not turned on under normal circumstances, which is not conducive to root cause analysis of failures.
  • Embodiments of the present application provide a log-based abnormality monitoring method, system, device, and storage medium.
  • The embodiment of the present application provides a log-based abnormality monitoring method, including the following steps: obtaining log information; performing structured processing on the log information to obtain structured log information; classifying and aggregating the structured log information to obtain a log unit; when the log unit is abnormal, judging whether the abnormal type of the log unit is an abnormal type output for the first time; if it is not, outputting brief information of the log unit; if it is, outputting the log unit.
  • The embodiment of the present application proposes a log-based abnormality monitoring system, including: a log collection module configured to obtain log information; a log management module configured to perform structured processing on the log information to obtain structured log information, and to classify and aggregate the structured log information to obtain a log unit; an abnormality monitoring module configured to judge, when the log unit is abnormal, whether the abnormal type of the log unit is an abnormal type output for the first time; and a log output module configured to output brief information of the log unit when the abnormal type of the log unit is not an abnormal type output for the first time, and to output the log unit when it is.
  • An embodiment of the present application provides a log-based abnormality monitoring device, including: at least one processor; and at least one memory configured to store at least one program; when the at least one program is executed by the at least one processor, the at least one processor implements the above log-based abnormality monitoring method.
  • The embodiment of the present application provides a storage medium that stores a processor-executable program, and the processor-executable program is used to implement the above log-based anomaly monitoring method when executed by the processor.
  • FIG. 1 is a schematic flow chart of a log-based anomaly monitoring method provided by the present application.
  • FIG. 2 is an architecture diagram of a business system corresponding to a log monitoring system in the related art.
  • FIG. 3 is a schematic structural diagram of a log-based anomaly monitoring system provided by the present application.
  • FIG. 4 is a schematic flow diagram of an embodiment of a log-based anomaly monitoring method provided by the present application.
  • FIG. 5 is a schematic flowchart of another embodiment of a log-based anomaly monitoring method provided by the present application.
  • FIG. 6 is a schematic flowchart of another embodiment of a log-based anomaly monitoring method provided by the present application.
  • FIG. 7 is a schematic structural diagram of a log-based abnormality monitoring device provided by the present application.
  • In a communication system, logs, as a method of recording system operation information, also perform the important function of troubleshooting and locating problems.
  • The communication network is becoming more and more modular.
  • The 5G core network has more types of network elements than the 3G and 4G networks.
  • With the horizontal expansion of the network, the number of network elements is also gradually increasing.
  • the increase in the complexity of system networking increases the dispersion of log information, and also increases the difficulty of using logs to troubleshoot faults.
  • the log information is generally sorted and output by time, and the logs that meet the level are output indiscriminately.
  • The log information is insufficiently correlated and contains much repeated information.
  • each module only makes log records for its own operations, without higher-level log monitoring and management.
  • the system may not have exception logs, forming a loophole in monitoring.
  • the embodiment of the present application proposes a log-based abnormality monitoring method, system, device, and storage medium.
  • This method can output the log information when the log is abnormal, which is beneficial to the root cause analysis of the fault and can meet the requirements of fault location.
  • Through structured processing and classification and aggregation, the method helps alleviate the lack of correlation between log information and improves the efficiency of abnormality monitoring.
  • the implementation environment of the log-based anomaly monitoring method provided in the embodiment of the present application may include a terminal and a server.
  • the terminal can be a user-side device, which can install and run application programs.
  • the resource package of the application program may include program codes for collecting log information, and the terminal may report the log information generated during the running of the application program to the server.
  • the terminal can also report the log information generated by itself during operation to the server.
  • the above-mentioned server may be a background server, a test server, etc. corresponding to the application program, and the server may perform data processing on the log information reported by the terminal, and monitor the abnormality of the corresponding system through the log information.
  • the abnormality monitoring method provided in this application can be applied to any system in a terminal or a server, and this application does not limit a specific application environment.
  • The above-mentioned terminals can be smartphones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptops, desktops, smart speakers, smart watches, etc.
  • The above-mentioned server can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms.
  • the terminal and the server can be connected through a wired network or a wireless network, so that data exchange can be performed between the terminal and the server.
  • There may be only one terminal, or tens, hundreds, or more. This application does not limit the number of terminals or their device types.
  • the log-based anomaly monitoring method provided by the embodiment of the present application can be combined with various application scenarios.
  • The technical solution provided by the embodiment of the present application can be applied when monitoring a business system for abnormalities. The log information generated by the communication process between each terminal and the server is collected and processed. When the log information is abnormal, it is output, which is conducive to root cause analysis of the fault and can meet the needs of fault location.
  • a log-based abnormality monitoring method is proposed in the embodiment of the present application, which mainly includes the following steps:
  • the log information may be generated by a system in the terminal, may also be generated by a system in the server, or may be generated by a related system communicating between the terminal and the server.
  • This method first obtains the log information.
  • the log information may include session ID, transaction ID, module ID, log point ID, session start ID, current time, etc.
  • The log information may also include information such as an operation ID and a log source ID. This application does not limit the specific content of the log information.
  • The session identifier marks a certain transaction session; the transaction identifier marks the transaction type, for example, processing session tickets or processing timer messages; the module identifier marks the name or number of the module that outputs the log information; the log point identifier marks the code point at which the program outputs the log, for example, a value obtained by hashing the file name + line number; the session start identifier marks the start of a certain transaction session; the current time marks the local time when the log is submitted; the log information can also include other business-related information recorded in the log.
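  • The fields listed above can be pictured as a small record type. The following is a minimal sketch under stated assumptions, not the application's implementation: the names `LogRecord` and `make_log_point_id`, the choice of SHA-1, and the 8-character truncation are all hypothetical; the source only says the log point identifier may be a hash of the file name plus line number.

```python
import hashlib
import time
from dataclasses import dataclass, field


def make_log_point_id(file_name: str, line_no: int) -> str:
    """Hash "file name + line number" into a short, stable code-point identifier."""
    digest = hashlib.sha1(f"{file_name}:{line_no}".encode("utf-8")).hexdigest()
    return digest[:8]  # 8 hex characters is an arbitrary choice for this sketch


@dataclass
class LogRecord:
    session_id: str       # marks one transaction session
    transaction_id: str   # marks the transaction type, e.g. "timer_message"
    module_id: str        # name or number of the module that output the log
    log_point_id: str     # code point at which the program output the log
    session_start: bool   # True on the record that opens the session
    timestamp: float = field(default_factory=time.time)  # local submit time
    message: str = ""     # other business-related information


record = LogRecord("s-1", "timer_message", "module-11",
                   make_log_point_id("timer.c", 120), session_start=True)
```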
  • those skilled in the art may acquire log information periodically or in a planned manner according to actual needs, and for different systems, adopt different implementation schemes to acquire log information.
  • S102 Perform structured processing on the log information to obtain structured log information
  • The log information is subjected to structured processing to obtain structured log information.
  • Structured information refers to information that can be represented by data or a unified structure.
  • log information can be processed according to set rules to obtain structured log information.
  • the log information generated by the business system is converted into a data format of "log name, location, time, and generating module name", and the specified characters can be used as divisions between different fields.
  • the above data format is only an exemplary description, and does not constitute a specific limitation on the manner of structured processing of log information, and the structured processing of log information can be performed in other ways.
  • structured log information facilitates the establishment of associations between log information and alleviates the phenomenon of redundant and disorganized log information. This application does not limit the method adopted when the log information is structured, nor does it limit the specific expression form of the structured log information.
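  • As an illustration of the exemplary "log name, location, time, and generating module name" format, the sketch below splits a raw line on a specified character. The `|` separator, the `StructuredLog` type, and the sample line are assumptions made for this example; the application leaves the structuring method open.

```python
from dataclasses import dataclass

FIELD_SEPARATOR = "|"  # assumed "specified character" dividing the fields


@dataclass(frozen=True)
class StructuredLog:
    name: str      # log name
    location: str  # where the log was produced
    time: str      # time of the log
    module: str    # name of the generating module


def structure(raw_line: str) -> StructuredLog:
    """Convert one raw log line into the four-field structured form."""
    parts = [part.strip() for part in raw_line.split(FIELD_SEPARATOR)]
    if len(parts) != 4:
        raise ValueError(f"expected 4 fields, got {len(parts)}: {raw_line!r}")
    return StructuredLog(*parts)


entry = structure("session_timeout | timer.c:120 | 2022-10-20T08:00:00 | module-11")
```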
  • structured log information is classified and aggregated to obtain a log unit.
  • Structured log information can be classified according to set identifiers, set attributes, or set purposes, and then aggregated to obtain the log unit.
  • S104 When the log unit is abnormal, determine whether the abnormal type of the log unit belongs to the abnormal type output for the first time;
  • When the log unit is abnormal, it is judged whether the abnormal type of the log unit is an abnormal type output for the first time, that is, whether the current abnormal log unit is of the same abnormal type as an abnormal log unit that has already been output.
  • anomalies whose cause and result are the same can be regarded as the same type of anomalies.
  • Those skilled in the art can add a time attribute to the judgment, that is, judge whether the abnormal type of the log unit is an abnormal type output for the first time within a period of time.
  • The period of time may be real time, a fixed local time period, or a non-fixed time period. Time periods of different lengths can be set according to the frequency of use or importance of the system in different time periods, to meet system requirements and provide diversified options.
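  • The time-attribute variant can be sketched as a tracker that treats an abnormal type as "first output" again once a window has elapsed. This is only one possible reading: the class name `FirstOutputTracker`, the fixed-length window, and measuring the window from the last full output are assumptions of this sketch.

```python
class FirstOutputTracker:
    """Remembers when each abnormal type was last output in full."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self._last_full_output = {}  # abnormal type -> time of last full output

    def is_first_output(self, abnormal_type: str, now: float) -> bool:
        """True if this type has not been output in full within the window."""
        last = self._last_full_output.get(abnormal_type)
        if last is not None and now - last < self.window_seconds:
            return False  # same type already output in this period
        self._last_full_output[abnormal_type] = now
        return True


tracker = FirstOutputTracker(window_seconds=60.0)
```

  • A shorter window for busy or critical periods and a longer one otherwise would correspond to the non-fixed time period mentioned above.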
  • If the abnormal type of the log unit is not an abnormal type output for the first time, the brief information of the log unit is output.
  • the brief information of the log unit is used to represent information such as attributes, identifiers or fields of the log unit, so that users, administrators or systems can find the log unit through the brief information.
  • The brief information of the log unit may be some set attributes, some set information, or some set identifiers representing the log unit; the present application does not limit the specific expression form of the brief information of the log unit.
  • When the abnormal type of the log unit is an abnormal type output for the first time, the log unit is output; that is, for an abnormal type that appears for the first time, all log information of the abnormal log unit is output, which benefits root cause analysis of the failure.
  • if the current log unit has the same transaction identifier and the same exception identifier as a historically output log unit, it is determined that the abnormal type of the current log unit is not an abnormal type output for the first time;
  • if the current log unit has the same transaction identifier, the same log point identifier, and the same exception identifier as a historically output log unit, it is determined that the abnormal type of the current log unit is not an abnormal type output for the first time;
  • if the log unit has the first output identifier, it is determined that the abnormal type of the log unit is not an abnormal type output for the first time.
  • the method of judging whether it belongs to the abnormal type output for the first time can be flexibly selected according to the actual situation.
  • The exception identifiers of historically output log units (historical exception identifiers for short) can be stored in a database, so that when a specific judgment is made, the exception identifier of the current log unit (current exception identifier for short) can be compared with those stored in the database.
  • Log units with the same exception identifier can be regarded as the same type of exception log at the system level.
  • The method for determining whether exception logs are of the same type can be set as follows: for log units with the same transaction identifier, if the exception identifier is also the same, they can be considered the same type of exception log unit at the transaction level.
  • Alternatively, the method can be set as follows: for log units with the same transaction identifier and the same log point identifier, if the exception identifier is also the same, they can be considered the same type of exception log at the module level.
  • That is, at the transaction level, the current log unit and a historically output log unit have the same exception identifier and the same transaction identifier; or, at the module level, they have the same exception identifier, the same transaction identifier and the same log point identifier. In either case, the abnormal type of the current log unit is not an abnormal type output for the first time.
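  • The system, transaction and module levels can be read as three increasingly specific comparison keys. The sketch below assumes a hypothetical `LogUnit` tuple, a `type_key` helper, and an in-memory set standing in for the database of historical identifiers; none of these names come from the application.

```python
from typing import NamedTuple, Set, Tuple


class LogUnit(NamedTuple):
    transaction_id: str
    log_point_id: str
    exception_id: str


def type_key(unit: LogUnit, level: str) -> Tuple[str, ...]:
    """Build the key that decides whether two abnormal units are the same type."""
    if level == "system":
        return (unit.exception_id,)
    if level == "transaction":
        return (unit.transaction_id, unit.exception_id)
    if level == "module":
        return (unit.transaction_id, unit.log_point_id, unit.exception_id)
    raise ValueError(f"unknown level: {level}")


def seen_before(unit: LogUnit, history: Set[Tuple[str, ...]], level: str) -> bool:
    """True if a unit of the same type was already output; records the key if not."""
    key = type_key(unit, level)
    if key in history:
        return True
    history.add(key)  # stands in for storing the identifier in the database
    return False
```

  • Two units that differ only in their log point identifier count as the same type at the transaction level but as different types at the module level.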
  • the first output identifier is used to identify the abnormal type output for the first time, and can be specifically identified by numerical value, field or text.
  • the first output identifier can be marked by a specific value, 0 represents an abnormal type that does not belong to the first output, and 1 represents an abnormal type that belongs to the first output.
  • Those skilled in the art may mark the first output identifier in other ways, and this application does not limit the specific marking method of the first output identifier. It can be understood that by establishing the first output identifier, the user, administrator or system can more conveniently and directly judge whether the abnormal type of the log unit belongs to the abnormal type output for the first time, and improve the monitoring efficiency of the system.
  • the log-based abnormality monitoring method in the embodiment of the present application also includes a step of judging whether the log unit is abnormal, and the step includes one of the following:
  • if the log unit includes an exception identifier, it is determined that the log unit is abnormal;
  • if the log unit lacks the session end identifier, it is determined that the log unit is abnormal;
  • if the session time of the log unit is greater than the preset session time threshold, it is determined that the log unit is abnormal.
  • the session is used to represent the time and operation space from the beginning to the end of the transaction.
  • A step of judging whether the log unit is abnormal is proposed.
  • If the log unit includes an exception identifier, the log unit can be considered to be abnormal.
  • If the log unit lacks the session end identifier, or its session time exceeds the preset session time threshold, the log unit can also be judged abnormal by marking these two cases as abnormalities.
  • The preset session time threshold can be set by comprehensively considering the transaction duration of the system and how busy the system is.
  • The three abnormality judgment methods above are only partial examples and do not constitute specific restrictions on how a log unit is judged abnormal. Those skilled in the art can understand that an abnormal log can also be judged in other ways.
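  • The three example checks can be sketched as a single predicate. The source says the judging step includes one of the three; applying all of them together, as done here, is a choice of this sketch, and the `SessionUnit` type and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionUnit:
    exception_id: Optional[str]  # None when no exception identifier was logged
    has_session_end: bool        # whether a session end identifier is present
    start_time: float            # time of the first record in the session
    last_time: float             # time of the last record in the session


def is_abnormal(unit: SessionUnit, session_time_threshold: float) -> bool:
    """Apply the three example checks in turn."""
    if unit.exception_id is not None:
        return True  # explicit exception identifier
    if not unit.has_session_end:
        return True  # session never closed
    # Session took longer than the preset session time threshold.
    return (unit.last_time - unit.start_time) > session_time_threshold
```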
  • the log-based anomaly monitoring method in the embodiment of the present application includes the following steps:
  • the log information satisfying the preset condition is combined to obtain a log unit.
  • the preset condition may be that the log information has the same attribute, or that the log information has the same description, or that the log information has the same information, and so on.
  • The above three preset conditions are only partial illustrations and do not specifically restrict the judgment and operation of merging log information according to preset conditions; the judgment and operation of merging log information can also be performed in other ways.
  • The specific merging method may be merging logs with the same content into interconnected information or data, for example, merging logs with the same content into a character string whose header and tail fields are connected; this application does not limit the specific merging method. Merging the log information that meets the preset conditions and outputting it only once when the log is abnormal helps reduce the number of judgments made by users, administrators or systems and improves the efficiency of abnormality monitoring.
  • The structured log information has a session identifier, and the preset condition is having the same session identifier; combining the log information that satisfies the preset condition to obtain a log unit includes:
  • the preset arrangement may be arranged in chronological order, or arranged according to the name of the module that generates the log, and this application does not limit the specific arrangement.
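  • Grouping by session identifier and arranging chronologically can be sketched as follows. The dictionary-based records and the field names `session_id` and `time` are assumptions for illustration; chronological order is just one of the arrangements the text allows.

```python
from collections import defaultdict


def aggregate_by_session(records):
    """Group records that share a session identifier and order each group by time."""
    units = defaultdict(list)
    for rec in records:
        units[rec["session_id"]].append(rec)
    for unit in units.values():
        unit.sort(key=lambda rec: rec["time"])  # chronological arrangement
    return dict(units)


records = [
    {"session_id": "s-1", "time": 2, "msg": "b"},
    {"session_id": "s-2", "time": 1, "msg": "x"},
    {"session_id": "s-1", "time": 1, "msg": "a"},
]
log_units = aggregate_by_session(records)
```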
  • The brief information of the log unit includes association description information of the log unit, which describes the association between the current log unit and a historically output log unit; step S105 of outputting the brief information of the log unit if the abnormal type of the log unit is not an abnormal type output for the first time includes:
  • the association description information of the current log unit is output.
  • The association description information between the current log unit and the historically output log unit needs to be output. If the user, administrator or system needs to know the information of the abnormal log unit, it can be looked up through the association description information, which not only meets the needs of fault finding and location but also helps reduce the amount of output log information and improve system performance.
  • the association description information may be a mark representing a connection relationship with a log unit of historical output, for example, a pointer, a character mark, and the like.
  • the present application does not limit the specific expression form of the association description information, and those skilled in the art may choose other forms to express the connection relationship between the current log unit and the historically output log unit as required.
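  • One way to picture the association description is a reference from the brief output back to the unit that was output in full. In this sketch the function `render_output`, the `same_type_as` field, and the dictionary output format are all hypothetical; the application allows other forms such as pointers or character marks.

```python
from typing import Dict, List


def render_output(unit_id: str, abnormal_type: str, records: List[str],
                  first_unit_of_type: Dict[str, str]) -> dict:
    """Return the full log unit on first output of a type, otherwise brief info."""
    first_unit_id = first_unit_of_type.get(abnormal_type)
    if first_unit_id is None:
        first_unit_of_type[abnormal_type] = unit_id  # remember the full output
        return {"unit": unit_id, "type": abnormal_type, "records": list(records)}
    # Brief information: identifiers plus a reference to the earlier full output.
    return {"unit": unit_id, "type": abnormal_type, "same_type_as": first_unit_id}


history: Dict[str, str] = {}
full = render_output("unit-1", "E42", ["entry", "timeout"], history)
brief = render_output("unit-2", "E42", ["entry", "timeout"], history)
```

  • Here `brief` carries no log records, only the reference `unit-1`, so a reader who needs the details can look up the unit that was output in full.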
  • The brief information of the log unit includes the preset identifier of the log unit; step S105 of outputting the brief information of the log unit if the abnormal type of the log unit is not an abnormal type output for the first time includes:
  • the preset identifier of the log unit is output.
  • The preset identifier may be an identifier specified by a user, an administrator or a system; it may be information that characterizes the relevant attributes of the abnormal log unit, or relevant description information that characterizes the abnormal log unit. Those skilled in the art should understand that the foregoing is only an example of the preset identifier and is not intended to limit its specific expression form. Through the preset identifier, the specific information of the abnormal log unit can be looked up, which benefits fault location and root cause analysis.
  • the log information includes first log information and second log information, and the log information is generated through the following steps:
  • the second log information is obtained by updating the first log information, and the transaction is used to represent a series of operations when processing messages or data.
  • log information can be generated during the transaction process.
  • The first log information is generated, and at the exit of the transaction processing the updated first log information, that is, the second log information, is generated.
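  • A context manager is one way to sketch the pairing of first and second log information around a transaction. Generating the first record at the transaction entry is an assumption of this sketch (the text above only names the exit explicitly), as are the name `transaction_log`, the list used as a log sink, and the field names.

```python
import time
from contextlib import contextmanager


@contextmanager
def transaction_log(sink: list, session_id: str, transaction_id: str):
    """Emit the first log information on entry and the updated copy on exit."""
    first = {"session_id": session_id, "transaction_id": transaction_id,
             "session_start": True, "time": time.time()}
    sink.append(first)  # first log information (assumed: at transaction entry)
    second = dict(first, session_start=False)  # copy that the transaction updates
    try:
        yield second
    except Exception as exc:
        second["exception_id"] = type(exc).__name__  # record an exception identifier
        raise
    finally:
        second["session_end"] = True
        second["time"] = time.time()
        sink.append(second)  # second log information, at transaction exit


sink = []
with transaction_log(sink, "s-1", "ticket") as entry:
    entry["result"] = "ok"
```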
  • the log information can represent system-level transactions, and also includes system-level exception log information, which is conducive to improving the monitoring scope of anomaly monitoring and improving the practicability of the anomaly monitoring method.
  • Alarm information output is added to remind the user, administrator or system to pay attention to troubleshooting, so that the fault can be located in time and the operation of the system restored.
  • an alarm message is output.
  • it can also be set to output the alarm information when the abnormal type output for the first time exceeds the preset times threshold.
  • it can also be set as required to output an alarm message when the number of occurrences of a certain type of abnormality exceeds a preset number threshold.
  • the setting of specific alarm information output conditions can be selected according to actual needs.
  • the preset threshold value can be set after comprehensive consideration of system usage, usage frequency, and importance.
  • a time attribute may be added to count the number of occurrences of log unit exceptions within a period of time.
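  • The count threshold combined with a time attribute can be sketched as a sliding-window counter per abnormal type. The class name `AlarmCounter`, the sliding-window policy, and the concrete threshold values are assumptions; the application leaves the alarm condition to actual needs.

```python
from collections import defaultdict, deque


class AlarmCounter:
    """Counts occurrences of each abnormal type within a sliding time window."""

    def __init__(self, count_threshold: int, window_seconds: float):
        self.count_threshold = count_threshold
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # abnormal type -> occurrence times

    def record(self, abnormal_type: str, now: float) -> bool:
        """Register one occurrence; return True if an alarm should be output."""
        events = self._events[abnormal_type]
        events.append(now)
        # Drop occurrences that have fallen out of the window.
        while events and now - events[0] > self.window_seconds:
            events.popleft()
        return len(events) > self.count_threshold


counter = AlarmCounter(count_threshold=2, window_seconds=10.0)
```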
  • deleting the normal log unit is beneficial to reduce the load on the system and improve the performance of the system.
  • the log-based anomaly monitoring method in the embodiment of the present application is executed by an anomaly monitoring system.
  • The anomaly monitoring system includes a first log management module, a second log management module, an anomaly monitoring module, and a log output module; classifying and aggregating the structured log information to obtain log units includes:
  • when the abnormality monitoring module determines that the log unit is abnormal, the abnormality monitoring module sends an output request to the log output module;
  • the log output module obtains the first log unit sent by the first log management module and the second log unit sent by the second log management module, and then performs secondary aggregation of the first log unit and the second log unit to obtain the log unit;
  • or, the log output module sends a secondary aggregation request to the first log management module or the second log management module, so that the first log management module or the second log management module performs secondary aggregation of the first log unit and the second log unit to obtain the log unit.
  • a distributed cache may be used to classify and aggregate log information.
  • the log information can be classified and aggregated through the first log management module and the second log management module to improve the system response speed.
  • the abnormality monitoring module determines that the log unit is abnormal, it will perform secondary aggregation, which will help reduce the amount of messages and improve system performance.
  • The abnormality monitoring module sends an output request to the log output module; the log output module obtains the first log unit sent by the first log management module and the second log unit sent by the second log management module, and performs secondary aggregation of the two to obtain the log unit; the abnormal type is then judged, and the log unit is output in the manner described above.
  • the anomaly monitoring module sends an output request to the log output module, and the log output module sends a secondary aggregation request to the first log management module or the second log management module, so that the first log management module or The second log management module performs secondary aggregation of the first log unit and the second log unit to obtain the log unit, and the log output module outputs the log unit.
  • The secondary aggregation operation may be performed by the first log management module, by the second log management module, or jointly by the first log management module and the second log management module.
  • When the first log management module performs the secondary aggregation, the aggregation process is: the log output module sends a secondary aggregation request carrying the second log unit to the first log management module; the first log management module performs secondary aggregation of the first log unit and the second log unit to obtain the log unit, and returns the log unit to the log output module, which outputs it.
  • the log-based abnormality monitoring method proposed in the embodiment of the present application obtains log information, performs structured processing and classification and aggregation of log information, and obtains a log unit; when the log unit is abnormal, judges the abnormal type of the log unit Whether it belongs to the first output exception type; if it does not belong to the first output exception type, then output the brief information of the log unit; if it belongs to the first output exception type, then output the log unit.
  • the system includes a log collection module, a log management module, an abnormality monitoring module and a log output module.
  • With the method provided in this application, the log information can be output when a log is abnormal, which facilitates root cause analysis of faults and meets the needs of fault localization; it also helps alleviate the lack of correlation between log entries and improves the efficiency of anomaly monitoring.
  • FIG. 2 shows an architecture diagram of the business system corresponding to the log-based anomaly monitoring system proposed by the present invention.
  • the exception monitoring system is part of the log management subsystem.
  • the log management subsystem is an integral part of the operation maintenance management (OAM) function of the service system (network element).
  • The service system is composed of several service nodes (for example, service node 1, service node 2 and service node 3 in FIG. 2) and a log management subsystem. Each service node contains several business modules (for example, business module 11, business module 12, business module 13 and business module 32 in FIG. 2), and each business module outputs log information to the log management subsystem. The anomaly monitoring system can obtain this log information and monitor the system for anomalies using the anomaly monitoring method described above.
  • The numbers of service nodes and business modules shown in the figure are exemplary; those skilled in the art can adjust them according to actual needs.
  • Based on the system architecture of FIG. 2, FIG. 3 shows a schematic structural diagram of a log-based anomaly monitoring system according to an embodiment of the present application. As shown in FIG. 3, the system includes:
  • the log collection module 310 is configured to obtain log information
  • the log management module 320 is configured to perform structured processing on the log information to obtain structured log information; classify and aggregate the structured log information to obtain log units;
  • the anomaly monitoring module 330 is configured to determine, when the log unit is abnormal, whether the anomaly type of the log unit is being output for the first time;
  • the log output module 340 is configured to output the brief information of the log unit when its anomaly type is not being output for the first time, and to output the log unit when its anomaly type is being output for the first time.
  • the anomaly alarm module is configured to count the occurrences of log unit anomalies when the log unit is abnormal, and to output alarm information when that count exceeds a preset count threshold.
  • Example 1 As shown in Figure 4, taking a single business module as an example, the log-based anomaly monitoring method proposed in this application is described. The method includes the following steps 401-410:
  • Step 401 when the transaction starts, the business module registers the session with the session management module, and the session management module generates log information, the log information includes the session ID, and the session management module is also configured to ensure the uniqueness of the session ID in the business system.
  • Step 402 The business module submits the log information of the start of the transaction.
  • Step 402 can be divided into step 402a and step 402b:
  • Step 402a In the subsequent process of the transaction, the system generates subsequent log information at the passed log point, and the log collection module acquires the log information. Exemplarily, if it is an exception log, the log information will carry an exception flag.
  • Step 402b When the transaction processing ends, the system generates session end log information and sends the information to the log collection module. At the same time, the session end flag is used to indicate the end of a certain transaction session.
  • Step 403 The log collection module obtains the above log information.
  • Step 404 and Step 405 The log collection module caches the above log information to the log management module, which classifies and aggregates it to form a log unit.
  • Step 406 and Step 407 The anomaly monitoring module examines the log unit. When the log unit is abnormal, it determines whether the anomaly type of the log unit is being output for the first time; the log output module is configured to output the log unit.
  • Step 407 and Step 408 can be subdivided into the following steps:
  • Step 407a and Step 408a If the anomaly type of the log unit is not being output for the first time, output the brief information of the log unit;
  • Step 407b and Step 408b If the anomaly type of the log unit is being output for the first time, output the complete information of the log unit; if the log unit is normal, delete the log unit.
  • Step 409 The log management module counts the occurrence times of abnormal log units within a certain time range.
  • Step 410 If the number of occurrences of the abnormal log unit exceeds the preset number threshold, report an alarm, and the alarm carries specific abnormal information.
  • Example 2 As shown in Figure 5, taking the multi-service module as an example, the log information is processed in a centralized cache manner.
  • the method includes the following steps 501-510:
  • Step 501 When a transaction starts, the first service module registers a session with the session management module.
  • Step 502 The first business module submits the log information of transaction start.
  • Step 502 can be divided into step 502a and step 502b:
  • Step 502a Following the flow of transaction processing, the second business module submits the log information in the transaction.
  • Step 502b When the transaction processing ends, the first business module submits the log information of the transaction end.
  • Step 503 The log collection module collects the above log information.
  • Steps 504 to 510 are the same as steps 404 to 410 in Example 1.
  • Those skilled in the art will understand that the above steps can be executed by either the first business module or the second business module; the step flow shown in Figure 5 is exemplary.
  • That is, either business module can generate log information and send it to the log collection module.
  • The multi-business-module layout relieves the pressure of generating log information when it is voluminous, improves system response speed, and improves the user experience.
  • Example 3 As shown in Figure 6, take the case of multiple business modules in which the log information is processed in a distributed cache manner and the log unit is obtained through secondary aggregation by the first log management module or the second log management module.
  • the method includes the following steps 601-610:
  • Step 601 When a transaction starts, the first service module registers a session with the session management module.
  • Step 602 The first business module submits the transaction start log information to the first log collection module.
  • Step 602 can be divided into step 602a and step 602b:
  • Step 602a Following the transaction process, the second business module submits the log information in the transaction to the second log collection module.
  • Step 602b When the transaction processing ends, the first business module submits the log information of the transaction end to the first log collection module.
  • Step 603 and step 604 can be subdivided into the following steps:
  • Step 603a and Step 604a the first log collection module acquires the above log information, and caches the above log information to the first log management module;
  • Step 603b and Step 604b the second log collection module acquires the above log information, and caches the above log information to the second log management module.
  • Step 605 can be subdivided into the following steps:
  • Step 605a The first log management module performs structured processing, classification and aggregation on the log information to obtain the first log unit.
  • Step 605b The second log management module performs structured processing, classification and aggregation on the log information to obtain a second log unit.
  • Step 606 The abnormality monitoring module judges the abnormality of the log unit for the first log unit and the second log unit.
  • Step 607 If the log unit is abnormal, the log output module sends an output request.
  • Step 608a and Step 608b After receiving the above output request, the first log management module or the second log management module performs a secondary aggregation operation on the first log unit and the second log unit to obtain the log unit.
  • Step 609 The log output module outputs the log unit after receiving the log unit.
  • Step 610 to Step 611 The abnormal log unit is processed according to the anomaly alarm method described above; if the preset threshold is reached, alarm information is output.
  • Multiple business modules, multiple log collection modules and multiple log management modules can be arranged in the system. In the log generation stage all business modules take part; the log information is sent to any one of the log collection modules and is then classified and aggregated by the log management modules to obtain a log unit.
  • Arranging multiple business modules, log collection modules and log management modules helps improve the operating performance of the system and meet the timing requirements of anomaly monitoring.
  • The numbers of business modules, log collection modules and log management modules in the above three examples are exemplary; this application does not limit their specific numbers.
  • the above three examples are exemplary examples, and are not intended to limit the specific implementation manners of the present application.
  • the embodiment of the present application provides a log-based abnormality monitoring device, including:
  • At least one processor 710;
  • At least one memory 720 configured to store at least one program;
  • When the at least one program is executed by the at least one processor 710, the at least one processor 710 is caused to implement the log-based anomaly monitoring method.
  • The content of the above method embodiment is applicable to this device embodiment.
  • The functions realized by this device embodiment are the same as those of the above method embodiment, and so are the beneficial effects achieved.
  • the embodiment of the present application provides a log-based anomaly monitoring method.
  • The method acquires log information and performs structured processing and classification and aggregation on it to obtain a log unit. When the log unit is abnormal, it determines whether the anomaly type of the log unit is being output for the first time; if not, it outputs the brief information of the log unit; if so, it outputs the log unit.
  • The method can output the log information when a log is abnormal, which facilitates root cause analysis of faults and meets the needs of fault localization; structured processing and classification and aggregation also help alleviate the lack of correlation between log entries and improve the efficiency of anomaly monitoring.
  • the functions/operations noted in the block diagrams may occur out of the order noted in the operational diagrams.
  • two blocks shown in succession may, in fact, be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/operations involved.
  • the embodiments presented and described in the flowcharts of this application are provided by way of example for the purpose of providing a more comprehensive understanding of the technology. The disclosed methods are not limited to the operations and logical flow presented herein. Embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of larger operations are performed independently.
  • If the functions described above are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • The technical solution of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied as a software product stored in a storage medium and including several programs that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of the present application.
  • The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk or an optical disc.
  • A "computer-readable medium" may be any apparatus that can contain, store, communicate, propagate, or transmit a program for use by, or in connection with, a program execution system, apparatus, or device.
  • More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection with one or more wires (electronic device), a portable computer disk cartridge (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM).
  • The computer-readable medium may even be paper or another suitable medium on which the program is printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting, or otherwise suitably processing it if necessary, and then stored in computer memory.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Computer Hardware Design (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

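The present application provides a log-based anomaly monitoring method, system, apparatus and storage medium. The method acquires log information (S101), performs structured processing (S102) and classification and aggregation on the log information to obtain a log unit (S103); when the log unit is abnormal, it determines whether the anomaly type of the log unit is being output for the first time (S104); if not, brief information of the log unit is output (S105); if so, the log unit is output (S106). The system includes a log collection module (310), a log management module (320), an anomaly monitoring module (330) and a log output module (340).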

Description

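Log-Based Anomaly Monitoring Method, System, Apparatus and Storage Medium
Cross-Reference to Related Applications
This application is based on and claims priority to Chinese patent application No. 202111421618.1, filed on November 26, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of communication technology, and in particular to a log-based anomaly monitoring method, system, apparatus and storage medium.
Background
In a communication system, logs record the system's running information and are used for troubleshooting and fault localization. With the development of virtualization, communication networks are becoming increasingly modular and system networking increasingly complex, which scatters log information more widely and makes log-based troubleshooting harder. In log systems of the related art, outputting large volumes of low-level logs increases the input/output load and affects the normal operation of the service system. Low-level log output is therefore usually left disabled, which hinders root cause analysis of faults. In addition, log entries lack correlation with one another and contain much repeated information.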
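Summary
Embodiments of this application provide a log-based anomaly monitoring method, system, apparatus and storage medium.
In one aspect, an embodiment of this application provides a log-based anomaly monitoring method, including the following steps: acquiring log information; performing structured processing on the log information to obtain structured log information; classifying and aggregating the structured log information to obtain a log unit; when the log unit is abnormal, determining whether the anomaly type of the log unit is being output for the first time; if the anomaly type of the log unit is not being output for the first time, outputting brief information of the log unit; if the anomaly type of the log unit is being output for the first time, outputting the log unit.
In another aspect, an embodiment of this application provides a log-based anomaly monitoring system, including: a log collection module configured to acquire log information; a log management module configured to perform structured processing on the log information to obtain structured log information, and to classify and aggregate the structured log information to obtain a log unit; an anomaly monitoring module configured to determine, when the log unit is abnormal, whether the anomaly type of the log unit is being output for the first time; and a log output module configured to output brief information of the log unit when its anomaly type is not being output for the first time, and to output the log unit when its anomaly type is being output for the first time.
In another aspect, an embodiment of this application provides a log-based anomaly monitoring apparatus, including: at least one processor; and at least one memory configured to store at least one program; when the at least one program is executed by the at least one processor, the at least one processor is caused to implement the above log-based anomaly monitoring method.
In another aspect, an embodiment of this application provides a storage medium storing a processor-executable program which, when executed by a processor, implements the above log-based anomaly monitoring method.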
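Brief Description of the Drawings
To explain the technical solutions in the embodiments of this application more clearly, the drawings relating to those technical solutions are introduced below. It should be understood that the drawings introduced below serve only to present some embodiments of the technical solutions of this application conveniently and clearly; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a log-based anomaly monitoring method provided by this application;
FIG. 2 is an architecture diagram of the service system corresponding to a log monitoring system in the related art;
FIG. 3 is a schematic structural diagram of a log-based anomaly monitoring system provided by this application;
FIG. 4 is a schematic flowchart of one embodiment of the log-based anomaly monitoring method provided by this application;
FIG. 5 is a schematic flowchart of another embodiment of the log-based anomaly monitoring method provided by this application;
FIG. 6 is a schematic flowchart of a further embodiment of the log-based anomaly monitoring method provided by this application;
FIG. 7 is a schematic structural diagram of a log-based anomaly monitoring apparatus provided by this application.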
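Detailed Description
Embodiments of this application are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, serve only to explain this application, and are not to be construed as limiting it. The step numbers in the following embodiments are provided only for ease of explanation and do not limit the order of the steps in any way; the execution order of the steps can be adapted according to the understanding of those skilled in the art.
In a communication system, logs record the system's running information and also serve the important function of troubleshooting and locating problems. With the development of virtualization, communication networks are becoming increasingly modular. For example, compared with 3G and 4G networks, a 5G core network has more types of network elements, and as the horizontal scaling capability of the network improves, the number of network elements also grows. The increasing complexity of system networking scatters log information more widely and makes log-based troubleshooting harder.
In addition, most existing log systems use a leveled output mechanism, typically with levels such as error, warning, information and debug. The lower the level, the more information is recorded and the more helpful it is for locating faults, but log storage and input/output volume also increase. Commercial systems therefore generally enable only high-level logs, and low-level logs are turned on only when a fault occurs and the problem needs to be located. For intermittent problems, fault localization is consequently not timely, and it is hard to decide when to turn low-level logs on. Moreover, outputting large volumes of low-level logs increases the system's input/output load and affects the normal operation of the service system, so a commercial environment is not suited to enabling low-level log output.
Log information is generally output in chronological order, and every log that meets the level is output indiscriminately, so log entries are insufficiently correlated and contain much repeated information. Furthermore, each module logs only its own operations and there is no higher-level log monitoring and management; for communication or processing anomalies between modules or between nodes, the system may have no anomaly log at all, which leaves a gap in monitoring.
To address this, embodiments of this application propose a log-based anomaly monitoring method, system, apparatus and storage medium. The method can output the log information when a log is abnormal, which facilitates root cause analysis of faults and meets the needs of fault localization; at the same time, structured processing and classification and aggregation help alleviate the lack of correlation between log entries and improve the efficiency of anomaly monitoring. The technical solutions proposed in the embodiments of this application are described in detail below.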
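The implementation environment of the log-based anomaly monitoring method provided by the embodiments of this application may include a terminal and a server. The terminal may be a user-side device on which an application is installed and running. The application's resource package may include program code for collecting log information, and the terminal may report to the server the log information generated by the application while running. The terminal may also report log information generated by its own operation to the server. The server may be a back-end server, a test server or the like corresponding to the application; it may process the log information reported by the terminal and use it to monitor the corresponding system for anomalies. The anomaly monitoring method provided in this application can be applied to any system in a terminal or server, and this application does not limit the specific application environment.
The terminal may be a smartphone, a tablet computer, an e-book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop computer, a desktop computer, a smart speaker, a smart watch, or the like. The server may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN (Content Delivery Network), and big data and artificial intelligence platforms. The terminal and the server may be connected through a wired or wireless network so that they can exchange data. Those skilled in the art will appreciate that there may be only one terminal, or dozens, hundreds or more. This application does not limit the number or device type of the terminals.
The log-based anomaly monitoring method provided by the embodiments of this application can be combined with a variety of application scenarios. For example, in a service system in which several terminals communicate with a server, the technical solution provided by the embodiments can be applied when monitoring that service system for anomalies: the log information generated during communication between each terminal and the server is collected and processed, and when the log information is abnormal it is output, which facilitates root cause analysis of faults and meets the needs of fault localization.
A log-based anomaly monitoring method and system according to the embodiments of this application are described in detail below with reference to the drawings; the method is described first.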
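Referring to FIG. 1, an embodiment of this application proposes a log-based anomaly monitoring method, which mainly includes the following steps:
S101: acquire log information;
In this embodiment, the log information may be generated by a system in the terminal, by a system in the server, or by a system involved in communication between the terminal and the server; the method first acquires the log information. The log information may include a session identifier, a transaction identifier, a module identifier, a log point identifier, a session start flag, the current time, and so on; it may also include an operation identifier, a log source identifier and other information. This application does not limit the specific content of the log information. The session identifier marks a particular transaction session. The transaction identifier marks the transaction type, for example processing a session record or processing a timer message. The module identifier marks the name or number of the module that outputs the log information. The log point identifier marks the code point at which the program outputs the log; for example, it may be a value obtained by hashing the file name plus the line number. The session start flag marks the start of a transaction session. The current time marks the local time at which the log is submitted. The log information may also include other service-related information recorded in the log. Those skilled in the art may, as needed, acquire log information periodically or on a schedule, and may use different implementations to acquire log information for different systems.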
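The log point identifier described above (a hash of file name plus line number) could be derived roughly as follows. This is only a sketch: the choice of SHA-1, the `file:line` input layout, the 8-character truncation and the helper name `log_point_id` are assumptions for illustration, not details given in the application.

```python
import hashlib


def log_point_id(filename: str, line_no: int) -> str:
    """Return a short, stable identifier for the code point filename:line_no.

    Assumption: SHA-1 truncated to 8 hex characters; any stable hash would do.
    """
    digest = hashlib.sha1(f"{filename}:{line_no}".encode("utf-8")).hexdigest()
    return digest[:8]
```

Because the identifier depends only on the code location, two log entries emitted from the same statement in different sessions share the same log point identifier.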
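S102: perform structured processing on the log information to obtain structured log information;
In this embodiment, structured processing is performed on the log information to obtain structured log information. Structured information is information that can be represented by data or a uniform structure; for example, the log information may be processed according to set rules to obtain structured log information. In some possible implementations, the log information generated by the service system is converted into the data format "log name, location, time, generating module name", with designated characters separating the fields. Those skilled in the art will understand that this data format is only an example and does not limit how the log information is structured; other approaches may be used. Structuring the log information simplifies log parsing, makes subsequent processing, analysis and querying of the logs convenient and efficient, and improves the efficiency of anomaly monitoring. Structured log information also makes it easier to establish associations between log entries and alleviates redundancy and disorder. This application limits neither the method used for structuring the log information nor the specific form of the structured log information.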
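As a rough illustration of S102, the sketch below turns a delimited raw log line into a structured record. The `|` delimiter, the seven-field layout, the `S`/`E` flag letters and the names `LogRecord` and `parse_log_line` are all assumptions made for this example; the application does not prescribe a format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LogRecord:
    session_id: str
    transaction_id: str
    module_id: str
    log_point_id: str
    timestamp: float
    anomaly_id: Optional[str] = None  # present only on exception logs
    session_start: bool = False
    session_end: bool = False


def parse_log_line(line: str, sep: str = "|") -> LogRecord:
    """Parse a line with the hypothetical layout
    session|transaction|module|log_point|timestamp|anomaly|flags."""
    parts = [p.strip() for p in line.split(sep)]
    if len(parts) != 7:
        raise ValueError(f"expected 7 fields, got {len(parts)}")
    session, txn, module, point, ts, anomaly, flags = parts
    return LogRecord(
        session_id=session,
        transaction_id=txn,
        module_id=module,
        log_point_id=point,
        timestamp=float(ts),
        anomaly_id=anomaly or None,   # empty field means no anomaly
        session_start="S" in flags,
        session_end="E" in flags,
    )
```

Once every entry has the same fields, the later classification, aggregation and comparison steps reduce to ordinary key lookups.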
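S103: classify and aggregate the structured log information to obtain a log unit;
In this embodiment, the structured log information is classified and aggregated to obtain log units. In some possible implementations, the structured log information may be classified by a set identifier, by a set attribute, or by a set purpose, and then aggregated into log units. These are exemplary and do not limit the specific way of classifying and aggregating. Classifying and aggregating the log information increases the correlation between log entries, improves the efficiency of anomaly monitoring, and makes fault localization easier for users.
S104: when the log unit is abnormal, determine whether the anomaly type of the log unit is being output for the first time;
In this embodiment, when a log unit is abnormal, it is determined whether its anomaly type is being output for the first time, that is, whether the current abnormal log unit has the same anomaly type as an abnormal log unit that has already been output. In some possible implementations, anomalies whose cause and result are both the same may be treated as the same type of anomaly. Those skilled in the art may also add a time attribute to the determination, that is, determine whether the anomaly type of the log unit is being output for the first time within a period of time. The period may be set as real time, a fixed local-time period, or a non-fixed period. Periods of different lengths may be set according to how frequently or how critically the system is used in different periods, which meets system needs while offering a variety of choices.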
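S105: if the anomaly type of the log unit is not being output for the first time, output brief information of the log unit;
In this embodiment, when the anomaly type of the log unit is not being output for the first time, brief information of the log unit is output. The brief information represents attributes, identifiers, fields or similar information of the log unit, so that a user, administrator or system can find the log unit through it. In some possible implementations, the brief information may be certain set attributes, certain set information, or certain set identifiers that characterize the log unit; this application does not limit its specific form. Outputting brief information reduces the amount of output for the log unit and thus lightens the system load; moreover, for log units of the same anomaly type, the complete information is output only once, which makes fault localization easier for the user, administrator or system.
S106: if the anomaly type of the log unit is being output for the first time, output the log unit.
In this embodiment, when the anomaly type of the log unit is being output for the first time, the log unit is output; that is, for an anomaly type appearing for the first time, all log information of the abnormal log unit is output, which facilitates root cause analysis of the fault.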
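In the log-based anomaly monitoring method of this embodiment, determining whether the anomaly type of the log unit is being output for the first time includes one of the following:
if the current log unit has the same anomaly identifier as a previously output log unit, determining that the anomaly type of the current log unit is not being output for the first time;
or, if the current log unit has the same transaction identifier and the same anomaly identifier as a previously output log unit, determining that the anomaly type of the current log unit is not being output for the first time;
or, if the current log unit has the same transaction identifier, the same log point identifier and the same anomaly identifier as a previously output log unit, determining that the anomaly type of the current log unit is not being output for the first time;
or, if the log unit has a first output identifier, determining that the anomaly type of the log unit is not being output for the first time.
In this step, the way of determining whether the anomaly type is being output for the first time can be chosen flexibly according to the actual situation.
In some possible implementations, when the current log unit has the same anomaly identifier as a previously output log unit, it can be determined that the anomaly type of the current log unit is not being output for the first time. In some possible implementations, the anomaly identifiers of previously output log units (historical anomaly identifiers) may be stored in a database, so that the anomaly identifier of the current log unit (current anomaly identifier) can be compared with them. If the database contains a historical anomaly identifier equal to the current anomaly identifier, the anomaly type of the current log unit is determined not to be output for the first time; if it does not, the anomaly type of the current log unit is determined to be output for the first time.
In some possible implementations, where the log information includes transaction sessions, log units with the same anomaly identifier may be regarded as anomaly logs of the same type at the system level. For log units at the transaction level, the rule may be: log units with the same transaction identifier and the same anomaly identifier are regarded as abnormal log units of the same type at the transaction level. Likewise, for log units at the module level, that is, units containing both a transaction identifier and a log point identifier, the rule may be: log units with the same transaction identifier, the same log point identifier and the same anomaly identifier are regarded as anomaly logs of the same type at the module level. Accordingly, at the transaction level, when the current log unit has the same anomaly identifier and the same transaction identifier as a previously output log unit, or, at the module level, when it has the same anomaly identifier, transaction identifier and log point identifier as a previously output log unit, it can be determined that the anomaly type of the current log unit is not being output for the first time. Comparing against previously output log units in this way shows in which dimension the abnormal log unit's anomaly type lies, rather than being confined to the module itself, which improves the breadth and precision of anomaly monitoring.
In some possible implementations, for an abnormal log unit, a first output identifier may be established to determine whether the anomaly type of the current log unit is being output for the first time. The first output identifier marks the anomaly type output for the first time and may be expressed as a numeric value, a field, text or the like. For example, it may be a numeric value, with 0 meaning the anomaly type is not being output for the first time and 1 meaning it is. Those skilled in the art may mark the first output identifier in other ways; this application does not limit the specific marking method. Establishing a first output identifier lets the user, administrator or system determine more conveniently and directly whether the anomaly type of a log unit is being output for the first time, improving the monitoring efficiency of the system.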
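The three comparison levels described above (system, transaction, module) can be sketched as a choice of comparison key. In this illustration an in-memory set stands in for the database of historical anomaly identifiers; the names `anomaly_key` and `FirstOutputTracker` and the dictionary field names are hypothetical.

```python
def anomaly_key(unit, level="system"):
    """Build the tuple that decides whether two abnormal units are 'the same type'."""
    if level == "system":
        return (unit["anomaly_id"],)
    if level == "transaction":
        return (unit["transaction_id"], unit["anomaly_id"])
    if level == "module":
        return (unit["transaction_id"], unit["log_point_id"], unit["anomaly_id"])
    raise ValueError(f"unknown level: {level}")


class FirstOutputTracker:
    """Remembers which anomaly types have already been output."""

    def __init__(self, level="system"):
        self.level = level
        self._seen = set()  # stand-in for the historical-identifier database

    def is_first_output(self, unit):
        key = anomaly_key(unit, self.level)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True
```

A finer level gives a more specific key, so the same anomaly identifier raised in two different transactions counts once at the system level but twice at the transaction level.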
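The log-based anomaly monitoring method of this embodiment further includes a step of determining whether the log unit is abnormal, which includes one of the following:
if the log unit includes an anomaly identifier, determining that the log unit is abnormal;
or, if the log unit lacks a session end flag, determining that the log unit is abnormal;
or, if the session time of the log unit is greater than a preset session time threshold, determining that the log unit is abnormal;
where a session denotes the time and operation space of a transaction from start to end.
This step determines whether a log unit is abnormal. When the log unit includes an anomaly identifier, it can be determined to be abnormal. For example, when the log unit lacks a session end flag, or its session time exceeds the preset session time threshold, the log unit may be regarded as abnormal; these two kinds of anomaly can likewise be marked with an anomaly identifier so that they are handled by the same determination. The preset session time threshold can be set by jointly considering the system's transaction duration and how busy the system is. The three ways of determining an abnormal log unit above are only examples and do not limit how the determination is made; those skilled in the art will understand that abnormal logs can also be determined in other ways.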
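The three determination rules above can be sketched as a single predicate over a log unit. Here a log unit is assumed to be a non-empty list of dictionaries with hypothetical keys `anomaly_id`, `session_end` and `timestamp`, and the session time is taken as the span between the earliest and latest timestamps, which is one possible reading of "session time".

```python
def is_abnormal(unit, max_session_seconds):
    """Return True if the log unit meets any of the three anomaly rules.

    unit: non-empty list of record dicts belonging to one session (assumed shape).
    """
    # Rule 1: some record carries an anomaly identifier.
    if any(r.get("anomaly_id") for r in unit):
        return True
    # Rule 2: no record carries the session end flag.
    if not any(r.get("session_end") for r in unit):
        return True
    # Rule 3: the session lasted longer than the preset threshold.
    times = [r["timestamp"] for r in unit]
    return max(times) - min(times) > max_session_seconds
```

In practice rule 2 can only be evaluated once the unit is considered closed, for example after a timeout, since an in-flight session also lacks an end flag.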
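In the log-based anomaly monitoring method of this embodiment, classifying and aggregating the structured log information includes the following steps:
obtaining, from the structured log information, several log entries that satisfy a preset condition;
merging the several log entries that satisfy the preset condition to obtain a log unit.
In this step, where the log information contains several entries, the entries satisfying the preset condition are merged to obtain a log unit. In some possible embodiments, the preset condition may be that the entries have the same attribute, the same description, or the same information, and so on. Those skilled in the art will understand that these three preset conditions are only examples and do not limit how merging according to a preset condition is decided and performed; other approaches may be used. In some possible implementations, the merge may combine logs with the same content into interconnected information or data, for example into a string whose leading and trailing fields are concatenated; this application does not limit the specific way of merging. Merging several entries that satisfy the preset condition means that, when the log is abnormal, it is output only once, which reduces the number of judgments the user, administrator or system must make and improves the efficiency of anomaly monitoring.
In the log-based anomaly monitoring method of this embodiment, the structured log information has a session identifier, and the preset condition is having the same session identifier; merging the several log entries that satisfy the preset condition to obtain a log unit includes:
merging the several log entries that have the same session identifier in a preset arrangement to obtain the log unit.
In this step, the several log entries with the same session identifier are merged in a preset arrangement to obtain a log unit. In some possible implementations, the preset arrangement may be chronological order or order by the name of the module that generated the log; this application does not limit the specific arrangement.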
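Aggregation by session identifier with a chronological arrangement might look like the following sketch. Records are assumed to be dictionaries with `session_id` and `timestamp` keys; the function name `aggregate_by_session` is hypothetical.

```python
from collections import defaultdict


def aggregate_by_session(records):
    """Group structured records by session_id; each time-ordered group is one log unit."""
    units = defaultdict(list)
    for rec in records:
        units[rec["session_id"]].append(rec)
    for unit in units.values():
        unit.sort(key=lambda r: r["timestamp"])  # preset arrangement: chronological
    return dict(units)
```

Sorting by module name instead would only require changing the sort key.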
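In the log-based anomaly monitoring method of this embodiment, the brief information of the log unit includes association description information of the log unit, which describes the association between the current log unit and a previously output log unit; step S105, outputting the brief information of the log unit if the anomaly type of the log unit is not being output for the first time, includes:
if the anomaly type of the current log unit is not being output for the first time, outputting the association description information of the current log unit.
In this step, when the anomaly type of the current log unit is not being output for the first time, the association description information of the current log unit is output. For an abnormal log unit whose anomaly type is not being output for the first time, there is no need to output all of its information; only the association description information between the current log unit and the previously output log unit is output. If the user, administrator or system needs the information of the abnormal log unit, it can be looked up through the association description information, which both satisfies the needs of fault finding and localization and reduces the volume of log output, improving system performance. In some possible implementations, the association description information may be a marker representing the link to the previously output log unit, for example a pointer or a character tag. This application does not limit the specific form of the association description information; those skilled in the art may choose other forms to express the link between the current log unit and the previously output log unit.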
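One way to picture the association description information is a reference from the brief output back to the unit that was output in full. In the sketch below, the `see_unit` field, the `first_seen` mapping and the function name `render_output` are illustrative assumptions rather than forms specified by the application.

```python
def render_output(unit_id, records, anomaly_key, first_seen):
    """Return the full unit the first time an anomaly type is seen, else a brief reference.

    first_seen: dict mapping an anomaly key to the unit_id that was output in full.
    """
    if anomaly_key not in first_seen:
        first_seen[anomaly_key] = unit_id
        return {"unit_id": unit_id, "anomaly": anomaly_key, "records": list(records)}
    # Brief information: identifiers plus a pointer to the earlier full output.
    return {"unit_id": unit_id, "anomaly": anomaly_key, "see_unit": first_seen[anomaly_key]}
```

A reader of the brief output can follow `see_unit` to the earlier full record instead of receiving the same detail again.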
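In the log-based anomaly monitoring method of this embodiment, the brief information of the log unit includes a preset identifier of the log unit; step S105, outputting the brief information of the log unit if the anomaly type of the log unit is not being output for the first time, includes:
if the anomaly type of the log unit is not being output for the first time, outputting the preset identifier of the log unit.
In this step, when the anomaly type of the log unit is not being output for the first time, the preset identifier of the log unit is output. In some possible embodiments, the preset identifier may be an identifier designated by the user, administrator or system, information representing relevant attributes of the abnormal log unit, or information representing a relevant description of the abnormal log unit. Those skilled in the art should understand that these are only examples of the preset identifier and do not limit its specific form. Through the preset identifier, those skilled in the art can view the specific information of the abnormal log unit, which facilitates fault localization and root cause analysis.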
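In the log-based anomaly monitoring method of this embodiment, the log information includes first log information and second log information, and the log information is generated through the following steps:
generating the first log information at the entry of transaction processing;
outputting the second log information at the exit of transaction processing;
where the second log information is obtained by updating the first log information, and a transaction denotes a series of operations performed when processing a message or data.
In this step, log information may be generated during the course of a transaction. At the entry of each transaction, the first log information is generated; at the exit, the updated first log information, that is, the second log information, is generated. This log information can represent system-level transactions and also includes system-level abnormal log information, which widens the scope of anomaly monitoring and improves the practicality of the method.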
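The entry/exit pattern above maps naturally onto a context manager: the first log information is emitted on entry, and an updated copy (the second log information) is emitted on exit. The sketch below is one possible reading; the field names, the use of the exception class name as an anomaly identifier, and the `sink` list are assumptions for illustration.

```python
import time
import uuid
from contextlib import contextmanager


@contextmanager
def transaction_log(transaction_id, sink):
    """Emit first log info at transaction entry and the updated info at exit."""
    info = {
        "session_id": uuid.uuid4().hex,
        "transaction_id": transaction_id,
        "session_start": True,
        "start_time": time.time(),
    }
    sink.append(dict(info))  # first log information (entry)
    try:
        yield info
    except Exception as exc:
        info["anomaly_id"] = type(exc).__name__  # assumption: class name as identifier
        raise
    finally:
        info["session_end"] = True
        info["end_time"] = time.time()
        sink.append(dict(info))  # second log information: the first, updated
```

Wrapping a handler as `with transaction_log("T1", sink): ...` then yields exactly two entries per transaction that share one session identifier.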
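The log-based anomaly monitoring method of this embodiment further includes:
when the log unit is abnormal, counting the occurrences of log unit anomalies;
if the number of occurrences of log unit anomalies is greater than a preset count threshold, outputting alarm information.
In this step, for the case where there are many abnormal log units, alarm output is added to remind the user, administrator or system to troubleshoot, so that faults can be located promptly and system operation restored. In some possible implementations, alarm information is output when the total number of log unit anomalies exceeds the preset count threshold. Those skilled in the art will understand that the alarm may instead be output when the number of anomaly types output for the first time exceeds the preset count threshold, or, as needed, when the number of occurrences of a particular anomaly type exceeds the preset count threshold. The specific alarm condition can be chosen according to actual needs. The preset count threshold can be set after jointly considering the system's usage, usage frequency and importance. In some possible embodiments, a time attribute may be added so that the occurrences of log unit anomalies are counted within a period of time.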
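Counting anomalies within a period of time and comparing against a preset threshold can be sketched with a sliding window. The class name `AnomalyAlarm` and the choice of a sliding (rather than fixed) window are assumptions; the application leaves the period and the alarm condition open.

```python
from collections import deque


class AnomalyAlarm:
    """Raise an alarm when more than `threshold` anomalies fall within the window."""

    def __init__(self, threshold, window_seconds):
        self.threshold = threshold
        self.window = window_seconds
        self._times = deque()

    def record(self, timestamp):
        """Register one abnormal log unit; return True if an alarm should be output."""
        self._times.append(timestamp)
        # Drop occurrences that have aged out of the window.
        while self._times and timestamp - self._times[0] > self.window:
            self._times.popleft()
        return len(self._times) > self.threshold
```

Keeping one `AnomalyAlarm` per anomaly type instead of a single global instance would give the per-type alarm condition mentioned above.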
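The log-based anomaly monitoring method of this embodiment further includes:
when the log unit is normal, deleting the log unit.
In this step, when the log unit is normal, it is deleted. A normal log unit has little reference value for the anomaly monitoring system and occupies memory. Deleting normal log units therefore lightens the system load and improves system performance.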
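The log-based anomaly monitoring method of this embodiment is executed by an anomaly monitoring system that includes a first log management module, a second log management module, an anomaly monitoring module and a log output module; classifying and aggregating the structured log information to obtain a log unit includes:
classifying and aggregating the structured log information through the first log management module to obtain a first log unit;
classifying and aggregating the structured log information through the second log management module to obtain a second log unit;
when the anomaly monitoring module determines that the log unit is abnormal, sending, by the anomaly monitoring module, an output request to the log output module;
according to the output request, obtaining, through the log output module, the first log unit sent by the first log management module and the second log unit sent by the second log management module, and then performing, by the log output module, secondary aggregation on the first log unit and the second log unit to obtain the log unit;
or, according to the output request, sending, through the log output module, a secondary aggregation request to the first log management module or the second log management module, so that the first log management module or the second log management module performs secondary aggregation on the first log unit and the second log unit to obtain the log unit.
In this step, for systems with a large volume of log information, the log information may be classified and aggregated using a distributed cache. The first log management module and the second log management module can each classify and aggregate log information, which improves system response speed. Secondary aggregation is performed only when the anomaly monitoring module determines that the log unit is abnormal, which reduces message volume and improves system performance. There may be one, two or more log management modules; those skilled in the art can choose the number according to actual needs. In some possible embodiments, if the log unit is abnormal and needs to be output, the anomaly monitoring module sends an output request to the log output module; the log output module obtains the first log unit sent by the first log management module and the second log unit sent by the second log management module, performs secondary aggregation on them to obtain the log unit, then determines the anomaly type and outputs the log unit in the output manner described above. In some possible embodiments, the anomaly monitoring module sends an output request to the log output module, and the log output module sends a secondary aggregation request to the first log management module or the second log management module, so that this module performs secondary aggregation on the first log unit and the second log unit to obtain the log unit, which the log output module then outputs. In some possible implementations, the secondary aggregation may be performed by the first log management module, by the second log management module, or jointly by both. Taking the first log management module as an example, the aggregation process is: the log output module sends a secondary aggregation request carrying the second log unit information to the first log management module; the first log management module then performs secondary aggregation on the first log unit and the second log unit to obtain the log unit and returns it to the log output module for output.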
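Secondary aggregation of two partial log units can be sketched as a merge of two time-ordered lists belonging to the same session. The function name `secondary_aggregate`, the dictionary keys, and the assumption that each partial unit is already sorted by timestamp are illustrative choices, not requirements stated in the application.

```python
import heapq


def secondary_aggregate(first_unit, second_unit):
    """Merge two time-ordered partial log units (lists of dicts) of one session."""
    sessions = {r["session_id"] for r in first_unit + second_unit}
    if len(sessions) > 1:
        raise ValueError("units belong to different sessions")
    # heapq.merge keeps the result ordered without re-sorting both inputs.
    return list(heapq.merge(first_unit, second_unit, key=lambda r: r["timestamp"]))
```

Deferring this merge until a unit is found abnormal means normal sessions never incur the cross-module transfer.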
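As can be seen from the above description, the log-based anomaly monitoring method proposed in the embodiments of this application acquires log information and performs structured processing and classification and aggregation on it to obtain a log unit; when the log unit is abnormal, it determines whether the anomaly type of the log unit is being output for the first time; if not, brief information of the log unit is output; if so, the log unit is output. The system includes a log collection module, a log management module, an anomaly monitoring module and a log output module. With the method provided in this application, the log information can be output when a log is abnormal, which facilitates root cause analysis of faults and meets the needs of fault localization; it also helps alleviate the lack of correlation between log entries and improves the efficiency of anomaly monitoring.
Next, a log-based anomaly monitoring system according to the embodiments of this application is described with reference to FIG. 2 and FIG. 3.
FIG. 2 shows the architecture of the service system corresponding to the log-based anomaly monitoring system proposed by the present invention. The anomaly monitoring system is part of the log management subsystem. The log management subsystem is a component of the operation, administration and maintenance (OAM) function of the service system (network element). The service system consists of several service nodes (for example, service node 1, service node 2 and service node 3 in FIG. 2) and one log management subsystem. Each service node contains several service modules (for example, service module 11, service module 12, service module 13 and service module 32 in FIG. 2), and each service module outputs log information to the log management subsystem. The anomaly monitoring system can obtain this log information and monitor the system for anomalies using the anomaly monitoring method described above. The numbers of service nodes and service modules shown in the figure are exemplary; those skilled in the art can adjust them according to actual needs.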
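Based on the system architecture of FIG. 2, a schematic structural diagram of a log-based anomaly monitoring system according to an embodiment of this application is proposed. As shown in FIG. 3, the system includes:
a log collection module 310, configured to acquire log information;
a log management module 320, configured to perform structured processing on the log information to obtain structured log information, and to classify and aggregate the structured log information to obtain a log unit;
an anomaly monitoring module 330, configured to determine, when the log unit is abnormal, whether the anomaly type of the log unit is being output for the first time;
a log output module 340, configured to output brief information of the log unit when its anomaly type is not being output for the first time, and to output the log unit when its anomaly type is being output for the first time.
The log-based anomaly monitoring system of this embodiment further includes:
an anomaly alarm module, configured to count the occurrences of log unit anomalies when the log unit is abnormal, and to output alarm information when that count is greater than a preset count threshold.
The content of the above method embodiment therefore applies to this system embodiment; the functions implemented by this system embodiment are the same as those of the method embodiment, and so are the beneficial effects achieved.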
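To better illustrate the anomaly monitoring method and anomaly monitoring system proposed in this solution, three examples are described below:
Example 1: As shown in FIG. 4, the log-based anomaly monitoring method proposed in this application is described taking a single service module as an example. The method includes the following steps 401 to 410:
Step 401: When a transaction starts, the service module registers a session with the session management module, and the session management module generates log information that includes a session identifier; the session management module is also configured to guarantee that the session identifier is unique within the service system.
Step 402: The service module submits the transaction-start log information.
Step 402 can be divided into step 402a and step 402b:
Step 402a: As the transaction proceeds, the system generates further log information at each log point it passes, and the log collection module acquires this log information. For example, if it is an anomaly log, the log information carries an anomaly flag.
Step 402b: When transaction processing ends, the system generates session-end log information and sends it to the log collection module; the session end flag indicates that a transaction session has ended.
Step 403: The log collection module acquires the above log information.
Step 404 and step 405: The log collection module caches the above log information to the log management module, so that the log management module classifies and aggregates it to form a log unit.
Step 406 and step 407: The anomaly monitoring module examines the log unit; when the log unit is abnormal, it determines whether the anomaly type of the log unit is being output for the first time. The log output module is configured to output the log unit.
Step 407 and step 408 can be subdivided into the following steps:
Step 407a and step 408a: If the anomaly type of the log unit is not being output for the first time, output the brief information of the log unit;
Step 407b and step 408b: If the anomaly type of the log unit is being output for the first time, output the complete information of the log unit; if the log unit is normal, delete the log unit.
Step 409: The log management module counts the occurrences of abnormal log units within a certain time range.
Step 410: If the number of occurrences of abnormal log units exceeds the preset count threshold, an alarm carrying the specific anomaly information is reported.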
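The decision part of Example 1 (roughly steps 406 to 410) can be condensed into a short end-to-end sketch. It assumes log units are already aggregated per session, uses the anomaly identifier alone as the anomaly type, and invents the `MISSING_SESSION_END` label and the function name `monitor`; none of these choices come from the application.

```python
def monitor(units, seen, threshold):
    """units: dict session_id -> list of record dicts. Returns (outputs, alarm)."""
    outputs, abnormal_count = [], 0
    for session_id, records in units.items():
        anomaly = next((r["anomaly_id"] for r in records if r.get("anomaly_id")), None)
        if anomaly is None and not any(r.get("session_end") for r in records):
            anomaly = "MISSING_SESSION_END"  # hypothetical label for this rule
        if anomaly is None:
            continue  # normal unit: discarded, nothing is output
        abnormal_count += 1
        if anomaly in seen:
            # Not the first output of this type: brief information only.
            outputs.append({"session_id": session_id, "anomaly_id": anomaly})
        else:
            seen.add(anomaly)
            outputs.append({"session_id": session_id, "anomaly_id": anomaly,
                            "records": records})
    return outputs, abnormal_count > threshold
```

Passing the same `seen` set across calls keeps the first-output history between batches.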
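Example 2: As shown in FIG. 5, taking multiple service modules as an example, the log information is processed using a centralized cache. In the course of transaction processing, the method includes the following steps 501 to 510:
Step 501: When a transaction starts, the first service module registers a session with the session management module.
Step 502: The first service module submits the transaction-start log information.
Step 502 can be divided into step 502a and step 502b:
Step 502a: As transaction processing proceeds, the second service module submits the in-transaction log information.
Step 502b: When transaction processing ends, the first service module submits the transaction-end log information.
Step 503: The log collection module collects the above log information.
Steps 504 to 510 are the same as steps 404 to 410 in Example 1.
Those skilled in the art will understand that the above steps can be executed by either the first service module or the second service module; the step flow shown in FIG. 5 is exemplary. That is, either service module can generate log information and send it to the log collection module. The multi-service-module layout relieves the pressure of generating log information when it is voluminous, improves system response speed, and improves the user experience.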
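Example 3: As shown in FIG. 6, take the case of multiple service modules in which the log information is processed using a distributed cache and the log unit is obtained through secondary aggregation by the first log management module or the second log management module. In the course of transaction processing, the method includes the following steps 601 to 610:
Step 601: When a transaction starts, the first service module registers a session with the session management module.
Step 602: The first service module submits the transaction-start log information to the first log collection module.
Step 602 can be divided into step 602a and step 602b:
Step 602a: As transaction processing proceeds, the second service module submits the in-transaction log information to the second log collection module.
Step 602b: When transaction processing ends, the first service module submits the transaction-end log information to the first log collection module.
Step 603 and step 604 can be subdivided into the following steps:
Step 603a and step 604a: The first log collection module acquires the above log information and caches it to the first log management module;
Step 603b and step 604b: The second log collection module acquires the above log information and caches it to the second log management module.
Step 605 can be subdivided into the following steps:
Step 605a: The first log management module performs structured processing and classification and aggregation on the log information to obtain the first log unit.
Step 605b: The second log management module performs structured processing and classification and aggregation on the log information to obtain the second log unit.
Step 606: The anomaly monitoring module determines whether the first log unit and the second log unit are abnormal.
Step 607: If the log unit is abnormal, the log output module issues an output request.
Step 608a and step 608b: After receiving the output request, the first log management module or the second log management module performs secondary aggregation on the first log unit and the second log unit to obtain the log unit.
Step 609: After receiving the log unit, the log output module outputs it.
Step 610 to step 611: The abnormal log unit is processed according to the anomaly alarm method described above; if the preset threshold is reached, alarm information is output.
As can be seen from the above, multiple service modules, multiple log collection modules and multiple log management modules can be arranged in the system. In the log generation stage, all service modules take part; the log information is then sent to any one of the log collection modules and is classified and aggregated by the log management modules to obtain a log unit. Arranging multiple service modules, log collection modules and log management modules helps improve the operating performance of the system and meet the timing requirements of anomaly monitoring. Those skilled in the art will understand that the numbers of service modules, log collection modules and log management modules in the three examples are exemplary; this application does not limit their specific numbers. The three examples above are illustrative and do not limit the specific implementations of this application.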
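Referring to FIG. 7, an embodiment of this application provides a log-based anomaly monitoring apparatus, including:
at least one processor 710;
at least one memory 720, configured to store at least one program;
when the at least one program is executed by the at least one processor 710, the at least one processor 710 is caused to implement the log-based anomaly monitoring method.
Likewise, the content of the above method embodiment applies to this apparatus embodiment; the functions implemented by this apparatus embodiment are the same as those of the method embodiment, and so are the beneficial effects achieved.
An embodiment of this application provides a log-based anomaly monitoring method. The method acquires log information and performs structured processing and classification and aggregation on it to obtain a log unit; when the log unit is abnormal, it determines whether the anomaly type of the log unit is being output for the first time; if not, brief information of the log unit is output; if so, the log unit is output. The method can output the log information when a log is abnormal, which facilitates root cause analysis of faults and meets the needs of fault localization; structured processing and classification and aggregation also help alleviate the lack of correlation between log entries and improve the efficiency of anomaly monitoring.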
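In some embodiments, the functions/operations noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may in fact be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending on the functions/operations involved. Furthermore, the embodiments presented and described in the flowcharts of this application are provided by way of example, to give a more complete understanding of the technology. The disclosed methods are not limited to the operations and logical flow presented herein. Embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of a larger operation are performed independently.
In addition, although this application is described in the context of functional modules, it should be understood that, unless stated otherwise, one or more of the functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in separate physical devices or software modules. It will also be understood that a detailed discussion of the actual implementation of each module is not necessary for understanding this application. Rather, given the attributes, functions and internal relationships of the various functional modules in the apparatus disclosed herein, the actual implementation of the modules will be within the routine skill of an engineer. Those skilled in the art can therefore, using ordinary skill, implement the application as set forth in the claims without undue experimentation. It will also be understood that the specific concepts disclosed are merely illustrative and are not intended to limit the scope of this application, which is determined by the full scope of the appended claims and their equivalents.
If the functions are implemented as software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium. On this understanding, the technical solution of this application, in essence, or the part that contributes to the prior art, or a part of the technical solution, can be embodied as a software product stored in a storage medium and including several programs that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods described in the embodiments of this application. The aforementioned storage media include various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk or an optical disc.
The logic and/or steps represented in the flowcharts or otherwise described herein, for example an ordered list of executable programs for implementing logical functions, may be embodied in any computer-readable medium for use by, or in connection with, a program execution system, apparatus or device (such as a computer-based system, a system including a processor, or another system that can fetch and execute programs from a program execution system, apparatus or device). For the purposes of this specification, a "computer-readable medium" may be any apparatus that can contain, store, communicate, propagate or transmit a program for use by, or in connection with, a program execution system, apparatus or device.
More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection with one or more wires (electronic device), a portable computer disk cartridge (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). The computer-readable medium may even be paper or another suitable medium on which the program is printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting or otherwise suitably processing it if necessary, and then stored in computer memory.
It should be understood that parts of this application may be implemented in hardware, software, firmware or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware stored in memory and executed by a suitable program execution system. For example, if implemented in hardware, as in another embodiment, any one or a combination of the following techniques known in the art may be used: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
In the above description of this specification, references to the terms "one implementation/embodiment", "another implementation/embodiment", "some implementations/embodiments" and the like mean that a specific feature, structure, material or characteristic described in connection with the implementation or example is included in at least one implementation or example of this application. In this specification, such expressions do not necessarily refer to the same implementation or example. Moreover, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more implementations or examples.
Although implementations of this application have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions and variations can be made to these implementations without departing from the principles and spirit of this application, the scope of which is defined by the claims and their equivalents.
The above is a specific description of several implementations of this application, but this application is not limited to the described embodiments. Those skilled in the art can make various equivalent modifications or substitutions without departing from the essence of this application, and all such equivalent modifications or substitutions fall within the scope defined by the claims of this application.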

Claims (15)

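  1. A log-based anomaly monitoring method, comprising the following steps:
    acquiring log information;
    structuring the log information to obtain structured log information;
    classifying and aggregating the structured log information to obtain a log unit;
    when the log unit is abnormal, determining whether the anomaly type of the log unit is an anomaly type output for the first time;
    if the anomaly type of the log unit is not an anomaly type output for the first time, outputting brief information of the log unit;
    if the anomaly type of the log unit is an anomaly type output for the first time, outputting the log unit.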
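  2. The log-based anomaly monitoring method according to claim 1, wherein the determining whether the anomaly type of the log unit is an anomaly type output for the first time comprises one of the following:
    if the current log unit has the same anomaly identifier as a previously output log unit, determining that the anomaly type of the current log unit is not an anomaly type output for the first time;
    or, if the current log unit has the same transaction identifier and the same anomaly identifier as a previously output log unit, determining that the anomaly type of the current log unit is not an anomaly type output for the first time;
    or, if the current log unit has the same transaction identifier, the same log point identifier, and the same anomaly identifier as a previously output log unit, determining that the anomaly type of the current log unit is not an anomaly type output for the first time;
    or, if the log unit has a first output identifier, determining that the anomaly type of the log unit is not an anomaly type output for the first time.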
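  3. The log-based anomaly monitoring method according to claim 1, wherein the method further comprises a step of determining whether the log unit is abnormal, and the step of determining whether the log unit is abnormal comprises one of the following:
    if the log unit includes an anomaly identifier, determining that the log unit is abnormal;
    or, if the log unit lacks a session end identifier, determining that the log unit is abnormal;
    or, if the session time of the log unit is greater than a preset session time threshold, determining that the log unit is abnormal;
    wherein a session represents the time and operation space of a transaction from start to end.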
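  4. The log-based anomaly monitoring method according to claim 1, wherein the classifying and aggregating the structured log information comprises:
    obtaining, from the structured log information, a number of pieces of log information that satisfy a preset condition;
    merging the pieces of log information that satisfy the preset condition to obtain the log unit.
  5. The log-based anomaly monitoring method according to claim 4, wherein the structured log information has a session identifier, and the preset condition is having the same session identifier; the merging the pieces of log information that satisfy the preset condition to obtain the log unit comprises:
    merging the pieces of log information that have the same session identifier according to a preset arrangement to obtain the log unit.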
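  6. The log-based anomaly monitoring method according to claim 1, wherein the brief information of the log unit includes association description information of the log unit, the association description information being used to describe the association between the current log unit and a previously output log unit; the step of outputting brief information of the log unit if the anomaly type of the log unit is not an anomaly type output for the first time comprises:
    if the anomaly type of the current log unit is not an anomaly type output for the first time, outputting the association description information of the current log unit.
  7. The log-based anomaly monitoring method according to claim 1, wherein the brief information of the log unit includes a preset identifier of the log unit, and the step of outputting brief information of the log unit if the anomaly type of the log unit is not an anomaly type output for the first time comprises:
    if the anomaly type of the log unit is not an anomaly type output for the first time, outputting the preset identifier of the log unit.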
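  8. The log-based anomaly monitoring method according to claim 1, wherein the log information includes first log information and second log information, and the log information is generated by the following steps:
    generating the first log information at the entry of transaction processing;
    outputting the second log information at the exit of transaction processing;
    wherein the second log information is obtained by updating the first log information, and a transaction represents a series of operations performed when processing a message or data.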
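  9. The log-based anomaly monitoring method according to claim 1, further comprising:
    when the log unit is abnormal, counting the number of occurrences of the log unit anomaly;
    if the number of occurrences of the log unit anomaly is greater than a preset count threshold, outputting alarm information.
  10. The log-based anomaly monitoring method according to claim 1, further comprising:
    when the log unit is normal, deleting the log unit.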
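  11. The log-based anomaly monitoring method according to claim 1, wherein the method is performed by an anomaly monitoring system comprising a first log management module, a second log management module, an anomaly monitoring module, and a log output module, and the classifying and aggregating the structured log information to obtain a log unit comprises:
    classifying and aggregating the structured log information by the first log management module to obtain a first log unit;
    classifying and aggregating the structured log information by the second log management module to obtain a second log unit;
    when the anomaly monitoring module determines that the log unit is abnormal, sending, by the anomaly monitoring module, an output request to the log output module;
    according to the output request, obtaining, by the log output module, the first log unit sent by the first log management module and the second log unit sent by the second log management module, and then performing, by the log output module, secondary aggregation of the first log unit and the second log unit to obtain the log unit;
    or, according to the output request, sending, by the log output module, a secondary aggregation request to the first log management module or the second log management module, so that the first log management module or the second log management module performs secondary aggregation of the first log unit and the second log unit to obtain the log unit.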
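  12. A log-based anomaly monitoring system, comprising:
    a log collection module, configured to acquire log information;
    a log management module, configured to structure the log information to obtain structured log information, and to classify and aggregate the structured log information to obtain a log unit;
    an anomaly monitoring module, configured to determine, when the log unit is abnormal, whether the anomaly type of the log unit is an anomaly type output for the first time;
    a log output module, configured to output brief information of the log unit when the anomaly type of the log unit is not an anomaly type output for the first time, and to output the log unit when the anomaly type of the log unit is an anomaly type output for the first time.
  13. The log-based anomaly monitoring system according to claim 12, wherein the system further comprises:
    an anomaly alarm module, configured to count the number of occurrences of the log unit anomaly when the log unit is abnormal, and to output alarm information when the number of occurrences of the log unit anomaly is greater than a preset count threshold.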
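  14. A log-based anomaly monitoring apparatus, comprising:
    at least one processor;
    at least one memory, configured to store at least one program;
    wherein, when the at least one program is executed by the at least one processor, the at least one processor is caused to implement the log-based anomaly monitoring method according to any one of claims 1 to 11.
  15. A storage medium storing a processor-executable program, wherein the processor-executable program, when executed by a processor, is used to implement the log-based anomaly monitoring method according to any one of claims 1 to 11.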
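
For illustration only, the three alternative anomaly checks of claim 3 and the occurrence counting of claim 9 might be sketched in Python as below. The sketch is not part of the claims and makes several assumptions: a log unit is modeled as a plain dict, and the keys `anomaly_id`, `has_end_marker`, and `duration_s`, the default 30-second threshold, and the names `is_abnormal` and `AlertCounter` are all hypothetical.

```python
from collections import Counter


def is_abnormal(unit, max_session_seconds=30.0):
    """Apply the three alternative checks to a log unit modeled as a dict."""
    if unit.get("anomaly_id") is not None:
        return True  # the unit carries an anomaly identifier
    if not unit.get("has_end_marker", False):
        return True  # the session end identifier is missing
    # The session lasted longer than the preset session time threshold.
    return unit.get("duration_s", 0.0) > max_session_seconds


class AlertCounter:
    """Count occurrences per anomaly and signal when a threshold is exceeded."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.counts = Counter()

    def record(self, anomaly_key):
        # Returns True when alarm information should be output, i.e. when the
        # occurrence count is strictly greater than the preset threshold.
        self.counts[anomaly_key] += 1
        return self.counts[anomaly_key] > self.threshold
```

With `AlertCounter(2)`, the first two occurrences of a given anomaly return `False` and the third returns `True`, matching the "greater than the preset count threshold" wording.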
PCT/CN2022/126493 2021-11-26 2022-10-20 Log-based anomaly monitoring method, system, apparatus, and storage medium WO2023093394A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111421618.1A CN116185752A (zh) 2021-11-26 2021-11-26 Log-based anomaly monitoring method, system, apparatus, and storage medium
CN202111421618.1 2021-11-26

Publications (1)

Publication Number Publication Date
WO2023093394A1 true WO2023093394A1 (zh) 2023-06-01

Family

ID=86431194

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/126493 WO2023093394A1 (zh) 2021-11-26 2022-10-20 Log-based anomaly monitoring method, system, apparatus, and storage medium

Country Status (2)

Country Link
CN (1) CN116185752A (zh)
WO (1) WO2023093394A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110296244A1 (en) * 2010-05-25 2011-12-01 Microsoft Corporation Log message anomaly detection
CN106407077A (zh) * 2016-09-21 2017-02-15 广州华多网络科技有限公司 Real-time alarm method and system
US20180129579A1 (en) * 2016-11-10 2018-05-10 Nec Laboratories America, Inc. Systems and Methods with a Realtime Log Analysis Framework
US20180314835A1 (en) * 2017-04-26 2018-11-01 Elasticsearch B.V. Anomaly and Causation Detection in Computing Environments
US20200210865A1 (en) * 2017-09-27 2020-07-02 Nec Corporation Log analysis system, log analysis method, log analysis program, and storage medium
CN113495820A (zh) * 2020-04-03 2021-10-12 北京沃东天骏信息技术有限公司 Method and apparatus for collecting and processing anomaly information, and anomaly monitoring system

Also Published As

Publication number Publication date
CN116185752A (zh) 2023-05-30

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22897477

Country of ref document: EP

Kind code of ref document: A1