CN114629786A - Log real-time analysis method, device, storage medium and system - Google Patents

Publication number
CN114629786A
Authority
CN
China
Prior art keywords
log
analyzed
alarm
keyword
real
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210287689.5A
Other languages
Chinese (zh)
Inventor
赵贵斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Kangjian Information Technology Shenzhen Co Ltd
Original Assignee
Kangjian Information Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Kangjian Information Technology Shenzhen Co Ltd
Priority to CN202210287689.5A
Publication of CN114629786A

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06: Management of faults, events, alarms or notifications
    • H04L41/069: Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications
    • H04L41/0631: Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H04L41/0677: Localisation of faults

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention relates to the field of data processing and provides a log real-time analysis method, device, storage medium and system. The method comprises the following steps: acquiring a log to be analyzed and writing it into a message queue of a distributed publish-subscribe message system; setting the topic of the message queue for the log to be analyzed to the topic of the corresponding log category, according to the log category to which the log belongs; acquiring the log to be analyzed that corresponds to the topic of the message queue; screening out the logs to be analyzed that contain alarm keywords, according to preset alarm keywords corresponding to the topic; and marking the logs that contain the alarm keywords so that alarm logs can be identified subsequently. The method and device can identify abnormal alarm logs among the logs, so that a problem can be quickly located, investigated and repaired after a fault occurs, thereby shortening the duration of the fault's impact.

Description

Log real-time analysis method, device, storage medium and system
Technical Field
The present invention relates to the field of data processing, and in particular, to a method, an apparatus, a storage medium, and a system for real-time log analysis.
Background
At present, in daily operation and maintenance work, when an unexpected abnormality or fault occurs, fault location, troubleshooting and repair are generally carried out on the basis of received alarms or fault reports. To shorten the duration of a fault's impact, the response speed and technical level must be improved so that intervention, troubleshooting and repair happen faster.
In the prior art, fault location relies mainly on inspecting and analyzing logs. When the number of hosts is small, logs can be viewed and problems analyzed by logging in to the remote servers one by one. As the service scale grows, the volume of logs becomes very large, and the inefficiency of manually filtering and analyzing logs becomes increasingly obvious. Moreover, an approach that first receives an alarm or fault report and only then analyzes the logs cannot handle faults in advance on the basis of the logs. For example, faults such as damage to a server disk, disk array card or memory can be predicted from the logs; for some faults, an alarm can be sent before a critical value is reached by analyzing the relevant indicators in the logs; and for other faults, an alarm can be triggered immediately, without waiting until the service is affected to discover the fault.
Therefore, how to obtain the alarm log as early as possible through log analysis to perform fault processing is a problem to be solved urgently.
Disclosure of Invention
In view of the above-mentioned shortcomings of the prior art, an object of the present invention is to provide a log real-time analysis method, device, storage medium and system, which are used to solve the problem in the prior art that an alarm log cannot be obtained as early as possible through log analysis for fault handling.
In order to achieve the above and other related objects, the present invention provides a log real-time analysis method, including the following steps: acquiring a log to be analyzed, and writing the log to be analyzed into a message queue of a distributed publish-subscribe message system; setting the topic of the message queue for the log to be analyzed to the topic of the corresponding log category, according to the log category to which the log to be analyzed belongs, the log categories comprising a server hardware log, an operating system log and a load balancing log; acquiring the log to be analyzed corresponding to the topic of the message queue; screening out the logs to be analyzed that contain alarm keywords, according to preset alarm keywords corresponding to the topic; and marking the logs that contain the alarm keywords so that alarm logs can be identified subsequently.
In an embodiment of the present invention, the obtaining the log to be analyzed and writing the log to be analyzed into a message queue of a distributed publish-subscribe message system includes: collecting the server hardware logs from a physical server cluster by utilizing a remote management card center, and writing the server hardware logs into the message queue; collecting the operating system log from the physical server cluster by using a system log center, and writing the operating system log into the message queue; and collecting the load balancing log from a load balancing cluster by utilizing a load balancing host collector, and writing the load balancing log into the message queue.
In an embodiment of the present invention, acquiring the log to be analyzed and writing it into a message queue of a distributed publish-subscribe message system includes: actively acquiring the server hardware log from the physical server cluster by using an intelligent platform management interface module in the remote management card center, and storing the server hardware log locally in the remote management card center; writing the server hardware log into the distributed publish-subscribe message system by using a first log data collector in the remote management card center; acquiring the operating system log from the physical server cluster by using a system log tool in the system log center, and storing the operating system log locally in the system log center; writing the operating system log into the distributed publish-subscribe message system by using a second log data collector in the system log center; and collecting the load balancing log from the load balancing cluster by using a third log data collector on the load balancing host, and writing the load balancing log into the distributed publish-subscribe message system.
In an embodiment of the present invention, screening out the logs to be analyzed that contain alarm keywords, according to the preset alarm keywords corresponding to the topic, includes: the preset alarm keywords comprise blacklist keywords and whitelist keywords; when the log to be analyzed matches a blacklist keyword, determining that the log to be analyzed is a must-trigger alarm log; and when the log to be analyzed matches a whitelist keyword, determining that the log to be analyzed is an alarm-irrelevant log.
In an embodiment of the present invention, marking the logs that contain the alarm keywords so that alarm logs can be identified subsequently includes: setting the alarm state attribute in the log code corresponding to the must-trigger alarm log to an alarm identifier; and when the alarm state attribute in the log code corresponding to a log to be analyzed is detected to be the alarm identifier, determining that the log to be analyzed is an alarm log.
In an embodiment of the present invention, before the screening out the log to be analyzed including the alarm keyword according to the preset alarm keyword corresponding to the topic, the method further includes: matching and extracting the logs to be analyzed according to a preset host name format rule; when the host name cannot be extracted from the log to be analyzed, filtering the log to be analyzed; and when the host name is extracted from the log to be analyzed, reserving the log to be analyzed.
In an embodiment of the present invention, after retaining the log to be analyzed, the method further includes: checking the log to be analyzed against preset abnormal keywords, the abnormal keywords comprising a first abnormal keyword and a second abnormal keyword; obtaining a first Boolean result by matching the log to be analyzed against the first abnormal keyword; obtaining a second Boolean result by matching the log to be analyzed against the second abnormal keyword; performing an AND operation on the first Boolean result and the second Boolean result; and when the result of the AND operation is true, determining that the log to be analyzed is an alarm log.
Correspondingly, the invention provides a log real-time analysis device, which comprises: a first processing module, used for acquiring a log to be analyzed and writing the log to be analyzed into a message queue of the distributed publish-subscribe message system, and for setting the topic of the message queue for the log to be analyzed to the topic of the corresponding log category, according to the log category to which the log to be analyzed belongs, the log categories comprising a server hardware log, an operating system log and a load balancing log; an acquisition module, used for acquiring the log to be analyzed corresponding to the topic of the message queue; a second processing module, used for screening out the logs to be analyzed that contain alarm keywords, according to preset alarm keywords corresponding to the topic; and a third processing module, used for marking the logs that contain the alarm keywords so that alarm logs can be identified subsequently.
The present invention provides a storage medium having stored thereon a computer program which, when executed by a processor, implements the above-described log real-time analysis method.
The invention provides a log analysis device, comprising a memory for storing a computer program, and a processor for running the computer program to implement the above-described log real-time analysis method.
As described above, the log real-time analysis method and apparatus of the present invention have the following beneficial effects:
(1) Abnormal alarm logs among the logs can be identified, and problems can be quickly located, investigated and repaired after a fault occurs, thereby shortening the duration of the fault's impact.
(2) An impending fault can be predicted from the logs in advance, or an alarm can be issued in advance or immediately through log analysis.
(3) When the log volume surges, processing is not affected even if log analysis proceeds more slowly than logs arrive.
Drawings
Fig. 1 is a flowchart illustrating a log real-time analysis method according to an embodiment of the invention.
Fig. 2 is a diagram of a log collection framework of the log real-time analysis method according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating a development process of a distributed real-time big data processing framework according to an embodiment of the log real-time analysis method of the present invention.
Fig. 4 is a schematic structural diagram of a log real-time analysis device according to an embodiment of the invention.
Fig. 5 is a system diagram of real-time log analysis performed by the log real-time analysis device according to an embodiment of the invention.
Description of the element reference numerals
41 first processing module
42 acquisition module
43 second processing module
44 third processing module
51 processor
52 memory
Detailed Description
The embodiments of the present invention are described below with reference to specific embodiments, and other advantages and effects of the present invention will be easily understood by those skilled in the art from the disclosure of the present specification. The invention is capable of other and different embodiments and of being practiced or of being carried out in various ways, and its several details are capable of modification in various respects, all without departing from the spirit and scope of the present invention. It is to be noted that the features in the following embodiments and examples may be combined with each other without conflict.
It should be noted that the drawings provided in the following embodiments are only for illustrating the basic idea of the present invention, and the components related to the present invention are only shown in the drawings rather than drawn according to the number, shape and size of the components in actual implementation, and the type, quantity and proportion of the components in actual implementation may be changed freely, and the layout of the components may be more complicated.
With the log real-time analysis method and device of the present invention, abnormal alarm logs among the logs can be identified, and problems can be quickly located, investigated and repaired after a fault occurs, thereby shortening the duration of the fault's impact. An impending fault can be predicted from the logs in advance, or an alarm can be issued in advance or immediately through log analysis. In addition, when the log volume surges, processing is not affected even if log analysis proceeds more slowly than logs arrive.
As shown in fig. 1, in the embodiment, the log real-time analysis method of the present invention includes the following steps:
Step S1: acquiring a log to be analyzed, and writing the log to be analyzed into a message queue of the distributed publish-subscribe message system; setting the topic of the message queue for the log to be analyzed to the topic of the corresponding log category, according to the log category to which the log to be analyzed belongs; the log categories include server hardware logs, operating system logs, and load balancing logs.
Specifically, a remote management card center is used for collecting the server hardware logs from a physical server cluster and writing the server hardware logs into the message queue; collecting the operating system log from the physical server cluster by using a system log center, and writing the operating system log into the message queue; and collecting the load balancing log from a load balancing cluster by utilizing a load balancing host collector, and writing the load balancing log into the message queue.
Further specifically, an intelligent platform management interface module in the remote management card center is used to actively acquire the server hardware logs from the physical server cluster and store them locally in the remote management card center; a first log data collector in the remote management card center writes the server hardware logs into the distributed publish-subscribe message system; a system log tool in the system log center acquires the operating system logs from the physical server cluster and stores them locally in the system log center; a second log data collector in the system log center writes the operating system logs into the distributed publish-subscribe message system; and a third log data collector on the load balancing host collects the load balancing logs from the load balancing cluster and writes them into the distributed publish-subscribe message system.
For example, as shown in fig. 2, in the log collection framework of the present invention, a first log data collector is deployed in advance in the remote management card center. The intelligent platform management interface module is used to pull server hardware logs from the physical server cluster and store them locally in the remote management card center; the deployed first log data collector then collects the server hardware logs in real time and writes them into the message middleware, namely the distributed publish-subscribe message system. The operating system logs comprise operating system kernel logs, operating system startup logs and operating system message logs. A second log data collector is deployed in advance in the system log center. A system log tool is used to pull the operating system logs from the physical server cluster and store them locally in the system log center; the deployed second log data collector then collects the operating system logs in real time and writes them into the message middleware, namely the distributed publish-subscribe message system.
Further specifically, the load balancing logs comprise normal logs and abnormal logs. Both kinds are located in the same directory on the load balancing host's server, both begin with a date, and the two kinds have different ending marks. A third log data collector is deployed in advance on the load balancing host; it collects the load balancing logs in real time and writes them into the message middleware, namely the distributed publish-subscribe message system.
Further specifically, the log categories include a server hardware log, an operating system log, and a load balancing log, and based on the log category to which the log to be analyzed belongs, the topic of the message queue in the distributed publish-subscribe message system is set as the topic of the corresponding log category.
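The category-to-topic routing of step S1 can be sketched as follows. This is a minimal illustration only: the in-memory `InMemoryBroker` class stands in for the distributed publish-subscribe message system, and the topic names and the `write_log` helper are hypothetical, since the description does not name them.

```python
from collections import defaultdict, deque

# Hypothetical topic names; the description only states that each log
# category is written under its own topic.
TOPIC_BY_CATEGORY = {
    "server_hardware": "topic-server-hardware",
    "operating_system": "topic-operating-system",
    "load_balancing": "topic-load-balancing",
}


class InMemoryBroker:
    """Stand-in for the distributed publish-subscribe message system."""

    def __init__(self):
        # One FIFO queue per topic.
        self._queues = defaultdict(deque)

    def publish(self, topic, message):
        self._queues[topic].append(message)

    def poll(self, topic):
        """Return the oldest message under the topic, or None if empty."""
        queue = self._queues[topic]
        return queue.popleft() if queue else None


def write_log(broker, category, line):
    """Publish a log line under the topic of its category; return the topic."""
    topic = TOPIC_BY_CATEGORY[category]  # raises KeyError for an unknown category
    broker.publish(topic, line)
    return topic
```

Because each category has its own topic, a consumer can later subscribe to one topic and apply only the keyword rules that belong to that category.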
Step S2: acquiring the log to be analyzed corresponding to the topic of the message queue.
Logs of the corresponding category are obtained from the message queue with the set topic, and each log record is packaged into a tuple in preparation for subsequent processing.
Step S3: screening out the logs to be analyzed that contain alarm keywords, according to the preset alarm keywords corresponding to the topic.
Specifically, the preset alarm keywords include blacklist keywords and whitelist keywords; when the log to be analyzed matches a blacklist keyword, the log to be analyzed is determined to be a must-trigger alarm log; and when the log to be analyzed matches a whitelist keyword, the log to be analyzed is determined to be an alarm-irrelevant log.
For example, for the operating system log, the preset whitelist keywords are: "ICMP Error", "nfsidmap", "system-tmples", "sshd", "ABRT", "syslog-ng", "ASA-4-313005", "nslcd", "a 2-buffer", "A2-RT-", "A2-SEC-FW-", "A2-SW-", "A2-VPN-FW-", "FG 1500D-", "vif-", "su", "mail", "nslcd [", "usb", "hub", "sfcb [," ETIMEDOUT "," is not allowed "," readpatch Error "," Unoverned read Error "," Medium Error "," nfidmap "," sys-tmples "," cross "" and "" cross "" address " "time out", "type", "cluster", "dna-data-Error", "CIDR", "circle", "Cannot bound user", "dbus", "use the force", "Deny", "kubel", "OCI", "synchronizing", "stream copy", "mfsc chunk", "club-system", "TLS handover hash", "system-d-location", "controller-manager", "Cannot extract a stored", "command-b-command & c", "lcd _ tah _ entry _ location", "Starting Switch roll", "Starting", "Error correction", "TACACS _ ROR _ MESSAGE _ admixture DEVICESCAN", "Harvest _ Hardware DEVICESCAN", "Hardware Switch mounted", "Hardware-". When log data in the message queue corresponding to the operating system log is detected to match a whitelist keyword, the log needs no attention, is irrelevant to alarms and can be ignored during alarm processing, so it is determined to be an alarm-irrelevant log.
The preset blacklist keywords are: "failed", "error", "err", "face", "fail", "refused", "hash", "coast", "panic", "File system full", "Out of memory", "Critical", "reboot", "read-only", "blocked", "FAULT", "reset", "bond0: link status definition up for interface", "nf _ conntrack: table full", "table overflow". When log data in the message queue corresponding to the operating system log is detected to match a blacklist keyword, the log is determined to be a must-trigger alarm log.
The priority of the blacklist keywords is set higher than that of the whitelist keywords: when a blacklist keyword and a whitelist keyword appear in the same log, the log is still determined to be a must-trigger alarm log. In addition, keywords of security concern are set for the operating system log; when log data in the message queue corresponding to the operating system log is detected to match such a keyword, a security mark is attached to the log for subsequent log analysis and processing.
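The keyword screening for the operating system log, with the blacklist taking priority, can be sketched as follows. This is a minimal illustration under stated assumptions: matching is taken to be case-sensitive substring matching, only a small subset of the quoted keywords is used, and the `Verdict` enum (including the `UNDECIDED` outcome for logs that match neither list) is hypothetical, since the description does not say how such logs are labeled.

```python
from enum import Enum


class Verdict(Enum):
    MUST_ALARM = "must_trigger_alarm"
    IRRELEVANT = "alarm_irrelevant"
    UNDECIDED = "undecided"  # hypothetical: matched neither list


# Small subsets of the operating system keyword lists quoted in the text.
BLACKLIST = ("Out of memory", "panic", "File system full", "read-only")
WHITELIST = ("sshd", "nfsidmap", "syslog-ng")


def classify(line, blacklist=BLACKLIST, whitelist=WHITELIST):
    """Classify one log line; the blacklist is checked first, so it wins ties."""
    if any(keyword in line for keyword in blacklist):
        return Verdict.MUST_ALARM
    if any(keyword in line for keyword in whitelist):
        return Verdict.IRRELEVANT
    return Verdict.UNDECIDED
```

Checking the blacklist before the whitelist is what encodes the priority: a line containing both "sshd" and "panic" is classified as a must-trigger alarm log.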
For example, for the load balancing log, the preset whitelist keywords are: "nella-plus", "falcon-control", "bh.pajk-ent.com", "proxy _ temp", "temporal", "try-es-cube-log", "try-es-log", "dubbo-dubbo-admin", "ultra-magnus", "oceans", "try", "chihuahuahuahuahua", "skyline", "fz-zhongdian", "polestar" and "/hive/". When log data in the message queue corresponding to the load balancing log is detected to match a whitelist keyword, the log needs no attention, is irrelevant to alarms and can be ignored during alarm processing, so it is determined to be an alarm-irrelevant log. For example, the keywords "upstream timed out" or "Connection returned" in load balancing log data indicate that a request has timed out or a connection has been rejected, meaning that the back-end service is unavailable and an alarm needs to be triggered; however, when a log containing these keywords also matches a whitelist keyword, it is still determined, on the basis of the whitelist keyword, that the log can be ignored for alarm purposes. When the keywords "inside-lb", "api-lb" or "out-lb" are detected in a log, a load balancing mark is attached to the log. Other keywords for the load balancing log and the server hardware log are not described in detail here.
Further specifically, before the logs to be analyzed that contain alarm keywords are screened out according to the preset alarm keywords corresponding to the topic, the method further includes: matching the log to be analyzed against a preset host name format rule and extracting the host name; when the host name cannot be extracted from the log to be analyzed, filtering out the log to be analyzed; and when the host name is extracted from the log to be analyzed, retaining the log to be analyzed.
For example, the log to be analyzed acquired for the topic of the message queue is a server hardware log, and the server hardware log data is converted from JavaScript Object Notation (JSON) format into a Java object. During the conversion, the host name in the log is identified and extracted according to the preset host name format rule: the host name starts with "a2" or "b2", followed by one of the environments "dev", "test", "pre" or "prod", then the last 16 bits of the host's internet address, and ends with "sh"; for example, a2-prod-task-center-145-209.sh. When a string matching this format rule is present in the server hardware log, it is taken as the host name and extracted by regular-expression matching. When the host name in the server hardware log data does not conform to the format rule, host name extraction fails and the server hardware log is filtered out.
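The host name check can be sketched with a regular expression. This is an illustration, not the pattern used in the patent: the exact expression is not given, so the one below is inferred from the single example host name, the optional service-name segments (such as "task-center") are an assumption, and the JSON field name `message` is hypothetical. The sketch also uses Python rather than the Java object conversion mentioned in the text.

```python
import json
import re
from typing import Optional

# Assumed shape: (a2|b2)-(dev|test|pre|prod)-[service segments-]<octet>-<octet>.sh
HOSTNAME_RE = re.compile(
    r"\b(?:a2|b2)-(?:dev|test|pre|prod)-(?:[a-z0-9]+-)*?\d{1,3}-\d{1,3}\.sh\b"
)


def extract_hostname(raw_json: str) -> Optional[str]:
    """Return the host name found in the record, or None if the log should be dropped."""
    try:
        record = json.loads(raw_json)
    except json.JSONDecodeError:
        return None  # not valid JSON: treat as extraction failure
    if not isinstance(record, dict):
        return None
    # "message" is a hypothetical field name for the log text.
    match = HOSTNAME_RE.search(str(record.get("message", "")))
    return match.group(0) if match else None
```

A caller would retain the log when `extract_hostname` returns a string and filter it out when it returns `None`.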
Further specifically, after the log to be analyzed is retained, the method further includes: checking the log to be analyzed against preset abnormal keywords, the abnormal keywords comprising a first abnormal keyword and a second abnormal keyword; obtaining a first Boolean result by matching the log to be analyzed against the first abnormal keyword; obtaining a second Boolean result by matching the log to be analyzed against the second abnormal keyword; performing an AND operation on the first Boolean result and the second Boolean result; and when the result of the AND operation is true, determining that the log to be analyzed is an alarm log.
For example, for the disk array card in the host hardware log, the first abnormal keyword is "predictive failure reported for disk" and the second abnormal keyword is "of Integrated raid controller". The server hardware log to be analyzed is matched against the first abnormal keyword to obtain a first Boolean result, and against the second abnormal keyword to obtain a second Boolean result. An AND operation is performed on the two Boolean results, and when the result is true the server hardware log is determined to be an alarm log. Other abnormal logs are handled with a similar AND operation and are determined to be alarm logs when the result is true. The keyword corresponding to solid state disk endurance in the system message log is "Percent_Life_Remaining". After this keyword is identified, the endurance value is compared with a threshold; for example, the threshold is currently set to 10 based on experience, and when the identified endurance value drops to 10, an alarm is triggered and the solid state disk is replaced immediately.
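The two checks in this example can be sketched as follows. The two RAID keywords and the threshold of 10 come from the text; everything else is an assumption made for illustration: matching is by substring, the endurance keyword is assumed to appear in the log as `Percent_Life_Remaining`, and its value is assumed to be the first integer that follows the keyword, since the log layout is not described.

```python
import re
from typing import Optional

FIRST_KEYWORD = "predictive failure reported for disk"
SECOND_KEYWORD = "of Integrated raid controller"


def is_raid_alarm(line: str) -> bool:
    """AND the two Boolean match results, as in the disk array card example."""
    first = FIRST_KEYWORD in line
    second = SECOND_KEYWORD in line
    return first and second


# Assumed layout: the keyword, optional non-digit separators, then the value.
LIFE_RE = re.compile(r"Percent_Life_Remaining\D*(\d+)")
LIFE_THRESHOLD = 10  # empirical threshold quoted in the text


def ssd_needs_replacement(line: str, threshold: int = LIFE_THRESHOLD) -> Optional[bool]:
    """Return None if the keyword is absent, else whether the value has dropped to the threshold."""
    match = LIFE_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)) <= threshold
```

Requiring both keywords avoids alarming on lines that mention a predictive failure for some other component.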
With the above processing, different filtering rules are set for the server hardware log, the operating system log and the load balancing log during log filtering: whitelist keywords and blacklist keywords are set for each of the three log categories for subsequent alarm identification and analysis; special keywords are set for each of the three categories for subsequent log analysis and statistics; and, for certain specific abnormal logs, Boolean matching operations based on specific keywords are performed so that emergency alarms can be raised immediately or in advance.
Step S4: marking the logs that contain the alarm keywords so that alarm logs can be identified subsequently.
Specifically, the alarm state attribute in the log code corresponding to the must-trigger alarm log is set to an alarm identifier; when the alarm state attribute in the log code corresponding to a log to be analyzed is detected to be the alarm identifier, the log to be analyzed is determined to be an alarm log. More specifically, an alarm information type containing an alarm state attribute is defined in advance in the log code corresponding to the log to be analyzed. When the log to be analyzed is determined to be a must-trigger alarm log, the alarm state attribute in its log code is set to true; when the log code is checked later and the alarm state attribute is true, the log to be analyzed is determined to be an alarm log.
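The marking in step S4 amounts to setting a Boolean attribute on the log record and reading it back later. A minimal sketch follows; the class and attribute names (`AlarmInfo`, `alarm_status`, `LogRecord`) are hypothetical, since the description does not name them.

```python
from dataclasses import dataclass, field


@dataclass
class AlarmInfo:
    """Hypothetical alarm information type holding the alarm state attribute."""
    alarm_status: bool = False


@dataclass
class LogRecord:
    raw: str
    alarm_info: AlarmInfo = field(default_factory=AlarmInfo)


def mark_as_alarm(record: LogRecord) -> LogRecord:
    """Set the alarm state attribute to true for a must-trigger alarm log."""
    record.alarm_info.alarm_status = True
    return record


def is_alarm_log(record: LogRecord) -> bool:
    """Later stages identify alarm logs by reading the attribute only."""
    return record.alarm_info.alarm_status
```

Downstream stages then need no keyword lists of their own; they only inspect the flag.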
Further specifically, after the log to be analyzed is determined to be an alarm log, a convergence rule can be set as required when sending the alarm, in order to avoid repeatedly sending alarm short messages of the same type. For example, when an abnormality occurs on a disk of host A and at most 3 short message notifications are allowed, if a check shows that 1 message has already been sent, the limit set by the convergence rule has not been reached and 2 more messages can still be sent; if the check shows that 3 messages have already been sent, no further messages are sent. If no convergence rule is set, the corresponding alarm short message is sent directly. When a log is detected to require an alarm, and either the limit set by the convergence rule has not been reached or no convergence rule is set for the alarm short message, the alarm short message text is constructed and the monitoring system interface is called to trigger the alarm short message. For example, the alarm short message text is: [A1] [ PROBLEM ] [ a2-real-dns-129-84.sh ] [ ] [ LC-machine-Message detailed for Disk 5in Backplane 1.all (#1)1 ═ 1] [ O12018-06-0615: 00:57 ].
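The convergence rule can be sketched as a per-key send counter. This is an illustration under assumptions: the `AlarmConverger` class is hypothetical, the key is assumed to be something like a (host, fault type) pair, and counters are never reset, because the description gives no time window for the limit.

```python
from collections import Counter
from typing import Hashable, Optional


class AlarmConverger:
    """Caps the number of alarm messages sent per key; None means no rule is set."""

    def __init__(self, max_per_key: Optional[int] = 3):
        self.max_per_key = max_per_key
        self._sent = Counter()

    def remaining(self, key: Hashable) -> Optional[int]:
        """Messages that may still be sent for the key, or None if unlimited."""
        if self.max_per_key is None:
            return None
        return max(self.max_per_key - self._sent[key], 0)

    def try_send(self, key: Hashable) -> bool:
        """Record one send and return True, or return False once the limit is reached."""
        if self.max_per_key is not None and self._sent[key] >= self.max_per_key:
            return False
        self._sent[key] += 1
        return True
```

With a limit of 3, one message already sent for host A's disk leaves 2 remaining, and a fourth attempt is refused, matching the example in the text.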
More specifically, based on the filtered log, the corresponding storage operation is performed according to the set function classification. Format adjustment is carried out on the filtered server hardware logs, the filtered server hardware logs and the filtered server operating system logs and the filtered server load balancing logs, the filtered server hardware logs, the filtered server operating system logs and the filtered server load balancing logs are all converted into logs with a uniform format, and the logs are stored into corresponding databases according to different storage processing flows based on different functional classifications so as to be used for different subsequent analysis processing of the logs.
The preset storage processing flows comprise: a log-write-to-message-middleware flow, a log-write-to-cache-database flow, and a log static analysis and storage flow. The log-write-to-message-middleware flow writes the logs converted into the uniform format back to the message queues of other topics in the message middleware (the distributed publish-subscribe message system). The logs in these message queues can subsequently be subscribed to through settings in a configuration file and written into a distributed file system for use by other business systems; for example, they can be provided to an offline distributed big data processing system for analysis and processing.
The log-write-to-cache-database flow writes the alarm logs among the logs converted into the uniform format into the data structure database. After receiving an alarm, maintenance personnel log in to the log platform and can locate the problem based only on the corresponding alarm logs in the data structure database, without checking every log, and can quickly find other abnormal logs related to the alarm logs.
The log static analysis and storage flow calculates and summarizes the logs converted into the uniform format to generate common log statistics, which are displayed on the query platform in chart form so that the various alarm conditions can be understood more clearly and conveniently, such as the total number of alarms, the number of hardware alarms, the number of kernel alarms, the number of system log alarms and the number of attack alarms within a certain period. The proportion of each kind of alarm can also be calculated to give a quick view of the current overall situation.
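The counts and proportions mentioned above amount to a simple aggregation over alarm types. A minimal sketch, assuming each alarm log has already been reduced to a type label such as `"hardware"` or `"kernel"` (the labels and the function name `summarize_alarms` are illustrative, not taken from the patent):

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def summarize_alarms(
    alarm_types: Iterable[str],
) -> Tuple[int, Dict[str, int], Dict[str, float]]:
    """Return (total alarms, count per type, proportion per type)."""
    counts = Counter(alarm_types)
    total = sum(counts.values())
    # Avoid dividing by zero when there were no alarms in the period.
    shares = {kind: n / total for kind, n in counts.items()} if total else {}
    return total, dict(counts), shares


total, counts, shares = summarize_alarms(["hardware", "kernel", "hardware", "attack"])
print(total, counts, shares)
```

Restricting the input to the alarms of a given time window yields the per-period figures that the query platform would chart.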
More specifically, system abnormal logs, high-risk command operation records and one-time security attack logs can be queried on the query platform. For example, the query for high-risk command operation records is applied to the log data carrying the security mark in the operating system logs: an operation query is run on that log data, and the query result is checked for high-risk command operation records so that corresponding handling can be carried out. The query for one-time security attack logs is applied to the log data carrying the load balancing mark: an operation query is run on that log data, and the query result is checked for one-time security attack operation records so that corresponding handling can be carried out.
As shown in fig. 3, in this embodiment, the flow developed on the distributed real-time big data processing framework of the present invention carries out the processing of steps S1-S4 above, so that abnormal alarm logs in the logs can be identified and problems can be quickly located, troubleshot and repaired after a fault occurs, thereby shortening the duration of the fault's impact. An impending fault can also be inferred from the logs in advance, or an alarm can be issued in advance or immediately through log analysis. In addition, the method ensures that the speed of log analysis processing is not slowed down or otherwise affected when the log volume increases suddenly.
As shown in fig. 4, in the present embodiment, the log real-time analysis device of the invention includes:
the first processing module 41 is configured to obtain a log to be analyzed, and write the log to be analyzed into a message queue of a distributed publish-subscribe message system; the topic of the message queue of the log to be analyzed is set to the topic of the corresponding log category according to the log category of the log to be analyzed; the log category comprises a server hardware log, an operating system log and a load balancing log;
an obtaining module 42, configured to obtain the log to be analyzed corresponding to the topic of the message queue;
the second processing module 43 is configured to screen out the log to be analyzed that contains the alarm keyword according to a preset alarm keyword corresponding to the topic;
and a third processing module 44, configured to identify the log that contains the alarm keyword, so as to facilitate subsequent identification of the alarm log. The technical features of the specific implementation of the log real-time analysis device in this embodiment are basically the same in principle as the steps of the log real-time analysis method in embodiment 1, and the technical content common to the method and the device is not repeated.
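The division of work among modules 41 to 44 can be sketched end to end with an in-memory stand-in for the publish-subscribe system. Everything named here is hypothetical: the topic names, the keyword lists in `ALARM_KEYWORDS`, the `InMemoryQueue` class and the `alarm` flag are assumptions for illustration, and a real deployment would use the actual message middleware and its client library rather than this stand-in.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical topics (one per log category) and their preset alarm keywords.
ALARM_KEYWORDS: Dict[str, List[str]] = {
    "server_hardware": ["Disk", "Backplane"],
    "operating_system": ["kernel panic"],
    "load_balancing": ["attack"],
}


class InMemoryQueue:
    """Stand-in for a publish-subscribe system with one queue per topic."""

    def __init__(self) -> None:
        self.topics: Dict[str, List[str]] = defaultdict(list)

    def publish(self, topic: str, line: str) -> None:
        # Role of module 41: write the log under the topic of its category.
        self.topics[topic].append(line)

    def consume(self, topic: str) -> List[str]:
        # Role of module 42: fetch the logs that belong to one topic.
        return list(self.topics.get(topic, []))


def screen(topic: str, lines: List[str]) -> List[str]:
    """Role of module 43: keep lines containing any keyword preset for the topic."""
    keywords = ALARM_KEYWORDS.get(topic, [])
    return [line for line in lines if any(word in line for word in keywords)]


def mark(lines: List[str]) -> List[Dict[str, object]]:
    """Role of module 44: attach an alarm flag so later stages can recognize the log."""
    return [{"message": line, "alarm": True} for line in lines]


queue = InMemoryQueue()
queue.publish("server_hardware", "Disk 5 in Backplane 1 failed")
queue.publish("server_hardware", "fan speed normal")
print(mark(screen("server_hardware", queue.consume("server_hardware"))))
```

Keeping the keyword lists keyed by topic mirrors the text's requirement that the preset alarm keywords correspond to the topic, so each log category is screened only against its own keywords.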
The storage medium of the present invention stores thereon a computer program that, when executed by a processor, implements the log real-time analysis method described above.
As shown in fig. 5, in the embodiment, the log real-time analysis system of the present invention includes: a processor 51 and a memory 52.
The memory 52 is used for storing computer programs.
The memory 52 includes: various media that can store program codes, such as ROM, RAM, magnetic disk, U-disk, memory card, or optical disk.
The processor 51 is connected to the memory 52 and configured to execute the computer program stored in the memory 52, so that the log real-time analysis system executes the log real-time analysis method.
Preferably, the processor 51 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), a Graphics Processing Unit (GPU), and the like, where in practical applications the general-purpose processor mainly used is the graphics processor; the processor may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
In conclusion, the log real-time analysis method and device can identify abnormal alarm logs in the logs, so that problems can be quickly located, troubleshot and repaired after a fault occurs, thereby shortening the duration of the fault's impact. An impending fault can also be inferred from the logs in advance, or an alarm can be issued in advance or immediately through log analysis. In addition, the method ensures that the speed of log analysis processing is not slowed down or otherwise affected when the log volume increases suddenly. Therefore, the invention effectively overcomes various defects in the prior art and has high industrial utilization value.
The foregoing embodiments are merely illustrative of the principles and utilities of the present invention and are not intended to limit the invention. Any person skilled in the art can modify or change the above-mentioned embodiments without departing from the spirit and scope of the present invention. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical spirit of the present invention be covered by the claims of the present invention.

Claims (10)

1. A log real-time analysis method, characterized by comprising the following steps:
acquiring a log to be analyzed, and writing the log to be analyzed into a message queue of a distributed publish-subscribe message system; setting the topic of the message queue of the log to be analyzed to the topic of the corresponding log category according to the log category of the log to be analyzed; the log category comprises a server hardware log, an operating system log and a load balancing log;
acquiring the log to be analyzed corresponding to the topic of the message queue;
screening out the log to be analyzed containing the alarm keyword according to a preset alarm keyword corresponding to the topic;
and identifying the log containing the alarm keyword so as to facilitate subsequent identification of the alarm log.
2. The log real-time analysis method according to claim 1, wherein the obtaining the log to be analyzed and writing the log to be analyzed into a message queue of a distributed publish-subscribe message system comprises:
collecting the server hardware logs from a physical server cluster by using a remote management card center, and writing the server hardware logs into the message queue;
collecting the operating system log from the physical server cluster by using a system log center, and writing the operating system log into the message queue;
and collecting the load balancing log from a load balancing cluster by utilizing a load balancing host collector, and writing the load balancing log into the message queue.
3. The log real-time analysis method according to claim 2, wherein one implementation process of obtaining the log to be analyzed and writing the log to be analyzed into a message queue of a distributed publish-subscribe message system comprises:
actively acquiring the server hardware log from the physical server cluster by using an intelligent platform management interface module in the remote management card center, and storing the server hardware log locally in the remote management card center;
writing the server hardware log into the distributed publish-subscribe message system by utilizing a first log data collector in the remote management card center;
acquiring the operating system log from the physical server cluster by using a system log tool in the system log center, and storing the operating system log locally in the system log center;
writing the operating system log into the distributed publish-subscribe message system by utilizing a second log data collector in the system log center;
and collecting the load balancing logs from the load balancing cluster by using a third log data collector in the load balancing host, and writing the load balancing logs into the distributed publish-subscribe message system.
4. The real-time log analysis method according to claim 1, wherein one implementation process of screening out the log to be analyzed including the alarm keyword according to the preset alarm keyword corresponding to the topic comprises:
the preset alarm keywords comprise blacklist keywords and whitelist keywords;
when the log to be analyzed matches a blacklist keyword, determining that the log to be analyzed is a must-trigger alarm log;
and when the log to be analyzed matches a whitelist keyword, determining that the log to be analyzed is an alarm-irrelevant log.
5. The log real-time analysis method according to claim 4, wherein the step of identifying the log containing the alarm keyword so as to facilitate subsequent identification of the alarm log comprises:
setting the alarm state attribute in the log code corresponding to the must-trigger alarm log to an alarm identifier;
and when detecting that the alarm state attribute in the log code corresponding to the log to be analyzed is the alarm identifier, determining that the log to be analyzed is an alarm log.
6. The real-time log analysis method according to claim 1, wherein before the log to be analyzed including the alarm keyword is screened out according to a preset alarm keyword corresponding to the topic, the method further comprises:
matching and extracting the logs to be analyzed according to a preset host name format rule;
when the host name cannot be extracted from the log to be analyzed, filtering the log to be analyzed;
and when the host name is extracted from the log to be analyzed, reserving the log to be analyzed.
7. The real-time log analysis method according to claim 6, wherein after the log to be analyzed is retained, the method further comprises:
when the log to be analyzed contains preset abnormal keywords, the abnormal keywords comprising a first abnormal keyword and a second abnormal keyword, obtaining a first Boolean result by matching the log to be analyzed against the first abnormal keyword;
obtaining a second Boolean result by matching the log to be analyzed against the second abnormal keyword;
performing an AND operation on the first Boolean result and the second Boolean result;
and when the result of the AND operation is true, determining that the log to be analyzed is an alarm log.
8. A log real-time analysis device, comprising:
the first processing module is used for acquiring a log to be analyzed and writing the log to be analyzed into a message queue of a distributed publish-subscribe message system; setting the topic of the message queue of the log to be analyzed to the topic of the corresponding log category according to the log category of the log to be analyzed; the log category comprises a server hardware log, an operating system log and a load balancing log;
the acquisition module is used for acquiring the log to be analyzed corresponding to the topic of the message queue;
the second processing module is used for screening out the log to be analyzed containing the alarm keyword according to a preset alarm keyword corresponding to the topic;
and the third processing module is used for identifying the log containing the alarm keyword so as to facilitate subsequent identification of the alarm log.
9. A log analysis device, characterized by comprising: a memory for storing a computer program; and a processor for running the computer program to implement the steps of the log real-time analysis method according to any one of claims 1 to 7.
10. A storage medium storing program instructions, wherein the program instructions, when executed, implement the steps of the log real-time analysis method according to any one of claims 1 to 7.
CN202210287689.5A 2022-03-22 2022-03-22 Log real-time analysis method, device, storage medium and system Pending CN114629786A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210287689.5A CN114629786A (en) 2022-03-22 2022-03-22 Log real-time analysis method, device, storage medium and system

Publications (1)

Publication Number Publication Date
CN114629786A true CN114629786A (en) 2022-06-14

Family

ID=81904206

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210287689.5A Pending CN114629786A (en) 2022-03-22 2022-03-22 Log real-time analysis method, device, storage medium and system

Country Status (1)

Country Link
CN (1) CN114629786A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115714718A (en) * 2022-09-23 2023-02-24 上海芯赛云计算科技有限公司 Log early warning method and system based on memory, computer equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100229182A1 (en) * 2009-03-05 2010-09-09 Fujitsu Limited Log information issuing device, log information issuing method, and program
CN107301115A (en) * 2017-06-26 2017-10-27 中国铁道科学研究院电子计算技术研究所 Application exception is monitored and restoration methods and equipment
CN109308329A (en) * 2018-09-27 2019-02-05 深圳供电局有限公司 A kind of log collecting method and device based on cloud platform
CN111274095A (en) * 2020-02-24 2020-06-12 深圳前海微众银行股份有限公司 Log data processing method, device, equipment and computer readable storage medium
CN113420032A (en) * 2021-07-20 2021-09-21 奇安信科技集团股份有限公司 Classification storage method and device for logs
CN114143171A (en) * 2021-11-30 2022-03-04 中国电信集团系统集成有限责任公司 Alarm root cause positioning method and system based on TR069 protocol

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination