CN116126807A - Log analysis method and related device - Google Patents

Log analysis method and related device Download PDF

Info

Publication number
CN116126807A
CN116126807A (application CN202211704431.7A)
Authority
CN
China
Prior art keywords
log
logs
structured
structured system
system log
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211704431.7A
Other languages
Chinese (zh)
Inventor
王茜
戴之光
马晓平
孙淮松
冯毅
汤宇
娄峰
耿欣
王迪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Travelsky Technology Co Ltd
Original Assignee
China Travelsky Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Travelsky Technology Co Ltd filed Critical China Travelsky Technology Co Ltd
Priority to CN202211704431.7A priority Critical patent/CN116126807A/en
Publication of CN116126807A publication Critical patent/CN116126807A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/17Details of further file system functions
    • G06F16/1734Details of monitoring file system events, e.g. by the use of hooks, filter drivers, logs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10File systems; File servers
    • G06F16/18File system types
    • G06F16/1805Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815Journaling file systems
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application discloses a log analysis method, which relates to the technical field of computers and comprises the following steps: converting an unstructured system log into a structured system log; classifying the structured system logs, and determining the log weight of each structured system log according to its class; obtaining the number of logs in a preset time slice according to the structured system logs generated in the preset time slice and their corresponding log weights; determining a system log set with an abnormal log number according to the number of logs; and assigning weights to different fields of the structured system logs in the system log set. The method can realize efficient and lightweight log analysis. The application also discloses a log analysis apparatus, a device and a computer-readable storage medium, which all have the above technical effects.

Description

Log analysis method and related device
Technical Field
The application relates to the technical field of computers, in particular to a log analysis method; also relates to a log analysis device, apparatus and computer readable storage medium.
Background
System logs contain rich information and can therefore be used in a variety of analysis scenarios, such as device software/hardware state detection, network behavior analysis, user behavior analysis, intrusion detection, fault location and diagnosis, performance evaluation, and predictive maintenance. However, the log analysis process has the following characteristics: 1) System logs are unstructured and often vary with device type, vendor, model and operating system, so current log analysis is mostly performed manually by operation and maintenance engineers; this consumes a great deal of time and effort, and because log data are complicated, manual log analysis often misses abnormal information. 2) The number of system logs is huge. To locate faults accurately, an operation and maintenance engineer usually searches the system logs of the related devices within a certain time window and extracts the needed useful information from a large number of unstructured heterogeneous logs; this operation is time-consuming and labor-intensive, and details submerged by common logs are easily overlooked. Although the textual nature and parsable structure of system logs make it possible to analyze them with mathematical algorithms such as principal component analysis and singular value decomposition, as well as with various machine learning methods, machine learning algorithms are relatively complex and place high demands on the computing capacity of the system. Data analysis methods such as principal component analysis and singular value decomposition, in turn, tend to have low analysis efficiency when faced with a huge number of unstructured texts, are not suitable for real-time detection, and have low portability.
Therefore, providing an efficient and lightweight log analysis scheme has become a technical problem to be solved by those skilled in the art.
Disclosure of Invention
The purpose of the application is to provide a log analysis method which can realize high-efficiency and lightweight log analysis. Another object of the present application is to provide a log analyzing apparatus, a device, and a computer-readable storage medium, each having the above technical effects.
In order to solve the above technical problems, the present application provides a log analysis method, including:
converting the unstructured system log into a structured system log;
classifying the structured system logs, and determining the log weight of the structured system logs according to the classification of the structured system logs;
obtaining the number of logs in a preset time slice according to the structured system logs and the corresponding log weights generated in the preset time slice;
determining a system log set with abnormal log quantity according to the log quantity;
and assigning weights to different fields of the structured system log in the system log set.
Optionally, classifying the structured system log includes:
converting the structured system log into a multidimensional vector;
and processing the multidimensional vector through a clustering algorithm to obtain the category of the structured system log.
Optionally, the assigning weights to different fields of the structured system log in the system log set includes:
and assigning weights to the time stamps, event levels, log identifications and event detailed information of the structured system log.
Optionally, assigning weights to event details of the structured system log includes:
extracting an event detailed information template;
establishing an abnormal word stock according to the event detailed information template;
and giving weight to the words in the abnormal word stock.
Optionally, the extracting the event detailed information template includes:
and extracting event detailed information templates by using the FT-Tree.
Optionally, the determining a system log set with an abnormal log number includes:
carrying out anomaly detection on the number of logs through a statistical algorithm;
and carrying out secondary filtering on the detection result of the statistical algorithm through a preset filtering rule, and determining the system log set.
Optionally, the anomaly detection on the log quantity through a statistical algorithm includes:
and carrying out anomaly detection on the log quantity through a nsigma algorithm.
In order to solve the above technical problem, the present application further provides a log analysis device, including:
the conversion module is used for converting the unstructured system log into a structured system log;
the classification module is used for classifying the structured system logs and determining the log weight of the structured system logs according to the types of the structured system logs;
the log quantity determining module is used for obtaining the log quantity in the preset time slice according to the structured system log generated in the preset time slice and the corresponding log weight;
the log set determining module is used for determining a system log set with abnormal log quantity according to the log quantity;
and the distribution module is used for distributing weights to different fields of the structured system log in the system log set.
In order to solve the above technical problem, the present application further provides a log analysis device, including:
a memory for storing a computer program;
a processor for implementing the steps of the log analysis method described above when executing the computer program.
To solve the above technical problem, the present application further provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the steps of the log analysis method as set forth in any one of the above.
The log analysis method provided by the application comprises the following steps: converting the unstructured system log into a structured system log; classifying the structured system logs, and determining the log weight of the structured system logs according to the classification of the structured system logs; obtaining the number of logs in a preset time slice according to the structured system logs and the corresponding log weights generated in the preset time slice; determining a system log set with abnormal log quantity according to the log quantity; and assigning weights to different fields of the structured system log in the system log set.
Therefore, the log analysis method provided by the application performs log analysis hierarchically. On the basis of automatically converting the unstructured system log into a structured system log, log weights are set for the system logs, the number of logs is determined based on the log weights, anomaly detection is then performed on the number of logs, and weights are assigned to the different fields of the system logs whose number is abnormal, so that anomaly detection of the system logs can be performed according to the system log fields and their weights. Compared with conventional solutions such as manual analysis or log analysis based on various machine learning algorithms, this log analysis method can realize efficient and lightweight log analysis.
The log analysis apparatus, device and computer-readable storage medium provided by the application all have the above technical effects.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the following description will briefly explain the drawings needed in the prior art and embodiments, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flow chart of a log analysis method according to an embodiment of the present application;
fig. 2 is a schematic diagram of a log analysis flow provided in an embodiment of the present application;
fig. 3 is a schematic diagram of a log analysis device according to an embodiment of the present application;
fig. 4 is a schematic diagram of a log analysis device according to an embodiment of the present application.
Detailed Description
The core of the application is to provide a log analysis method which can realize high-efficiency and lightweight log analysis. Another core of the present application is to provide a log analysis device, apparatus, and computer-readable storage medium, which all have the above technical effects.
For the purposes of making the objects, technical solutions and advantages of the embodiments of the present application more clear, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
Referring to fig. 1, fig. 1 is a flow chart of a log analysis method provided in an embodiment of the present application, and referring to fig. 1, the method includes:
s101, converting unstructured system logs into structured system logs;
step S101 is directed to preprocessing the system log, converting the original unstructured system log into a structured system log. Although the system logs of different vendors, different models, and different types of devices are defined differently, the system logs are typically composed of a plurality of fields. Including, for example, a timestamp, device ID/device name, event level, log identification, event details, etc. The timestamp refers to the system time when the system log is generated, the device ID/device name refers to the device ID/device name when the system log is generated, the event level characterizes the severity of the event recorded by the system log, the log identifier is an information type descriptor, and the event detailed information contains free-form text describing the event.
The original unstructured system log may be processed using a log management platform such as the ELK platform. ELK is an abbreviation of Elasticsearch, Logstash and Kibana and is a widely used log management stack. Logstash is responsible for extracting event data in system logs from different information sources, i.e. different devices, and converting these unstructured raw event data into a series of structured event records, which are then stored in Elasticsearch. Elasticsearch plays the role of a database and stores the core data of the ELK platform. Kibana is a data presentation tool that reads the structured log data from Elasticsearch and draws different charts according to the user's requirements to facilitate data analysis.
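For illustration only, the following Python sketch shows the kind of conversion performed in step S101 for a hypothetical syslog-like line format; the field layout, the regular expression and the example line are assumptions made for this sketch and are not prescribed by the method or by the ELK platform.

import re
from typing import Optional

# Hypothetical line format: "<timestamp> <device> <level> <log_id>: <details>"
LOG_PATTERN = re.compile(
    r"(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\s+"
    r"(?P<device>\S+)\s+"
    r"(?P<level>\d)\s+"
    r"(?P<log_id>[A-Z_]+):\s+"
    r"(?P<details>.*)"
)

def parse_log_line(line: str) -> Optional[dict]:
    """Convert one unstructured log line into a structured record (a dict of fields)."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None

record = parse_log_line("2022-12-29T10:15:00 switch-01 3 LINK_DOWN: interface Gi0/1 down")
print(record)
# -> {'timestamp': '2022-12-29T10:15:00', 'device': 'switch-01',
#     'level': '3', 'log_id': 'LINK_DOWN', 'details': 'interface Gi0/1 down'}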
S102, classifying the structured system logs, and determining the log weight of the structured system logs according to the classification of the structured system logs;
s103, obtaining the number of logs in a preset time slice according to the structured system logs and the corresponding log weights generated in the preset time slice;
s104, determining a system log set with abnormal log quantity according to the log quantity;
steps S102 to S104 aim at log number abnormality detection. A simple implementation of log number anomaly detection may be to count the number of system logs per fixed time slice and determine if the number of system logs per fixed time slice exceeds a threshold. However, this detection method has the following problems: 1) Errors such as "user login", "password error", "overtime exit", "illegal instruction", etc. may occur in the time slices where a large amount of useless or repeated information occurs, and when similar situations occur, the number of logs in the time slices to be detected may be increased sharply, so that the detection is erroneous. 2) The threshold value needs to be manually optimized along with the adjustment of the service, so that the problems of inflexibility, weaker expansibility and the like are caused; when the indexes to be detected are more, the operation and maintenance personnel can hardly adjust the threshold value timely and accurately, so that false alarm or missing alarm is often caused under different scenes. For example, when the threshold is set high but the traffic is in a low peak, a false alarm may be caused; when the threshold is low but the traffic is at peak, it can lead to a large number of repeated alarms in a short time.
In order to avoid the above-described problem, the present embodiment performs log number abnormality detection in the following manner:
according to the definition of "entropy" in shannon information theory:
H(P) = -∑ p_i · log(p_i)
where uppercase P denotes the set of different types of system logs, and lowercase p_i denotes the probability that a system log of type i occurs within a certain period of time.
Which message an information source will emit is uncertain, and the probability of its occurrence can be used to measure the amount of information it carries: the larger the probability, the smaller the uncertainty; conversely, the smaller the probability, the larger the uncertainty. In a system log analysis scenario, some system logs appear in many time slices, or appear periodically and in large numbers in specific time slices; this is considered normal (for example, user login information). Conversely, some system logs occur very infrequently and require significant attention from service personnel. Therefore, the frequent system logs can be filtered out and the sudden system logs retained for focused analysis, while the influence of periodicity is eliminated as far as possible. Accordingly, events with a higher occurrence frequency are given a lower weight so that they can easily be filtered out, and events with a lower occurrence frequency are given a higher weight so that sudden events stand out. Based on this, in the present embodiment the system logs are first classified, and then a corresponding log weight is set for each class of system log.
The system logs could be classified according to a single field of the system log, for example by "event level". However, this classification method is less accurate, because the frequency of occurrence of events is not determined by the event level alone. A more reasonable approach is to mine the information in the system log as fully as possible to improve classification accuracy. Thus, in a specific embodiment, classifying the structured system log includes: converting the structured system log into a multidimensional vector; and processing the multidimensional vector through a clustering algorithm to obtain the class of the structured system log.
Although different system logs are entirely different from a literal perspective, they are all made up of the same field classes and differ only in the content of the fields. A system log can therefore be regarded as a multidimensional vector: each dimension corresponds to a field, and the content of that field is the value of the vector in that dimension.
The field text data may be digitized so that the algorithm can operate on it. The discrete nature of the fields makes digitization convenient, i.e. a field usually takes its value from a discrete set of states. For example, an "event level" is typically represented by several fixed severity levels; in this case different numbers need only be assigned to the different levels, e.g. the severity levels are encoded as 0-7, and other fields can be processed in the same way. For more accurate subsequent model training, a normalization operation can be added on top of the encoding. After these steps, a large amount of textual log data has been digitized without losing information. In connection with fig. 2, the encoded vector data is used to train a model on historical data within a certain period of time by means of a clustering algorithm in machine learning (an unsupervised machine learning algorithm). The trained model can then be used to classify future system logs efficiently and automatically.
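As an illustration of the encoding and clustering just described, the following sketch (assuming scikit-learn is available) encodes two discrete fields into numbers, normalizes them and trains a clustering model on historical records; the field values, the mappings and the number of clusters are assumptions made for this sketch.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import MinMaxScaler

# Illustrative mappings from discrete field values to numbers.
LEVELS = {str(i): i for i in range(8)}   # event level encoded as 0-7
LOG_IDS = {"LINK_DOWN": 0, "USER_LOGIN": 1, "AUTH_FAIL": 2, "CONFIG_CHANGE": 3}

def encode(record: dict) -> list:
    """Map the discrete field values of one structured log to a numeric vector."""
    return [LEVELS[record["level"]], LOG_IDS[record["log_id"]]]

history = [
    {"level": "6", "log_id": "USER_LOGIN"},
    {"level": "3", "log_id": "LINK_DOWN"},
    {"level": "4", "log_id": "AUTH_FAIL"},
    {"level": "6", "log_id": "USER_LOGIN"},
]

X = np.array([encode(r) for r in history], dtype=float)
X = MinMaxScaler().fit_transform(X)             # normalization on top of the encoding
model = KMeans(n_clusters=2, n_init=10).fit(X)  # unsupervised clustering of historical logs
print(model.labels_)                            # class assigned to each historical log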
After classifying the system logs, a different weight is given to each class of logs. The weights can be set directly from the experience of professional operation and maintenance staff, or obtained by analyzing historical data over a fixed period of time. A specific way is as follows: select historical data over a period whose length covers, as far as possible, a complete service/hardware performance cycle, count the number of logs of each class and, based on the entropy principle, make the weight inversely proportional to the frequency of occurrence.
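A minimal sketch of this frequency-based weighting, assuming the per-class counts come from historical data covering a complete service cycle; the counts are illustrative, and the use of -log(p) as the "information amount" follows the entropy principle stated above.

import math

# Number of historical logs observed per class (illustrative counts).
class_counts = {0: 9500, 1: 450, 2: 50}
total = sum(class_counts.values())

# Rare classes receive large weights: weight ~ information amount -log(p), normalized.
raw = {c: -math.log(n / total) for c, n in class_counts.items()}
norm = sum(raw.values())
class_weights = {c: w / norm for c, w in raw.items()}
# The rarest class (class 2 here) ends up with the largest weight.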
If the computing power of the system allows, the system logs can also be weighted directly by means of a deep learning algorithm, i.e. a model is trained (with training data labeled by the clustering algorithm together with the inverse-frequency weights), so that the deep learning model can classify unknown logs and pay particular attention to logs with a lower occurrence frequency. In addition, the deep learning model can be updated periodically.
After the log weights of the system logs are determined, the number of logs is further determined and a system log set with an abnormal number of logs is identified.
In a specific embodiment, the determining a system log set with an abnormal log number includes: performing anomaly detection on the number of logs through a statistical algorithm; and performing secondary filtering on the detection result of the statistical algorithm through a preset filtering rule to determine the system log set. The anomaly detection on the number of logs through a statistical algorithm may include: performing anomaly detection on the number of logs through an n-sigma algorithm.
Specifically, the time slice interval is set to τ, and the number of logs in the interval [t, t+τ] is counted. A sliding window mechanism is adopted with a sliding interval of μ, where μ < τ. The size of the detection interval and the algorithm parameter n are set, and anomaly detection is performed, through the statistical n-sigma algorithm, on the sequence to be detected formed by the log counts of the sliding windows, in order to detect sudden increases and sudden decreases in the number of logs.
The number of logs in a time slice is obtained as the weighted sum over the system logs generated in that time slice interval, using the weights corresponding to those system logs.
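The following sketch illustrates the weighted counting and the n-sigma detection over the resulting sequence; the weights, the example counts and the parameter n are assumptions made for this sketch.

import numpy as np

def weighted_count(log_classes_in_slice, class_weights):
    """Number of logs in a time slice = weighted sum of the class weights of its logs."""
    return sum(class_weights[c] for c in log_classes_in_slice)

def nsigma_anomalies(series, n=3):
    """Indices whose value deviates from the sequence mean by more than n standard deviations."""
    series = np.asarray(series, dtype=float)
    mean, std = series.mean(), series.std()
    if std == 0:
        return []
    return [i for i, x in enumerate(series) if abs(x - mean) > n * std]

# counts[i] is the weighted number of logs in the i-th sliding time slice
counts = [1.1, 0.9, 1.0, 1.2, 0.8, 6.5, 1.0]
print(nsigma_anomalies(counts, n=2))   # the surge at index 5 is reported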
In addition, whether the indicator is periodic can be roughly judged, and different data segments can be selected and spliced according to the period length. For example, if there is no obvious periodicity in the application scenario, the period t0 before the current time point can be selected as the sequence to be detected. When the periodicity is short, roughly on a daily basis, the period t0 before the current time point can be spliced with the periods t0 before and after the same time point on the previous day to form the sequence to be detected. Similarly, when the periodicity is longer, the period t0 before the current time point can be spliced with the periods t0 before and after the same time point in the previous week to form the sequence to be detected.
The preset filtering rules can be configured in different ways so that the detection better matches the actual application scenario. For example, only large sudden increases in the number of logs may be of interest, i.e. a filter condition with a positive slope is added; in this case, when both sudden increases and sudden decreases in the number of logs are detected, the sudden decreases are filtered out and the sudden increases are retained. Alternatively, on the basis of anomaly detection with the statistical n-sigma algorithm, secondary filtering can be performed using a dynamic threshold. The dynamic threshold can be described as a polynomial, which may be expressed as follows:
T = f(t_1, t_2, …, t_n) = a_1·t_1 + a_2·t_2 + … + a_n·t_n; where t_1, t_2, …, t_n represent factors affecting the dynamic threshold, such as the device grade, the grade of the service associated with the device, the order of magnitude of the device traffic, etc.; and a_1, a_2, …, a_n represent the weights of the different factors in the dynamic threshold, which can be preset from operation and maintenance experience or obtained by statistical fitting of historical data.
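A minimal sketch of this dynamic threshold used as a secondary filter; the factor values and factor weights below are illustrative assumptions that would in practice come from operation and maintenance experience or from statistical fitting.

def dynamic_threshold(factors, factor_weights):
    """T = a1*t1 + a2*t2 + ... + an*tn."""
    return sum(a * t for a, t in zip(factor_weights, factors))

# Illustrative factors: device grade, grade of the associated service, traffic order of magnitude.
factors = [3.0, 2.0, 4.0]
factor_weights = [0.5, 0.3, 0.2]
T = dynamic_threshold(factors, factor_weights)   # 0.5*3 + 0.3*2 + 0.2*4 = 2.9

def secondary_filter(candidate_indices, counts, T):
    """Keep only candidate slices whose weighted log count also exceeds the dynamic threshold T."""
    return [i for i in candidate_indices if counts[i] > T]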
In addition, in practical applications the number of logs of a single device or of the same functional area often has a certain periodicity, so, where the computing power permits, a reinforcement-learning-based algorithm can be used for some core devices or core functional areas to mine the periodic pattern and improve detection accuracy. An agent is trained to judge accurately, from the number and content of the observed logs, whether an anomaly has occurred; a successful detection yields a positive reward, while a missed or false detection incurs a certain penalty. After sufficient training, the agent can find an approximately optimal criterion that maximizes the cumulative reward, and from this it is decided whether a periodicity should be assumed and, if so, how long it is. In the reinforcement learning algorithm, the state of the environment may be defined as the content of the logs or the number of logs per unit time, the action of the agent may be defined as whether an anomaly is reported, and the reward mechanism is determined by whether the prediction is accurate.
S105: and assigning weights to different fields of the structured system log in the system log set.
By executing steps S101 to S104, the set of structured system logs within a time slice whose log number is determined to be abnormal is obtained. Step S105 aims to select different weighting strategies for the different amounts of information carried by the different fields of the system logs in this set, and to set corresponding weights so that anomaly detection can be performed on the system logs themselves.
In a specific embodiment, assigning weights to different fields of the structured logs in the log set with an abnormal log number includes:
assigning weights to the timestamp, event level, log identification and event details of the structured logs.
Specifically, the timestamp represents the system time at which the system log was generated, and weights can be differentiated according to whether it falls within a business peak period. For example, 09:00-17:00 can be regarded as the business peak period and 17:00 to 09:00 of the next day as the off-peak period, with the peak period weighted higher than the off-peak period.
The event level represents the severity of the event and is typically expressed as a numerical range or as several discrete severity levels. For example, the event level of vendor 1's system logs is represented by numbers from 0 to 7, where a smaller number means a more critical event and a higher priority, while the event level of vendor 2's system logs may be represented by three discrete levels, "Major", "Warning" and "Error". In general, the weights of the different severity levels should be proportional to the severity they characterize, so a linear mechanism may be employed. Alternatively, a logarithmic mechanism may be used: taking the logarithm does not change the properties or correlations of the data but compresses the scale of the variable, which means that sensitivity to differences among small values is higher than sensitivity to differences among large values. Thus, to increase the sensitivity to the more severe levels, the weighting strategy may be to take the reciprocal on a logarithmic basis.
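The sketch below contrasts a linear mechanism with a reciprocal-of-logarithm mechanism for the event level; the 0-7 encoding (0 being most critical) follows the vendor 1 example above, and the exact scaling is an assumption made for this sketch.

import math

def linear_weight(level, max_level=7):
    """Linear mechanism: weight decreases linearly as the level becomes less severe (0 = most severe)."""
    return (max_level - level + 1) / (max_level + 1)

def log_reciprocal_weight(level):
    """Reciprocal-on-logarithm mechanism: emphasizes the most severe (smallest) levels."""
    return 1.0 / math.log(level + 2)   # shift by 2 so the logarithm argument stays above 1

for level in range(8):
    print(level, round(linear_weight(level), 3), round(log_reciprocal_weight(level), 3))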
Simple log anomaly detection can judge the importance of a log using the above weight selection strategy, but in some complex situations this field can only indicate an abnormal condition of a single attribute of a single device and cannot indicate the actual degree of influence of the event on the service, the network and other devices. Therefore, this embodiment does not use this field directly as an indicator of log anomaly detection, but treats it as part of the system log analysis.
The log identification is a brief introduction to events and event changes, as well as a general classification of the different types of logs. In general, a device has a limited number of message types, such as "user login", "password error", "timeout exit", "illegal instruction", etc. The weight of the log identification can therefore be defined directly as several discrete values and adjusted appropriately according to the actual situation during verification.
The event details are the character string information actually output to the information center by each module; they are filled in by each module at every output and describe the specific content of the log in detail. Different devices describe events in different ways, but most event details can be parsed into an event template (the constant part) plus some specific parameters (the variable part, e.g. IP address information, user information, interface information, etc.). The constant part consists of fixed plain text and represents the corresponding event type; it is similar to a further subdivision of the message type field, so it is also typically taken from a limited set of types (which facilitates numerical conversion). The variable part records the differing portion of the log, such as runtime information and the status and values of parameters (e.g. port numbers), which vary with the different events that occur (this part facilitates event localization and specific analysis).
To assign weights to different event details, the event details are first classified. One way is to automatically separate the constant part and the variable part of the original system log, i.e. offline parameterization or template extraction, and then match the system logs against the fixed templates online.
The extraction of templates can be accomplished in different ways, for example: 1) using predefined regular expressions; 2) using a correlation analysis algorithm; 3) using other log parsers based on rules or on machine learning algorithms, for example manually checking system logs or writing detection rules from operation and maintenance domain knowledge supplemented by keyword searches (such as "warning", "error", etc.), or performing text recognition and classification based on an LSTM (long short-term memory network) and the like.
In a specific embodiment, assigning weights to event details of the structured system log includes: extracting an event detailed information template; establishing an abnormal word stock according to the event detailed information template; and giving weight to the words in the abnormal word stock. Wherein, the extracting event detailed information template may include: and extracting event detailed information templates by using the FT-Tree.
FT-Tree is an intelligent algorithm based on the frequent-pattern-tree model that identifies frequently occurring combinations of words in system logs in order to generate message templates. Compared with template extraction based on rules or regular expressions, FT-Tree has higher accuracy and supports incremental learning. The abnormal word bank defined in this embodiment is not limited to a keyword set such as "warning" or "error"; it can be any word bank that may carry "abnormal" information in the historical data samples. The method comprises the following three steps:
extracting event detail information templates by using FT-Tree: a correct template is typically a combination of words that frequently occur in the system log. In addition, the FT-Tree can reject the variable part in the system log by pruning while building the template Tree, so that the separation of the constant part and the variable part can be automatically completed.
Establishing the abnormal word bank: a trained log template library (each member of which is a combination of event-detail words) is obtained through FT-Tree, and this embodiment extracts a usable abnormal word bank from it as follows: 1) collect the set of all words appearing in the log template library (without distinguishing parts of speech); 2) count the frequency of each word of this set in the training set; 3) select a suitable frequency interval, either set manually or chosen from the statistics after drawing a frequency distribution histogram, and add the words whose frequency falls within the interval to the abnormal word bank; 4) there may be multiple abnormal word banks, with different abnormality levels and different selected word frequency intervals.
Assigning weights to the words in the abnormal word bank: the weight of each word in the abnormal word bank can be calculated by the following mechanisms: 1) take the inverse of the frequency of each word and normalize; 2) based on the "entropy" principle, define the weight as the amount of "information" carried by the word.
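A minimal sketch of building the abnormal word bank from a template library and weighting its words by normalized inverse frequency; the templates, the training lines, the tokenization and the chosen frequency interval are assumptions made for this sketch (a real implementation would use the FT-Tree-extracted templates).

from collections import Counter

# Illustrative event-detail templates (constant parts), with "*" marking variable parts.
template_library = [
    "interface * changed state to down",
    "user * logged in from *",
    "authentication failure for user *",
    "power supply * failure detected",
]

# 1) collect the set of all words appearing in the template library
words = {w for t in template_library for w in t.split() if w != "*"}

# 2) count the frequency of each of these words in the training set
training_set = [
    "user alice logged in from 10.0.0.1",
    "user bob logged in from 10.0.0.2",
    "interface Gi0/1 changed state to down",
    "power supply PS1 failure detected",
]
freq = Counter(w for line in training_set for w in line.split() if w in words)

# 3) keep the words whose frequency falls inside the chosen (here: low-frequency) interval
abnormal_word_bank = {w for w, n in freq.items() if n <= 1}

# weight each word by its normalized inverse frequency
raw = {w: 1.0 / freq[w] for w in abnormal_word_bank}
norm = sum(raw.values())
word_weights = {w: v / norm for w, v in raw.items()}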
In summary, the log analysis method provided by the application performs log analysis hierarchically. On the basis of automatically converting the unstructured system log into a structured system log, log weights are set for the system logs, the number of logs is determined based on the log weights, anomaly detection is then performed on the number of logs, and weights are assigned to the different fields of the system logs whose number is abnormal, so that anomaly detection of the system logs can be performed according to the system log fields and their weights. Compared with conventional solutions such as manual analysis or log analysis based on various machine learning algorithms, this log analysis method can realize efficient and lightweight log analysis.
The present application also provides a log analysis device, which is described below and can be referred to in correspondence with the above-described method. Referring to fig. 3, fig. 3 is a schematic diagram of a log analysis device according to an embodiment of the present application, and in combination with fig. 3, the device includes:
a conversion module 10 for converting the unstructured system log into a structured system log;
a classification module 20, configured to classify the structured system log, and determine a log weight of the structured system log according to a class of the structured system log;
the log number determining module 30 is configured to obtain the log number in the preset time slice according to the structured system log and the corresponding log weight generated in the preset time slice;
a log set determining module 40, configured to determine a system log set with an abnormal log number according to the log number;
an allocation module 50 is configured to allocate weights to different fields of the structured system log in the system log set.
Based on the above embodiment, as a specific implementation manner, the classification module 20 includes:
a conversion unit, configured to convert the structured system log into a multidimensional vector;
and the processing unit is used for processing the multidimensional vector through a clustering algorithm to obtain the category of the structured system log.
Based on the above embodiments, as a specific implementation, the allocation module 50 is specifically configured to:
and assigning weights to the time stamps, event levels, log identifications and event detailed information of the structured system log.
Based on the above embodiment, as a specific implementation manner, the distribution module 50 includes:
the extraction unit is used for extracting the event detailed information template;
the building unit is used for building an abnormal word stock according to the event detailed information template;
and the assignment unit is used for assigning weights to the words in the abnormal word bank.
On the basis of the above embodiment, as a specific implementation manner, the extracting unit is specifically configured to:
and extracting event detailed information templates by using the FT-Tree.
Based on the above embodiment, as a specific implementation manner, the log set determining module 40 includes:
the detection unit is used for carrying out abnormal detection on the number of the logs through a statistical algorithm;
and the filtering unit is used for carrying out secondary filtering on the detection result of the statistical algorithm through a preset filtering rule, and determining the system log set.
On the basis of the above embodiment, as a specific implementation manner, the detection unit is specifically configured to:
and carrying out anomaly detection on the log quantity through a nsigma algorithm.
The log analysis device performs log analysis hierarchically: on the basis of automatically converting the unstructured system log into a structured system log, it sets log weights for the system logs, determines the number of logs based on the log weights, then performs anomaly detection on the number of logs and assigns weights to the different fields of the system logs whose number is abnormal, so that anomaly detection of the system logs can be performed according to the system log fields and their weights. Compared with conventional solutions such as manual analysis or log analysis based on various machine learning algorithms, this log analysis device can realize efficient and lightweight log analysis.
The present application also provides a log analysis device, as shown with reference to fig. 4, comprising a memory 1 and a processor 2.
A memory 1 for storing a computer program;
a processor 2 for executing a computer program to perform the steps of:
converting the unstructured system log into a structured system log;
classifying the structured system logs, and determining the log weight of the structured system logs according to the classification of the structured system logs;
obtaining the number of logs in a preset time slice according to the structured system logs and the corresponding log weights generated in the preset time slice;
determining a system log set with abnormal log quantity according to the log quantity;
and assigning weights to different fields of the structured system log in the system log set.
For the description of the apparatus provided in the present application, reference is made to the above method embodiments, and the description is omitted herein.
The present application also provides a computer readable storage medium having a computer program stored thereon, which when executed by a processor, performs the steps of:
converting the unstructured system log into a structured system log;
classifying the structured system logs, and determining the log weight of the structured system logs according to the classification of the structured system logs;
obtaining the number of logs in a preset time slice according to the structured system logs and the corresponding log weights generated in the preset time slice;
determining a system log set with abnormal log quantity according to the log quantity;
and assigning weights to different fields of the structured system log in the system log set.
The computer-readable storage medium may include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
For the description of the computer-readable storage medium provided in the present application, reference is made to the above method embodiments, and the description is omitted herein.
In this description, the embodiments are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and for the same or similar parts the embodiments may refer to one another. Since the apparatus, device and computer-readable storage medium disclosed in the embodiments correspond to the method disclosed in the embodiments, their description is relatively brief; for relevant details, refer to the description of the method.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software modules may be disposed in Random Access Memory (RAM), memory, read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The log analysis method, apparatus, device and computer readable storage medium provided in the present application are described in detail above. Specific examples are set forth herein to illustrate the principles and embodiments of the present application, and the description of the examples above is only intended to assist in understanding the methods of the present application and their core ideas. It should be noted that it would be obvious to those skilled in the art that various improvements and modifications can be made to the present application without departing from the principles of the present application, and such improvements and modifications fall within the scope of the claims of the present application.

Claims (10)

1. A method of log analysis, comprising:
converting the unstructured system log into a structured system log;
classifying the structured system logs, and determining the log weight of the structured system logs according to the classification of the structured system logs;
obtaining the number of logs in a preset time slice according to the structured system logs and the corresponding log weights generated in the preset time slice;
determining a system log set with abnormal log quantity according to the log quantity;
and assigning weights to different fields of the structured system log in the system log set.
2. The log analysis method of claim 1, wherein the classifying the structured system log comprises:
converting the structured system log into a multidimensional vector;
and processing the multidimensional vector through a clustering algorithm to obtain the category of the structured system log.
3. The method of log analysis according to claim 1, wherein said assigning weights to different fields of the structured system log in the system log set comprises:
and assigning weights to the time stamps, event levels, log identifications and event detailed information of the structured system log.
4. The log analysis method of claim 1, wherein assigning weights to event details of the structured system log comprises:
extracting an event detailed information template;
establishing an abnormal word stock according to the event detailed information template;
and giving weight to the words in the abnormal word stock.
5. The log analysis method as defined in claim 4, wherein the extracting the event detail information template comprises:
and extracting event detailed information templates by using the FT-Tree.
6. The log analysis method as claimed in claim 1, wherein the determining the system log set with abnormal log number comprises:
carrying out anomaly detection on the number of logs through a statistical algorithm;
and carrying out secondary filtering on the detection result of the statistical algorithm through a preset filtering rule, and determining the system log set.
7. The log analysis method as defined in claim 6, wherein the abnormality detection of the number of logs by a statistical algorithm comprises:
and carrying out anomaly detection on the log quantity through a nsigma algorithm.
8. A log analysis device, comprising:
the conversion module is used for converting the unstructured system log into a structured system log;
the classification module is used for classifying the structured system logs and determining the log weight of the structured system logs according to the types of the structured system logs;
the log quantity determining module is used for obtaining the log quantity in the preset time slice according to the structured system log generated in the preset time slice and the corresponding log weight;
the log set determining module is used for determining a system log set with abnormal log quantity according to the log quantity;
and the distribution module is used for distributing weights to different fields of the structured system log in the system log set.
9. A log analysis device, characterized by comprising:
a memory for storing a computer program;
a processor for implementing the steps of the log analysis method according to any one of claims 1 to 7 when executing the computer program.
10. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the steps of the log analysis method according to any of claims 1 to 7.
CN202211704431.7A 2022-12-29 2022-12-29 Log analysis method and related device Pending CN116126807A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211704431.7A CN116126807A (en) 2022-12-29 2022-12-29 Log analysis method and related device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211704431.7A CN116126807A (en) 2022-12-29 2022-12-29 Log analysis method and related device

Publications (1)

Publication Number Publication Date
CN116126807A true CN116126807A (en) 2023-05-16

Family

ID=86309457

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211704431.7A Pending CN116126807A (en) 2022-12-29 2022-12-29 Log analysis method and related device

Country Status (1)

Country Link
CN (1) CN116126807A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116701147A (en) * 2023-06-12 2023-09-05 北京优特捷信息技术有限公司 Log data processing method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111639497B (en) Abnormal behavior discovery method based on big data machine learning
CN111506478A (en) Method for realizing alarm management control based on artificial intelligence
CN111027615B (en) Middleware fault early warning method and system based on machine learning
CN113190421A (en) Detection and analysis method for equipment health state of data center
CN109992484B (en) Network alarm correlation analysis method, device and medium
CN111176953B (en) Abnormality detection and model training method, computer equipment and storage medium
CN112990656A (en) Health evaluation system and health evaluation method for IT equipment monitoring data
CN114185760A (en) System risk assessment method and device and charging equipment operation and maintenance detection method
CN115858794B (en) Abnormal log data identification method for network operation safety monitoring
CN116737510B (en) Data analysis-based intelligent keyboard monitoring method and system
CN112906738A (en) Water quality detection and treatment method
CN116126807A (en) Log analysis method and related device
CN116668039A (en) Computer remote login identification system and method based on artificial intelligence
CN116932523B (en) Platform for integrating and supervising third party environment detection mechanism
CN113891342B (en) Base station inspection method and device, electronic equipment and storage medium
CN117370548A (en) User behavior risk identification method, device, electronic equipment and medium
CN113704389A (en) Data evaluation method and device, computer equipment and storage medium
CN116383645A (en) Intelligent system health degree monitoring and evaluating method based on anomaly detection
CN116030955A (en) Medical equipment state monitoring method and related device based on Internet of things
CN113469247A (en) Network asset abnormity detection method
CN114528909A (en) Unsupervised anomaly detection method based on flow log feature extraction
CN115293379B (en) Knowledge graph-based on-orbit spacecraft equipment anomaly detection method
CN117828539B (en) Intelligent data fusion analysis system and method
CN116976318A (en) Intelligent auditing system for switching operation ticket of power grid based on deep learning and model reasoning
CN117972595A (en) Method, system, device and medium for analyzing electric charge abnormality

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination