CN110389874B - Method and device for detecting log file abnormity - Google Patents

Publication number
CN110389874B
Authority
CN
China
Prior art keywords
log
information
logic diagram
classification
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810359152.9A
Other languages
Chinese (zh)
Other versions
CN110389874A (en)
Inventor
付瑞林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BYD Co Ltd
Original Assignee
BYD Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BYD Co Ltd
Priority to CN201810359152.9A
Publication of CN110389874A
Application granted
Publication of CN110389874B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G06F16/122 File system administration, e.g. details of archiving or snapshots using management policies
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/1805 Append-only file systems, e.g. using logs or journals to store data
    • G06F16/1815 Journaling file systems

Abstract

The invention discloses a log file abnormity detection method and device. The method comprises the following steps: obtaining a log in a log file; characterizing the log to extract the features corresponding to the log; classifying the log according to the features to obtain classification information corresponding to the log; constructing a logic diagram of the log file according to the classification information corresponding to the log; and determining an abnormal part in the log file according to the logic diagram. Because the abnormal part is determined from the logic diagram, it can be presented visually, problems in the system operation can be located simply and conveniently, and the maintenance efficiency of programmers is improved.

Description

Method and device for detecting log file abnormity
Technical Field
The invention relates to the technical field of information processing, in particular to a log file abnormity detection method and device.
Background
With the advent of the information age, more and more fields have begun to use intelligent control systems in place of traditional manual control, in order to handle complex-system control problems that are otherwise difficult to solve. For example, in a complex system such as an automatic train supervision (ATS) system in rail transit, a plurality of operating entities and a plurality of programs may run continuously, each according to its own logic. During development and debugging of such a system, abnormal system behavior occurs easily, but it is often unclear where to begin searching for the key problem. At present, the main approach is to record system logs and to locate the position where a problem arises through marker information appearing in the logs, thereby locating the error. However, this approach requires a programmer to locate and analyze problems manually, based on personal experience, which is inconvenient, unintuitive, and inefficient.
Disclosure of Invention
The invention provides a log file abnormity detection method and device, and aims to solve at least one of the above technical problems.
The embodiment of the invention provides a log file abnormity detection method, which comprises the following steps:
obtaining a log in a log file;
characterizing the log to extract features corresponding to the log, wherein the features comprise first encoding information and second encoding information;
classifying the log according to the features, and acquiring classification information corresponding to the log;
constructing a logic diagram of the log file according to the classification information corresponding to the log; and
determining an abnormal part in the log file according to the logic diagram.
Optionally, characterizing the log to extract features corresponding to the log, including:
extracting predetermined format information in the log based on a regular expression, and generating the first encoding information;
and encoding the text content that remains in the log after the predetermined format information is extracted, to generate the second encoding information.
Optionally, classifying the log according to the features, and acquiring classification information corresponding to the log, including:
acquiring the length of first coding information of the log;
inputting the length of the first coding information and the first coding information into a decision tree, classifying by using the decision tree, and determining a first classification number corresponding to the log;
inputting the second coding information into the decision tree, classifying by using the decision tree, and determining a second classification number corresponding to the log;
and generating classification information corresponding to the log according to the first classification number and the second classification number.
Optionally, constructing a logic diagram of the log file according to the classification information corresponding to the log, including:
taking classification information corresponding to the log as a node in the logic diagram;
and counting the jump probabilities among the classification information, and taking the jump probabilities as edges in the logic diagram.
Optionally, determining an abnormal part in the log file according to the logic diagram includes:
comparing the jump probability corresponding to each edge in the logic diagram with a preset probability, and determining the edges whose jump probability is lower than the preset probability as abnormal parts; or
comparing the logic diagram with a historical logic diagram, and determining the nodes or edges of the logic diagram that are inconsistent with the historical logic diagram as abnormal parts.
Optionally, after determining an exception in the log file according to the logic diagram, the method further includes:
and generating abnormal reminding information.
Optionally, the method further comprises:
after the length of the first coding information of the log is obtained, calculating a length deviation value of the log according to the length of the first coding information;
determining the log with the largest length deviation value in the log file;
and manually detecting whether the log with the largest length deviation value is abnormal.
Another embodiment of the present invention provides a log file abnormality detection apparatus, including:
the acquisition module is used for acquiring logs in the log file;
the extraction module is used for characterizing the log so as to extract the features corresponding to the log, wherein the features comprise first coding information and second coding information;
the classification module is used for classifying the log according to the features and acquiring classification information corresponding to the log;
the construction module is used for constructing a logic diagram of the log file according to the classification information corresponding to the log; and
the determining module is used for determining the abnormal part in the log file according to the logic diagram.
Optionally, the extracting module is configured to:
extracting predetermined format information in the log based on a regular expression, and generating the first encoding information;
and encoding the text content that remains in the log after the predetermined format information is extracted, to generate the second encoding information.
Optionally, the classification module includes:
an obtaining unit configured to obtain a length of first encoding information of the log;
the first classification unit is used for inputting the length of the first coding information and the first coding information into a decision tree, classifying by using the decision tree, and determining a first classification number corresponding to the log;
the second classification unit is used for inputting the second coding information into the decision tree, classifying by using the decision tree and determining a second classification number corresponding to the log;
and the generating unit is used for generating the classification information corresponding to the log according to the first classification number and the second classification number.
Optionally, the building module is configured to:
taking classification information corresponding to the log as a node in the logic diagram;
and counting the jump probabilities among the classification information, and taking the jump probabilities as edges in the logic diagram.
Optionally, the determining module is configured to:
comparing the jump probability corresponding to each edge in the logic diagram with a preset probability, and determining the edges whose jump probability is lower than the preset probability as abnormal parts; or
comparing the logic diagram with a historical logic diagram, and determining the nodes or edges of the logic diagram that are inconsistent with the historical logic diagram as abnormal parts.
Optionally, the apparatus further includes:
and the reminding module is used for generating abnormal reminding information after determining the abnormal part in the log file according to the logic diagram.
Optionally, the apparatus further comprises:
and the recommending module is used for recommending the classified high-quality anchor program to audiences in the live broadcast platform based on a preset rule.
Optionally, the classification module further includes:
a calculating unit, configured to calculate a length deviation value of the log according to the length of the first encoded information of the log after that length is acquired;
the determining unit is used for determining the log with the largest length deviation value in the log file;
and the detection unit is used for manually detecting whether the log with the largest length deviation value is abnormal.
A further embodiment of the present invention provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the log file abnormality detection method according to the embodiment of the first aspect of the present invention.
Yet another embodiment of the present invention provides an electronic device, which includes a processor, a memory, and a computer program stored in the memory and executable on the processor, where the processor is configured to execute the log file abnormality detection method described in the first embodiment of the present invention.
The technical scheme provided by the embodiment of the invention has the following beneficial effects:
the log in the log file is obtained, the log is characterized so as to extract the features corresponding to the log, the log is classified according to the features to obtain the classification information corresponding to the log, a logic diagram of the log file is constructed according to the classification information, and the abnormal part in the log file is determined according to the logic diagram, so that the abnormal part can be presented visually, problems in the system operation can be located simply and conveniently, and the maintenance efficiency of programmers can be improved.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow diagram of a log file anomaly detection method according to one embodiment of the present invention;
FIG. 2 is a flow diagram of obtaining classification information corresponding to a log according to one embodiment of the invention;
FIG. 3 is a schematic diagram of the effect of a logic diagram of a log file according to one embodiment of the invention;
FIG. 4 is a flow diagram of a log file anomaly detection method according to another embodiment of the present invention;
FIG. 5 is a flow diagram of a log file anomaly detection method according to yet another embodiment of the present invention;
FIG. 6 is a block diagram of the structure of a log file abnormality detection apparatus according to an embodiment of the present invention;
FIG. 7 is a block diagram of the structure of a log file abnormality detection apparatus according to another embodiment of the present invention;
FIG. 8 is a block diagram of the structure of a log file abnormality detection apparatus according to still another embodiment of the present invention;
FIG. 9 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer throughout to the same or similar elements or to elements having the same or similar function. The embodiments described below with reference to the drawings are exemplary, are intended to explain the invention, and are not to be construed as limiting the invention.
The log file abnormality detection method and apparatus according to the embodiment of the present invention are described below with reference to the drawings.
FIG. 1 is a flowchart of a log file anomaly detection method according to one embodiment of the present invention.
As shown in FIG. 1, the log file abnormality detection method includes:
S101, obtaining logs in the log file.
With the advent of the information age, control systems are becoming more and more complex. During operation and maintenance, a programmer mainly relies on the system logs to search for, locate, and analyze problems that arise while the system is running. However, logs only indicate where a problem occurred and do not provide data that a programmer can analyze more intuitively. Therefore, the invention provides a log file abnormity detection method that can quickly detect problems in the operation of a system.
In one embodiment of the invention, the logs in a log file may be obtained. While the system is running, a large number of log files are generated every day, and each log file contains a plurality of log entries. These log files are typically saved to a log server. Therefore, when abnormality detection is required, the log file to be analyzed can be fetched from the log server.
S102, the log is characterized so as to extract the corresponding features of the log.
The features may include first encoded information and second encoded information.
Specifically, the predetermined format information in the log may be extracted based on a regular expression, and first encoding information may be generated, and then the text content in the log after the predetermined format information is extracted may be encoded to generate second encoding information. The log usually includes information with a certain format, such as process information, source program information, and a timestamp, and the content of the information can provide data support for a programmer. Therefore, characterizing a log is the process of extracting and encoding features.
For example, one log entry is:
"u'2017-09-23 06:37:57.270[main]info o.s.c.a.annotationconfigapplicationcontext-refreshing org.springframework.context.annotation.annotationconfigapplicationcontext@7637f22:startup date[sat sep 23 06:37:57cst 2017];root of context hierarchy\n'"
After the regular expression extraction, the obtained information is: [2017-09-23 06:37:57.270], [[main]], [info], and [o.s.c.a.annotationconfigapplicationcontext], which may be encoded as 01, 02, 03, and 04, respectively. This is the first encoded information.
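The extraction step above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the pattern `HEADER_PATTERN`, the group names, and the rule of numbering the extracted fields 01, 02, ... by position are all assumptions made for the sample entry.

```python
import re

# Hypothetical pattern for the sample entry: timestamp, [thread], level, logger name.
HEADER_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s*"
    r"\[(?P<thread>[^\]]+)\]\s*"
    r"(?P<level>[A-Za-z]+)\s+"
    r"(?P<logger>[\w.]+)"
)


def extract_first_encoding(entry: str):
    """Return (codes, fields, remaining_text) for one log entry, or None if no match."""
    match = HEADER_PATTERN.match(entry)
    if match is None:
        return None
    fields = match.groupdict()
    # Assumption: each extracted field is given a two-digit code by its position.
    codes = [f"{i:02d}" for i in range(1, len(fields) + 1)]
    remaining_text = entry[match.end():]  # text left over for the second encoding
    return codes, fields, remaining_text


SAMPLE_ENTRY = (
    "2017-09-23 06:37:57.270[main]info "
    "o.s.c.a.annotationconfigapplicationcontext-refreshing "
    "org.springframework.context.annotation.annotationconfigapplicationcontext"
    "@7637f22:startup date[sat sep 23 06:37:57cst 2017];root of context hierarchy"
)

codes, fields, rest = extract_first_encoding(SAMPLE_ENTRY)
```

The leftover text `rest` is what the second encoding would then operate on.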
After that, the text content in the log can be encoded to generate the second encoded information. The main purpose is to extract two key features of the text content, namely word frequency and word order. For example, a certain log is:
1, remoteAddr, /172.24.0.18, {"header_info": {"interface_type": -, "send_vender": -, "receive_vender": -, ……, "t_stamp": {"sec": -, "usec": -, "zzl": -}}}
Here "interface_type", "send_vender", "receive_vender", "t_stamp", and the like are features of the same level and can each be coded as 1, while "sec", "usec", and "zzl" are features of the next level under "t_stamp" and can each be coded as 0 to distinguish them. The generated encoded information (second encoded information) is therefore 1111000.
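The level-based coding just described can be sketched as below. The function `encode_levels` is hypothetical, the field names are spelled as in Table 1, the `None` values are placeholders because the source does not give the field values, and the fields elided by "……" are simply omitted.

```python
from collections import deque


def encode_levels(header: dict) -> str:
    """Encode keys level by level: '1' for first-level keys, '0' for deeper keys."""
    codes = []
    queue = deque([(header, 1)])  # (mapping, nesting level)
    while queue:
        mapping, level = queue.popleft()
        for value in mapping.values():
            codes.append("1" if level == 1 else "0")
            if isinstance(value, dict):
                # Nested mappings are visited after the current level is finished.
                queue.append((value, level + 1))
    return "".join(codes)


# Placeholder values: only the key structure matters for this encoding.
header_info = {
    "interface_type": None,
    "send_vender": None,
    "receive_vender": None,
    "t_stamp": {"sec": None, "usec": None, "zzl": None},
}

second_encoding = encode_levels(header_info)
```

With this key structure the four first-level keys and the three keys under `t_stamp` yield the string 1111000 given in the text.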
Of course, the encoding may also be performed in a word-order encoding manner. The fields header_info, interface_type, receive_vender, remoteAddr, sec, sessionId, send_vender, t_stamp, usec, zzl, etc. in the log may be encoded, in the order in which they appear in the log, based on a preset encoding table, so that the encoded information (second encoded information) 74126385910 is obtained, i.e., the codes 7, 4, 1, 2, 6, 3, 8, 5, 9, 10 written one after another. The preset encoding table may be as shown in Table 1.
Field            Code
header_info      1
inface_type      2
receive_vender   3
remoteAddr       4
sec              5
send_vender      6
sessionId        7
t_stamp          8
usec             9
zzl              10
TABLE 1
When phrases that have not appeared before occur in the log, they can be represented by the codes -1, -2, ……, assigned in order. After the day's operation has finished, or during other downtime of the system, the encoding table can be updated.
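A minimal sketch of the word-order encoding with Table 1 follows. The function name `encode_word_order` is hypothetical, the keys are spelled exactly as in Table 1, and giving a repeated unseen phrase the same negative code each time is an assumption (the text only says unseen phrases receive -1, -2, ... in order).

```python
# Preset encoding table, copied from Table 1.
ENCODING_TABLE = {
    "header_info": 1, "inface_type": 2, "receive_vender": 3, "remoteAddr": 4,
    "sec": 5, "send_vender": 6, "sessionId": 7, "t_stamp": 8, "usec": 9, "zzl": 10,
}


def encode_word_order(tokens, table=ENCODING_TABLE):
    """Map tokens to codes in order of appearance; unseen tokens get -1, -2, ..."""
    unseen = {}
    codes = []
    for token in tokens:
        if token in table:
            codes.append(table[token])
        else:
            # Assumption: a repeated unseen token reuses its first negative code.
            if token not in unseen:
                unseen[token] = -(len(unseen) + 1)
            codes.append(unseen[token])
    return codes, unseen


# Order of appearance implied by the code sequence 74126385910.
tokens = ["sessionId", "remoteAddr", "header_info", "inface_type", "send_vender",
          "receive_vender", "t_stamp", "sec", "usec", "zzl"]
codes, unseen = encode_word_order(tokens)
second_encoding = "".join(str(code) for code in codes)
```

The returned `unseen` mapping is what would be merged into the encoding table during the later table update.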
S103, classifying the log according to the characteristics, and acquiring classification information corresponding to the log.
After the features corresponding to the log are extracted, the log can be classified according to the features, and classification information corresponding to the log is obtained.
Specifically, as shown in fig. 2, the following steps may be included:
S201, obtaining the length of the first coding information of the log.
S202, inputting the length of the first coding information and the first coding information into a decision tree, and classifying by using the decision tree so as to determine a first classification number corresponding to the log.
S203, inputting the second coding information into the decision tree, classifying by using the decision tree, and determining a second classification number corresponding to the log.
And S204, generating classification information corresponding to the log according to the first classification number and the second classification number.
For the classification, either a decision tree or a KNN (k-nearest neighbor) algorithm may be used.
In this embodiment, a decision tree algorithm is mainly used to classify the logs. The reason is that most logs in the system have different information lengths, and the word-order coding can be combined with the branching of a decision tree, which effectively reduces the amount of computation and increases the operation speed.
Specifically, the feature encoding length $l_i$, $i \in \{1, 2, \dots, N\}$, of each log, i.e., the length of its first encoded information, may be calculated, where $N$ is the total number of logs. The length of the first coding information and the first coding information are input into a decision tree and branched by the decision tree; the node that finally satisfies the conditions gives the first classification number corresponding to the log. At this point, each log has been assigned a classification node, and logs on the same classification node have the same code length, source information, process information, and so on. Then, within each classification node, the word-order code (second coding information) of each log is used as the feature and input into the decision tree for branching, until the logs in each leaf node under that classification node contain the same coding information, at which point the classification is finished. In this way, each log is assigned two classification numbers, namely the classification number of the feature information (first classification number) and the classification number of the coded information (second classification number), and the two classification numbers jointly determine the classification information to which the log belongs. The format is (second classification number.first classification number), e.g., (-1.0), (3204.0), etc.
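The two-stage classification can be illustrated with the sketch below. It deliberately swaps the decision tree for plain exact-match grouping with dictionaries, which gives the same end state the text describes (logs in one leaf share identical codes) but is not a decision-tree implementation. The function `classify_logs` and the rule of numbering classes from 0 in order of first appearance are assumptions.

```python
def classify_logs(features):
    """Assign '(second.first)' labels to logs.

    features: list of (first_encoding, second_encoding) pairs, one per log.
    Stage 1 groups by the length and content of the first encoding;
    stage 2 groups by the second encoding within each stage-1 group.
    """
    first_classes = {}   # (length, first_encoding) -> first classification number
    second_classes = {}  # first number -> {second_encoding: second number}
    labels = []
    for first_encoding, second_encoding in features:
        key = (len(first_encoding), tuple(first_encoding))
        # Assumption: classification numbers start at 0 in order of first appearance.
        first_no = first_classes.setdefault(key, len(first_classes))
        bucket = second_classes.setdefault(first_no, {})
        second_no = bucket.setdefault(tuple(second_encoding), len(bucket))
        labels.append(f"({second_no}.{first_no})")
    return labels


sample_features = [
    (["01", "02", "03", "04"], [7, 4, 1]),
    (["01", "02", "03", "04"], [7, 4, 1]),
    (["01", "02", "03", "04"], [7, 4, 2]),
    (["01", "02"], [7, 4, 1]),
]
labels = classify_logs(sample_features)
```

Logs with identical first and second encodings end up with the same label, which is what makes the labels usable as nodes of the logic diagram.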
And S104, constructing a logic diagram of the log file according to the classification information corresponding to the log.
After the classification information corresponding to the logs is generated, a logic diagram of the log file can be constructed from it. Specifically, the classification information corresponding to the logs may be used as the nodes of the logic diagram; the jump probabilities between the pieces of classification information are then counted and used as the edges of the logic diagram. Each piece of classification information has an inflow and an outflow. The inflow node is the classification to which the first log of the log file belongs, and the outflow node is the classification to which the last log of the log file belongs. For example, for class 1, $cl_1$, and class 3, $cl_3$, the probability that class 1 flows into class 3 can be expressed as $\Pr\{cl_1 \to cl_3 \mid cl_1\} = p_{1 \to 3}$, i.e., the probability that class 3 appears immediately after class 1. This forms an edge from node 1 to node 3. By statistically analyzing the inflow and outflow of all nodes in the log file, a complete logic diagram as shown in FIG. 3 can be formed. As can be seen from FIG. 3, the probability of node (-1.0) flowing into node (211.0) is 0.02, while, conversely, the probability of node (211.0) flowing into node (-1.0) is 0.0012.
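A minimal sketch of the diagram construction follows, assuming the jump probability from class a to class b is estimated as the number of times b immediately follows a, divided by the number of times a is followed by anything. The function `build_logic_diagram` and the string labels are illustrative only.

```python
from collections import Counter


def build_logic_diagram(class_sequence):
    """Return (nodes, edges) where edges maps (src, dst) to the jump probability."""
    # Count each pair of consecutive classifications in the log file.
    transitions = Counter(zip(class_sequence, class_sequence[1:]))
    # Number of times each class is followed by another log.
    outgoing = Counter(class_sequence[:-1])
    nodes = set(class_sequence)
    edges = {
        (src, dst): count / outgoing[src]
        for (src, dst), count in transitions.items()
    }
    return nodes, edges


# Classification labels of the logs, in the order the logs appear in the file.
sequence = ["A", "B", "A", "B", "A", "C"]
nodes, edges = build_logic_diagram(sequence)
```

Here class A is followed by B twice and by C once, so the edge A to B carries probability 2/3 and the edge A to C carries 1/3.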
The constructed logic diagram can intuitively reflect the operation hierarchy, the parallel relations, the communication relations, and the like in the system. For example, the process information in the predetermined format information can reflect which computer in the network, or which process on that computer, produced the log, and can generally reflect the concurrency of the program and even the attribution information of the system. By constructing the logic diagram, the most detailed program flow can be presented to the programmer, so that the programmer in charge of system debugging can understand as a whole the flow relations among the logs of each operating entity, which helps the programmer analyze the logs.
And S105, determining an abnormal part in the log file according to the logic diagram.
After the logic diagram is successfully built, an exception in the log file may be determined from the logic diagram.
Specifically, the jump probability corresponding to each edge in the logic diagram may be compared with a preset probability, and an edge whose jump probability is lower than the preset probability is determined as an abnormal part. For example, if the jump probability of an edge is 0.0012 and is lower than the preset probability 0.01, the jump from the start node of the edge to its destination node, i.e., the corresponding event, occurs with too low a probability, and the edge is therefore determined to be abnormal.
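The threshold comparison can be sketched as below, using the two probabilities read from FIG. 3 and the preset probability 0.01 from the example. The function name and the dictionary layout of the edges are assumptions.

```python
def find_low_probability_edges(edges, preset_probability=0.01):
    """Return the edges whose jump probability is below the preset probability."""
    return sorted(
        edge for edge, probability in edges.items()
        if probability < preset_probability
    )


# Edge probabilities quoted in the description of FIG. 3.
edges = {
    ("(-1.0)", "(211.0)"): 0.02,
    ("(211.0)", "(-1.0)"): 0.0012,
}
abnormal_edges = find_low_probability_edges(edges)
```

Only the edge from (211.0) to (-1.0) falls below 0.01, so it is the one reported as abnormal.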
Of course, the logic diagram may also be compared with a historical logic diagram, and the nodes or edges at which the logic diagram is inconsistent with the historical logic diagram are determined as abnormal parts. For example, if a jump relation appears in today's logic diagram that did not appear in yesterday's logic diagram, this may imply that a certain part of the program is not running and that a system exception has been caused. The abnormality can be seen directly from the diagram.
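A comparison against a historical logic diagram might look like the sketch below. It only compares structure (which nodes and edges exist); treating a changed probability on an existing edge as an inconsistency is left out, and the function name and result keys are hypothetical.

```python
def diff_logic_diagrams(current_nodes, current_edges, history_nodes, history_edges):
    """Report nodes and edges present in only one of the two diagrams."""
    current_edge_set = set(current_edges)   # works for dicts keyed by (src, dst)
    history_edge_set = set(history_edges)
    return {
        "new_nodes": set(current_nodes) - set(history_nodes),
        "missing_nodes": set(history_nodes) - set(current_nodes),
        "new_edges": current_edge_set - history_edge_set,
        "missing_edges": history_edge_set - current_edge_set,
    }


yesterday_edges = {("A", "B"): 0.9, ("B", "C"): 1.0}
today_edges = {("A", "B"): 0.6, ("A", "C"): 0.4}
report = diff_logic_diagrams({"A", "B", "C"}, today_edges,
                             {"A", "B", "C"}, yesterday_edges)
```

In this example the jump from A to C is new and the jump from B to C has disappeared, which are the kinds of inconsistencies a programmer would then inspect.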
According to the log file abnormity detection method provided by the embodiment of the invention, the log in the log file is obtained, the log is characterized so as to extract the characteristics corresponding to the log, the log is classified according to the characteristics, the classification information corresponding to the log is obtained, the logic diagram of the log file is constructed according to the classification information corresponding to the log, and the abnormal part in the log file is determined according to the logic diagram, so that the abnormal part can be visually embodied, the problem in the system operation can be simply and conveniently determined, and the maintenance efficiency of a programmer is improved.
As shown in fig. 4, the log file abnormality detection method may further include:
S106, generating abnormal reminding information.
After the abnormal part in the log file is determined according to the logic diagram, abnormal reminding information can be generated, so that a programmer is reminded, and the programmer is helped to timely deal with problems in the system operation process.
As shown in fig. 5, the log file abnormality detection method may further include:
S205, calculating a length deviation value of the log according to the length of the first coded information.
After the length of the first encoded information has been calculated for each log, the mean $\mu$ and the variance $\sigma^2$ of these lengths over the logs can also be calculated, and the length deviation value of each log is then calculated from them:
[Formula image BDA0001635500340000081: length deviation value of a log; not reproduced in the text]
S206, determining the log with the largest length deviation value in the log file.
S207, manually detecting whether the log with the largest length deviation value is abnormal.
Most logs fall within a certain range of lengths. The range here refers to a cognitive limit: for example, a log with 400 lines of content has very likely had several events piled into it. Therefore, the log with the largest length deviation value in the log file can be spot-checked, and a programmer can manually determine whether it is abnormal. If the programmer considers the log abnormal, the log can be analyzed so that the corresponding fault is handled, thereby ensuring normal operation of the system.
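Steps S205 and S206 can be sketched as follows. Because the deviation formula itself is not reproduced in the text, this sketch assumes a standardized deviation, |l_i - μ| / σ, computed from the mean and variance mentioned above; the patent's actual formula may differ. The function names are hypothetical.

```python
import math
from statistics import fmean, pvariance


def length_deviations(lengths):
    """Assumed deviation: absolute distance from the mean, in standard deviations."""
    mu = fmean(lengths)
    sigma2 = pvariance(lengths, mu)  # population variance of the lengths
    if sigma2 == 0:
        return [0.0] * len(lengths)  # all logs have the same length
    sigma = math.sqrt(sigma2)
    return [abs(length - mu) / sigma for length in lengths]


def most_deviant_log(lengths):
    """Return the index of the log with the largest length deviation value."""
    deviations = length_deviations(lengths)
    return max(range(len(lengths)), key=deviations.__getitem__)


# Lengths of the first encoded information of five logs (illustrative values).
lengths = [4, 4, 4, 4, 12]
deviations = length_deviations(lengths)
candidate = most_deviant_log(lengths)
```

The index returned by `most_deviant_log` identifies the log that would be handed to a programmer for the manual check in S207.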
In order to implement the foregoing embodiment, the present invention further provides a log file abnormality detection apparatus, and fig. 6 is a block diagram of a structure of the log file abnormality detection apparatus according to an embodiment of the present invention, and as shown in fig. 6, the apparatus includes an obtaining module 610, an extracting module 620, a classifying module 630, a constructing module 640, and a determining module 650.
The obtaining module 610 is configured to obtain logs in a log file.
The extracting module 620 is configured to characterize the log so as to extract features corresponding to the log, where the features include first encoding information and second encoding information.
The classification module 630 is configured to classify the log according to the features and obtain classification information corresponding to the log.
The constructing module 640 is configured to construct a logic diagram of the log file according to the classification information corresponding to the log.
The determining module 650 is configured to determine an abnormal part in the log file according to the logic diagram.
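One way to picture how the five modules fit together is the sketch below, in which each module is a plain callable and the apparatus chains them in order. The class `LogAnomalyDetector` and its field names are hypothetical and only mirror modules 610 to 650; the callables passed in the example are trivial stand-ins.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class LogAnomalyDetector:
    obtain: Callable[[str], List[str]]       # obtaining module 610
    extract: Callable[[str], Any]            # extracting module 620
    classify: Callable[[List[Any]], List]    # classification module 630
    construct: Callable[[List], Any]         # constructing module 640
    determine: Callable[[Any], List]         # determining module 650

    def run(self, log_file_path: str) -> List:
        """Chain the modules: logs -> features -> classes -> diagram -> anomalies."""
        logs = self.obtain(log_file_path)
        features = [self.extract(log) for log in logs]
        classes = self.classify(features)
        diagram = self.construct(classes)
        return self.determine(diagram)


# Stand-in callables, used only to show the data flow between modules.
detector = LogAnomalyDetector(
    obtain=lambda path: ["a", "b"],
    extract=str.upper,
    classify=lambda features: features,
    construct=lambda classes: list(zip(classes, classes[1:])),
    determine=lambda diagram: diagram,
)
result = detector.run("example.log")
```

Keeping each module behind a callable makes it straightforward to swap, for example, the classification step without touching the rest of the chain.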
Wherein, the classification module 630 further includes an obtaining unit 631, a first classification unit 632, a second classification unit 633 and a generating unit 634.
The obtaining unit 631 is configured to obtain a length of the first encoded information of the log.
The first classification unit 632 is configured to input the length of the first encoded information and the first encoded information into a decision tree, perform classification by using the decision tree, and determine a first classification number corresponding to the log.
The second classification unit 633 is configured to input the second encoding information into the decision tree, perform classification by using the decision tree, and determine a second classification number corresponding to the log.
The generating unit 634 is configured to generate classification information corresponding to the log according to the first classification number and the second classification number.
As shown in fig. 7, the log file abnormality detection apparatus may further include a reminder module 660.
The reminding module 660 is configured to generate abnormal reminding information after the abnormal part in the log file is determined according to the logic diagram.
As shown in fig. 8, the classification module 630 may further include a calculation unit 635, a determination unit 636, and a detection unit 637.
The calculating unit 635 is configured to calculate a length deviation value of the log according to the length of the first encoded information after the length of the first encoded information of the log is acquired.
The determining unit 636 is configured to determine the log with the largest length deviation value in the log file.
The detecting unit 637 is configured to manually detect whether the log with the largest length deviation value is abnormal.
It should be noted that the foregoing explanation of the log file abnormality detection method is also applicable to the log file abnormality detection apparatus in the embodiment of the present invention, and details not disclosed in the embodiment of the present invention are not repeated herein.
The log file abnormality detection apparatus provided by the embodiment of the present invention obtains the logs in a log file, characterizes each log to extract its corresponding features, classifies the log according to those features to obtain its classification information, constructs a logic diagram of the log file according to the classification information, and determines the abnormal part in the log file according to the logic diagram. The abnormal part is thereby presented visually, problems in system operation can be identified simply and conveniently, and the maintenance efficiency of programmers is improved.
In order to implement the above embodiments, the present invention further provides an electronic device.
As shown in fig. 9, the electronic device 900 includes a processor 910, a memory 920, and a computer program 901 stored in the memory 920 and executable on the processor 910, and the processor 910 is configured to execute the log file abnormality detection method according to the embodiment of the first aspect of the present invention.
For example, the computer program may be executed by the processor to perform a log file abnormality detection method comprising the following steps:
S101', obtaining the logs in the log file.
S102', characterizing the log to extract the features corresponding to the log, wherein the features comprise first encoded information and second encoded information.
S103', classifying the log according to the features, and obtaining classification information corresponding to the log.
S104', constructing a logic diagram of the log file according to the classification information corresponding to the log.
S105', determining an abnormal part in the log file according to the logic diagram.
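Steps S104' and S105' can be sketched in Python as follows, using the construction given in the claims: classification information serves as nodes, the "skipping probability" between classification information serves as edges, and edges below a preset probability are treated as abnormal. The sketch reads the skipping probability as the estimated probability of moving from one class to the next in consecutive logs, which is an interpretation. The string labels, the function names and the preset value 0.2 are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str]


def build_logic_diagram(labels: List[str]) -> Dict[Edge, float]:
    """Map each observed transition (a, b) to its estimated probability P(b | a)."""
    # Count consecutive pairs of classification labels.
    pair_counts = Counter(zip(labels, labels[1:]))
    # Total number of outgoing transitions per source node.
    out_counts: Dict[str, int] = defaultdict(int)
    for (src, _dst), count in pair_counts.items():
        out_counts[src] += count
    return {edge: count / out_counts[edge[0]] for edge, count in pair_counts.items()}


def abnormal_edges(diagram: Dict[Edge, float], preset: float) -> List[Edge]:
    """Edges whose probability is lower than the preset probability."""
    return sorted(edge for edge, p in diagram.items() if p < preset)


# Classification labels of consecutive logs; the final A -> C jump is rare.
labels = ["A", "B", "C"] * 9 + ["A", "C"]
diagram = build_logic_diagram(labels)
print(abnormal_edges(diagram, preset=0.2))
```

In this sample, the transition from "A" to "B" occurs nine times out of ten and the transition from "A" to "C" once, so only the latter falls below the preset probability.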
In the description herein, reference to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, schematic uses of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, various embodiments or examples, and features of different embodiments or examples, described in this specification can be combined by one skilled in the art without contradiction.
Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device. More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for realizing a logic function for a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a programmable gate array (PGA), a field programmable gate array (FPGA), and the like.
It will be understood by those skilled in the art that all or part of the steps of the methods in the above embodiments may be implemented by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a separate product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc. Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (14)

1. A log file anomaly detection method is characterized by comprising the following steps:
obtaining a log in a log file;
characterizing the log to extract features corresponding to the log, wherein the features comprise first encoding information and second encoding information;
classifying the log according to the characteristics, and acquiring classification information corresponding to the log;
constructing a logic diagram of the log file according to the classification information corresponding to the log; and
determining an exception in the log file according to the logic diagram;
wherein constructing the logic diagram of the log file according to the classification information corresponding to the log comprises:
taking the classification information corresponding to the log as a node in the logic diagram;
and counting the skipping probability among the classification information, and taking the skipping probability as an edge in the logic diagram.
2. The method of claim 1, wherein characterizing the log to extract features corresponding to the log comprises:
extracting predetermined format information in the log based on a regular expression, and generating the first encoding information;
and encoding the text content remaining in the log after the predetermined format information is extracted, to generate the second encoding information.
3. The method of claim 1, wherein classifying the log according to the features and obtaining classification information corresponding to the log comprises:
acquiring the length of first coding information of the log;
inputting the length of the first coding information and the first coding information into a decision tree, classifying by using the decision tree, and determining a first classification number corresponding to the log;
inputting the second coding information into the decision tree, classifying by using the decision tree, and determining a second classification number corresponding to the log;
and generating classification information corresponding to the log according to the first classification number and the second classification number.
4. The method of claim 1, wherein determining anomalies in the log file from the logic diagram comprises:
comparing the skipping probability corresponding to the edges in the logic diagram with a preset probability, and determining the edges with the skipping probability lower than the preset probability as abnormal positions; or
comparing the logic diagram with a historical logic diagram, and determining that the nodes or edges of the logic diagram which are inconsistent with the historical logic diagram are abnormal positions.
5. The method of claim 1, after determining an anomaly in the log file from the logic diagram, further comprising:
and generating abnormal reminding information.
6. The method of claim 3, further comprising:
after the length of the first coding information of the log is obtained, calculating a length deviation value of the log according to the length of the first coding information;
determining the log with the largest length deviation value in the log file;
and manually detecting whether the log with the largest length deviation value is abnormal or not.
7. An apparatus for detecting an abnormality in a log file, comprising:
the acquisition module is used for acquiring the logs in the log file;
the extraction module is used for characterizing the log so as to extract the characteristics corresponding to the log, wherein the characteristics comprise first coding information and second coding information;
the classification module is used for classifying the log according to the characteristics and acquiring classification information corresponding to the log;
the construction module is used for constructing a logic diagram of the log file according to the classification information corresponding to the log; and
the determining module is used for determining an abnormal part in the log file according to the logic diagram;
wherein the construction module is further configured to:
take the classification information corresponding to the log as a node in the logic diagram;
and count the skipping probability among the classification information, and take the skipping probability as an edge in the logic diagram.
8. The apparatus of claim 7, wherein the extraction module is configured to:
extract predetermined format information in the log based on a regular expression, and generate the first encoding information;
and encode the text content remaining in the log after the predetermined format information is extracted, to generate the second encoding information.
9. The apparatus of claim 7, wherein the classification module comprises:
an obtaining unit configured to obtain a length of first encoding information of the log;
the first classification unit is used for inputting the length of the first coding information and the first coding information into a decision tree, classifying by using the decision tree, and determining a first classification number corresponding to the log;
the second classification unit is used for inputting the second coding information into the decision tree, classifying by using the decision tree and determining a second classification number corresponding to the log;
and the generating unit is used for generating the classification information corresponding to the log according to the first classification number and the second classification number.
10. The apparatus of claim 7, wherein the determination module is configured to:
compare the skipping probability corresponding to the edges in the logic diagram with a preset probability, and determine the edges with the skipping probability lower than the preset probability as abnormal positions; or
compare the logic diagram with a historical logic diagram, and determine that the nodes or edges of the logic diagram which are inconsistent with the historical logic diagram are abnormal positions.
11. The apparatus of claim 7, further comprising:
and the reminding module is used for generating abnormal reminding information after determining the abnormal part in the log file according to the logic diagram.
12. The apparatus of claim 9, wherein the classification module further comprises:
a calculating unit, configured to calculate a length deviation value of the log according to the length of the first encoded information of the log after that length is acquired;
the determining unit is used for determining the log with the largest length deviation value in the log file;
and the detection unit is used for manually detecting whether the log with the largest length deviation value is abnormal or not.
13. A non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the log file anomaly detection method according to any one of claims 1 to 6.
14. A terminal comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, the processor being configured to perform the log file anomaly detection method according to any one of claims 1-6.
CN201810359152.9A 2018-04-20 2018-04-20 Method and device for detecting log file abnormity Active CN110389874B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810359152.9A CN110389874B (en) 2018-04-20 2018-04-20 Method and device for detecting log file abnormity

Publications (2)

Publication Number Publication Date
CN110389874A CN110389874A (en) 2019-10-29
CN110389874B true CN110389874B (en) 2021-01-19

Family

ID=68283539

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810359152.9A Active CN110389874B (en) 2018-04-20 2018-04-20 Method and device for detecting log file abnormity

Country Status (1)

Country Link
CN (1) CN110389874B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111107079A (en) * 2019-12-16 2020-05-05 北京神州绿盟信息安全科技股份有限公司 Method and device for detecting uploaded files
CN113111280B (en) * 2020-01-09 2023-05-23 福建天泉教育科技有限公司 Method for displaying log content in flow chart mode and storage medium
CN111221707B (en) * 2020-01-17 2024-03-26 中体彩科技发展有限公司 Method and system for monitoring physical color random number generator
CN113553244A (en) * 2020-04-24 2021-10-26 阿里巴巴集团控股有限公司 Anomaly detection method and device
CN111563178A (en) * 2020-04-28 2020-08-21 深圳壹账通智能科技有限公司 Rule logic diagram comparison method, device, medium and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104268064A (en) * 2014-09-11 2015-01-07 百度在线网络技术(北京)有限公司 Abnormity diagnosis method and device of product logs
CN105653427A (en) * 2016-03-04 2016-06-08 上海交通大学 Log monitoring method based on abnormal behavior detection
CN107888602A (en) * 2017-11-23 2018-04-06 北京白山耘科技有限公司 A kind of method and device for detecting abnormal user

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5468837B2 (en) * 2009-07-30 2014-04-09 株式会社日立製作所 Anomaly detection method, apparatus, and program
JP5346141B1 (en) * 2012-11-13 2013-11-20 靖彦 横手 Database system and control method thereof
US11457029B2 (en) * 2013-12-14 2022-09-27 Micro Focus Llc Log analysis based on user activity volume
CN104616205B (en) * 2014-11-24 2019-10-25 北京科东电力控制系统有限责任公司 A kind of operation states of electric power system monitoring method based on distributed information log analysis
CN105468677B (en) * 2015-11-13 2019-11-19 国家计算机网络与信息安全管理中心 A kind of Log Clustering method based on graph structure
CN106250471A (en) * 2016-07-29 2016-12-21 东北大学 A kind of data for train ATP automatically extract and store system and method
CN106407071A (en) * 2016-09-06 2017-02-15 珠海迈科智能科技股份有限公司 Automatic analysis tool for content service background logs based on Linux
CN107391353B (en) * 2017-07-07 2020-07-28 西安电子科技大学 Method for detecting abnormal behavior of complex software system based on log

Also Published As

Publication number Publication date
CN110389874A (en) 2019-10-29


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant