CN111741029A - Log data processing method, processing device and storage medium - Google Patents


Info

Publication number
CN111741029A
Authority
CN
China
Prior art keywords
log data
log
regular expression
merging
rule
Prior art date
Legal status
Granted
Application number
CN202010860149.2A
Other languages
Chinese (zh)
Other versions
CN111741029B (en)
Inventor
饶志波
赵时晴
周磊
姜双林
Current Assignee
Beijing Andi Technology Co Ltd
Original Assignee
Beijing Andi Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Andi Technology Co Ltd
Priority to CN202010860149.2A
Publication of CN111741029A
Application granted
Publication of CN111741029B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/10 Protocols in which an application is distributed across nodes in the network
    • H04L67/1097 Protocols in which an application is distributed across nodes in the network for distributed storage of data in networks, e.g. transport arrangements for network file system [NFS], storage area networks [SAN] or network attached storage [NAS]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/2866 Architectures; Arrangements
    • H04L67/30 Profiles

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention relates to a method, a device and a storage medium for processing log data. The method comprises: acquiring and loading a first configuration file and a second configuration file; collecting first log data to be processed; judging whether a regular expression can parse the first log data; if so, parsing the first log data to generate second log data; formatting the second log data according to a formatting rule to generate third log data; determining a target merging rule among the merging rules according to the third log data; merging the third log data according to the target merging rule; and judging whether the log event carrying the first log data occurs for the first time. If it does, the third log data is reported; if not, log data collection is repeated until a preset time is reached, and the merged log data is then reported. The scheme can improve the efficiency of log data processing.

Description

Log data processing method, processing device and storage medium
Technical Field
The present invention relates to the field of data analysis and processing technologies, and in particular, to a method, a device, and a storage medium for processing log data.
Background
A log data processing system collects, parses, merges and stores security event information from an entire local area network. It must collect logs of many data types from security protection devices, network devices, hosts, application systems and so on, but log formats and data types differ from vendor to vendor.
At present, log data processing systems mainly rely on a parsing program to parse logs of the various data types. However, as the number of network products grows, so does the number of log data types, and the program code of the parsing program must be modified to accommodate each new type, which hinders the efficiency of log data processing.
Therefore, it is desirable to provide a method, an apparatus and a storage medium for processing log data to solve the above problems.
Disclosure of Invention
The technical problem to be solved by the present invention is to overcome a defect of the prior art, namely that log data processing becomes inefficient as the number of log data types keeps increasing, by providing a method, a device and a storage medium for processing log data.
In order to solve the above technical problem, the present invention provides a method for processing log data, including:
acquiring and loading at least one first configuration file and at least one second configuration file; the first configuration file carries at least one merging rule, the second configuration file carries a formatting rule, and the formatting rule comprises at least one regular expression;
collecting first log data of a log event to be processed;
judging whether the at least one regular expression can parse the first log data;
if the at least one regular expression can parse the first log data, parsing the first log data and generating second log data;
formatting the second log data according to the formatting rule, and generating third log data;
determining a target merge rule of the at least one merge rule according to the third log data;
merging the third log data according to the target merging rule;
judging whether the log event carrying the first log data appears for the first time, and if so, reporting and storing the merged third log data;
if not, accumulating the number of occurrences of the log event carrying the first log data;
and repeating the collection of first log data of the log event to be processed until the merging time of the log event reaches the preset time, and then reporting the merged third log data.
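The report-or-accumulate logic of the last three steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: each log event is identified by a string key, the caller supplies the current time (the unit of the "preset time" is not specified; seconds are assumed), and merging is reduced to counting occurrences. `EventMerger`, `merge_window` and the `report` callback are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class _Window:
    started: float  # time at which the current merge window opened
    count: int = 0  # repeat occurrences accumulated in this window


@dataclass
class EventMerger:
    """Hypothetical helper: report an event the first time it is seen, then
    accumulate repeats and report the merged count once the window elapses."""

    merge_window: float                  # the "preset time"; unit assumed to be seconds
    report: Callable[[str, int], None]   # receives (event_key, occurrence_count)
    _windows: Dict[str, _Window] = field(default_factory=dict)

    def observe(self, event_key: str, now: float) -> None:
        window = self._windows.get(event_key)
        if window is None:
            # First occurrence: report (and store) at once, then open a window.
            self.report(event_key, 1)
            self._windows[event_key] = _Window(started=now)
            return
        # Repeat occurrence: accumulate instead of reporting immediately.
        window.count += 1
        self.flush_due(now)

    def flush_due(self, now: float) -> None:
        """Report every event whose merge time has reached the preset window."""
        for key, window in self._windows.items():
            if now - window.started >= self.merge_window:
                if window.count:
                    self.report(key, window.count)
                window.started, window.count = now, 0  # start a new window
```

For example, with a 15-unit window, an event seen at times 0, 5, 9 and 16 is reported once immediately with count 1, and once more at time 16 with the three accumulated repeats. Resetting the window after a report (rather than forgetting the event) is a design choice of this sketch.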
Optionally, the merging the third log data according to the target merging rule includes:
determining attribute information of the third log data according to the regular expression;
and merging the third log data according to the target merging rule and the attribute information of the third log data.
Optionally, the collecting first log data of the log event to be processed includes:
collecting the log event, and polling the log event at a set interval until the first log data is obtained by polling.
Optionally, the first log data comprises a first number of characters, and the regular expression comprises a second number of sub-expressions;
the judging whether the at least one regular expression can analyze the first log data includes:
for any regular expression, calculating a matching value of the regular expression for matching the first log data according to the following formula:
[Formula image not reproduced: expression for the matching value Q]
wherein Q represents the matching value of the regular expression for matching the log data; m represents the first number (of characters); n represents the second number (of sub-expressions); $k_i$ is a scale factor of the i-th sub-expression for matching the log data; $h_i$ is a factor characterizing whether the i-th sub-expression matches the log data; $X_{ij}$ characterizes whether the i-th sub-expression matches the j-th character of the log data; $G_{ij}$ is the weight with which the i-th sub-expression matches the j-th character of the log data; and $F(h_i, X_{ij}, G_{ij})$ represents the matching value of the i-th sub-expression of the regular expression for the j-th character of the log data;
if there exists a regular expression whose matching value for the first log data is greater than a preset matching value, it is determined that this regular expression can parse the first log data;
and if the matching value of every regular expression for the first log data is not greater than the preset matching value, it is determined that the at least one regular expression cannot parse the first log data.
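Because the formula for Q survives only as an image, the sketch below is explicitly an assumption: it guesses that F is the product $h_i X_{ij} G_{ij}$ and that Q sums $k_i$ times the per-character average of F over the m characters. Only the final threshold comparison (Q greater than the preset matching value) comes from the text. `f_term`, `matching_value` and `can_parse` are hypothetical names.

```python
from typing import Sequence


def f_term(h_i: float, x_ij: float, g_ij: float) -> float:
    """ASSUMED per-character term F(h_i, X_ij, G_ij): the product of the three factors."""
    return h_i * x_ij * g_ij


def matching_value(k: Sequence[float], h: Sequence[float],
                   x: Sequence[Sequence[float]], g: Sequence[Sequence[float]]) -> float:
    """ASSUMED aggregation: Q = sum_i k_i * (1/m) * sum_j F(h_i, X_ij, G_ij).

    k and h hold one entry per sub-expression (n of them); x and g are
    n-by-m matrices over the m characters of the log line.
    """
    n = len(k)
    m = len(x[0]) if n else 0
    if m == 0:
        return 0.0
    return sum(
        k[i] * sum(f_term(h[i], x[i][j], g[i][j]) for j in range(m)) / m
        for i in range(n)
    )


def can_parse(q: float, preset: float) -> bool:
    """Decision rule from the text: parseable only if Q exceeds the preset value."""
    return q > preset
```

With two equally weighted sub-expressions that each match half of a four-character line (unit weights), this assumed form gives Q = 0.5, which would fall below a preset value such as 0.95.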
Optionally, if the at least one regular expression cannot parse the first log data, a target regular expression is determined according to the first log data;
determining a target formatting rule according to the target regular expression;
acquiring and loading at least one third configuration file; wherein the third configuration file carries the target formatting rule.
The invention also provides a log data processing device, which comprises a loading module, an acquisition module, a judging module, a processing module, a merging module and a circular execution module;
the loading module is used for acquiring and loading at least one first configuration file and at least one second configuration file; the first configuration file carries at least one merging rule, the second configuration file carries a formatting rule, and the formatting rule comprises at least one regular expression;
the acquisition module is used for acquiring first log data to be processed;
the judging module is configured to judge whether the at least one regular expression in the second configuration file loaded into memory by the loading module can parse the first log data;
the processing module is configured to, if the judging module determines that the at least one regular expression can parse the first log data, parse the first log data collected by the acquisition module to generate second log data, and format the second log data according to the formatting rule to generate third log data;
the merging module is configured to determine a target merging rule in the at least one merging rule according to the third log data generated by the processing module, and merge the third log data according to the target merging rule;
the circular execution module is configured to perform the following steps:
judging whether the log event carrying the first log data occurs for the first time, if so, reporting and storing the third log data merged by the merging module;
if not, accumulating the number of occurrences of the log event carrying the first log data;
and repeating the collection of first log data of the log event to be processed until the merging time of the log event reaches the preset time, and then reporting the merged third log data.
Optionally, the merging module is configured to perform the following operations:
determining attribute information of the third log data according to the regular expression;
and merging the third log data according to the target merging rule and the attribute information of the third log data.
Optionally, the acquisition module is configured to perform the following operations:
collecting the log events;
and polling the log events according to set time until the first log data is obtained through polling.
An embodiment of the present invention further provides a data processing apparatus, where the apparatus includes: at least one memory and at least one processor;
the at least one memory to store a machine readable program;
the at least one processor is configured to invoke the machine readable program to perform the method of any one of the above-described log data processing methods.
An embodiment of the present invention further provides a storage medium, where the storage medium stores computer instructions, and the computer instructions, when executed by a processor, cause the processor to execute any one of the above log data processing methods.
The log data processing method, device and storage medium provided by the present invention have the following beneficial effects:
the written merging rules and formatting rules are stored as configuration files, and the configuration files carrying them are loaded into memory when the system starts, so that formatting and merging operations can be performed on the collected log data. Because log data is processed by loading configuration files, supporting a new or changed log data type only requires editing and reloading a configuration file, which avoids the difficulty of modifying program code. This improves the extensibility and maintainability of the system as well as the efficiency of log data processing. In addition, a merging period is set for each log event, and whether the merging time of the log event has reached this period determines whether the event is reported and stored. Users can therefore set a suitable merging period for each event type instead of reporting and/or storing every merged log record, which improves the execution efficiency of the system and simplifies management for the user.
Drawings
Fig. 1 is a flowchart of a method for processing log data according to an embodiment of the present invention;
fig. 2 is a flowchart of a method for processing log data according to another embodiment of the present invention;
fig. 3 is a schematic diagram of a device in which a log data processing apparatus according to an embodiment of the present invention is located;
fig. 4 is a schematic diagram of a processing apparatus for log data according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, are within the scope of the present invention.
As shown in fig. 1, an embodiment of the present invention provides a method for processing log data, where the method may include the following steps:
step 101: acquiring and loading at least one first configuration file and at least one second configuration file; the first configuration file carries at least one merging rule, the second configuration file carries a formatting rule, and the formatting rule comprises at least one regular expression;
step 102: collecting first log data to be processed;
step 103: judging whether the at least one regular expression can parse the first log data;
if the at least one regular expression can parse the first log data, parsing the first log data and generating second log data;
step 104: formatting the second log data according to the formatting rule, and generating third log data;
step 105: determining a target merge rule of the at least one merge rule according to the third log data;
step 106: merging the third log data according to the target merging rule;
step 107: judging whether the log event carrying the first log data appears for the first time, and if so, reporting and storing the merged third log data;
step 108: if not, accumulating the number of occurrences of the log event carrying the first log data;
and repeating the collection of first log data to be processed until the merging time of the log event reaches the preset time, and then reporting the merged third log data.
In the embodiment of the present invention, the written merge rules and formatting rules are stored as configuration files, and when the system starts, the configuration files carrying them are loaded into memory so that formatting and merging operations can be performed on the collected log data. Because log data is processed by loading configuration files, supporting a new or changed log data type only requires editing and reloading a configuration file, which avoids the difficulty of modifying program code. This improves the extensibility and maintainability of the system as well as the efficiency of log data processing. In addition, a merging period is set for each log event, and whether the merging time of the log event has reached this period determines whether the event is reported and stored. Users can therefore set a suitable merging period for each event type instead of reporting and storing every merged log record, which improves the execution efficiency of the system and simplifies management for the user.
Based on the processing method of log data shown in fig. 1, in the embodiment of the present invention, merging the third log data according to the target merging rule includes:
determining attribute information of the third log data according to the regular expression;
and merging the third log data according to the target merging rule and the attribute information of the third log data.
In the embodiment of the invention, when a particular type of log data of a log event deserves attention and needs to be merged, the relevant attribute information in the log data can be identified in a targeted manner: the regular expression extracts and converts that attribute information, and the converted log data is merged according to the merging rule.
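Attribute-based merging can be sketched as grouping records that agree on the attribute fields named by the target merge rule. This assumes records are already dictionaries of extracted attributes and that the rule supplies a tuple of key field names; `merge_by_attributes` and the field names in the example are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def merge_by_attributes(records: Iterable[Dict[str, str]],
                        key_fields: Tuple[str, ...]) -> List[Dict[str, object]]:
    """Collapse records that agree on every key field into one record with a count."""
    merged: Dict[Tuple[str, ...], Dict[str, object]] = {}
    for record in records:
        key = tuple(record.get(name, "") for name in key_fields)
        if key in merged:
            merged[key]["count"] = int(merged[key]["count"]) + 1
        else:
            # The first record's key attributes represent the whole group.
            merged[key] = {**dict(zip(key_fields, key)), "count": 1}
    return list(merged.values())  # dicts keep insertion order (Python 3.7+)
```

Merging on (`event_type`, `src`), for instance, folds repeated attacks from the same source into one record while keeping different sources apart.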
Based on the log processing method shown in fig. 1, in the embodiment of the present invention, the acquiring first log data to be processed includes:
collecting the log event, and polling the log event at a set interval until the first log data is obtained by polling.
In the embodiment of the invention, log data can be collected either passively, by receiving data, or actively, by querying the log events at specified intervals. A user can thus choose the collection mode according to the type of event of interest or how frequently it occurs, which prevents the system from occupying a large amount of memory and degrading its execution efficiency.
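The active (polling) mode can be sketched as a loop that queries a source at a fixed interval and stops once log data arrives. `fetch` stands in for whatever agent or probe call returns pending log lines; it, `poll_interval` and `max_polls` are hypothetical. The `max_polls` cap is an addition of this sketch so that the loop cannot run forever, which the text does not require.

```python
import time
from typing import Callable, List, Optional


def poll_for_log(fetch: Callable[[], List[str]], poll_interval: float,
                 max_polls: int,
                 sleep: Callable[[float], None] = time.sleep) -> Optional[str]:
    """Query `fetch` every `poll_interval` seconds until it yields a log line."""
    for attempt in range(max_polls):
        lines = fetch()
        if lines:
            return lines[0]           # first log data obtained by polling
        if attempt + 1 < max_polls:
            sleep(poll_interval)      # wait the set time before asking again
    return None                       # nothing arrived within max_polls queries
```

Injecting `sleep` lets the interval be tested without actually waiting.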
Based on the processing method of log data shown in fig. 1, in the embodiment of the present invention, the first log data includes a first number of characters, and the regular expression includes a second number of sub-expressions;
the judging whether the at least one regular expression can analyze the first log data includes:
for any regular expression, calculating a matching value of the regular expression for matching the first log data according to the following formula:
[Formula image not reproduced: expression for the matching value Q]
wherein Q represents the matching value of the regular expression for matching the log data; m represents the first number (of characters); n represents the second number (of sub-expressions); $k_i$ is a scale factor of the i-th sub-expression for matching the log data; $h_i$ is a factor characterizing whether the i-th sub-expression matches the log data; $X_{ij}$ characterizes whether the i-th sub-expression matches the j-th character of the log data; $G_{ij}$ is the weight with which the i-th sub-expression matches the j-th character of the log data; and $F(h_i, X_{ij}, G_{ij})$ represents the matching value of the i-th sub-expression of the regular expression for the j-th character of the log data;
if there exists a regular expression whose matching value for the first log data is greater than a preset matching value, it is determined that this regular expression can parse the first log data;
and if the matching value of every regular expression for the first log data is not greater than the preset matching value, it is determined that the at least one regular expression cannot parse the first log data.
Following the principle by which a regular expression matches characters, the log data is treated as consisting of m characters and the regular expression used to parse it as consisting of n sub-expressions. The matching value of each regular expression against the log data can then be calculated with the above formula, and comparing the matching value with the preset matching value determines whether the regular expression can parse the log data. This improves the accuracy with which regular expressions are matched to log data.
Based on the log data processing method shown in fig. 1, in the embodiment of the present invention, after determining whether the at least one regular expression can parse the first log data, the method further includes:
if the at least one regular expression cannot parse the first log data, determining a target regular expression according to the first log data;
determining a target formatting rule according to the target regular expression;
acquiring and loading at least one third configuration file; wherein the third configuration file carries the target formatting rule.
In the embodiment of the invention, if the collected log data cannot be parsed, a formatting rule can be written for it, so that the scheme is general and applicable to merging any log event.
As shown in fig. 2, another embodiment of the present invention further provides a method for processing log data, which may include the following steps:
step 201: acquiring and loading the configuration file.
In this step, one configuration file or several configuration files may be loaded at the same time. Depending on the event type, the configuration files can carry different types of merging rules and formatting rules. For example, a configuration file carrying a formatting rule that is loaded into memory mainly contains the value of a regular expression. It is used to extract content from the log and convert its format: the corresponding content is matched against the regular expression, and the matched data is converted into the corresponding format.
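A sketch of step 201 follows. The patent does not name a file format, so an XML document is assumed here: each hypothetical `<format>` element holds one regular expression as text, and each hypothetical `<merge>` element carries the Enable, Length, Time and Type attributes that step 207 describes for merge rules. `load_config` and `MergeRule` are illustrative names; parsing uses Python's standard `xml.etree.ElementTree`.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class MergeRule:
    enable: bool     # Enable="1" means merging is required
    length: int      # merge queue length limit
    time: int        # merge time interval (unit not specified in the source)
    data_type: str   # Type value of the corresponding data


def load_config(xml_text: str) -> Tuple[List[re.Pattern], Dict[str, MergeRule]]:
    """Parse a hypothetical XML config into compiled regexes and merge rules keyed by Type."""
    root = ET.fromstring(xml_text)
    # Each <format> element's text is assumed to hold one regular expression.
    patterns = [re.compile((el.text or "").strip()) for el in root.iter("format")]
    rules: Dict[str, MergeRule] = {}
    for el in root.iter("merge"):
        data_type = el.get("Type", "")
        rules[data_type] = MergeRule(
            enable=el.get("Enable") == "1",
            length=int(el.get("Length", "0")),
            time=int(el.get("Time", "0")),
            data_type=data_type,
        )
    return patterns, rules
```

Compiling the expressions once at load time means a malformed rule fails when the configuration is (re)loaded rather than when a log line arrives.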
Step 202: first log data of a log event to be processed is collected.
In this step, on the one hand, the log data collection module has two working modes: a passive receiving mode and an active query mode. In the passive receiving mode, the collection module maintains a network connection with the agent units it manages, and the agent units collect log information in real time and transmit it to the collection module. In the active query mode, the collection module calls an agent unit to fetch log data only when it needs the data, and then completes further processing of that data. The working mode can therefore be selected as required, avoiding the large amount of memory consumed by real-time collection and improving data processing efficiency.
On the other hand, log data is collected in a distributed manner, mainly in three ways. The first is a dedicated collection agent, which collects data through a software agent installed on the host. The second is log protocol collection, which supports protocols such as HTTP, ICMP, SNMP, UDP, TCP, SYS and SSH, and can therefore support equipment from many different vendors. The third is collection through third-party monitoring probes, compatible with monitoring software such as Ntop, Nmap and collectd. Because multiple collection methods and protocols are supported, log data collection is general and applicable to many types of log data.
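For the log-protocol path in passive receiving mode, the simplest case is a UDP listener to which devices push log lines. The sketch below uses only Python's standard `socket` module; `receive_logs`, the buffer size and the timeout are illustrative choices, and a production syslog collector would typically bind UDP port 514 rather than an ephemeral port.

```python
import socket
from typing import List


def receive_logs(sock: socket.socket, max_messages: int,
                 timeout: float = 1.0) -> List[str]:
    """Passively collect up to max_messages datagrams pushed to a bound UDP socket."""
    sock.settimeout(timeout)
    messages: List[str] = []
    while len(messages) < max_messages:
        try:
            data, _addr = sock.recvfrom(65535)
        except socket.timeout:
            break  # nothing more arrived within the timeout
        messages.append(data.decode("utf-8", errors="replace").rstrip("\n"))
    return messages
```

In use, the collector binds, for example, `("127.0.0.1", 0)` (port 0 lets the OS choose a free port), a device sends datagrams to `collector.getsockname()`, and `receive_logs` returns the decoded lines.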
Step 203: judging whether the regular expression can parse the first log data.
The regex engine determines how a regular expression is matched and how the internal search proceeds. The two mainstream engine types are DFA and NFA engines: a DFA engine is text-directed, taking the characters of the string one by one and comparing them against the regular expression, whereas an NFA engine is regex-directed, taking the elements of the regular expression one by one and searching the string with them. The NFA engine is used as the example here. The string "DEF" contains three characters, D, E and F, and four positions, 0, 1, 2 and 3, which can be written as 0D1E2F3; as far as a regular expression is concerned, every source string has both characters and positions, and the regular expression attempts matches position by position starting at position 0. Consider matching the source string DEF (marked 0D1E2F3) against the regular expression /D\w+F/. First, the element /D/ takes control and starts matching at position 0; it matches "D" successfully and passes control to /\w+/. Since "D" has already been matched by /D/, /\w+/ tries to match starting at position 1. Because \w+ is greedy, it records a backtracking state and by default matches as many characters as possible, so it matches "EF" directly and succeeds, leaving the current position at 3. Control then passes to /F/, which fails at position 3, so \w+ backtracks by one character and the current position becomes 2. Control passes to /F/ again, which matches the character F. Thus \w+ matches only the character E, and the match is complete. The regular expression therefore matches, and can parse, this data.
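The backtracking walk-through above can be reproduced by a hand-written simulation of /D\w+F/ anchored at position 0. This is a teaching sketch, not a general regex engine; `trace_d_w_plus_f` and `is_word_char` are hypothetical names, and `is_word_char` only approximates `\w` for ASCII text. Python's own `re` module is also a backtracking engine, so it is used as a cross-check.

```python
import re
from typing import List, Optional, Tuple


def is_word_char(ch: str) -> bool:
    """Approximates \\w for ASCII text: letters, digits and underscore."""
    return ch.isalnum() or ch == "_"


def trace_d_w_plus_f(s: str) -> Tuple[Optional[str], List[str]]:
    """Simulate a backtracking (NFA-style) match of /D\\w+F/ anchored at position 0.

    Returns the text captured by \\w+ (or None) and a log of the engine's steps.
    """
    steps: List[str] = []
    if not s.startswith("D"):
        steps.append("/D/ fails at position 0")
        return None, steps
    steps.append("/D/ matches 'D'; position 0 -> 1")
    start = end = 1
    while end < len(s) and is_word_char(s[end]):  # greedy: take as much as possible
        end += 1
    if end == start:
        steps.append("/\\w+/ needs at least one word character; match fails")
        return None, steps
    steps.append(f"/\\w+/ greedily matches {s[start:end]!r}; position {start} -> {end}")
    # Hand control to /F/; on failure, \w+ gives back one character and retries.
    for stop in range(end, start, -1):
        if stop < len(s) and s[stop] == "F":
            steps.append(f"/F/ matches at position {stop}; \\w+ keeps {s[start:stop]!r}")
            return s[start:stop], steps
        steps.append(f"/F/ fails at position {stop}; \\w+ backtracks one character")
    return None, steps


if __name__ == "__main__":
    captured, steps = trace_d_w_plus_f("DEF")
    print("\n".join(steps))
    # Python's re module agrees: after backtracking, \w+ keeps only "E".
    print(re.fullmatch(r"D(\w+)F", "DEF").group(1))
```

For "DEF" the simulation logs four steps: /D/ matches, \w+ greedily takes "EF", /F/ fails at position 3, and after one character of backtracking /F/ matches at position 2.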
Further, in the embodiment of the present invention, the log data is treated as consisting of m characters and the regular expression used to parse it as consisting of n sub-expressions. The matching value of the regular expression for the log data is calculated with a formula and compared with a preset matching value to determine whether the regular expression can parse the log data. The formula for the matching value is as follows:
[Formula image not reproduced: expression for the matching value Q]
wherein Q represents the matching value of the regular expression for matching the log data; m represents the first number (of characters); n represents the second number (of sub-expressions); $k_i$ is a scale factor of the i-th sub-expression for matching the log data; $h_i$ is a factor characterizing whether the i-th sub-expression matches the log data; $X_{ij}$ characterizes whether the i-th sub-expression matches the j-th character of the log data; $G_{ij}$ is the weight with which the i-th sub-expression matches the j-th character of the log data; and $F(h_i, X_{ij}, G_{ij})$ represents the matching value of the i-th sub-expression of the regular expression for the j-th character of the log data;
for example, suppose the calculated matching values for parsing the first log data are 0.682 for regular expression A, 0.982 for regular expression B and 0.311 for regular expression C, and the preset matching value is 0.95. Since only B exceeds 0.95, it is determined that regular expression B can parse the first log data.
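The selection in this example can be expressed as a small helper over already-computed matching values. `select_parser` is a hypothetical name, and choosing the highest-scoring expression when several exceed the preset value is an assumption of this sketch; the text only requires that at least one does.

```python
from typing import Dict, Optional


def select_parser(match_values: Dict[str, float], preset: float) -> Optional[str]:
    """Return the name of a regex whose matching value exceeds the preset value.

    If several qualify, the highest-scoring one is chosen (an assumption).
    None means no loaded regular expression can parse the log data.
    """
    qualifying = {name: q for name, q in match_values.items() if q > preset}
    if not qualifying:
        return None
    return max(qualifying, key=qualifying.__getitem__)
```

With the values above, `select_parser({"A": 0.682, "B": 0.982, "C": 0.311}, 0.95)` returns `"B"`; without B it returns `None`, which corresponds to the fallback in step 205.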
Step 204: if the regular expression can parse the first log data, parsing the first log data and generating second log data.
In this step, if the first log data is successfully parsed by a regular expression in the configuration file through the matching process in the above example, the parsed result is taken as the second log data.
Step 205: if no regular expression can parse the first log data, a target regular expression is determined according to the first log data, and a formatting rule is determined from the target regular expression and loaded into memory as a configuration file.
In this step, if no regular expression in the configuration files loaded into memory can parse the first log data, a regular expression can be written for the first log data and reloaded into system memory as a configuration file, so that the next time the first log data is parsed, the reloaded regular expression parses it successfully.
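The fallback in steps 203 to 205 can be sketched as a rule set that is extended at run time: try each loaded expression, and if none parses the line, register a newly written one and retry. For brevity, "parses" here simply means `re.search` finds a match rather than the matching-value test; `RuleSet`, `load_rule` and the patterns in the example are hypothetical.

```python
import re
from typing import Dict, List, Optional


class RuleSet:
    """Hypothetical in-memory store of formatting-rule regexes, extensible at run time."""

    def __init__(self, patterns: List[str]) -> None:
        self._compiled = [re.compile(p) for p in patterns]

    def parse(self, line: str) -> Optional[Dict[str, str]]:
        """Return the named groups of the first rule that matches, else None."""
        for pattern in self._compiled:
            match = pattern.search(line)
            if match:
                return match.groupdict()
        return None  # no loaded rule can parse this line

    def load_rule(self, pattern: str) -> None:
        """Stand-in for loading a new (third) configuration file carrying a target rule."""
        self._compiled.append(re.compile(pattern))
```

A line that fails with the initial rules, for example one without a `proto=` field, parses once a matching expression such as `src="(?P<src>[\d.]+):(?P<src_port>\d+)` has been loaded.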
Step 206: formatting the second log data according to the formatting rule, and generating third log data.
In this step, the parsed second log data is formatted. Specifically, a log event may contain the log receiving time, generation time, duration, user name, source address, source MAC address, source port, destination address, destination MAC address, destination port, operation, log event name, event level, event type, event body, event content and protocol. A regular expression is written to extract selected items of this information in a targeted manner, and the formatting operation is then completed.
For example, the following original message is parsed and converted:
original message:
<188>2018/09/14 14:48:00 USG6600 %%01DDOS/4/FIREWALLATCK(l): AttackType="Large ICMP attack", slot=" ", cpu="0", receive interface="GE1/0/0 ", proto="ICMP", src="192.168.1.194:0 ", dst="192.168.1.223:0 ", begin time="2018-9-14 14:47:40", end time="2018-9-14 14:47:54", total packets="4", max speed="0",User="", Action="discard".
formatting rule (the formatting rule file contains a regular expression):
[Image not reproduced: regular expression of the formatting rule]
the following shows the log data information extracted from the original message in this example by the regular expression:
item["level"]=188
item["date"]=2018/09/14 14:48:00
item["device_name"]=USG6600
item["device_type"]=FW
item["event_type"]=FIREWALLATCK
item["event_stype"]=0
item["message1"]=Large ICMP attack
item["message2"]=192.168.1.194
item["message3"]=0
item["message4"]=192.168.1.223
item["message5"]=0
The extracted fields are then formatted into the following log record:
<1>2020-08-12 14:48:00 DCD_NAME DCD 2020-08-12 14:48:00 ASSET_NAME 192.168.1.2 VENDOR SVR 3 2 1 192.168.1.192 0 192.168.1.223 0 Large ICMP attack
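The formatting rule defines the column layout of the formatted record, and the example shows only part of that layout. The sketch below assumes a hypothetical fixed column order (`OUTPUT_FIELDS`). It illustrates two operations such a rule plausibly performs: normalizing the timestamp, and emitting every column with a placeholder for missing values.

```python
from __future__ import annotations

from datetime import datetime

# Hypothetical column order; the real layout is defined by the formatting rule.
OUTPUT_FIELDS = (
    "date", "device_name", "device_type", "event_type",
    "message2", "message3", "message4", "message5", "message1",
)


def format_item(item: dict[str, str]) -> str:
    normalized = dict(item)
    # Normalize "2018/09/14 14:48:00" to "2018-09-14 14:48:00".
    parsed = datetime.strptime(item["date"], "%Y/%m/%d %H:%M:%S")
    normalized["date"] = parsed.strftime("%Y-%m-%d %H:%M:%S")
    # Missing fields become "-" so every record has the same number of columns;
    # the free-text attack name (message1) is placed last, as in the example record.
    return " ".join(normalized.get(name, "-") for name in OUTPUT_FIELDS)
```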
Step 207: Determine a target merge rule according to the third log data, and merge the third log data according to the target merge rule.
In this step, merge rules are loaded into system memory as configuration files, and several configuration files may be loaded at once. Different data types or event types use different merge rules. The target merge rule for the third log data must therefore first be selected from the rules in memory, and the log event is then merged according to that rule.
A merge rule corresponds to a data structure, and each data type has its own merge rule. A merge rule may have the following attributes:

- Enable: 0 or 1, indicating whether merging is performed
- Length: the maximum length of the merge queue
- Time: the merge time interval
- Type: the type value of the data the rule applies to

For example, the merge rule for type 4582 may be defined as <Enable="1", Length="100", Time="15", Type="4582">.

Each merge rule also contains a merge policy. Its attributes may include an identifier (ID), a merge filter mode, and a merge policy specification. The merge policy in turn contains a field-level merge policy for each field of the data structure, with attributes such as the field name and its merge method. In this way, the applicable merge rule is determined from the log data, and the log data is merged according to that rule.
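The merge-rule header above can be modeled as a small immutable record plus a lookup by data type. This is a sketch under stated assumptions. The attribute syntax follows the `<Enable="1", Length="100", Time="15", Type="4582">` example, and `Time` is assumed to be in minutes. The names `MergeRule`, `parse` and `select_target_rule` are hypothetical. Merge policies and field-level policies are left out.

```python
from __future__ import annotations

import re
from dataclasses import dataclass

# key="value" pairs inside a header such as <Enable="1", Length="100", Time="15", Type="4582">
_ATTR = re.compile(r'(\w+)\s*=\s*"([^"]*)"')


@dataclass(frozen=True)
class MergeRule:
    enabled: bool    # Enable: "1" means data of this type is merged, "0" means it is not
    length: int      # Length: maximum length of the merge queue
    period: int      # Time: merge time interval (minutes assumed)
    data_type: str   # Type: type value of the data the rule applies to

    @classmethod
    def parse(cls, header: str) -> MergeRule:
        attrs = dict(_ATTR.findall(header))
        return cls(
            enabled=attrs.get("Enable", "0") == "1",
            length=int(attrs["Length"]),
            period=int(attrs["Time"]),
            data_type=attrs["Type"],
        )


def select_target_rule(rules: list[MergeRule], data_type: str) -> MergeRule | None:
    # One data type corresponds to one merge rule; a disabled or missing rule means "do not merge".
    for rule in rules:
        if rule.data_type == data_type:
            return rule if rule.enabled else None
    return None
```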
Step 208: Determine whether the log event carrying the first log data occurs for the first time. If so, report and store the merged third log data.
In practice, a log event is not reported every time its log data is merged. Instead, the log events merged over a period of time are reported together. A log event is additionally reported when it occurs for the first time, or for the first time within a given merge period.
Step 209: If not, increment the occurrence count of the log event carrying the first log data.
Log events are merged per merge period. Suppose the merge period of event A is 15 min, and a merge of event A's log data has just completed. If event A is found not to be occurring for the first time within the period, the merge count of event A's log data is recorded, and merging continues with the next log data.
Step 210: Repeat the collection of first log data for the log event to be processed until the merge time of the log event reaches the preset time, and then report the merged third log data.
In this step, if the log event carrying the first log data does not occur for the first time, log data collection is repeated in a loop. The log data of the log event is reported once its merge time reaches the preset time.

For example, suppose that for application A the logins within each 10 min need to be recorded. The merge period can then be set to 10 min, so the system reports the log event every 10 min.

In addition, within a merge period, the system reports and stores a log event when it first occurs. When the merge period expires, the system reports the log event and resets its statistics.
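Steps 208 to 210 amount to a per-event counting window. The following minimal sketch makes three assumptions. First, a single merge period applies to all events. Second, a string key identifies the log event (for example, its type plus source and destination). Third, an in-memory list stands in for "report and/or store". `EventMerger`, `MergeWindow` and their methods are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MergeWindow:
    """Merge state of one log event within the current merge period."""
    started_at: float
    count: int
    last_record: str


class EventMerger:
    """Hypothetical sketch of steps 208-210 with one merge period for all events."""

    def __init__(self, period_seconds: float) -> None:
        self.period = period_seconds
        self.windows: dict[str, MergeWindow] = {}
        # Stand-in for "report and/or store": (event key, merged record, occurrence count).
        self.reported: list[tuple[str, str, int]] = []

    def flush_expired(self, now: float) -> None:
        # Step 210: once an event's merge time reaches the period, report the merged
        # record with its count (including the first occurrence) and reset its statistics.
        for key in list(self.windows):
            window = self.windows[key]
            if now - window.started_at >= self.period:
                self.reported.append((key, window.last_record, window.count))
                del self.windows[key]

    def add(self, key: str, record: str, now: float) -> None:
        self.flush_expired(now)
        window = self.windows.get(key)
        if window is None:
            # Step 208: first occurrence in this period, so report it immediately.
            self.windows[key] = MergeWindow(started_at=now, count=1, last_record=record)
            self.reported.append((key, record, 1))
        else:
            # Step 209: repeat occurrence, so only accumulate the count.
            window.count += 1
            window.last_record = record
```

Take a 600 s period as an example. An event seen at t = 0, 60 and 120 s is reported once immediately. It is reported again with a count of 3 when `flush_expired` runs at t = 600 s. Its next occurrence then opens a new period.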
As shown in FIG. 3 and FIG. 4, an embodiment of the present invention provides a device hosting a log data processing apparatus, together with the log data processing apparatus itself. The apparatus embodiments may be implemented in software, in hardware, or in a combination of both.

At the hardware level, FIG. 3 shows the hardware structure of the device hosting the log data processing apparatus according to an embodiment of the present invention. Besides the processor, memory, network interface, and non-volatile memory shown in FIG. 3, the device may include other hardware, such as a forwarding chip responsible for packet processing.

Taking a software implementation as an example, FIG. 4 shows the apparatus as a logical apparatus. It is formed when the CPU of the host device reads the corresponding computer program instructions from non-volatile memory into memory and runs them. As shown in FIG. 4, the log data processing apparatus provided by an embodiment of the present invention includes a loading module 401, an acquisition module 402, a judgment module 403, a processing module 404, a merging module 405, and a loop execution module 406.
a loading module 401, configured to obtain and load at least one first configuration file and at least one second configuration file, where the first configuration file carries at least one merge rule, the second configuration file carries a formatting rule, and the formatting rule comprises at least one regular expression;
an acquisition module 402, configured to collect first log data to be processed;
a judgment module 403, configured to judge whether at least one regular expression in the second configuration file loaded into memory by the loading module 401 can parse the first log data collected by the acquisition module 402;
a processing module 404, configured to, if the judgment module 403 judges that the at least one regular expression can parse the first log data, parse the first log data collected by the acquisition module 402 to generate second log data, and format the second log data according to the formatting rule to generate third log data;
a merging module 405, configured to determine a target merge rule among the at least one merge rule according to the third log data generated by the processing module 404, and merge the third log data according to the target merge rule; and
a loop execution module 406, configured to perform the following operations:
judging whether the log event carrying the first log data occurs for the first time, and if so, reporting and/or storing the third log data merged by the merging module 405;
if not, incrementing the occurrence count of the log event carrying the first log data; and
causing the acquisition module 402 to repeatedly collect the first log data to be processed until the merge time of the log event reaches the preset time, and then reporting the third log data merged by the merging module 405.
In the log data processing apparatus shown in FIG. 4, the merging module 405 is configured to perform the following operations:
determining attribute information of the third log data according to the regular expression in the second configuration file loaded into memory by the loading module 401; and
merging the third log data according to the target merge rule and the attribute information of the third log data.
In the log data processing apparatus shown in FIG. 4, the processing module 404 is configured to perform the following operations:
if the judgment module 403 judges that the at least one regular expression cannot parse the first log data collected by the acquisition module 402, determining a target regular expression according to that first log data; and
determining a target formatting rule according to the target regular expression, and obtaining and loading at least one third configuration file, where the third configuration file carries the target formatting rule.
An embodiment of the present invention further provides a data processing apparatus, comprising at least one memory and at least one processor, where:
the at least one memory is configured to store a machine-readable program; and
the at least one processor is configured to invoke the machine-readable program to execute the log data processing method according to any embodiment of the present invention.
An embodiment of the present invention further provides a storage medium storing computer instructions. When executed by a processor, these instructions cause the processor to execute the log data processing method according to any embodiment of the present invention. Specifically, a system or apparatus may be provided with a storage medium storing software program code that implements the functions of any of the above embodiments. A computer (or CPU or MPU) of the system or apparatus then reads and executes the program code stored in the storage medium.
In this case, the program code read from the storage medium itself implements the functions of any of the above embodiments, so the program code and the storage medium storing it form part of the present invention.
Examples of the storage medium for supplying the program code include a floppy disk, a hard disk, a magneto-optical disk, an optical disk (e.g., CD-ROM, CD-R, CD-RW, DVD-ROM, DVD-RAM, DVD-RW, DVD+RW), a magnetic tape, a nonvolatile memory card, and a ROM. Alternatively, the program code may be downloaded from a server computer via a communications network.
It should also be clear that the functions of any of the above embodiments can be implemented in two ways. One is for the computer to execute the program code it reads. The other is for an operating system or the like running on the computer to perform part or all of the actual operations based on the instructions of the program code.
In summary, the embodiments of the present invention provide a log data processing method, apparatus, and storage medium with at least the following advantages:
1. In the embodiments of the invention, the written merge rules and formatting rules are stored as configuration files. These configuration files are loaded into memory when the system starts, so that the collected log data can be formatted and merged. Because log data is processed through loaded configuration files, modifying or adding a log data type only requires editing and reloading the configuration file. This avoids the difficulty of making such changes in program code, and so improves the extensibility and maintainability of the system as well as the efficiency of log data processing. In addition, a merge period is set for each log event, and whether the log event is reported and stored depends on whether its merge time has reached that period. Users can therefore set a suitable merge period for each event type instead of reporting and/or storing every merged log record. This improves system efficiency and makes the data easier for users to manage.
2. In the embodiments of the invention, a particular type of log data of a log event may be of interest and need to be merged. In that case, the attribute information in that log data can be targeted: the regular expression extracts and converts it, and the converted log data is merged according to the merge rule.
3. In the embodiments of the invention, log data can be collected either passively, by receiving data, or actively, by polling log events at a specified interval. Users can thus choose the collection mode according to the type or frequency of the events of interest, which avoids degrading system performance through excessive memory use.
4. In the embodiments of the invention, following the principle that a regular expression matches characters, the log data is treated as m characters and the regular expression used to parse it as n sub-expressions. A matching value computed from these determines whether the regular expression can parse the log data.
5. In the embodiments of the invention, if the collected log data cannot be parsed, a formatting rule can be written for it. The scheme is therefore general and applicable to merging any log event.
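Advantage 4 and claim 4 describe a matching value Q computed over the m characters of the log data and the n sub-expressions of a regular expression. The formula itself survives only as an image. The sketch below assumes the form Q = Σ_{i=1..n} k_i · Σ_{j=1..m} F(h_i, X_ij, G_ij), with F(h_i, X_ij, G_ij) = h_i · X_ij · G_ij. Both the outer form and F are assumptions, not the patented formula. Only the decision rule (the log data is parseable if some Q exceeds the preset value) is taken from the text.

```python
from __future__ import annotations

from typing import Sequence


def matching_value(
    k: Sequence[float],
    h: Sequence[int],
    x: Sequence[Sequence[int]],
    g: Sequence[Sequence[float]],
) -> float:
    """Assumed form: Q = sum_i k_i * sum_j F(h_i, X_ij, G_ij), with F = h_i * X_ij * G_ij.

    k[i]    scale factor k_i of sub-expression i (n entries)
    h[i]    1 if sub-expression i matches the log data, else 0
    x[i][j] 1 if sub-expression i matches character j (m columns), else 0
    g[i][j] weight G_ij of sub-expression i on character j
    """
    q = 0.0
    for i, k_i in enumerate(k):
        q += k_i * sum(h[i] * x_ij * g_ij for x_ij, g_ij in zip(x[i], g[i]))
    return q


def can_parse(q_values: Sequence[float], preset: float) -> bool:
    # From claim 4: the log data is parseable if some regular expression's Q
    # is greater than the preset matching value (equality is not enough).
    return any(q > preset for q in q_values)
```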
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for processing log data, comprising the following steps:
acquiring and loading at least one first configuration file and at least one second configuration file; the first configuration file carries at least one merging rule, the second configuration file carries a formatting rule, and the formatting rule comprises at least one regular expression;
collecting first log data of a log event to be processed;
judging whether the at least one regular expression can parse the first log data;
if the at least one regular expression can parse the first log data, parsing the first log data to generate second log data;
formatting the second log data according to the formatting rule to generate third log data;
determining a target merging rule of the at least one merging rule according to the third log data;
merging the third log data according to the target merging rule;
judging whether the log event carrying the first log data occurs for the first time, and if so, reporting and storing the merged third log data;
if not, accumulating the occurrence count of the log event carrying the first log data; and
repeating the collection of first log data of the log event to be processed until the merge time of the log event reaches the preset time, and reporting the merged third log data.
2. The method of claim 1, wherein merging the third log data according to the target merging rule comprises:
determining attribute information of the third log data according to the regular expression;
and merging the third log data according to the target merging rule and the attribute information of the third log data.
3. The method of claim 1, wherein collecting first log data of the log event to be processed comprises:
collecting the log event, and polling the log event at a set time interval until the first log data is obtained by polling.
4. The method of claim 1, wherein the first log data comprises a first number of characters, and the regular expression comprises a second number of sub-expressions;
the judging whether the at least one regular expression can parse the first log data comprises:
for each regular expression, calculating the matching value of the regular expression against the first log data according to the following formula:
[Formula image: matching value Q]
wherein Q represents the matching value of the regular expression against the log data; m represents the first number (of characters); n represents the second number (of sub-expressions); k_i is the scale factor of the i-th sub-expression in matching the log data; h_i is a factor indicating whether the i-th sub-expression matches the log data; X_ij indicates whether the i-th sub-expression matches the j-th character of the log data; G_ij is the weight of the i-th sub-expression in matching the j-th character of the log data; and F(h_i, X_ij, G_ij) represents the matching value of the i-th sub-expression of the regular expression against the j-th character of the log data;
if there exists a regular expression whose matching value against the first log data is greater than a preset matching value, determining that the regular expression can parse the first log data; and
if the matching value of every regular expression against the first log data is not greater than the preset matching value, determining that the at least one regular expression cannot parse the first log data.
5. The method according to any one of claims 1 to 4, further comprising, after judging whether the at least one regular expression can parse the first log data:
if the at least one regular expression cannot parse the first log data, determining a target regular expression according to the first log data;
determining a target formatting rule according to the target regular expression;
acquiring and loading at least one third configuration file; wherein the third configuration file carries the target formatting rule.
6. An apparatus for processing log data, comprising a loading module, an acquisition module, a judgment module, a processing module, a merging module, and a loop execution module, wherein:
the loading module is configured to obtain and load at least one first configuration file and at least one second configuration file, where the first configuration file carries at least one merging rule, the second configuration file carries a formatting rule, and the formatting rule comprises at least one regular expression;
the acquisition module is configured to collect first log data of a log event to be processed;
the judgment module is configured to judge whether the at least one regular expression in the second configuration file loaded into memory by the loading module can parse the first log data collected by the acquisition module;
the processing module is configured to, if the judgment module judges that the at least one regular expression can parse the first log data, parse the first log data collected by the acquisition module to generate second log data, and format the second log data according to the formatting rule to generate third log data;
the merging module is configured to determine a target merging rule among the at least one merging rule according to the third log data generated by the processing module, and merge the third log data according to the target merging rule; and
the loop execution module is configured to perform the following steps:
judging whether the log event carrying the first log data occurs for the first time, and if so, reporting and storing the third log data merged by the merging module;
if not, accumulating the occurrence count of the log event carrying the first log data; and
repeating the collection of first log data of the log event to be processed until the merge time of the log event reaches the preset time, and reporting the merged third log data.
7. The apparatus of claim 6, wherein
the merging module is configured to perform the following operations:
determining attribute information of the third log data according to the regular expression;
and merging the third log data according to the target merging rule and the attribute information of the third log data.
8. The apparatus of claim 6, wherein
the acquisition module is configured to perform the following operations:
collecting the log events;
and polling the log events at a set time interval until the first log data is obtained by polling.
9. An apparatus for processing data, comprising at least one memory and at least one processor, wherein:
the at least one memory is configured to store a machine-readable program; and
the at least one processor is configured to invoke the machine-readable program to perform the method of any one of claims 1 to 5.
10. A computer storage medium, wherein
the storage medium stores computer instructions which, when executed by a processor, cause the processor to perform the method of any one of claims 1 to 5.
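Claims 3 and 8 describe active collection: the log events are polled at a set interval until the first log data is obtained. The following is a minimal sketch that assumes a caller-supplied `fetch` function returning `None` when no data is available. The name `poll_until_data`, the `max_attempts` bound, and the injectable `sleep` (useful for tests) are additions that do not appear in the text.

```python
from __future__ import annotations

import time
from typing import Callable, Optional


def poll_until_data(
    fetch: Callable[[], Optional[str]],
    interval_seconds: float,
    max_attempts: int,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll a log source at a fixed interval until it yields first log data."""
    for attempt in range(max_attempts):
        data = fetch()
        if data is not None:
            return data
        # Wait one interval before the next poll, but not after the last attempt.
        if attempt < max_attempts - 1:
            sleep(interval_seconds)
    return None
```

Polling trades latency for load. A longer interval reduces how often the source is visited, which matches the text's point that users can choose the collection mode according to how often the events of interest occur.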

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010860149.2A CN111741029B (en) 2020-08-25 2020-08-25 Log data processing method, processing device and storage medium

Publications (2)

Publication Number Publication Date
CN111741029A 2020-10-02
CN111741029B 2020-12-04

Family

ID=72658712

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110307488A1 (en) * 2009-02-27 2011-12-15 Mitsubishi Electric Corporation Information processing apparatus, information processing method, and program
CN102457475A (en) * 2010-10-15 2012-05-16 中国人民解放军国防科学技术大学 Integration and conversion system for network security data
CN106341257A (en) * 2016-08-18 2017-01-18 陈琛 Method and tool for customizing log analysis rules and automatically analyzing logs
CN110515695A (en) * 2019-07-26 2019-11-29 济南浪潮数据技术有限公司 A kind of daily record data processing method and system
CN110929896A (en) * 2019-12-04 2020-03-27 全球能源互联网研究院有限公司 Security analysis method and device for system equipment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117009911A (en) * 2023-10-08 2023-11-07 深圳安天网络安全技术有限公司 Abnormality determination method and device for target event, medium and electronic equipment
CN117009911B (en) * 2023-10-08 2023-12-08 深圳安天网络安全技术有限公司 Abnormality determination method and device for target event, medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant