CN115129494A - Event log collection method and system based on Windows kernel - Google Patents

Event log collection method and system based on Windows kernel

Info

Publication number
CN115129494A
Authority
CN
China
Prior art keywords
event
module
filtering
stream
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211061051.6A
Other languages
Chinese (zh)
Other versions
CN115129494B (en)
Inventor
陈铁明
仇学博
宋琪杰
朱添田
吕明琪
路晓明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT
Priority to CN202211061051.6A
Publication of CN115129494A
Application granted
Publication of CN115129494B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/542 Event management; Broadcasting; Multicasting; Notifications
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/9035 Filtering based on additional data, e.g. user or group profiles
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/448 Execution paradigms, e.g. implementations of programming paradigms
    • G06F9/4488 Object-oriented
    • G06F9/449 Object-oriented method invocation or resolution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses an event log collection method and system based on the Windows kernel. The collection method comprises the following steps: initializing; performing basic configuration of the ETW framework and acquiring Windows native event information as an original event stream; performing preliminary event filtering on the original event stream to obtain a valid original event stream; performing multi-threaded concurrent event parsing, event filtering and semantic correction on the valid original event stream to obtain semantically corrected event object instances; and outputting the semantically corrected event object instances to complete collection. The invention uses a custom filtering mechanism and a self-developed event parsing method based on attribute offsets, combined with multi-threaded concurrency, to parse, semantically fill and semantically correct event content, thereby achieving efficient parsing and processing of Windows kernel events and ensuring both the timeliness and the integrity of the event log.

Description

Event log collection method and system based on Windows kernel
Technical Field
The invention relates to the field of security research based on kernel log analysis, and in particular to an event log collection method and system based on the Windows kernel.
Background
System kernel events, as one class of security element, are receiving growing attention and use in cloud-network-endpoint collaborative integrated security. However, because event log data is massive and heterogeneous, no suitable tool is available for collecting and analyzing it. Trustwave research data shows that the raw data collected on a single client can reach 5 GB/day, and the raw data collected on an ordinary office network of thousands of people can reach 15 TB/day. Taking advanced persistent threat (APT) attack detection as a scenario, the average latency from initial intrusion to impact is 83 days, and the event log data generated in an enterprise by one attack can reach the PB (petabyte) level.
Most current event log collection is still oriented to the user layer: it records the behavior and intent of software by collecting API call data, but suffers from unclear semantics, exposure to malicious tampering and similar defects. Kernel-layer data records fine-grained low-level behavior by collecting system calls and their parameters, reflects the behavior of users and software more faithfully, and is more credible because it is generated by the kernel. ETW (Event Tracing for Windows) is the native event tracing and logging system provided by Windows, but the event logs it produces are large in volume and have missing or unclear semantics, so they cannot be analyzed and applied directly. It is therefore necessary to filter and parse ETW-generated events into a semantically rich, analyzable data set that provides data source support for upper-layer malicious behavior detection, behavior analysis and the like.
Disclosure of Invention
To address the poor semantic readability and missing semantics of Windows native kernel events, as well as the event loss caused by high event volume, the invention provides an event log collection method and system based on the Windows kernel that can efficiently collect, filter and process kernel events, keeping event semantics highly complete while achieving high-performance collection.
The method supplements and corrects, at the semantic level, the native event information collected by the Windows ETW (Event Tracing for Windows) framework; it parses and processes events efficiently through multi-threaded concurrency, so that the events correctly reflect the behavior of upper-layer software; and it uses a filtering mechanism based on blacklists and whitelists to solve the problem of losing important events.
An event log collection method based on a Windows kernel comprises the following steps:
1) initializing;
2) carrying out basic configuration on the ETW framework to obtain Windows native event information;
3) taking the Windows native event information as an original event stream, and performing preliminary event filtering on the original event stream to obtain a valid original event stream;
4) performing multi-threaded concurrent event parsing, event filtering and semantic correction on the valid original event stream to obtain semantically corrected event object instances;
5) outputting the semantically corrected event object instances to complete collection.
The invention uses a custom filtering mechanism and a self-developed event parsing method based on attribute offsets, combined with multi-threaded concurrency, to parse, semantically fill and semantically correct event content, thereby achieving efficient parsing and processing of Windows kernel events and ensuring both the timeliness and the integrity of the event log.
In step 1), the initialization includes: reading the event structure file to initialize event structures, reading the filtering configuration file to initialize filtering information, reading the initialization module file to initialize module information, initializing the directed output object module, and initializing a thread pool to dynamically schedule event parsing tasks.
Event structure file: the event structure file contains information on each type of event, including the service provider identifier and operation code fields of the event, and the offset and size of each attribute value that the event carries in the event stream.
Filtering configuration file: the filtering configuration file contains several kinds of filtering information, including a process number blacklist set, a process number whitelist set, a blacklist set of event type identifiers and a blacklist set of module paths.
Initialization module file: the initialization module file defines the module paths to be loaded when the collector starts; the loaded modules are parsed and the function information of each module is extracted.
In step 3), performing preliminary event filtering on the original event stream specifically includes:
3.1) parsing the process number information of the original event stream, extracting the process number, and filtering the event a first time by checking whether the process number is in the blacklisted process set;
3.2) constructing an event identifier object by parsing the service provider identifier (ProviderID) and operation code (OpCode) fields of the event stream, and then looking up that object in the event type blacklist to filter the event once more.
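The two checks in steps 3.1) and 3.2) can be sketched as two set lookups. This is a minimal illustration, not the patented implementation: the class names, the field names and the provider string are hypothetical, and the event header is assumed to have been decoded already into a process number, a provider identifier and an operation code.

```python
from dataclasses import dataclass
from typing import FrozenSet, NamedTuple


class EventIdentifier(NamedTuple):
    """Event type key built from the provider ID and the operation code."""
    provider_id: str
    opcode: int


@dataclass(frozen=True)
class FirstLayerFilter:
    pid_blacklist: FrozenSet[int]
    event_type_blacklist: FrozenSet[EventIdentifier]

    def accepts(self, pid: int, provider_id: str, opcode: int) -> bool:
        # Step 3.1: drop events raised by blacklisted processes.
        if pid in self.pid_blacklist:
            return False
        # Step 3.2: build the identifier and drop blacklisted event types.
        return EventIdentifier(provider_id, opcode) not in self.event_type_blacklist


# Hypothetical configuration values, for illustration only.
first_layer = FirstLayerFilter(
    pid_blacklist=frozenset({4}),
    event_type_blacklist=frozenset({EventIdentifier("hypothetical-provider", 10)}),
)
```

Because both blacklists are hash sets, each check costs constant time on average, which matters when the filter runs inside the callback that receives every native event.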
In step 4), event parsing specifically includes:
4.1.1) using the initialized event structure file to extract binary data from the valid original event stream and convert it into the corresponding event attribute information, obtaining a binary event attribute stream;
4.1.2) extracting the event identifier object from the valid original event stream, obtaining the corresponding attribute offset set from the event structure mapping table according to the event identifier object, and traversing the attribute offset set to parse the binary event attribute stream;
4.1.3) obtaining an event object instance once the traversal loop is complete.
In step 4), event filtering specifically includes:
4.2.1) obtaining the latest process number from the event object instance; if the latest process number is in the filtered process set, the event object instance is filtered out; otherwise proceeding to step 4.2.2);
4.2.2) extracting the event identifier of the event object instance; if the event is an Image event, monitoring only the module loading information and function call information required by the user by means of a customized whitelist, and further filtering by the module paths in the blacklist; if the event is not an Image event, passing the valid event object instance on.
In step 4), the semantic correction specifically includes:
performing semantic filling and correction on each object of the valid event object instances in sequence. Semantic filling supplies attributes that are missing from some valid event object instances; for example, some File event instances have no FileName attribute and need to be semantically filled.
Semantic correction fixes default attribute values in some valid event object instances; for example, some File event instances set the thread number attribute to the default value "0xffffffff", which cannot reflect the thread that actually performed the operation.
The objects are processed in the following order: File objects, Process objects, CallStack objects, Image objects and SystemCall objects.
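Correcting a default thread number can be sketched as a lookup in a table kept up to date by context-switch events, which is the mapping the detailed description names for TCP/IP events (processor number to thread number, taken from CSwitch events). The class, method and field names below are hypothetical, and the thread-to-process table is assumed to be filled from thread start events.

```python
DEFAULT_THREAD_ID = 0xFFFFFFFF


class ThreadTracker:
    """Tracks which thread runs on each processor (hypothetical helper)."""

    def __init__(self):
        self._running_on = {}  # processor number -> thread ID now running
        self._owner_of = {}    # thread ID -> owning process ID

    def on_thread_start(self, thread_id, process_id):
        self._owner_of[thread_id] = process_id

    def on_cswitch(self, processor, new_thread_id):
        # A context switch says which thread now runs on this processor.
        self._running_on[processor] = new_thread_id

    def correct(self, event):
        """Replace a default thread ID with the thread running on the event's processor."""
        if event.get("thread_id") != DEFAULT_THREAD_ID:
            return event
        thread_id = self._running_on.get(event["processor"])
        if thread_id is None:
            return event  # nothing known yet; leave the event unchanged
        fixed = dict(event, thread_id=thread_id)
        if thread_id in self._owner_of:
            fixed["process_id"] = self._owner_of[thread_id]
        return fixed


tracker = ThreadTracker()
tracker.on_thread_start(77, 1234)
tracker.on_cswitch(2, 77)
corrected = tracker.correct(
    {"processor": 2, "thread_id": DEFAULT_THREAD_ID, "process_id": DEFAULT_THREAD_ID}
)
```

The sketch returns a new dictionary rather than mutating the input, so an event that cannot be corrected passes through untouched.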
Specifically, the event log collection method based on the Windows kernel comprises the following steps:
Step one: initialization stage. A filtering configuration file and an event structure file are prepared. The filtering file is used for event filtering after preliminary event parsing and in the semantic correction stage, and the event structure file supports the event parsing method; a thread pool is initialized to dynamically schedule and concurrently process event parsing tasks; an output object is initialized to direct the collected content to its destination;
Step two: performing basic configuration of ETW so that Windows native log information can be acquired;
Step three: parsing the native event information and performing preliminary event filtering; converting the unfiltered part of the original binary event stream into attribute values with clearer semantics; and adding operation source information to each type of event to further reflect the relationships between objects;
Step four: while collecting and parsing data, performing attribute mapping on some events and updating the data structures maintained in memory, so as to reflect changes in the operating system's runtime environment in real time;
Step five: filtering and compressing events through the blacklist and whitelist mechanism configured in the filtering file. Filtering and compression happen in two places: the first filtering is in the callback function that receives the native event stream, and the second is in the custom event parsing function;
Step six: semantic correction stage. The native event stream leaves some key fields at default values, which need semantic conversion; some event information is missing semantics, which need to be filled in;
Step seven: event output stage. Events can be written to a file or transmitted to another host through a socket. The output is defined as a JSON string with a defined event format standard, in which each type of event has public attributes and private semantic information.
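The output format in step seven can be sketched as one JSON string per event, with the public attributes at the top level and the type-specific attributes nested underneath. The field names and the "properties" key are hypothetical; the text does not give the actual event format standard.

```python
import io
import json

# Hypothetical names for the public attributes shared by every event type.
PUBLIC_FIELDS = ("event_type", "timestamp", "process_id", "thread_id")


def to_json_line(event: dict) -> str:
    """Serialize one event: public attributes on top, private ones nested."""
    public = {name: event[name] for name in PUBLIC_FIELDS}
    private = {k: v for k, v in event.items() if k not in PUBLIC_FIELDS}
    return json.dumps({**public, "properties": private}, ensure_ascii=False)


def emit(event: dict, sink) -> None:
    """Write one JSON string per line to any object with a write() method."""
    sink.write(to_json_line(event) + "\n")


# StringIO stands in for an open file or for socket.makefile("w").
sink = io.StringIO()
emit(
    {"event_type": "FileRead", "timestamp": 1, "process_id": 10,
     "thread_id": 20, "FileName": "a.txt"},
    sink,
)
```

Writing through a generic `write()` interface lets the same code serve both destinations the step mentions, since a file object and a socket's text wrapper expose the same method.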
An event log collection system based on a Windows kernel, comprising:
a collector initialization module for reading files;
an ETW configuration module for acquiring Windows native event information;
a first-layer filtering module for preliminarily filtering the original event stream output by the ETW configuration module;
an event parsing module for parsing the valid original event stream output by the first-layer filtering module;
a second-layer filtering module for filtering the event object instances output by the event parsing module;
an event semantic correction module for performing semantic correction on the valid event object instances output by the second-layer filtering module;
and an event output module for outputting the semantically corrected event object instances produced by the event semantic correction module.
The files read by the collector initialization module comprise: the event structure file, the filtering configuration file and the initialization module file.
The beneficial effects of the invention are as follows: (1) compared with current mainstream event collection methods, the method collects event information efficiently through a multi-threaded concurrent processing mechanism, guaranteeing the integrity of event records as well as the real-time availability of the data; (2) the method provides richer semantic information and repairs missing or default semantics in some event records, which gives it practical application value; (3) call stack information and system call addresses are converted into the specific calling module and function name, making event semantics clearer; (4) compared with the TDH (Trace Data Helper) parsing library provided with ETW, the method has greatly improved performance and a greatly reduced event loss rate; (5) through a configurable filtering mechanism based on process, event type and module, a user can modify the filtering configuration file to meet customized filtering requirements. This mechanism filters out useless event information and further reduces event loss.
Drawings
Fig. 1 is the overall workflow diagram of the collector.
Fig. 2 is the workflow diagram of collector initialization.
Fig. 3 is the workflow diagram of the first-layer filtering module.
Fig. 4 is a diagram of the event parsing module.
Fig. 5 is a diagram of the second-layer filtering module.
Fig. 6 is a diagram of the event semantic correction module.
Fig. 7 is a diagram of the event output module.
Fig. 8 is a graph of memory usage under intensive operation.
Fig. 9 is a graph of CPU usage under intensive operation.
Fig. 10 is a graph of memory usage during daily operation.
Fig. 11 is a graph of CPU usage during daily operation.
Detailed Description
The invention is further described below with reference to examples. The following examples are set forth merely to aid in understanding the invention. It should be noted that, for those skilled in the art, it is possible to make various improvements and modifications to the present invention without departing from the principle of the present invention, and those improvements and modifications also fall within the scope of the claims of the present invention.
As shown in Fig. 1, the invention comprises the following modules: a collector initialization module, an ETW configuration module, a first-layer filtering module, an event parsing module, a second-layer filtering module, an event semantic correction module and an output module. The collector initialization module reads configuration files and pre-stores their information to support the later parsing modules; the ETW configuration module adjusts the parameters of the four parts of the ETW framework (controller, session, consumer and provider); the first-layer filtering module preliminarily filters the original data stream by process number and event type; the event parsing module parses the event stream that passes the first-layer filtering module to obtain the private semantic information of each event; the second-layer filtering module filters the parsed event objects, mainly by process number and module name; and the semantic correction module repairs missing or default semantics in some events. For example, some File events have no FileName attribute and need to be semantically filled.
Step one: the collector initialization module prepares a filtering configuration file and an event structure file.
Furthermore, the collector initialization module in Fig. 2 shows the tasks of the initialization phase:
1.1 first, the pre-written event structure file is read and stored in a hash table in the form <EventIdentifier (event identifier), offset pair (attribute offsets)>, to support subsequent event parsing.
1.2 the filtering configuration file is read. It contains four kinds of filtering information: the process blacklist, the event type blacklist, the module name blacklist and the module name whitelist. Pre-storing these four kinds of information supports the work of the two filtering modules that follow.
1.3 the configuration file containing the list of dll module paths is read. The file contains all dll module paths under the system folders 'C:\System32' and 'C:\Windows\SysWOW64', plus the paths of some key dll modules loaded when software starts. Each module is parsed as its path is read; once all function information in the module has been parsed, it is saved together with the module's base address and size. This information is read later during CallStack parsing and SystemCall parsing, which avoids parsing module files during event parsing, improves parsing efficiency and reduces latency.
1.4 the user can specify the output destination and the types of event to listen to. For example, output can go to a file or to a destination host through a socket; the collector can listen to a combination of file-type and process-type events, or to all event types.
1.5 in the thread pool initialization stage, 4 to 8 worker threads and a task queue with a capacity of at least 50000 are started, improving collector performance by parsing event information concurrently. A main thread is kept in the callback function that receives the native event stream to process events. When the multi-threading mechanism is triggered, some of the event types received in the main thread are sent to the task queue, and worker threads in the thread pool take tasks from the queue and parse the events. Processing tasks concurrently greatly reduces event loss while keeping log output real-time, so the method can be used to collect logs in experimental scenarios with strict real-time requirements.
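The structure described in 1.5, a receiving thread that enqueues raw events and a fixed set of workers that dequeue and parse them, can be sketched with the standard library. The function and parameter names are hypothetical, and the sketch shows only the queueing pattern: CPython threads do not run CPU-bound parsing in parallel, so it says nothing about the performance of the actual collector.

```python
import queue
import threading


def run_pool(raw_events, parse, workers=4, queue_size=50000):
    """Parse raw_events concurrently; returns the parsed results in arbitrary order."""
    tasks = queue.Queue(maxsize=queue_size)  # bounded task queue
    results = []
    results_lock = threading.Lock()
    stop = object()  # sentinel telling a worker to exit

    def worker():
        while True:
            item = tasks.get()
            if item is stop:
                break
            parsed = parse(item)
            with results_lock:
                results.append(parsed)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    # The caller plays the role of the callback that receives native events.
    for event in raw_events:
        tasks.put(event)  # blocks if the queue is full, instead of dropping
    for _ in threads:
        tasks.put(stop)
    for thread in threads:
        thread.join()
    return results


parsed = run_pool(range(100), lambda value: value * 2)
```

A blocking `put` is one way to handle a full queue; whether the actual collector blocks, drops or grows the queue is not stated in the text.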
Step two: performing basic configuration of ETW so that Windows native event stream information can be acquired.
Step three: parsing the native event stream information and performing preliminary event filtering. The unfiltered original binary event stream information is then converted into attribute values with clearer semantics. In addition, to further reflect the relationships between objects, operation source information is added to each type of event so that contextual semantic association can be performed.
Further, the first-layer filtering module and event parsing module shown in Figs. 3 and 4 follow these basic steps:
3.1 first, preliminary filtering is performed following the flow of Fig. 3. The first-layer filtering module preliminarily filters the original event stream information. It first parses the process number information of the event stream and filters the event by checking whether the process number is in the blacklisted process set; it then constructs an event identifier object by parsing the ProviderID and OpCode fields of the event stream, and looks up that object in the event type blacklist to filter the event once more.
3.2 binary data is extracted from the native event stream. The binary data stream is not human-readable, so it needs to be converted into the corresponding event attribute information.
3.3 while parsing process events, the parent process and current process data of each process are stored, and the process-module mapping table is updated to record the module files loaded by the corresponding process.
3.4 when an Image-related event is processed, the process-module mapping table and the module-function mapping table are updated to reflect in real time the functions in the modules that each process may currently have loaded, so that the virtual address information in subsequent CallStack events can be resolved correctly.
3.5 operation source information is added to each type of event. For file operation events, for example, it records which process issued the operation; for other event types, the corresponding parent process information is filled in, further enriching the event semantics. The parent process information is filled in from the relation mapping table recorded while processing process events.
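Steps 3.3 and 3.5 together amount to maintaining a process-to-parent table while process events arrive and consulting it when other events are enriched. A minimal sketch follows; the class, method and field names are hypothetical.

```python
class ProcessTable:
    """Relation mapping table recorded while processing process events."""

    def __init__(self):
        self._parent_of = {}  # process ID -> parent process ID

    def on_process_start(self, pid, parent_pid):
        self._parent_of[pid] = parent_pid

    def on_process_end(self, pid):
        # Drop stale entries so a reused process ID is not misattributed.
        self._parent_of.pop(pid, None)

    def enrich(self, event):
        """Return a copy of the event with parent process information added."""
        enriched = dict(event)
        enriched["parent_process_id"] = self._parent_of.get(event["process_id"])
        return enriched


table = ProcessTable()
table.on_process_start(pid=200, parent_pid=100)
enriched = table.enrich({"event_type": "FileRead", "process_id": 200})
```

An event from a process the table has never seen gets `None` as its parent, which keeps the gap visible instead of guessing.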
For the parsing of the native binary data stream in 3.2, the method further abandons the TDH (Trace Data Helper) library provided by Windows for parsing stream data, and instead parses the data with a self-developed method based on attribute type offsets.
3.2.1 using the prepared event structure file, the file structure content is loaded into memory in the initialization phase, so that event content can be parsed according to the offset of each attribute to form an event object instance. The file structure content is the event structure mapping table shown in Fig. 4, which stores several structures of the form <key: EventIdentifier, value: <pair<property1, offset1> … pair<propertyN, offsetN>>>. The key is an EventIdentifier, i.e. an event identifier object, and the value is a set of pairs, each consisting of an attribute of the object and that attribute's offset. For example, the structure information of the RegistryCreate event is: <key: RegistrycCreateDetector, value: <pair<InitialTime, 8> … pair<KeyHandle, 4>>>.
3.2.2 the event identifier is extracted from the native event stream, and the corresponding attribute offset set is obtained from the event structure mapping table according to the identifier.
3.2.3 the attribute offset set is traversed to parse the data stream. If the native data stream of a registry event contains 34 bits of binary attribute data, then according to the attribute offset set the first 8 bits are parsed as PULONG data, yielding the InitialTime field value, i.e. the time of the registry operation; the data stream is then shifted left by the number of bits just parsed, and parsing of the next attribute continues according to the attribute offset set. Next, 4 bits of binary data are parsed into a PUSHORT variable whose value corresponds to the Status attribute, i.e. the result status of the registry operation.
3.2.4 the traversal loops until the whole binary data stream has been consumed. At this point a complete event instance has been obtained.
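The traversal in 3.2.1 to 3.2.4 can be sketched as a cursor that walks a payload according to a per-event table of attribute sizes. This is an illustration under stated assumptions, not the patented parser: sizes are treated as byte counts, values are decoded as little-endian unsigned integers, and the table below is a hypothetical three-attribute layout that reuses attribute names from the text rather than the real RegistryCreate structure.

```python
# Hypothetical event structure mapping table: identifier -> (attribute, size in bytes).
EVENT_STRUCTURES = {
    "RegistryCreate": [("InitialTime", 8), ("Status", 4), ("KeyHandle", 4)],
}


def parse_event(identifier, payload):
    """Walk the payload attribute by attribute and build an event instance."""
    event = {}
    cursor = 0  # read position, advanced by each attribute's size
    for name, size in EVENT_STRUCTURES[identifier]:
        chunk = payload[cursor:cursor + size]
        if len(chunk) < size:
            raise ValueError(f"payload too short for attribute {name}")
        event[name] = int.from_bytes(chunk, "little")
        cursor += size
    return event


payload = (
    (123).to_bytes(8, "little")
    + (0).to_bytes(4, "little")
    + (0x1F4).to_bytes(4, "little")
)
event = parse_event("RegistryCreate", payload)
```

Because the layout is looked up once per event and each attribute is a fixed-size slice, the loop does no schema discovery at parse time, which is the property the offset-based approach relies on.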
By comparison, the native TDH library requires multiple function calls per event and incurs high performance overhead, so it cannot meet the efficiency requirements of kernel event collection.
Step four: while collecting and parsing data, attribute mapping is performed on some events and the data structures maintained in memory are updated, so that changes in the operating system's runtime environment are reflected in real time.
Further, this step comprises:
4.1 extracting the event identifier object and dispatching the event object to the matching event processing function according to the identifier. Different types of events require the corresponding data structures to be updated and maintained.
4.2 mapping the event attributes. For example, if the event is a process or thread event, the state information of the current process and thread is stored and the maintained data structure is updated; if an Unload-related event is parsed, the stale data needs to be deleted.
4.3 each process maintains the minimum and maximum address of the modules it has loaded. This helps filter the address information in CallStack events, since only the modules and function information required by the user need to be resolved. The minimum and maximum addresses of the modules loaded by the process therefore need to be updated whenever an Image event is processed.
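The per-process address bounds in 4.3 can be sketched as a single interval per process that widens on every module load and serves as a cheap pre-check before any detailed address resolution. The class and method names are hypothetical, and shrinking the interval on unload is left out of the sketch.

```python
class AddressBounds:
    """Lowest and highest module address known for each process."""

    def __init__(self):
        self._bounds = {}  # process ID -> (min address, max address, exclusive)

    def on_image_load(self, pid, base, size):
        low, high = self._bounds.get(pid, (base, base + size))
        self._bounds[pid] = (min(low, base), max(high, base + size))

    def may_resolve(self, pid, address):
        """False means the address cannot belong to any module loaded by pid."""
        if pid not in self._bounds:
            return False
        low, high = self._bounds[pid]
        return low <= address < high


bounds = AddressBounds()
bounds.on_image_load(pid=10, base=0x1000, size=0x1000)
bounds.on_image_load(pid=10, base=0x8000, size=0x2000)
```

The interval is only a coarse filter: an address between two modules passes the check and still has to be rejected by the detailed lookup.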
Step five: filtering and compressing events. Filtering is implemented through a custom blacklist and whitelist mechanism that can be configured in the filtering file. Filtering and compression happen in two places: first, a preliminary filtering of events in the callback function that receives the native event stream, and second, a filtering in the custom event parsing function.
The first layer of filtering is applied in step three.
Fig. 5 illustrates the specific tasks of the second-layer filtering module:
5.1 first, the module filters events by the latest process number. It differs from the first-layer filtering module in that the first layer extracts the process number from the event header, where some events carry a default process number (i.e. a process ID value of "0xffffffff"); the event parsing stage corrects such events using process number information that may be present in the parsed attributes. The event is then filtered according to the updated process number.
5.2 Image events are filtered. A customized whitelist makes it possible to monitor only the module loading information and function call information required by the user, and files are further filtered by the module paths in the blacklist, so the method can be applied to log collection for specific viruses.
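The second-layer decision in 5.1 and 5.2 can be sketched as one function over a parsed event. The text does not say how the whitelist and blacklist interact, so the sketch assumes the blacklist takes precedence and that an empty whitelist means no restriction; the field names and paths are hypothetical.

```python
def second_layer_accepts(event, pid_blacklist, module_whitelist, module_blacklist):
    """Decide whether a parsed event survives second-layer filtering."""
    # 5.1: filter on the process number as corrected during parsing.
    if event["process_id"] in pid_blacklist:
        return False
    # Non-Image events pass straight through.
    if event["event_type"] != "Image":
        return True
    # 5.2: Image events are checked against the module lists.
    # Windows paths are case-insensitive, so compare in lower case.
    path = event["ImageFileName"].lower()
    if path in module_blacklist:
        return False
    if module_whitelist and path not in module_whitelist:
        return False
    return True


# Hypothetical configuration; list entries are stored in lower case.
PID_BLACKLIST = {4}
MODULE_WHITELIST = {r"c:\demo\wanted.dll"}
MODULE_BLACKLIST = {r"c:\demo\noisy.dll"}
```

For example, an Image event for `C:\Demo\Wanted.dll` from process 10 is kept, while the same event for `C:\Demo\Other.dll` is dropped because a whitelist is configured and does not contain it.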
Step six: in the semantic correction stage, the native event stream information is subjected to default reservation on some key fields and needs to be subjected to semantic conversion; and the semantic missing phenomenon of part of event information needs to be filled with semantics.
The semantic correction module shown in FIG. 6 performs the following specific steps: 6.1 Semantic filling and correction are performed to different degrees for each event type. For example, some File-related events need part of their semantics filled in because they lack the FileName attribute: a file read event does not carry the name of the file being read, so semantic filling must be performed on it. Native TCP/IP events set the thread ID to the default value "0xffffffff", which does not reflect the thread that actually performed the operation. The thread ID is corrected by combining the mapping between processor number and thread number in CSwitch events, so that the thread and host process corresponding to the network event are reflected accurately.
6.2 The information that ETW natively provides for a CallStack event is virtual memory addresses, which do not explicitly show which function of which module file was actually called. The virtual addresses must be resolved using the previously saved module and function information. For example, the function virtual address "0x41132FA2" corresponds to the module function "C:\Windows\System32\KernelBase.dll:RegOpenKeyExInternalW".
6.3 When an Image event is parsed, the mapping tables are updated and the corresponding module file is loaded.
An Image event contains the field ImageFileName, which gives the path of the module currently being loaded or unloaded. The process-module mapping data structure is updated according to the process number that generated the event and the module path, and the minimum and maximum addresses held by the process are updated. The module content is then parsed, and the function information in the module is obtained and stored; this parsed information is used in the subsequent CallStack event resolution stage.
6.4 For SystemCall events, two kernel files of the operating system need to be loaded: sys and ntkrnolos, which are used to obtain commonly used kernel function information. The virtual address mapping of SystemCall differs from CallStack resolution: the virtual address can be mapped directly to a function address in the module.
6.5 Some events do not need to be resolved and can be passed directly to the output module with default processing.
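Two of the corrections above lend themselves to a short sketch: replacing a default thread ID using the processor-to-thread mapping learned from CSwitch events, and resolving a CallStack virtual address to a "module:function" label with a binary search over stored function ranges. The class `SemanticCorrector`, the field names `ThreadId` and `Processor`, and the assumption that function ranges do not overlap are all hypothetical and are not taken from the patent.

```python
import bisect
from typing import Dict, List, Optional, Tuple

UNKNOWN_TID = 0xFFFFFFFF  # default thread ID found in native TCP/IP events


class SemanticCorrector:
    """Illustrative holder for the mappings used during semantic correction."""

    def __init__(self) -> None:
        self.cpu_to_tid: Dict[int, int] = {}  # processor -> running thread
        self.starts: List[int] = []           # sorted function start addresses
        self.symbols: List[Tuple[int, int, str]] = []  # parallel to starts

    def on_cswitch(self, processor: int, new_tid: int) -> None:
        """Remember which thread a context switch scheduled on a processor."""
        self.cpu_to_tid[processor] = new_tid

    def correct_thread_id(self, event: dict) -> dict:
        """Replace a default thread ID with the thread running on that CPU."""
        if event.get("ThreadId") == UNKNOWN_TID:
            event["ThreadId"] = self.cpu_to_tid.get(event["Processor"], UNKNOWN_TID)
        return event

    def add_function(self, start: int, size: int, module: str, name: str) -> None:
        """Store one function range, keeping the start list sorted."""
        i = bisect.bisect_left(self.starts, start)
        self.starts.insert(i, start)
        self.symbols.insert(i, (start, start + size, f"{module}:{name}"))

    def resolve(self, addr: int) -> Optional[str]:
        """Map a virtual address to 'module:function', or None if unknown."""
        i = bisect.bisect_right(self.starts, addr) - 1
        if i >= 0:
            _, end, label = self.symbols[i]
            if addr < end:
                return label
        return None
```

A sorted list with `bisect` keeps each lookup logarithmic in the number of known functions, which matters because every CallStack event may carry many addresses.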
After filtering, only valid event information remains. The output is defined as a JSON string with a defined event format standard, in which each type of event has common attributes and private semantic information. To allow the generated event information to be reused, three output modes are defined: the first outputs to the command line, the second outputs to a specified file path, and the third sends the event content to a specified host through Socket communication.
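One way to picture the output format and the three output modes is the sketch below. The JSON key names (`Event`, `PID`, `TID`, `TimeStamp`, `arguments`), the mode names, and the helper functions are assumptions made for illustration; the patent only states that each event has common attributes plus private semantic information.

```python
import json
import socket
from typing import Callable, Optional


def to_json(event_type: str, pid: int, tid: int, timestamp: int,
            private: dict) -> str:
    """Serialize one event: common attributes plus type-specific fields."""
    return json.dumps({
        "Event": event_type,   # common attributes shared by every event
        "PID": pid,
        "TID": tid,
        "TimeStamp": timestamp,
        "arguments": private,  # private semantic information of this type
    }, ensure_ascii=False)


def make_writer(mode: str, target: Optional[object] = None) -> Callable[[str], None]:
    """Build the output callable once, at start-up, from the chosen mode."""
    if mode == "console":
        return lambda line: print(line)
    if mode == "file":
        def write_file(line: str) -> None:
            with open(target, "a", encoding="utf-8") as f:
                f.write(line + "\n")
        return write_file
    if mode == "socket":
        host, port = target
        def send(line: str) -> None:
            # One connection per line keeps the sketch short, not efficient.
            with socket.create_connection((host, port)) as conn:
                conn.sendall((line + "\n").encode("utf-8"))
        return send
    raise ValueError(f"unknown output mode: {mode}")
```

For example, `make_writer("file", "events.json")(to_json("FileRead", 1234, 5678, 0, {"FileName": "C:\\example.txt"}))` appends one JSON line to the file.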
The event output module shown in FIG. 7 performs the following specific tasks: 7.1 The module processes the event information in the output queue. The output mode may be Socket communication or file writing, and is determined by the command-line options and parameters that the user supplies when running the collector. The output object is constructed in the initialization phase according to the output options given when the program starts.
7.2 After an event is parsed, the worker threads in the thread pool pass the event data to the output queue. Because events are generated at a high rate, a dedicated output thread is started to process the events in the output queue, and to minimize performance overhead a condition variable is used instead of a spin lock as the synchronization mechanism.
7.3 When the output queue length reaches the output threshold, the waiting output thread is woken up; it empties the output queue and transfers the completed data to the destination.
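The queue behaviour in 7.1 to 7.3 can be sketched with a condition variable as follows. The class name `OutputQueue`, the `sink` callable, and the `close` method that flushes the remainder on shutdown are assumptions added for illustration and are not described in the patent.

```python
import threading
from typing import Callable, List


class OutputQueue:
    """Batching output queue drained by one thread (illustrative sketch)."""

    def __init__(self, threshold: int, sink: Callable[[str], None]) -> None:
        self._cond = threading.Condition()
        self._items: List[str] = []
        self._threshold = threshold
        self._sink = sink
        self._closed = False
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def put(self, item: str) -> None:
        """Called by worker threads after an event has been parsed."""
        with self._cond:
            self._items.append(item)
            # Wake the output thread only once the threshold is reached.
            if len(self._items) >= self._threshold:
                self._cond.notify()

    def close(self) -> None:
        """Flush whatever is left and stop the output thread."""
        with self._cond:
            self._closed = True
            self._cond.notify()
        self._thread.join()

    def _run(self) -> None:
        while True:
            with self._cond:
                # The thread sleeps here instead of spinning on a lock.
                self._cond.wait_for(
                    lambda: self._closed or len(self._items) >= self._threshold)
                batch, self._items = self._items, []
                closed = self._closed
            for item in batch:  # deliver outside the lock
                self._sink(item)
            if closed:
                return
```

Swapping the list out under the lock and delivering the batch outside it keeps worker threads from blocking on slow destinations such as a socket.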
The following conclusions were obtained through experiments. The performance of the invention was measured in a stable state (basic operations such as office work) and in an intensive operation state (VS, 3D games) on a host with the Win10 operating system, an Intel i5-10400 (6 cores, 2.9 GHz) and 32 GB of memory. In the stable state the overhead is low: CPU consumption is stable at about 1% and memory consumption is stable at 50 MB. In the intensive operation state consumption increases: memory usage rises to 170 MB and then gradually decreases, because more event information and mapping relations need to be loaded and processed under intensive operation; the peak is reached after the game is opened, after which usage gradually falls and levels off. CPU usage stays at about 10% and finally returns to about 1%. The delay between performing an operation and collecting the related events is stable at about 0.5 seconds, so efficient and lossless event log collection is achieved while real-time performance is ensured.
The system resource consumption test method is as follows: 1. daily operation, which simulates a user's normal working conditions, namely editing documents, editing spreadsheets, browsing web pages and the like; 2. intensive operation, namely deleting files in batches, creating files in batches, and opening large software and 3D games.
The system resource consumption test results are shown in fig. 8, 9, 10, and 11.
The real-time test method is as follows: 1. scan the local E and C disks, which contain 75578 folders and 324127 files in total; 2. visit another 20 web pages, such as the common "google.com", "bin.com", "baidu.com", "qq.mail.com", etc.; 3. open the txt file on the desktop.
Real-time test results: the event data related to "test.txt" can be obtained within a delay of 0.5 seconds. Operations 1 and 2 mainly serve to show that the relevant events can still be collected in real time after the large number of events generated by intensive operation has been processed, which demonstrates the efficiency and real-time performance of the method.

Claims (10)

1. An event log collection method based on the Windows kernel, comprising the following steps:
1) initializing;
2) performing basic configuration of the ETW framework to obtain Windows native event information;
3) taking the Windows native event information as an original event stream, and performing preliminary event filtering on the original event stream to obtain a valid original event stream;
4) performing multithreaded concurrent processing of event parsing, event filtering and semantic correction on the valid original event stream to obtain a semantically corrected event object instance;
5) outputting the semantically corrected event object instance as an event to complete the collection.
2. The Windows kernel-based event log collection method according to claim 1, wherein in step 1), the initializing specifically comprises:
reading the event structure file to initialize the event structures, reading the filtering configuration file to initialize the filtering information, reading the initialization module file to initialize the module information, and initializing a thread pool to dynamically adjust and process event parsing tasks.
3. The Windows kernel-based event log collection method according to claim 2, wherein in step 1), the event structure file comprises: the service provider identifier and operation code field corresponding to each event, and the offset and size of each attribute value contained in the corresponding event in the event stream;
the filtering configuration file comprises a process number blacklist set, a process number whitelist set, a blacklist set of event type identifiers and a blacklist set of module paths;
the initialization module file defines the module paths to be loaded when the collector starts, and the loaded module information is parsed to extract the function information in each module.
4. The Windows kernel-based event log collection method according to claim 1, wherein in step 3), performing preliminary event filtering on the original event stream specifically comprises:
3.1) parsing the process number information of the original event stream, extracting the process number, and filtering the event a first time by checking whether the process number exists in the blacklist process set of the filtering configuration file;
3.2) constructing an event identifier object by parsing the service provider identifier and operation code field of the original event stream, and then looking up the corresponding object in the event type identifier blacklist of the filtering configuration file to filter the event once more.
5. The Windows kernel-based event log collection method according to claim 1, wherein in step 4), the event parsing of the valid original event stream specifically comprises:
4.1.1) extracting attribute data from the valid original event stream using the initialized event structure file to obtain an event attribute stream;
4.1.2) extracting an event identifier object from the valid original event stream, obtaining the corresponding set of attribute offsets from the event structure mapping table of the event structure file according to the event identifier object, and traversing the set of attribute offsets to parse the event attribute stream;
4.1.3) obtaining an event object instance once the loop traversal is complete.
6. The Windows kernel-based event log collection method according to claim 1, wherein in step 4), the event filtering specifically comprises:
4.2.1) obtaining the latest process number from the event object instance; if the latest process number exists in the set of filtered processes of the filtering configuration file, filtering out the event object instance, and otherwise proceeding to step 4.2.2);
4.2.2) extracting the event identifier object of the event object instance; if the event is an Image-type event, monitoring only the module loading information and function call information required by the user through the module path whitelist of the filtering configuration file, and further filtering the event object instance against the module paths in the blacklist of the filtering configuration file to obtain a valid event object instance; if the event is not an Image-type event, passing it on as a valid event object instance.
7. The Windows kernel-based event log collection method according to claim 1, wherein in step 4), the semantic correction specifically comprises:
performing semantic filling and correction on each object of the valid event object instance in sequence.
8. The Windows kernel-based event log collection method according to claim 7, wherein the objects comprise, in order: File objects, Process objects, CallStack objects, Image objects and SystemCall objects.
9. A system for implementing the Windows kernel-based event log collection method of any one of claims 1 to 8, comprising:
a collector initialization module for reading files;
an ETW configuration module for acquiring Windows native event information;
a first-layer filtering module for performing preliminary filtering on the original event stream output by the ETW configuration module;
an event parsing module for parsing the valid original event stream output by the first-layer filtering module;
a second-layer filtering module for filtering the event object instance output by the event parsing module;
an event semantic correction module for performing semantic correction on the valid event object instance output by the second-layer filtering module;
and an event output module for outputting, as an event, the semantically corrected event object instance output by the event semantic correction module.
10. The system of claim 9, wherein the files read by the collector initialization module comprise: an event structure file, a filtering configuration file and an initialization module file.
CN202211061051.6A 2022-08-31 2022-08-31 Event log collection method and system based on Windows kernel Active CN115129494B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211061051.6A CN115129494B (en) 2022-08-31 2022-08-31 Event log collection method and system based on Windows kernel

Publications (2)

Publication Number Publication Date
CN115129494A true CN115129494A (en) 2022-09-30
CN115129494B CN115129494B (en) 2022-11-25

Family

ID=83387603

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211061051.6A Active CN115129494B (en) 2022-08-31 2022-08-31 Event log collection method and system based on Windows kernel

Country Status (1)

Country Link
CN (1) CN115129494B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116450885A (en) * 2023-02-14 2023-07-18 厦门市兴百邦科技有限公司 Data reconstruction method of Windows event log file
CN117667604A (en) * 2024-01-31 2024-03-08 腾讯科技(深圳)有限公司 Data monitoring method, device, electronic equipment and storage medium for tracking event
CN117742783A (en) * 2024-02-19 2024-03-22 成都九洲电子信息系统股份有限公司 Cross-language automatic log data recording method for software system

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100229157A1 (en) * 2009-03-06 2010-09-09 Microsoft Corporation Extracting and collecting platform use data
CN102779087A (en) * 2011-05-06 2012-11-14 Sap股份公司 Systems and methods for business process logging
US20140109112A1 (en) * 2012-03-26 2014-04-17 Nec Laboratories America, Inc. Method for Request Profiling in Service Systems with Kernel Events
CN109614300A (en) * 2018-11-09 2019-04-12 南京富士通南大软件技术有限公司 A kind of file operation in the WPD based on ETW monitors method
CN110288004A (en) * 2019-05-30 2019-09-27 武汉大学 A kind of diagnosis method for system fault and device excavated based on log semanteme
CN112468472A (en) * 2020-11-18 2021-03-09 中通服咨询设计研究院有限公司 Security policy self-feedback method based on security log association analysis
CN113676464A (en) * 2021-08-09 2021-11-19 国家电网有限公司 Network security log alarm processing method based on big data analysis technology
CN114780370A (en) * 2022-05-10 2022-07-22 中国平安财产保险股份有限公司 Data correction method and device based on log, electronic equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JOHN DWYER等: "Finding anomalies in windows event logs using standard deviation", 《9TH IEEE INTERNATIONAL CONFERENCE ON COLLABORATIVE COMPUTING: NETWORKING, APPLICATIONS AND WORKSHARING》 *
吴红: "ETW在APT进程检测中的应用场景", 《网络安全技术与应用》 *
徐鲲等: "Windows NT下对磁盘性能监测的研究", 《计算机科学》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116450885A (en) * 2023-02-14 2023-07-18 厦门市兴百邦科技有限公司 Data reconstruction method of Windows event log file
CN116450885B (en) * 2023-02-14 2024-05-03 厦门市兴百邦科技有限公司 Data reconstruction method of Windows event log file
CN117667604A (en) * 2024-01-31 2024-03-08 腾讯科技(深圳)有限公司 Data monitoring method, device, electronic equipment and storage medium for tracking event
CN117667604B (en) * 2024-01-31 2024-05-14 腾讯科技(深圳)有限公司 Data monitoring method, device, electronic equipment and storage medium for tracking event
CN117742783A (en) * 2024-02-19 2024-03-22 成都九洲电子信息系统股份有限公司 Cross-language automatic log data recording method for software system
CN117742783B (en) * 2024-02-19 2024-06-07 成都九洲电子信息系统股份有限公司 Cross-language automatic log data recording method for software system

Also Published As

Publication number Publication date
CN115129494B (en) 2022-11-25

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant