CN113835952B - Linux system call monitoring method based on compiler code injection - Google Patents


Info

Publication number
CN113835952B
Authority
CN
China
Prior art keywords
function
call
information
system call
llvm
Prior art date
Legal status
Active
Application number
CN202111027217.8A
Other languages
Chinese (zh)
Other versions
CN113835952A (en)
Inventor
王震
徐少坤
苗泉强
秦富童
鲁智勇
刘迎龙
周超
樊永文
吴迪
王鹏
王少磊
石鹏飞
Current Assignee
Unit 63891 Of Pla
Original Assignee
Unit 63891 Of Pla
Priority date
Filing date
Publication date
Application filed by Unit 63891 Of Pla
Priority to CN202111027217.8A
Publication of CN113835952A
Application granted
Publication of CN113835952B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/302 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system component is a software system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G06F8/44 Encoding

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Quality & Reliability (AREA)
  • Software Systems (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention relates to the technical field of network security and discloses a Linux system call monitoring method based on compiler code injection. With the monitoring system provided by this monitoring scheme, the monitoring system's environment construction and functional testing can be carried out completely and efficiently. The monitoring system can also serve as a runtime monitoring component of a system protection mechanism: when an untrusted program requests a sensitive operation, the system can promptly detect it and perform security analysis or interception.

Description

Linux system call monitoring method based on compiler code injection
Technical Field
The invention relates to the technical field of network security, in particular to a Linux system call monitoring method based on compiler code injection.
Background
With the rapid development of the Internet, information security incidents have become frequent, and increasing attention is being paid to the security of server systems exposed to the Internet. Linux is one of the most commonly used server operating systems, and it also faces serious malicious code threats.
Research on malicious code analysis in the information security field has continued for many years. Dynamic analysis is currently one of the more effective ways to analyze heavily obfuscated malicious code, and system call monitoring is an important source of information in dynamic analysis. Through system call monitoring, security personnel can directly observe all sensitive operations performed by an application without being affected by code obfuscation. Besides serving as a malicious code analysis tool, system call monitoring can also act as a runtime monitoring component of a system protection mechanism: when an untrusted program requests a sensitive operation, the system can promptly detect it and perform security analysis or interception. Current mainstream system call monitoring schemes require modifying the system environment or the kernel, which makes them ill-suited to large-scale deployment in a microprocessor system. A system call monitoring scheme based on compiler code injection is therefore needed. Such a scheme integrates the monitoring logic directly into the compiled executable, and the executable is then distributed to each system to monitor its system calls.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a Linux system call monitoring method based on compiler code injection.
In order to achieve the aim of the invention, the invention adopts the following technical scheme:
A Linux system call monitoring method based on compiler code injection comprises five steps. These are the overall design of the monitoring system modules, the scheme flow design of the monitoring system, the specific design of each module of the monitoring system, the implementation of the system call monitoring system, and the testing of the system call monitoring system, which verifies the functions and system overhead of the implemented system. The steps are as follows:
1) Overall design of the monitoring system modules: the goal is to build a system that tracks system call execution in real time while an application runs. It provides call-related information to security personnel so that they can intuitively monitor the application's system call behavior. The main functional requirements of the system are therefore:
(1) Discovery of system call functions: the monitoring system must be able to find the starting points of all system call functions in the application. At the LLVM IR stage, C library functions still appear only as function interfaces, and their concrete implementations are loaded during the subsequent linking process. The monitoring system therefore takes the user-level API functions corresponding to system calls as the monitored objects;
(2) Collection of real-time information about the called function: the information about a function during execution mainly includes its name, parameters and return value. The runtime information of the executable and process-related information must also be collected comprehensively;
(3) Output of the collected information in a user-readable form: during execution, most system call parameters are pointers whose target contents are not permanently stored, so the raw values are not very readable. The collected information therefore needs to be output in a user-readable form to make logging easier for security personnel;
The monitoring scheme achieves its overall monitoring effect through injection during the IR optimization stage of the LLVM compiler. The system consists of three modules: a system call search module, a call information collection module and a monitoring information formatted output module;
The system call search module is responsible for finding the monitoring targets of the monitoring system, at the granularity of system call API functions. The call information collection module is responsible for collecting all the information about one function call together with process-related information. The monitoring information formatted output module is responsible for formatting and outputting the information collected by the monitoring system while the program runs, to support analysis;
2) Scheme flow design of the monitoring system: the core processing of the monitoring system takes place in the intermediate LLVM IR optimization stage of the LLVM compiler. All processing logic is loaded into the LLVM IR optimization pipeline by registering it as a Pass, and the object it operates on is an LLVM IR bit code file. In chronological order, the processing is divided into a search stage, an information collection stage and an information output stage, which correspond to the three functional modules of the scheme. The result of the processing is an LLVM IR bit code file with the monitoring logic injected. The tool set provided by the LLVM project is then used to generate the final executable from the injected bit code file;
3) Specific design of each module of the monitoring system: this step handles the concrete operational details of each module;
A. The system call search module is the starting point of the whole monitoring system and also determines its monitoring granularity. Based on the characteristics of the LLVM compilation flow and the form in which system calls appear in LLVM IR, the system call API functions are set as the starting positions for monitoring. The system call search module analyzes the function call instructions in the LLVM IR bit code file and determines whether each call targets a system call API function. It then executes further monitoring logic according to the result;
(1) Analyzing the LLVM IR bit code file, and searching a function call instruction in the LLVM IR bit code file;
(2) If the function call is not the system call API function call, skipping the call instruction, and continuing to search for the next function call instruction;
(3) If the function call is a system call API function call, the following steps are performed:
a. obtaining the called API function from the function call instruction;
b. passing the called API function instance to the call information collection module;
c. after the information collection module finishes processing, continuing searching for the next calling instruction;
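The search-and-dispatch loop of steps (1) to (3) can be sketched in standalone C++ as below. This is a hypothetical model, not the patented Pass. `ToyCall` stands in for an LLVM call instruction, and the monitored API set and the `collect` callback (playing the role of the call information collection module) are assumptions for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a call instruction: only the callee name matters here.
struct ToyCall {
    std::string callee;
};

// Scan call instructions in order. Each call whose callee is a monitored system
// call API is handed to the collector, and scanning then resumes.
// Returns the number of calls that were dispatched.
std::size_t searchSystemCalls(const std::vector<ToyCall>& calls,
                              const std::set<std::string>& syscallApis,
                              const std::function<void(const ToyCall&)>& collect) {
    std::size_t dispatched = 0;
    for (const ToyCall& call : calls) {
        if (syscallApis.count(call.callee) == 0) {
            continue;  // step (2): not a system call API call, skip it
        }
        collect(call);  // step (3)b: pass the called API to the collector
        ++dispatched;   // step (3)c: then continue with the next call
    }
    return dispatched;
}
```

Because the collector is a callback, the search loop stays independent of what the collection module does with each call.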
B. The call information collection module is the core of the whole system call monitoring system. It is the part that acquires information about every system call while the program runs: it processes the system call API functions passed in and analyzes the system call number and parameters of each call;
The information to be obtained is divided into system call information, runtime API function information and related process information. The system call information comprises the system call name and system call number. The runtime API function information comprises the parameter contents passed to the system call and its return value. The related process information is the process number of the process initiating the system call. The logic is as follows:
(1) Waiting for the system call searching module to transfer the system call API function;
(2) Analyzing the system call information of the called function: extracting the system call and obtaining the corresponding system call number;
(3) Analyzing the runtime information of the called function and storing the input parameters and return value;
(4) Injecting code that acquires the process information of the process initiating the call, and storing the acquired process number;
The API function information and the related process information can only be acquired in real time while the program runs. At this stage both are therefore stored as formal parameters (symbolic values in the IR), and their actual values are obtained when the program runs;
C. The monitoring information formatted output module is the result presentation part of the monitoring system. At program run time, it outputs the collected monitoring information in a user-readable form and stores it persistently, which facilitates further analysis of the application's system call behavior;
The formatted output of the monitoring information is divided into formatted text generation and file output. To make the monitoring results easy for users to analyze, the type of each piece of collected information is determined first. Corresponding formatted text is then generated according to that type and persistently stored as a file in the program's working directory;
4) Implementation of the system call monitoring system
4.1. Implementation of the system call search module. This module searches the LLVM IR bit code file for call instructions that initiate function calls and determines whether the called function is a system call API function. If so, it passes the called function instance to the call information collection module for the subsequent flow. If not, it continues with the next instruction until all instructions in the LLVM IR file have been traversed. The tasks of the system call search module are:
a. traversing all instructions in the LLVM IR bit code file;
b. acquiring the instance of the function called in each function call instruction;
c. determining whether the called function is a system call API function;
Each task is implemented as follows:
(1) For task 1: in LLVM, the object processed by a Pass is a Module composed of one or more LLVM IR bit codes. In the Pass source file, the Pass must be registered with the intermediate optimization tool opt through the pass manager interface. In this process, a Pass class implementing a run() member method is passed to the addPass() method of the ModulePassManager class, and opt then passes its input, i.e., the Module to be processed, to the run() method;
At this point, the Module instance of the whole bit code file to be processed has been obtained, and the instructions in the bit code file are searched starting from it. The llvm/IR/Module.h header shows that the Module class implements an iterator over its functions, so the functions of the incoming Module can be traversed with a for statement. Likewise, the Function class implements an iterator over its basic blocks, and the BasicBlock class implements an iterator over its Instruction objects. Starting from the incoming Module instance, all instructions in the LLVM IR bit code file can therefore be traversed with three nested for loops;
In particular, the LLVM IR also contains some C standard library functions that appear only as declarations, without function bodies. These declared functions must be skipped with the Function::isDeclaration() method while traversing functions;
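The three nested loops and the declaration check can be modelled without LLVM as follows. The `Toy*` structs are hypothetical miniatures of Module, Function and BasicBlock, and the `isDeclaration` flag plays the role of `Function::isDeclaration()`. In a real Pass, the loops would iterate over the LLVM classes themselves.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical miniature of the Module -> Function -> BasicBlock -> Instruction
// hierarchy. Each "instruction" is just a text label.
struct ToyBasicBlock {
    std::vector<std::string> instructions;
};

struct ToyFunction {
    std::string name;
    bool isDeclaration;                // true for bodiless library declarations
    std::vector<ToyBasicBlock> blocks;
};

struct ToyModule {
    std::vector<ToyFunction> functions;
};

// Visit every instruction with three nested loops, skipping declarations
// (such as a declared open() or printf()), which have no body to traverse.
std::vector<std::string> collectInstructions(const ToyModule& m) {
    std::vector<std::string> out;
    for (const ToyFunction& f : m.functions) {
        if (f.isDeclaration) continue;  // counterpart of Function::isDeclaration()
        for (const ToyBasicBlock& bb : f.blocks) {
            for (const std::string& inst : bb.instructions) {
                out.push_back(f.name + ":" + inst);
            }
        }
    }
    return out;
}
```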
(2) For task 2: in LLVM IR, the Instruction class is the base class of all instruction classes, and different kinds of instructions correspond to different subclasses of Instruction. The subclass corresponding to the function call instruction call is the CallInst class;
For Pass development, LLVM provides the dyn_cast<>() template for checked type conversion of instances. If the conversion succeeds, the converted instance is returned; if it fails, a null pointer is returned. Function call instructions are therefore identified by attempting to convert each Instruction instance to a CallInst instance. Once the CallInst instance of a function call instruction is obtained, the CallInst::getCalledFunction() method is used to obtain the instance of the function it calls. This method returns a null pointer for indirect calls;
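The "try the downcast, get null on failure" pattern can be illustrated with standard C++ `dynamic_cast`. LLVM's own `dyn_cast<>` uses LLVM's type tags rather than C++ RTTI, but it follows the same contract. All `Toy*` types below are hypothetical stand-ins, and `getCalledFunction()` here only mirrors the LLVM method of that name.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins for llvm::Function, llvm::Instruction and llvm::CallInst.
struct ToyFunction {
    std::string name;
};

struct ToyInstruction {
    virtual ~ToyInstruction() = default;
};

struct ToyCallInst : ToyInstruction {
    explicit ToyCallInst(const ToyFunction* callee) : callee_(callee) {}
    // Null when the callee is not a known function (e.g. a call via a pointer).
    const ToyFunction* getCalledFunction() const { return callee_; }

private:
    const ToyFunction* callee_;
};

struct ToyBinaryOp : ToyInstruction {};

// Collect the names of the functions called directly by the instructions.
std::vector<std::string> calledFunctionNames(
        const std::vector<std::unique_ptr<ToyInstruction>>& insts) {
    std::vector<std::string> names;
    for (const auto& inst : insts) {
        // The downcast yields nullptr for non-call instructions, which are skipped.
        if (const auto* call = dynamic_cast<const ToyCallInst*>(inst.get())) {
            if (const ToyFunction* f = call->getCalledFunction()) {
                names.push_back(f->name);
            }
        }
    }
    return names;
}
```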
(3) For task 3: after the Function instance of the called function is obtained, its name is obtained with the Function::getName() method. The system call API functions to be searched for are known in advance, and so are their names. A system call API name set is therefore defined, and the list of system call API functions is loaded into this set with a Map data structure when the monitoring system is initialized. The name of the called function is then looked up in this set. If the lookup succeeds, the called function is a system call API function, and its instance is passed to the call information collection module. Otherwise, it is not a target function, and traversal continues with the next instruction;
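A minimal sketch of the name lookup, assuming a `std::map` from API name to system call number (the key/value layout described for task 1 of the next module). The table contents are illustrative and use x86-64 Linux numbers; other architectures number system calls differently. A libc wrapper may also issue a different system call than its name suggests: on recent glibc versions, for example, open() is implemented with the openat system call.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical predefined table of monitored system call APIs:
// key = API name, value = system call number (x86-64 Linux numbering).
const std::map<std::string, long>& syscallApiTable() {
    static const std::map<std::string, long> table = {
        {"read", 0}, {"write", 1}, {"open", 2}, {"close", 3},
        {"brk", 12}, {"getpid", 39}, {"time", 201},
    };
    return table;
}

// Returns the system call number if calleeName is a monitored API, or
// std::nullopt so that the caller moves on to the next instruction.
std::optional<long> lookupSyscall(const std::string& calleeName) {
    const auto& table = syscallApiTable();
    auto it = table.find(calleeName);
    if (it == table.end()) return std::nullopt;
    return it->second;
}
```

Storing the number as the map value lets one lookup answer both questions: whether the callee is monitored, and which number to record.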
4.2. Implementation of the call information collection module. This module processes and analyzes the system call API functions passed in, collects and integrates all the relevant information of interest to the user, and then locates the injection point for the output module. The information to be collected is divided into system call information, runtime function information and related process information. The tasks of the call information collection module are therefore:
a. acquiring the attribute information of the system call, including the system call name and system call number;
b. acquiring the runtime information of the system call API function, including the input parameters, the system call return value and the call instruction pointer;
c. acquiring the related process information of the system call, namely the process number of the process executing the system call API function;
A SystemCallInfo class is defined in the call information collection module to represent the information to be collected for each system call, with member variables holding all of that information. When a system call function instance has been processed, everything acquired is saved in a SystemCallInfo instance;
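A possible shape for the per-call record is sketched below. The member names `callName`, `callNo`, `callArgs` and `callRet` come from the text. `pidValue` and `summary()` are hypothetical additions for illustration. In a real Pass, the runtime fields would hold `llvm::Value*` handles; here they are modelled as the textual names of IR values such as `%3`.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Sketch of the per-system-call record. Runtime fields are modelled as IR value
// names because their concrete contents only exist when the program runs.
struct SystemCallInfo {
    std::string callName;               // system call API name, e.g. "write"
    long callNo = -1;                   // system call number from the API table
    std::vector<std::string> callArgs;  // one entry per actual argument
    std::string callRet;                // the call instruction doubles as its return value
    std::string pidValue;               // hypothetical: result of the injected getpid()

    // Hypothetical helper: one-line textual summary of the record.
    std::string summary() const {
        std::string s = callName + "(#" + std::to_string(callNo) + ") args=[";
        for (std::size_t i = 0; i < callArgs.size(); ++i) {
            if (i != 0) s += ", ";
            s += callArgs[i];
        }
        s += "] ret=" + callRet + " pid=" + pidValue;
        return s;
    }
};
```

A vector is used for `callArgs` because, as the text notes, different system call APIs take different numbers of arguments.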
(1) For task 1: the system call search module stores the predefined system call API function set in a Map data structure. Its keys are system call names, and its values are the corresponding system call numbers. By looking up the key-value pair in this set, the system call name and number used in the call are obtained and stored in the callName and callNo member variables of the SystemCallInfo instance, respectively;
(2) For task 2: in LLVM IR, the arguments of a function call appear as operands of the call instruction, so the operands of the located call instruction are traversed to obtain all the arguments passed to the system call API function. The CallInst class provides the op_begin() and op_end() methods, which return iterators to the first operand and one past the last operand, respectively. Traversing between these bounds yields all runtime arguments of the system call function. Note that the operand list of a call instruction also contains the called function itself as its last operand, which the traversal should skip; CallInst::arg_begin() and CallInst::arg_end() cover only the actual arguments. Because different system call API functions take different numbers of arguments, the callArgs member variable that stores the arguments is declared as a C++ vector, and the traversed arguments are stored in it in order;
In LLVM pass programming, an instruction is itself a subclass of Value, so a pointer to an instruction can be used directly as the pointer to the value it returns. The located call instruction pointer is therefore stored directly as the system call return value in the callRet member variable of the SystemCallInfo instance. All information in this task is stored as formal parameters, so logic that reads their values must be injected into the source LLVM IR bit code during subsequent output. The information collection module therefore also stores the injection point for the formatted output module, which is the instruction immediately following the system call instruction. This position is chosen for two reasons. First, the return value is only determined after the system call has executed, so the injection point must come after the system call instruction. Second, because the stored runtime information consists of formal parameters, the injection point should be as close to the call as possible, to minimize the chance that these values change during subsequent execution;
(3) For task 3: system call monitoring runs in the user mode of the Linux system, so process-related information can only be acquired through the getpid() system call function. To determine as accurately as possible the process in which the call instruction of the system call API function executes, a getpid() call is injected next to that call instruction to obtain the number of the process executing it;
In LLVM pass programming, the core function for injecting content into the IR is IRBuilder::CreateCall(). First, an IRBuilder instance (the Builder) is created at the injection point. There are two ways to initialize it. Passing a BasicBlock instance places the insertion point at the end of that basic block. Passing an Instruction instance causes injected content to be inserted before that instruction. The second method is used, so that the getpid() call is injected immediately before the monitored system call instruction. CreateCall() takes a FunctionCallee instance describing the function to be called. The other implementation details of injecting function calls are given in the implementation of task 2 of the next module;
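The two injection positions (getpid() immediately before the monitored call, output logic immediately after it) can be modelled on a basic block represented as a list of textual instructions. This is a hypothetical illustration. Inserting before index `i` mimics an IRBuilder constructed with instruction `i`, and inserting before `i + 1` mimics using the next instruction as the output injection point.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model: a basic block as an ordered list of textual instructions.
// callIndex is the position of the monitored system call instruction.
void instrumentCall(std::vector<std::string>& block, std::size_t callIndex) {
    // Output logic goes before the instruction that follows the call, i.e. just
    // after the call, once its return value exists. Doing this insertion first
    // keeps callIndex valid for the second insertion.
    block.insert(block.begin() + static_cast<std::ptrdiff_t>(callIndex + 1),
                 "call fprintf(log, ...)");
    // getpid() goes immediately before the monitored call.
    block.insert(block.begin() + static_cast<std::ptrdiff_t>(callIndex),
                 "%pid = call getpid()");
}
```

The order of the two insertions matters only in this index-based model. With real IRBuilder insertion points anchored on instructions, the order would not shift the positions.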
4.3. Implementation of the monitoring information formatted output module. At run time, this module obtains the values of the formal parameters recorded by the call information collection module and persistently stores them as highly readable formatted text. Its tasks are therefore:
a. determining the type of each piece of data acquired by the call information collection module, and generating and storing suitable formatted text;
b. persistently storing the converted formatted text;
(1) For task 1: the system call name and system call number saved by the call information collection module are predefined, so their data types are known (a string and an integer, respectively), and the formatted text for them has a fixed format;
The system call arguments, the system call return value and the number of the process to which the system call instruction belongs are all Value pointers in LLVM IR. In LLVM pass programming, all such values are instances of subclasses of Value, and the type of an instance is obtained with the Value::getType() method. A selection branch is then built from type-checking methods such as Type::isIntegerTy() and Type::isFloatTy(). Each data pointer is mapped in turn to the corresponding formatted text, so that the formatted text adapts dynamically to different data sizes and data types;
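The type-driven choice of formatted text could look like the following sketch. The text does not give the exact mapping, so the specifiers, the `ValueKind` enum and the log-line layout are illustrative assumptions. `ValueKind` stands in for the results of `Type::isIntegerTy()`, `Type::isFloatTy()`, `Type::isDoubleTy()` and `Type::isPointerTy()`.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical classification of an IR value's type.
enum class ValueKind { Integer, Float, Double, Pointer, Other };

// Choose a printf conversion for one runtime value. Integers wider than 32 bits
// use %ld (64-bit long on x86-64 Linux). Narrower integers use %d, so the
// injected code would have to extend them to i32 before calling fprintf.
// %f expects a double, so a float would likewise need an fpext first.
std::string formatSpecFor(ValueKind kind, unsigned bitWidth) {
    switch (kind) {
        case ValueKind::Integer: return bitWidth > 32 ? "%ld" : "%d";
        case ValueKind::Float:
        case ValueKind::Double:  return "%f";
        case ValueKind::Pointer: return "%p";
        default:                 return "<unsupported type>";
    }
}

// Build the format string for one log line. The name and number are known at
// compile time and written literally; each runtime value gets a specifier.
std::string buildLogFormat(const std::string& name, long number,
                           const std::vector<std::pair<ValueKind, unsigned>>& values) {
    std::string fmt = name + "(" + std::to_string(number) + ") args:";
    for (const auto& [kind, bits] : values) {
        fmt += " " + formatSpecFor(kind, bits);
    }
    return fmt + "\n";
}
```

The resulting string would then become a global constant in the module and be passed as the format argument of the injected fprintf() call.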
(2) For task 2: the formatted readable text is persistently stored by writing it to a file, which requires injecting file-operation functions. This is divided into two parts: function declaration injection and function call injection;
In LLVM pass programming, the core function for declaring an injected C standard library function is Module::getOrInsertFunction(), which must be given the function type of the declared function. The function type is the function's signature, comprising its parameter types and return type, and it is constructed with FunctionType::get(). Declarations of the fopen(), fprintf() and fclose() functions are injected into the module so that calls to these file operations can be injected later;
Function call injection in the formatted output module adds instructions that call the log-file operation functions to the source LLVM IR bit code file. First, a separate injection Builder is declared for each injection point. The fopen() call is injected at the start of the main function, and the fclose() call immediately before the return instruction of the main function. The fprintf() calls are injected at the output injection points stored by the information collection module. The calls to the file-operation functions are then injected with IRBuilder::CreateCall(). Unlike the getpid() call, the file-operation calls also need argument values, which are placed in order into an ArrayRef<Value *> variable passed as the second argument of CreateCall(). These arguments must be values that already exist in the LLVM IR bit code. New argument contents (such as format strings) are created by inserting global variables with the Module::getOrInsertGlobal() method;
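To make the effect of the injected calls concrete, the hand-written C++ function below performs at run time roughly what an instrumented main() would do around one monitored call. It opens the log at the start, calls getpid() just before the call, writes an fprintf() line just after it, and closes the log before returning. This is an illustrative equivalent, not output of the Pass. The log path, the line layout and the x86-64 number 201 for time() are assumptions.

```cpp
#include <cassert>
#include <cstdio>
#include <ctime>
#include <unistd.h>

// Illustrative equivalent of the code injected into main() for one monitored call.
int runInstrumentedMain(const char* logPath) {
    std::FILE* log = std::fopen(logPath, "w");  // injected: start of main
    if (log == nullptr) return -1;

    pid_t pid = getpid();                         // injected: before the call
    std::time_t* arg = nullptr;
    std::time_t ret = std::time(arg);             // original monitored call
    std::fprintf(log, "pid=%d time(201) args: %p ret=%ld\n",  // injected: after the call
                 static_cast<int>(pid), static_cast<void*>(arg), static_cast<long>(ret));

    std::fclose(log);                             // injected: before main returns
    return 0;
}
```

Opening the file once in main and closing it before main returns avoids reopening the log for every monitored call.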
5) Testing of the system call monitoring system: the functions and system overhead of the implemented system are verified and tested;
5.1. Environment configuration for testing the monitoring system: CPU: Intel(R) Core(TM) i5 CPU @ 2.70GHz;
(1) LLVM project: the core of the monitoring system is implemented as a Pass component of the LLVM compiler. The LLVM project source code used is version 11.0, and the accompanying clang front end is version 6.0. Both the compiled LLVM executables and the library files of the LLVM project are needed, so the source code of the whole LLVM project is required. LLVM versions iterate frequently and differ considerably from one another, so it is recommended to use git to download the source code of the corresponding version from the official LLVM repository on GitHub;
(2) CMake: the Pass is built as a dynamic shared library that is loaded by the LLVM intermediate code optimization tool to inject the monitoring logic. The build tool used is CMake, version 3.20;
CMake uses build description files, conventionally named CMakeLists.txt, to specify the library files linked and the compilation parameters used during compilation. CMake can be installed from the Ubuntu software repositories, or the source code of the corresponding version can be downloaded from its open-source repository with git and built locally;
5.2. Functional testing of monitoring system
(1) Building the monitoring system Pass: the Pass directory contains include, lib, build and test folders. The include folder holds some of the header files used by the Pass. The lib folder holds the main source code of the Pass, which contains all of the monitoring logic. The build folder holds the compiled output files; the compiled dynamic shared library file is the Pass itself. The test folder provides C test samples for testing the monitoring effect;
A CMakeLists.txt file is provided in both the Pass directory and the lib directory to bring in the library files provided by the LLVM project and to specify output locations, so the project is built from the top-level directory with cmake. cmake automatically generates the build files from the provided CMakeLists.txt configuration, after which the required dynamic library is compiled with the make command. The resulting dynamic shared library file is located at build/lib/libScout.
(2) Monitoring effect of the monitoring system: to test how well the monitoring system monitors system call functions, test sample programs covering 10 kinds of system calls were selected.
First, the test source code is compiled into a bit code IR file with the clang front end, and the monitoring Pass provided herein is loaded with the opt tool provided by LLVM. The opt command must specify the registered Pass name (here, scout) and the location of the dynamic library containing the Pass. After loading, opt produces the Pass-processed bit code IR file. The processed file can then be interpreted and executed in JIT mode with the lli tool, or converted into an assembly file with the llc tool and finally compiled into an executable. Because the tested brk() function may cause a stack error when executed with the LLVM JIT, compiled execution was chosen;
After the program has executed, all monitoring information is written to monitor.log in the test sample directory. The time, open, read, write and close system call functions used in the test sample are captured by the monitoring system, and their runtime information is output to the log file monitor.log.
By adopting the technical scheme, the invention has the following advantages:
The invention provides an operating system call monitoring method based on compiler code injection. Through the overall design of the system call monitoring modules, the scheme flow design and the design of each module, it analyzes the limitations of current system call monitoring deployments. It also provides the implementation details of the system call monitoring system and the design requirements for system testing, and it designs the corresponding functional modules according to those requirements.
The design part discloses the relationships among the modules and the processing flow within each module, and it describes the concrete working scheme of the monitoring logic through these flows. This includes the technical details involved in the system modules, such as how to write a Pass component that processes an LLVM IR bit code file, how to traverse the call instructions in the IR file, and how to inject function call logic. In the monitoring system test, the implemented system is tested completely from two aspects: environment construction and functional testing. Both the environment construction process and the monitoring function achieved a good monitoring effect, which verifies the feasibility of the system call monitoring system. The specific design of the Linux system call monitoring scheme based on LLVM compiler code injection is the core of the invention. The monitoring logic is integrated directly into the compiled executable, and the executable is distributed to each system to monitor its system calls. When an untrusted program requests a sensitive operation, the system can promptly detect it and perform security analysis or interception.
Drawings
FIG. 1 is a flow chart of a system call monitoring system based on compiler code injection;
FIG. 2 is a system call search flow diagram of compiler code injection;
FIG. 3 is a flow chart of the formatted output.
Detailed Description
As shown in FIGS. 1 to 3, a Linux system call monitoring method based on compiler code injection comprises the following steps: overall design of the monitoring system modules, scheme flow design of the monitoring system, specific design of each module of the monitoring system, implementation of the system call monitoring system, and testing of the system call monitoring system:
1. The overall design of the monitoring system modules aims to build a mechanism that locates system call executions in real time while an application program runs and provides the related call information to security personnel, so that they can directly observe the system call behavior of the application. The main functional requirements of the system are therefore:
(1) Locate the starting points that the monitoring system uses for all system call functions in the application program, i.e., discover the system call functions. Because C library functions still appear only as function interfaces at the LLVM IR stage, and their concrete implementations are loaded later during linking, the monitoring system takes the user-level API functions corresponding to system calls as the monitored objects.
(2) Collect real-time information about each called function. This mainly comprises the function name, parameters and return value; the run-time information of the executable and information about the related process must also be collected comprehensively, so that security personnel can see more clearly how the application uses system calls.
(3) Output the collected information in a user-readable form. Most system call parameters are pointers, whose pointed-to content is not preserved and is hard to read, so the collected information must be converted into a user-readable form, which also makes it easy for security personnel to keep log records.
The monitoring scheme achieves its overall monitoring effect through injection at the IR optimization stage of the LLVM compiler. The system consists of three modules: a system call search module, a call information collection module and a monitoring information formatted output module.
The system call search module is responsible for finding the monitoring targets, at the granularity of system call API functions; the call information collection module collects all information about a function call together with the related process information; the monitoring information formatted output module formats the information collected while the program runs and provides it to security personnel for analysis.
2. In the scheme flow design, the core processing of the monitoring system takes place in the LLVM IR intermediate optimization stage of the LLVM compiler. All processing logic is loaded into the intermediate optimization as a registered Pass, and the object it operates on is an LLVM IR bitcode file. In chronological order, processing is divided into a search stage, an information collection stage and an information output stage; these three stages correspond to the three functional modules of the scheme. The output is an LLVM IR bitcode file with the monitoring logic injected, and the toolchain provided by the LLVM project can be used to generate the final executable from the injected bitcode file.
3. The specific design of each module of the monitoring system describes the operational details of each module.
A. The system call search module is the starting point of the whole monitoring system and also determines its monitoring granularity. Based on the characteristics of the LLVM compilation flow and on how system calls appear in LLVM IR, the system call API functions are chosen as the monitoring starting points. The main function of the module is to analyze the function call instructions in the LLVM IR bitcode file, determine whether each call is a call to a system call API function, and run further monitoring logic according to the result.
(1) Parse the LLVM IR bitcode file and search it for function call instructions.
(2) If the call is not a call to a system call API function, skip the instruction and continue with the next function call instruction.
(3) If the call is a call to a system call API function:
a. obtain the called API function instance from the function call instruction;
b. pass the called API function instance to the call information collection module;
c. after the information collection module has finished, continue searching for the next call instruction.
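The control flow of steps (1) to (3) can be sketched as a simple scan. This is a minimal, self-contained C++ model, not the authors' Pass: a call instruction is reduced to the name of the function it calls, and the hypothetical `Collector` callback stands in for the call information collection module.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <vector>

// Hypothetical hook standing in for the call information collection module.
using Collector = std::function<void(const std::string &)>;

// Scans the called-function names in instruction order. Non-system-call APIs
// are skipped; system call APIs are handed to the collector, after which the
// scan resumes with the next instruction. Returns the number of hand-offs.
int searchSystemCalls(const std::vector<std::string> &calledFunctions,
                      const std::set<std::string> &syscallApis,
                      const Collector &collect) {
  int handedOff = 0;
  for (const std::string &callee : calledFunctions) {
    if (syscallApis.count(callee) == 0)
      continue;       // step (2): not a system call API, skip it
    collect(callee);  // step (3)b: pass the callee on for collection
    ++handedOff;      // step (3)c: then continue with the next instruction
  }
  return handedOff;
}
```

For example, scanning calls to `printf`, `open`, `strlen`, `read` and `close` with the API set {`open`, `read`, `write`, `close`} hands off only `open`, `read` and `close`.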
B. The call information collection module is the core of the whole system call monitoring system: it is the part that gathers information about the system calls made while the program runs. Its main function is to process the system call API function instances passed in and to determine the system call number, parameters and other details of each system call.
The information to be obtained falls into three groups: system call information, run-time API function information, and related process information. System call information comprises the system call name and number; run-time API function information comprises the parameters passed to the system call and its return value; related process information is the number (PID) of the process that initiates the system call. The main logic is as follows:
(1) Receive the system call API function instance passed in by the system call search module.
(2) Analyze the system call information of the called function: identify the system call and obtain its system call number.
(3) Analyze the run-time information of the called function and save its input parameters and return value.
(4) Inject code that obtains information about the process initiating the call, and save the obtained process number.
Because the API function information and the related process information only exist while the program is running, both are saved in symbolic form (as formal parameters), and their actual values are obtained at run time.
C. The monitoring information formatted output module is the result presentation part of the monitoring system. At program run time it outputs the collected monitoring information in a user-readable form and stores it persistently, which makes subsequent analysis of the application's system call behavior convenient.
The formatted output of monitoring information is divided into formatted text generation and file output. To make the monitoring results easy for users to analyze, the collected information cannot simply be stored as raw metadata. Therefore the type of each collected item is determined first, corresponding formatted text is generated according to that type, and the formatted text is stored persistently as a file in the program's working directory.
4. Implementation of system call monitoring system
4.1. Implementation of the system call search module
As described in the module design part, the main function of the system call search module is to find the call instructions in the LLVM IR bitcode file that initiate function calls and to determine whether each called function is a system call API function. If it is, the called function instance is passed to the call information collection module for the subsequent steps; if not, the search moves on to the next instruction, until all instructions in the LLVM IR file have been traversed. The main tasks of the system call search module are:
(1) Traverse all instructions in the LLVM IR bitcode file
(2) Obtain the instance of the function called by each function call instruction
(3) Determine whether the called function is a system call API function
The implementation details of the system call search module are introduced below, task by task.
(1) For task 1
In LLVM, a Pass processes a Module, i.e., the LLVM IR held in the bitcode being processed. According to the Pass documentation, a Pass must be registered with the intermediate optimization tool opt through the PassManager interface. In this process, a Pass class that implements a run() member method is passed to the addPass() method of the ModulePassManager class, and the LLVM infrastructure then passes the input of the opt tool, i.e., the Module to be processed, into that run() method.
This yields the Module instance for the whole bitcode file, in which the instructions can then be searched. As the Module class declaration in llvm/IR/Module.h shows (from line 607 in the version used), Module implements an iterator over the functions it contains, so a for statement can traverse the functions of the incoming Module. Likewise, the Function class implements an iterator over its basic blocks, and the BasicBlock class implements an iterator over its Instruction objects. All instructions in the LLVM IR bitcode file can therefore be traversed with three nested for loops, starting from the incoming Module instance.
Note that at the LLVM IR stage the module also references some C standard library functions, which appear in the IR only as declarations without a function body; these declarations must be skipped with the Function::isDeclaration() method when traversing functions.
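The three nested loops and the declaration check can be illustrated without LLVM itself. The following is a sketch under the assumption that the simplified structs below stand in for `llvm::Module`, `llvm::Function`, `llvm::BasicBlock` and `llvm::Instruction`; in the real Pass the same loop shape iterates over those classes directly.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Simplified, hypothetical stand-ins for the LLVM IR containers.
struct Instruction { std::string text; };
struct BasicBlock { std::vector<Instruction> insts; };
struct Function {
  std::string name;
  std::vector<BasicBlock> blocks;
  // Mirrors the idea of llvm::Function::isDeclaration(): no body.
  bool isDeclaration() const { return blocks.empty(); }
};
struct Module { std::vector<Function> functions; };

// Visits every instruction with three nested range-for loops
// (Module -> Function -> BasicBlock -> Instruction), skipping functions
// that are only declared, such as external C library prototypes.
std::vector<std::string> collectInstructions(const Module &m) {
  std::vector<std::string> out;
  for (const Function &f : m.functions) {
    if (f.isDeclaration())
      continue;
    for (const BasicBlock &bb : f.blocks)
      for (const Instruction &inst : bb.insts)
        out.push_back(inst.text);
  }
  return out;
}
```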
(2) For task 2
In LLVM IR, the Instruction class is the base class of all instruction classes, and each kind of instruction corresponds to a subclass of Instruction; the subclass for the function call instruction call is CallInst. Pass development provides the dyn_cast<>() template for type conversion: it returns the converted instance if the conversion succeeds and a null pointer otherwise. Function call instructions can therefore be identified by attempting to convert each Instruction instance to a CallInst. Once the CallInst instance for a function call instruction has been obtained, the CallInst::getCalledFunction() method returns the instance of the function it calls (it returns a null pointer for indirect calls, whose target cannot be resolved statically).
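The "try to convert, skip on null" idiom can be modelled with a small polymorphic hierarchy. This sketch uses C++ `dynamic_cast` only to stay self-contained; LLVM's own `dyn_cast<>` relies on LLVM-style RTTI (`classof`) rather than C++ RTTI. The classes and the `calledFunctionNames` helper are hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical hierarchy modelled on llvm::Instruction / llvm::CallInst.
struct Function { std::string name; };

struct Instruction {
  virtual ~Instruction() = default;
};

struct CallInst : Instruction {
  const Function *callee;  // nullptr models an indirect call
  explicit CallInst(const Function *f) : callee(f) {}
  // Mirrors the role of CallInst::getCalledFunction().
  const Function *getCalledFunction() const { return callee; }
};

struct ReturnInst : Instruction {};

// Returns the names of directly called functions among the instructions.
std::vector<std::string> calledFunctionNames(
    const std::vector<std::unique_ptr<Instruction>> &insts) {
  std::vector<std::string> names;
  for (const auto &inst : insts) {
    // The conversion yields nullptr for anything that is not a call.
    if (const auto *call = dynamic_cast<const CallInst *>(inst.get())) {
      if (const Function *f = call->getCalledFunction())
        names.push_back(f->name);  // indirect calls are skipped
    }
  }
  return names;
}
```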
(3) For task 3
After the Function instance of the called function has been obtained, its name can be read with the Function::getName() method. The system call API functions to be searched for, and their names, are known in advance. A system call API name set is therefore provided: when the monitoring system is initialized, the list of system call API functions is imported into this set, implemented with a map data structure. The name of the called function is looked up in the name set; if the lookup succeeds, the called function is a system call API function and its function instance is passed to the call information collection module; otherwise it is not a target function, and traversal continues with the next instruction.
4.2. Implementation of the call information collection module
The main function of the call information collection module is to process and analyze the system call API function passed in, collect and integrate all relevant information of interest to the user, and then record the location of the injection point for the output module. The information to be collected falls into system call information, run-time function information, and related process information. The main tasks of the call information collection module are therefore:
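A name set backed by a map, as described above, can be sketched as follows. The table contents are an assumption for illustration: the numbers are the x86-64 Linux system call numbers, which differ on other architectures, and `makeSyscallTable`/`lookupSyscall` are hypothetical helper names.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical name set: system call API name -> system call number.
// Numbers are x86-64 Linux values and are only illustrative.
std::map<std::string, int> makeSyscallTable() {
  return {{"read", 0},  {"write", 1},   {"open", 2},  {"close", 3},
          {"brk", 12},  {"getpid", 39}, {"time", 201}};
}

// Looks up a callee name. An empty optional means "not a system call API,
// continue with the next instruction".
std::optional<int> lookupSyscall(const std::map<std::string, int> &table,
                                 const std::string &calleeName) {
  auto it = table.find(calleeName);
  if (it == table.end())
    return std::nullopt;
  return it->second;
}
```

Keying the map by name also lets the collection module read the system call number from the same lookup.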
(1) Obtain the attribute information of the system call, including the system call name and system call number.
(2) Obtain the run-time information of the system call API function, including the parameters passed in, the system call return value, and the call instruction pointer.
(3) Obtain information about the process that initiates the system call, i.e., the number of the process that executes the system call API function.
A SystemCallInfo class is defined in the call information collection module to represent the information collected for each system call, with every item stored in a member variable; when a system call function instance has been processed, all the information obtained is held in one SystemCallInfo instance.
(1) For task 1
As mentioned for the system call search module, the predefined set of system call API functions is stored in a map data structure whose keys are the system call names and whose values are the corresponding system call numbers. Looking up the key-value pair in this set therefore yields the system call name and number used in the call, which are stored in the callName and callNo member variables of the SystemCallInfo instance, respectively.
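A possible shape for the per-call record and for task 1 is sketched below. The member names callName, callNo, callArgs and callRet come from the text; the `pid` member, the `fillAttributes` helper, and the use of `std::string` in place of `llvm::Value *` are assumptions made so the sketch compiles without LLVM.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Stand-in for llvm::Value* handles (assumption for this sketch).
using Value = std::string;

struct SystemCallInfo {
  std::string callName;         // system call name (task 1)
  int callNo = -1;              // system call number (task 1)
  std::vector<Value> callArgs;  // symbolic arguments (task 2)
  Value callRet;                // the call instruction as its return value (task 2)
  Value pid;                    // hypothetical: result of injected getpid() (task 3)
};

// Task 1: copy name and number from the predefined name -> number map.
// Returns false if the callee is not a known system call API.
bool fillAttributes(const std::map<std::string, int> &table,
                    const std::string &callee, SystemCallInfo &info) {
  auto it = table.find(callee);
  if (it == table.end())
    return false;
  info.callName = it->first;
  info.callNo = it->second;
  return true;
}
```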
(2) For task 2
In LLVM IR, the parameters of a function call are the operands of the call instruction, so the operands of the located call instruction are traversed to obtain all parameters passed to the system call API function. The CallInst class provides the op_begin() and op_end() methods, which return iterators to the first operand and to one past the last operand; traversing the range between them yields the run-time parameters of the system call function. (In LLVM the called function itself is stored as the last operand of a call instruction, so the arg_begin()/arg_end() pair inherited from CallBase bounds only the actual arguments.) Because different system call API functions take different numbers of parameters, the callArgs member variable that stores the call parameters is declared as a C++ vector, and the traversed parameters are stored in it in order.
In addition, in LLVM Pass programming the Instruction class is itself a subclass of Value, so the pointer to an instruction can be used directly as the pointer to its return value: the located call instruction pointer is stored in the callRet member variable of the SystemCallInfo instance as the system call return value. Because all information for this task is stored symbolically, logic that reads the actual values must later be injected into the source LLVM IR bitcode during output. The information collection module therefore also stores the injection point for the formatted output module, namely the instruction immediately following the one that contains the system call. There are two reasons for choosing this position: first, the return value is only determined after the system call has executed, so the injection point must follow the system call instruction; second, since the saved run-time information is symbolic, the injection point should be as early as possible to avoid the values changing during subsequent execution.
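The argument range, the "call doubles as its return value" rule and the choice of the next instruction as injection point can be modelled on a toy basic block. This is a hypothetical sketch, not LLVM code: `Inst` and `Collected` are invented types, and the operand layout (arguments followed by the callee as the last operand) mirrors how LLVM stores call operands.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy instruction: for calls, operands are args..., callee (last), as in LLVM.
struct Inst {
  std::string name;                   // result name, e.g. "%2"
  bool isCall = false;
  std::vector<std::string> operands;
};

struct Collected {
  std::vector<std::string> callArgs;  // symbolic arguments
  std::string callRet;                // the call itself names its return value
  std::size_t injectionPoint = 0;     // index of the instruction after the call
};

// Collects information for the call at position idx of a basic block.
// Assumes insts[idx] is a call that is followed by at least one instruction
// (a well-formed block ends with a terminator).
Collected collectCall(const std::vector<Inst> &insts, std::size_t idx) {
  const Inst &call = insts[idx];
  assert(call.isCall && !call.operands.empty() && idx + 1 < insts.size());
  Collected c;
  c.callArgs.assign(call.operands.begin(), call.operands.end() - 1);  // drop callee
  c.callRet = call.name;
  c.injectionPoint = idx + 1;  // after the call, so the return value exists
  return c;
}
```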
(3) For task 3
The system call monitoring runs in user mode on Linux, so process information can only be obtained through the getpid() system call function. To obtain the process of the call instruction containing the system call API function as accurately as possible, a getpid() call is injected immediately before that call instruction to obtain the number of the process in which the current call instruction executes.
In LLVM Pass programming, the core facility for injecting content into IR is the CreateCall() method of IRBuilder. An IRBuilder instance (Builder) is created first, and there are two ways to initialize it: passing a BasicBlock instance, which sets the insertion point to the end of that basic block, or passing an Instruction instance, which causes injected content to be inserted before that instruction. The second way is used here, so that the getpid() call is injected before the monitored system call instruction. CreateCall() takes the injected function as a FunctionCallee instance; the remaining details of injecting function calls are described under task 2 of the formatted output module in Section 4.3.
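At source level, the effect of injecting getpid() before a monitored call looks roughly like the function below. This is only an illustration of run-time behavior under the assumption that the monitored API is write(); the real change is made in LLVM IR, and `CallRecord`/`instrumentedWrite` are hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <unistd.h>

// Hypothetical record of one monitored call at run time.
struct CallRecord {
  pid_t pid;    // value returned by the injected getpid()
  ssize_t ret;  // return value of the monitored call
};

// Source-level picture of an instrumented call site: getpid() is inserted
// immediately before the original system call API function.
CallRecord instrumentedWrite(int fd, const std::string &msg) {
  CallRecord r;
  r.pid = getpid();                           // injected call
  r.ret = write(fd, msg.data(), msg.size());  // original call
  return r;
}
```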
4.3. Implementation of monitoring information formatted output module
The main function of the monitoring information formatted output module is to obtain, at run time, the actual values of the symbolic information recorded by the call information collection module and to store them persistently as highly readable formatted text. The main tasks of the module are therefore:
(1) Determine the type of each item collected by the call information collection module and generate suitable formatted text for it.
(2) Persistently store the converted formatted text.
(1) For task 1
The system call name and system call number saved by the call information collection module are predefined, so their data types are known: a string and an integer, respectively. The formatted text for these items therefore has a fixed format.
The system call parameters, the system call return value, and the number of the process to which the system call instruction belongs are all Value pointers in LLVM IR. In LLVM Pass programming, all such values are instances of subclasses of the Value class; the type of each can be obtained with the Value::getType() method, and a selection branch is then built with Type::isIntegerTy(), Type::isFloatTy(), Type::isPointerTy() and similar type predicates, mapping each data pointer to the corresponding formatted text in turn, so that the formatted text adapts dynamically to different data sizes and data types.
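The type-driven mapping from values to formatted text can be sketched as choosing a printf-style conversion per value. The `Kind` enum and `TypeInfo` struct are hypothetical stand-ins for what the `Type::is...Ty()` predicates report, and the record layout produced by `buildFormat` is an assumed example, not the layout of the authors' log.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical summary of a value's LLVM type.
enum class Kind { Integer, Float, Double, Pointer, Other };

struct TypeInfo {
  Kind kind;
  unsigned bits;  // integer width; ignored for other kinds
};

// Chooses a printf-style conversion for one collected value. In IR, integers
// narrower than 32 bits must be extended to i32, and float values extended
// to double, before being passed to the variadic fprintf.
std::string formatSpecifier(const TypeInfo &t) {
  switch (t.kind) {
    case Kind::Integer:
      return t.bits > 32 ? "%lld" : "%d";
    case Kind::Float:
    case Kind::Double:
      return "%f";
    case Kind::Pointer:
      return "%p";
    default:
      return "<unknown>";
  }
}

// Fixed part (name and number, known at compile time) followed by
// conversions for the run-time values (arguments, return value, PID).
std::string buildFormat(const std::string &name, int no,
                        const std::vector<TypeInfo> &args, const TypeInfo &ret) {
  std::string fmt = name + "(" + std::to_string(no) + "): args=";
  for (std::size_t i = 0; i < args.size(); ++i) {
    if (i > 0)
      fmt += ", ";
    fmt += formatSpecifier(args[i]);
  }
  fmt += " ret=" + formatSpecifier(ret) + " pid=%d\n";
  return fmt;
}
```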
(2) For task 2
The formatted readable text is persisted by writing it to a file, so the main implementation work is injecting the file-operation functions. This is divided into two parts: function declaration injection and function call injection.
In LLVM Pass programming, the core method for injecting the declaration of a C standard library function is Module::getOrInsertFunction(), which must be given the function type of the declared function. The function type is the signature of the function, comprising its parameter types and return type, and is constructed with the FunctionType::get() method. This module injects declarations of the fopen(), fprintf() and fclose() functions so that calls to these file operations can be injected later.
Function call injection in the formatted output module inserts instructions that call the log file operation functions into the source LLVM IR bitcode file. As with the injection of getpid() in the call information collection module, a Builder is first created for each injection point: the call to fopen() is injected at the start of the main function, the call to fclose() immediately before the return instruction of the main function, and the call to fprintf() at the output injection point stored by the information collection module. The calls to the file operation functions are then injected with IRBuilder's CreateCall() method. Unlike the getpid() call, the injected file operation calls also need their arguments; these are placed in order into a variable of type ArrayRef<Value *>, which is passed as the second argument of CreateCall(). The arguments must be values that already exist in the LLVM IR bitcode; new argument values can also be created by inserting global variables with the Module::getOrInsertGlobal() method.
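The combined run-time effect of the three injected file operations can be pictured at source level as below. This is a sketch, not the Pass: keeping the log handle in a global (`g_log`) is an assumption that matches the mention of inserted global variables, and the helper names and record layout are hypothetical.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical global holding the log handle across injection points.
static FILE *g_log = nullptr;

// Corresponds to the fopen() call injected at the start of main.
bool logOpen(const char *path) {
  g_log = std::fopen(path, "w");
  return g_log != nullptr;
}

// Corresponds to an fprintf() call injected after a monitored call.
void logRecord(const std::string &name, int no, long long ret, int pid) {
  if (g_log)
    std::fprintf(g_log, "%s(%d) ret=%lld pid=%d\n", name.c_str(), no, ret, pid);
}

// Corresponds to the fclose() call injected before main's return instruction.
void logClose() {
  if (g_log) {
    std::fclose(g_log);
    g_log = nullptr;
  }
}
```

Opening once and closing once keeps the per-call cost to a single fprintf(), and checking the handle lets a failed fopen() degrade to no logging instead of a crash.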
5. System call monitoring system testing
This section verifies and tests the implemented system in terms of functionality and system overhead, and briefly summarizes the test results.
5.1. Environment configuration for monitoring system testing
Table 1 Configuration of the test machine
(1) LLVM project
The core of the monitoring system implemented here is a Pass component of the LLVM compiler. The LLVM project source version used is 11.0, and the accompanying clang front-end version is 6.0. Not only the compiled LLVM executables but also the library files of the LLVM project are needed, so LLVM cannot be installed from the Ubuntu software repositories; the complete LLVM project source code is required instead. Because LLVM releases iterate quickly and differ considerably between versions, it is recommended to use git to download the source code of the matching version from the official LLVM repository on GitHub.
(2) CMake
After the Pass has been written, it must be built into a dynamic shared library that the LLVM intermediate code optimization tool can load to inject the monitoring logic. The build tool used is CMake, version 3.20.
CMake uses project description files, conventionally named CMakeLists.txt, to specify the libraries to link, compilation parameters and so on. CMake can be installed from the Ubuntu software repositories, or the source code of the corresponding version can be downloaded with git from the open-source repository and built locally.
5.2. Functional testing of monitoring system
(1) Building the monitoring system Pass
The Pass directory mainly contains the include, lib, build and test folders. include holds some header files used by the Pass; lib holds the main source code of the Pass, where all the monitoring logic is implemented; build holds the compiled output, mainly the dynamic shared library of the Pass; and test contains C test sample code for checking the monitoring effect.
CMakeLists.txt files are provided in the Pass root directory and in the lib directory; they import the library files provided by the LLVM project, specify output locations and so on, so the CMake tool can be run from the root directory. CMake automatically generates build files according to the CMakeLists.txt configuration, after which the make command compiles the required dynamic link library. The resulting dynamic shared library is located at build/lib/libScout.
(2) Monitoring effect of the system
To test how well the system monitors system call functions, test sample programs covering 10 system calls were selected.
Table 2 System call monitoring test cases
First, the test source code is compiled into an IR bitcode file with the clang front end, and the monitoring Pass described here is then loaded with LLVM's opt tool. The opt command must specify the registered Pass name (here, scout) and the location of the dynamic link library containing the Pass. The result is an IR bitcode file processed by the Pass; it can either be interpreted in JIT form with the lli tool, or converted into an assembly file with the llc tool and then built into an executable. Compilation and execution were chosen here because the brk() test case produces a stack error when executed with the LLVM JIT.
After the program has run, all monitoring information is written to monitor.log in the test sample directory; it shows that the system call functions used in the test samples, such as time, open, read, write and close, are captured by the monitoring system and their run-time information is written to the log file monitor.log.
Table 3 Contents of the log file monitor.log

Claims (1)

1. A Linux system call monitoring method based on compiler code injection, characterized by comprising the following steps: overall design of the monitoring system modules, scheme flow design of the monitoring system, specific design of each module of the monitoring system, implementation of the system call monitoring system, and testing of the system call monitoring system, the testing verifying the functionality and system overhead of the implemented system; specifically:
1) The overall design of the monitoring system modules builds a mechanism that locates system call executions in real time while an application program runs and provides call-related information to security personnel, so that they can directly observe the system call behavior of the application; the functional requirements of the system are therefore:
(1) Locate the starting points used by the monitoring system for all system call functions in the application program, i.e., discover the system call functions; because C library functions still appear only as function interfaces at the LLVM IR stage and their concrete implementations are loaded later during linking, the monitoring system takes the user API functions corresponding to system calls as the monitored objects;
(2) Collect real-time information about called functions, the execution information of a function comprising its name, parameters and return value; the run-time information of the executable and the related process information must also be collected comprehensively;
(3) Output the collected information in a user-readable form: because most system call parameters are pointers whose pointed-to content is not preserved and is hard to read, the collected information must be output in a user-readable form, which facilitates log keeping by security personnel;
The monitoring scheme achieves the overall monitoring effect through injection at the IR optimization stage of the LLVM compiler, and the system consists of three modules: a system call search module, a call information collection module and a monitoring information formatted output module;
the system call search module is responsible for finding the monitoring targets of the monitoring system, at the granularity of system call API functions; the call information collection module is responsible for collecting all information about a function call and the related process information; the monitoring information formatted output module is responsible for formatting the information collected while the program runs and providing it for analysis;
2) In the scheme flow design, the core processing of the monitoring system takes place in the LLVM IR intermediate optimization stage of the LLVM compiler; all processing logic is loaded into the intermediate optimization as a registered Pass, and the object operated on is an LLVM IR bitcode file; in chronological order, processing is divided into a search stage, an information collection stage and an information output stage, which correspond to the three functional modules of the scheme; the output is an LLVM IR bitcode file with the monitoring logic injected, and the toolchain provided by the LLVM project is used to generate the final executable from the injected bitcode file;
3) The specific design of each module of the monitoring system defines the operational details of each module;
A. the system call search module is the starting point of the whole monitoring system and determines its monitoring granularity; based on the compilation flow characteristics of LLVM and the form in which system calls appear in LLVM IR, the system call API functions are set as the monitoring starting points; the system call search module analyzes the function call instructions in the LLVM IR bitcode file, determines whether each call is a call to a system call API function, and executes further monitoring logic according to the result;
(1) Parse the LLVM IR bitcode file and search it for function call instructions;
(2) If the call is not a call to a system call API function, skip the call instruction and continue searching for the next function call instruction;
(3) If the call is a call to a system call API function:
a. obtain the called API function from the function call instruction;
b. pass the called API function instance to the system call information collection module;
c. after the information collection module has finished processing, continue searching for the next call instruction;
B. The call information collection module is the core of the whole system call monitoring system. It is the part that gathers information about the system calls a program makes while it runs. It processes each system call API function passed to it and determines the system call number and parameters of the call.
The information to be obtained is divided into system call information, runtime API function information and related process information:
- the system call information comprises the system call name and the system call number;
- the runtime API function information comprises the parameters passed to the system call and its return value;
- the related process information is the ID of the process that initiates the system call.
The logic is as follows:
(1) Receive the system call API function passed in by the system call search module;
(2) Analyze the system call information of the called function: identify the system call and obtain the corresponding system call number;
(3) Analyze the runtime information of the called function and save the input parameters and the return value;
(4) Inject code that obtains the ID of the process initiating the call, and save the resulting process ID;
Because the runtime API function information and the related process information only exist while the program is running, both are saved at instrumentation time as symbolic values (references to values in the IR). Their actual contents are obtained when the program runs;
C. The monitoring information formatted output module is the result display part of the monitoring system. It outputs the monitoring information collected while the program runs in a user-readable form and stores it persistently, which makes later analysis of the application's system call behavior easier.
Formatted output of the monitoring information is divided into formatted text generation and file output. To make the monitoring results easy to analyze, the type of each piece of collected information is determined first, and corresponding formatted text is generated according to that type. The formatted text is then stored persistently as a file in the program's working directory;
4) Implementation of the System Call Monitoring System
4.1 Implementation of the system call search module. The system call search module searches the LLVM IR bitcode file for call instructions that initiate function calls and determines whether each called function is a system call API function. If it is, the called function instance is passed to the call information collection module and the subsequent flow is carried out. If not, the search continues with the next instruction until all instructions in the LLVM IR file have been traversed. The tasks of the system call search module are:
a. Traverse all instructions in the LLVM IR bitcode file;
b. Obtain the instance of the function called by each function call instruction;
c. Determine whether the called function is a system call API function;
The tasks are implemented as follows:
(1) For task a: in LLVM, a Pass processes a Module made up of one or more LLVM IR bitcode units. In the Pass source file, the Pass must be registered with the intermediate optimization tool opt through the PassManager interface. A Pass subclass that implements a run() member method is passed to the addPass() method of the ModulePassManager class. The LLVM infrastructure then passes opt's input, i.e. the Module to be processed, to that run() method.
At this point the Module instance of the whole bitcode file is available, and its instructions can be searched starting from that instance. The /llvm/IR/Module.h file shows that the Module class implements an iterator over the functions it contains, so the functions of the incoming Module can be traversed with a for statement. Likewise, the Function class implements an iterator over its basic blocks, and the BasicBlock class implements an iterator over its Instruction objects. Therefore, starting from the incoming Module instance, all instructions in the LLVM IR bitcode file can be traversed with three nested for loops.
Some C standard library functions appear in the IR only as declarations, without a function body. These declarations must be skipped with the Function::isDeclaration() method when traversing the functions;
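The traversal in task a can be modelled without LLVM as below. The toy Module, Function, BasicBlock and Instruction structs are hypothetical stand-ins that only mimic the nesting of the real llvm:: classes. In an actual pass, the three loops would iterate over the llvm::Module passed to run(), and the declaration check would be Function::isDeclaration().

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy stand-ins (hypothetical) for llvm::Instruction, llvm::BasicBlock,
// llvm::Function and llvm::Module; only the nesting matters here.
struct Instruction { std::string text; };
struct BasicBlock  { std::vector<Instruction> insts; };
struct Function {
    std::string name;
    std::vector<BasicBlock> blocks;   // empty for a declaration-only function
    bool isDeclaration() const { return blocks.empty(); }
};
struct Module { std::vector<Function> functions; };

// Visits every instruction of every defined function, mirroring
// for (Function &F : M) for (BasicBlock &BB : F) for (Instruction &I : BB).
template <typename Visitor>
void forEachInstruction(const Module& m, Visitor visit) {
    for (const Function& f : m.functions) {
        if (f.isDeclaration()) continue;   // e.g. a library function only declared in the IR
        for (const BasicBlock& bb : f.blocks)
            for (const Instruction& inst : bb.insts)
                visit(f, inst);
    }
}
```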
(2) For task b: in LLVM IR, the Instruction class is the base class of all instruction classes, and each kind of instruction corresponds to a different subclass of Instruction. The subclass corresponding to the function call instruction call is the CallInst class.
For Pass development, LLVM provides the dyn_cast<>() template for instance type conversion. If the conversion succeeds, the converted instance is returned; if it fails, a null pointer is returned. Function call instructions are therefore identified by attempting to convert each Instruction instance to a CallInst instance. Once the CallInst instance for a function call instruction has been obtained, its getCalledFunction() method returns the function instance called by the instruction;
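The dyn_cast-based identification in task b is sketched below with a hypothetical miniature class hierarchy. dynCast is an illustrative stand-in built on C++ dynamic_cast; llvm::dyn_cast<> itself uses LLVM's own classof-based type information. In real LLVM, getCalledFunction() returns a null pointer for indirect calls (calls through a function pointer), so its result needs a null check before the callee's name is read.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical miniature of LLVM's hierarchy: Instruction is the base class and
// CallInst is the subclass used for "call" instructions.
struct Function { std::string name; };

struct Instruction {
    virtual ~Instruction() = default;
};
struct BinaryOperator : Instruction {};
struct CallInst : Instruction {
    explicit CallInst(const Function* fn) : callee(fn) {}
    // Plays the role of CallInst::getCalledFunction().
    const Function* getCalledFunction() const { return callee; }
private:
    const Function* callee;
};

// Stand-in for llvm::dyn_cast<To>(From*): the converted pointer on success,
// nullptr on failure.
template <typename To, typename From>
const To* dynCast(const From* v) { return dynamic_cast<const To*>(v); }

// Collects the callees of all call instructions and ignores everything else.
std::vector<const Function*> calledFunctions(
        const std::vector<std::unique_ptr<Instruction>>& insts) {
    std::vector<const Function*> out;
    for (const auto& inst : insts)
        if (const CallInst* call = dynCast<CallInst>(inst.get()))
            out.push_back(call->getCalledFunction());
    return out;
}
```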
(3) For task c: after the Function instance of the called function has been obtained, its name is read with the Function::getName() method. The system call API functions to be searched for, and therefore their names, are known in advance. A name set of system call APIs is therefore set up, and the list of system call API functions is loaded into it as a Map data structure when the monitoring system is initialized. The name of the called function is then looked up in this set. If the lookup succeeds, the called function is a system call API function and its function instance is passed to the call information collection module. Otherwise it is not a target function, and traversal continues with the next instruction;
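Below is a minimal sketch of the name set in task c, assuming a std::map from API name to system call number. The table contents are an illustrative assumption: the numbers are the x86-64 Linux syscall numbers for the functions named in the test section. syscallApiTable and lookupSyscall are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Name set of monitored system call API functions, built once at start-up.
// Keys are API names, values are syscall numbers (x86-64 Linux, assumed here).
const std::map<std::string, long>& syscallApiTable() {
    static const std::map<std::string, long> table = {
        {"read", 0}, {"write", 1}, {"open", 2}, {"close", 3},
        {"brk", 12}, {"time", 201},
    };
    return table;
}

// Returns the syscall number if `name` (as obtained from Function::getName())
// is a monitored system call API, or std::nullopt if the call should be skipped.
std::optional<long> lookupSyscall(const std::string& name) {
    const auto& table = syscallApiTable();
    auto it = table.find(name);
    if (it == table.end()) return std::nullopt;
    return it->second;
}
```

Keeping the number as the map value means one lookup serves both task c here and task a of the collection module.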
4.2. Implementation of the call information collection module. The call information collection module processes and analyzes each system call API function passed to it. It collects and assembles all the relevant information the user is interested in, and then locates the injection point for the output module. The information to be collected is divided into system call information, runtime function information and related process information. The tasks of the call information collection module are therefore:
a. Obtain the attribute information of the system call, comprising the system call name and the system call number;
b. Obtain the runtime information of the system call API function, comprising the input parameters, the system call return value and the call instruction pointer;
c. Obtain information about the process that initiates the system call, namely the ID of the process executing the system call API function.
A SystemCallInfo class is defined in the call information collection module to represent the information collected for each system call, and all of that information is stored in its member variables. When a system call function instance has been processed, everything obtained is held in a SystemCallInfo instance;
(1) For task a: the system call search module stores the predefined set of system call API functions in a Map data structure. Its keys are system call names and its values are the corresponding system call numbers. Looking up the key-value pair in this set therefore yields the system call name and number used in the call. They are stored in the callName and callNo member variables of the SystemCallInfo instance respectively;
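The SystemCallInfo record and task a might look like the following sketch. callName, callNo, callArgs and callRet come from the text; callPid, ValueRef and fillSyscallAttributes are hypothetical. Values that are only known at run time are modelled as strings naming IR values (such as "%fd"). This shows that the collector stores references rather than actual contents.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Symbolic reference to an LLVM IR value, e.g. "%fd" or "%call"; its concrete
// content only exists at run time (hypothetical stand-in for llvm::Value*).
using ValueRef = std::string;

// Per-system-call record. callName/callNo/callArgs/callRet follow the text;
// callPid is a hypothetical name for the injected getpid() result.
struct SystemCallInfo {
    std::string callName;
    long callNo = -1;
    std::vector<ValueRef> callArgs;
    ValueRef callRet;
    ValueRef callPid;
};

// Task a: fill in the name and number from the name -> number table.
// Returns false if the callee is not a monitored system call API.
bool fillSyscallAttributes(SystemCallInfo& info, const std::string& callee,
                           const std::map<std::string, long>& table) {
    auto it = table.find(callee);
    if (it == table.end()) return false;
    info.callName = it->first;
    info.callNo = it->second;
    return true;
}
```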
(2) For task b: in LLVM IR, the arguments of a function call appear as operands of the call instruction. The operands of the located call instruction are therefore traversed to obtain all the arguments passed to the system call API function. The CallInst class provides the op_begin() and op_end() methods, which return iterators to the first operand and to one past the last operand. Using them as the bounds of the traversal yields the runtime arguments of the system call function. Note that a CallInst also stores the called function as its last operand, and that operand is not an argument. Because different system call API functions take different numbers of arguments, a C++ vector is used to declare the callArgs member variable, and the arguments found during traversal are appended to it in order;
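The argument collection in task b can be modelled as below. CallInstModel is a hypothetical stand-in whose op_begin()/op_end() mimic the real iterators. It follows the actual LLVM operand layout, ignoring operand bundles: a call's arguments come first and the called function is the last operand. In a real pass, CallBase::args() (or arg_begin()/arg_end()) covers exactly the arguments and avoids the manual adjustment shown here.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical model of a call instruction's operand list: arguments first,
// the called function last (e.g. {"%path", "%flags", "@open"}).
struct CallInstModel {
    std::vector<std::string> operands;

    std::vector<std::string>::const_iterator op_begin() const { return operands.begin(); }
    std::vector<std::string>::const_iterator op_end() const { return operands.end(); }
};

// Copies the runtime arguments into a vector (the role of callArgs); the vector
// grows to whatever arity the particular system call API function has.
std::vector<std::string> collectCallArgs(const CallInstModel& call) {
    std::vector<std::string> callArgs;
    auto end = call.op_end();
    if (end != call.op_begin()) --end;   // drop the trailing callee operand
    for (auto it = call.op_begin(); it != end; ++it)
        callArgs.push_back(*it);
    return callArgs;
}
```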
Meanwhile, in LLVM Pass programming an Instruction is itself a subclass of Value, so a pointer to an instruction instance can be used directly as the pointer to the instruction's result value. The located call instruction pointer can therefore be stored directly, as the system call return value, in the callRet member variable of the SystemCallInfo instance. Because all the information of this task is stored as symbolic values, the logic that reads their actual values must be injected into the source LLVM IR bitcode during later output. The pointer to the injection point of the formatted output module is therefore also stored in the information collection module. The instruction following the instruction that contains the system call is used as the injection point, for two reasons:
- first, the return value is only determined after the system call has executed, so the injection point must come after the system call instruction;
- second, because the stored runtime information consists of symbolic values, the injection point should be as close to the system call as possible, to minimize the chance that those values change during subsequent execution;
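The choice of injection point can be illustrated on a straight-line block modelled as a list of instruction strings. Block and injectAfterCall are hypothetical. In LLVM, the corresponding position would be the call instruction's next instruction (Instruction::getNextNode()). A builder created with that instruction inserts new code in front of it, i.e. directly after the call.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical straight-line basic block as a list of instruction strings.
using Block = std::vector<std::string>;

// Inserts `logInst` right after the instruction at `callIndex`, i.e. in front of
// the call's next instruction. The call's return value already exists there, and
// nothing else runs between the call and the logging code.
void injectAfterCall(Block& bb, std::size_t callIndex, const std::string& logInst) {
    assert(callIndex < bb.size());
    bb.insert(bb.begin() + static_cast<std::ptrdiff_t>(callIndex) + 1, logInst);
}
```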
(3) For task c: the system call monitoring runs in user mode on Linux, so process-related information can only be obtained through the getpid() system call function. To identify the process executing the call instruction of the system call API function as accurately as possible, a getpid() call is injected next to that call instruction. It obtains the ID of the process in which the instruction executes;
In LLVM Pass programming, the core facility for injecting content into the IR is the CreateCall() method of IRBuilder. Concretely, an IRBuilder instance (the builder) is first created for the injection point. The builder can be initialized in two ways:
- passing a BasicBlock instance makes the injection point the end of that basic block;
- passing an Instruction instance makes the injected content be inserted before that instruction.
The second initialization method is used here, so the getpid() call is injected immediately before the monitored system call instruction. CreateCall() takes the function to be injected as a FunctionCallee instance. The other implementation details of injecting a function call are given in the implementation of task b of the next module;
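At source level, the effect of injecting getpid() before a monitored call can be pictured with the following sketch. It is ordinary C++ calling the POSIX getpid() and open() functions, not generated code. instrumentedOpen and MonitoredCall are hypothetical names used only to show the ordering: the process ID is read first, then the original call runs.

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

// Values the output module would later log for one monitored call.
struct MonitoredCall {
    pid_t pid;
    int ret;
};

// Hypothetical source-level picture of the instrumented IR for one open() call.
MonitoredCall instrumentedOpen(const char* path, int flags) {
    pid_t pid = getpid();          // injected: builder positioned before the call
    int fd = open(path, flags);    // original system call API call
    return {pid, fd};
}
```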
4.3. Implementation of the monitoring information formatted output module. At run time, this module obtains the actual values of the symbolic information recorded by the call information collection module and stores them persistently as highly readable formatted text. The tasks of the module are therefore:
a. Determine the type of each piece of data obtained by the call information collection module, and generate and store suitable formatted text for it;
b. Store the resulting formatted text persistently;
(1) For task a: the system call name and system call number saved by the call information collection module are predefined. Their data types are therefore known, a string and an integer respectively, and the formatted text for them has a fixed format;
The system call arguments, the system call return value and the ID of the process executing the system call instruction are all Value pointers in LLVM IR. In LLVM Pass programming these are all instances of subclasses of Value. The type of each value is obtained with the Value::getType() method. A selection branch is then built with type tests such as Type::isIntegerTy() and Type::isFloatTy(), which map each data pointer in turn to the corresponding formatted text. In this way the formatted text adapts dynamically to different data sizes and data types;
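The type-driven choice of formatted text can be sketched as follows. TypeKind, TypeInfo, formatSpecifier and buildFormat are hypothetical; the enum stands in for the answers given by Type::isIntegerTy(), Type::isFloatTy() and similar predicates. The specific fprintf conversions are an assumption. For example, %ld for 64-bit integers holds on x86-64 Linux.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical summary of what the Type predicates report for one logged value.
enum class TypeKind { Integer, Float, Double, Pointer, Other };

struct TypeInfo {
    TypeKind kind;
    unsigned bits;   // integer bit width; ignored for other kinds
};

// Chooses an fprintf conversion for one value. Integers up to 32 bits use %d
// (the IR would extend narrower ones to 32 bits before the variadic call).
// float uses %f because variadic arguments are promoted to double, so the IR
// must extend it with fpext first.
std::string formatSpecifier(const TypeInfo& t) {
    switch (t.kind) {
        case TypeKind::Integer: return t.bits <= 32 ? "%d" : "%ld";
        case TypeKind::Float:
        case TypeKind::Double:  return "%f";
        case TypeKind::Pointer: return "%p";
        default:                return "?";
    }
}

// Fixed "name(number):" part followed by one conversion per runtime value.
std::string buildFormat(const std::string& name, long no,
                        const std::vector<TypeInfo>& values) {
    std::string fmt = name + "(" + std::to_string(no) + "):";
    for (const TypeInfo& v : values) fmt += " " + formatSpecifier(v);
    return fmt + "\n";
}
```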
(2) For task b: the readable formatted text is stored persistently by writing it to a file, which requires injecting the file operation functions. This is divided into two parts: function declaration injection and function call injection;
In LLVM Pass programming, the core facility for injecting the declaration of a C standard library function is the Module::getOrInsertFunction() method. It must be given the function type of the declared function. The function type is the function's signature, i.e. its parameter types and return type, and is constructed with the FunctionType::get() method. Declarations of the fopen(), fprintf() and fclose() functions are injected into the module so that calls to these file operations can be injected later;
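The declarations injected in task b can be illustrated by rendering their signatures as LLVM IR declare lines. Signature, renderDeclare and kFileOps are hypothetical. FILE* is modelled as i8*, a common simplification with LLVM 11's typed pointers that is an assumption here. The real pass would build each signature with FunctionType::get() rather than with strings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical description of a signature: the information FunctionType::get()
// bundles together (return type, parameter types, variadic flag).
struct Signature {
    std::string name;
    std::string ret;
    std::vector<std::string> params;
    bool isVarArg;
};

// Renders the "declare" line that getOrInsertFunction() would add to the module
// when the function is not declared yet.
std::string renderDeclare(const Signature& s) {
    std::string out = "declare " + s.ret + " @" + s.name + "(";
    for (std::size_t i = 0; i < s.params.size(); ++i)
        out += (i ? ", " : "") + s.params[i];
    if (s.isVarArg) out += s.params.empty() ? "..." : ", ...";
    return out + ")";
}

// The three file operation functions needed by the output module.
const std::vector<Signature> kFileOps = {
    {"fopen", "i8*", {"i8*", "i8*"}, false},
    {"fprintf", "i32", {"i8*", "i8*"}, true},
    {"fclose", "i32", {"i8*"}, false},
};
```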
Function call injection in the formatted output module injects instructions that call the log file operation functions into the source LLVM IR bitcode file. First, a builder is created for each injection point:
- the injection point of the fopen() call is the start of the main function;
- the injection point of the fclose() call is immediately before the return instruction of the main function;
- the injection point of the fprintf() calls is the information output injection point stored by the information collection module.
The calls to the file operation functions are then injected with the IRBuilder::CreateCall() method. Unlike the getpid() call, a file operation call also needs the arguments passed to the function. These are placed in order in a variable of type ArrayRef<Value *>, which is passed as the second argument of CreateCall(). The arguments must be values that already exist in the LLVM IR bitcode, so new argument content is created by inserting global variables with the Module::getOrInsertGlobal() method;
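A source-level view of the injected file operations is sketched below. It assumes a single monitored call whose values are hard-coded for illustration. instrumentedMain and the log line format are hypothetical, and the null check on fopen() is an addition made in this sketch. In the instrumented IR, the format string would be a global string constant and the logged values would be the Value pointers saved by the collector.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical source-level view of the injected file operations:
//   fopen()   - injected at the start of main()
//   fprintf() - injected at each output injection point stored by the collector
//   fclose()  - injected just before main()'s return instruction
int instrumentedMain(const char* logPath) {
    std::FILE* log = std::fopen(logPath, "w");                 // injected
    if (log == nullptr) return 1;                              // sketch-only safety check

    long ret = 42;   // stand-in for a monitored call's return value
    std::fprintf(log, "%s(%d) ret: %ld\n", "write", 1, ret);   // injected

    std::fclose(log);                                          // injected
    return 0;                                                  // original return
}
```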
5) Testing of the system call monitoring system: the functionality and overhead of the implemented system are verified and tested;
5.1. Environment configuration for testing the monitoring system: CPU: Intel(R) Core(TM) i5 CPU @ 2.70GHz,
(1) LLVM project: the core of the monitoring system is a Pass component of the LLVM compiler. The LLVM project source version used is 11.0, and the accompanying clang front end is version 6.0. Both the compiled LLVM executables and the library files of the LLVM project are needed, so the source code of the whole LLVM project is required. Because new LLVM versions are released frequently and differ considerably from one another, it is recommended to download the source code of the required version from the official LLVM repository on GitHub with git;
(2) CMake: the monitoring Pass is built as a dynamic shared library, which the LLVM intermediate-code optimization tool loads to inject the monitoring logic. The build tool used is CMake, version 3.20;
CMake uses process description files named CMakeLists.txt to specify the library files to link and the compilation parameters used during the build. It can be installed from the Ubuntu open-source software repositories, or the source code of the required version can be downloaded with git and built locally;
5.2. Functional testing of the monitoring system
(1) Building the monitoring system Pass: the Pass directory contains include, lib, build and test folders:
- include holds some of the header files used by the Pass;
- lib holds the main source code of the Pass, which contains the body of all the monitoring logic;
- build holds the compiled output, and the compiled dynamic shared library file is the Pass itself;
- test contains C test sample code for checking the monitoring effect.
CMakeLists.txt files are provided in the Pass directory and in the lib directory. They bring in the library files provided by the LLVM project and specify output locations, so the Pass can be built from the main directory with the cmake tool. cmake automatically generates the build files according to the provided CMakeLists.txt configuration, after which the required dynamic link library is compiled with the make command. The resulting dynamic shared library file is located at build/lib/libScout.
(2) Monitoring effect: to test how well the monitoring system monitors system call functions, test sample programs covering 10 system calls were selected;
First, the test source code is compiled into bitcode IR files with the clang front end, and the monitoring Pass described here is loaded with the opt tool provided by LLVM. The opt command must specify the registered Pass name, here scout, and the location of the dynamic link library containing the Pass. Once loading is complete, a bitcode IR file processed by the Pass is obtained. The processed bitcode can then be run in one of two ways:
- interpreted and executed in JIT mode with the lli tool;
- converted into an assembly file with the llc tool and finally compiled into an executable.
Because the tested brk() function can cause a stack error when executed with the LLVM JIT, compiled execution was chosen;
After the program has executed, all monitoring information is written to monitor.log in the test sample directory. The time, open, read, write and close system call functions used in the test sample are captured by the monitoring system, and their runtime information is written to the log file monitor.log.
CN202111027217.8A 2021-09-02 2021-09-02 Linux system call monitoring method based on compiler code injection Active CN113835952B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111027217.8A CN113835952B (en) 2021-09-02 2021-09-02 Linux system call monitoring method based on compiler code injection

Publications (2)

Publication Number Publication Date
CN113835952A CN113835952A (en) 2021-12-24
CN113835952B true CN113835952B (en) 2024-03-15

Family

ID=78962070

Country Status (1)

Country Link
CN (1) CN113835952B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant