CN116126305A - Program segmentation system for data stream - Google Patents

Publication number
CN116126305A
Authority
CN
China
Prior art keywords
program
memory
module
log
function
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310175547.4A
Other languages
Chinese (zh)
Inventor
韩皓
赵晨余
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics
Priority to CN202310175547.4A
Publication of CN116126305A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/30 Creation or generation of source code
    • G06F8/31 Programming languages or programming paradigms
    • G06F8/315 Object-oriented languages
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/30 Creation or generation of source code
    • G06F8/31 Programming languages or programming paradigms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/40 Transformation of program code
    • G06F8/41 Compilation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44 Arrangements for executing specific programs
    • G06F9/445 Program loading or initiating
    • G06F9/44521 Dynamic linking or loading; Link editing at or after load time, e.g. Java class loading
    • G06F9/44526 Plug-ins; Add-ons
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a data-flow-oriented program segmentation system that hardens C/C++ programs at low cost so that they resist data flow attacks. The system comprises a program instrumentation module, a log analysis module, and a program rewriting module, all implemented as compiler plug-ins under the LLVM framework. The original program is first instrumented by the program instrumentation module, and the instrumented program is run multiple times to obtain a run log. The log analysis module then analyzes the original program together with the run log, locates the data flows, generates a segmentation scheme from them, and collects the auxiliary information needed for rewriting. Finally, the program rewriting module rewrites the original program according to the segmentation scheme and the auxiliary information, yielding a segmented multi-process program whose functionality is unchanged.

Description

Program segmentation system for data stream
Technical Field
The invention relates to the fields of system security and program segmentation, and in particular to a data-flow-oriented program segmentation system.
Background
Current memory attacks on programs can be divided into control flow attacks and data flow attacks. A control flow attack places specific data in program memory through the program's input; when execution reaches the corresponding vulnerability, this data causes the program to make a wrong function call or to enter a path that should be unreachable, and thereby to perform certain privileged operations. As detection and defense techniques against control flow attacks have matured, such attacks have become increasingly difficult to carry out. Against this background, the defining feature of a data flow attack is that it does not change the control flow of the program, which makes it difficult to detect with traditional methods. Such an attack generally combines vulnerabilities in the program with its normal data flows (a data flow is an abstract concept describing the movement of data between instructions while the program runs) to construct data flows that should not exist. Data flow attacks are mostly used to steal sensitive data from the program, but they can also influence privileged operations by changing the arguments of function calls. They are typically launched through the input of the program.
Program segmentation is a program hardening technique that divides a monolithic program into several program blocks. The blocks are placed on different machines or run in different processes, so that process or network isolation exists between them, and each block contains only the data it needs to run. Because the data flows available within each block are limited, data flow attacks become difficult to carry out, which prevents an attacker from stealing sensitive data or abusing privileged operations.
Current program segmentation methods generally segment around sensitive data or privileged operations. The user manually marks the sensitive data or privileged operations to be protected, and a segmentation algorithm divides the original program into several program blocks accordingly, some of which contain the parts to be protected, with process or network isolation between the blocks. Most methods divide the program into only two blocks, one of which contains everything to be protected, and perform no further division. Most methods also require the user to mark the sensitive data or privileged operations manually and to rewrite the program manually once the segmentation scheme has been determined.
Disclosure of Invention
The technical problem addressed by the invention is to provide a data-flow-oriented program segmentation system whose segmented programs effectively resist data flow attacks. The system comprises a program instrumentation module, a log analysis module, and a program rewriting module;
the program instrumentation module instruments the original program by inserting instructions (LLVM IR instructions) into the program code, so that the executable obtained by compiling and assembling it produces, at run time, the run log required by the subsequent analysis;
the log analysis module analyzes the original program together with the run log, locates the data flows, generates a segmentation scheme through taint analysis, and provides the auxiliary information needed by the program rewriting module;
the program rewriting module rewrites the original program according to the segmentation scheme and the auxiliary information, turning it into a multi-process program in which the different program blocks run in different processes while the functionality of the program remains unchanged.
The program instrumentation module, the log analysis module, and the program rewriting module are implemented as compiler plug-ins under the LLVM framework (LLVM is a compiler infrastructure framework). They can be invoked through LLVM's opt command or through an equivalent command provided by a front-end compiler, and they analyze and process the program after it has been compiled to LLVM IR.
The run log is a file composed of a large number of log entries describing the execution of the program, and contains the following information:
the execution of memory read, write, allocation and deallocation instructions, together with the memory addresses involved;
function calls and returns, including the execution of the call instruction and the addresses held by pointer-type parameters and pointer-type return values, and, for calls to API functions, whether the memory referenced by pointer-type parameters changed during the call;
the memory addresses of global variables and of the command line arguments (the parameters of the main function in the user program);
the system calls related to input and to file descriptors that occur during execution, together with the file descriptors involved.
Data flows are located as follows: if, during a call to an API function, the memory referenced by a parameter changes or the API function has a return value, and an input-related system call is recorded during that call, then the changed memory and the return value are the starting point of a data flow.
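The rule above can be illustrated with a minimal sketch. The `ApiCall` record, its field names, and the set of input-related system calls are assumptions made for illustration; the patent does not specify the log format or the exact list of system calls.

```python
from dataclasses import dataclass, field

# Hypothetical summary of one API call reconstructed from the run log.
@dataclass
class ApiCall:
    name: str
    changed_memory: list = field(default_factory=list)  # parameter-referenced addresses that changed
    return_value: object = None                          # None means the call returned nothing
    syscalls: list = field(default_factory=list)         # system calls recorded during the call

# Assumed set of input-related system calls (illustrative only).
INPUT_SYSCALLS = {"read", "readv", "recv", "recvfrom"}

def data_flow_sources(call):
    """Return the data-flow starting points produced by one API call."""
    # Without an input-related system call, the call starts no data flow.
    if not any(s in INPUT_SYSCALLS for s in call.syscalls):
        return []
    sources = [("mem", addr) for addr in call.changed_memory]
    if call.return_value is not None:
        sources.append(("ret", call.name))
    return sources
```

Under these assumptions, a call to `fread` that triggers a `read` system call marks its buffer and return value as sources, while a `memset` call, which changes memory but performs no input, marks nothing.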
The taint analysis takes the starting points of the data flows as taint sources, treats non-memory-access instructions that modify their operands as sanitization, and has no explicit sink. Within the range covered by the run log it performs a precise analysis: the input data flows required to execute each instruction are derived from the run log and from the data dependences inside each basic block (in program analysis, a data dependence exists between two instructions if one uses a value defined by the other). Outside the range covered by the run log it performs a conservative analysis: all input data flows that each instruction may require are derived from the program dependence graph. The whole analysis is automatic.
The precise analysis processes the logs of each process in the order in which they were generated, building a simulated memory and a simulated call stack along the way. As the log replays the execution of the program, the simulated memory is created, accessed and destroyed following the memory and pointer structures of the program, and the program input contained in each piece of memory and the program input required by each instruction are recorded, which yields the program input required to execute each function.
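A minimal sketch of the replay idea follows. The log entry shapes (`call`, `ret`, `input`, `load`, `store`, `free`) are hypothetical, and the list of source addresses attached to each `store` stands in for the data dependences that the real system derives from the basic block.

```python
def replay(log):
    """Replay a hypothetical run log; return {function: set of input labels it needs}."""
    memory = {}    # simulated memory: address -> labels of the inputs it holds
    stack = []     # simulated call stack of function names
    needed = {}    # function -> input labels required while it executes
    for entry in log:
        kind = entry[0]
        if kind == "call":
            stack.append(entry[1])
            needed.setdefault(entry[1], set())
        elif kind == "ret":
            stack.pop()
        elif kind == "input":            # a data-flow starting point fills this address
            _, addr, label = entry
            memory[addr] = {label}
        elif kind == "load":             # the running function reads tainted or clean memory
            needed[stack[-1]] |= memory.get(entry[1], set())
        elif kind == "store":            # stored value depends on the listed source addresses
            _, addr, sources = entry
            taint = set()
            for src in sources:
                taint |= memory.get(src, set())
            memory[addr] = taint         # overwriting replaces any earlier taint
            needed[stack[-1]] |= taint
        elif kind == "free":             # destroying memory drops its taint
            memory.pop(entry[1], None)
    return needed

log = [
    ("call", "main"),
    ("call", "read_request"),
    ("input", 0x10, "socket"),
    ("load", 0x10),
    ("store", 0x20, [0x10]),
    ("ret",),
    ("call", "log_time"),
    ("store", 0x30, []),                 # constant store, no input involved
    ("ret",),
    ("load", 0x20),
    ("ret",),
]
result = replay(log)
```

In this example `read_request` and `main` both end up requiring the `socket` input, while `log_time` requires none. The conservative analysis for code not covered by the log is not shown.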
In the segmentation scheme, the original program is divided at function granularity into two or more program blocks, each containing only the minimum program input needed for the program to execute correctly. The same function may be placed in two or more program blocks depending on the program input it requires at execution, and functions that use the same file descriptor are always placed in the same program block.
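One way to picture such a scheme is to group functions by the set of inputs they require and then merge groups that share a file descriptor. The sketch below does this with a small union-find; the function names and the `fd_users` mapping are illustrative assumptions, and the context-sensitive duplication of a function into several blocks is deliberately left out.

```python
def partition(func_inputs, fd_users):
    """func_inputs: {function: set of inputs}; fd_users: {descriptor: [functions]}."""
    parent = {fn: fn for fn in func_inputs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path compression
            x = parent[x]
        return x

    def union(a, b):
        parent[find(b)] = find(a)

    # Step 1: functions that need exactly the same inputs share a block.
    by_inputs = {}
    for fn, inputs in func_inputs.items():
        by_inputs.setdefault(frozenset(inputs), []).append(fn)
    for members in by_inputs.values():
        for other in members[1:]:
            union(members[0], other)

    # Step 2: functions using the same file descriptor must share a block.
    for users in fd_users.values():
        for other in users[1:]:
            union(users[0], other)

    blocks = {}
    for fn in func_inputs:
        blocks.setdefault(find(fn), set()).add(fn)
    return sorted(sorted(block) for block in blocks.values())

func_inputs = {
    "main": {"socket"},
    "handle_request": {"socket"},
    "read_config": {"config"},
    "auth_check": {"passwd"},
    "auth_check2": {"passwd"},
    "open_log": set(),
    "write_log": {"socket"},
}
blocks = partition(func_inputs, {"log_fd": ["open_log", "write_log"]})
```

Here `open_log` needs no input on its own, but it is pulled into the block of `write_log` because both use the same descriptor.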
The program rewriting module rewrites the original program according to the segmentation scheme and the auxiliary information. The rewriting comprises:
adding inter-process communication based on the pipes provided by the operating system;
based on inter-process communication, adding code for synchronized execution of system calls across processes: when any program block executes a system call related to file descriptors, user permissions, or process creation and destruction, the other program blocks execute the same system call as well;
based on inter-process communication, adding code for transferring global variables between processes: when a program block needs to access a global variable, the variable is first updated according to the modifications made to it in other program blocks;
adding serialization and deserialization code for function parameters, and adding cross-process function call code based on inter-process communication;
when the program enters the main function, creating two or more processes corresponding to the program blocks of the segmentation scheme, one of which executes the main function while the others wait for inter-process communication.
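The process layout described in this list can be sketched on Linux with `os.fork` and `os.pipe`: one process keeps running the main logic while a worker process waits on a pipe for call requests. The message framing, the use of `pickle`, and the `check_password` function are illustrative assumptions and are not taken from the patent, whose rewritten code is generated at the LLVM IR level.

```python
import os
import pickle
import struct

def send_msg(fd, obj):
    """Write one length-prefixed message (assumes it fits in the pipe buffer)."""
    data = pickle.dumps(obj)
    os.write(fd, struct.pack("!I", len(data)) + data)

def read_exact(fd, n):
    buf = b""
    while len(buf) < n:
        chunk = os.read(fd, n - len(buf))
        if not chunk:
            raise EOFError("pipe closed")
        buf += chunk
    return buf

def recv_msg(fd):
    (length,) = struct.unpack("!I", read_exact(fd, 4))
    return pickle.loads(read_exact(fd, length))

# Hypothetical functions assigned to the worker block.
BLOCK_FUNCTIONS = {
    "check_password": lambda user, pw: (user, pw) == ("alice", "secret"),
}

def start_block():
    """Fork a process for one program block; return (pid, request_fd, reply_fd)."""
    req_r, req_w = os.pipe()
    rep_r, rep_w = os.pipe()
    pid = os.fork()
    if pid == 0:                          # child: wait for cross-process calls
        os.close(req_w)
        os.close(rep_r)
        try:
            while True:
                name, args = recv_msg(req_r)
                send_msg(rep_w, BLOCK_FUNCTIONS[name](*args))
        except EOFError:
            pass
        finally:
            os._exit(0)
    os.close(req_r)
    os.close(rep_w)
    return pid, req_w, rep_r

def remote_call(block, name, *args):
    """Replace a local call by a request to the block's process."""
    _, req_w, rep_r = block
    send_msg(req_w, (name, args))
    return recv_msg(rep_r)

def stop_block(block):
    pid, req_w, rep_r = block
    os.close(req_w)                       # closing the pipe ends the worker loop
    os.waitpid(pid, 0)
    os.close(rep_r)
```

A caller would use `block = start_block()`, then `remote_call(block, "check_password", "alice", "secret")`, and finally `stop_block(block)`. Synchronized system calls and global variable transfer are not shown.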
Serialization and deserialization are used to transfer parameters before and after a cross-process function call. The sender serializes the content to be sent into a byte stream and sends it; the byte stream contains the parameters themselves, the memory directly or indirectly pointed to by pointer-type parameters, and the pointers stored in that memory. The receiver deserializes the byte stream, restoring the formatted byte stream into the parameters, the memory and the pointers, and corrects the pointers whose values change because the two processes have different address spaces.
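The pointer correction step can be illustrated with a simplified model in which each memory segment is a byte payload plus a map from pointer offsets to target segments. The wire format, the `allocate` callback, and the restriction to pointers that target the start of a segment are assumptions of this sketch, not details from the patent.

```python
import struct

def serialize(memory, root):
    """memory: {base: (data, {offset: target_base})}; root: base of the argument's pointee."""
    order, index, todo = [], {}, [root]
    while todo:                                   # collect every reachable segment once
        base = todo.pop()
        if base in index:
            continue
        index[base] = len(order)
        order.append(base)
        todo.extend(memory[base][1].values())
    out = [struct.pack("!I", len(order))]
    for base in order:
        data, pointers = memory[base]
        out.append(struct.pack("!II", len(data), len(pointers)))
        out.append(data)
        for offset, target in sorted(pointers.items()):
            # Pointers travel as segment indices, not as sender addresses.
            out.append(struct.pack("!II", offset, index[target]))
    return b"".join(out)

def deserialize(stream, allocate):
    """Rebuild the segments; allocate(size) returns a base address in the receiver."""
    pos = 0
    (count,) = struct.unpack_from("!I", stream, pos)
    pos += 4
    raw = []
    for _ in range(count):
        length, n_ptrs = struct.unpack_from("!II", stream, pos)
        pos += 8
        data = stream[pos:pos + length]
        pos += length
        ptrs = []
        for _ in range(n_ptrs):
            ptrs.append(struct.unpack_from("!II", stream, pos))
            pos += 8
        raw.append((data, ptrs))
    bases = [allocate(len(data)) for data, _ in raw]
    memory = {}
    for base, (data, ptrs) in zip(bases, raw):
        # Correct each pointer so it refers to the receiver's copy of the target.
        memory[base] = (data, {offset: bases[target] for offset, target in ptrs})
    return memory, bases[0]                       # the root is always segment 0
```

Because targets are encoded as indices, shared or cyclic structures are sent once and relinked on the receiving side.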
The main features of the system are:
(1) It targets C/C++ programs and is designed and implemented under the LLVM framework.
(2) Because data flow attacks are launched through program input, the program is divided into multiple blocks according to its inputs, so that different inputs are separated as far as possible.
(3) The program is segmented and rewritten automatically, without manual marking or rewriting.
The inputs of the program are determined automatically from the calls to API functions (provided by the operating system or by third-party libraries) and the related system calls observed while the program runs. The input data flows required to execute each function are then obtained through a combined dynamic and static taint analysis based on the memory operations recorded at run time and on the original program code. Using this information, the original program is divided into multiple program blocks, each containing as few input data flows as possible, and the auxiliary information needed for rewriting is collected. Finally, the original program is rewritten into a multi-process program whose functionality is unchanged.
To achieve the above object, the invention provides a data-flow-oriented program segmentation system.
The invention differs from existing program segmentation methods and tools in that it does not require the user to mark anything manually but infers the location of the input data automatically from the run log, and in that it does not require the user to rewrite the program manually once the segmentation scheme has been obtained but completes the rewriting automatically. In addition, the invention can divide the program into multiple blocks rather than only two and distinguishes different contexts during segmentation, so it provides finer-grained segmentation than existing methods. The specific beneficial effects are as follows:
(1) Strongly targeted: a data flow attack must be launched through program input and must exploit the normal data flows of the program, so the different input data flows are separated as far as possible, which effectively reduces the data flows available to an attacker.
(2) Multiple blocks: dividing the program into multiple blocks provides finer-grained protection; even if one block is compromised, the sensitive data an attacker can steal and the privileged operations the attacker can reach are very limited.
(3) Small performance impact: although inter-process communication degrades the performance of the segmented program, the degradation is limited.
(4) Low cost: no manual marking or manual rewriting is required; the user only needs to know how to use the program, so the invention can be applied at very low cost, and new versions of the program incur no additional labor cost.
Drawings
FIG. 1 is a block diagram of the data-flow-oriented program segmentation tool;
FIG. 2 is a schematic diagram of the log structure;
FIG. 3 is a schematic diagram of the analysis of a parameter structure;
FIG. 4 is a schematic diagram of parameter serialization;
FIG. 5 is a diagram of the call relationships between program blocks after segmentation.
Detailed Description
The invention provides a data-flow-oriented program segmentation system comprising a program instrumentation module, a log analysis module, and a program rewriting module.
Examples
To enable those skilled in the art to better understand the technical solution of the invention, it is described in further detail below with reference to a specific embodiment, taking the open-source HTTP server thttpd as an example.
The embodiment provides a data-flow-oriented program segmentation system consisting of three LLVM passes, which correspond to the program instrumentation module, the log analysis module, and the program rewriting module. Its structure is shown in FIG. 1. To use the system, the user first compiles the original program to LLVM IR and links it into a single file, then invokes the program instrumentation module on it and compiles the result into an instrumented program. The instrumented program is executed multiple times to obtain the run log, after which the log analysis module is invoked to analyze the original program and the run log, producing the segmentation scheme and the auxiliary information. Finally, the program rewriting module is invoked to rewrite the program according to the segmentation scheme and the auxiliary information from the previous step, producing the segmented multi-process program.
Because the tool is implemented under the LLVM framework, needs to obtain system call information while the program runs, and relies on inter-process communication when rewriting the program automatically, it is implemented on Linux and supports the C/C++ languages.
(1) Program instrumentation
First, the original program is compiled with Clang, adding "-flto -fuse-ld=lld -Wl,-save-temps" to the original command line arguments, to obtain a complete LLVM IR program (the code of the program is converted to LLVM IR, with the statements of the high-level language translated into instructions with the corresponding function). The instrumentation module is then invoked through the opt command to instrument the LLVM IR program, and compilation of the instrumented program continues with "-lpthread" added to the original command line, producing the instrumented executable.
The instrumentation module inserts code that outputs log entries before and after load, store, alloca and call instructions and all calls to API functions (e.g., malloc, fopen, memset), and adds code at the entry of the main function that uses ptrace to monitor system calls and output the corresponding log entries. The logs produced by all processes during execution are written to the same file in units of blocks, each recording the pid of its process; the structure is shown in FIG. 2.
Finally, the instrumented program is run multiple times to obtain a complete run log.
(2) Log analysis
The log analysis module is invoked through the opt command to analyze the program and the run log, producing the segmentation scheme and the auxiliary information required for rewriting the program.
The log analysis module first regroups the log by process, links the per-process logs at the positions where processes were created, initializes the simulated memory, and analyzes each process in the order in which its log was generated. The propagation range of each input data flow is determined from the log by dynamic taint analysis, and static taint analysis fills in the parts inside basic blocks and the parts of the program that were never executed. When log analysis finishes, the set of input data flows required to execute each function is known. Functions with the same set are placed in the same program block to obtain an initial segmentation scheme, and the constraints imposed by file descriptors, complex global variables and complex parameters are then applied to obtain the final segmentation scheme.
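The first step, regrouping an interleaved log by process and finding where each child process was created, might look like the sketch below. The block layout `(pid, entries)` and the `("fork", child_pid)` entry are hypothetical stand-ins for the log format of FIG. 2, which is not reproduced in the text.

```python
def regroup(blocks):
    """blocks: list of (pid, entries) in file order.

    Returns (per_process, created_at), where per_process maps each pid to its
    entries in generation order and created_at maps a child pid to the
    (parent pid, position in the parent's log) at which it was created.
    """
    per_process = {}
    for pid, entries in blocks:
        per_process.setdefault(pid, []).extend(entries)
    created_at = {}
    for pid, entries in per_process.items():
        for position, entry in enumerate(entries):
            if entry[0] == "fork":
                created_at[entry[1]] = (pid, position)
    return per_process, created_at

blocks = [
    (100, [("call", "main"), ("fork", 101)]),
    (101, [("call", "worker")]),
    (100, [("ret",)]),
    (101, [("ret",)]),
]
per_process, created_at = regroup(blocks)
```

With the creation positions known, the analysis of a child process can start from the simulated memory state its parent had at that point.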
During this analysis, the parameter structure of each function call is analyzed and stored as a tree whose nodes and edges correspond to memory segments and pointers respectively. For each memory segment, the type (heap, stack or static memory) and the length are recorded; for each pointer, the position (the offset of the pointer relative to the start address of the memory segment that contains it) and the pointing information (whether the pointer targets the start address of the memory segment it points into) are recorded. A preliminary parameter structure is first derived from the type information in the IR. It is then supplemented from the actual memory layout recorded in the log, which adds the memory type information and corrects errors caused by type conversions. Finally, static analysis identifies the parts that are not used after the transfer, so that unnecessary transfers are avoided. The analysis is shown in FIG. 3.
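As a rough illustration of such a tree, the sketch below models memory segments as nodes and pointers as edges and computes how many bytes a cross-process call would have to transfer once unused parts are pruned. The class names, the `used` flag, and the example sizes are assumptions; in the described system the usage information comes from static analysis.

```python
from dataclasses import dataclass, field

@dataclass
class MemNode:
    kind: str                 # "heap", "stack" or "static"
    length: int               # size of the memory segment in bytes
    used: bool = True         # False if the callee never uses this segment (assumed given)
    edges: list = field(default_factory=list)

@dataclass
class PtrEdge:
    offset: int               # offset of the pointer inside its own segment
    to_start: bool            # whether it targets the start of the pointed-to segment
    target: MemNode

def bytes_to_transfer(node):
    """Sum the lengths of all used segments reachable from node in the tree."""
    if not node.used:
        return 0              # pruned: an unused segment and its subtree are not sent
    return node.length + sum(bytes_to_transfer(edge.target) for edge in node.edges)

# A 32-byte stack structure holding two pointers: one to a used 256-byte heap
# buffer and one to a 1024-byte heap buffer that the callee never touches.
request = MemNode("stack", 32, edges=[
    PtrEdge(0, True, MemNode("heap", 256)),
    PtrEdge(8, True, MemNode("heap", 1024, used=False)),
])
total = bytes_to_transfer(request)
```

In this example only the structure itself and the first buffer are counted, which is the saving the pruning step is meant to achieve.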
(3) Program rewrite
The program rewriting module is invoked through the opt command to rewrite the original LLVM IR program according to the segmentation scheme produced by the log analysis module, yielding the segmented LLVM IR program.
The program rewriting module adds a number of utility functions that implement pipe communication, global variable transfer and program block initialization, and that support some system calls. It then generates cross-process function call code according to the segmentation scheme and replaces the original local function calls with it, adds inter-process transfer code for global variables to the instructions that access them, and adds inter-process synchronization code for the system calls that all program blocks must execute together. The parameter serialization and deserialization required by cross-process function calls are generated from the parameter structures obtained during log analysis; FIG. 4 shows the parameter serialization produced when the function receiver in FIG. 3 is rewritten into a cross-process function call.
The segmented LLVM IR program is then compiled further with Clang to obtain the executable program.
(4) Segmentation results
Taking thttpd as an example, the thttpd includes 6 main modules, namely an IO multiplexing module represented by fdwatch function, a match module for cgi symbol matching, an mmc module related to file caching and mapping, a libhttpd module for processing http requests, a timer module including a timer, and a main module for cycling and calling the modules. Wherein the example does not turn on cgi related functions, so the match module is not invoked.
thttpd produced 7 blocks after segmentation. Block 1 is the block with the largest volume and comprises main function and a series of functions for processing http requests in libhttpd; block 2 is a burst_args function that processes command line parameters; block 3 is the read_config function of the read configuration file; block 4 contains fdwatch, log generation, and some other functional functions; the 5 th block is an auth_check and auth_check2 function which reads the password file and performs matching; the remaining two blocks are some functional functions that are substantially uncorrelated with the input data. The call relationships for the 5 main modules can be approximately described as fig. 5.
Only the modules containing sensitive information are the modules where the auth_check and the auth_check2 are located, and the modules only have the two functions and a small number of other tool functions, so that unless vulnerabilities occur in the few functions, data of the module where the main function is located can only be obtained through data stream attack initiated by a network, and the user password information cannot be threatened.
After segmentation, and once start-up has completed, the average time thttpd takes to process the same single request increases from 14 ms to 17 ms, an increase of about 21.4%.
In a specific implementation, the application provides a computer storage medium and a corresponding data processing unit. The computer storage medium can store a computer program which, when executed by the data processing unit, performs some or all of the steps described in the summary and in the embodiments of the program segmentation system for data flow provided by the invention. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a random-access memory (RAM), or the like.
It will be apparent to those skilled in the art that the technical solutions in the embodiments of the present invention may be implemented by means of a computer program and a corresponding general-purpose hardware platform. Based on this understanding, the technical solutions in the embodiments of the present invention may be embodied essentially in the form of a computer program, i.e. a software product, which may be stored in a storage medium and which includes several instructions that cause a device containing a data processing unit (a personal computer, a server, a microcontroller (MCU), a network device, or the like) to perform the methods described in the embodiments, or in some parts of the embodiments, of the present invention.
The present invention provides a program segmentation system for data flow, and there are many methods and ways of implementing this technical solution. The above is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention. Components not explicitly described in this embodiment can be implemented using the prior art.

Claims (9)

1. A program segmentation system for data streams, characterized by comprising a program instrumentation module, a log analysis module and a program rewriting module;
the program instrumentation module is used for instrumenting the original program by inserting instructions into the program code, so that the executable program obtained by compiling and assembling generates, when run, the running log required by the subsequent analysis;
the log analysis module is used for analyzing the original program together with the running log, locating the data stream, generating a segmentation scheme through taint analysis, and providing the auxiliary information needed by the program rewriting module;
the program rewriting module rewrites the original program according to the segmentation scheme and the auxiliary information, so that the original program becomes a multi-process program in which the different segmented program blocks run in different processes while the program's functionality remains unchanged.
2. The system of claim 1, wherein the program instrumentation module, the log analysis module and the program rewriting module are implemented as compiler plug-ins under the LLVM framework and analyze and process the program after it has been compiled into LLVM IR, the plug-ins being invoked through the opt command of LLVM or through a command with equivalent functionality provided by a front-end compiler.
3. The system of claim 2, wherein the running log is a file composed of log entries describing the execution of the program, and includes the following information:
the execution of memory read, write, creation and destruction instructions and the memory addresses involved;
function calls and returns, including the execution of call instructions, the addresses of pointer-type parameters and pointer-type return values, and, for calls to API functions, whether the memory referenced by pointer-type parameters changes during the call;
the memory addresses of global variables and command-line parameters;
system calls made during program execution that relate to input and to file descriptors, together with the associated file descriptors.
4. The system of claim 3, wherein locating the data stream means that, when the memory referenced by a parameter changes during a call to an API function or the API function has a return value, and an input-related system call is recorded during that call, the changed memory and the return value are taken as data stream starting points.
5. The system of claim 4, wherein the taint analysis takes the data stream starting points as taint sources, treats non-memory-access instructions that modify parameters as sanitization, and has no explicit sink; within the coverage of the running log, accurate analysis is performed, in which the input data streams required to execute each instruction are obtained from the running log and the data dependences within basic blocks; outside the coverage of the running log, conservative analysis is performed, in which all input data streams that may be required to execute each instruction are obtained from the program dependence graph; and the analysis process is completed automatically.
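As an illustration of the conservative analysis only, the following sketch propagates each source's label forward over a dependence graph, so that every instruction collects all inputs it might depend on. The graph encoding, the function name conservative_inputs and the instruction names are hypothetical; sanitization and the log-based accurate analysis are not modeled.

```python
from collections import deque

def conservative_inputs(dependents, sources):
    """Forward-propagate input labels over a dependence graph.

    dependents maps an instruction to the instructions that depend on it;
    sources maps a starting-point instruction to its input label.
    Returns a dict: instruction -> set of labels it may require.
    """
    inputs = {}
    for src, label in sources.items():
        queue = deque([src])
        seen = {src}
        while queue:
            node = queue.popleft()
            inputs.setdefault(node, set()).add(label)
            for nxt in dependents.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
    return inputs

# Hypothetical dependence graph with two input starting points.
example_graph = {
    "read_req": ["parse"],
    "parse": ["respond"],
    "read_pw": ["check"],
    "check": ["respond"],
}
example_sources = {"read_req": "http_request", "read_pw": "password_file"}
example_result = conservative_inputs(example_graph, example_sources)
```

Because the propagation follows every dependence edge, the result over-approximates what a single execution would actually need, which is why the claim restricts it to code outside the coverage of the running log.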
6. The system of claim 5, wherein the accurate analysis comprises: analyzing the log generated by each process in the order in which the log was generated; building a simulated memory and a simulated call stack during the analysis; as the log replays the program's execution, creating, accessing and destroying the simulated memory according to the memory and pointer structures; and recording the program input contained in the memory and the program input required to execute each instruction, thereby obtaining the program input required to execute each function.
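The log replay can be sketched as follows. The log entry format (tuples such as ("input", addr, label) or ("write", dst, srcs)) is invented for this illustration and is not the patent's log format; the sketch also assumes that an input is attributed only to the function currently on top of the simulated call stack.

```python
from collections import defaultdict

def replay(log):
    """Replay log entries in order and return function -> required inputs."""
    memory = {}                 # simulated memory: address -> set of labels
    call_stack = []             # simulated call stack of function names
    needed = defaultdict(set)   # inputs required by each function
    for entry in log:
        kind = entry[0]
        current = call_stack[-1] if call_stack else None
        if kind == "call":
            call_stack.append(entry[1])
        elif kind == "ret":
            call_stack.pop()
        elif kind == "input":
            # A data stream starting point: memory now holds program input.
            _, addr, label = entry
            memory[addr] = {label}
        elif kind == "read":
            needed[current] |= memory.get(entry[1], set())
        elif kind == "write":
            # The destination inherits the labels of every source address.
            _, dst, srcs = entry
            labels = set()
            for src in srcs:
                labels |= memory.get(src, set())
            memory[dst] = labels
            needed[current] |= labels
        elif kind == "free":
            memory.pop(entry[1], None)
    return dict(needed)

# Hypothetical log of one short run.
example_log = [
    ("call", "main"),
    ("input", 0x10, "config_file"),
    ("call", "read_config"),
    ("read", 0x10),
    ("ret",),
    ("input", 0x20, "http_request"),
    ("write", 0x30, [0x20]),
    ("call", "handle_request"),
    ("read", 0x30),
    ("ret",),
    ("free", 0x30),
    ("ret",),
]
example_needed = replay(example_log)
```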
7. The system of claim 6, wherein the segmentation scheme comprises: the original program is divided at function granularity into two or more program blocks; each program block contains only the minimum program input needed to ensure that the program executes correctly; the same function may be placed in two or more program blocks according to the program input it requires at execution; and functions that use the same file descriptor must be placed in the same program block.
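One simplified way to derive blocks that respect the file descriptor constraint is a union-find grouping, sketched below. Grouping functions with identical input sets into one block is an assumption made for this illustration, and placing one function in several blocks is not modeled; the function and descriptor names are hypothetical.

```python
def partition(func_inputs, func_fds):
    """Group functions into blocks.

    func_inputs maps a function to the set of inputs it requires;
    func_fds maps a function to the file descriptors it uses.
    Returns a sorted list of blocks, each a sorted list of function names.
    """
    parent = {f: f for f in func_inputs}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # Assumption: functions with identical required inputs share a block.
    first_with_inputs = {}
    for func, inputs in func_inputs.items():
        key = frozenset(inputs)
        if key in first_with_inputs:
            union(func, first_with_inputs[key])
        else:
            first_with_inputs[key] = func

    # Constraint: functions using the same file descriptor share a block.
    first_with_fd = {}
    for func, fds in func_fds.items():
        for fd in fds:
            if fd in first_with_fd:
                union(func, first_with_fd[fd])
            else:
                first_with_fd[fd] = func

    blocks = {}
    for func in func_inputs:
        blocks.setdefault(find(func), []).append(func)
    return sorted(sorted(block) for block in blocks.values())

example_blocks = partition(
    {
        "main": {"http_request"},
        "handle_request": {"http_request"},
        "read_config": {"config_file"},
        "auth_check": {"password_file"},
        "auth_check2": {"password_file"},
        "fdwatch": set(),
        "log_write": {"log_path"},
    },
    {"auth_check": {5}, "auth_check2": {5}, "fdwatch": {3}, "log_write": {3}},
)
```

Here fdwatch and log_write require different inputs but end up in the same block because they share file descriptor 3.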
8. The system of claim 7, wherein the program rewriting module rewrites the original program according to the segmentation scheme and the auxiliary information, the rewriting comprising:
adding inter-process communication based on the pipe communication provided by the operating system;
based on the inter-process communication, adding code that implements inter-process synchronous execution of system calls, so that when any program block executes a system call related to file descriptors, user permissions, or process creation and destruction, the other program blocks execute the same system call as well;
based on the inter-process communication, adding code that implements inter-process transfer of global variables, so that when a program block needs to access a global variable, the variable is updated according to the modifications made to it in the other program blocks;
adding serialization and deserialization code for function parameters, and adding cross-process function call code based on the inter-process communication;
when the program enters the main function, creating two or more processes corresponding to the program blocks in the segmentation scheme, wherein one process executes the main function and the other processes wait for inter-process communication.
9. The system of claim 8, wherein the serialization and deserialization mean that serialized parameters are transmitted before and after the cross-process function call; the sender serializes the content to be sent into a byte stream and then sends it, the content including the parameters themselves, the memory pointed to directly or indirectly by pointer-type parameters, and the pointers present in that memory; and the receiver deserializes the formatted byte stream back into parameters, memory and pointers, and corrects the pointers whose values change with the memory space.
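The pointer correction step can be illustrated with a small sketch in which a linked structure stands for the memory reachable from a pointer-type parameter. The Node class and the byte layout (a count followed by value and index pairs) are hypothetical choices for this example. Pointers are replaced by indices on the sending side and rebuilt as references to the newly created objects on the receiving side.

```python
import struct

class Node:
    """Hypothetical structure reached through a pointer-type parameter."""
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

def serialize(head):
    """Flatten all nodes reachable from head into a byte stream."""
    index = {}    # object identity (the "address") -> position in the stream
    order = []
    node = head
    while node is not None and id(node) not in index:
        index[id(node)] = len(order)
        order.append(node)
        node = node.next
    out = struct.pack("!I", len(order))
    for n in order:
        # A pointer is sent as the index of its target, or -1 for null.
        target = index[id(n.next)] if n.next is not None else -1
        out += struct.pack("!qi", n.value, target)
    return out

def deserialize(data):
    """Rebuild the nodes, then correct every pointer to the new objects."""
    (count,) = struct.unpack_from("!I", data, 0)
    records = [struct.unpack_from("!qi", data, 4 + i * 12)
               for i in range(count)]
    nodes = [Node(value) for value, _ in records]
    for node, (_, target) in zip(nodes, records):
        if target >= 0:
            node.next = nodes[target]
    return nodes[0] if nodes else None

# A cyclic list a -> b -> c -> a, copied through the byte stream.
c = Node(3)
b = Node(2, c)
a = Node(1, b)
c.next = a
copy = deserialize(serialize(a))
```

The copy has the same shape as the original, including the cycle, but none of its references point back into the sender's objects.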

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310175547.4A CN116126305A (en) 2023-02-28 2023-02-28 Program segmentation system for data stream

Publications (1)

Publication Number Publication Date
CN116126305A true CN116126305A (en) 2023-05-16

Family

ID=86302849

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310175547.4A Pending CN116126305A (en) 2023-02-28 2023-02-28 Program segmentation system for data stream

Country Status (1)

Country Link
CN (1) CN116126305A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination