CN117610001A - Automatic analysis method for fine-grained malicious behaviors in Internet of things malicious software

Info

Publication number: CN117610001A
Application number: CN202311833436.4A
Authority: CN
Inventors: 李森; 冯睿韬; 李晓红; 陈森; 李雪威
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2023-12-28
Filing date: 2023-12-28
Publication date: 2024-02-27

Abstract

The invention discloses an automatic analysis method for fine-grained malicious behaviors in Internet of Things (IoT) malware, and belongs to the technical field of malware analysis. The method first establishes a comprehensive and detailed analysis framework for IoT malware, then constructs a fine-grained malicious behavior knowledge base, and finally develops, based on these results, an automatic analysis tool for malicious behaviors in IoT malware. The method addresses a series of problems in the current field of IoT malware analysis: it improves the efficiency and accuracy of analysis, lowers the professional skill required, adapts to a rapidly changing threat environment, and copes with increasingly complex malware behaviors. With the automated analysis tool, researchers and security practitioners can obtain the suspicious behaviors in IoT malware more quickly and thus counter malware threats in the IoT environment more effectively, protecting this fast-growing technical field from attacks.

Description

Automatic analysis method for fine-grained malicious behaviors in Internet of things malicious software

Technical Field

The invention relates to the technical field of malware analysis, in particular to an automatic analysis method for fine-grained malicious behaviors in IoT (Internet of Things) malware.

Background

The Internet of Things (IoT) has rapidly changed the way we live and work. From healthcare devices that monitor blood glucose to smart home appliances that help households run more efficiently, it has greatly changed how we interact with the digital world and has become part of our daily lives and critical infrastructure. By 2025, approximately 100 billion devices were expected to be networked, and IoT devices will become more and more common.

Unfortunately, the rapid popularity of IoT devices has not escaped the attention of cyber criminals and malware authors. While these devices offer convenience, efficiency and innovation, they also introduce new security challenges. Because IoT devices are so diverse, the motivations behind malware targeting them have also diversified, including financial interests, political agendas, and network activities. This makes IoT devices more likely to be a primary target for criminals and malicious actors. Unlike conventional devices, IoT devices often suffer from low-quality code and architectural weaknesses, which make them more attractive to cyber criminals and malware writers. In addition, many IoT devices still run outdated firmware and have inconsistent levels of security, making them more vulnerable to malware.

For example, the Mirai malware family is a prominent example in the IoT field. It rapidly scans the network for devices that still use default login credentials, enabling attackers to control more than 400,000 devices simultaneously for malicious activity. This lets attackers easily harness the computing power of these devices and launch highly destructive attacks. Once the source code of Mirai was published, more malware variants appeared. One of the best-known attacks was the attack against the DNS service provider Dyn, which made hundreds of well-known websites, including Twitter, Netflix, Reddit and GitHub, temporarily unavailable for hours. The rise and wide spread of Mirai highlighted the security problems of IoT devices. It revealed that many IoT devices are designed and deployed without basic security measures, such as changing default credentials or periodically updating firmware. Such malware has also led to a broader discussion about the security of IoT devices and prompted researchers and security practitioners to explore new methods of protecting the IoT ecosystem from threats. IoT malware is increasingly prevalent, and attacks increased by 700% during the pandemic, which indicates that the diversity and rapid growth of IoT malware pose a great challenge for security researchers and practitioners. Security professionals and researchers must therefore develop more efficient methods to detect, identify, prevent, and mitigate these threats. At the same time, detecting and locating critical malicious behavior in IoT malware is becoming increasingly important.

Researchers have attempted to solve these problems with various methods, but such efforts are still at an early stage. Existing studies on malicious behavior in IoT malware fall roughly into the following types:

(1) Establishing a standardized analysis flow for IoT malware. IoT devices currently come in a wide variety, from smart home devices to industrial sensors, and the hardware and software configuration of each device is different. This heterogeneity means that malware may behave differently on different devices, which increases the difficulty of analysis. The current mainstream analysis methods and flows are also varied, such as revealing suspicious communication patterns and malicious behaviors through network traffic analysis and behavior analysis, or analyzing potential vulnerabilities and malicious code fragments of IoT malware through reverse engineering. Although many analysis methods for IoT malware exist, the quality of the analysis reports they produce is uneven, so a standardized analysis flow for IoT malware, built on the current analysis methods, is badly needed.

(2) Classification of IoT malware. For the classification of IoT malware, current technology mainly focuses on binary classification and family classification. Binary classification is a fundamental approach that simply separates software into malware and non-malware. Such methods typically rely on feature extraction and machine learning algorithms, such as decision trees, support vector machines, or deep learning models, to analyze software behavior and code features. However, binary classification, while effective in distinguishing malware from normal software, cannot identify the specific family of a piece of malware. Family classification is more complex: it further classifies malware into specific families or types, such as Trojan horses, worms, or botnets. This typically requires more sophisticated analysis, such as behavior-based analysis, signature matching, or graph-based methods, to identify the specific features of a malware family. Family classification is critical to understanding the behavioral patterns, modes of propagation, and potential hazards of malware, but it also faces higher complexity and continuously evolving threat models. As the types and number of IoT devices increase, these classification methods cannot adapt to new security challenges, and they have obvious defects. In particular, current methods of malicious behavior detection generally focus on a relatively coarse classification granularity, that is, classifying malware into a broad class or family, and are relatively weak at fine-grained malicious behavior detection. Such coarse-grained classification may fail to accurately identify and distinguish more subtle malicious behavior patterns, such as attacks customized for a particular device or environment.

(3) Detection of malicious behavior in malware. Beyond the classification of IoT malware, the detection and recognition of malicious behaviors within malware has become an emerging research field in recent years. The mainstream analysis methods are sandbox-based dynamic analysis and static analysis of executable files. Sandbox technology executes a suspicious file or program in an isolated environment (the sandbox), where an analysis monitor can safely observe it, for example by analyzing system calls and network behavior patterns to detect suspicious behavior dynamically; any malicious or suspicious activity observed is recorded and reported. Static analysis, by contrast, inspects software source code or compiled binaries in depth without running the malware. By analyzing program structure, data flow, and control flow, it identifies potential malicious behavior, suspicious constructs, and security vulnerabilities, and reveals threats such as backdoors and malicious payloads so that they can be identified and blocked before they endanger the system. However, analyzing malware behavior differs from the classification and detection task. It is necessary not only to determine that malicious behavior exists, but also to understand its specific implementation and operation in detail, and to identify its malicious function through various analysis techniques. To date, this analysis has depended mainly on manual work by researchers, which makes it tedious and time-consuming, requires a great deal of manpower and time, and makes it difficult for researchers to deploy countermeasures quickly.

The complexity of the IoT environment and the diversity of devices pose significant challenges to traditional malware analysis methods in terms of accuracy, analysis granularity, and efficiency. Malware analysis at present often relies on a manual process that demands specialized expertise and is time-consuming and inefficient. In addition, with the rapid growth of IoT devices and the increasing diversity of malware types, traditional analysis methods struggle to adapt to the rapidly changing IoT threat environment. To solve these problems, the invention provides an automatic analysis method for fine-grained malicious behaviors in IoT malware.

Disclosure of Invention

The invention aims to provide an automatic analysis method for fine-grained malicious behaviors in internet of things malicious software, which improves analysis efficiency of the internet of things malicious software so as to quickly obtain suspicious malicious behavior information from the internet of things malicious software, thereby taking response measures for the suspicious malicious behaviors.

In order to achieve the above purpose, the present invention adopts the following technical scheme:

An automatic analysis method for fine-grained malicious behavior in IoT malware, comprising the steps of:

S1, based on the prior art and with reference to the mainstream static and dynamic analysis methods used in existing IoT malware analysis, formulating a malware analysis framework to simplify and standardize the manual analysis flow of a malware sample. The framework enables more efficient identification and classification of malware while reducing the complexity and redundancy of the analysis process. It includes in-depth analysis of the behavioral patterns, threat levels, code structures, etc. of malware. In addition, the framework incorporates a detailed set of analysis guidelines and tools that help an analyst process malware samples more systematically, thereby ensuring the consistency and reliability of analysis results. By applying the framework, a malware analyst can quickly learn the characteristics of the malware, which supports the subsequent development of defense strategies.

S2, carrying out standardized analysis on the malicious software based on the malicious software analysis framework formulated in the S1, constructing a fine-granularity malicious behavior knowledge base according to the obtained analysis report, and associating the malicious behavior in the malicious software of the Internet of things with a complex implementation mode and threat level thereof through the knowledge base; this step can systematically understand the complexity of malware, helping to develop more accurate automated analysis tools.

S3, developing an automatic analysis tool based on the Ghidra reverse analysis tool to automatically analyze fine-grained behaviors of the malicious software, rapidly identifying and classifying specific behaviors of the malicious software, and accurately indicating and recording a code implementation method and specific positions of each fine-grained malicious behavior in the malicious software.

Preferably, for a piece of malware to be analyzed, the steps of the malware analysis framework in S1 are as follows:

S1.1, preliminary analysis based on antivirus programs: using antivirus programs to check whether the sample has already been flagged as malware;

S1.2, dynamic analysis based on sandbox technology: taking the preliminary analysis report obtained from the antivirus programs in S1.1 as a reference, dynamically analyzing the IoT malware using sandbox technology;

S1.3, overall analysis based on a reverse analysis tool: reverse engineering the executable file using the Ghidra reverse analysis tool to analyze the malicious file further;

S1.4, supplementary analysis based on a dynamic debugging tool: using a dynamic debugging tool to further analyze functions that the Ghidra reverse analysis tool cannot identify.

Preferably, the S2 specifically includes the following:

S2.1, in the data acquisition and sample selection stage, collecting malware samples from the IoTPOT honeypot and preliminarily classifying them with the AVClass tool, so that the resulting sample set is comprehensive and balanced and covers the representative characteristics of various malware;

S2.2, in the fine-grained malicious behavior modeling stage, manually analyzing the IoT malware samples according to the malware analysis flow in S1 to identify the key malicious behaviors of IoT malware, such as system persistence strategies, system deception behaviors and system anti-debugging behaviors;

S2.3, modeling each fine-grained malicious behavior according to the analysis results obtained in S2.2, and extracting the key characteristics of the malicious behavior to provide a basis for accurate classification and effective response;

S2.4, integrating the operations described in S2.1-S2.3 and constructing a knowledge base that collects and integrates the behavior intentions, behavior categories, behavior patterns and attack means of malware, wherein the knowledge base not only improves analysis efficiency and accuracy, but also provides important support for research and practice in the field of network security.
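One way to picture the knowledge base described in S2.4 is as a collection of records, one per fine-grained behavior. The sketch below is only an illustration under assumed names: the classes `BehaviorEntry` and `KnowledgeBase` and their field names are hypothetical and do not come from the patent; they simply mirror the four kinds of information named in S2.4 (behavior intention, behavior category, behavior pattern, attack means).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BehaviorEntry:
    """Hypothetical record for one fine-grained malicious behavior."""
    intent: str          # behavior intention, e.g. "survive reboot"
    category: str        # behavior category, e.g. "persistence"
    pattern: List[str]   # behavior pattern: indicators observed in code
    attack_means: str    # how the behavior is implemented


@dataclass
class KnowledgeBase:
    """Hypothetical in-memory container for behavior entries."""
    entries: List[BehaviorEntry] = field(default_factory=list)

    def add(self, entry: BehaviorEntry) -> None:
        self.entries.append(entry)

    def by_category(self, category: str) -> List[BehaviorEntry]:
        # Return every recorded behavior that belongs to the category.
        return [e for e in self.entries if e.category == category]
```

An analyst could then record a behavior found during manual analysis with `kb.add(...)` and later retrieve all behaviors of one category with `kb.by_category("persistence")`.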

Preferably, the S3 specifically includes the following:

S3.1, constructing a malicious behavior analysis function that automates the manual analysis process in S2, further comprising the following:

S3.1.1, analyzing the calling patterns and code fragments of key functions: analyzing the assembly code obtained by disassembly, and identifying the key functions and code segments that provide a deep understanding of the functions and behaviors of the executable file;

S3.1.2, analyzing strings in the malware: analyzing the strings that appear in the disassembly listing of the malware, such as file names, comments, or conditional checks, which can provide clues about the malware creator's intent;

S3.1.3, analyzing data structures in the malware: analyzing the data structures used in the malicious file, such as arrays, structures, and linked lists, which helps determine how the program stores and processes data;

S3.1.4, analyzing the control flow of the malware: analyzing the control flow of the malicious file and identifying the main execution path, which helps to find execution paths that are shared or similar among different variants and thereby to discover new malware variants;

S3.2, designing a category mapping and positioning module for malicious behaviors that accurately classifies and locates fine-grained malicious behaviors, further comprising the following:

S3.2.1, the category mapping and positioning module receives the data generated by the analysis function in step S3.1, wherein the data comprises code fragments associated with fine-grained malicious behaviors;

S3.2.2, the category mapping and positioning module takes the fine-grained malicious behavior knowledge base constructed in step S2 as a reference, and matches the real-time analysis data against the information in the knowledge base through comparative analysis to determine the specific category of the malicious behavior;

S3.2.3, the category mapping and positioning module calculates the specific position of the fine-grained malicious behavior in the file from the memory position of the associated code fragment in S3.2.1, so as to realize the malicious behavior positioning function.
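The category mapping and positioning steps S3.2.1-S3.2.3 can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the patent does not specify the matching rule or the address arithmetic, so the sketch uses a simple indicator-overlap score for the mapping and the common "address minus section base plus section file offset" translation for the positioning. The names `Section`, `Fragment`, `map_category` and `locate` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Section:
    """Hypothetical description of one mapped section of the executable."""
    vaddr: int        # memory address where the section is mapped
    file_offset: int  # offset of the section inside the file
    size: int


@dataclass
class Fragment:
    """Hypothetical output of S3.1: a code fragment and its memory address."""
    address: int
    indicators: List[str]  # strings / function names seen in the fragment


def map_category(fragment: Fragment,
                 knowledge_base: Dict[str, List[str]]) -> Optional[str]:
    """Return the category whose indicators overlap most with the fragment."""
    best, best_hits = None, 0
    for category, indicators in knowledge_base.items():
        hits = len(set(indicators) & set(fragment.indicators))
        if hits > best_hits:
            best, best_hits = category, hits
    return best  # None when nothing in the knowledge base matches


def locate(fragment: Fragment, sections: List[Section]) -> Optional[int]:
    """Translate the fragment's memory address into an offset in the file."""
    for s in sections:
        if s.vaddr <= fragment.address < s.vaddr + s.size:
            return fragment.address - s.vaddr + s.file_offset
    return None  # address is not backed by any known section
```

For a fragment at address 0x8100 inside a section mapped at 0x8000 with file offset 0, `locate` yields offset 0x100. A real tool would take the section layout from the loaded program rather than from hand-written `Section` values.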

Compared with the prior art, the invention provides an automatic analysis method for fine-grained malicious behaviors in the IoT malicious software, which has the following beneficial effects:

(1) The invention formulates a malware analysis framework. Researchers can follow the steps of this framework to conduct malware analysis. The framework defines a workflow built on the key tools and techniques required for malware analysis, covering all the steps, tools and techniques necessary for systematic malware analysis. Compared with the prior art, the framework provides a more systematic and structured malware analysis flow. By integrating key tools and techniques, it greatly improves the efficiency and accuracy of the analysis. This organized approach reduces the need for expertise, allowing even non-professionals to conduct analyses efficiently, thereby reducing labor costs. In addition, the preset workflow reduces possible errors in analysis and improves the safety and convenience of operation.

(2) The invention constructs a fine-grained malicious behavior knowledge base. Through the knowledge base, fine-grained malicious behavior in malicious software can be closely related to complex implementation modes and threat levels thereof. It will greatly accelerate our ability to analyze and understand the behavioral intent and threat level of fine-grained malicious behavior exhibited by malware. Compared with the prior art, the knowledge base provides a deeper and finer internet of things malicious software behavior classification system. It enables researchers to more quickly understand and identify specific behavioral intents and threat levels of malware, thereby speeding up response time and improving the efficiency of malware processing. This not only enhances the security defenses but also reduces the waste of resources due to misclassification, thereby reducing energy and time costs.

(3) The invention develops a set of automatic analysis tools for malicious behaviors in the malicious software of the Internet of things. We constructed an automated analysis tool based on Ghidra. When the internet of things malicious software is analyzed, the tool can be used for rapidly and accurately detecting, identifying and positioning fine-grained malicious behaviors in the internet of things malicious software. Compared with the traditional method, the tool not only can detect the fine-granularity malicious behaviors in the malicious software of the Internet of things more quickly, but also can identify and position the fine-granularity malicious behaviors quickly, so that the analysis efficiency is improved greatly. This automated method reduces the need for manual analysis, reduces labor costs, and reduces the risk that may be caused by manual analysis errors. The method provides a faster and more reliable analysis tool for researchers, thereby improving the overall safety of the environment of the Internet of things.

Drawings

FIG. 1 is an overall flowchart of an automatic analysis method for fine-grained malicious behavior in IoT malware in accordance with embodiment 1 of the present invention;

FIG. 2 is a flow chart of a malware analysis framework in accordance with embodiment 1 of the present invention;

FIG. 3 is a diagram showing an example of a VirusTotal analysis result in embodiment 1 of the present invention;

FIG. 4 is a diagram showing an example of a LISA sandbox analysis result in embodiment 1 of the present invention;

FIG. 5 is a diagram of Ghidra disassembled code in embodiment 1 of the present invention;

FIG. 6 is a diagram of Ghidra decompiled code in embodiment 1 of the present invention;

FIG. 7 is a Ghidra call graph in embodiment 1 of the present invention;

FIG. 8 is a diagram showing the Radare2 analysis process in embodiment 1 of the present invention;

FIG. 9 is a flow chart of data acquisition and sample selection in accordance with embodiment 1 of the present invention;

FIG. 10 is a diagram illustrating mapping of malicious behavior categories according to embodiment 1 of the present invention;

FIG. 11 is a modeling process of a knowledge base malicious behavior model in embodiment 1 of the present invention;

FIG. 12 is a flowchart of the construction of a fine-grained malicious behavior knowledge base in embodiment 1 of the invention;

FIG. 13 is a schematic diagram of an iterative function analysis algorithm in embodiment 1 of the present invention;

FIG. 14 is a flow chart of an automated analysis tool according to embodiment 1 of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments.

Example 1:

The invention provides an automatic analysis method for fine-grained malicious behaviors in IoT malware. The overall flow is shown in FIG. 1: the input is an executable file of IoT malware, and the output is a comprehensive and detailed analysis report on the fine-grained malicious behaviors in that malware. The implementation of the present invention is described in detail below with reference to the associated drawings.

(1) Formulation of malware analysis framework based on prior art

Referring to fig. 2, the present invention establishes a malware analysis framework based on the prior art to simplify and standardize the manual analysis flow of malware samples, and the framework follows the currently mainstream malware analysis technology, and the standardized analysis tools adopted specifically are shown in table 1.

TABLE 1 malware analysis framework tool overview

For a piece of malware to be analyzed, the specific analysis steps and methods are as follows:

1) Based on preliminary analysis of the antivirus program.

In this framework, the present invention first uses antivirus programs to check whether the sample has already been flagged as malware. These antivirus programs employ signature-based detection and heuristic-based detection. Signature-based detection relies on looking up databases of collected malicious files, as shown in FIG. 3, and typically uses hashing algorithms such as MD5, SHA-1, or SHA-256 to generate a unique hash value for each file. Heuristic detection relies on behavioral and pattern-matching analysis to identify suspicious files. Each antivirus program typically uses different signatures and heuristic detection methods, so to obtain the most comprehensive coverage, multiple antivirus programs are checked to see whether any information is available.

In the framework, the invention uses VirusTotal to perform a unified scan. VirusTotal is an online malware lookup website that collects the report information of many antivirus programs and online scanning engines. VirusTotal can compare the malware sample against the databases of the various antivirus programs and return the result report with the highest degree of matching.
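The hash-based lookup behind signature detection can be sketched in a few lines. This is a minimal illustration using Python's standard `hashlib`; the function names `file_hashes` and `lookup` and the dictionary-based `signature_db` are hypothetical stand-ins for a real signature database, and the sketch does not call VirusTotal or any other service.

```python
import hashlib
from typing import Dict, Optional


def file_hashes(data: bytes) -> Dict[str, str]:
    """Compute the three digests commonly used as file signatures."""
    return {
        "md5": hashlib.md5(data).hexdigest(),
        "sha1": hashlib.sha1(data).hexdigest(),
        "sha256": hashlib.sha256(data).hexdigest(),
    }


def lookup(data: bytes, signature_db: Dict[str, str]) -> Optional[str]:
    """Return the label stored for this file's SHA-256, if any.

    signature_db is a hypothetical mapping from SHA-256 hex digest to a
    detection label; a real database would be far larger and external.
    """
    return signature_db.get(file_hashes(data)["sha256"])
```

In practice the bytes would come from reading the sample file in binary mode, and the resulting digest would be submitted to the scanning service or compared against its database.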

2) Dynamic analysis based on sandbox technology.

Taking the preliminary analysis report obtained from the antivirus programs in the previous step as a reference, the invention uses sandbox technology to dynamically analyze the IoT malware.

Sandbox technology is a powerful method for evaluating and analyzing the behavior of malware in a secure, isolated environment. A sandbox is essentially a controlled virtual environment that allows malware to run while preventing damage to the real system. This enables researchers to observe the behavior of malware as it executes, including how it interacts with the operating system, whether it attempts to connect to a remote server, and whether it attempts to download or execute other malicious code.

By sandbox-based dynamic analysis, complex behavior that some static analysis techniques cannot detect, such as malicious activity triggered based on specific conditions or malicious operations hidden using advanced techniques, can be revealed. Furthermore, sandbox-based dynamic analysis also helps to generate behavioral characteristics of malware that can be used to enhance security products and policies to more effectively identify and block malware.

In this framework, the standard analysis sandbox used by the present invention is the LISA sandbox, which has advanced monitoring capabilities and reports a series of suspicious activities such as file operations, network communications, and system calls, as shown in FIG. 4. With the LISA sandbox, the framework of the invention can identify and analyze various complex malware more effectively, providing strong support for security research.

3) Based on the overall analysis of the reverse analysis tool.

Analysis methods based on sandbox technology also have limitations. For example, advanced malware can change its behavior if it detects that it is running in a sandbox, thereby evading detection. Thus, for a more in-depth analysis of malicious files, the present invention uses the reverse analysis tool Ghidra to reverse engineer executable files.

The principle of Ghidra is to break the binary code down into smaller fragments and analyze each fragment independently. When a binary file is loaded into Ghidra, it automatically recognizes the instruction set and file format. It then disassembles the binary file into assembly code for investigation.

In the disassembly produced by Ghidra, the invention first analyzes the entry point of the malicious program. By tracing from the entry point, the main functions of the malware can easily be found and the operations written by its creator can be understood. By analyzing the assembly code of the malware, function calls and values can be inferred to understand what the malware does.

One very useful tool in Ghidra is its built-in decompiler, which converts assembly code into higher-level, more easily understood code and helps to better understand the execution of the program. The specific analysis procedure of this step is as follows: first, the malware is disassembled with Ghidra to obtain assembly code; then the assembly code is decompiled with Ghidra's built-in decompiler to obtain the corresponding high-level language representation.

Next, the analysis of how a persistence behavior is implemented in IoT malware is taken as an example:

As shown in FIG. 5, the disassembled code of the malicious file is first obtained and then analyzed with Ghidra's built-in decompiler to obtain the decompiled code, as shown in FIG. 6. Once the high-level language code generated by decompilation is available, the specific implementation of the malicious behavior can be understood easily and clearly. The malware proceeds as follows:

The malware first attempts to open the file at the path /etc/rc.d/rc.local. If the file does not exist, the code attempts to open /etc/rc. Next, the getcwd() function is used to obtain the current working directory, which is stored in the acStack6736 variable, and the code checks whether the current working directory is the root directory (/). If so, the opened file is closed and the function exits. Then the length of the string pointed to by param_2 is calculated and the string is traversed until the last "/" character is found; the current working directory and the part of the string pointed to by param_2 after the last "/" are joined using the sprintf function, and the result is stored in the afStack6420 variable. Finally, the opened configuration file is read line by line to check whether it already contains the string in the afStack6420 variable. If a matching line is found, the count in local_1a70 is incremented; otherwise, the file at the path pointed to by local_1a5c is opened in append mode and the string in afStack6420 is written to it. After this series of operations is completed, the file is closed.
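The behavior just described leaves recognizable traces in the decompiled text: a startup-script path, a call to getcwd, and a file opened in append mode. A minimal sketch of a check for that combination is given below. The rule set (`STARTUP_PATHS`, `APPEND_OPEN`) and the function name `looks_like_startup_persistence` are hypothetical and chosen only for illustration, and the assumption that the append-mode open appears as an `fopen(..., "a")` call is not stated in the source; this is not the matching logic of the patented tool.

```python
import re

# Hypothetical indicators distilled from the walk-through above.
STARTUP_PATHS = ("/etc/rc.d/rc.local",)
# An fopen call whose mode argument is "a" or "a+", within one statement.
APPEND_OPEN = re.compile(r'fopen\s*\([^;]*"a\+?"\s*\)')


def looks_like_startup_persistence(decompiled: str) -> bool:
    """Flag decompiled code that touches a startup script, queries the
    working directory, and opens a file in append mode."""
    touches_startup = any(path in decompiled for path in STARTUP_PATHS)
    gets_cwd = "getcwd" in decompiled
    appends = APPEND_OPEN.search(decompiled) is not None
    return touches_startup and gets_cwd and appends
```

Requiring all three indicators together keeps the check conservative: a program that merely reads a startup script, or merely appends to an unrelated file, is not flagged.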

From the above analysis results, we can simply record the following information in the manual analysis report:

In addition, based on Ghidra, the invention is able to record the key malicious behaviors of various malware samples, including malware that was intentionally obfuscated by its authors. However, malware creators often use anti-debugging techniques, which make it harder for researchers to analyze malware.

In the study of the present invention, the common anti-analysis techniques observed in IoT malware include code obfuscation, anti-debugging, and packing. Fortunately, Ghidra is a very powerful decompiler that can counter many of these anti-analysis techniques. For example, as shown in FIG. 7, it has built-in code analysis functions and can automatically perform data flow analysis, control flow analysis and the like. The analysis work of the present invention can also be aided by generating function IDs. From the architecture information obtained previously, a function ID may be generated, and the signature of the function is then analyzed using the function ID database file of Ghidra. Even when the symbol table has been stripped, the function ID tool of Ghidra is still able to identify common system calls and functions effectively, which greatly aids the analysis process of the present invention.

4) Supplemental analysis based on dynamic debugging tools.

After the series of analyses described above is completed, functions that Ghidra cannot identify may be analyzed further with a dynamic debugging tool. In this framework, the standard dynamic debugging tool used by the present invention is Radare2, which allows instructions to be executed one at a time and specific functions to be run selectively without running the whole program. With the dynamic debugger, the analyst can step inside a single function to better understand what its code does and how key malicious behaviors are implemented.

(2) Construction of fine-grained malicious behavior knowledge base

1) Data acquisition and sample selection.

As shown in fig. 9, a large number of Internet of Things malware samples are first obtained from the IoTPOT honeypot. These samples are then preliminarily classified using the AVClass tool so that their basic features and categories are known. Samples are then selected proportionally from each of the preliminary categories; this selection method is intended to ensure that the manually analyzed sample set is both comprehensive and balanced. With this strategy, a representative and diverse sample set is obtained, laying a solid foundation for in-depth analysis of malware behavior.
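The proportional selection step can be illustrated with a minimal sketch. It assumes the preliminary classification is available as a mapping from sample identifier to family label. The function name sample_proportionally, the fraction parameter and the rule of keeping at least one sample per family are assumptions made for illustration, not details given in this description.

```python
import random
from collections import defaultdict


def sample_proportionally(labels, fraction, seed=0):
    """Pick the same fraction of samples from every family.

    labels:   dict mapping a sample identifier to its family label
              (for example, a label from a prior classification step).
    fraction: share of each family to keep, between 0 and 1.
    seed:     fixed seed so the selection is reproducible.
    """
    by_family = defaultdict(list)
    for sample_id, family in labels.items():
        by_family[family].append(sample_id)

    rng = random.Random(seed)
    selected = {}
    for family, members in sorted(by_family.items()):
        # Assumption: keep at least one sample so small families survive.
        count = max(1, round(len(members) * fraction))
        selected[family] = sorted(rng.sample(sorted(members), count))
    return selected
```

Selecting the same share from every category keeps large families from crowding out rare ones in the manually analyzed set.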

2) Analysis and modeling of fine-grained malicious behavior.

This process mainly involves subdividing the malicious behaviors of Internet of Things malware into different categories. As shown in fig. 10, the malware samples are manually analyzed according to the framework above, focusing on the code structure, execution mode and control flow of the malware, so as to identify specific strategies for behavior hiding, system persistence, data obfuscation, system damage, network attack and the like. The malicious behaviors obtained from the analysis are then summarized and, as also shown in fig. 10, classified with reference to the ATT&CK framework. This step links each behavior in the malware to its specific implementation, speeds up identification of the malware, and ensures that malicious behaviors are mapped to the correct categories. Finally, the malicious behaviors are modeled according to the analysis results. The model is shown in fig. 11; each malicious behavior is divided into two parts, abstract and information, as follows:

In addition, the specific damage that the malware may cause to the target system, such as system crashes and data leaks, is further evaluated. The behavior patterns of the malware are also analyzed to explore how it interacts with other system components or network resources, so that its threat level and potential scope of impact can be comprehensively evaluated.

Finally, the analysis results are summarized into a comprehensive and detailed manual analysis report for the malware. The manual analysis report template formulated in this work is shown in table 2.

TABLE 2 Manual analysis report template for malware

3) Construction of the knowledge base.

The knowledge base construction flow is shown in fig. 12. Based on the analysis reports obtained from the in-depth analysis above, the knowledge is summarized and the analysis results are organized into a concrete knowledge base. This knowledge base aims to provide a comprehensive, detailed and solid basis for malware analysis. It collects and integrates the behavior categories, behavior patterns, attack methods and potential threats of the fine-grained malicious behaviors found in various malware, making it an information-rich and easily accessible resource. Overall, the knowledge base is constructed to improve the efficiency and accuracy of malware analysis and to provide key support for research and practice in the field of network security.
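One possible way to organize such a knowledge base is sketched below. The four fields mirror the items named above (behavior category, behavior pattern, attack method, potential threat). The class names BehaviorEntry and KnowledgeBase and the in-memory dictionary storage are hypothetical choices for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class BehaviorEntry:
    """One fine-grained malicious behavior (hypothetical layout)."""
    category: str        # behavior category, e.g. "persistence"
    pattern: str         # how the behavior appears in the code
    attack_method: str   # how the behavior is carried out
    threat: str          # potential impact on the target system


@dataclass
class KnowledgeBase:
    """In-memory store that groups entries by behavior category."""
    entries: dict = field(default_factory=dict)

    def add(self, entry):
        self.entries.setdefault(entry.category, []).append(entry)

    def lookup(self, category):
        # Return a copy so callers cannot mutate the stored list.
        return list(self.entries.get(category, []))
```

Grouping by category means an analysis function written for one behavior class can retrieve only the entries relevant to it.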

(3) Specific technology implementation of automated analysis tools

After weighing the various requirements of an automated malware analysis tool, the invention builds the tool on the Ghidra API. The API provided by the Ghidra reverse engineering framework is used to automatically execute customized analysis tasks on binary files, and Ghidra allows such tasks to be written in Java or Python. The Ghidra API supports automation of tasks such as data extraction, symbol renaming and function identification. In the present invention, Java is used as the main development language because Ghidra itself is implemented in Java. This gives the invention access to the full functionality of the Ghidra API and allows it to use the many existing Java libraries.

In the invention, Ghidra is first used to reverse engineer the input Internet of Things malware and obtain its assembly code. The generated assembly code then serves as the main object of further analysis.

In the concrete implementation process:

When analyzing each fine-grained malicious behavior, four analysis methods are used: analyzing the calling patterns and code fragments of key functions, analyzing the character strings in the malware, analyzing the data structures in the malware, and analyzing the control flow of the malware. Combining these methods with the knowledge base constructed in the previous step, a specific analysis function is built for each fine-grained malicious behavior and implemented with the Ghidra API. These analysis functions examine each fine-grained behavior in the malware to identify the specific class of malicious behavior. They also serve as the main automatic analysis tool of the invention, realizing automatic analysis of malicious behaviors in Internet of Things malware.

When analyzing an entire Internet of Things malware sample, a function iteration analysis algorithm is first constructed. It loops over the whole set of functions obtained by reverse engineering the malware, so that the entire malicious file is searched effectively.
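As an illustration of the second method (string analysis), the sketch below flags strings that refer to the startup files mentioned in the earlier manual analysis example. It works on plain (address, text) pairs rather than on the Ghidra API, and the indicator table, the function name analyze_strings and the "persistence" label are assumptions for illustration.

```python
# Hypothetical indicator table; the two paths come from the manual
# analysis example earlier in this description.
PERSISTENCE_PATHS = ("/etc/rc.d/rc.local", "/etc/rc")


def analyze_strings(strings):
    """Flag strings that reference known startup files.

    strings: iterable of (address, text) pairs extracted from a binary.
    Returns one finding per matching string.
    """
    findings = []
    for address, text in strings:
        for path in PERSISTENCE_PATHS:
            if path in text:
                findings.append({
                    "category": "persistence",
                    "loc": address,
                    "evidence": text,
                })
                break  # one finding per string is enough
    return findings
```

A string match alone is weak evidence, so a practical analysis function would likely combine it with the other three methods before reporting a behavior.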

As shown in fig. 13, the function iteration analysis algorithm is implemented as follows. The getFirstFunction() function in the Ghidra API is used to obtain the entry function func_i of the whole file to be analyzed. The algorithm then checks whether func_i was obtained successfully. If the function is not null, that is, the function was obtained successfully, a while loop is entered. In the loop, whether the next function to be analyzed, func_i+1, is obtained depends on whether the task monitor has been cancelled, and a detailed analysis of the current function is performed (using the four analysis methods defined above).
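The control flow of this loop can be sketched outside Ghidra with small stand-in classes. FakeProgram and FakeMonitor are hypothetical substitutes for the program and task monitor objects, and their method names only imitate the roles of getFirstFunction() and the monitor's cancellation check. The sketch assumes function names are unique.

```python
class FakeProgram:
    """Stand-in for a disassembled program: an ordered list of function names."""

    def __init__(self, names):
        self.names = list(names)

    def get_first_function(self):
        return self.names[0] if self.names else None

    def get_function_after(self, name):
        index = self.names.index(name) + 1
        return self.names[index] if index < len(self.names) else None


class FakeMonitor:
    """Stand-in for a task monitor that reports cancellation after N checks."""

    def __init__(self, cancel_after=None):
        self.cancel_after = cancel_after
        self.checks = 0

    def is_cancelled(self):
        self.checks += 1
        return self.cancel_after is not None and self.checks > self.cancel_after


def iterate_functions(program, monitor, analyze):
    """Visit every function in order until the end or until cancelled."""
    results = []
    func = program.get_first_function()
    while func is not None and not monitor.is_cancelled():
        results.extend(analyze(func))          # per-function detailed analysis
        func = program.get_function_after(func)
    return results
```

Passing the per-function analysis in as a callable keeps the traversal independent of which of the four analysis methods is applied.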

In the actual malware analysis process, the constructed analysis functions serve as the main analysis tool. As shown in fig. 14, because these analysis functions are built from the behavior feature summaries of each behavior class in the knowledge base, when one of the four analysis functions detects a fine-grained malicious behavior, the category mapping information behavior_category of that behavior is first obtained from the category to which the analysis function belongs. The assembly code implementation behavior_implCode and the code location behavior_loc obtained during the analysis are then recorded together in an object array behavior_array, which is returned as the result of this step after each function object has been analyzed.

The object array behavior_array is defined as follows:

The malware is then analyzed iteratively by the function iteration analysis algorithm until the whole malicious file has been completely analyzed.

Finally, the collected behavior_array object arrays are summarized and organized into the final automatic analysis report of the malware, which is output as the automatic analysis result.
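The final summarization step might look like the following sketch. It assumes each record in behavior_array is a dictionary with the three fields named above (behavior_category, behavior_implCode, behavior_loc). The plain-text report layout and the function name build_report are assumptions for illustration and are not the report format of the invention.

```python
from collections import defaultdict


def build_report(behavior_array):
    """Group detected behaviors by category and render a text report.

    behavior_array: list of dicts with the keys behavior_category,
                    behavior_implCode and behavior_loc.
    """
    grouped = defaultdict(list)
    for record in behavior_array:
        grouped[record["behavior_category"]].append(record)

    lines = []
    for category in sorted(grouped):
        records = grouped[category]
        lines.append(f"{category}: {len(records)} finding(s)")
        for record in records:
            lines.append(
                f"  at {record['behavior_loc']}: {record['behavior_implCode']}"
            )
    return "\n".join(lines)
```

Grouping by category keeps the report aligned with the behavior classes of the knowledge base, so each finding can be traced back to its entry there.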

The foregoing is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art within the technical scope disclosed by the present invention, according to its technical scheme and inventive concept, shall fall within the scope of protection of the present invention.

Claims

1. An automatic analysis method for fine-grained malicious behavior in IoT malware, comprising the steps of:

S1, based on the prior art and with reference to the mainstream static and dynamic analysis methods used in existing Internet of Things malware analysis, formulating a malware analysis framework to simplify and standardize the manual analysis flow for malware samples;

S2, carrying out standardized analysis of malware based on the malware analysis framework formulated in S1, constructing a fine-grained malicious behavior knowledge base from the resulting analysis reports, and associating, through the knowledge base, the malicious behaviors in Internet of Things malware with their complex implementations and threat levels;

2. The automatic analysis method for fine-grained malicious behavior in IoT malware according to claim 1, wherein the malware analysis framework in S1 analyzes a piece of malware to be analyzed as follows:

3. The automatic analysis method for fine-grained malicious behavior in IoT malware according to claim 1, wherein the S2 specifically comprises the following:

S2.1, in a data acquisition and sample selection stage, collecting malware samples from the IoTPOT honeypot and preliminarily classifying them using the AVClass tool;

S2.2, in a fine-grained malicious behavior modeling stage, manually analyzing the Internet of Things malware samples according to the malware analysis flow in S1 to identify the key malicious behaviors of the Internet of Things malware;

S2.4, integrating the operations described in S2.1-S2.3, and constructing a knowledge base to collect and integrate the behavior intents, behavior categories, behavior patterns and attack means of the malware.

4. The automatic analysis method for fine-grained malicious behavior in IoT malware according to claim 1, wherein the S3 specifically comprises the following:

S3.1.1, analyzing the calling patterns and code fragments of key functions: analyzing the assembly code obtained by disassembly, and identifying key functions and code fragments;

S3.1.2, analyzing character strings in the malware: analyzing the character strings used in the malware assembly listing, such as file names, notes, or condition checks;

S3.1.3, analyzing data structures in the malware: analyzing the data structures used in the malicious file, such as arrays, structures, and linked lists;

S3.1.4, analyzing the control flow of the malware: analyzing the control flow of the malicious file and identifying the main execution path;