CN112084492A - Method for detecting distributed malware by using IRP (I/O request packet) and local sequence alignment algorithm

Info

Publication number
CN112084492A
Authority
CN
China
Prior art keywords
irp
sequence
malicious software
malware
request
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010986734.7A
Other languages
Chinese (zh)
Inventor
郑敏
戴裕昇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Yuxin Technology Development Xuchang Co ltd
Original Assignee
Zhongke Yuxin Technology Development Xuchang Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/52 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity; Preventing unwanted data erasure; Buffer overflow
    • G06F21/53 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems during program execution, e.g. stack integrity; Preventing unwanted data erasure; Buffer overflow by executing in a restricted environment, e.g. sandbox or secure virtual machine
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/566 Dynamic detection, i.e. detection performed at run-time, e.g. emulation, suspicious activities

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Virology (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention belongs to the technical field of network security and discloses a method for detecting distributed malware by using IRPs (I/O request packets) and a local sequence alignment algorithm. The method comprises: in a sandbox environment, using a driver filtering technique, setting the drivers to be monitored and the corresponding IRP requests, and extracting the IRP sequence of the malware; in the sandbox environment, using the driver filtering technique, extracting the IRP sequence generated by the system according to the drivers to be monitored and the corresponding IRP requests, where an IRP sequence is a sequence of combinations of a driver and an IRP request; replacing each combination of driver and IRP request in the IRP sequences with a character; and taking the IRP sequence of the malware as the shorter sequence B and the IRP sequence extracted from the system as the global long sequence A, performing local matching with the local sequence alignment algorithm, and selecting the matched malware IRP sequence with the highest score. The method can effectively detect distributed malware with high detection precision.

Description

Method for detecting distributed malware by using IRP (I/O request packet) and local sequence alignment algorithm
Technical Field
The invention belongs to the technical field of network security, and particularly relates to a method for detecting distributed malware by using IRPs (I/O request packets) and a local sequence alignment algorithm.
Background
With the rapid development of information technology, national information security and personal privacy have become critically important. Malware, as a major vehicle for cybercrime, seriously threatens the information security of countries and citizens, so detecting it effectively is of great significance. However, as malware technology is continuously updated, its evasion techniques also keep improving, making malware stealthier and harder to detect. How to effectively detect malware with evasive behavior therefore remains a difficult problem in current malware detection research.
Malware detection is mainly divided into dynamic detection and static detection. Because of code obfuscation, packers, and similar techniques, static detection of malware has become difficult. Dynamic detection can overcome the shortcomings of static detection but is vulnerable to evasion attacks by malware. Common malware evasion techniques include environment checks, emulation attacks, code reuse, and code injection. New techniques pose challenges for detection, but malware using these evasion behaviors still runs as a single process, and detection methods based on malware features, such as API sequences, malicious behaviors, and ensemble features, can respond to it effectively. malWash (Ispoglou KK, Payer M, et al. malWash: Washing malware to evade dynamic analysis [C]. Usenix security symposium, 2016) is a very covert malware evasion technique: it splits malware into multiple fragments, injects the fragments into benign programs running in the system, and executes them piecewise so that the malicious behavior visible in any one process is minimized and detection is evaded. Because malWash execution requires scheduling code for coordination, some behaviors may still be exposed. D-TIME (Pavithran J, Patnaik M, Rebeiro C, et al. D-TIME: Distributed Threadless Independent Malware Execution for Runtime Obfuscation [C]. Usenix security symposium, 2019) improves on malWash: it uses the Asynchronous Procedure Call (APC) facility of Windows and hidden signals to schedule among execution blocks, further enhancing the concealment of distributively injected malware. Conventional detection methods are not effective enough against these evasion techniques.
Hăjmăşan et al. (Hăjmăşan G, Mondoc A, Portase R, et al. Evasive Malware Detection Using Groups of Processes [C]. Information security, 2017: 32-45.) propose grouping the processes in a system by relevance and applying heuristic detection to the behaviors produced by the operations of each process group in order to detect distributively injected malware. However, the benign processes chosen by distributed malware are not necessarily related, and monitoring a large number of processes burdens the monitoring system and reduces detection efficiency. Otsuki et al. (Otsuki Y, Kawakoya Y, Iwamura M, et al. Toward the Analysis of Distributed Code Injection in Post-mortem Forensics [C]. International workshop on security, 2019: 391-.) analyze distributed code injection from memory dumps. However, since a memory dump is a snapshot of system memory at a certain moment, there is no guarantee that all suspicious objects are in memory at that moment.
Disclosure of Invention
The invention provides a method for detecting distributed malware by using IRPs (I/O request packets) and a local sequence alignment algorithm, aimed at the low detection efficiency and low detection precision of existing distributed malware detection methods.
In order to achieve this purpose, the invention adopts the following technical scheme:
A method for detecting distributed malware using IRPs and a local sequence alignment algorithm, comprising:
Step 1: in a sandbox environment, using a driver filtering technique, set the drivers to be monitored and the corresponding IRP requests, and extract the IRP sequence of the malware; the IRP sequence is a sequence of combinations of a driver and an IRP request;
Step 2: in the sandbox environment, using the driver filtering technique, extract the IRP sequence generated by the system according to the drivers to be monitored and the corresponding IRP requests; the IRP sequence is a sequence of combinations of a driver and an IRP request;
Step 3: replace each combination of driver and IRP request in the IRP sequences with a character;
Step 4: take the IRP sequence of the malware as the shorter sequence B and the IRP sequence extracted from the system as the global long sequence A, perform local matching with the local sequence alignment algorithm, and select the matched malware IRP sequence with the highest score.
Further, step 4 comprises:
setting a matching threshold, and if the IRP sequence of the malware appears in order within the IRP sequence extracted from the system and the number of matched characters exceeds the matching threshold, determining that the IRP sequence of the malware is detected;
if the IRP sequences of several malware samples are matched at the same time, selecting the matched malware IRP sequence with the highest score according to the scoring rule of the local sequence alignment algorithm.
Further, selecting the matched malware IRP sequence with the highest score according to the scoring rule of the local sequence alignment algorithm, when the IRP sequences of several malware samples are matched at the same time, comprises:
Let sequence $A = a_1, a_2, a_3, \dots, a_n$ and sequence $B = b_1, b_2, b_3, \dots, b_m$, where $n$ and $m$ are the lengths of sequences A and B and $n > m$; $a_1, a_2, a_3, \dots, a_n$ are the character representations of the combinations of driver and IRP request in the IRP sequence extracted from the system; $b_1, b_2, b_3, \dots, b_m$ are the character representations of the combinations of driver and IRP request in the IRP sequence of the malware; $s(a_i, b_j)$ is the similarity score of $a_i$ and $b_j$, with $1 \le i \le n$ and $1 \le j \le m$; $W$ is the gap penalty; $H$ is the scoring matrix.
When the alignment starts, the scoring matrix $H$ is initialized with values starting from 0: the first row and the first column are all 0, rows correspond to sequence A and columns to sequence B, and the size of the scoring matrix is $H_{n+1,m+1}$.
Each entry $H_{i,j}$ of the scoring matrix is calculated as:
$$H_{i,j} = \max \begin{cases} H_{i-1,j-1} + s(a_i, b_j) \\ H_{i-1,j} - W \\ H_{i,j-1} - W \\ 0 \end{cases}$$
Sequences A and B are traced back starting from the largest entry found in matrix $H$:
if $a_i = b_j$, trace back to entry $H_{i-1,j-1}$;
if $a_i \ne b_j$, trace back to the entry with the largest value among $H_{i-1,j-1}$, $H_{i,j-1}$, $H_{i-1,j}$; if several entries have the same value, select the first one in the order $H_{i-1,j-1}$, $H_{i,j-1}$, $H_{i-1,j}$;
until a locally optimal sequence is matched.
Further, gap penalties are performed using a finite state machine.
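As a small worked instance of the scoring and traceback rules described above, the following sketch fills the matrix for two short sequences. The scoring values are assumptions for illustration only, since the text does not give them: $s(a_i, b_j) = +1$ when the characters are equal and $-1$ otherwise, and $W = 1$. The recurrence written in the comment is the standard local alignment form, assumed to be the one intended.

```latex
% Assumed scoring (not specified in the text): s(a_i,b_j)=+1 if a_i=b_j, else -1; W=1.
% Assumed standard local alignment recurrence:
%   H_{i,j} = \max\{0,\ H_{i-1,j-1}+s(a_i,b_j),\ H_{i-1,j}-W,\ H_{i,j-1}-W\}
\[
A = (x, y, z),\quad B = (y, z),\qquad
H =
\begin{array}{c|ccc}
      & -  & y & z \\ \hline
  -   & 0  & 0 & 0 \\
  x   & 0  & 0 & 0 \\
  y   & 0  & 1 & 0 \\
  z   & 0  & 0 & 2
\end{array}
\]
\[
H_{3,2} = \max\{0,\ H_{2,1} + 1,\ H_{2,2} - 1,\ H_{3,1} - 1\}
        = \max\{0,\ 2,\ -1,\ -1\} = 2
\]
% Traceback from the largest entry H_{3,2}=2:
%   a_3 = b_2 = z  ->  go to H_{2,1} = 1
%   a_2 = b_1 = y  ->  go to H_{1,0} = 0, stop
% The locally matched fragment is (y, z).
```

Under these assumed scores, the largest entry is $H_{3,2} = 2$ and the traceback recovers the fragment $(y, z)$ of A that matches B.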
Compared with the prior art, the invention has the following beneficial effects:
Distributively injected malware hides itself through fragment splitting and synchronization and can effectively evade dynamic malware detection. Most state-of-the-art malware detection techniques cannot effectively handle distributively injected malware, and existing distributed malware detection methods suffer from low detection efficiency and low detection precision. To address this, the invention provides a method for detecting distributed malware using IRPs and a local sequence alignment algorithm. Drivers and their corresponding IRPs are used as features, so that the driver and the request type to which each IRP request belongs can be clearly distinguished; the IRP sequence extracted from the malware is used as a sample sequence and compared with the suspicious IRP sequence extracted from the system. Experiments verify that the proposed IRP sequence features can serve as features for malware classification, with a detection accuracy of up to 93.4%. In addition, actual distributed malware produced by modifying malware was tested in the sandbox, verifying that the proposed method can effectively detect most types of distributed malware; the accuracy of detecting distributed malware reaches 93%, and the detection performance is better than that of recent comparable methods.
Drawings
FIG. 1 is a basic flowchart of a method for detecting distributed malware using IRP and local sequence alignment algorithm according to an embodiment of the present invention;
FIG. 2 is a diagram of an example of IRP request statistics generated by malware;
FIG. 3 is a diagram of the result of IRP classification using word2vec and GRU;
FIG. 4 is an exemplary diagram of IRP sequence fragments matched according to a local alignment algorithm;
FIG. 5 is a statistical chart of detection results for the two distributed malware threat models under different splitting modes;
FIG. 6 is a statistical chart of detection results for different numbers of injected processes under two splitting modes.
Detailed Description
The invention is further illustrated by the following examples in conjunction with the accompanying drawings:
As shown in fig. 1, a method for detecting distributed malware using IRPs and a local sequence alignment algorithm comprises:
Step S101: in a sandbox environment, using a driver filtering technique, set the drivers to be monitored and the corresponding IRP (I/O request packet) requests, and extract the IRP sequence of the malware; the IRP sequence is a sequence of combinations of a driver and an IRP request; specifically, system-wide selective monitoring of IRP requests is performed on the basis of the driver filtering provided by the open-source IRPMon (https://github.com/MartinDrab/IRPMon);
Step S102: in the sandbox environment, using the driver filtering technique, extract the IRP sequence generated by the system according to the drivers to be monitored and the corresponding IRP requests; the IRP sequence is a sequence of combinations of a driver and an IRP request;
Step S103: replace each combination of driver and IRP request in the IRP sequences with a character;
Step S104: take the IRP sequence of the malware as the shorter sequence B and the IRP sequence extracted from the system as the global long sequence A, perform local matching with the local sequence alignment algorithm, and select the matched malware IRP sequence with the highest score.
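The character substitution of step S103 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the event format (a `(driver, irp)` tuple per captured request), the function names, and the three example pairs are hypothetical, and the actual 23 monitored combinations are those of Table 1.

```python
from typing import Dict, Iterable, List, Tuple

Pair = Tuple[str, str]  # (driver name, IRP request name)

# Hypothetical excerpt of monitored combinations; the real list is Table 1.
MONITORED: List[Pair] = [
    ("disk.sys", "IRP_MJ_READ"),
    ("disk.sys", "IRP_MJ_WRITE"),
    ("kbdclass.sys", "IRP_MJ_READ"),
]


def build_alphabet(pairs: List[Pair]) -> Dict[Pair, str]:
    """Assign one character ('A', 'B', ...) to each monitored pair."""
    return {pair: chr(ord("A") + i) for i, pair in enumerate(pairs)}


def encode(events: Iterable[Pair], alphabet: Dict[Pair, str]) -> str:
    """Drop unmonitored events and map the rest to characters, in order."""
    return "".join(alphabet[e] for e in events if e in alphabet)


alphabet = build_alphabet(MONITORED)
captured = [
    ("disk.sys", "IRP_MJ_READ"),
    ("foo.sys", "IRP_MJ_CREATE"),      # not monitored, ignored
    ("kbdclass.sys", "IRP_MJ_READ"),
    ("disk.sys", "IRP_MJ_WRITE"),
]
sequence = encode(captured, alphabet)
```

Keeping the driver in the key means that the same IRP request issued to two different drivers maps to two different characters, which is the distinction the method relies on.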
Further, step S104 comprises:
setting a matching threshold, and if the IRP sequence of the malware appears in order within the IRP sequence extracted from the system and the number of matched characters exceeds the matching threshold, determining that the IRP sequence of the malware is detected;
if the IRP sequences of several malware samples are matched at the same time, selecting the matched malware IRP sequence with the highest score according to the scoring rule of the local sequence alignment algorithm.
Further, selecting the matched malware IRP sequence with the highest score according to the scoring rule of the local sequence alignment algorithm, when the IRP sequences of several malware samples are matched at the same time, comprises:
Let sequence $A = a_1, a_2, a_3, \dots, a_n$ and sequence $B = b_1, b_2, b_3, \dots, b_m$, where $n$ and $m$ are the lengths of sequences A and B and $n > m$; $a_1, a_2, a_3, \dots, a_n$ are the character representations of the combinations of driver and IRP request in the IRP sequence extracted from the system; $b_1, b_2, b_3, \dots, b_m$ are the character representations of the combinations of driver and IRP request in the IRP sequence of the malware; $s(a_i, b_j)$ is the similarity score of $a_i$ and $b_j$, with $1 \le i \le n$ and $1 \le j \le m$; $W$ is the gap penalty; $H$ is the scoring matrix.
When the alignment starts, the scoring matrix $H$ is initialized with values starting from 0: the first row and the first column are all 0, rows correspond to sequence A and columns to sequence B, and the size of the scoring matrix is $H_{n+1,m+1}$.
Each entry $H_{i,j}$ of the scoring matrix is calculated as:
$$H_{i,j} = \max \begin{cases} H_{i-1,j-1} + s(a_i, b_j) \\ H_{i-1,j} - W \\ H_{i,j-1} - W \\ 0 \end{cases}$$
Sequences A and B are traced back starting from the largest entry found in matrix $H$:
if $a_i = b_j$, trace back to entry $H_{i-1,j-1}$;
if $a_i \ne b_j$, trace back to the entry with the largest value among $H_{i-1,j-1}$, $H_{i,j-1}$, $H_{i-1,j}$; if several entries have the same value, select the first one in the order $H_{i-1,j-1}$, $H_{i,j-1}$, $H_{i-1,j}$;
until a locally optimal sequence is matched.
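The matrix fill and traceback described above can be sketched in Python as follows. This is a hedged sketch, not the patent's code: the function name `local_align` and the default scores (match +1, mismatch -1, gap penalty 1) are assumptions, and the recurrence used is the standard local alignment form with a constant gap penalty W. The traceback follows the rule as worded in the text (diagonal on equal characters, otherwise the largest neighbour with ties resolved diagonal, left, up).

```python
from typing import List, Tuple


def local_align(a: str, b: str, match: int = 1, mismatch: int = -1,
                gap: int = 1) -> Tuple[int, str, str]:
    """Return (best score, aligned part of a, aligned part of b)."""
    n, m = len(a), len(b)
    # H has (n+1) x (m+1) entries; the first row and column stay 0.
    H: List[List[int]] = [[0] * (m + 1) for _ in range(n + 1)]
    best, bi, bj = 0, 0, 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s,
                          H[i - 1][j] - gap, H[i][j - 1] - gap)
            if H[i][j] > best:
                best, bi, bj = H[i][j], i, j
    # Traceback from the largest entry until a zero entry or a border.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = bi, bj
    while i > 0 and j > 0 and H[i][j] > 0:
        if a[i - 1] == b[j - 1]:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
            continue
        # Mismatch: largest neighbour; ties resolved diagonal, left, up.
        moves = [(H[i - 1][j - 1], i - 1, j - 1),
                 (H[i][j - 1], i, j - 1),
                 (H[i - 1][j], i - 1, j)]
        top = max(v for v, _, _ in moves)
        _, ni, nj = next(mv for mv in moves if mv[0] == top)
        out_a.append(a[i - 1] if ni < i else "-")   # '-' marks a gap
        out_b.append(b[j - 1] if nj < j else "-")
        i, j = ni, nj
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))
```

For example, `local_align("ABXCD", "ABCD")` returns `(3, "ABXCD", "AB-CD")`: the unrelated character `X` in the system sequence is absorbed as one gap in the sample, which is how IRPs from other activity interleaved with the malware's requests are skipped.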
Further, gap penalties are applied using a finite state machine. Specifically, the purpose of using a finite state machine for gap penalties is to prevent inconsistencies in malware execution caused by system scheduling from affecting the IRP sequence matching score. Since the sample sequence B does not have a fixed length, the state machine must dynamically adjust the penalty length during matching. Gaps and matched characters may alternate; in this case a gap whose length does not exceed the penalty upper limit is merged with the adjacent runs of matched characters, and the state machine only treats it as a gap and starts applying penalties when gaps occur consecutively.
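The text does not specify the states or transitions of the finite state machine, so the following is only one possible reading, sketched as a two-state machine: an isolated gap between matched characters is merged with its neighbours at no cost, penalties start from the second consecutive gap, and the total penalty is capped by a limit that scales with the sample length. The function name, the per-gap cost, and the `limit_ratio` parameter are all hypothetical.

```python
def gap_penalty_fsm(aligned_sample: str, sample_len: int,
                    per_gap: int = 1, limit_ratio: float = 0.2) -> int:
    """Total gap penalty for an aligned sample string ('-' marks a gap).

    Assumed behaviour (not from the source): isolated gaps are free,
    consecutive gaps cost `per_gap` each from the second one onward,
    and the total is capped at a limit proportional to `sample_len`.
    """
    limit = max(1, int(sample_len * limit_ratio))  # dynamic upper limit
    state = "MATCH"
    run = 0        # length of the current run of consecutive gaps
    penalty = 0
    for ch in aligned_sample:
        if ch == "-":
            run = run + 1 if state == "GAP" else 1
            state = "GAP"
            if run >= 2:           # consecutive gaps: start penalising
                penalty += per_gap
        else:
            state = "MATCH"
            run = 0
    return min(penalty, limit * per_gap)
```

With these assumptions, `gap_penalty_fsm("AB-CD", 4)` is 0 because the single gap is merged with the matched characters around it, while `gap_penalty_fsm("A---B", 10)` is 2.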
To verify the effect of the present invention, the following experiment was performed:
1. IRP Performance evaluation
To verify that IRPs are a distinguishing feature of malware and can be used to detect distributed malware, the IRP features are evaluated first to confirm that they are suitable for malware detection.
1.1 Experimental Environment and data
The computer running the sandbox environment uses an Intel(R) Core(TM) i5-6500 CPU @ 3.20 GHz and 8 GB of DDR3 memory. The host of the Cuckoo sandbox is installed on Ubuntu 16.04, and the guest uses a 32-bit Windows 7 operating system with 2 GB of memory.
The malicious code data used in the experiments came from Malware Benchmark (http://malwarebanchmark.org/), which authorized the use of about 27k malicious code samples collected between 2013 and 2015, plus a small amount of malware downloaded from theZoo (https://github.com/ytisf/theZoo). For the distributed experiments, 100 malware samples were split with the malWash method. According to VirusTotal, all samples can be divided into 227 families. The invention is mainly concerned with distributed malware detection, and classification performance is not its main topic.
1.2 IRP characteristics
In a Windows system, in a complete and clean environment, running an application requires interaction with the underlying hardware. A driver is a special program in the operating system that handles input and output (I/O) between the API and the hardware. A complete set of Windows drivers covers the motherboard, serial ports, graphics card, storage devices, and so on. Each driver handles multiple I/O requests, and the Windows 7 system in the experimental environment of the invention has hundreds of basic drivers, so the number of driver and I/O request combinations reaches the thousands. Monitoring all drivers would generate a huge amount of data, forcing the monitored data to be stored in virtual memory (on the hard disk) and distorting it. Therefore, based on data collected from malware and benign software running in the system and on knowledge of malware behavior, 23 of the hundreds of combinations that can be monitored are selected and listed in Table 1.
TABLE 1 drivers to be monitored and corresponding IRP requests
[Table 1 images: Figure BDA0002689510470000061, Figure BDA0002689510470000071]
The invention uses IRP requests that are explicitly attributed to a specific driver, rather than using all IRPs without categorizing them by driver. As a result, some IRPs become more discriminative when the features are vectorized. These IRPs were selected mainly because they appear most frequently and rank highest in the statistics. In addition, drivers are layered in the system; for example, access to a hard disk usually passes from top to bottom through volsp.sys, volmgr.sys, pnpmanger.sys, disk.sys, and atapi.sys before finally reaching the hardware. The system uses this layering for more flexible programming and management, but in the experiments of the invention, if the selected driver layer is too close to the top, the true purpose of each I/O request cannot be identified, while a layer too close to the hardware produces too much redundancy in the I/O packets. Moreover, drivers in the few layers nearest the hardware show little difference in their I/O requests, so the classification results do not differ greatly.
Fig. 2 shows IRP statistics for different malware, with IRP requests counted per second and the x-axis in seconds; 2(a) is Ransomware. The top four IRP requests for these types of malware are listed in fig. 2. As can be seen from fig. 2, the IRP requests issued by the ransomware samples (Ransomware.Cerber, Ransomware.WannaCry) are hard-disk Read, Write, and Flush requests and network connection requests, up to 3 items, which is consistent with ransomware behavior. For the keylogger trojan (Trojan.Keylogger), by comparison, the statistics clearly capture keyboard reads and writes, network activity, and hard-disk access.
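Per-second statistics of the kind plotted in fig. 2 could be computed along the following lines. This is an illustrative sketch only: the event format (a timestamp in seconds plus a request label) and the label strings are assumptions, not the sandbox's actual log format.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Event = Tuple[float, str]  # (timestamp in seconds, "driver:IRP" label)


def per_second_counts(events: List[Event]) -> Dict[int, Counter]:
    """Bucket events by whole second and count each request label."""
    buckets: Dict[int, Counter] = defaultdict(Counter)
    for ts, label in events:
        buckets[int(ts)][label] += 1
    return buckets


def top_requests(events: List[Event], k: int = 4) -> List[Tuple[str, int]]:
    """Return the k most frequent request labels over the whole trace."""
    return Counter(label for _, label in events).most_common(k)


# Hypothetical trace for illustration.
trace = [
    (0.10, "disk.sys:IRP_MJ_READ"),
    (0.40, "disk.sys:IRP_MJ_READ"),
    (0.90, "disk.sys:IRP_MJ_WRITE"),
    (1.20, "disk.sys:IRP_MJ_READ"),
]
counts = per_second_counts(trace)
ranking = top_requests(trace, k=1)
```

Calling `top_requests` with `k=4` gives the "top four IRP requests" view for one sample, while `per_second_counts` gives the time series along the x-axis.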
1.3 IRP Classification Performance
To verify the effectiveness of the selected IRPs for classification, that is, that they can be used to correctly identify malware, the sandbox is used to extract the IRPs of malware. Since each IRP contains the process ID and thread ID that originated the request, these identify the process to which a generated IRP request belongs. In the IRP classification experiment, IRPs are converted into characters, and word2vec then converts the IRP character sequence into vectors that a neural network classifier can process. A two-layer GRU neural network is used as the classifier model for IRP character sequences; the parameters of both word2vec and the two-layer GRU network follow the API sequence classifier model in (Dai Y, Li H, Qian Y, et al. SMASH: A Malware Detection Method Based on Multi-Feature Ensemble Learning [J]. IEEE Access, 2019, 7: 112588-112597.). The per-category IRP classification results are shown as a confusion matrix in part 3(a) of fig. 3. Part 3(b) of fig. 3 compares the results with MBMAS from study 1 (Zhang F, Ma Y. Using IRP with a novel artificial immune algorithm for Windows malicious executables detection [C]. IEEE International conference on progress in informatics and computing, 2016: 610-.). For MBMAS, we follow the IRPs used in study 1, which are plain IRP sequences without the driver combination, replaced by letters. They are classified once with the method of this work (word2vec-GRU/only-IRP); in addition, we reproduce the boosted decision tree (BDT), which had the highest detection accuracy in the experiments of study 1, using 4-gram probabilities as input features (4-gram-BDT/only-IRP). The overall classification accuracy is shown in Table 2.
TABLE 2 IRP classification accuracy
Method                        Overall classification accuracy
word2vec-GRU/driver-IRP       93.4%
word2vec-GRU/only-IRP         87.2%
4-gram-BDT/only-IRP           86.5%
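The 4-gram input features used for the 4-gram-BDT/only-IRP baseline can be illustrated as follows. The text only says that 4-gram probabilities are used as input features; interpreting "probability" as the relative frequency of each 4-gram within one character sequence is an assumption, as is the function name.

```python
from collections import Counter
from typing import Dict


def ngram_probabilities(seq: str, n: int = 4) -> Dict[str, float]:
    """Relative frequency of every length-n substring of seq."""
    grams = [seq[i:i + n] for i in range(len(seq) - n + 1)]
    if not grams:
        return {}          # sequence shorter than n: no features
    counts = Counter(grams)
    total = len(grams)
    return {g: c / total for g, c in counts.items()}


features = ngram_probabilities("ABCDABCD")
```

Here the sequence `"ABCDABCD"` contains five 4-grams, two of which are `"ABCD"`, so that feature has value 0.4. A feature vector for a classifier would then be built by looking up a fixed vocabulary of 4-grams in this dictionary.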
The results show that the overall performance is acceptable, and that the driver and IRP combination proposed by the invention achieves higher overall classification accuracy than using IRPs alone as features. Using a driver and its corresponding IRPs as features makes it possible to clearly distinguish the driver and the request type to which an IRP request belongs, and explains the behavior behind the IRP more clearly than the IRP alone.
As can be seen from fig. 3, the IRP features of the method of the invention give higher classification accuracy than the MBMAS method, although classification performance is not the focus of the invention. The confusion matrix shows that good accuracy is obtained for all kinds of malware except downloaders (Downloader). The behavior of a downloader is too simple, and most downloaders operate in combination with PowerShell or script malware. Analyzed through the IRP features of the invention, the behavior of some downloaders resembles that of backdoor (Backdoor) and trojan (Trojan) malware. Overall, however, the combined driver and IRP features proposed by the invention are valid for classifying most malware, which makes the subsequent verification of distributed malware extraction feasible.
2. Distributed malware detection assessment
2.1 Distributed malware detection evaluation
The IRP sequence of the malware extracted from the sandbox is taken as the sample sequence, and local matching is performed with a local sequence alignment algorithm from bioinformatics. In the experiment, the complete system IRP sequence is treated as the suspicious sequence and compared with the sample sequence; if the sample sequence appears in order within the suspicious sequence and the matched characters exceed 80% of the sample sequence, the sequence is marked as "detected". If several malicious sample sequences are matched at the same time, the match with the highest score under the scoring rule of the local sequence alignment algorithm is selected. Fig. 4 shows the IRP sequence fragments of Trojan.Keylogger matched using the local sequence alignment algorithm.
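The 80% rule and the highest-score selection can be sketched as a small decision step that runs on already computed alignments. The `Alignment` record and its field names are hypothetical; the alignment itself is assumed to have been produced by some local alignment routine, with `-` marking gaps.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Alignment:
    sample_name: str       # which malware sample sequence was aligned
    score: int             # local alignment score
    aligned_system: str    # aligned fragment of the system sequence
    aligned_sample: str    # aligned fragment of the sample sequence
    sample_length: int     # full length of the sample sequence B


def matched_ratio(al: Alignment) -> float:
    """Fraction of the sample sequence matched, in order, by the alignment."""
    matches = sum(1 for x, y in zip(al.aligned_system, al.aligned_sample)
                  if x == y and x != "-")
    return matches / al.sample_length


def pick_detection(alignments: Sequence[Alignment],
                   threshold: float = 0.8) -> Optional[Alignment]:
    """Keep alignments above the threshold and return the best-scoring one."""
    hits = [al for al in alignments if matched_ratio(al) > threshold]
    return max(hits, key=lambda al: al.score, default=None)


candidates = [
    Alignment("sample-1", 3, "ABXCD", "AB-CD", 4),   # 4 of 4 matched
    Alignment("sample-2", 2, "AB", "AB", 5),         # 2 of 5 matched
    Alignment("sample-3", 5, "ABCDE", "ABCDE", 5),   # 5 of 5 matched
]
detected = pick_detection(candidates)
```

In this example `sample-2` is rejected because only 40% of it matched, and of the two remaining hits `sample-3` is reported because it has the higher alignment score.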
In the experiment, the splitting plug-in provided by malWash was used with IDA 6.5 to prepare 100 split malware samples and 15 malware samples without distribution for the sandbox environment; these were used to evaluate the detection performance of the IRP features and to compare against related research. In this experiment, the basic block splitting mode (BBS mode) was used, the number of processes injected by the malware was set to 3, and the injected processes were Opera (version 68.0.3618.173), Adobe (version 19.008.20080), and Calculator (Windows 7, version 6.1). Since the D-TIME article suggests that hardware performance counters have a high probability of detecting distributed malware, we also selected for comparison the hardware performance counter (HPC) detection method of Ozsoy et al. (Ozsoy M, Khasawneh K N, Donovick C, et al. Hardware-Based Malware Detection Using Low-Level Architectural Features [J]. IEEE Transactions on Computers, 2016, 65(11): 3332-.). The final results are shown in Table 3. The second column of the table gives the number of correctly classified malware samples, the third column the number of malware samples misjudged as benign software, the fourth column the total number of misjudged malware samples, and the last column the number of unidentified malware samples.
TABLE 3 recognition capability statistics for distributed malware
[Table 3 images: Figure BDA0002689510470000091, Figure BDA0002689510470000101]
The experiment shows that research specifically targeting distributed malware has a higher probability of correctly identifying it. The experiment reproduces the work of Hăjmăşan et al., which monitors behavior by process group; however, because in principle the processes in which distributed malware hides are a black box to researchers, detection can produce large errors in a real detection environment. Otsuki et al. use memory dumps. We also studied identifying malware from memory dumps in earlier work (Dai Y, Li H, Qian Y, et al. A malware classification method based on memory dump grayscale image [J]. Digital Investigation). According to our results, memory dumps can identify malware, but the dump file must be updated continuously, and because memory data is volatile, a considerable part of the data is released after being accessed a few times, which is why the detection rate tends to be low. As for the detection method that uses HPC as the malware feature, its features are collected per process, so when malware is split into multiple blocks and injected into benign processes, its HPC behavior pattern is split as well. Moreover, the D-TIME article (Pavithran J, Patnaik M, Rebeiro C, et al. D-TIME: Distributed Threadless Independent Malware Execution for Runtime Obfuscation [C]. Usenix security symposium, 2019) states that running distributed malware itself incurs performance overhead, which raises the performance counters. For the same reason, our earlier SMASH method has the same problem when dealing with distributed malware.
As can be seen from Table 3, although the method of the present invention still produces false alarms, comparison with similar research shows that it can effectively detect distributed malware.
2.2 Splitting mode detection evaluation
The validity of the method of the invention is verified against the block sizes proposed in the malWash article. Using 50 correctly classified distributed malware samples, we split the malware according to the three modes proposed in that article (Ispoglou K K, Payer M, et al. malWASH: Washing Malware to Evade Dynamic Analysis [C]. USENIX Security Symposium, 2016): the BBS mode, the BAST mode, and the paranoid mode. The number of malware samples detected in each mode is shown in Fig. 5.
As can be seen from Fig. 5, the method of the invention performs well in detecting the BBS mode and the BAST mode, and its performance drops slightly for the paranoid mode. Comparing the two threat models, malWash and D-TIME, D-TIME is inherently more concealed and generates more IRP requests in the system, which affects the detector's performance. Compared with the BAST mode, the paranoid mode produces more fragments, so the interaction between the different processes of malWash and D-TIME is more frequent; this lowers the score of the local sequence alignment algorithm and is the main reason for the drop in detection performance.
2.3 Evaluation of the number of injected processes
Next, using the malWash threat model and two splitting modes, the BBS mode and the BAST mode, malware was injected into between 2 and 8 benign processes to verify the effectiveness of the method. We used 8 benign processes: Opera (version 68.0.3618.173), Adobe (version 19.008.20080), Calculator (Windows 76.1 version), nodal.exe, mspaint.exe, Media Player (version 12.0.7600.16385), IE (version 8.0.7600.16385), and TeamViewer (version 14.0.12762). The test results are summarized in Fig. 6.
The experimental results show that in the BBS mode, IRP detection detects all distributed malware when the number of injected processes is no more than 6, and the number of correct detections begins to fall when the number of injected processes rises to 8. In the BAST mode, detection begins to degrade when the number of injected processes exceeds 4. Overall, however, the results remain within an acceptable range, with the detection rate staying above 90%. The main reason for the decline is that the interaction between blocks generates IRP sequences while every benign program also generates IRP sequences during normal operation; when the volume of data is large enough, the local alignment score becomes too low for a complete match. In addition, the more processes are opened, the lower the success rate of injecting into benign software with malWash.
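The effect described above, where IRP requests from benign programs dilute the malware's IRP sequence and lower the local alignment score, can be illustrated with a small sketch. The scoring values (match +2, mismatch -1, gap penalty 1), the function names, and the toy character sequences are all assumptions chosen for illustration; they are not taken from the patent.

```python
def local_score(a: str, b: str, match: int = 2, mismatch: int = -1,
                gap: int = 1) -> int:
    """Best local alignment score of a and b (score only, no traceback).

    Uses the standard Smith-Waterman recurrence with a linear gap penalty;
    the scoring values are illustrative assumptions.
    """
    prev = [0] * (len(b) + 1)  # previous row of the scoring matrix
    best = 0
    for ch_a in a:
        curr = [0]  # first column is always 0
        for j, ch_b in enumerate(b, start=1):
            s = match if ch_a == ch_b else mismatch
            value = max(0, prev[j - 1] + s, prev[j] - gap, curr[j - 1] - gap)
            curr.append(value)
            best = max(best, value)
        prev = curr
    return best


def interleave(sample: str, noise: str) -> str:
    """Place `noise` between consecutive characters of `sample`."""
    return noise.join(sample)


# Hypothetical malware IRP sequence, already encoded as characters.
sample = "ABCDE"
# "Z" stands for an IRP request issued by a benign program.
for noise in ("", "Z", "ZZ"):
    system_seq = "ZZ" + interleave(sample, noise) + "ZZ"
    print(repr(noise), local_score(system_seq, sample))
```

With these assumed scores, the sample aligns perfectly when it is contiguous, and the score falls as more benign requests are interleaved between its characters, which is the trend the paragraph above attributes to larger numbers of injected processes.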
In conclusion, distributed injected malware hides itself through fragment splitting and synchronization, and can effectively evade dynamic malware detection. Most advanced malware detection techniques cannot deal effectively with malware injected in a distributed manner, and existing detection methods for distributed malware suffer from low detection efficiency and low detection precision. To address this problem, the invention provides a method for detecting distributed malware using IRP and a local sequence alignment algorithm. Drivers and their corresponding IRP requests are used as features; the IRP sequence extracted from malware serves as the sample sequence and is compared with the suspicious IRP sequence extracted from the system, so that the driver and the request type to which each IRP request belongs can be clearly distinguished. Experiments verify that the proposed IRP sequence features can serve as features for malware classification, with a detection accuracy of up to 93.4%. In addition, actual distributed malware produced by modifying malware was tested in a sandbox, verifying that the proposed method can effectively detect most types of distributed malware; the accuracy of detecting distributed malware reaches 93%, and the detection performance is better than that of recent similar methods.
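As a rough illustration of using drivers and their IRP requests as features, the sketch below maps each distinct (driver, IRP request) combination to a single character so that a trace becomes a string suitable for sequence alignment. The driver names, the trace, and the helper names `build_alphabet` and `encode` are hypothetical; only the IRP major function names are real Windows identifiers.

```python
from typing import Dict, Iterable, Tuple

# (driver name, IRP major function); the driver names here are examples only.
Event = Tuple[str, str]


def build_alphabet(events: Iterable[Event]) -> Dict[Event, str]:
    """Assign one character to each distinct (driver, IRP request) pair."""
    alphabet: Dict[Event, str] = {}
    for event in events:
        if event not in alphabet:
            # Start at 'A' and walk upward through Unicode code points.
            alphabet[event] = chr(ord("A") + len(alphabet))
    return alphabet


def encode(events: Iterable[Event], alphabet: Dict[Event, str]) -> str:
    """Replace every (driver, IRP request) pair with its character."""
    return "".join(alphabet[event] for event in events)


# Hypothetical trace captured by a filter driver.
trace = [
    ("ntfs", "IRP_MJ_CREATE"),
    ("ntfs", "IRP_MJ_WRITE"),
    ("tcpip", "IRP_MJ_DEVICE_CONTROL"),
    ("ntfs", "IRP_MJ_WRITE"),
    ("ntfs", "IRP_MJ_CLOSE"),
]
alphabet = build_alphabet(trace)
encoded = encode(trace, alphabet)
print(encoded)
```

In practice the alphabet would need to be built once over all monitored driver and request combinations and shared between the malware sample sequences and the system sequence, since `encode` raises `KeyError` for a pair it has not seen.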
The above describes only the preferred embodiments of the present invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the principle of the present invention, and these modifications and improvements also fall within the protection scope of the present invention.

Claims (4)

1. A method for detecting distributed malware using IRP and local sequence alignment algorithms, comprising:
Step 1: in a sandbox environment, using a driver filtering technique, setting the drivers to be monitored and the corresponding IRP requests, and extracting the IRP sequence of the malware, wherein the IRP sequence is a sequence of combinations of a driver and an IRP request;
Step 2: in a sandbox environment, using the driver filtering technique, extracting the IRP sequence generated by the system according to the drivers to be monitored and the corresponding IRP requests, wherein the IRP sequence is a sequence of combinations of a driver and an IRP request;
Step 3: replacing the combinations of drivers and IRP requests in the IRP sequences with characters;
Step 4: taking the IRP sequence of the malware as the shorter sequence B and the IRP sequence extracted from the system as the global long sequence A, performing local matching using the local sequence alignment algorithm, and selecting the IRP sequence of the matched malware with the highest score.
2. The method for detecting distributed malware according to claim 1, wherein step 4 comprises:
setting a matching threshold, and determining that the IRP sequence of the malware is detected if it appears in order within the IRP sequence extracted from the system and the number of matched characters exceeds the matching threshold;
and if the IRP sequences of multiple malware samples are matched at the same time, selecting the IRP sequence of the matched malware with the highest score according to the scoring rule of the local sequence alignment algorithm.
3. The method for detecting distributed malware using IRP and local sequence alignment algorithm of claim 2, wherein if IRP sequences of multiple malware are matched at the same time, selecting the IRP sequence of the matched malware with the highest score according to the scoring rule of the local sequence alignment algorithm comprises:
let sequence $A = a_1, a_2, a_3, \ldots, a_n$ and sequence $B = b_1, b_2, b_3, \ldots, b_m$, where $n$ and $m$ are the lengths of sequences A and B, and $n > m$; $a_1, a_2, a_3, \ldots, a_n$ are the character representations of the combinations of a driver and an IRP request in the IRP sequence extracted from the system; $b_1, b_2, b_3, \ldots, b_m$ are the character representations of the combinations of a driver and an IRP request in the IRP sequence of the malware; $s(a_i, b_j)$ is the similarity score of $a_i$ and $b_j$, with $1 \le i \le n$ and $1 \le j \le m$; $W$ is the gap penalty; $H$ is the scoring matrix;
when alignment starts, the scoring matrix $H$ is initialized with values starting from 0, the first row and the first column both being 0; rows correspond to sequence A and columns to sequence B, and the scoring matrix has size $(n+1) \times (m+1)$;
each entry $H_{i,j}$ of the scoring matrix is calculated as:
Figure FDA0002689510460000021
sequences A and B are traced back starting from the entry with the largest value in matrix $H$:
if $a_i = b_j$, trace back to entry $H_{i-1,j-1}$;
if $a_i \neq b_j$, trace back to whichever of $H_{i-1,j-1}$, $H_{i,j-1}$, $H_{i-1,j}$ has the largest value, and if several have the same value, select among them in the priority order $H_{i-1,j-1}$, $H_{i,j-1}$, $H_{i-1,j}$;
until a locally optimal sequence is matched.
4. The method for detecting distributed malware using IRP and local sequence alignment algorithms of claim 3, wherein the gap penalty is applied using a finite state machine.
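The matching and traceback steps recited in claims 1 to 3 can be sketched as follows. Because the scoring formula appears only as an image in the publication, this sketch assumes the standard Smith-Waterman recurrence with a linear gap penalty, H[i][j] = max(0, H[i-1][j-1] + s(a_i, b_j), H[i-1][j] - W, H[i][j-1] - W). The score values (match +2, mismatch -1, W = 1), the function names `local_align` and `detect`, and the sample sequences are illustrative assumptions, not the patented implementation.

```python
from typing import Dict, List, Optional, Tuple


def local_align(a: str, b: str, match: int = 2, mismatch: int = -1,
                gap: int = 1) -> Tuple[int, int, List[Tuple[int, int]]]:
    """Return (best score, matched character count, traceback path).

    Rows of h correspond to sequence A (system), columns to sequence B
    (malware sample); the recurrence and score values are assumptions.
    """
    n, m = len(a), len(b)
    h = [[0] * (m + 1) for _ in range(n + 1)]  # first row and column stay 0
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            h[i][j] = max(0, h[i - 1][j - 1] + s,
                          h[i - 1][j] - gap, h[i][j - 1] - gap)
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)

    # Trace back from the largest entry until an entry with value 0.
    i, j = best_pos
    matched, path = 0, []
    while i > 0 and j > 0 and h[i][j] > 0:
        path.append((i, j))
        if a[i - 1] == b[j - 1]:
            matched += 1
            i, j = i - 1, j - 1
        else:
            # Priority on ties: diagonal, then left, then up (max keeps the
            # first maximal candidate).
            candidates = [(h[i - 1][j - 1], i - 1, j - 1),
                          (h[i][j - 1], i, j - 1),
                          (h[i - 1][j], i - 1, j)]
            _, i, j = max(candidates, key=lambda c: c[0])
    path.reverse()
    return best, matched, path


def detect(system_seq: str, samples: Dict[str, str],
           threshold: int) -> Optional[str]:
    """Name of the highest-scoring sample whose matched characters exceed
    the threshold, or None if no sample qualifies."""
    best_name, best_score = None, 0
    for name, seq in samples.items():
        score, matched, _ = local_align(system_seq, seq)
        if matched > threshold and score > best_score:
            best_name, best_score = name, score
    return best_name


# Hypothetical encoded sequences; the family names are placeholders.
samples = {"family_1": "ABCBD", "family_2": "ABD"}
print(detect("XXABCBDYY", samples, threshold=3))
```

The threshold is applied to the number of matched characters, and the alignment score is used only to choose among samples that pass it, which is one reading of claim 2; other readings of how the two criteria combine are possible.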
CN202010986734.7A 2020-09-18 2020-09-18 Method for detecting distributed malware by using IRP (anti-IRP) and local sequence alignment algorithm Pending CN112084492A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010986734.7A CN112084492A (en) 2020-09-18 2020-09-18 Method for detecting distributed malware by using IRP (anti-IRP) and local sequence alignment algorithm

Publications (1)

Publication Number Publication Date
CN112084492A true CN112084492A (en) 2020-12-15

Family

ID=73738178

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010986734.7A Pending CN112084492A (en) 2020-09-18 2020-09-18 Method for detecting distributed malware by using IRP (anti-IRP) and local sequence alignment algorithm

Country Status (1)

Country Link
CN (1) CN112084492A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101304409A (en) * 2008-06-28 2008-11-12 华为技术有限公司 Method and system for detecting malice code
CN101620658A (en) * 2009-07-14 2010-01-06 北京大学 Hook detecting method under Windows operation system
US8171552B1 (en) * 2006-02-14 2012-05-01 Trend Micro, Inc. Simultaneous execution of multiple anti-virus programs

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
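Zhang Fuyong et al.: "Malware detection model based on immune principles", Application Research of Computers *
Gong Qi et al.: "Homology analysis of ransomware based on sequence alignment", Computer and Modernization *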

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201215