CN114329467A - Memory WebShell detection method and device and electronic equipment

Info

Publication number: CN114329467A
Application number: CN202111577421.7A
Authority: CN
Inventors: 羊昕瑜; 罗伟; 游江; 任家西; 郭健
Original assignee: Nsfocus Technologies Inc; Nsfocus Technologies Group Co Ltd
Current assignee: Nsfocus Technologies Inc; Nsfocus Technologies Group Co Ltd
Priority date: 2021-12-22
Filing date: 2021-12-22
Publication date: 2022-04-12

Abstract

The application discloses an in-memory WebShell detection method, an in-memory WebShell detection device, and an electronic device. The method comprises the following steps: acquiring all classes in the virtual machine corresponding to a target application, and screening out, from all the classes, M first classes that carry in-memory WebShell risk; determining the code corresponding to the in-memory bytecode of each first class to obtain M code files; performing in-memory WebShell risk detection on the M code files using K first preset rules to obtain N code files that carry in-memory WebShell risk; and marking the classes corresponding to the N code files, and taking the marked classes as a first in-memory WebShell detection result. Because risk detection is performed on code files derived from bytecode extracted from memory, the method is suitable for non-text in-memory WebShells as well as text-based in-memory WebShells that have been encrypted or obfuscated, and can improve the accuracy of in-memory WebShell detection.

Description

A memory WebShell detection method, device and electronic device

Technical field

The present application relates to the technical field of information security, and in particular to an in-memory WebShell detection method, device, and electronic device.

Background

A WebShell, a web back-end management script, is a malicious script that exists as a file inside a Web container. Through a WebShell, an attacker can control the server to execute arbitrary commands, and thereby steal sensitive data or credentials or use the server as a springboard for attacking intranet hosts. However, as the Internet has grown, the scope of attack-and-defense exercises has widened, and defenders' protective measures have matured, attacks that rely on a WebShell stored as a file in a Web container have become increasingly difficult. Attackers have therefore turned to more sophisticated techniques such as in-memory WebShell attacks. In such an attack, once the attacker holds the privileges of the user that runs a Web container process, the attacker can fully control the data and code in that process's address space and thus control the server. Traditional protection methods are inadequate against in-memory WebShell attacks, so a detection method for in-memory WebShells is urgently needed.

Current methods for detecting in-memory WebShells are mainly based on WebShell file entities. One example is feature detection, which extracts malicious features from known WebShell samples and performs pattern matching to detect in-memory WebShells. Another is statistical analysis, which identifies WebShell files using statistical methods and then extracts features such as signature code, information entropy, longest word, index of coincidence, and compression for anomaly detection. A third is machine learning, which trains a detection model on samples using decision trees, deep learning, or similar methods and then uses that model to detect in-memory WebShells.

However, these file-entity-based detection methods struggle to detect non-text WebShells and cannot detect text-based WebShells that have been obfuscated or encrypted. As a result, feature detection, statistical analysis, and machine learning are all prone to false positives and false negatives.

Summary of the invention

The present application provides an in-memory WebShell detection method, device, and electronic device. After a preliminary screening of all classes in the virtual machine corresponding to the target application, and without comparing the bytecode file corresponding to the target application's process information against the original bytecode file, the code files corresponding to the in-memory bytecode of the screened classes are further checked and marked for in-memory WebShell risk. This approach is suitable for detecting non-text in-memory WebShells as well as text-based in-memory WebShells that have been encrypted or obfuscated, which improves detection accuracy.

In a first aspect, the present application provides an in-memory WebShell detection method, the method comprising:

acquiring all classes in the virtual machine corresponding to a target application, and screening out, from all the classes, M first classes that carry in-memory WebShell risk, where M is an integer greater than or equal to 1;

determining the code corresponding to the in-memory bytecode of each of the M first classes to obtain M code files;

performing in-memory WebShell risk detection on the M code files using K first preset rules to obtain N code files that carry in-memory WebShell risk, where each first preset rule corresponds to one kind of in-memory WebShell risk, and K and N are both integers greater than or equal to 1;

marking the classes corresponding to the N code files with a first preset identifier, and taking the classes that carry the first preset identifier as a first in-memory WebShell detection result.

With this method, after a preliminary screening of all classes in the virtual machine corresponding to the target application, and without comparing the bytecode file corresponding to the target application's process information against the original bytecode file, the code files corresponding to the in-memory bytecode of the screened classes are further checked and marked for in-memory WebShell risk, which lowers the false-positive rate. In addition, because the code files are converted from bytecode extracted from memory, the method is suitable for detecting non-text in-memory WebShells and also handles text-based in-memory WebShells that have been encrypted or obfuscated, so it can improve detection accuracy.

Further, screening out the M first classes that carry WebShell risk from all the classes comprises:

screening out, from all the classes, second classes that have no source bytecode file locally, and taking the second classes as first classes that carry WebShell risk; and/or

screening out, from all the classes, third classes that match any one of H second preset rules, and taking the third classes as first classes that carry WebShell risk, where each second preset rule corresponds to one kind of in-memory WebShell risk, and H is an integer greater than or equal to 1.

With this method, all classes in the virtual machine corresponding to the target application are preliminarily screened, which narrows the scope of the subsequent code risk scan. In addition, collecting information on every class that has no local source bytecode file is particularly effective against in-memory WebShells that have no file entity, which improves detection precision and accuracy.

Further, determining the code corresponding to each first class to obtain M code files comprises:

extracting the in-memory bytecode corresponding to each of the M first classes;

converting the bytecode into M code files of a preset type.

With this method, the in-memory bytecode of the extracted classes is converted into code files of a preset type, so that the subsequent code risk check can be applied to non-text in-memory WebShells as well as to text-based in-memory WebShells that have been encrypted or obfuscated, which improves detection accuracy.

Further, performing in-memory WebShell risk detection on the M code files using the K first preset rules to obtain the N code files that carry in-memory WebShell risk comprises:

matching the text of each of the M code files against preset risk-level strings;

if the text of any code file contains a first string identical to a preset risk-level string, determining that the code text containing the first string carries in-memory WebShell risk; and/or

counting the preset risk characters in the text of each of the M code files;

determining whether the count is greater than a preset threshold;

if there is a first code text whose risk-character count is greater than the preset threshold, determining that the first code text carries in-memory WebShell risk.

With this method, K first preset rules that target different risk types are used to check the M first classes, which accommodates different kinds of in-memory WebShell risk and improves detection accuracy.
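The two rule kinds described above can be sketched as follows. This is only an illustration under stated assumptions: the class name RuleScanSketch, the risk strings, the risk characters, and the threshold are all hypothetical, since the application does not list concrete values.

```java
import java.util.List;

/** Hypothetical sketch of the string rule and the character-count rule. */
final class RuleScanSketch {
    // Assumed risk strings, chosen only for illustration.
    static final List<String> RISK_STRINGS =
            List.of("Runtime.getRuntime().exec", "defineClass");
    // Assumed risk characters and threshold, chosen only for illustration.
    static final String RISK_CHARS = "$%";
    static final int THRESHOLD = 3;

    /** String rule: true if the code text contains any preset risk string. */
    static boolean matchesRiskString(String codeText) {
        for (String s : RISK_STRINGS) {
            if (codeText.contains(s)) {
                return true;
            }
        }
        return false;
    }

    /** Character rule: counts occurrences of the preset risk characters. */
    static int countRiskChars(String codeText) {
        int count = 0;
        for (int i = 0; i < codeText.length(); i++) {
            if (RISK_CHARS.indexOf(codeText.charAt(i)) >= 0) {
                count++;
            }
        }
        return count;
    }

    /** A code file is flagged if either rule fires (the "and/or" in the text). */
    static boolean isRisky(String codeText) {
        return matchesRiskString(codeText) || countRiskChars(codeText) > THRESHOLD;
    }
}
```

Applying isRisky to the text of each of the M code files would yield the N flagged files. A count equal to the threshold is not flagged, because the text requires the count to be greater than the threshold.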

In a possible design, before acquiring all classes in the virtual machine corresponding to the target application, the method further comprises:

scanning all currently running applications to obtain the basic information and the process information corresponding to each application, where the basic information includes at least the complete class information of the application entry point;

selecting a target application from all the applications according to the basic information corresponding to all the applications;

loading the basic information and the process information corresponding to the target application into the virtual machine corresponding to the target application.

With this method, extensive information about the virtual machine is obtained while the target application is running. Compared with accessing and analyzing static files, or obtaining information about the running target application through external access, the information obtained is more comprehensive, which improves detection accuracy.

In a possible design, after marking the classes corresponding to the N code files with the first preset identifier and taking the classes that carry the first preset identifier as the first in-memory WebShell detection result, the method further comprises:

determining the risk type corresponding to each class that carries the first preset identifier in the first in-memory WebShell detection result;

adding the information corresponding to the risk type to the first in-memory WebShell detection result to obtain a second in-memory WebShell detection result.

With this method, the risk type of each class that carries in-memory WebShell risk is further determined, which completes the risk information in the detection result and improves detection accuracy.

Further, determining the risk type corresponding to each class that carries the first preset identifier in the first in-memory WebShell detection result comprises:

matching the second code file, of the preset type, that corresponds to a class carrying the first preset identifier in the first in-memory WebShell detection result against any one of the first preset rules;

if they match, taking the in-memory WebShell risk corresponding to the matching first preset rule as the risk type.

With this method, the risk type of a class that carries in-memory WebShell risk is determined, according to the different rules, to be string risk and/or character risk, which refines the detection result and improves detection accuracy.
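One way to picture the risk-type step is a lookup from each first preset rule to the risk type it stands for. The sketch below assumes that shape; the names RiskTypeSketch and riskTypesOf, and the representation of a rule as a predicate over code text, are hypothetical. Only the two risk types, string risk and character risk, come from the text.

```java
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

/** Hypothetical sketch: derive risk types by re-matching rules on a flagged code file. */
final class RiskTypeSketch {
    /** The two risk types named in the text. */
    enum RiskType { STRING_RISK, CHARACTER_RISK }

    /**
     * Returns every risk type whose rule matches the code text.
     * The rules map (risk type -> matching predicate) is an assumed representation.
     */
    static Set<RiskType> riskTypesOf(String codeText,
                                     Map<RiskType, Predicate<String>> rules) {
        Set<RiskType> found = EnumSet.noneOf(RiskType.class);
        for (Map.Entry<RiskType, Predicate<String>> rule : rules.entrySet()) {
            if (rule.getValue().test(codeText)) {
                found.add(rule.getKey());
            }
        }
        return found;
    }
}
```

Returning a set rather than a single value reflects the "and/or" wording: a flagged class may carry string risk, character risk, or both.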

Marking the third classes with a second preset identifier, where the second preset identifier corresponds to one kind of in-memory WebShell risk;

adding the marked third classes to the first in-memory WebShell detection result to obtain a third in-memory WebShell detection result; or

adding the marked third classes to the second in-memory WebShell detection result to obtain a third in-memory WebShell detection result.

With this method, classes that match a second preset rule are marked and added to the in-memory WebShell detection result, which makes the result more comprehensive and improves detection accuracy.

In a second aspect, the present application provides an in-memory WebShell detection device, the device comprising:

a screening module, configured to acquire all classes in the virtual machine corresponding to a target application, and to screen out, from all the classes, M first classes that carry in-memory WebShell risk, where M is an integer greater than or equal to 1;

a first determining module, configured to determine the code corresponding to the in-memory bytecode of each of the M first classes to obtain M code files;

a detection module, configured to perform in-memory WebShell risk detection on the M code files using K first preset rules to obtain N code files that carry in-memory WebShell risk, where each first preset rule corresponds to one kind of in-memory WebShell risk, and K and N are both integers greater than or equal to 1;

a first marking module, configured to mark the classes corresponding to the N code files with a first preset identifier, and to take the classes that carry the first preset identifier as a first in-memory WebShell detection result.

Further, the screening module is specifically configured to:

Further, the first determining module is specifically configured to:

Further, the detection module is specifically configured to:

In a possible design, the device further comprises:

a scanning module, configured to scan all currently running applications to obtain the basic information and the process information corresponding to each application, where the basic information includes at least the complete class information of the application entry point;

a selection module, configured to select a target application from all the applications according to the basic information corresponding to all the applications;

a loading module, configured to load the basic information and the process information corresponding to the target application into the virtual machine corresponding to the target application.

a second determining module, configured to determine the risk type corresponding to each class that carries the first preset identifier in the first in-memory WebShell detection result;

a first adding module, configured to add the information corresponding to the risk type to the first in-memory WebShell detection result to obtain a second in-memory WebShell detection result.

Further, the second determining module is specifically configured to:

if they match, determine the in-memory WebShell risk corresponding to the first preset rule that matches the second code file as the risk type.

a second marking module, configured to mark the third classes with a second preset identifier, where the second preset identifier corresponds to one kind of in-memory WebShell risk;

a second adding module, configured to add the marked third classes to the first in-memory WebShell detection result to obtain a third in-memory WebShell detection result, or to add the marked third classes to the second in-memory WebShell detection result to obtain a third in-memory WebShell detection result.

In a third aspect, the present application provides an electronic device, comprising:

a memory, configured to store a computer program;

a processor, configured to implement the steps of the above in-memory WebShell detection method when executing the computer program stored in the memory.

In a fourth aspect, the present application provides a computer-readable storage medium that stores a computer program, where the computer program, when executed by a processor, implements the steps of the above in-memory WebShell detection method.

With the above in-memory WebShell detection method, after a preliminary screening of all classes in the virtual machine corresponding to the target application, and without comparing the bytecode file corresponding to the target application's process information against the original bytecode file, the code files corresponding to the in-memory bytecode of the screened classes are further checked and marked for in-memory WebShell risk, which lowers the false-positive rate. In addition, because the code files are converted from bytecode extracted from memory, the method is suitable for detecting non-text in-memory WebShells and also handles text-based in-memory WebShells that have been encrypted or obfuscated, so it can improve the efficiency of in-memory WebShell detection.

For the second to fourth aspects and the technical effects each of them may achieve, refer to the description above of the technical effects achievable by the first aspect and its possible designs; the details are not repeated here.

Description of the drawings

Fig. 1 is a flowchart of an in-memory WebShell detection method provided by the present application;

Fig. 2 is a schematic diagram of an in-memory WebShell detection method provided by the present application;

Fig. 3 is a schematic structural diagram of an in-memory WebShell detection device provided by the present application;

Fig. 4 is a schematic structural diagram of an electronic device provided by the present application.

Detailed description

To make the objectives, technical solutions, and advantages of the present application clearer, the application is described in further detail below with reference to the accompanying drawings. The specific operations in the method embodiments can also be applied to the device embodiments or system embodiments. Note that in this description "a plurality of" means "at least two". "And/or" describes an association between associated objects and covers three cases; for example, "A and/or B" can mean A alone, both A and B, or B alone. "A is connected to B" covers two cases: A is directly connected to B, and A is connected to B through C. In addition, words such as "first" and "second" are used only to distinguish the items described and are not to be understood as indicating or implying relative importance or order.

The embodiments of the present application are described in detail below with reference to the accompanying drawings.

Current methods for detecting in-memory WebShells are mainly based on WebShell file entities. One example is feature detection, which extracts malicious features from known WebShell samples and performs pattern matching to detect in-memory WebShells. Another is statistical analysis, which identifies WebShell files using statistical methods and then extracts features such as signature code, information entropy, longest word, index of coincidence, and compression for anomaly detection. A third is machine learning, which trains a detection model on samples using decision trees, deep learning, or similar methods and then uses that model to detect in-memory WebShells.

However, these file-entity-based detection methods struggle to detect non-text WebShells and cannot detect text-based WebShells that have been obfuscated or encrypted. As a result, feature detection, statistical analysis, and machine learning are all prone to false positives and false negatives, and the accuracy of in-memory WebShell detection is low.

To solve the above problems, the present application provides a memory forensics method. After a preliminary screening of all classes in the virtual machine corresponding to the target application, the code files corresponding to the in-memory bytecode of the screened classes are further checked and marked for in-memory WebShell risk. Because the code files are converted from bytecode extracted from memory, the method is suitable for detecting non-text in-memory WebShells and also handles text-based in-memory WebShells that have been encrypted or obfuscated, which improves the efficiency of in-memory WebShell detection.

The method and the device described in the embodiments of the present application are based on the same technical concept. Because they solve the problem on similar principles, the device embodiments and the method embodiments can be read with reference to each other, and repeated content is omitted.

Fig. 1 shows a flowchart of the in-memory WebShell detection method provided by the present application, which specifically includes the following steps:

S11: acquire all classes in the virtual machine corresponding to the target application, and screen out, from all the classes, M first classes that carry in-memory WebShell risk;

In this embodiment of the application, before in-memory WebShell detection is performed on the target application, the target application must first be selected from all currently running applications, and the information related to the target application must be loaded into the virtual machine corresponding to the target application. Specifically:

Scan all applications running on the host to obtain the basic information and the process information corresponding to each application. The basic information includes at least the complete class information of the application entry point, and may also include the name, version number, permissions, signature digest, and so on. If the current application is a Java application, the scanning code can be written with the sun.jvmstat.monitor package and used to scan the current application. The process information of the current application can also be obtained with jps, the command shipped with the Java Development Kit (JDK) that lists the currently running Java processes, or through the process-information commands provided by the application platform;

Next, according to the basic information of each application and the user's needs, a target application is selected from all the applications, and the basic information and the process information corresponding to the target application are loaded into the virtual machine corresponding to the target application. For example, if the target application is a Java application, an Agent program can be written either natively or through the Java Instrumentation interface. The Agent program then loads the target application's process information and basic information, as parameters, into the Java Virtual Machine (JVM) corresponding to the target application. Loading can take place before the JVM starts or while the JVM is running.
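A minimal sketch of the Java Instrumentation route mentioned above, assuming the Agent program receives the target application's basic and process information as a single argument string. The class name ScanAgentSketch and the key=value;key=value argument format are hypothetical. premain and agentmain are the standard entry points that the JVM calls for agents loaded at startup and at runtime, respectively, which corresponds to the two loading times named in the text.

```java
import java.lang.instrument.Instrumentation;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical agent skeleton; not the application's actual Agent program. */
final class ScanAgentSketch {
    /** Parsed agent arguments (assumed to carry basic and process information). */
    static volatile Map<String, String> params = new LinkedHashMap<>();
    /** Handle supplied by the JVM, used later to enumerate loaded classes. */
    static volatile Instrumentation instrumentation;

    /** Entry point when the agent is loaded before the JVM starts the application. */
    public static void premain(String args, Instrumentation inst) {
        init(args, inst);
    }

    /** Entry point when the agent is attached to an already running JVM. */
    public static void agentmain(String args, Instrumentation inst) {
        init(args, inst);
    }

    private static void init(String args, Instrumentation inst) {
        params = parseArgs(args);
        instrumentation = inst;
    }

    /** Parses "key=value;key=value" (an assumed format) into an ordered map. */
    static Map<String, String> parseArgs(String args) {
        Map<String, String> out = new LinkedHashMap<>();
        if (args == null || args.isEmpty()) {
            return out;
        }
        for (String pair : args.split(";")) {
            int eq = pair.indexOf('=');
            if (eq > 0) {
                out.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        return out;
    }

    /** All classes currently loaded in this JVM; valid only after the agent is loaded. */
    static Class<?>[] loadedClasses() {
        return instrumentation.getAllLoadedClasses();
    }
}
```

In a real agent JAR the class would be declared public and named in the manifest's Premain-Class and Agent-Class attributes. Attaching to a running JVM is typically done through the JDK Attach API.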

在将目标应用对应的基本信息和目标应用对应的进程信息加载到目标应用对应的虚拟机中以后，获取虚拟机中的所有类，并在所有类中筛选出存在内存WebShell风险的M个第一类，其中，M为大于或等于1的整数，具体筛选方法可以是：After the basic information corresponding to the target application and the process information corresponding to the target application are loaded into the virtual machine corresponding to the target application, all classes in the virtual machine are obtained, and the M first ones with memory WebShell risks are screened out from all classes. class, where M is an integer greater than or equal to 1, and the specific screening method can be:

检测所有类中的每个类在本地源字节码文件路径信息，若任何一个类不存在本地源字节码文件路径信息，表明该不存在路径信息的类在本地不存在源字节码文件，然后在所有类中筛选出在本地不存在源字节码文件的第二类，并将第二类作为存在WebShell风险的第一类；Detect the local source bytecode file path information for each class in all classes. If any class does not have local source bytecode file path information, it indicates that the class without path information does not have a local source bytecode file. , and then filter out the second class that does not have source bytecode files locally in all classes, and use the second class as the first class with WebShell risk;

除了上述方法外，在所有类中筛选出存在内存WebShell风险的M个第一类的具体方法还可以是：In addition to the above methods, the specific method to filter out the M first types of memory WebShell risks in all classes can also be:

确定H个第二预设规则，其中，H为大于或等于1的整数，第二预设规则根据预设类型的内存WebShell特征进行设定，一个第二预设规则对应着一种内存WebShell风险，比如，若预设类型的WebShell特征为不存在实体文件，那么该WebShell特征对应的第一预设规则用于检测出不存在实体文件的内存WebShell风险。在确定出第二预设规则以后，将所有类中的每一个类分别与H个第二预设规则中的每一个第二预设规则进行匹配，并将与任何一个第二预设规则匹配一致的第三类，作为存在WebShell风险的第一类。Determine H second preset rules, where H is an integer greater than or equal to 1, the second preset rules are set according to the memory WebShell characteristics of the preset type, and one second preset rule corresponds to a memory WebShell risk For example, if the WebShell feature of the preset type is that no entity file exists, then the first preset rule corresponding to the WebShell feature is used to detect the memory WebShell risk that no entity file exists. After the second preset rule is determined, each of all the classes is matched with each of the H second preset rules respectively, and will be matched with any one of the second preset rules Consistent third category, as the first category with WebShell risk.

The two screening methods above may be used together or separately.

In this way, all classes in the virtual machine corresponding to the target application are preliminarily screened, which narrows the scope of the subsequent code risk scan. At the same time, information on all classes that have no local source bytecode entity file is obtained, which is particularly effective against memory WebShells that have no entity file and improves both the accuracy and the efficiency of memory WebShell detection.

S12: determine the code corresponding to the in-memory bytecode of each of the M first classes to obtain M code files;

In the embodiment of the present application, after the M first classes with a memory WebShell risk are preliminarily screened out from all the classes, the M first classes need to be processed further. A specific processing method may be:

A bytecode enhancement technique is used to extract the in-memory bytecode corresponding to each of the M first classes, and the bytecode is then converted into M code files of a preset type. The preset type may be JAVA or another type of code file, determined according to user needs.
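One way to obtain the bytecode the JVM currently holds for a class is to register a `ClassFileTransformer` and request a retransformation, so that the JVM hands the current class-file bytes to the transformer. The sketch below assumes this approach; the description only names "bytecode enhancement" in general, and `BytecodeCapturer` and `dump` are hypothetical names. Conversion of the captured bytes to source code by a decompiler is not shown.

```java
import java.lang.instrument.ClassFileTransformer;
import java.lang.instrument.Instrumentation;
import java.lang.instrument.UnmodifiableClassException;
import java.security.ProtectionDomain;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Sketch: capture the bytecode the JVM currently holds for selected classes. */
class BytecodeCapturer implements ClassFileTransformer {

    // Binary class name (for example "com.example.Foo") -> captured class-file bytes.
    private final Map<String, byte[]> captured = new ConcurrentHashMap<>();

    @Override
    public byte[] transform(ClassLoader loader, String className,
                            Class<?> classBeingRedefined,
                            ProtectionDomain protectionDomain,
                            byte[] classfileBuffer) {
        // classBeingRedefined is non-null only for retransformation or redefinition,
        // so classes that merely happen to be loading at the same time are ignored.
        if (classBeingRedefined != null) {
            captured.put(classBeingRedefined.getName(), classfileBuffer.clone());
        }
        return null; // null tells the JVM that the class is left unchanged
    }

    Map<String, byte[]> captured() {
        return captured;
    }

    /** Asks the JVM to pass the given classes through a capturing transformer. */
    static Map<String, byte[]> dump(Instrumentation inst, Class<?>... targets)
            throws UnmodifiableClassException {
        BytecodeCapturer capturer = new BytecodeCapturer();
        inst.addTransformer(capturer, true); // true: usable for retransformation
        try {
            inst.retransformClasses(targets);
        } finally {
            inst.removeTransformer(capturer);
        }
        return capturer.captured();
    }
}
```

For `dump` to work, the agent jar has to declare `Can-Retransform-Classes: true` in its manifest, and each target should first be checked with `Instrumentation.isModifiableClass`, since `retransformClasses` throws `UnmodifiableClassException` otherwise.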

With this processing method, the bytecode in memory is converted into corresponding code, so that the subsequent code risk check is applicable to non-text memory WebShells as well as to WebShells that have been encrypted or obfuscated.

S13: perform memory WebShell risk detection on the M code files through K first preset rules to obtain N code files with a memory WebShell risk;

In the embodiment of the present application, after the M first classes are converted into the corresponding M code files, K first preset rules are used to perform memory WebShell risk detection on the M code files, where K is an integer greater than or equal to 1. Each first preset rule corresponds to one memory WebShell risk, and the memory WebShell risk corresponding to a first preset rule may differ in type from the WebShell risk corresponding to a second preset rule. Specifically, the detection method corresponding to a first preset rule may be:

The text corresponding to each of the M code files is matched against preset risk level character strings. The preset risk level character strings are set in advance and may be divided into high-risk, medium-risk and low-risk character strings. They may of course also be divided into level 1, level 2 and level 3 risk character strings, and so on; the specific risk level classification is not limited here;

If the text corresponding to any code file contains a first character string consistent with a preset risk level character string, it is determined that the code text to which the first character string belongs has a memory WebShell risk. For example, when the text corresponding to a code file contains a character string consistent with a high-risk character string, the code file can be deemed to have a memory WebShell risk, with a risk level of high.
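The string matching above can be illustrated with a small lookup from risk strings to risk levels. This is a sketch under assumptions: `RiskStringMatcher` and its three-level `Level` enum are hypothetical, and the risk strings themselves would be supplied by whoever configures the rules.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Sketch: match code text against preset risk-level strings. */
class RiskStringMatcher {

    enum Level { LOW, MEDIUM, HIGH }

    private final Map<String, Level> riskStrings = new LinkedHashMap<>();

    /** Registers one preset risk string with its level; returns this for chaining. */
    RiskStringMatcher add(String riskString, Level level) {
        riskStrings.put(riskString, level);
        return this;
    }

    /** Returns the highest level among the risk strings found in the text, if any. */
    Optional<Level> highestRisk(String codeText) {
        Level best = null;
        for (Map.Entry<String, Level> e : riskStrings.entrySet()) {
            boolean found = codeText.contains(e.getKey());
            if (found && (best == null || e.getValue().compareTo(best) > 0)) {
                best = e.getValue();
            }
        }
        return Optional.ofNullable(best);
    }
}
```

An empty result means no preset string occurred in the code text, so this rule alone does not flag the file.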

In addition, the detection method corresponding to a first preset rule may also be:

The number of preset risk characters in the text corresponding to each of the M code files is detected. For example, if "a" and "b" are preset risk characters and the text corresponding to a code file contains "a" or "b", the sum of the number of occurrences of "a" and the number of occurrences of "b" is the number of preset risk characters;

Whether the number is greater than a preset threshold is determined;

If there is a first code text whose number of risk characters is greater than the preset threshold, it is determined that the first code text has a memory WebShell risk.
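The counting rule can be written in a few lines. In the sketch below, `RiskCharCounter` and its method names are hypothetical; the comparison is strictly "greater than", as in the text.

```java
import java.util.Set;

/** Sketch: count preset risk characters and compare the count with a threshold. */
class RiskCharCounter {

    /** Total number of occurrences of any risk character in the code text. */
    static long countRiskChars(String codeText, Set<Character> riskChars) {
        return codeText.chars()
                .filter(ch -> riskChars.contains((char) ch))
                .count();
    }

    /** True only if the count is strictly greater than the threshold. */
    static boolean exceeds(String codeText, Set<Character> riskChars, long threshold) {
        return countRiskChars(codeText, riskChars) > threshold;
    }
}
```

With risk characters "a" and "b", the text "abcab" contains two of each, so the count is 4; a threshold of 3 flags the text and a threshold of 4 does not.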

Of course, regular expression matching rules may also be used to perform memory WebShell risk detection on the M code files; the specific detection method is not elaborated here.
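Since the regular expression variant is not elaborated, the following is only one plausible shape for it: a map from rule name to compiled `Pattern`, applied with `find()`. The name `RegexRiskScan` is hypothetical, and any concrete patterns are left to the rule author.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Sketch: apply named regular-expression rules to decompiled code text. */
class RegexRiskScan {

    /** Returns the names of all rules whose pattern occurs somewhere in the text. */
    static List<String> matchedRules(String codeText, Map<String, Pattern> rules) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Pattern> rule : rules.entrySet()) {
            if (rule.getValue().matcher(codeText).find()) {
                hits.add(rule.getKey());
            }
        }
        return hits;
    }
}
```

Compared with plain string matching, a pattern such as `Runtime\s*\.\s*getRuntime` tolerates inserted whitespace, which is a small advantage against simple obfuscation of decompiled source.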

Through the above method, memory WebShell risk detection is performed on the M code files, and N code files with a memory WebShell risk can be screened out, where N is an integer greater than or equal to 1.

S14: perform risk marking on the classes corresponding to the N code files with a first preset identifier, and take the classes containing the first preset identifier as a first memory WebShell detection result.

In the embodiment of the present application, after the N code files with a memory WebShell risk are determined, the classes corresponding to the N code files are risk-marked with the first preset identifier, and the classes containing the first preset identifier are taken as the first memory WebShell detection result.

In a possible design, to make the information contained in the first memory WebShell detection result more complete, the risk types of the classes corresponding to the N code files may also be added to the first memory WebShell detection result. Specifically:

First, the risk type corresponding to each class containing the first preset identifier in the first memory WebShell detection result is determined. The risk type is determined as follows:

The second code file of the preset type, corresponding to a class containing the first preset identifier in the first memory WebShell detection result, is matched against any one of the first preset rules; if they match, the memory WebShell risk corresponding to the matching first preset rule is taken as the risk type.

For example, the second code file of the preset type, corresponding to a class containing the first preset identifier in the first memory WebShell detection result, is matched against the preset risk level character strings; if the second code file contains a character string consistent with a preset risk level character string, the risk type is determined to be string risk.

As another example, the number of preset risk characters in the second code file is detected, and whether the number is greater than the preset threshold is determined; if so, the risk type is determined to be character risk.

Further, the information corresponding to the risk type is added to the first memory WebShell detection result to obtain a second memory WebShell detection result.
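The step from the first to the second detection result can be pictured as a report that first records which classes are marked and then attaches risk types to them. The sketch below is an assumed data layout, not one given in this description: `DetectionReport` is a hypothetical name, and the presence of a class in the map stands in for the first preset identifier.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Sketch: a detection result that is first marked, then enriched with risk types. */
class DetectionReport {

    // Marked class name -> risk types found for it (empty until enriched).
    private final Map<String, List<String>> riskTypesByClass = new LinkedHashMap<>();

    /** Marks a class; this plays the role of the first preset identifier. */
    void mark(String className) {
        riskTypesByClass.putIfAbsent(className, new ArrayList<>());
    }

    /** Adds the name of every matching rule as a risk type of an already marked class. */
    void addRiskTypes(String className, String codeText,
                      Map<String, Predicate<String>> rules) {
        List<String> types = riskTypesByClass.get(className);
        if (types == null) {
            return; // unmarked classes are not part of the result
        }
        for (Map.Entry<String, Predicate<String>> rule : rules.entrySet()) {
            if (rule.getValue().test(codeText) && !types.contains(rule.getKey())) {
                types.add(rule.getKey());
            }
        }
    }

    /** Read-only view: after enrichment this corresponds to the second result. */
    Map<String, List<String>> asMap() {
        return Collections.unmodifiableMap(riskTypesByClass);
    }
}
```

Rule names such as "string-risk" and "character-risk" would then correspond to the string risk and character risk types mentioned in the examples above.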

In a possible design, the third class that matches any one of the second preset rules in the detection method shown in FIG. 1 is marked, and the marked third class is added to the first memory WebShell detection result or the second memory WebShell detection result described above, which improves the memory WebShell detection accuracy. Specifically:

The third class is marked with a second preset identifier, where the second preset identifier corresponds to one memory WebShell risk;

The marked third class is added to the second memory WebShell detection result to obtain a third memory WebShell detection result;

The third memory WebShell detection result is taken as the final memory WebShell detection result.

Based on the memory WebShell detection method provided by the present application, after all classes in the virtual machine corresponding to the target application are preliminarily screened, memory WebShell detection and marking are further performed on the code files corresponding to the in-memory bytecode of the screened classes, without comparing the bytecode file corresponding to the target application's process information against the original bytecode file, which reduces the false positive rate. In addition, because the code files are converted from bytecode extracted from memory, the method is suitable for detecting non-text memory WebShells and also handles text memory WebShells that have been encrypted or obfuscated. The memory WebShell detection accuracy can therefore be improved.

Further, to describe the memory WebShell detection method provided in the embodiments of the present application in more detail, the method is explained below through a specific application scenario.

Specifically, referring to FIG. 2, before memory WebShell detection is performed on the target application, the JAVA applications running on the host are first scanned, and the basic information and process information of all JAVA applications are listed. Ways of scanning all JAVA applications on the host include but are not limited to: directly using the JPS tool provided by the JDK to obtain information on all JAVA processes; writing scanning code with the sun.jvmstat.monitor package; and using the platform's own commands for obtaining process information.
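As a rough stand-in for the third option (platform process information), the sketch below lists processes whose executable looks like a Java launcher using the standard `ProcessHandle` API (Java 9 and later). This is not the JPS or sun.jvmstat.monitor route named above, and it yields only process IDs and command paths, not the main class that JPS reports. `JavaProcessScan` and the launcher-name heuristic are assumptions made for illustration.

```java
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

/** Sketch: find running processes whose executable looks like a Java launcher. */
class JavaProcessScan {

    /** Heuristic: the last path element is java, java.exe or javaw.exe. */
    static boolean looksLikeJava(String command) {
        String path = command.replace('\\', '/');
        String name = path.substring(path.lastIndexOf('/') + 1).toLowerCase(Locale.ROOT);
        return name.equals("java") || name.equals("java.exe") || name.equals("javaw.exe");
    }

    /** Processes visible to the current user whose command passes the heuristic. */
    static List<ProcessHandle> runningJavaProcesses() {
        return ProcessHandle.allProcesses()
                .filter(p -> p.info().command()
                        .map(JavaProcessScan::looksLikeJava)
                        .orElse(false))
                .collect(Collectors.toList());
    }
}
```

Each returned handle exposes `pid()`, which is the value an attach-based loader needs in order to select the target JVM.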

Next, according to the basic information corresponding to each of the JAVA applications, the target JAVA application is selected from all JAVA applications, and JAVA Agent technology is then used to load the basic information and the process information corresponding to the target JAVA application into the target JVM corresponding to the target JAVA application.

Further, all classes currently loaded in the target JVM are obtained. Ways of obtaining all classes loaded in the target JVM include but are not limited to using the Instrumentation interface.
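With the Instrumentation interface, listing the loaded classes comes down to one call, `getAllLoadedClasses()`, made from inside an agent. The sketch below shows a dynamically loaded agent entry point; `ClassListingAgent` and `loadedClassNames` are hypothetical names, and a deployable agent would declare the class `public` and name it in the `Agent-Class` attribute of the agent jar's manifest.

```java
import java.lang.instrument.Instrumentation;
import java.util.ArrayList;
import java.util.List;

/** Sketch: agent entry point that lists the classes loaded in the target JVM. */
class ClassListingAgent {

    /** Called by the JVM when the agent jar is loaded into a running process. */
    public static void agentmain(String agentArgs, Instrumentation inst) {
        for (String name : loadedClassNames(inst)) {
            System.out.println(name);
        }
    }

    /** Names of all currently loaded non-array classes. */
    static List<String> loadedClassNames(Instrumentation inst) {
        List<String> names = new ArrayList<>();
        for (Class<?> c : inst.getAllLoadedClasses()) {
            if (!c.isArray()) {
                names.add(c.getName());
            }
        }
        return names;
    }
}
```

On the attaching side, the JDK's attach API (`com.sun.tools.attach.VirtualMachine.attach(pid)` followed by `loadAgent(agentJarPath)`) is one common way to get such an agent into the target JVM.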

Next, the class information and bytecode of classes with WebShell risk are preliminarily extracted from all the classes. Two methods are used in this example. The first extracts the class information and bytecode of classes that have no local bytecode file: the path information of the source bytecode file is searched for in all class information, and if a class has no path information, that class has no corresponding bytecode file locally; the class information and its in-memory bytecode are then extracted, the in-memory bytecode being extracted with a bytecode enhancement technique. The second method extracts classes using first custom rules, which include but are not limited to feature matching rules.

Further, a first risk determination and marking is performed on all the classes with WebShell risk extracted in the above process. During extraction, each extracted class corresponds to a rule, and each rule corresponds to one JAVA application memory WebShell risk.

Further, the in-memory bytecode corresponding to all the extracted classes is converted into JAVA code. The conversion from bytecode to JAVA code can be implemented with open-source decompilation techniques or with existing decompilation tools.

Further, second custom rules are used to check the JAVA code. The second custom rules may check the JAVA code by conventional string matching, frequency matching, regular expression matching or similar means, and a second risk determination and marking is performed on the classes according to the check result. The risks include but are not limited to: a specified high-risk string is present, a risk string occurs more often than a specified threshold, or a risk string is matched by regular expression matching. The detection result can thus be risk-marked.

Finally, all classes found to have a memory WebShell risk in the above process, together with the first and second risk determination and marking information, are taken as the memory WebShell detection result of the target JAVA application.

With the above memory WebShell detection method, after all classes in the target JVM are preliminarily screened, a first memory WebShell risk determination and marking is performed on the screened classes, which narrows the scope of the subsequent code risk scan. A second memory WebShell detection and marking is then performed on the code files corresponding to the in-memory bytecode of the screened classes. Neither round of risk determination and marking requires comparing the bytecode file corresponding to the target application's process information against the original bytecode file, so the false positive rate can be reduced. In addition, because the code files are converted from bytecode extracted from memory, the method is suitable for detecting non-text memory WebShells and also handles text memory WebShells that have been encrypted or obfuscated, which improves the memory WebShell detection accuracy. Finally, taking the results of both rounds of risk determination and marking as the memory WebShell detection result of the target JAVA application further improves the detection accuracy.

Based on the same inventive concept, an embodiment of the present application further provides a memory WebShell detection device. FIG. 3 is a schematic structural diagram of a memory WebShell detection device of the present application. The device includes:

a screening module 31, configured to obtain all classes in the virtual machine corresponding to the target application and to screen out, from all the classes, M first classes with a memory WebShell risk, where M is an integer greater than or equal to 1;

a first determination module 32, configured to determine the code corresponding to the in-memory bytecode of each of the M first classes to obtain M code files;

a detection module 33, configured to perform memory WebShell risk detection on the M code files through K first preset rules to obtain N code files with a memory WebShell risk, where each first preset rule corresponds to one memory WebShell risk, and K and N are both integers greater than or equal to 1;

a first marking module 34, configured to perform risk marking on the classes corresponding to the N code files with a first preset identifier, and to take the classes containing the first preset identifier as a first memory WebShell detection result.
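The division of labour among modules 31 to 34 can be sketched as a small pipeline in which each module is a pluggable function. The names below (`DetectionPipeline`, `run`) and the use of `java.util.function` types are illustrative assumptions; the description does not prescribe any particular interfaces.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

/** Sketch of how modules 31 to 34 could be chained; all names are illustrative. */
class DetectionPipeline {
    private final Predicate<Class<?>> screening;         // role of screening module 31
    private final Function<Class<?>, String> toCode;     // role of first determination module 32
    private final Map<String, Predicate<String>> rules;  // role of detection module 33 (K rules)

    DetectionPipeline(Predicate<Class<?>> screening,
                      Function<Class<?>, String> toCode,
                      Map<String, Predicate<String>> rules) {
        this.screening = screening;
        this.toCode = toCode;
        this.rules = rules;
    }

    /** Returns flagged class name -> matched rule names (role of marking module 34). */
    Map<String, List<String>> run(Class<?>[] loaded) {
        Map<String, List<String>> marked = new LinkedHashMap<>();
        for (Class<?> c : loaded) {
            if (!screening.test(c)) {
                continue;                      // keep only the M first classes
            }
            String code = toCode.apply(c);     // one code file per first class
            List<String> hits = new ArrayList<>();
            for (Map.Entry<String, Predicate<String>> r : rules.entrySet()) {
                if (r.getValue().test(code)) {
                    hits.add(r.getKey());
                }
            }
            if (!hits.isEmpty()) {
                marked.put(c.getName(), hits); // the N flagged classes
            }
        }
        return marked;
    }
}
```

Keeping the screening step in front of the conversion step means that only the M screened classes are ever decompiled, which matches the stated aim of narrowing the scope of the code risk scan.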

Further, the screening module 31 is specifically configured to:

Further, the first determination module 32 is specifically configured to:

Further, the detection module 33 is specifically configured to:

detect the number of preset risk characters in the text corresponding to each of the M code files;

Based on the above memory WebShell detection device, after all classes in the virtual machine corresponding to the target application are preliminarily screened, memory WebShell detection and marking are further performed on the code files corresponding to the in-memory bytecode of the screened classes. Because the code files are converted from bytecode extracted from memory, the device is suitable for detecting non-text memory WebShells and also handles text memory WebShells that have been encrypted or obfuscated, which improves the memory WebShell detection accuracy.

Based on the same inventive concept, an embodiment of the present application further provides an electronic device that can implement the functions of the memory WebShell detection device described above. Referring to FIG. 4, the electronic device includes:

at least one processor 41, and a memory 42 connected to the at least one processor 41. The specific connection medium between the processor 41 and the memory 42 is not limited in the embodiment of the present application; FIG. 4 takes a connection through a bus 40 as an example. The bus 40 is shown as a thick line in FIG. 4; the connections between other components are only illustrative and not limiting. The bus 40 may be divided into an address bus, a data bus, a control bus and so on. For ease of illustration, only one thick line is used in FIG. 4, but this does not mean that there is only one bus or one type of bus. Alternatively, the processor 41 may also be called a controller; the name is not limited.

In the embodiment of the present application, the memory 42 stores instructions executable by the at least one processor 41, and by executing the instructions stored in the memory 42 the at least one processor 41 can carry out the memory WebShell detection method discussed above. The processor 41 can implement the functions of each module in the device shown in FIG. 3.

The processor 41 is the control center of the device. It can connect the various parts of the entire device through various interfaces and lines, and it performs the various functions of the device and processes data by running or executing the instructions stored in the memory 42 and calling the data stored in the memory 42, thereby monitoring the device as a whole.

In a possible design, the processor 41 may include one or more processing units, and may integrate an application processor and a modem processor. The application processor mainly handles the operating system, user interface, application programs and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may also not be integrated into the processor 41. In some embodiments, the processor 41 and the memory 42 may be implemented on the same chip; in other embodiments, they may be implemented on separate chips.

The processor 41 may be a general-purpose processor, such as a central processing unit (CPU), a digital signal processor, an application-specific integrated circuit, a field-programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or execute the methods, steps and logic block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the memory WebShell detection method disclosed in the embodiments of the present application may be executed directly by a hardware processor, or by a combination of hardware and software modules in the processor.

The memory 42, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs and modules. The memory 42 may include at least one type of storage medium, for example flash memory, a hard disk, a multimedia card, card-type memory, random access memory (RAM), static random access memory (SRAM), programmable read-only memory (PROM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), magnetic memory, a magnetic disk, an optical disc and so on. The memory 42 may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 42 in the embodiment of the present application may also be a circuit or any other device capable of implementing a storage function, for storing program instructions and/or data.

By designing and programming the processor 41, the code corresponding to the memory WebShell detection method introduced in the foregoing embodiments can be embedded in the chip, so that the chip can execute the steps of the memory WebShell detection method of the embodiment shown in FIG. 1 when running. How to design and program the processor 41 is well known to those skilled in the art and is not described here.

Based on the same inventive concept, an embodiment of the present application further provides a storage medium storing computer instructions which, when run on a computer, cause the computer to execute the memory WebShell detection method discussed above.

In some possible implementations, the various aspects of the memory WebShell detection method provided by the present application may also be implemented in the form of a program product that includes program code. When the program product runs on a device, the program code causes the device to execute the steps of the memory WebShell detection method according to the various exemplary embodiments of the present application described above in this specification.

Those skilled in the art will appreciate that the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM and optical storage) containing computer-usable program code.

The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Obviously, those skilled in the art can make various changes and modifications to the present application without departing from its spirit and scope. If these changes and modifications fall within the scope of the claims of the present application and their technical equivalents, the present application is intended to include them as well.

Claims

1. A memory WebShell detection method is characterized by comprising the following steps:

acquiring all classes in a virtual machine corresponding to a target application, and screening M first classes with memory WebShell risk in all the classes, wherein M is an integer greater than or equal to 1;

determining the code corresponding to the in-memory bytecode of each of the M first classes to obtain M code files;

performing memory WebShell risk detection on the M code files through K first preset rules to obtain N code files with memory WebShell risks, wherein one first preset rule corresponds to one memory WebShell risk, and K and N are integers greater than or equal to 1;

and carrying out risk marking on the classes corresponding to the N code files by using a first preset identifier, and taking the class containing the first preset identifier as a first memory WebShell detection result.

2. The method of claim 1, wherein screening out, from all the classes, M first classes with a memory WebShell risk comprises:

screening out a second class which does not have a source byte code file locally in all the classes, and taking the second class as the first class which has the WebShell risk; and/or

screening out, from all the classes, a third class that matches any one of H second preset rules, and taking the third class as the first class with the WebShell risk, wherein one second preset rule corresponds to one memory WebShell risk, and H is an integer greater than or equal to 1.

3. The method of claim 1, wherein said determining the code corresponding to each of said first classes, respectively, to obtain M code files, comprises:

extracting the in-memory bytecode corresponding to each of the M first classes;

and converting the byte codes into M code files of preset types.

4. The method as claimed in claim 1, wherein the performing memory WebShell risk detection on the M code files through K first preset rules to obtain N code files with memory WebShell risk comprises:

respectively matching the text corresponding to each code file in the M code files with a preset risk level character string;

if a first character string consistent with the preset risk level character string exists in the text corresponding to any code file, determining that the code text to which the first character string belongs has a memory WebShell risk; and/or

detecting the number of preset risk characters in the text corresponding to each code file in the M code files;

determining whether each number is greater than a preset threshold;

and if a first code text whose number of risk characters is greater than the preset threshold exists, determining that the first code text has a memory WebShell risk.
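The two checks of claim 4 (risk-string matching and risk-character counting) can be sketched as follows. The function names, the example risk string, the `$` risk character, and the threshold value are all assumptions chosen for illustration; the claims do not specify them.

```python
from typing import Dict, Iterable, List


def match_risk_strings(text: str, risk_strings: Iterable[str]) -> bool:
    # True when any preset risk-level string appears in the code-file text.
    return any(s in text for s in risk_strings)


def count_risk_chars(text: str, risk_chars: str) -> int:
    # Number of characters in the text that belong to the preset risk characters.
    return sum(1 for ch in text if ch in risk_chars)


def risky_code_files(code_files: Dict[str, str],
                     risk_strings: Iterable[str],
                     risk_chars: str,
                     threshold: int) -> List[str]:
    # "and/or": a code file is risky if either check fires.
    risk_strings = list(risk_strings)
    return [
        name
        for name, text in code_files.items()
        if match_risk_strings(text, risk_strings)
        or count_risk_chars(text, risk_chars) > threshold
    ]
```

A usage example with assumed inputs:

```python
files = {
    "x.Evil": "Runtime.getRuntime().exec(cmd);",
    "x.Obf": "a$$b$$c$$",
    "app.Ok": "return 1;",
}
flagged = risky_code_files(files, ["Runtime.getRuntime().exec"], "$", 5)
```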

5. The method of claim 1, wherein before acquiring all classes in the virtual machine corresponding to the target application, the method further comprises:

scanning all currently running applications to obtain basic information corresponding to each application in all the applications and process information corresponding to each application, wherein the basic information at least comprises complete class information of an application entry;

selecting a target application from all the applications according to the basic information corresponding to each application in all the applications;

and loading the basic information corresponding to the target application and the process information corresponding to the target application to a virtual machine corresponding to the target application.

6. The method as claimed in claim 1, wherein after performing risk labeling on the classes corresponding to the N code files by using a first preset identifier and using the class containing the first preset identifier as a first memory WebShell detection result, the method further comprises:

determining a risk type corresponding to the class containing the first preset identifier in the first memory WebShell detection result;

and adding the information corresponding to the risk type into the first memory WebShell detection result to obtain a second memory WebShell detection result.

7. The method as claimed in claim 6, wherein determining the risk type corresponding to the class including the first preset identifier in the first memory WebShell detection result includes:

matching a second code file of a preset type corresponding to the class containing the first preset identification in the first memory WebShell detection result with any one first preset rule;

and if the matching is consistent, determining the memory WebShell risk corresponding to the first preset rule which is consistent with the matching of the second code file as a risk type.
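The risk-type step of claims 6 and 7 can be sketched as follows: each flagged class's code file is matched again against the first preset rules, and the risk of every matching rule is recorded. The name `add_risk_types` and the dictionary shapes are hypothetical, as are the rule names used in the test inputs.

```python
from typing import Callable, Dict, List


def add_risk_types(first_result: List[str],
                   code_files: Dict[str, str],
                   rules: Dict[str, Callable[[str], bool]]) -> Dict[str, List[str]]:
    # first_result: names of classes carrying the first preset identifier.
    # rules: maps a risk name to its first preset rule (one rule per risk).
    second_result: Dict[str, List[str]] = {}
    for cls in first_result:
        text = code_files[cls]  # the "second code file" of this marked class
        # Every rule that matches contributes its risk as a risk type.
        second_result[cls] = [risk for risk, rule in rules.items() if rule(text)]
    return second_result
```

The returned mapping plays the role of the second memory WebShell detection result: the marked classes plus the risk-type information for each.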

8. The method as claimed in claim 2 or claim 6, wherein after performing risk labeling on the classes corresponding to the N code files by using a first preset identifier and taking the class containing the first preset identifier as a first memory WebShell detection result, the method further comprises:

marking the third class with a second preset identifier, wherein the second preset identifier corresponds to one memory WebShell risk;

adding the marked third class to the first memory WebShell detection result to obtain a third memory WebShell detection result; or

adding the marked third class to the second memory WebShell detection result to obtain a third memory WebShell detection result.
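The merge described in claim 8 can be sketched as follows. The name `add_third_classes` and the identifier value `SUSPICIOUS_COMPONENT` are hypothetical, and the result is assumed here to be a mapping from class name to its list of identifiers.

```python
from typing import Dict, Iterable, List


def add_third_classes(result: Dict[str, List[str]],
                      third_classes: Iterable[str],
                      second_mark: str = "SUSPICIOUS_COMPONENT") -> Dict[str, List[str]]:
    # Copy the earlier (first or second) result so it is left unchanged.
    merged = {cls: list(marks) for cls, marks in result.items()}
    for cls in third_classes:
        marks = merged.setdefault(cls, [])
        # Mark each third class with the second preset identifier exactly once.
        if second_mark not in marks:
            marks.append(second_mark)
    return merged
```

Under this assumption, a class that both matched a second preset rule and was flagged by a first preset rule carries both identifiers in the third detection result.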

9. A memory WebShell detection device, the device comprising:

a screening module, configured to acquire all classes in a virtual machine corresponding to a target application and to screen M first classes with memory WebShell risk among all the classes, wherein M is an integer greater than or equal to 1;

a first determining module, configured to determine codes corresponding to the bytecode of each of the M first classes in the memory, to obtain M code files;

a detection module, configured to perform memory WebShell risk detection on the M code files through K first preset rules to obtain N code files with memory WebShell risks, wherein one first preset rule corresponds to one memory WebShell risk, and K and N are integers greater than or equal to 1;

and a first marking module, configured to perform risk marking on the classes corresponding to the N code files by using a first preset identifier, and to take the class containing the first preset identifier as a first memory WebShell detection result.

10. The apparatus of claim 9, wherein the screening module is specifically configured to:

11. The apparatus of claim 9, wherein the first determining module is specifically configured to:

and converting the byte codes into M code files of preset types.

12. The apparatus of claim 9, wherein the detection module is specifically configured to:

13. The apparatus of claim 9, wherein the apparatus further comprises:

a scanning module, configured to scan all currently running applications to obtain basic information corresponding to each application in all the applications and process information corresponding to each application, wherein the basic information at least comprises complete class information of an application entry;

a selection module, configured to select a target application from all the applications according to the basic information corresponding to each application in all the applications;

and a loading module, configured to load the basic information corresponding to the target application and the process information corresponding to the target application to the virtual machine corresponding to the target application.

14. The apparatus of claim 9, wherein the apparatus further comprises:

a second determining module, configured to determine a risk type corresponding to the class including the first preset identifier in the first memory WebShell detection result;

and a first adding module, configured to add the information corresponding to the risk type to the first memory WebShell detection result to obtain a second memory WebShell detection result.

15. The apparatus of claim 14, wherein the second determining module is specifically configured to:

16. The apparatus of claim 10 or claim 14, wherein the apparatus further comprises:

a second marking module, configured to mark the third class with a second preset identifier, wherein the second preset identifier corresponds to one memory WebShell risk;

and a second adding module, configured to add the marked third class to the first memory WebShell detection result to obtain a third memory WebShell detection result, or to add the marked third class to the second memory WebShell detection result to obtain a third memory WebShell detection result.

17. An electronic device, comprising:

a memory for storing a computer program;

a processor for implementing the method steps of any one of claims 1-8 when executing the computer program stored on the memory.

18. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of claims 1-8.