CN114499944A - Method, device and equipment for detecting WebShell - Google Patents


Info

Publication number: CN114499944A (application CN202111583370.9A; granted publication CN114499944B)
Authority: CN (China)
Prior art keywords: layer, webshell, file, detection model, network
Original language: Chinese (zh)
Inventors: 王鑫渊, 林顺东, 许金旺, 李可惟
Assignee (original and current): Tianyi Cloud Technology Co Ltd
Legal status: Granted; Active

Classifications

    • H04L63/1408: network security; detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416: event detection, e.g. attack signature detection
    • H04L63/20: managing network security; network security policies in general
    • H04L41/16: maintenance, administration or management of data switching networks using machine learning or artificial intelligence
    • H04L67/02: protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • G06N3/044: recurrent networks, e.g. Hopfield networks
    • G06N3/045: combinations of networks
    • G06N3/08: learning methods (neural networks)
    • Y02D10/00: energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The embodiments of this application disclose a method, an apparatus, and a device for detecting WebShell. The method comprises: acquiring a file to be detected in a WebShell environment; inputting the file to be detected into a pre-trained target detection model for detection, wherein the target detection model is obtained by training an initial detection model on training samples; the initial detection model is composed of a first CNN network, a GRU network, and a second CNN network, in which the first CNN network extracts basic features from the training samples, the GRU network extracts sequence features from those basic features, and the second CNN network processes the sequence features so that the parameters of the initial detection model can be adjusted according to the processed sequence features to obtain the target detection model; and determining, according to the detection result, whether the file to be detected in the WebShell environment is a safe file. This improves the accuracy of detecting whether a WebShell is malicious or safe.

Description

Method, device and equipment for detecting WebShell
Technical Field
The present application relates to the field of network security technologies, and in particular, to a method, an apparatus, and a device for detecting WebShell.
Background
WebShell is a command execution environment deployed on a web server in the form of a web page file, and administrators often use a WebShell to manage the server. However, attackers also frequently upload malicious WebShells to a server by exploiting its vulnerabilities, such as injection, cross-site scripting, command execution, and file upload flaws, and then control the server through the WebShell, which severely threatens the information and system security of the compromised server and of the network users it serves.
Because the WebShell form is flexible and changeable, attackers often apply obfuscation techniques such as encryption and decryption, multiple encodings, and insertion of junk information, so that existing WebShell detection tools cannot effectively detect malicious WebShells that have been obfuscated. The detection methods in the related art therefore yield results of low accuracy, which greatly threatens the information and system security of the affected servers' network users.
Accurately detecting malicious WebShells is therefore of practical significance for protecting information and systems.
Disclosure of Invention
The embodiments of this application provide a method, an apparatus, and a device for detecting WebShell, which are used to improve the accuracy of WebShell detection.
In a first aspect, an embodiment of the present application provides a method for detecting WebShell, including:
acquiring a file to be detected in the WebShell environment;
inputting the file to be detected into a target WebShell detection model obtained by pre-training for detection;
the target WebShell detection model is obtained by applying a training sample to train an initial WebShell detection model; the initial WebShell detection model is composed of a first CNN network, a GRU network and a second CNN network; the first CNN network is used for extracting basic features from a training sample, the GRU network is used for extracting sequence features in the basic features, and the second CNN network is used for processing the sequence features so as to adjust parameters in the initial WebShell detection model according to the processed sequence features to obtain a target WebShell detection model;
and determining whether the file to be detected in the WebShell environment is a safe file or not according to the detection result.
In some exemplary embodiments, the target WebShell detection model is obtained by:
acquiring sample files in the WebShell environment, processing the sample files into two-dimensional tensor form, and forming a training sample set from the sample files in two-dimensional tensor form;
inputting each training sample into the first CNN network so that the first CNN network extracts a basic feature of each training sample, and taking the basic feature as an input of the GRU network so that the GRU network extracts a sequence feature in the basic feature;
and inputting the sequence features into the second CNN network for processing, and adjusting the parameters in the initial WebShell detection model according to the processed sequence features to obtain the target WebShell detection model.
In some exemplary embodiments, the first CNN network includes an embedding layer, a first one-dimensional convolutional layer, a first normalized BN layer, and a first pooling layer; the GRU network comprises a recurrent BiGRU layer; and the second CNN network comprises a second one-dimensional convolutional layer, a second normalized BN layer, a second pooling layer, a Flatten layer, a first fully connected layer, a third normalized BN layer, a second fully connected layer, a third fully connected layer, and a fourth normalized BN layer.
In some exemplary embodiments, the embedding layer is configured to convert the training samples input to the first CNN network into a word vector matrix;
the first normalized BN layer, the second normalized BN layer, the third normalized BN layer, and the fourth normalized BN layer are used to normalize the training data of each layer during training so that it follows a normal distribution.
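What the BN layers do can be sketched in a few lines of NumPy. This is a minimal illustration only; the trainable scale/shift parameters gamma and beta and the eps constant are standard batch-normalization details assumed here, not stated in the patent.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Per-feature batch normalization as used by the BN layers above:
    normalize each column of a batch to zero mean and unit variance,
    then apply a learnable scale (gamma) and shift (beta)."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```

After this transform, every feature fed to the next layer is approximately standard-normal across the batch, which is the stabilizing effect the patent relies on.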
In some exemplary embodiments, the recurrent BiGRU layer is implemented as a modification of the standard LSTM, and the recurrent BiGRU layer extracts sequence features from the basic features as follows:
for each basic feature, determining the basic feature as the input information of the current LSTM unit;
processing the concatenation of the input information and the state information passed on by the previous LSTM unit according to a first weight and an activation function to obtain a reset gate;
processing the concatenation of the input information and the state information passed on by the previous LSTM unit according to a second weight and an activation function to obtain an update gate;
processing the concatenation of the input information and the product of the reset gate and the state information passed on by the previous LSTM unit according to a third weight and a hyperbolic tangent function to obtain intermediate state information;
determining the sum of first information and second information as the state information of the current LSTM unit, wherein the first information is the product of the update gate and the intermediate state information, and the second information is the product of the state information passed on by the previous LSTM unit and the difference between a preset constant and the update gate;
and determining the sequence features from the state information of each LSTM unit.
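The gate computations above can be written out as a single recurrent step in NumPy. This is a sketch under stated assumptions: the weight names and shapes are illustrative, the "preset constant" is taken to be 1.0, and the activation function for both gates is taken to be the sigmoid, none of which the patent fixes explicitly.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, w_reset, w_update, w_state):
    """One recurrent step of the gated unit described above. Each weight
    matrix has shape (hidden, hidden + input) and acts on the concatenation
    of the previous state and the current input."""
    concat = np.concatenate([h_prev, x_t])            # splice state and input
    r = sigmoid(w_reset @ concat)                     # reset gate (first weight + activation)
    z = sigmoid(w_update @ concat)                    # update gate (second weight + activation)
    candidate = np.tanh(w_state @ np.concatenate([r * h_prev, x_t]))  # intermediate state (third weight + tanh)
    # first info: z * candidate; second info: (1 - z) * previous state
    return z * candidate + (1.0 - z) * h_prev
```

The BiGRU layer runs this step over the feature sequence in both directions and concatenates the two resulting state sequences; only the single forward step is shown here.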
In some exemplary embodiments, the activation function at the output of the third fully connected layer in the second CNN network is the Sigmoid activation function; the activation function of each hidden layer in the first CNN network, the GRU network, and the second CNN network is ReLU.
In some exemplary embodiments, said processing the sample file into a two-dimensional tensor form comprises:
applying fit_on_texts to pass the sample file in the WebShell environment into a Tokenizer;
applying the Tokenizer to convert the sample file in the WebShell environment into a sequence;
and processing the converted sequence into two-dimensional tensor form in combination with the pad_sequences preprocessing routine.
In a second aspect, an embodiment of the present application provides an apparatus for detecting WebShell, including:
the file acquisition module is used for acquiring a file to be detected in the WebShell environment;
the detection module is used for inputting the file to be detected to a target WebShell detection model obtained by pre-training for detection;
the target WebShell detection model is obtained by applying a training sample to train an initial WebShell detection model; the initial WebShell detection model is composed of a first CNN network, a GRU network and a second CNN network; the first CNN network is used for extracting basic features from a training sample, the GRU network is used for extracting sequence features in the basic features, and the second CNN network is used for processing the sequence features so as to adjust parameters in the initial WebShell detection model according to the processed sequence features to obtain a target WebShell detection model;
and the determining module is used for determining whether the file to be detected in the WebShell environment is a safe file according to the detection result.
In some exemplary embodiments, the method further includes a model training module, configured to train to obtain the target WebShell detection model by:
acquiring sample files in the WebShell environment, processing the sample files into two-dimensional tensor form, and forming a training sample set from the sample files in two-dimensional tensor form;
inputting each training sample into the first CNN network so that the first CNN network extracts the basic feature of each training sample, and taking the basic feature as the input of the GRU network so that the GRU network extracts the sequence feature in the basic feature;
and inputting the sequence features into the second CNN network for processing, and adjusting the parameters in the initial WebShell detection model according to the processed sequence features to obtain the target WebShell detection model.
In some exemplary embodiments, the first CNN network includes an embedding layer, a first one-dimensional convolutional layer, a first normalized BN layer, and a first pooling layer; the GRU network comprises a recurrent BiGRU layer; and the second CNN network comprises a second one-dimensional convolutional layer, a second normalized BN layer, a second pooling layer, a Flatten layer, a first fully connected layer, a third normalized BN layer, a second fully connected layer, a third fully connected layer, and a fourth normalized BN layer.
In some exemplary embodiments, the embedding layer is configured to convert the training samples input to the first CNN network into a word vector matrix;
the first normalized BN layer, the second normalized BN layer, the third normalized BN layer, and the fourth normalized BN layer are used to normalize the training data of each layer during training so that it follows a normal distribution.
In some exemplary embodiments, the recurrent BiGRU layer is implemented as a modification of the standard LSTM, and the model training module is specifically configured to extract sequence features from the basic features as follows:
for each basic feature, determining the basic feature as the input information of the current LSTM unit;
processing the concatenation of the input information and the state information passed on by the previous LSTM unit according to a first weight and an activation function to obtain a reset gate;
processing the concatenation of the input information and the state information passed on by the previous LSTM unit according to a second weight and an activation function to obtain an update gate;
processing the concatenation of the input information and the product of the reset gate and the state information passed on by the previous LSTM unit according to a third weight and a hyperbolic tangent function to obtain intermediate state information;
determining the sum of first information and second information as the state information of the current LSTM unit, wherein the first information is the product of the update gate and the intermediate state information, and the second information is the product of the state information passed on by the previous LSTM unit and the difference between a preset constant and the update gate;
and determining the sequence features from the state information of each LSTM unit.
In some exemplary embodiments, the activation function at the output of the third fully connected layer in the second CNN network is the Sigmoid activation function; the activation function of each hidden layer in the first CNN network, the GRU network, and the second CNN network is ReLU.
In some exemplary embodiments, the model training module is specifically configured to:
applying fit_on_texts to pass the sample file in the WebShell environment into a Tokenizer;
applying the Tokenizer to convert the sample file in the WebShell environment into a sequence;
and processing the converted sequence into two-dimensional tensor form in combination with the pad_sequences preprocessing routine.
In a third aspect, an embodiment of the present application provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of any one of the methods when executing the computer program.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium having stored thereon computer program instructions, which, when executed by a processor, implement the steps of any of the methods described above.
The embodiment of the application has the following beneficial effects:
the target Webshell detection model is obtained by training an initial Webshell detection model by applying a training sample and adjusting parameters in the training process, and the initial Webshell detection model comprises a first CNN network, a GRU network and a second CNN network, and basic features output by the first CNN network are used as input of the GRU network, and then the GRU network extracts sequence features in the basic features, so that the second CNN network can continuously process the sequence features and adjust model parameters according to processing results to further obtain the target Webshell detection model. Compared with the detection method in the prior art, the target WebShell detection model is applied to detect the file to be detected in the WebShell environment, and then whether the file to be detected in the WebShell environment is a safe file is determined, so that the detection accuracy is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the embodiments are briefly described below. It is apparent that the drawings described below show only some embodiments of the present application, and that those skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a method for detecting WebShell according to an embodiment of the present application;
Fig. 2 is a flowchart of a method for training a WebShell detection model according to an embodiment of the present application;
Fig. 3 is a schematic diagram of a network structure according to an embodiment of the present application;
Fig. 4 is a schematic diagram of a data flow variation provided by an embodiment of the present application;
Fig. 5 is a schematic diagram of a GRU process according to an embodiment of the present application;
Fig. 6 is a diagram illustrating a convolution process according to an embodiment of the present application;
Fig. 7 is a diagram of a multi-model comparison architecture according to an embodiment of the present application;
Fig. 8 is a schematic diagram of a training process provided in an embodiment of the present application;
Fig. 9 is a schematic structural diagram of an apparatus for detecting WebShell according to an embodiment of the present application;
Fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
For convenience of understanding, terms referred to in the embodiments of the present application are explained below:
(1) CNN (Convolutional Neural Network): a class of feed-forward neural networks that involve convolution computations and have a deep structure; CNNs are one of the representative methods of deep learning.
(2) GRU (Gated Recurrent Unit) network: a gating mechanism in RNNs (Recurrent Neural Networks). Like other gating mechanisms, it aims to solve the vanishing- and exploding-gradient problems of the standard RNN while retaining long-term information of the sequence. It has fewer parameters than LSTM (Long Short-Term Memory), containing only a reset gate and an update gate.
Any number of elements in the drawings are by way of example and not by way of limitation, and any nomenclature is used solely for differentiation and not by way of limitation.
In practice, WebShell is a command execution environment deployed on a web server in the form of a web page file, and administrators often use a WebShell to manage the server. However, attackers also frequently upload malicious WebShells to a server by exploiting its vulnerabilities, such as injection, cross-site scripting, command execution, and file upload flaws, and then control the server through the WebShell, which severely threatens the information and system security of the compromised server and of the network users it serves. Because the WebShell form is flexible and changeable, attackers often apply obfuscation techniques such as encryption and decryption, multiple encodings, and insertion of junk information, so that existing WebShell detection tools cannot effectively detect malicious WebShells that have been obfuscated. The detection methods in the related art therefore yield results of low accuracy, which greatly threatens the information and system security of the affected servers' network users. Accurately detecting malicious WebShells is therefore of practical significance for protecting information and systems.
To this end, the present application proposes a combined model structure based on a convolutional neural network and a bidirectional gated recurrent unit, in which the GRU layer is spliced onto the hidden output of the CNN layer so that local features and sequence features are extracted comprehensively, allowing malicious WebShells written in languages such as PHP, JSP, and ASP to be detected more efficiently and accurately.
Accordingly, in application, a file to be detected in the WebShell environment can be acquired; the file to be detected is input into the pre-trained target WebShell detection model for detection; and whether the file to be detected in the WebShell environment is a safe file is determined according to the detection result.
During model training in the embodiments of the application, the request data is converted into an input tensor and fed into a one-dimensional convolutional neural network for feature extraction; the output of the hidden layer is then fed into the GRU layer; the GRU output is fed into a Flatten layer, flattened, and passed to the fully connected layers; and finally a Sigmoid activation function outputs the classification result. In particular, batch normalization is applied between many of the layers so that the input of each layer of the neural network follows a normal distribution.
After introducing the design concept of the embodiments of the present application, a brief description is given below of application scenarios to which the technical solutions of the embodiments can be applied. It should be noted that the application scenarios described below are only used to illustrate the embodiments of the present application and are not limiting. In specific implementation, the technical solutions provided by the embodiments of the present application can be applied flexibly according to actual needs.
To further illustrate the technical solutions provided by the embodiments of the present application, a detailed description is given below with reference to the accompanying drawings and specific embodiments. Although the embodiments provide the method steps shown in the following embodiments or figures, the method may include more or fewer steps based on conventional or non-inventive effort. In steps with no logically necessary causal relationship, the order of execution is not limited to that provided by the embodiments of the present application.
The technical solutions provided in the embodiments of the present application are explained below.
Referring to fig. 1, an embodiment of the present application provides a method for detecting WebShell, including the following steps:
s101, obtaining a file to be detected in the WebShell environment.
S102, inputting a file to be detected into a target WebShell detection model obtained through pre-training for detection; the target WebShell detection model is obtained by applying a training sample to train an initial WebShell detection model; the initial WebShell detection model is composed of a first CNN network, a GRU network and a second CNN network; the first CNN network is used for extracting basic features from the training samples, the GRU network is used for extracting sequence features in the basic features, and the second CNN network is used for processing the sequence features so as to adjust parameters in the initial WebShell detection model according to the processed sequence features to obtain a target WebShell detection model.
S103, determining whether the file to be detected in the WebShell environment is a safe file or not according to the detection result.
In the embodiments of the application, the target WebShell detection model is obtained by training an initial WebShell detection model on training samples and adjusting its parameters during training. The initial WebShell detection model comprises a first CNN network, a GRU network, and a second CNN network: the basic features output by the first CNN network serve as the input of the GRU network, the GRU network extracts the sequence features within those basic features, and the second CNN network then processes the sequence features and adjusts the model parameters according to the processing result, yielding the target WebShell detection model. Compared with detection methods in the prior art, applying the target WebShell detection model to detect a file in the WebShell environment, and determining whether that file is a safe file, improves detection accuracy.
Referring to S101: in general, the initial file to be detected acquired in the WebShell environment is text; to improve processing efficiency and accuracy, the initial file to be detected is converted into a file to be detected in two-dimensional tensor form.
In one specific example, the conversion process is as follows:
the text is converted to a sequence using a Tokenizer in text preprocessing, the text vector is quantized to a machine recognizable language, and a sequence of integers is generated where each integer is an index of a token in a dictionary. The file to be detected is firstly transmitted into the word segmentation device by using Fit _ on _ Texts, the obtained Index _ word is a dictionary (used for an Embedding layer later) for mapping character string indexes, the shape of the obtained Index _ word is as {1: ', 2: ' e ', 3:'t ', 4: ' a ', 5: ' r ', 6: ' n ' … }, and the text sequence list based on the Index _ word dictionary is obtained by processing Texts _ to _ sequences. The method is used together with the Pad _ sequences sequence preprocessing, the text is processed into a two-dimensional tensor which can be recognized by a machine on the basis of a word segmentation device Tokenizer, and the insufficient length of the text is filled up by 0. The final result of text vectorization is a matrix of shape (7515,2500) shaped as [ 000 … 456126 ] \ n [ 000 … 456126 ] \ n [ 252544 … 456126 ] … [ 1561.. 335858 ].
Referring to S102: since the target WebShell detection model has already been trained, the file to be detected can be input into the pre-trained target WebShell detection model for detection, and the output of the target WebShell detection model is taken as the detection result.
Referring to S103: according to the detection result, it is determined which detection results indicate that the file to be detected is a safe file and which indicate a malicious file. In practice, for example, the detection result may comprise a plurality of indicators, of which 90% indicate that the file to be detected is a safe file and 10% indicate that it is a malicious file; in that case, the file to be detected can be determined to be a safe file.
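As a sketch of the decision step in S103: treating the model's Sigmoid output as the probability that the file is safe, and using a 0.5 cutoff, are illustrative assumptions here; the patent only gives the 90%/10% example and fixes neither convention.

```python
def classify(p_safe, threshold=0.5):
    """Turn the model's Sigmoid output into a verdict.
    p_safe is assumed to be the probability that the file is safe;
    the threshold value is an illustrative assumption."""
    return "safe" if p_safe >= threshold else "malicious"
```

With the example from the text, a score of 0.9 yields "safe" and a score of 0.1 yields "malicious".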
In order to make the technical scheme of the present application clearer, a training process of the WebShell detection model is described with reference to fig. 2:
S201, obtaining sample files in the WebShell environment, processing the sample files into two-dimensional tensor form, and forming a training sample set from the sample files in two-dimensional tensor form.
S202, inputting each training sample into the first CNN network so that the first CNN network extracts the basic features of each training sample, and taking the basic features as the input of the GRU network so that the GRU network extracts the sequence features in the basic features.
S203, inputting the sequence features into the second CNN network for processing, and adjusting the parameters in the initial WebShell detection model according to the processed sequence features to obtain the target WebShell detection model.
Referring to S201, the sample files also need to be processed into two-dimensional tensor form, as follows: applying fit_on_texts to pass a sample file in the WebShell environment into the Tokenizer; applying the Tokenizer to convert the sample file in the WebShell environment into a sequence; and processing the converted sequence into two-dimensional tensor form in combination with pad_sequences sequence preprocessing. For a detailed example of this procedure, refer to the method described above for processing the initial file to be detected into a two-dimensional tensor, which is not repeated here. In this way, the plurality of sample files are processed into two-dimensional tensors to form the training sample set.
Referring to S202, the network structure of the initial WebShell detection model is described. The network structure comprises three parts: the first CNN network includes an embedding layer, a first one-dimensional convolutional layer, a first normalized BN layer and a first pooling layer; the GRU network comprises a circulation layer BiGRU; and the second CNN network comprises a second one-dimensional convolutional layer, a second normalized BN layer, a second pooling layer, a Flatten layer, a first fully-connected layer, a third normalized BN layer, a second fully-connected layer, a third fully-connected layer and a fourth normalized BN layer.
For example, fig. 3 shows a network structure, where the embedding layer is embedding_1, the first one-dimensional convolutional layer is conv1d_1, the second one-dimensional convolutional layer is conv1d_2, the first pooling layer is max_pooling1d_1, the second pooling layer is max_pooling1d_2, the first fully-connected layer is dense_1, the second fully-connected layer is dense_2, the third fully-connected layer is dense_3, the first normalized BN layer is batch_normalization_1 (batch_n_1), the second normalized BN layer is batch_normalization_2 (batch_n_2), the third normalized BN layer is batch_normalization_3 (batch_n_3), and the fourth normalized BN layer is batch_normalization_4 (batch_n_4).
Specifically, referring to FIG. 4, a schematic diagram of the data flow is shown. CNN_BiGRU_CNN is the model applied in the present application, and activation_1, activation_2, activation_3 and activation_4 are activation layers.
In the actual training process, each training sample is input into the first CNN network, so that the first CNN network extracts the basic feature of each training sample, and the basic feature is used as the input of the GRU network, so that the GRU network extracts the sequence feature in the basic feature.
Next, the process of extracting sequence features with the circulation layer BiGRU is described. The circulation layer BiGRU is implemented based on a modification of the standard LSTM, and extracts sequence features from the basic features as follows:
For each basic feature, the basic feature is taken as the input information of the current LSTM unit. The input information is spliced with the state information transmitted by the previous LSTM unit, and the spliced information is processed according to the first weight and the activation function to obtain the reset gate. The same spliced information is processed according to the second weight and the activation function to obtain the update gate. The input information is then spliced with the product of the reset gate and the state information transmitted by the previous LSTM unit, and the result is processed according to the third weight and the hyperbolic tangent function to obtain the intermediate state information. The sum of first information and second information is determined as the state information of the current LSTM unit, where the first information is the product of the update gate and the intermediate state information, and the second information is the product of the state information transmitted by the previous LSTM unit and the difference between a preset constant and the update gate. The sequence features are determined from the state information of all the LSTM units.
Specifically, in conjunction with FIG. 5, let the input information of the current LSTM unit be x_t, the state information transmitted by the previous LSTM unit be h_{t-1}, the first weight be W_r, and σ be the activation function. Then the reset gate r_t is r_t = σ(W_r · [h_{t-1}, x_t]). With the second weight W_z, the update gate z_t is z_t = σ(W_z · [h_{t-1}, x_t]). With the third weight W, the intermediate state information is h̃_t = tanh(W · [r_t * h_{t-1}, x_t]), and the state information of the current LSTM unit is h_t = z_t * h̃_t + (1 − z_t) * h_{t-1}.
The activation function σ is also called the gate control signal: the gates r_t and z_t take values in the range 0 to 1, while the tanh-squashed intermediate state h̃_t takes values in the range −1 to 1. In this way, the state information of each LSTM unit is obtained, and the sequence features are determined from the state information.
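A single update of this form can be sketched with NumPy. The weight shapes, random initialization and dimensions below are illustrative assumptions, not values from the application:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_cell(x_t, h_prev, W_r, W_z, W_h):
    # Splice previous state and current input: [h_{t-1}, x_t]
    hx = np.concatenate([h_prev, x_t])
    r_t = sigmoid(W_r @ hx)                 # reset gate, values in (0, 1)
    z_t = sigmoid(W_z @ hx)                 # update gate, values in (0, 1)
    # Intermediate state uses the reset-gated previous state, squashed by tanh
    h_tilde = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]))
    # New state: blend of intermediate and previous state, weighted by z_t
    return z_t * h_tilde + (1.0 - z_t) * h_prev

rng = np.random.default_rng(0)
d_in, d_h = 4, 3                            # illustrative sizes
x_t = rng.standard_normal(d_in)
h_prev = np.zeros(d_h)
W_r = rng.standard_normal((d_h, d_h + d_in))
W_z = rng.standard_normal((d_h, d_h + d_in))
W_h = rng.standard_normal((d_h, d_h + d_in))
h_t = gru_cell(x_t, h_prev, W_r, W_z, W_h)
```

With h_prev = 0, the new state is z_t * h̃_t, so every component stays strictly inside (−1, 1).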
In addition, during training, the role of each layer is as follows:
the embedded layer is used for converting training samples input by the first CNN network into a word vector matrix; the first normalized BN layer, the second normalized BN layer, the third normalized BN layer and the fourth normalized BN layer are used for enabling the training data of each layer in the training process to be in normal distribution.
Specifically, the embedding layer is placed as the first layer of the model and first receives the information about the input size that the model needs. The input data is integer-coded; the layer trains and outputs a word vector matrix, realizing the conversion from positive integers to vectors of fixed size.
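The conversion performed by the embedding layer amounts to a row lookup in a trained weight matrix. A minimal NumPy sketch (vocabulary size, dimension and index values are illustrative):

```python
import numpy as np

vocab_size, embed_dim = 10, 4
rng = np.random.default_rng(1)
# Trainable weight matrix: one embed_dim-long row per vocabulary index
embedding_matrix = rng.standard_normal((vocab_size, embed_dim))

# A padded, integer-coded sequence (0 is the padding index)
sequence = np.array([0, 0, 3, 7, 2])
# The embedding lookup maps (seq_len,) -> (seq_len, embed_dim)
vectors = embedding_matrix[sequence]
```

During training, the rows of the matrix are adjusted by backpropagation like any other weights.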
For the convolutional layer, referring to fig. 6, the vectorized one-dimensional sequence is input into linear filters for weighted averaging, and the convolution results are distinguished by the features they respond to, yielding the convolution weights. The convolution convolves a tensor of length 2500 with 256 convolution kernels of length 4, producing 256 results of length 2497. Pooling then downsamples the result to reduce the data dimension; with a sampling window of 2, a tensor of length 1248 is obtained. Finally, the Sigmoid activation function constrains the result to the range 0 to 1 and outputs a classification probability, realizing the two-class classification.
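The length arithmetic in this paragraph (2500 → 2497 → 1248) can be checked with a short sketch of "valid" convolution and pooling output sizes:

```python
def conv1d_out_len(n, kernel, stride=1):
    # "Valid" convolution (no padding): output length is n - kernel + 1
    return (n - kernel) // stride + 1

def pool_out_len(n, window):
    # Non-overlapping pooling with the given sampling window
    return n // window

seq_len = 2500
after_conv = conv1d_out_len(seq_len, kernel=4)    # 256 such results, each 2497 long
after_pool = pool_out_len(after_conv, window=2)   # 2497 // 2 = 1248
```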
For the normalized BN layer: the outputs of the embedding layer, the convolutional layers and the fully-connected layers of the model pass through normalized BN (BatchNormalization) layers, which adopt the idea of batch normalization so that the input of each layer follows a normal distribution during deep neural network training, reducing the differences between samples. As the network deepens, training efficiency gradually drops and convergence becomes slower and slower: during training, the inputs of the hidden layers gradually drift away from the linear region and approach the edges of the value range of the activation function, which slows convergence. BatchNormalization is a method that "forces" the input distribution of each neural network layer back to a standard normal distribution, keeping the input values in a normal range with a mean of 0 and a variance of 1. Once the input nodes of a hidden layer are fixed in this range, the nonlinear function is more sensitive in that region, and the convergence speed is accelerated.
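The normalization step itself reduces to subtracting the batch mean and dividing by the batch standard deviation per feature. A minimal sketch of this idea (without the trainable scale and shift parameters a full BN layer also learns):

```python
import numpy as np

def batch_normalize(x, eps=1e-5):
    # Normalize each feature of the batch to zero mean and unit variance
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

rng = np.random.default_rng(2)
batch = rng.standard_normal((32, 8)) * 5.0 + 10.0   # shifted, scaled input
normed = batch_normalize(batch)
```

After normalization, each column of the batch has mean ≈ 0 and variance ≈ 1, as described above.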
For the Flatten layer: the Flatten layer is used to "flatten" the output of the convolutional layers, collapsing the multidimensional data into one dimension (a vector whose length is the product of the dimensions) so that it can be fed into the fully-connected layers.
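For instance, a pooled output of shape (batch, steps, channels) becomes a matrix whose row length is the product steps × channels (the sizes here are illustrative):

```python
import numpy as np

# A pooled convolutional output of shape (batch, steps, channels)
x = np.arange(2 * 5 * 3).reshape(2, 5, 3)
# Flatten keeps the batch axis and concatenates everything else:
flat = x.reshape(x.shape[0], -1)   # shape (2, 15), since 5 * 3 = 15
```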
The parameters (including activation function, optimizer, loss function, callback function, etc.) used by the various layers in the network are explained next:
The activation function of the output of the third fully-connected layer in the second CNN network is the Sigmoid activation function; the activation function of each hidden layer in the first CNN network, the GRU network and the second CNN network is Relu. Specifically, the fully-connected layers carry out the activation operations, and the output of the final fully-connected layer is passed through the Sigmoid activation function for the final classification.
In detail, for the activation function: commonly used activation functions include Linear, Softmax, Relu and the like. In the present application, the activation function Relu is used in each hidden layer, and the Sigmoid activation function is used at the output to solve the two-class classification problem.
For the optimizer: common optimizers include RMSprop, Adam, Adamax and the like. The embodiment of the present application uses the adaptive learning rate optimization algorithm Adam, which is a synthesis of the AdaGrad and RMSprop algorithms.
For the loss function: common loss functions include Logloss, Ordinary Least Squares, Adaboost loss, Binary_crossentropy and the like. The embodiment of the present application adopts the two-class cross entropy loss function Binary_crossentropy, which is the loss function used together with Sigmoid.
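The pairing of Sigmoid with binary cross entropy can be sketched as follows; the logits and labels are made up for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, y_pred, eps=1e-7):
    # Two-class cross entropy over Sigmoid probabilities;
    # clipping avoids log(0) for saturated predictions.
    y_pred = np.clip(y_pred, eps, 1 - eps)
    return -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

logits = np.array([3.2, -1.5, 0.4, -2.8])   # raw outputs of the last layer
y_true = np.array([1.0, 0.0, 1.0, 0.0])     # 1 = malicious, 0 = secure
probs = sigmoid(logits)
loss = binary_crossentropy(y_true, probs)
```

The loss shrinks toward 0 as the Sigmoid probabilities approach the true labels.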
For the callback function: the embodiment of the application uses the following callback functions to adjust the state in the training process and save the model:
(1) EarlyStopping: in the embodiment of the present application, this callback function is set to interrupt training if the training effect of the model no longer improves within 3 rounds.
(2) ModelCheckPoint: the models of different training rounds are continuously saved during training, and the results are compared to obtain the best model of the training process. Once a good model has been trained and saved, the saved model can be called directly for prediction.
(3) ReduceLROnPlateau: if the loss value on the validation set no longer improves, this callback function reduces the learning rate. In the embodiment of the present application, if the loss does not improve within one round, the learning rate is multiplied by 0.5.
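The combined effect of the early-stopping rule (patience of 3 rounds) and the plateau rule (halve the learning rate after 1 stale round) can be sketched as a simple loop over a loss history; this mimics the described behavior rather than the library callback classes themselves:

```python
def run_callbacks(losses, patience=3, lr_patience=1, factor=0.5, lr=1e-3):
    # Stop when the loss has not improved for `patience` rounds;
    # multiply the learning rate by `factor` after `lr_patience` stale rounds.
    best, stale = float("inf"), 0
    for epoch, loss in enumerate(losses):
        if loss < best:
            best, stale = loss, 0
        else:
            stale += 1
            if stale >= lr_patience:   # plateau: reduce the learning rate
                lr *= factor
            if stale >= patience:      # no improvement in 3 rounds: stop
                return epoch + 1, lr
    return len(losses), lr

# Improvement stalls after round 1, so training stops after round 5
epochs_run, final_lr = run_callbacks([0.50, 0.40, 0.41, 0.42, 0.43, 0.30])
```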
In order to illustrate the effect of the embodiment of the present application, the CNN_BiGRU_CNN feature extraction method of the present application is compared with the convolutional neural network CNN, the gated recurrent unit GRU, and the capsule network Capsule. Fig. 7 is the multi-model comparison architecture diagram, and table 1 gives the comparison results. In fig. 7, data collection, data processing and model training are performed for each model. In the data collection stage, black samples are determined from open-source WebShell projects on GitHub, white samples are determined from code in the corresponding scripting languages crawled from GitHub, and the black and white samples are used as the training data. In the data processing stage, text preprocessing and sequence preprocessing are carried out. In the training stage, the training results of the four methods are obtained, as shown in table 1.
TABLE 1 Effect of the methods under different indexes
Model Loss Acc Precision Recall F1-score
CNN 0.4478 0.8558 0.8281 0.7467 0.7691
GRU 0.3432 0.8996 0.8930 0.8140 0.8370
CNN_BiGRU_CNN 0.0154 0.9910 0.9939 0.9792 0.9858
Capsule 0.1330 0.9505 0.9321 0.9199 0.9210
As can be seen from table 1, the effects of the four models are evaluated on five indexes: loss value (Loss), accuracy (Acc), precision (Precision), recall (Recall) and F1 value (F1-score). The CNN and GRU models reach neither 90% accuracy nor 90% recall, their loss values exceed 0.3, and the gap between predicted and true values is large. The CNN_BiGRU_CNN and Capsule models perform well on all indexes, with accuracy and recall both above 90% and low loss values; in comparison, every index of CNN_BiGRU_CNN exceeds that of Capsule. Clearly, CNN_BiGRU_CNN performs best on every measure, far exceeding CNN and also surpassing the Capsule model overall. Therefore, in the embodiment of the present application, combining CNN and GRU increases the feature extraction capability and thus the accuracy of WebShell judgment. The method is suitable, during security scanning, for detecting whether the files in a host environment are malicious backdoor Trojan files, and is also suitable for detecting and filtering uploaded files in real time to protect the security of hosts and servers.
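The indexes in table 1 relate to each other in a fixed way: precision and recall are derived from the confusion counts, and F1 is their harmonic mean. A small sketch with hypothetical counts:

```python
def precision_recall_f1(tp, fp, fn):
    # Precision: fraction of files flagged as WebShells that truly are;
    # Recall: fraction of true WebShells that were flagged;
    # F1: harmonic mean of precision and recall.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical confusion counts for a WebShell classifier
p, r, f1 = precision_recall_f1(tp=95, fp=5, fn=10)
```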
In summary, in conjunction with fig. 8, the training process is summarized as follows:
After a large number of training samples are collected, the samples are divided into black samples and white samples and labeled respectively: black samples are labeled 1 and white samples are labeled 0. The data preprocessing process is then carried out, comprising file deduplication, removal of special characters and removal of line breaks. Text vectorization (i.e., the aforementioned processing into two-dimensional tensors) is then performed. The vectorized text is randomly divided into a training set and a testing set; the initial WebShell detection model is trained with the training set to predict WebShells, the training parameters are adjusted according to the prediction results to obtain the trained model, and the testing set is used for testing, finally obtaining the target WebShell detection model.
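The labeling and splitting steps can be sketched in a few lines of Python; the sample PHP snippets and the 75/25 split ratio are illustrative assumptions:

```python
import random

# Illustrative black (malicious) and white (benign) sample snippets
black = ["<?php eval($_POST['a']); ?>", "<?php system($_GET['c']); ?>"]
white = ["<?php echo 'hello'; ?>", "<?php date('Y-m-d'); ?>"]

# Black samples are labeled 1, white samples 0
data = [(s, 1) for s in black] + [(s, 0) for s in white]

random.seed(42)              # deterministic shuffle for the sketch
random.shuffle(data)

cut = int(len(data) * 0.75)  # assumed 75/25 train/test split
train, test = data[:cut], data[cut:]
```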
As shown in fig. 9, based on the same inventive concept as the method for detecting WebShell, an embodiment of the present application further provides a device for detecting WebShell, where the device at least includes a file obtaining module 91, a detecting module 92, and a determining module 93.
The file acquisition module 91 is used for acquiring a file to be detected in a WebShell environment;
the detection module 92 is used for inputting the file to be detected into a target WebShell detection model obtained by pre-training for detection;
the target WebShell detection model is obtained by applying a training sample to train an initial WebShell detection model; the initial WebShell detection model is composed of a first CNN network, a GRU network and a second CNN network; the first CNN network is used for extracting basic features from the training samples, the GRU network is used for extracting sequence features in the basic features, and the second CNN network is used for processing the sequence features so as to adjust parameters in the initial WebShell detection model according to the processed sequence features to obtain a target WebShell detection model;
and the determining module 93 is configured to determine whether the file to be detected in the WebShell environment is a secure file according to the detection result.
In some exemplary embodiments, the method further includes a model training module, configured to train to obtain the target WebShell detection model by:
acquiring a sample file in a WebShell environment, processing the sample file into a two-dimensional tensor form, and determining the sample file in each two-dimensional tensor form to form a training sample set;
inputting each training sample into a first CNN network so that the first CNN network extracts the basic feature of each training sample, and taking the basic feature as the input of a GRU network so that the GRU network extracts the sequence feature in the basic feature;
and inputting the sequence characteristics into a second CNN network to enable the second CNN network to process, and adjusting parameters in the initial WebShell detection model according to the processed sequence characteristics to obtain a target WebShell detection model.
In some exemplary embodiments, the first CNN network includes an embedding layer, a first one-dimensional convolutional layer, a first normalized BN layer, a first pooling layer; the GRU network comprises a circulation layer BiGRU; the second CNN network comprises a second one-dimensional convolutional layer, a second normalized BN layer, a second pooling layer, a Flatten layer, a first full-connection layer, a third normalized BN layer, a second full-connection layer, a third full-connection layer and a fourth normalized BN layer.
In some exemplary embodiments, the embedding layer is configured to convert training samples of the first CNN network input into a word vector matrix;
the first normalized BN layer, the second normalized BN layer, the third normalized BN layer and the fourth normalized BN layer are used for enabling the training data of each layer in the training process to be in normal distribution.
In some exemplary embodiments, the circulation layer BiGRU is implemented based on a modification of the standard LSTM, and the model training module is specifically configured to extract the sequence features from the basic features by:
for each basic feature, determining the basic feature as the input information of the current LSTM unit;
processing the input information and the state information transmitted by the previous LSTM unit with a reset gate to obtain processed first data, and processing the input information and the state information transmitted by the previous LSTM unit with an update gate to obtain processed second data; the first data and the second data being within a first preset data range;
splicing the first data with the input information, and scaling the spliced data through the hyperbolic tangent function to obtain the intermediate value of the state information of the current LSTM unit; wherein the intermediate value is within a second preset data range;
processing a preset value and the second data, combining the processed result with the intermediate value of the state information of the current LSTM unit, and combining the state information transmitted by the previous LSTM unit with the combined information to obtain the state information of the current LSTM unit.
In some exemplary embodiments, the activation function output to the third full connectivity layer in the second CNN network is a Sigmoid activation function; the activation function of each hidden layer in the first CNN network, the GRU network and the second CNN network is Relu.
In some exemplary embodiments, the model training module is specifically configured to:
applying fit_on_texts to pass a sample file in the WebShell environment into the Tokenizer;
applying the Tokenizer to convert the sample file in the WebShell environment into a sequence;
and processing the converted sequence into two-dimensional tensor form in combination with pad_sequences sequence preprocessing.
The device for detecting the WebShell and the method for detecting the WebShell provided by the embodiment of the application adopt the same inventive concept, can obtain the same beneficial effects, and are not repeated herein.
Based on the same inventive concept as the WebShell detection method, the embodiment of the present application further provides an electronic device, which may be specifically a desktop computer, a portable computer, a smart phone, a tablet computer, a Personal Digital Assistant (PDA), a server, and the like. As shown in fig. 10, the electronic device may include a processor 1001 and a memory 1002.
The Processor 1001 may be a general-purpose Processor, such as a Central Processing Unit (CPU), a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, discrete Gate or transistor logic device, discrete hardware components, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present Application. A general purpose processor may be a microprocessor or any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware processor, or may be implemented by a combination of hardware and software modules in a processor.
Memory 1002, which is a non-volatile computer-readable storage medium, may be used to store non-volatile software programs, non-volatile computer-executable programs, and modules. The Memory may include at least one type of storage medium, and may include, for example, a flash Memory, a hard disk, a multimedia card, a card-type Memory, a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Programmable Read Only Memory (PROM), a Read Only Memory (ROM), a charged Erasable Programmable Read Only Memory (EEPROM), a magnetic Memory, a magnetic disk, an optical disk, and so on. The memory is any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited to such. The memory 1002 in the embodiments of the present application may also be circuitry or any other device capable of performing a storage function for storing program instructions and/or data.
Those of ordinary skill in the art will understand that: all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, and when executed, the program performs the steps including the method embodiments; the computer storage media may be any available media or data storage device that can be accessed by a computer, including but not limited to: various media that can store program codes include a removable Memory device, a Random Access Memory (RAM), a magnetic Memory (e.g., a flexible disk, a hard disk, a magnetic tape, a magneto-optical disk (MO), etc.), an optical Memory (e.g., a CD, a DVD, a BD, an HVD, etc.), and a semiconductor Memory (e.g., a ROM, an EPROM, an EEPROM, a nonvolatile Memory (NAND FLASH), a Solid State Disk (SSD)).
Alternatively, the integrated units described above in the present application may be stored in a computer-readable storage medium if they are implemented in the form of software functional modules and sold or used as independent products. Based on such understanding, the technical solutions of the embodiments of the present application may be essentially implemented or portions thereof that contribute to the prior art may be embodied in the form of a software product stored in a storage medium, and including several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods of the embodiments of the present application. And the aforementioned storage medium includes: various media that can store program codes include a removable Memory device, a Random Access Memory (RAM), a magnetic Memory (e.g., a flexible disk, a hard disk, a magnetic tape, a magneto-optical disk (MO), etc.), an optical Memory (e.g., a CD, a DVD, a BD, an HVD, etc.), and a semiconductor Memory (e.g., a ROM, an EPROM, an EEPROM, a nonvolatile Memory (NAND FLASH), a Solid State Disk (SSD)).
The above embodiments are only used to describe the technical solutions of the present application in detail, but the above embodiments are only used to help understanding the method of the embodiments of the present application, and should not be construed as limiting the embodiments of the present application. Modifications and substitutions that may be readily apparent to those skilled in the art are intended to be included within the scope of the embodiments of the present application.

Claims (10)

1. A method for detecting WebShell, comprising:
acquiring a file to be detected in the WebShell environment;
inputting the file to be detected into a target WebShell detection model obtained by pre-training for detection;
the target WebShell detection model is obtained by applying a training sample to train an initial WebShell detection model; the initial WebShell detection model is composed of a first CNN network, a GRU network and a second CNN network; the first CNN network is used for extracting basic features from a training sample, the GRU network is used for extracting sequence features in the basic features, and the second CNN network is used for processing the sequence features so as to adjust parameters in the initial WebShell detection model according to the processed sequence features to obtain a target WebShell detection model;
and determining whether the file to be detected in the WebShell environment is a safe file or not according to the detection result.
2. The method according to claim 1, wherein the target WebShell detection model is obtained by:
acquiring a sample file in the WebShell environment, processing the sample file into a two-dimensional tensor form, and determining the sample file in each two-dimensional tensor form to form a training sample set;
inputting each training sample into the first CNN network so that the first CNN network extracts the basic feature of each training sample, and taking the basic feature as the input of the GRU network so that the GRU network extracts the sequence feature in the basic feature;
and inputting the sequence characteristics into the second CNN network to enable the second CNN network to process, and adjusting parameters in the initial WebShell detection model according to the processed sequence characteristics to obtain a target WebShell detection model.
3. The method of claim 1, wherein the first CNN network comprises an embedding layer, a first one-dimensional convolutional layer, a first normalized BN layer, a first pooling layer; the GRU network comprises a circulation layer BiGRU; the second CNN network comprises a second one-dimensional convolutional layer, a second normalized BN layer, a second pooling layer, a Flatten layer, a first full-connection layer, a third normalized BN layer, a second full-connection layer, a third full-connection layer and a fourth normalized BN layer.
4. The method of claim 3, wherein the embedding layer is configured to convert training samples of the first CNN network input into a word vector matrix;
the first normalized BN layer, the second normalized BN layer, the third normalized BN layer and the fourth normalized BN layer are used for enabling training data of each layer in a training process to be normally distributed.
5. The method of claim 3, wherein the circulation layer BiGRU is implemented based on a modification of the standard LSTM, and the circulation layer BiGRU extracts sequence features from the basic features by:
for each basic feature, determining the basic feature as input information of a current LSTM unit;
processing the information obtained by splicing the input information and the state information transmitted by the previous LSTM unit according to a first weight and an activation function to obtain a reset gate;
processing the information obtained by splicing the input information and the state information transmitted by the previous LSTM unit according to a second weight and an activation function to obtain an update gate;
processing the information obtained by splicing the input information and the product of the reset gate and the state information transmitted by the previous LSTM unit according to a third weight and the hyperbolic tangent function to obtain intermediate state information;
determining the sum of first information and second information as the state information of the current LSTM unit; wherein the first information is the product of the update gate and the intermediate state information; and the second information is the product of the difference between a preset constant and the update gate and the state information transmitted by the previous LSTM unit;
the sequence characteristics are determined from the state information of each LSTM unit.
6. The method of claim 3, wherein the activation function exported to the third fully-connected layer in the second CNN network is a Sigmoid activation function; the activation function of each hidden layer in the first CNN network, the GRU network and the second CNN network is Relu.
7. The method of any of claims 2 to 6, wherein the processing the sample file into a two-dimensional tensor form comprises:
applying fit_on_texts to pass a sample file in the WebShell environment into the Tokenizer;
applying the Tokenizer to convert the sample file in the WebShell environment into a sequence;
and processing the converted sequence into two-dimensional tensor form in combination with pad_sequences sequence preprocessing.
8. An apparatus for detecting WebShell, comprising:
the file acquisition module is used for acquiring a file to be detected in the WebShell environment;
the detection module is used for inputting the file to be detected to a target WebShell detection model obtained by pre-training for detection;
the target WebShell detection model is obtained by applying a training sample to train an initial WebShell detection model; the initial WebShell detection model is composed of a first CNN network, a GRU network and a second CNN network; the first CNN network is used for extracting basic features from a training sample, the GRU network is used for extracting sequence features in the basic features, and the second CNN network is used for processing the sequence features so as to adjust parameters in the initial WebShell detection model according to the processed sequence features to obtain a target WebShell detection model;
and the determining module is used for determining whether the file to be detected in the WebShell environment is a safe file according to the detection result.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the steps of the method of any of claims 1 to 7 are implemented when the computer program is executed by the processor.
10. A computer-readable storage medium having computer program instructions stored thereon, which, when executed by a processor, implement the steps of the method of any one of claims 1 to 7.
CN202111583370.9A 2021-12-22 2021-12-22 Method, device and equipment for detecting WebShell Active CN114499944B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111583370.9A CN114499944B (en) 2021-12-22 2021-12-22 Method, device and equipment for detecting WebShell

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111583370.9A CN114499944B (en) 2021-12-22 2021-12-22 Method, device and equipment for detecting WebShell

Publications (2)

Publication Number Publication Date
CN114499944A true CN114499944A (en) 2022-05-13
CN114499944B CN114499944B (en) 2023-08-08

Family

ID=81494063

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111583370.9A Active CN114499944B (en) 2021-12-22 2021-12-22 Method, device and equipment for detecting WebShell

Country Status (1)

Country Link
CN (1) CN114499944B (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107516041A (en) * 2017-08-17 2017-12-26 北京安普诺信息技术有限公司 WebShell detection methods and its system based on deep neural network
CN109743311A (en) * 2018-12-28 2019-05-10 北京神州绿盟信息安全科技股份有限公司 A kind of WebShell detection method, device and storage medium
CN110232277A (en) * 2019-04-23 2019-09-13 平安科技(深圳)有限公司 Detection method, device and the computer equipment at webpage back door
CN111260033A (en) * 2020-01-15 2020-06-09 电子科技大学 Website backdoor detection method based on convolutional neural network model
CN111614599A (en) * 2019-02-25 2020-09-01 北京金睛云华科技有限公司 Webshell detection method and device based on artificial intelligence
WO2020244066A1 (en) * 2019-06-04 2020-12-10 平安科技(深圳)有限公司 Text classification method, apparatus, device, and storage medium
CN112118225A (en) * 2020-08-13 2020-12-22 紫光云(南京)数字技术有限公司 Webshell detection method and device based on RNN
CN112651025A (en) * 2021-01-20 2021-04-13 广东工业大学 Webshell detection method based on character-level embedded code
CN112800427A (en) * 2021-04-08 2021-05-14 北京邮电大学 Webshell detection method and device, electronic equipment and storage medium
CN113094706A (en) * 2020-01-08 2021-07-09 深信服科技股份有限公司 WebShell detection method, device, equipment and readable storage medium
CN113132329A (en) * 2019-12-31 2021-07-16 深信服科技股份有限公司 WEBSHELL detection method, device, equipment and storage medium
CN113190849A (en) * 2021-04-28 2021-07-30 重庆邮电大学 Webshell script detection method and device, electronic equipment and storage medium
CN113761534A (en) * 2021-09-08 2021-12-07 广东电网有限责任公司江门供电局 Webshell file detection method and system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
TINGTING LI, CHUNHUI REN, YUSHENG FU, JIE XU, JINHONG GUO, XINYU CHEN: "Webshell detection based on the word attention", IEEE *
LI Tingting (李婷婷): "Research on Webshell Detection Based on Machine Learning" (基于机器学习的Webshell检测研究), China Master's Theses Full-text Database *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117235727A (en) * 2023-11-09 2023-12-15 中孚安全技术有限公司 WebShell identification method and system based on large language model
CN117235727B (en) * 2023-11-09 2024-02-23 中孚安全技术有限公司 WebShell identification method and system based on large language model

Also Published As

Publication number Publication date
CN114499944B (en) 2023-08-08

Similar Documents

Publication Publication Date Title
Sun et al. Deep learning and visualization for identifying malware families
Yajamanam et al. Deep Learning versus Gist Descriptors for Image-based Malware Classification.
Jeon et al. Malware-detection method with a convolutional recurrent neural network using opcode sequences
CN111723368B (en) Bi-LSTM and self-attention-based malicious code detection method and system
Li et al. A hybrid malicious code detection method based on deep learning
Hashemi et al. Visual malware detection using local malicious pattern
Baek et al. Two-stage hybrid malware detection using deep learning
US11025649B1 (en) Systems and methods for malware classification
WO2013090288A1 (en) Image classification
Wang et al. Evaluating CNN and LSTM for web attack detection
CN113065525B (en) Age identification model training method, face age identification method and related device
CN111062036A (en) Malicious software identification model construction method, malicious software identification medium and malicious software identification equipment
Peng et al. Semantics aware adversarial malware examples generation for black-box attacks
CN112182585B (en) Source code vulnerability detection method, system and storage medium
Dang et al. Malware classification using long short-term memory models
CN113298152B (en) Model training method, device, terminal equipment and computer readable storage medium
Stokes et al. Scriptnet: Neural static analysis for malicious javascript detection
Nicheporuk et al. An Android Malware Detection Method Based on CNN Mixed-Data Model.
CN114499944B (en) Method, device and equipment for detecting WebShell
Egitmen et al. Combat mobile evasive malware via skip-gram-based malware detection
CN113762294B (en) Feature vector dimension compression method, device, equipment and medium
Fernández-Navarro et al. Generalised Gaussian radial basis function neural networks
Maulana et al. Malware classification based on system call sequences using deep learning
CN116383707A (en) Malicious code detection method, device, equipment and medium
Papakostas Improving the recognition performance of moment features by selection

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant