CN114462033A - Method and device for constructing script file detection model and storage medium - Google Patents

Method and device for constructing script file detection model and storage medium

Info

Publication number
CN114462033A
Authority
CN
China
Prior art keywords
file
detection model
script file
webshell
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111575911.3A
Other languages
Chinese (zh)
Inventor
王宁
蒋顺桥
赵鹏
马龙
吴婧
王雪晴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianyi Cloud Technology Co Ltd
Original Assignee
Tianyi Cloud Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianyi Cloud Technology Co Ltd filed Critical Tianyi Cloud Technology Co Ltd
Priority to CN202111575911.3A priority Critical patent/CN114462033A/en
Publication of CN114462033A publication Critical patent/CN114462033A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G06F21/563 Static detection by source code analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/086 Learning methods using evolutionary algorithms, e.g. genetic algorithms or genetic programming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/03 Indexing scheme relating to G06F21/50, monitoring users, programs or devices to maintain the integrity of platforms
    • G06F2221/033 Test or assess software

Abstract

The invention discloses a method, a device, and a storage medium for constructing a script file detection model. The method comprises: acquiring a training sample set comprising WebShell file samples and non-WebShell file samples; extracting multi-dimensional features of the samples in the training sample set based on file attributes and operation attributes of the WebShell file; and training a preset neural network model according to the multi-dimensional features to obtain a script file detection model. Extracting multi-dimensional features improves the detection performance of the script file detection model and addresses the false positives and missed detections that a traditional single feature produces on encrypted WebShell files.

Description

Method and device for constructing script file detection model and storage medium
Technical Field
The invention relates to the technical field of webpage security, in particular to a method and a device for constructing a script file detection model and a storage medium.
Background
WebShell is a code execution environment in the form of web page files such as asp, php, jsp, or cgi, and is mainly used for website management, server management, permission management, and similar operations. It is simple to use: by uploading a code file and accessing it through the website, many routine operations can be carried out, which greatly simplifies managing the website and the server. For the same reason, some people modify the code and use it as a backdoor program to control the website server.
In the internet era, fields such as e-commerce and government and enterprise systems all depend on websites built as Web application systems. With the development of Web services, more and more hackers upload a crafted WebShell to a server's page directory and then, by accessing the WebShell page, steal users' private information or take control of the server. WebShell is difficult to intercept because it uses the same execution environment and service port as a normal web page and does not leave a log record. Detecting WebShell is therefore a very important problem.
Traditional WebShell script detection follows two main approaches. The first is lightweight and has two variants. One variant is based on hash values or rule matching: it is fast and performs well, but its defensive capability and expressive power are limited, so it can only intercept known (or similar) samples, and an attacker can bypass it with obfuscation. The other variant is based on machine learning or deep learning: semantic information is collected and then judged by a machine learning or deep learning model. This improves detection of samples that bypass rule matching, but accuracy is still limited because the features used cannot capture the higher-order program characteristics of the script. The second approach is heavyweight: it identifies attacks with high accuracy and can recognize complex attack patterns, but it is generally slow and cannot be applied to every script, so it requires very costly hardware to approach the throughput of the lightweight schemes. How to improve the accuracy of WebShell detection while keeping detection efficient is therefore a problem to be solved urgently.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method, an apparatus, and a storage medium for constructing a script file detection model, so as to solve the technical problem in the prior art that WebShell detection accuracy is low.
The technical scheme provided by the invention is as follows:
a first aspect of an embodiment of the present invention provides a method for constructing a script file detection model, including: acquiring a training sample set, wherein the training sample set comprises a plurality of WebShell file samples and a plurality of non-WebShell file samples; extracting multi-dimensional features of samples in the training sample set based on file attributes and operation attributes of the WebShell file; and training a preset neural network model according to the multidimensional characteristics to obtain a script file detection model.
Optionally, before training the preset neural network model according to the multidimensional feature, the method further includes: and carrying out standardization processing on the multi-dimensional features to obtain standardized features, wherein the standardized features conform to normal distribution with a mean value of 0 and a standard deviation of 1.
Optionally, the multi-dimensional features comprise: a file coincidence index feature, an information entropy feature, a longest character string feature, a file compression ratio feature, and a behavior operation function call count feature.
Optionally, the file coincidence index feature and the information entropy feature are calculated by the following steps: acquiring a character string in a sample; calculating the file coincidence index feature according to the number of occurrences of each type of character in the character string, the total number of characters, and the number of character types; converting the characters in the character string into ASCII codes; and calculating the information entropy feature according to the entropy of the ASCII codes.
Optionally, the behavior operation function call count feature includes: the variable and variable function call count, the character string processing function call count, the system command execution function call count, the database operation function call count, the code execution function call count, the callback function call count, the file operation function call count, and the sensitive transfer variable call count.
Optionally, training a preset neural network model according to the multidimensional features to obtain a script file detection model, including: and training a preset neural network model according to the multi-dimensional features based on a k-fold cross validation algorithm to obtain a script file detection model.
Optionally, the preset neural network model comprises a BP neural network or a GA-BP neural network.
A second aspect of the embodiments of the present invention provides a device for constructing a script file detection model, including: a sample set acquisition module, configured to acquire a training sample set, wherein the training sample set comprises a plurality of WebShell file samples and a plurality of non-WebShell file samples; a feature extraction module, configured to extract the multidimensional features of the samples in the training sample set based on the file attributes and the operation attributes of the WebShell file; and a training module, configured to train a preset neural network model according to the multidimensional features to obtain a script file detection model.
A third aspect of the embodiments of the present invention provides a computer-readable storage medium storing computer instructions, where the computer instructions are configured to cause a computer to execute the method for constructing a script file detection model according to the first aspect or any implementation of the first aspect of the embodiments of the present invention.
A fourth aspect of an embodiment of the present invention provides an electronic device, including: a memory and a processor that are communicatively connected to each other, where the memory stores computer instructions, and the processor executes the computer instructions to perform the method for constructing a script file detection model according to the first aspect or any implementation of the first aspect of the embodiments of the present invention.
The technical scheme provided by the invention has the following effects:
According to the construction method, the construction device, and the storage medium of the script file detection model provided by the embodiments of the invention, a training sample set comprising WebShell file samples and non-WebShell file samples is selected, and the multi-dimensional features of the samples in the training sample set are extracted based on the file attributes and the operation attributes of the WebShell file; the preset neural network model is then trained on the multi-dimensional features to obtain a script file detection model. Extracting multi-dimensional features thus improves the detection performance of the script file detection model and addresses the false positives and missed detections that a traditional single feature produces on encrypted WebShell files.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a flow diagram of a method of building a script file detection model according to an embodiment of the invention;
FIG. 2 is a block diagram of a method for constructing a script file detection model for defense according to an embodiment of the invention;
FIG. 3 is a block diagram of an apparatus for constructing a script file detection model according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a computer-readable storage medium provided according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
As described in the background, WebShell file detection may use a scheme based on machine learning or deep learning, in which semantic information is collected and then judged by a machine learning or deep learning model. This improves detection of samples that bypass rule matching, but accuracy is still limited because the features used cannot capture the higher-order program characteristics of the script. That is, when WebShell files are currently detected by machine learning or deep learning, detection accuracy is low because only a single feature is used and its coverage is poor.
In view of this, the embodiment of the present invention provides a method for constructing a script file detection model, so as to solve the technical problem in the prior art that the WebShell detection accuracy is low.
According to an embodiment of the present invention, a method for constructing a script file detection model is provided. It should be noted that the steps shown in the flowchart may be executed in a computer system, for example as a set of computer-executable instructions, and that although a logical order is shown in the flowchart, in some cases the steps shown or described may be executed in a different order.
This embodiment provides a method for constructing a script file detection model, which may be used in electronic devices such as a computer, a mobile phone, or a tablet computer. FIG. 1 is a flowchart of the method according to an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:
Step S101: acquiring a training sample set, wherein the training sample set comprises a plurality of WebShell file samples and a plurality of non-WebShell file samples.
Specifically, WebShell is a command execution program in the form of a web page file, also called a script file, which is an important means for hackers to invade websites, and usually exists in the form of web page files such as asp, php, jsp, cgi, and the like. When the script file detection model is constructed, WebShell file samples and non-WebShell file samples can be obtained to serve as training sample sets.
For example, 3000 WebShell files can be selected from a number of open source projects and added to the training sample set. GitHub is a hosting service for open source code, similar to a cloud for code. It hosts users' source code projects in many different programming languages and tracks the changes made in each iteration using git, a revision control system that runs in the command line interface. WebShell files can be selected from open source projects on GitHub, such as freeCodeCamp, tensorflow, free-programming-books, and the like. Joomla is one of the three major open source content management systems (CMS) worldwide and has two open source parts: the Joomla CMS (Joomla content management system), which is the basic management platform of a website, and the Joomla Platform (Joomla framework).
Because a training sample set is needed to train the model, it includes not only WebShell file samples but also non-WebShell (normal) file samples; together they form the training sample set. For example, 3000 non-WebShell file samples can be selected and added to the training sample set.
Step S102: extracting the multidimensional features of the samples in the training sample set based on the file attributes and the operation attributes of the WebShell file.
Current WebShell detection relies on a single feature, so its accuracy is low. Multidimensional features can therefore be extracted based on the file attributes and operation attributes of the WebShell file. The file attributes may include characteristics of the file content and characteristics of the file itself; the operation attributes may be characteristics reflected by the behavior operation functions in the file. Corresponding features are extracted on this basis from the plurality of WebShell file samples and the plurality of non-WebShell file samples in the training sample set, which realizes the extraction of the multi-dimensional features.
Step S103: training a preset neural network model according to the multidimensional features to obtain a script file detection model.
After the multidimensional features are extracted, they are used to train the preset neural network model. Before training, the samples in the training sample set are labeled: WebShell file samples are labeled 0 and non-WebShell file samples are labeled 1. The multidimensional features extracted from the training sample set are then input into the preset neural network model for training, yielding the required script file detection model.
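The labeling convention in step S103 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the helper name `build_labeled_set` and the two-element feature vectors are hypothetical, and only the label values (0 for WebShell, 1 for non-WebShell) come from the text.

```python
WEBSHELL_LABEL = 0      # label given in the text for WebShell samples
NON_WEBSHELL_LABEL = 1  # label given in the text for non-WebShell samples

def build_labeled_set(webshell_features, normal_features):
    """Pair each feature vector with its class label."""
    labeled = [(vec, WEBSHELL_LABEL) for vec in webshell_features]
    labeled += [(vec, NON_WEBSHELL_LABEL) for vec in normal_features]
    return labeled

# Hypothetical two-dimensional feature vectors, for illustration only.
samples = build_labeled_set([[0.9, 5.8]], [[0.1, 4.2], [0.2, 4.0]])
```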
The preset neural network model includes a BP (Back Propagation) neural network or a GA (Genetic Algorithm)-BP neural network. The basic idea of the BP neural network is that learning consists of two processes: forward propagation of the signal and back propagation of the error. In forward propagation, an input sample enters at the input layer, is processed by each hidden layer, and is passed to the output layer. If the actual output of the output layer does not match the expected output, the process moves to the error back propagation stage. In error back propagation, the output error is propagated back toward the input layer, layer by layer through the hidden layers, and is apportioned to all units of each layer, giving an error signal for each unit that serves as the basis for correcting that unit's weights. This cycle of forward signal propagation and backward error propagation, with the weights of each layer adjusted each time, is repeated. The continual adjustment of the weights is the learning and training process of the network. The process continues until the error of the network output falls to an acceptable level or a preset number of learning iterations is reached.
The BP neural network is good at precise optimization, while the genetic algorithm has strong macroscopic search capability and good global optimization performance. A GA-BP neural network, which combines the genetic algorithm with the BP network, can therefore achieve global optimization quickly and efficiently. During GA-BP training, the genetic algorithm is first used to narrow the search range, and the BP network then performs the precise solution. A BP neural network, a GA-BP neural network, or another type of neural network model may be used to generate the script file detection model; the embodiment of the present invention does not limit this.
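The forward and backward passes described above can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the patent's network: the single hidden layer, sigmoid activation, squared-error loss, learning rate, stopping tolerance, and toy data are all choices made here for the example.

```python
import math
import random

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def forward(x, w_hidden, w_out):
    """Forward propagation: input -> hidden layer -> single output unit."""
    # Each weight row ends with a bias weight, so a constant 1.0 is appended.
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x + [1.0])))
              for row in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden + [1.0])))
    return hidden, output

def train_bp(data, n_hidden=3, lr=0.5, epochs=2000, tol=1e-3, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    w_hidden = [[rng.uniform(-1, 1) for _ in range(n_in + 1)]
                for _ in range(n_hidden)]
    w_out = [rng.uniform(-1, 1) for _ in range(n_hidden + 1)]
    history = []  # total squared error per epoch
    for _ in range(epochs):
        total_error = 0.0
        for x, target in data:
            hidden, output = forward(x, w_hidden, w_out)
            error = target - output
            total_error += 0.5 * error ** 2
            # Back propagation: error signal of the output unit ...
            delta_out = error * output * (1.0 - output)
            # ... apportioned to each hidden unit through the output weights.
            delta_hidden = [delta_out * w_out[j] * hidden[j] * (1.0 - hidden[j])
                            for j in range(n_hidden)]
            # Weight correction driven by the error signals.
            for j, h in enumerate(hidden + [1.0]):
                w_out[j] += lr * delta_out * h
            for j in range(n_hidden):
                for i, xi in enumerate(x + [1.0]):
                    w_hidden[j][i] += lr * delta_hidden[j] * xi
        history.append(total_error)
        if total_error < tol:  # error reduced to an acceptable level
            break
    return w_hidden, w_out, history

# Toy data (assumption): the label equals the first feature.
toy_data = [([0.0, 0.0], 0), ([0.0, 1.0], 0), ([1.0, 0.0], 1), ([1.0, 1.0], 1)]
w_hidden, w_out, history = train_bp(toy_data)
```

The loop stops on either of the two conditions the text names: the error falling below a tolerance or the preset number of passes being reached.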
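The two-stage idea (a genetic algorithm narrows the search, then gradient-based training refines the result) can be sketched on a deliberately small model. Everything concrete here is an assumption made for illustration: the model is a single sigmoid unit rather than a full BP network, and the population size, crossover, mutation scheme, and toy data are not taken from the source.

```python
import math
import random

def predict(weights, x):
    """Single sigmoid unit; the last weight is the bias."""
    s = sum(w * xi for w, xi in zip(weights, x + [1.0]))
    return 1.0 / (1.0 + math.exp(-s))

def mse(weights, data):
    return sum((t - predict(weights, x)) ** 2 for x, t in data) / len(data)

def ga_search(data, n_weights, pop_size=20, generations=30, seed=0):
    """Coarse global search: evolve weight vectors toward lower error."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-3, 3) for _ in range(n_weights)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda w: mse(w, data))   # selection by fitness
        parents = pop[: pop_size // 2]         # best half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [rng.choice(pair) for pair in zip(a, b)]  # uniform crossover
            i = rng.randrange(n_weights)
            child[i] += rng.gauss(0.0, 0.5)                   # mutation
            children.append(child)
        pop = parents + children
    return min(pop, key=lambda w: mse(w, data))

def bp_refine(weights, data, lr=0.5, epochs=200):
    """Fine local search: gradient descent starting from the GA result."""
    w = list(weights)
    for _ in range(epochs):
        for x, t in data:
            y = predict(w, x)
            delta = (t - y) * y * (1.0 - y)
            for i, xi in enumerate(x + [1.0]):
                w[i] += lr * delta * xi
    return w

# Toy data (assumption): the label equals the first feature.
toy_data = [([0.0, 0.0], 0), ([0.0, 1.0], 0), ([1.0, 0.0], 1), ([1.0, 1.0], 1)]
w_ga = ga_search(toy_data, n_weights=3)
w_final = bp_refine(w_ga, toy_data)
```

In a fuller version the chromosome would encode all weights and thresholds of the BP network; the single unit keeps the sketch short.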
In an embodiment, the preset neural network model may be trained on the multi-dimensional features using a k-fold cross validation algorithm to obtain the script file detection model. The basic idea of cross validation is to partition the original data into a training set and a validation set: the model is first trained on the training set, and its generalization error is then tested on the validation set. Because data is always limited in practice, k-fold cross validation can be used to reuse the data. In k-fold cross validation, the data is divided into k parts and each part, 1/k of the data, serves as the test set in turn; each model is thus trained k times and tested k times, its error rate is the average over the k rounds, and finally the model with the lowest average error rate is selected.
Specifically, in the actual training, 10-fold cross validation may be employed. The 10-fold cross validation is to divide the experimental samples into 10 parts at random, 9 parts of the experimental samples are selected as training data and 1 part of the experimental samples are selected as testing data, and the average result of 10 experiments is calculated to be used as the final experimental result. Further, 5-fold cross validation, 20-fold cross validation, or the like may be employed.
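The k-fold procedure described above can be sketched as follows. The function names `k_fold_indices` and `cross_validate`, and the `train_fn` / `error_fn` callables, are hypothetical placeholders for whatever training and evaluation routines are actually used.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)        # random split, as described
    folds = [indices[i::k] for i in range(k)]   # k nearly equal parts
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

def cross_validate(samples, train_fn, error_fn, k=10):
    """Average the test error over k train/test rounds."""
    errors = []
    for train_idx, test_idx in k_fold_indices(len(samples), k):
        model = train_fn([samples[i] for i in train_idx])
        errors.append(error_fn(model, [samples[i] for i in test_idx]))
    return sum(errors) / k
```

With k=10, each round trains on nine parts and tests on the remaining one, and the returned value is the average over the ten rounds; passing k=5 or k=20 gives the other variants mentioned in the text.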
After training is complete and the script file detection model is obtained, the model can be used to detect WebShell files. During detection, the multi-dimensional features of the file to be detected are extracted and input into the script file detection model. If the output is 0, the file is determined to be a WebShell file; if the output is 1, it is determined to be a non-WebShell (normal) file. The file to be detected is generally obtained from web page files, which are generally executed through a corresponding command execution program (script). Detecting these command execution programs and discovering any WebShell script files among them improves the security of the web page.
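The final decision step can be sketched as a small mapping from the model output to the two labels. The 0.5 threshold is an assumption: the text only states what outputs 0 and 1 mean, not how a continuous network output is rounded.

```python
def classify(model_output, threshold=0.5):
    """Map a network output in [0, 1] to the labels used in the text.

    The threshold value is an assumption made for this sketch.
    """
    label = 1 if model_output >= threshold else 0
    return "WebShell" if label == 0 else "non-WebShell"
```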
The method for constructing the script file detection model provided by the embodiment of the invention selects the training sample set comprising the WebShell file sample and the non-WebShell file sample, and extracts the multidimensional characteristics of the samples in the training sample set based on the file attributes and the operation attributes of the WebShell file; and then training the preset neural network model by adopting the multidimensional characteristics to obtain a script file detection model. Therefore, the detection effect of the script file detection model is improved through the extraction of the multi-dimensional features, and the problems of false alarm and missing report of the traditional single feature on the encrypted Webshell are solved.
In an embodiment, before training the preset neural network model according to the multidimensional features, the method further includes: and carrying out standardization processing on the multi-dimensional features to obtain standardized features, wherein the standardized features conform to normal distribution with a mean value of 0 and a standard deviation of 1.
Multi-dimensional feature extraction yields unprocessed raw feature values. These features generally have different dimensions and orders of magnitude. When the magnitudes differ widely and the raw values are used directly for training, features of larger magnitude dominate the model while features of smaller magnitude are weakened to some extent; an overly large span in feature values can also slow training and distort the learning result. Therefore, to ensure reliable results, the raw feature data needs to be standardized. The standardized data conforms to a normal distribution with a mean of 0 and a standard deviation of 1, and the processing function is as follows:
$$x' = \frac{x - \mu}{\sigma}$$
wherein x is a characteristic value before standardization, mu is a mean value of the characteristic of the sample data, sigma is a standard deviation of the characteristic of the sample data, and x' is a characteristic value after standardization.
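A minimal sketch of this standardization for one feature column, assuming the usual z-score form x' = (x - mu) / sigma implied by the definitions of x, mu, and sigma. The use of the population standard deviation and the handling of a constant column are assumptions of this sketch.

```python
import math

def standardize(column):
    """Return z-scores x' = (x - mu) / sigma for one feature column."""
    n = len(column)
    mu = sum(column) / n
    # Population standard deviation (assumption; the text does not say which).
    sigma = math.sqrt(sum((x - mu) ** 2 for x in column) / n)
    if sigma == 0.0:
        return [0.0 for _ in column]  # constant feature: mapped to 0 (assumption)
    return [(x - mu) / sigma for x in column]
```

In practice mu and sigma would be computed on the training samples and reused unchanged when standardizing a file to be detected.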
In one embodiment, the multi-dimensional features include: a file coincidence index feature, an information entropy feature, a longest character string feature, a file compression ratio feature, and a behavior operation function call count feature. The longest character string feature is the longest uninterrupted character string in the sample file; the file compression ratio is the ratio of the file size after compression to the file size before compression.
Specifically, the file coincidence index feature and the information entropy feature are calculated by the following steps: acquiring a character string in a sample; calculating the file coincidence index feature according to the number of occurrences of each type of character in the character string, the total number of characters, and the number of character types; converting the characters in the character string into ASCII codes; and calculating the information entropy feature according to the entropy of the ASCII codes.
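The longest character string and file compression ratio features can be sketched as below. Two assumptions are made here: "uninterrupted" is read as "containing no whitespace", and zlib (DEFLATE) stands in for the compressor, which the text does not name.

```python
import re
import zlib

def longest_string_length(text):
    """Length of the longest run of non-whitespace characters."""
    runs = re.findall(r"\S+", text)
    return max((len(r) for r in runs), default=0)

def compression_ratio(data: bytes):
    """Compressed size divided by original size, per the definition above."""
    if not data:
        return 0.0
    return len(zlib.compress(data)) / len(data)
```

Encoded or encrypted payloads tend to appear as one very long token and compress poorly, which is presumably why these two measurements are useful alongside the other features.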
The file coincidence index is calculated by adopting the following formula:
$$IC = \frac{\sum_{i=1}^{z} N_i (N_i - 1)}{n (n - 1)}$$
where N_i is the number of occurrences of the i-th character in the character string, n is the total number of characters in the character string, and z is the number of character types.
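A minimal sketch of the coincidence index, assuming the standard index-of-coincidence form, the sum over the z character types of N_i (N_i - 1) divided by n (n - 1), which matches the symbols N_i, n, and z defined above. The function name is hypothetical.

```python
from collections import Counter

def index_of_coincidence(text):
    """Sum of N_i * (N_i - 1) over character types, divided by n * (n - 1)."""
    n = len(text)
    if n < 2:
        return 0.0  # undefined for fewer than two characters; 0 by convention here
    counts = Counter(text)  # N_i for each of the z distinct characters
    return sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))
```

The value approaches 1 when a few characters dominate and approaches 0 when characters are spread evenly, as in encoded or encrypted text.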
The information entropy is calculated by the following formula:
$$H = -\sum_{i=1}^{z} p_i \log_2 p_i$$
where p_i is the probability of occurrence of the i-th random variable and z is the number of random variable types. The random variables are the ASCII codes converted from the characters appearing in the file.
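A minimal sketch of the information entropy feature, assuming the Shannon form with a base-2 logarithm (the base is an assumption) and using `ord` to obtain the character codes, which equal the ASCII codes for ASCII text.

```python
import math
from collections import Counter

def information_entropy(text):
    """Shannon entropy (in bits) of the character codes in text."""
    codes = [ord(ch) for ch in text]  # character -> code (ASCII for ASCII text)
    n = len(codes)
    if n == 0:
        return 0.0
    counts = Counter(codes)
    # p_i is estimated as the relative frequency of each code.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```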
The behavior operation function call count feature includes: the variable and variable function call count, the character string processing function call count, the system command execution function call count, the database operation function call count, the code execution function call count, the callback function call count, the file operation function call count, and the sensitive transfer variable call count.
The method for constructing the script file detection model provided by the embodiment of the invention proposes multidimensional features, such as the file coincidence index, the information entropy, the longest character string, the file compression ratio, and the behavior operation function call count, based on the text attributes and operation attributes of WebShell; after training with the BP neural network algorithm, the accuracy of the script file detection model is improved. The method also achieves a high detection rate for both ordinary and variant WebShell, thereby protecting the security of the website. In addition, the multi-dimensional feature extraction in this method is simple, can effectively detect ordinary and encrypted WebShell files, and addresses the false positives and missed detections that a traditional single feature produces on encrypted WebShell. The test accuracy after training with the BP neural network and the GA-BP neural network is 90.5 percent and 92.4 percent, respectively, so the detection rate is effectively improved.
As shown in FIG. 2, the script file detection model constructed by the method provided by the embodiment of the present invention can effectively identify intrusion of an external WebShell into a website, thereby protecting the security of the Web server. It can therefore protect websites based on a cloud platform, such as e-commerce systems, social networking sites, and government and enterprise systems. The method can also be applied to Web services in the cloud platform to secure the permissions of back-end cloud hosts, databases, and the like.
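Counting calls per category can be sketched with regular expressions over the script source. The function lists below are hypothetical, non-exhaustive PHP examples chosen for illustration; the text names the categories but not the concrete functions, and the variable-function category is omitted here for brevity.

```python
import re

# Hypothetical, non-exhaustive PHP examples per category (assumption).
CATEGORIES = {
    "string_processing": ["base64_decode", "str_rot13", "gzinflate"],
    "system_command": ["system", "exec", "shell_exec", "passthru"],
    "code_execution": ["eval", "assert"],
    "callback": ["call_user_func", "array_map"],
    "file_operation": ["fopen", "fwrite", "file_put_contents"],
    "database_operation": ["mysqli_query"],
}
SENSITIVE_VARS = ["$_GET", "$_POST", "$_REQUEST", "$_COOKIE"]

def call_count_features(source):
    """Count call sites per category plus uses of request variables."""
    features = {}
    for category, names in CATEGORIES.items():
        # The lookbehind rejects matches inside longer identifiers,
        # variables ($exec) and method calls (->exec).
        pattern = r"(?<![\w$>])(?:%s)\s*\(" % "|".join(map(re.escape, names))
        features[category] = len(re.findall(pattern, source))
    features["sensitive_variable"] = sum(source.count(v) for v in SENSITIVE_VARS)
    return features

# Hypothetical one-line sample, for illustration only.
sample = "<?php eval(base64_decode($_POST['x'])); system($_GET['c']); ?>"
counts = call_count_features(sample)
```

A text-level count like this is only an approximation of real call behavior; a parser-based count would be more robust against comments and string literals.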
An embodiment of the present invention further provides a device for constructing a script file detection model, as shown in fig. 3, the device includes:
A sample set acquisition module, configured to acquire a training sample set, wherein the training sample set comprises a plurality of WebShell file samples and a plurality of non-WebShell file samples; for details, reference is made to the corresponding parts of the above method embodiments, which are not described herein again.
The characteristic extraction module is used for extracting the multidimensional characteristics of the samples in the training sample set based on the file attributes and the operation attributes of the WebShell file; for details, reference is made to the corresponding parts of the above method embodiments, which are not described herein again.
And the training module is used for training a preset neural network model according to the multidimensional characteristics to obtain a script file detection model. For details, reference is made to the corresponding parts of the above method embodiments, which are not described herein again.
The device for constructing the script file detection model provided by the embodiment of the invention extracts the multidimensional characteristics of the samples in the training sample set by selecting the training sample set comprising the WebShell file sample and the non-WebShell file sample and based on the file attributes and the operation attributes of the WebShell file; and then training the preset neural network model by adopting the multidimensional characteristics to obtain a script file detection model. Therefore, the detection effect of the script file detection model is improved through the extraction of the multi-dimensional features, and the problems of false alarm and missing report of the traditional single feature on the encrypted Webshell are solved.
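Before training, the claims mention standardizing the features to mean 0 and standard deviation 1 and using k-fold cross validation. The following is a minimal sketch of those two steps in plain Python, not the authors' implementation; the function names, the use of the population standard deviation, and the shuffling seed are assumptions. Note that z-scoring rescales each feature to mean 0 and standard deviation 1 but does not by itself change the shape of its distribution.

```python
import math
import random
from typing import List, Sequence, Tuple


def standardize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Column-wise z-score: (x - mean) / std, using the population std."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    stds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / n)
        for j in range(dims)
    ]
    # A constant column has std 0; map it to 0.0 instead of dividing by zero.
    return [
        [(r[j] - means[j]) / stds[j] if stds[j] > 0 else 0.0 for j in range(dims)]
        for r in rows
    ]


def k_fold_indices(
    n_samples: int, k: int, seed: int = 0
) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, validation_indices) pairs over shuffled indices."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Fold i takes every k-th shuffled index starting at position i.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, validation))
    return splits
```

Each sample appears in exactly one validation fold, so every WebShell and non-WebShell sample is used for both training and validation across the k rounds. In practice the mean and standard deviation would usually be computed on the training folds only and then applied to the validation fold.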
The functional description of the device for constructing the script file detection model provided by the embodiment of the invention refers to the description of the method for constructing the script file detection model in the above embodiment.
An embodiment of the present invention further provides a storage medium, as shown in fig. 4, on which a computer program 601 is stored; when the program is executed by a processor, it implements the steps of the method for constructing a script file detection model in the foregoing embodiments. The storage medium also stores audio and video stream data, feature frame data, interactive request signaling, encrypted data, preset data sizes and the like. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a flash memory, a Hard Disk Drive (HDD), a Solid State Drive (SSD), etc.; the storage medium may also comprise a combination of memories of the kinds described above.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic Disk, an optical Disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a Flash Memory (Flash Memory), a Hard Disk (Hard Disk Drive, abbreviated as HDD) or a Solid State Drive (SSD), etc.; the storage medium may also comprise a combination of memories of the kind described above.
An embodiment of the present invention further provides an electronic device, as shown in fig. 5, the electronic device may include a processor 51 and a memory 52, where the processor 51 and the memory 52 may be connected by a bus or in another manner, and fig. 5 takes the connection by the bus as an example.
The processor 51 may be a Central Processing Unit (CPU). The Processor 51 may also be other general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components, or combinations thereof.
The memory 52, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as the corresponding program instructions/modules in the embodiments of the present invention. The processor 51 executes various functional applications and data processing of the processor by running non-transitory software programs, instructions and modules stored in the memory 52, that is, implements the construction method of the script file detection model in the above method embodiments.
The memory 52 may include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required for at least one function, and the data storage area may store data created by the processor 51, and the like. Further, the memory 52 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 52 may optionally include memory located remotely from the processor 51, and these remote memories may be connected to the processor 51 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The one or more modules are stored in the memory 52 and when executed by the processor 51, perform a method of building a script file detection model as in the embodiment of fig. 1-2.
The details of the electronic device may be understood by referring to the corresponding descriptions and effects in the embodiments shown in fig. 1 to fig. 2, and are not described herein again.
Although the embodiments of the present invention have been described in conjunction with the accompanying drawings, those skilled in the art may make various modifications and variations without departing from the spirit and scope of the invention, and such modifications and variations fall within the scope defined by the appended claims.

Claims (10)

1. A method for constructing a script file detection model is characterized by comprising the following steps:
acquiring a training sample set, wherein the training sample set comprises a plurality of WebShell file samples and a plurality of non-WebShell file samples;
extracting multi-dimensional features of samples in the training sample set based on file attributes and operation attributes of the WebShell file;
and training a preset neural network model according to the multidimensional characteristics to obtain a script file detection model.
2. The method for constructing a script file detection model according to claim 1, wherein before training a preset neural network model according to the multidimensional features, the method further comprises:
and carrying out standardization processing on the multi-dimensional features to obtain standardized features, wherein the standardized features conform to normal distribution with a mean value of 0 and a standard deviation of 1.
3. The method of constructing a script file detection model as claimed in claim 1, wherein the multi-dimensional features comprise: a file coincidence index feature, an information entropy feature, a longest character string feature, a file compression ratio feature, and a behavior operation function call count feature.
4. The method for constructing the script file detection model according to claim 3, wherein the file coincidence index feature and the information entropy feature are calculated by the following steps:
acquiring a character string in a sample;
calculating the file coincidence index feature according to the number of characters of each type in the character string, the total number of characters, and the number of character types;
converting characters in the character string into ASCII codes;
and calculating the information entropy characteristics according to the entropy of the ASCII code.
5. The method for constructing the script file detecting model according to claim 3, wherein the behavior operation function calling times characteristic comprises: the variable and variable function call times, the character string processing function call times, the system command execution function call times, the database operation function call times, the code execution function call times, the callback function call times, the file class operation function call times and the sensitive transfer variable call times.
6. The method for constructing the script file detection model according to claim 1, wherein training a preset neural network model according to the multidimensional feature to obtain the script file detection model comprises:
and training a preset neural network model according to the multi-dimensional features based on a k-fold cross validation algorithm to obtain a script file detection model.
7. The method of constructing a script file detection model according to claim 1, wherein the preset neural network model comprises a BP neural network or a GA-BP neural network.
8. An apparatus for constructing a script file detection model, comprising:
a sample set acquisition module, configured to acquire a training sample set, wherein the training sample set comprises a plurality of WebShell file samples and a plurality of non-WebShell file samples;
the characteristic extraction module is used for extracting the multidimensional characteristics of the samples in the training sample set based on the file attributes and the operation attributes of the WebShell file;
and the training module is used for training a preset neural network model according to the multidimensional characteristics to obtain a script file detection model.
9. A computer-readable storage medium storing computer instructions for causing a computer to execute the method of constructing a script file detection model according to any one of claims 1 to 7.
10. An electronic device, comprising: a memory and a processor, the memory and the processor being communicatively connected to each other, the memory storing computer instructions, and the processor executing the computer instructions to perform the method of constructing the script file detection model according to any one of claims 1 to 7.
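The behavior operation function call counts listed in claim 5 can be sketched as simple pattern counting over script source text. This is an illustrative sketch only: the claims name the categories but not their members, so the PHP function names and superglobal variables below are hypothetical examples chosen for illustration, and a regex scan is an assumed stand-in for whatever parsing the authors use.

```python
import re
from typing import Dict

# Hypothetical category members; the source lists categories, not functions.
CATEGORY_FUNCTIONS = {
    "string_processing": ["base64_decode", "str_rot13", "gzinflate", "chr"],
    "system_command": ["system", "exec", "shell_exec", "passthru", "popen"],
    "code_execution": ["eval", "assert", "create_function"],
    "callback": ["call_user_func", "array_map", "array_filter"],
    "file_operation": ["fopen", "fwrite", "file_put_contents", "unlink"],
    "database": ["mysqli_query", "mysqli_connect"],
}
# Hypothetical examples of sensitive passed variables.
SENSITIVE_VARIABLES = ["$_GET", "$_POST", "$_REQUEST", "$_COOKIE", "$_FILES"]


def count_calls(source: str) -> Dict[str, int]:
    """Count call sites per category in PHP-like source text."""
    counts: Dict[str, int] = {}
    for category, names in CATEGORY_FUNCTIONS.items():
        alternatives = "|".join(re.escape(name) for name in names)
        # The lookbehind rejects matches inside longer identifiers
        # (e.g. "exec" in "shell_exec"), variables, and method calls.
        pattern = r"(?<![\w$>:])(?:%s)\s*\(" % alternatives
        counts[category] = len(re.findall(pattern, source, flags=re.IGNORECASE))
    counts["sensitive_variable"] = sum(source.count(v) for v in SENSITIVE_VARIABLES)
    # Variable functions: a variable immediately invoked, such as $f(...).
    counts["variable_function"] = len(re.findall(r"\$\w+\s*\(", source))
    return counts
```

For example, `count_calls("<?php eval(base64_decode($_POST['x'])); ?>")` reports one code execution call, one string processing call, and one sensitive variable use. The resulting counts could be appended to the text-attribute features to form the multi-dimensional feature vector.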
CN202111575911.3A 2021-12-21 2021-12-21 Method and device for constructing script file detection model and storage medium Pending CN114462033A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111575911.3A CN114462033A (en) 2021-12-21 2021-12-21 Method and device for constructing script file detection model and storage medium

Publications (1)

Publication Number Publication Date
CN114462033A true CN114462033A (en) 2022-05-10

Family

ID=81406628

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111575911.3A Pending CN114462033A (en) 2021-12-21 2021-12-21 Method and device for constructing script file detection model and storage medium

Country Status (1)

Country Link
CN (1) CN114462033A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination