CN110232277A - Method, apparatus, and computer device for detecting web page backdoors - Google Patents

Method, apparatus, and computer device for detecting web page backdoors

Info

Publication number
CN110232277A
Authority
CN
China
Prior art keywords
file
model
back door
operation code
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910327403.XA
Other languages
Chinese (zh)
Inventor
李坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201910327403.XA
Publication of CN110232277A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/55 Detecting local intrusion or implementing counter-measures
    • G06F21/56 Computer malware detection or handling, e.g. anti-virus arrangements
    • G06F21/562 Static detection
    • G06F21/563 Static detection by source code analysis

Abstract

The present application proposes a method, an apparatus, and a computer device for detecting web page backdoors. The detection method includes: obtaining a file to be detected; extracting opcodes from the source file of the file to be detected; extracting N-gram features from the opcodes and using the N-gram features as the feature vector of the opcodes; and inputting the feature vector of the opcodes into a pre-trained convolutional neural network model for classification, to obtain a classification result indicating whether the file to be detected contains a web page backdoor. The application detects web page backdoors with a convolutional neural network, which improves detection accuracy, is simple to implement, and has little impact on system performance.

Description

Method, apparatus, and computer device for detecting web page backdoors
[Technical field]
The present application relates to the technical field of network security, and in particular to a method, an apparatus, and a computer device for detecting web page backdoors.
[Background]
The Web, in full the World Wide Web and commonly referred to as "websites", is a global, dynamically interactive, cross-platform distributed graphical information system based on hypertext and the HyperText Transfer Protocol (hereinafter: HTTP). A WebShell is a command execution environment that exists in the form of a web page file, such as an Active Server Pages (hereinafter: ASP), Hypertext Preprocessor (hereinafter: PHP), Java Server Pages (hereinafter: JSP), or Common Gateway Interface (hereinafter: CGI) file. It can also be called a web page backdoor.
In the related art, WebShells are generally detected with static detection, dynamic detection, log analysis, and statistical analysis schemes. Because business systems are updated frequently, the attributes of Web script files change often, so methods that rely mainly on file attributes tend to produce many false positives. Methods based on dynamic behavior detection are usually technically difficult to implement, have a considerable impact on system performance, and may even affect system stability. Log-based detection has two problems. First, because business functions are numerous and complex and some of them are rarely used, log entries for those functions may hit certain detection rules and cause many false positives. Second, processing a large volume of log records burdens the server, and because the log volume is huge, detection takes a long time and is slow. Moreover, data-stealing backdoor WebShells often imitate normal database operations, have no obvious static attributes, and are accessed too rarely to form a distinct access pattern, so they are also hard to find through log analysis.
[Summary of the invention]
The embodiments of the present application provide a method, an apparatus, and a computer device for detecting web page backdoors, so that web page backdoors are detected by a convolutional neural network, the accuracy of web page backdoor detection is improved, the implementation is simple, and the impact on system performance is small.
In a first aspect, an embodiment of the present application provides a method for detecting web page backdoors, comprising: obtaining a file to be detected; extracting opcodes from the source file of the file to be detected; extracting N-gram features from the opcodes and using the N-gram features as the feature vector of the opcodes; and inputting the feature vector of the opcodes into a pre-trained convolutional neural network model for classification, to obtain a classification result indicating whether the file to be detected contains a web page backdoor.
In one possible implementation, extracting opcodes from the source file of the file to be detected comprises: using the interpreter for the source file of the file to be detected to convert the source code of the file to be detected into the corresponding opcodes.
In one possible implementation, extracting N-gram features from the opcodes comprises: extracting the N-gram features from the opcodes using an N-Gram model.
In one possible implementation, before the feature vector of the opcodes is input into the pre-trained convolutional neural network model for classification to obtain the classification result indicating whether the file to be detected contains a web page backdoor, the method further comprises: collecting the source files of a predetermined number of web page files as sample data; labeling the source files in the sample data that are web page backdoors and the source files that are not; extracting the opcodes of the sample data from the labeled sample data; extracting N-gram features from the opcodes of the sample data and using the extracted N-gram features as the sample feature vectors of the opcodes of the sample data; dividing the sample feature vectors into a training set and a test set; inputting the sample feature vectors in the training set into a convolutional neural network model to be trained, and training it to obtain a training result model; and inputting the sample feature vectors in the test set into the training result model for iterative training, and obtaining the trained convolutional neural network model when the error between the classification results output by the training result model and the label information of the sample feature vectors in the test set is within a predetermined range. The classification results output by the training result model indicate whether the sample feature vectors in the test set contain a web page backdoor.
In one possible implementation, before the source files in the sample data that are web page backdoors and the source files that are not are labeled, the method further comprises: preprocessing the sample data, where the preprocessing includes one or a combination of the following: filtering out data in the sample data that does not meet the data rules required by the convolutional neural network model to be trained, desensitizing sensitive data in the sample data, and formatting the sample data.
In a second aspect, an embodiment of the present application provides an apparatus for detecting web page backdoors, comprising: an acquisition module, configured to obtain a file to be detected; an extraction module, configured to extract opcodes from the source file of the file to be detected obtained by the acquisition module, to extract N-gram features from the opcodes, and to use the N-gram features as the feature vector of the opcodes; and a detection module, configured to input the feature vector of the opcodes extracted by the extraction module into a pre-trained convolutional neural network model for classification, to obtain a classification result indicating whether the file to be detected contains a web page backdoor.
In one possible implementation, the extraction module is specifically configured to use the interpreter for the source file of the file to be detected to convert the source code of the file to be detected into the corresponding opcodes.
In one possible implementation, the extraction module is specifically configured to extract the N-gram features from the opcodes using an N-Gram model.
In one possible implementation, the apparatus further comprises a collection module, a labeling module, a division module, and a training module. The collection module is configured to collect the source files of a predetermined number of web page files as sample data before the detection module inputs the feature vector of the opcodes into the pre-trained convolutional neural network model for classification. The labeling module is configured to label, in the sample data collected by the collection module, the source files that are web page backdoors and the source files that are not. The extraction module is further configured to extract the opcodes of the sample data from the sample data labeled by the labeling module, to extract N-gram features from the opcodes of the sample data, and to use the extracted N-gram features as the sample feature vectors of the opcodes of the sample data. The division module is configured to divide the sample feature vectors into a training set and a test set. The training module is configured to input the sample feature vectors in the training set into a convolutional neural network model to be trained and train it to obtain a training result model, and to input the sample feature vectors in the test set into the training result model for iterative training, obtaining the trained convolutional neural network model when the error between the classification results output by the training result model and the label information of the sample feature vectors in the test set is within a predetermined range. The classification results output by the training result model indicate whether the sample feature vectors in the test set contain a web page backdoor.
In one possible implementation, the apparatus further comprises a preprocessing module, configured to preprocess the sample data before the labeling module labels the source files in the sample data that are web page backdoors and the source files that are not. The preprocessing includes one or a combination of the following: filtering out data in the sample data that does not meet the data rules required by the convolutional neural network model to be trained, desensitizing sensitive data in the sample data, and formatting the sample data.
In a third aspect, an embodiment of the present application provides a computer device, comprising a memory, a processor, and a computer program that is stored in the memory and can run on the processor. When the processor executes the computer program, the method described above is implemented.
In a fourth aspect, an embodiment of the present application provides a non-transitory computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the method described above is implemented.
In the above technical solution, after the file to be detected is obtained, opcodes are extracted from its source file, N-gram features are extracted from the opcodes and used as the feature vector of the opcodes, and finally the feature vector of the opcodes is input into a pre-trained convolutional neural network model for classification, to obtain a classification result indicating whether the file to be detected contains a web page backdoor. Web page backdoors can thus be detected by a convolutional neural network, which improves the accuracy of web page backdoor detection, is simple to implement, and has little impact on system performance.
[Brief description of the drawings]
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the embodiments are briefly described below. The drawings described below are only some embodiments of the present application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of one embodiment of the web page backdoor detection method of the present application;
Fig. 2 is a flowchart of another embodiment of the web page backdoor detection method of the present application;
Fig. 3 is a flowchart of a further embodiment of the web page backdoor detection method of the present application;
Fig. 4 is a schematic structural diagram of one embodiment of the web page backdoor detection apparatus of the present application;
Fig. 5 is a schematic structural diagram of another embodiment of the web page backdoor detection apparatus of the present application;
Fig. 6 is a schematic structural diagram of one embodiment of the computer device of the present application.
[Detailed description]
For a better understanding of the technical solutions of the present application, the embodiments of the present application are described in detail below with reference to the drawings.
It should be clear that the described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the scope of protection of the present application.
The terms used in the embodiments of the present application are only for the purpose of describing particular embodiments and are not intended to limit the application. The singular forms "a", "said", and "the" used in the embodiments of the present application and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise.
Fig. 1 is a flowchart of one embodiment of the web page backdoor detection method of the present application. As shown in Fig. 1, the method may include:
Step 101: obtain a file to be detected.
The file to be detected may be a web page file in ASP, PHP, JSP, or CGI format.
Step 102: extract opcodes from the source file of the file to be detected.
Specifically, the source file of the file to be detected may be the source code of the web page file. Extracting opcodes from the source file may be done as follows: using the interpreter for the source file of the file to be detected, convert the source code of the file to be detected into the corresponding opcodes.
Take a web page file in PHP format as an example. When the source code of a PHP web page file is executed, the PHP interpreter must interpret and run it, and in this process the source code is compiled into opcodes. Because PHP source code can be written in very flexible ways, detection of web page backdoors based on features of the source code is error-prone. The opcodes corresponding to the source code are lower-level and better reflect what the source code actually does when executed, so detecting web page backdoors from opcodes can improve detection accuracy. In a specific implementation, the source code of the PHP web page file can be converted into the corresponding opcodes according to the rules by which PHP code is converted to opcodes. In one embodiment, an existing PHP interpreter can be used to perform this conversion.
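The text does not name a tool for obtaining opcodes from the PHP interpreter. As one possible sketch, assuming the third-party VLD (Vulcan Logic Dumper) extension is installed for the local `php` binary, the interpreter can be asked to compile a file and dump its opcodes without running it. The helper names `build_vld_command` and `dump_opcodes` are hypothetical, and the dump is returned as raw text because its layout varies between VLD versions.

```python
import subprocess


def build_vld_command(php_file: str, php_binary: str = "php") -> list:
    """Build a command line that makes PHP dump opcodes through VLD.

    Assumption: the VLD extension is loaded; vld.active and vld.execute
    are VLD ini settings, not part of the method described in the text.
    """
    return [
        php_binary,
        "-d", "vld.active=1",   # enable the opcode dump
        "-d", "vld.execute=0",  # compile only, never execute the suspect file
        "-f", php_file,
    ]


def dump_opcodes(php_file: str, timeout_s: float = 10.0) -> str:
    """Return the raw opcode dump for one PHP file as text."""
    proc = subprocess.run(
        build_vld_command(php_file),
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=False,
    )
    # Both streams are returned because the dump may be written to either one.
    return proc.stdout + proc.stderr
```

Setting `vld.execute=0` matters here: a suspected backdoor should be compiled for inspection, not executed.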
Step 103: extract N-gram features from the opcodes, and use the N-gram features as the feature vector of the opcodes.
Specifically, an N-Gram model can be used to extract the N-gram features from the opcodes. That is, the N-Gram model can be used to perform word embedding, converting the opcodes into a vector whose elements are "0" or "1", thereby obtaining the feature vector of the opcodes.
The basic idea of the N-Gram model is to slide a window of size N over the text content, treated as a byte stream, to form a sequence of byte fragments of length N. Each byte fragment is called a gram. Counting the frequency of occurrence of all grams yields the N-gram features. An N-gram feature may include the gram, the number of occurrences of the gram, and the corresponding value of N. N is an integer greater than 1; for example, N can be 2, 3, or 4.
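The sliding-window idea can be sketched as follows. This is a minimal illustration, assuming the window slides over a list of opcode names rather than raw bytes, and assuming a fixed vocabulary of grams is used to build the 0/1 vector; the function names and the sample opcode sequence are hypothetical.

```python
from collections import Counter


def opcode_ngrams(opcodes, n):
    """Count every window of n consecutive opcodes (the 'grams')."""
    if n < 2:
        raise ValueError("N must be an integer greater than 1")
    windows = (tuple(opcodes[i:i + n]) for i in range(len(opcodes) - n + 1))
    return Counter(windows)


def binary_vector(counts, vocabulary):
    """Mark with 1 each vocabulary gram that occurs at least once, else 0."""
    return [1 if gram in counts else 0 for gram in vocabulary]


# Illustrative opcode sequence, not taken from a real file.
sample = ["ASSIGN", "SEND_VAL", "DO_FCALL", "ASSIGN", "SEND_VAL"]
counts = opcode_ngrams(sample, 2)
vocab = [("ASSIGN", "SEND_VAL"), ("SEND_VAL", "DO_FCALL"), ("INCLUDE_OR_EVAL", "RETURN")]
vector = binary_vector(counts, vocab)
```

Here `counts` records that the gram `("ASSIGN", "SEND_VAL")` occurs twice, and `vector` is `[1, 1, 0]` because the third vocabulary gram never occurs in the sample.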
Step 104: input the feature vector of the opcodes into a pre-trained convolutional neural network (Convolutional Neural Networks; hereinafter: CNN) model for classification, to obtain a classification result indicating whether the file to be detected contains a web page backdoor.
The CNN is an efficient recognition method that was developed in recent years and has attracted wide attention. In the 1960s, while studying neurons in the cat cortex responsible for local sensitivity and direction selection, Hubel and Wiesel found that their unique network structure could effectively reduce the complexity of feedback neural networks, which later led to the proposal of the CNN. The CNN has now become a research hotspot in many scientific fields, especially pattern classification. Because a CNN avoids complicated image preprocessing and can take the original image directly as input, it has been widely applied.
The structure of a CNN generally includes the following layers:
(1) Input layer: receives the raw input data;
(2) Convolutional layer: performs feature extraction and feature mapping using convolution kernels;
(3) Activation layer: because convolution is itself a linear operation, a nonlinear mapping needs to be added;
(4) Pooling layer: performs down-sampling and sparsifies the feature maps, reducing the amount of computation;
(5) Fully connected layer: usually performs a further fit at the tail of the CNN, reducing the loss of feature information.
That is, in this embodiment, after the feature vector of the opcodes is input into the pre-trained CNN model, the input layer first receives the feature vector; the convolutional layer then performs feature extraction and feature mapping on it using convolution kernels; the activation layer adds a nonlinear mapping; the pooling layer performs down-sampling; and finally the fully connected layer performs a further fit and produces the classification result indicating whether the file to be detected contains a web page backdoor.
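The layer sequence above can be illustrated with a toy forward pass. This is only a sketch under stated assumptions: one 1-D convolution kernel, ReLU as the activation, max pooling, and a single sigmoid output unit. The text specifies neither the layer sizes nor the weights, so every number below is made up for illustration, and training is not shown.

```python
import math


def conv1d(x, kernel, bias=0.0):
    """Convolutional layer: slide the kernel over x (valid positions only)."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(len(x) - k + 1)]


def relu(x):
    """Activation layer: add a nonlinear mapping after the linear convolution."""
    return [max(0.0, v) for v in x]


def max_pool(x, size):
    """Pooling layer: down-sample by keeping the maximum of each block."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]


def dense_sigmoid(x, weights, bias):
    """Fully connected layer with a sigmoid output in the range (0, 1)."""
    z = sum(a * w for a, w in zip(x, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def forward(features, kernel, dense_w, dense_b):
    """Input -> convolution -> activation -> pooling -> fully connected."""
    hidden = max_pool(relu(conv1d(features, kernel)), 2)
    return dense_sigmoid(hidden, dense_w, dense_b)


# Hypothetical 0/1 feature vector and hypothetical (untrained) weights.
score = forward([1, 0, 1, 1, 0, 1], kernel=[1.0, -1.0, 1.0],
                dense_w=[0.5, -0.5], dense_b=0.0)
is_backdoor = score > 0.5  # the 0.5 decision threshold is also an assumption
```

A practical model would use many kernels and learned weights; the point of the sketch is only the order in which the layers transform the opcode feature vector.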
In the above web page backdoor detection method, after the file to be detected is obtained, opcodes are extracted from its source file, N-gram features are extracted from the opcodes and used as the feature vector of the opcodes, and finally the feature vector is input into the pre-trained CNN model for classification, to obtain a classification result indicating whether the file to be detected contains a web page backdoor. Web page backdoors can thus be detected by a CNN. Detection by a CNN generalizes well, which can improve the accuracy of web page backdoor detection, and the method is simple to implement with little impact on system performance.
Fig. 2 is a flowchart of another embodiment of the web page backdoor detection method of the present application. As shown in Fig. 2, in the embodiment shown in Fig. 1 of the present application, the following may also be included before step 104:
Step 201: collect the source files of a predetermined number of web page files as sample data.
The predetermined number can be set in a specific implementation according to system performance and/or implementation requirements. This embodiment does not limit the size of the predetermined number.
Step 202: label the source files in the sample data that are web page backdoors and the source files that are not.
Specifically, the source files in the sample data that are web page backdoors and the source files that are ordinary ASP, PHP, JSP, or CGI files can each be labeled.
Step 203: extract the opcodes of the sample data from the labeled sample data.
For the specific way of extracting opcodes, refer to the description of step 102 in the embodiment shown in Fig. 1 of the present application; details are not repeated here.
Step 204: extract N-gram features from the opcodes of the sample data, and use the extracted N-gram features as the sample feature vectors of the opcodes of the sample data.
For the specific way of extracting N-gram features, refer to the description of step 103 in the embodiment shown in Fig. 1 of the present application; details are not repeated here.
Further, after the N-gram features are extracted from the opcodes of the sample data, the extracted N-gram features can be stored as an npy feature file.
Step 205: divide the sample feature vectors into a training set and a test set.
Specifically, assuming that the predetermined number is 100,000, 70,000 of the 100,000 sample feature vectors can be assigned to the training set and the remaining 30,000 to the test set.
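The 70,000 / 30,000 division corresponds to a 70% / 30% split. A minimal sketch follows, assuming the samples are shuffled with a fixed seed before being cut; the text does not say whether or how the samples are shuffled, and the function name is hypothetical.

```python
import random


def split_samples(samples, train_fraction=0.7, seed=0):
    """Shuffle labeled sample feature vectors and cut them into two sets."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(samples)               # leave the caller's list untouched
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Ten stand-in samples: (feature vector, label), label 1 meaning "backdoor".
samples = [([i % 2, 1], i % 2) for i in range(10)]
train_set, test_set = split_samples(samples)
```

With 100,000 samples the same call yields 70,000 training vectors and 30,000 test vectors.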
Step 206: input the sample feature vectors in the training set into the CNN model to be trained, and train it to obtain a training result model.
For the structure of the CNN model to be trained, refer to the description of step 104 in the embodiment shown in Fig. 1 of the present application; details are not repeated here.
Step 207: input the sample feature vectors in the test set into the training result model for iterative training, and obtain the trained CNN model when the error between the classification results output by the training result model and the label information of the sample feature vectors in the test set is within a predetermined range.
The classification results output by the training result model indicate whether the sample feature vectors in the test set contain a web page backdoor.
That is, when the error between the classification results that the training result model outputs for the sample feature vectors in the test set and the labels that mark whether those sample feature vectors are web page backdoors is within the predetermined range, it can be determined that the accuracy with which the training result model detects web page backdoors meets the requirement. Training then ends, and the trained CNN model is obtained. The trained CNN model can subsequently be used to predict whether a new file to be detected is a web page backdoor.
Fig. 3 is a flowchart of a further embodiment of the web page backdoor detection method of the present application. As shown in Fig. 3, in the embodiment shown in Fig. 1 of the present application, the following may also be included before step 202:
Step 301: preprocess the sample data. The preprocessing includes one or a combination of the following: filtering out data in the sample data that does not meet the data rules required by the CNN model to be trained, desensitizing sensitive data in the sample data, and formatting the sample data.
In this embodiment, the sample data needs to be preprocessed before it is labeled, so that the sample data can subsequently be labeled and its features extracted.
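The text does not define the data rules, the sensitive data, or the target format, so the following sketch fills each of the three preprocessing operations with a hypothetical rule: keep only non-empty files with an ASP, PHP, JSP, or CGI extension (filtering), replace IPv4 addresses with a placeholder (desensitization), and normalize line endings and surrounding whitespace (formatting).

```python
import re

ALLOWED_EXTENSIONS = (".asp", ".php", ".jsp", ".cgi")    # hypothetical filter rule
IPV4_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")  # hypothetical sensitive data


def preprocess(samples):
    """Filter, desensitize, and format (file name, source code) pairs."""
    cleaned = []
    for name, source in samples:
        # Filtering: drop files of other types and files with no content.
        if not name.lower().endswith(ALLOWED_EXTENSIONS) or not source.strip():
            continue
        # Desensitization: mask anything that looks like an IPv4 address.
        source = IPV4_PATTERN.sub("0.0.0.0", source)
        # Formatting: use one line-ending style and trim outer whitespace.
        source = source.replace("\r\n", "\n").strip()
        cleaned.append((name, source))
    return cleaned


raw = [
    ("a.php", "<?php $h='10.1.2.3';\r\n?>\r\n"),
    ("notes.txt", "not a web page file"),
    ("b.jsp", "   "),
]
prepared = preprocess(raw)
```

Only `a.php` survives in `prepared`, with its address masked and its line endings normalized. Whether masking values in source code affects the opcodes later extracted from it depends on the rules chosen, which is one reason the real rules would need to be defined carefully.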
Fig. 4 is a schematic structural diagram of one embodiment of the web page backdoor detection apparatus of the present application. The apparatus in this embodiment can implement the web page backdoor detection method provided by the embodiments of the present application. As shown in Fig. 4, the apparatus may include: an acquisition module 41, an extraction module 42, and a detection module 43;
The acquisition module 41 is configured to obtain a file to be detected. The file to be detected may be a web page file in ASP, PHP, JSP, or CGI format.
The extraction module 42 is configured to extract opcodes from the source file of the file to be detected obtained by the acquisition module 41, to extract N-gram features from the opcodes, and to use the N-gram features as the feature vector of the opcodes. In this embodiment, the extraction module 42 is specifically configured to use the interpreter for the source file of the file to be detected to convert the source code of the file to be detected into the corresponding opcodes.
Take a web page file in PHP format as an example. When the source code of a PHP web page file is executed, the PHP interpreter must interpret and run it, and in this process the source code is compiled into opcodes. Because PHP source code can be written in very flexible ways, detection of web page backdoors based on features of the source code is error-prone. The opcodes corresponding to the source code are lower-level and better reflect what the source code actually does when executed, so detecting web page backdoors from opcodes can improve detection accuracy. In a specific implementation, the extraction module 42 can convert the source code of the PHP web page file into the corresponding opcodes according to the rules by which PHP code is converted to opcodes. In one embodiment, the extraction module 42 can use an existing PHP interpreter to perform this conversion.
In this embodiment, the extraction module 42 is specifically configured to extract the N-gram features from the opcodes using an N-Gram model. That is, the N-Gram model can be used to perform word embedding, converting the opcodes into a vector whose elements are "0" or "1", thereby obtaining the feature vector of the opcodes.
The basic idea of the N-Gram model is to slide a window of size N over the text content, treated as a byte stream, to form a sequence of byte fragments of length N. Each byte fragment is called a gram. Counting the frequency of occurrence of all grams yields the N-gram features. An N-gram feature may include the gram, the number of occurrences of the gram, and the corresponding value of N. N is an integer greater than 1; for example, N can be 2, 3, or 4.
Detection module 43, the feature vector input training in advance of the aforesaid operations code for extracting extraction module 42 CNN classifies, obtain above-mentioned file to be detected whether include webpage back door classification results.
Wherein, CNN is developed recently, and causes a kind of efficient identification method paid attention to extensively.In the 1960s, Hubel and Wiesel is in studying cat cortex for finding its unique network when local sensitivity and the neuron of direction selection Structure can be effectively reduced the complexity of Feedback Neural Network, then propose CNN.Present CNN has become numerous science One of the research hotspot in field, especially in pattern classification field, since CNN avoids the pretreatment complicated early period to image, Original image can be directly inputted, thus has obtained more being widely applied.
The structure of a CNN generally includes the following layers:
(1) Input layer: receives the raw input data;
(2) Convolutional layer: performs feature extraction and feature mapping using convolution kernels;
(3) Activation layer: since convolution is also a linear operation, a nonlinear mapping needs to be added;
(4) Pooling layer: performs down-sampling and sparsifies the feature maps, reducing the amount of computation;
(5) Fully connected layer: usually performs a re-fitting at the tail of the CNN to reduce the loss of feature information.
That is, in this embodiment, after the feature vector of the opcodes is input into the pre-trained CNN model, the input layer first receives the feature vector of the opcodes; the convolutional layer then performs feature extraction and feature mapping on the feature vector using convolution kernels; the activation layer adds a nonlinear mapping; the pooling layer performs down-sampling; and finally the fully connected layer performs a re-fitting, yielding the classification result indicating whether the file to be detected contains a webpage backdoor.
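The order of the layers can be sketched with a minimal one-dimensional forward pass. This is not the network of the application: the kernel, the weights, the pooling size, and the sigmoid output are all assumptions chosen so that the arithmetic is easy to follow, and a real model would use a deep-learning library with learned parameters.

```python
import math

def conv1d(x, kernel, bias=0.0):
    """Convolutional layer: valid 1-D cross-correlation with a single kernel."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) + bias
            for i in range(len(x) - k + 1)]

def relu(x):
    """Activation layer: adds the nonlinear mapping."""
    return [max(0.0, v) for v in x]

def max_pool(x, size):
    """Pooling layer: non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]

def dense_sigmoid(x, weights, bias):
    """Fully connected layer with a sigmoid output in (0, 1)."""
    z = sum(a * w for a, w in zip(x, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

def forward(features, kernel, weights, bias):
    """Input -> convolution -> activation -> pooling -> fully connected."""
    hidden = max_pool(relu(conv1d(features, kernel)), 2)
    return dense_sigmoid(hidden, weights, bias)

# Hypothetical 0/1 feature vector and hand-picked parameters.
features = [1, 0, 1, 1, 0, 1]
kernel = [1.0, -1.0]
weights = [0.5, 0.5]
score = forward(features, kernel, weights, bias=0.0)
is_backdoor = score >= 0.5  # the 0.5 threshold is an assumption
```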
In the above detection device for webpage backdoors, after the obtaining module 41 obtains the file to be detected, the extraction module 42 extracts opcodes from the source file of the file to be detected, then extracts N-gram features from the opcodes and uses the N-gram features as the feature vector of the opcodes. Finally, the detection module 43 inputs the feature vector of the opcodes into the pre-trained CNN model for classification, obtaining a classification result indicating whether the file to be detected contains a webpage backdoor. Webpage backdoors can thus be detected by a CNN. Detection by a CNN generalizes well, can improve the accuracy of webpage backdoor detection, is simple to implement, and has little impact on system performance.
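The first step of this flow, turning source code into opcodes by way of the language's own interpreter, can be illustrated by analogy. The sketch below uses Python's standard `dis` module on Python source purely as a stand-in; the application concerns web scripts such as PHP or JSP, whose interpreters and opcode sets differ, and the function name is hypothetical.

```python
import dis

def extract_opcodes(source):
    """Compile the source with the interpreter and list its opcode names.

    Only the top-level code object is walked; nested function bodies
    are not expanded in this sketch.
    """
    code = compile(source, "<sample>", "exec")
    return [instruction.opname for instruction in dis.get_instructions(code)]

ops = extract_opcodes("x = 1\nprint(x)\n")
```

The exact opcode names depend on the interpreter version, which is one reason a feature vocabulary would need to be built against the same interpreter that is used at detection time.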
Fig. 5 is a structural schematic diagram of another embodiment of the detection device for webpage backdoors of the present application. Compared with the detection device shown in Fig. 4, the detection device shown in Fig. 5 may further include: a collection module 44, a labeling module 45, a division module 46, and a training module 47.
The collection module 44 is configured to collect the source files of a predetermined number of webpage files as sample data before the detection module 43 inputs the feature vector of the opcodes into the pre-trained CNN for classification and obtains the classification result indicating whether the file to be detected contains a webpage backdoor. The predetermined number can be set in a specific implementation according to system performance and/or implementation requirements; this embodiment does not limit its size.
The labeling module 45 is configured to label, in the sample data collected by the collection module 44, the source files that belong to webpage backdoors and the source files that do not belong to webpage backdoors. Specifically, the labeling module 45 can label the source files in the sample data that belong to webpage backdoors and the source files that belong to ASP, PHP, JSP, or CGI.
The extraction module 42 is configured to extract the opcodes of the sample data from the sample data labeled by the labeling module 45, extract N-gram features from the opcodes of the sample data, and use the extracted N-gram features as the sample feature vectors of the opcodes of the sample data. Further, after extracting the N-gram features from the opcodes of the sample data, the extraction module 42 can store the extracted N-gram features as an npy feature file.
The division module 46 is configured to divide the sample feature vectors into a training set and a test set. Specifically, assuming the predetermined number is 100,000, the division module 46 can assign 70,000 of the 100,000 sample feature vectors to the training set and the remaining 30,000 to the test set.
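A minimal sketch of such a 70/30 division is shown below. Shuffling before the cut and the fixed seed are assumptions added for reproducibility; the application only states the resulting set sizes.

```python
import random

def split_samples(samples, train_percent=70, seed=0):
    """Shuffle a copy of the samples and cut it into training and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = len(shuffled) * train_percent // 100  # integer arithmetic avoids rounding surprises
    return shuffled[:cut], shuffled[cut:]

# Integers stand in for the 100,000 sample feature vectors.
samples = list(range(100_000))
train_set, test_set = split_samples(samples)
```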
The training module 47 is configured to input the sample feature vectors in the training set into the CNN model to be trained and train it, obtaining a training result model; and to input the sample feature vectors in the test set into the training result model for recursive training. When the error between the classification results output by the training result model and the label information of the sample feature vectors in the test set is within a predetermined range, the trained CNN model is obtained. The classification results output by the training result model indicate whether the sample feature vectors in the test set contain a webpage backdoor.
That is, when the classification results output by the training result model for the sample feature vectors in the test set are compared with the labels indicating whether those sample feature vectors belong to webpage backdoors, and the error is within the predetermined range, it can be determined that the accuracy with which the training result model detects webpage backdoors meets the requirement. Training then ends and the trained CNN model is obtained. Afterwards, the trained CNN model can be used to predict whether a new file to be detected is a webpage backdoor.
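The stopping condition can be made concrete as follows. This sketch assumes the error is measured as the fraction of test samples whose predicted class differs from the label, and the 5% bound is a placeholder; the application specifies neither the error measure nor the predetermined range.

```python
def error_rate(predictions, labels):
    """Fraction of test samples whose predicted class differs from the label."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equal in length")
    wrong = sum(1 for predicted, actual in zip(predictions, labels) if predicted != actual)
    return wrong / len(labels)

def training_done(predictions, labels, max_error=0.05):
    """True when the test-set error is within the (assumed) predetermined range."""
    return error_rate(predictions, labels) <= max_error

# Hypothetical labels (1 = webpage backdoor) and model outputs on a test set.
labels      = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
predictions = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
```

With these values one of ten predictions is wrong, so training would continue under a 5% bound and stop under a 10% bound.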
Further, the above detection device for webpage backdoors may also include a preprocessing module 48.
The preprocessing module 48 is configured to preprocess the sample data before the labeling module 45 labels the source files in the sample data that belong to webpage backdoors and the source files that do not. The preprocessing includes one or a combination of the following: filtering out data in the sample data that does not conform to the data rules required by the CNN model to be trained, desensitizing sensitive data in the sample data, and formatting the sample data.
In this embodiment, before the labeling module 45 labels the sample data, the preprocessing module 48 first preprocesses the sample data so that the sample data can subsequently be labeled and its features extracted.
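The three preprocessing operations can be sketched as below. Every concrete rule here is an assumption, since the application does not define them: the data rule is taken to be "non-blank and under a size limit", the sensitive data is taken to be IPv4 addresses, and formatting is taken to be line-ending normalization.

```python
import re

IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def meets_rules(source, max_chars=1_000_000):
    """Filtering: keep only non-blank sources under an assumed size limit."""
    return bool(source.strip()) and len(source) <= max_chars

def desensitize(source):
    """Desensitization: mask anything that looks like an IPv4 address."""
    return IPV4.sub("0.0.0.0", source)

def normalize(source):
    """Formatting: unify line endings and end the text with one newline."""
    return source.replace("\r\n", "\n").replace("\r", "\n").strip() + "\n"

def preprocess(sources):
    """Apply filtering, desensitization, and formatting in that order."""
    return [normalize(desensitize(s)) for s in sources if meets_rules(s)]

# Hypothetical raw sample sources.
raw = ["<?php echo '10.1.2.3';\r\n?>", "   ", "<?php phpinfo(); ?>\n"]
clean = preprocess(raw)
```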
Fig. 6 is a structural schematic diagram of an embodiment of the computer device of the present application. The computer device may include a memory, a processor, and a computer program stored in the memory and executable on the processor. When the processor executes the computer program, the detection method for webpage backdoors provided by the embodiments of the present application can be implemented.
The computer device can be a server, for example a cloud server, or an electronic device, for example an intelligent electronic device such as a smartphone, a smartwatch, or a tablet computer. This embodiment does not limit the specific form of the computer device.
Fig. 6 shows a block diagram of an exemplary computer device 12 suitable for implementing the embodiments of the present application. The computer device 12 shown in Fig. 6 is only an example and should not impose any limitation on the function or scope of use of the embodiments of the present application.
As shown in Fig. 6, the computer device 12 takes the form of a general-purpose computing device. The components of the computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 connecting the different system components (including the system memory 28 and the processing unit 16).
The bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The computer device 12 typically includes a variety of computer-system-readable media. These media can be any available media that can be accessed by the computer device 12, including volatile and non-volatile media, and removable and non-removable media.
The system memory 28 may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 30 and/or cache memory 32. The computer device 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, the storage system 34 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in Fig. 6, commonly referred to as a "hard disk drive"). Although not shown in Fig. 6, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk") may be provided, as well as an optical disc drive for reading from and writing to a removable non-volatile optical disc (for example, a compact disc read-only memory (CD-ROM), a digital video disc read-only memory (DVD-ROM), or other optical media). In these cases, each drive may be connected to the bus 18 through one or more data media interfaces. The memory 28 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present application.
A program/utility 40 having a set of (at least one) program modules 42 may be stored, for example, in the memory 28. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 42 generally perform the functions and/or methods of the embodiments described in the present application.
The computer device 12 may also communicate with one or more external devices 14 (such as a keyboard, a pointing device, or a display 24), with one or more devices that enable a user to interact with the computer device 12, and/or with any device (such as a network card or a modem) that enables the computer device 12 to communicate with one or more other computing devices. Such communication can take place through an input/output (I/O) interface 22. In addition, the computer device 12 may communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network, for example the Internet) through a network adapter 20. As shown in Fig. 6, the network adapter 20 communicates with the other modules of the computer device 12 through the bus 18. It should be understood that, although not shown in Fig. 6, other hardware and/or software modules can be used in conjunction with the computer device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.
By running programs stored in the system memory 28, the processing unit 16 executes various functional applications and data processing, for example implementing the detection method for webpage backdoors provided by the embodiments of the present application.
The embodiments of the present application also provide a non-transitory computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program can implement the detection method for webpage backdoors provided by the embodiments of the present application.
The non-transitory computer-readable storage medium may be any combination of one or more computer-readable media. A computer-readable medium can be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium can be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM) or flash memory, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium can be any tangible medium that contains or stores a program that can be used by, or in connection with, an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal can take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by, or in connection with, an instruction execution system, apparatus, or device.
The program code contained on a computer-readable medium can be transmitted over any suitable medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.
Computer program code for carrying out the operations of the present application can be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code can execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (for example, through the Internet using an Internet service provider).
In the description of this specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described can be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not conflict, those skilled in the art can combine the different embodiments or examples described in this specification and the features of those different embodiments or examples.
In addition, the terms "first" and "second" are used for descriptive purposes only and should not be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. A feature defined with "first" or "second" may therefore explicitly or implicitly include at least one such feature. In the description of the present application, "plurality" means at least two, for example two or three, unless specifically defined otherwise.
Any process or method description in a flowchart or otherwise described herein can be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing the steps of a custom logic function or process. The scope of the preferred embodiments of the present application includes other implementations in which functions may be performed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present application belong.
Depending on the context, the word "if" as used herein can be interpreted as "when", "upon", "in response to determining", or "in response to detecting". Similarly, depending on the context, the phrase "if it is determined" or "if (a stated condition or event) is detected" can be interpreted as "when it is determined", "in response to determining", "when (the stated condition or event) is detected", or "in response to detecting (the stated condition or event)".
It should be noted that the terminals involved in the embodiments of the present application can include, but are not limited to, a personal computer (PC), a personal digital assistant (PDA), a wireless handheld device, a tablet computer, a mobile phone, an MP3 player, an MP4 player, and the like.
In the several embodiments provided by the present application, it should be understood that the disclosed systems, devices, and methods can be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in actual implementation; for example, multiple units or components can be combined or integrated into another system, or some features can be omitted or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed can be an indirect coupling or communication connection through some interfaces, devices, or units, and can be electrical, mechanical, or of another form.
In addition, the functional units in the embodiments of the present application can be integrated into one processing unit, each unit can exist physically on its own, or two or more units can be integrated into one unit. The integrated unit can be implemented in the form of hardware, or in the form of hardware plus software functional units.
An integrated unit implemented in the form of a software functional unit can be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes a number of instructions that cause a computer device (which can be a personal computer, a server, a network device, or the like) or a processor to execute some of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
The above are only preferred embodiments of the present application and are not intended to limit it. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present application shall fall within the scope of protection of the present application.

Claims (10)

1. A detection method for webpage backdoors, characterized by comprising:
obtaining a file to be detected;
extracting opcodes from the source file of the file to be detected;
extracting N-gram features from the opcodes, and using the N-gram features as the feature vector of the opcodes;
inputting the feature vector of the opcodes into a pre-trained convolutional neural network model for classification, and obtaining a classification result indicating whether the file to be detected contains a webpage backdoor.
2. The method according to claim 1, characterized in that extracting opcodes from the source file of the file to be detected comprises:
using the interpreter for the source file of the file to be detected to convert the source code of the file to be detected into the corresponding opcodes.
3. The method according to claim 1, characterized in that extracting N-gram features from the opcodes comprises:
extracting N-gram features from the opcodes using an N-Gram model.
4. The method according to any one of claims 1 to 3, characterized in that, before inputting the feature vector of the opcodes into the pre-trained convolutional neural network model for classification and obtaining the classification result indicating whether the file to be detected contains a webpage backdoor, the method further comprises:
collecting the source files of a predetermined number of webpage files as sample data;
labeling the source files in the sample data that belong to webpage backdoors and the source files that do not belong to webpage backdoors;
extracting the opcodes of the sample data from the labeled sample data;
extracting N-gram features from the opcodes of the sample data, and using the extracted N-gram features as the sample feature vectors of the opcodes of the sample data;
dividing the sample feature vectors into a training set and a test set;
inputting the sample feature vectors in the training set into a convolutional neural network model to be trained and training it, obtaining a training result model;
inputting the sample feature vectors in the test set into the training result model for recursive training, and obtaining the trained convolutional neural network model when the error between the classification results output by the training result model and the label information of the sample feature vectors in the test set is within a predetermined range, the classification results output by the training result model indicating whether the sample feature vectors in the test set contain a webpage backdoor.
5. The method according to claim 4, characterized in that, before labeling the source files in the sample data that belong to webpage backdoors and the source files that do not belong to webpage backdoors, the method further comprises:
preprocessing the sample data, the preprocessing including one or a combination of the following: filtering out data in the sample data that does not conform to the data rules required by the convolutional neural network model to be trained, desensitizing sensitive data in the sample data, and formatting the sample data.
6. A detection device for webpage backdoors, characterized by comprising:
an obtaining module, configured to obtain a file to be detected;
an extraction module, configured to extract opcodes from the source file of the file to be detected obtained by the obtaining module, extract N-gram features from the opcodes, and use the N-gram features as the feature vector of the opcodes;
a detection module, configured to input the feature vector of the opcodes extracted by the extraction module into a pre-trained convolutional neural network model for classification, and obtain a classification result indicating whether the file to be detected contains a webpage backdoor.
7. The device according to claim 6, characterized in that
the extraction module is specifically configured to use the interpreter for the source file of the file to be detected to convert the source code of the file to be detected into the corresponding opcodes.
8. The device according to claim 6, characterized in that
the extraction module is specifically configured to extract N-gram features from the opcodes using an N-Gram model.
9. A computer device, characterized by comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the method according to any one of claims 1 to 5.
10. A non-transitory computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the method according to any one of claims 1 to 5.
CN201910327403.XA 2019-04-23 2019-04-23 Detection method, device and the computer equipment at webpage back door Pending CN110232277A (en)


Publications (1)

Publication Number Publication Date
CN110232277A true CN110232277A (en) 2019-09-13

Family

ID=67860208


Country Status (1)

Country Link
CN (1) CN110232277A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103761481A (en) * 2014-01-23 2014-04-30 北京奇虎科技有限公司 Method and device for automatically processing malicious code sample
CN108664791A (en) * 2017-03-29 2018-10-16 腾讯科技(深圳)有限公司 A kind of webpage back door detection method in HyperText Preprocessor code and device
CN108334781A (en) * 2018-03-07 2018-07-27 腾讯科技(深圳)有限公司 Method for detecting virus, device, computer readable storage medium and computer equipment
CN109067708A (en) * 2018-06-29 2018-12-21 北京奇虎科技有限公司 A kind of detection method, device, equipment and the storage medium at webpage back door
CN109657459A (en) * 2018-10-11 2019-04-19 平安科技(深圳)有限公司 Webpage back door detection method, equipment, storage medium and device
CN109492692A (en) * 2018-11-07 2019-03-19 北京知道创宇信息技术有限公司 A kind of webpage back door detection method, device, electronic equipment and storage medium
CN109657467A (en) * 2018-11-26 2019-04-19 北京兰云科技有限公司 A kind of webpage back door detection method and device, computer readable storage medium

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108664791A (en) * 2017-03-29 2018-10-16 腾讯科技(深圳)有限公司 A kind of webpage back door detection method in HyperText Preprocessor code and device
CN111260033A (en) * 2020-01-15 2020-06-09 电子科技大学 Website backdoor detection method based on convolutional neural network model
CN113810400A (en) * 2021-09-13 2021-12-17 北京百度网讯科技有限公司 Website parasite detection method, device, equipment and medium
CN114499944A (en) * 2021-12-22 2022-05-13 天翼云科技有限公司 Method, device and equipment for detecting WebShell
CN114499944B (en) * 2021-12-22 2023-08-08 天翼云科技有限公司 Method, device and equipment for detecting WebShell

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination