CN115314267B - Monitoring method and device for coping with webpage faults and webpage loopholes - Google Patents


Info

Publication number
CN115314267B
Authority
CN
China
Prior art keywords
webpage
vulnerability
web page
detection model
training set
Legal status
Active
Application number
CN202210895518.0A
Other languages
Chinese (zh)
Other versions
CN115314267A (en)
Inventor
黄碧银
谢津
刘明东
Current Assignee
Shenzhen Huishenwang Information Technologies Co ltd
Original Assignee
Shenzhen Huishenwang Information Technologies Co ltd
Application filed by Shenzhen Huishenwang Information Technologies Co ltd
Priority to CN202210895518.0A
Publication of CN115314267A
Application granted
Publication of CN115314267B
Legal status: Active

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1433 Vulnerability analysis
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0631 Management of faults, events, alarms or notifications using root cause analysis; using analysis of correlation between notifications, alarms or events based on decision criteria, e.g. hierarchy, tree or time analysis
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/02 Capturing of monitoring data
    • H04L43/022 Capturing of monitoring data by sampling
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer And Data Communications (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention relates to the technical field of web page security monitoring and discloses a monitoring method and device for coping with web page faults and web page vulnerabilities. The method comprises: constructing a web page vulnerability text feature detection model that takes a web page data stream as input and outputs web page vulnerability text features; constructing a web page vulnerability detection model that outputs the vulnerability detection result corresponding to those text features; differentially sampling the web page vulnerability text features in the training set with a differential sampling strategy and using the sampled data to optimize the training of the detection model; and inputting the web page vulnerability text features of the data stream of a web page to be detected into the optimized detection model, which outputs whether that web page has a vulnerability. The invention rapidly extracts characteristic fields from web page communication data based on N-grams, which enables vulnerability detection based on these communication characteristic fields, and improves detection accuracy through differential sampling.

Description

Monitoring method and device for coping with webpage faults and webpage loopholes
Technical Field
The invention relates to the technical field of web page security monitoring, and in particular to a monitoring method and device for coping with web page faults and web page vulnerabilities.
Background
With the continuous development of information technology, the WEB has become an important carrier of information. Although WEB security has long been a focus of attention, attacks against the WEB are becoming increasingly diverse, and the web application firewall (WAF) has become an important means of detecting WEB attacks. Existing WAF detection is post hoc: only after an attack has occurred do security experts analyze it and derive WAF rules, which are then added to the WAF engine to detect web page faults and web page vulnerabilities. This reduces the number of attacks on a web page to some extent, but it suffers from a serious time lag, which is especially unacceptable for web pages in critical industries, where it can cause serious economic losses. To address this problem, this patent provides a monitoring method for coping with web page faults and web page vulnerabilities that analyzes web page vulnerabilities from web page content in advance to form WAF rules, thereby improving web page security.
Disclosure of Invention
In view of the above, the invention provides a method and device for monitoring web page faults and web page vulnerabilities. The method rapidly extracts characteristic fields from web page communication data based on N-grams and, taking sample imbalance into account, uses a differential sampling strategy to sample the web page vulnerability text features of the web pages in the training set. This avoids the overfitting caused by the model depending too heavily on limited data samples and improves the detection accuracy of the web page vulnerability detection model. Because vulnerable web pages are far fewer than normal web pages in practice, conventional model training easily biases the trained model toward the normal samples; assigning different weights to sample data according to the imbalance rate shifts the model's attention toward the minority samples.
The invention provides a monitoring method for coping with web page faults and web page vulnerabilities, comprising the following steps:
S1: collect web pages with and without vulnerabilities to form a training set; collect the web page data streams of the web pages in the training set; construct a web page vulnerability text feature detection model; and input the collected data streams into this model, which outputs web page vulnerability text features. The model is implemented by an N-gram-based method for finding abnormal word transitions;
S2: construct a web page vulnerability detection model that takes web page vulnerability text features as input and outputs a web page vulnerability detection result;
S3: differentially sample the web page vulnerability text features in the training set using a differential sampling strategy, and use the sampled data to optimize the training of the constructed detection model, obtaining an optimized web page vulnerability detection model;
S4: collect the web page data to be detected, input it into the web page vulnerability text feature detection model to extract web page vulnerability text features, and input the extracted features into the optimized detection model, which outputs whether the web page to be detected has a vulnerability; if it does, generate an alarm.
As a further improvement of the present invention:
Optionally, in step S1, collecting web pages with and without vulnerabilities to form a training set comprises:
collecting web pages with and without vulnerabilities to form the training set data, which contains K+K' web pages in total: K web pages with vulnerabilities and K' web pages without vulnerabilities;
the web pages in the training set are stored in the following format:
$$data=\{html_1(1),\dots,html_1(K),html_0(K+1),\dots,html_0(K+K')\}$$
where:
$html_1(K)$ denotes the K-th web page with a vulnerability in the training set data, and $html_0(K+K')$ denotes the K'-th web page without a vulnerability in the training set data.
Optionally, collecting the web page data streams in step S1 comprises:
constructing a virtual system on a server and using it as a sandbox;
running each web page in the sandbox and collecting the web page data stream that the virtual system generates while running it; the web page data stream of the j-th web page $html_i(j)$ in the training set is $flow_j$, where $i\in\{0,1\}$;
the web page data stream is collected as follows:
S11: set up a monitoring program in the sandbox;
S12: start the monitoring program when a web page is run in the sandbox;
S13: the monitoring program uses POST requests to retrieve the communication message data generated while the web page runs, and takes these data as the web page data stream.
Optionally, constructing the web page vulnerability text feature detection model in step S1 comprises:
constructing the model, whose input is a web page data stream and whose output is the web page vulnerability text feature corresponding to that data stream;
the detection flow of the model is as follows:
set the length of the sliding window over the web page data stream to N, slide a window of length N across the data stream, and treat the data inside the window at each position as a word character;
count the occurrences of each word character in the web page data stream and the occurrences of each context (pair of consecutive word characters): for consecutive word characters $w_{n-1}$, $w_n$, the number of occurrences of $w_n$ is $\mathrm{count}(w_n)$ and the number of occurrences of the context is $\mathrm{count}(w_{n-1}w_n)$;
calculate the frequency distribution of each word character; the frequency distribution of word character $w_n$ is:
[Formula image: frequency distribution of word character $w_n$]
select the m word characters with the largest frequency distribution in the web page data stream, one-hot encode them, and take the encoding result as the web page vulnerability text feature of the data stream.
In this embodiment, the communication traffic generated by web page vulnerabilities contains distinctive characteristic fields, such as version identifiers, special fields, and an unusually large number of id-field requests and responses; examples of such fields include login, submit, params and seed_hash.
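The sliding-window feature extraction described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the window advances one character at a time; the frequency formula (an image in the source) is assumed to be the relative frequency $\mathrm{count}(w_n)$ divided by the number of windows, so the context counts $\mathrm{count}(w_{n-1}w_n)$ are omitted; and `vocab` is a hypothetical fixed list of word characters (for example, one built from the training set) used for the one-hot encoding. All function names are illustrative.

```python
from collections import Counter


def ngram_windows(stream: str, n: int) -> list[str]:
    """Slide a window of length n over the data stream; each window is one word character."""
    return [stream[i:i + n] for i in range(len(stream) - n + 1)]


def top_m_word_characters(stream: str, n: int, m: int) -> list[str]:
    """Return the m word characters with the largest (assumed) frequency distribution."""
    grams = ngram_windows(stream, n)
    counts = Counter(grams)  # count(w_n) for every distinct word character
    total = len(grams)
    # ASSUMPTION: frequency = count(w_n) / total; the source formula is not reproduced
    freq = {g: c / total for g, c in counts.items()}
    # Descending frequency; ties broken alphabetically so the result is deterministic
    return sorted(freq, key=lambda g: (-freq[g], g))[:m]


def one_hot(word_chars: list[str], vocab: list[str]) -> list[list[int]]:
    """One-hot encode each selected word character against a hypothetical fixed vocabulary."""
    index = {g: i for i, g in enumerate(vocab)}
    rows = []
    for g in word_chars:
        row = [0] * len(vocab)
        if g in index:  # word characters outside the vocabulary become all-zero rows
            row[index[g]] = 1
        rows.append(row)
    return rows


if __name__ == "__main__":
    top = top_m_word_characters("loginloginsubmit", n=3, m=3)
    print(top)  # ['gin', 'log', 'ogi']
    print(one_hot(top, vocab=["log", "gin", "sub"]))  # [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
```

The resulting m × |vocab| matrix is a natural input for the convolutional detection model of step S2; with a real data stream, fields such as login or submit would surface among the top-ranked word characters only if they recur often enough.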
Optionally, in step S1, inputting the collected web page data streams into the web page vulnerability text feature detection model, which outputs web page vulnerability text features, comprises:
inputting the collected web page data streams into the web page vulnerability text feature detection model in turn, which outputs the web page vulnerability text features; the web page vulnerability text feature of any web page $html_i(j)$ in the training set data is:
[Formula image: web page vulnerability text feature of $html_i(j)$]
Optionally, constructing the web page vulnerability detection model in step S2 comprises:
constructing a web page vulnerability detection model that takes the web page vulnerability text features of a web page data stream as input and outputs a web page vulnerability detection result;
the web page vulnerability detection model comprises an input layer, convolution layers, a pooling layer and a fully connected layer;
the model detects a web page as follows:
S21: input the web page vulnerability text features into the input layer;
S22: the input layer passes the received features to the convolution layers. The model has two convolution layers: convolution layer 1 has a kernel size of $c_1\times z_1$ and a depth of $dp_1$, convolution layer 2 has a kernel size of $c_2\times z_2$ and a depth of $dp_2$, and both layers use the stride $step$. The two convolution layers compute:
$$F_1=\sigma(W_1 f+b_1)$$
$$F_2=\sigma(W_2 F_1+b_2)$$
where:
$f$ is the web page vulnerability text feature;
$\sigma(\cdot)$ is the activation function; in this embodiment, the ReLU activation function is used;
$W_1$ and $b_1$ are the weight matrix and bias of convolution layer 1, and $W_2$ and $b_2$ are the weight matrix and bias of convolution layer 2;
$F_1$ is the output of convolution layer 1 and $F_2$ is the output of convolution layer 2, which is passed to the pooling layer;
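The two convolution layers can be illustrated with a small NumPy sketch. It assumes the input is the one-hot feature matrix reshaped to (channels, height, width) and implements each layer as a valid 2-D convolution (computed as cross-correlation, as deep-learning frameworks do) with $dp$ kernels of size $c\times z$ and stride $step$, followed by ReLU. The function name `conv_layer`, the shapes, and the random parameters are illustrative assumptions, not values from the patent.

```python
import numpy as np


def conv_layer(x: np.ndarray, kernels: np.ndarray, bias: np.ndarray, step: int) -> np.ndarray:
    """One convolution layer, F = ReLU(W * x + b).

    x:       input of shape (C_in, H, W)
    kernels: weights of shape (dp, C_in, c, z), i.e. depth dp and kernel size c x z
    bias:    shape (dp,)
    step:    stride used on both spatial axes
    """
    dp, c_in, c, z = kernels.shape
    assert x.shape[0] == c_in, "channel mismatch"
    out_h = (x.shape[1] - c) // step + 1
    out_w = (x.shape[2] - z) // step + 1
    out = np.empty((dp, out_h, out_w))
    for k in range(dp):
        for i in range(out_h):
            for j in range(out_w):
                patch = x[:, i * step:i * step + c, j * step:j * step + z]
                out[k, i, j] = np.sum(patch * kernels[k]) + bias[k]
    return np.maximum(out, 0.0)  # sigma = ReLU, as in the embodiment


# Usage: a hypothetical 8 x 8 one-hot feature matrix treated as a single channel
rng = np.random.default_rng(0)
f = rng.integers(0, 2, size=(1, 8, 8)).astype(float)
F1 = conv_layer(f, rng.normal(size=(4, 1, 3, 3)), np.zeros(4), step=1)   # dp_1 = 4, 3 x 3 kernels
F2 = conv_layer(F1, rng.normal(size=(8, 4, 2, 2)), np.zeros(8), step=1)  # dp_2 = 8, 2 x 2 kernels
print(F1.shape, F2.shape)  # (4, 6, 6) (8, 5, 5)
```

The explicit loops keep the correspondence to $c$, $z$, $dp$ and $step$ visible; a production model would use a framework's optimized convolution instead.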
S23: the pooling layer contains $T$ pooling units connected in series. The first pooling unit receives $F_2$ and pools it; each unit passes its pooling result together with $F_2$ to the next unit, and the output of the last unit is the pooling result $h_{max}$ of the pooling layer, which is fed to the fully connected layer;
the t-th pooling unit computes:
$$h_t=\sigma(W_t[h_{t-1},F_2])\cdot\tanh(h_{t-1})$$
where:
$W_t$ is the weight matrix of the t-th pooling unit, $t\in[1,T]$;
$h_t$ is the pooling result of the t-th pooling unit;
S24: the fully connected layer receives the pooling result $h_{max}$, uses the softmax function to compute the probabilities that it belongs to class 0 and class 1, and outputs the class with the larger probability as the web page vulnerability detection result, where 0 means the web page has no vulnerability and 1 means it has a vulnerability;
a conventional pooling layer requires a large amount of convolution computation, which makes the overall model inefficient.
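The serial pooling units and the softmax output layer can be sketched in NumPy as below. The sketch applies $h_t=\sigma(W_t[h_{t-1},F_2])\cdot\tanh(h_{t-1})$ literally, reading $[\cdot,\cdot]$ as vector concatenation, $\cdot$ as element-wise multiplication, and $\sigma$ as ReLU (the activation named for the convolution layers). The initial state $h_0$ is not specified in the text, so it is taken as a parameter; note that $h_0=0$ would make every $h_t$ zero because $\tanh(0)=0$. Flattening $F_2$, the fully connected weights `W_fc` and `b_fc`, and all function names are illustrative assumptions.

```python
import numpy as np


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def serial_pooling(F2: np.ndarray, Ws: list[np.ndarray], h0: np.ndarray) -> np.ndarray:
    """Run T pooling units in series; each W_t has shape (len(h0), len(h0) + F2.size)."""
    f2 = F2.ravel()  # every unit receives the same flattened F_2
    h = h0
    for W_t in Ws:
        h = relu(W_t @ np.concatenate([h, f2])) * np.tanh(h)
    return h  # output of the last unit: h_max


def softmax(z: np.ndarray) -> np.ndarray:
    e = np.exp(z - z.max())  # shift by the maximum for numerical stability
    return e / e.sum()


def fully_connected(h_max: np.ndarray, W_fc: np.ndarray, b_fc: np.ndarray) -> tuple[int, np.ndarray]:
    """Return (label, probabilities); label 0 = no vulnerability, 1 = vulnerability."""
    p = softmax(W_fc @ h_max + b_fc)
    return int(np.argmax(p)), p


# Usage with toy shapes: |h| = 2, F_2 flattened to 3 values, T = 2 units
h_max = serial_pooling(np.ones(3), [np.ones((2, 5)), np.ones((2, 5))], h0=np.ones(2))
label, probs = fully_connected(h_max, np.array([[-1.0, 0.0], [1.0, 0.0]]), np.zeros(2))
```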
Optionally, in step S3, differentially sampling the web page vulnerability text features of the web pages in the training set using a differential sampling strategy comprises:
calculating the web page imbalance rate of the training set data, $rate=K/K'$, where K is the number of web pages with vulnerabilities and K' is the number of web pages without vulnerabilities in the training set data;
calculating, within training set 0 and within training set 1 of the training set data, the distance between the web page vulnerability text features of every two web pages, where training set 0 is the set of web pages without vulnerabilities and training set 1 is the set of web pages with vulnerabilities, and then calculating the local density of each web page's vulnerability text feature; the distance is the Euclidean distance. The local density of the web page vulnerability text feature of any web page $html_i(j)$ is calculated as:
[Formula image: local density of the web page vulnerability text feature of $html_i(j)$]
where dis is the distance threshold, and the formula counts the web page vulnerability text features in training set i whose distance from the feature of $html_i(j)$ is within dis;
sort the web pages in training set 0 in descending order of the local density of their vulnerability text features and number them 1, 2, 3, …, K'; if $rate>2$, set the differential sampling coefficient $\alpha_1=2$, otherwise set $\alpha_1=3$; take the web pages numbered $1+\alpha_1 s$ ($s=0,1,2,\dots$) as the differential sampling result $data_1$ of training set 0;
sort the web pages in training set 1 in descending order of the local density of their vulnerability text features and number them 1, 2, 3, …, K; if $rate>2$, set the differential sampling coefficient $\alpha_2=3$, otherwise set $\alpha_2=2$; take the web pages numbered $1+\alpha_2 s$ ($s=0,1,2,\dots$) as the differential sampling result $data_2$ of training set 1;
combine the differential sampling results $data_1$ and $data_2$ into the sampled training set data' for the web page vulnerability detection model.
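The differential sampling steps can be sketched in NumPy as follows. Assumptions, since the density formula is an image in the source: the local density is taken as the number of other features in the same training subset within Euclidean distance dis (the original may normalize it differently), and pages with equal density keep their original order. Function names are hypothetical. Keeping the pages numbered $1+\alpha s$ in the 1-based descending-density ordering is the same as keeping every $\alpha$-th position starting from 0.

```python
import numpy as np


def local_density(X: np.ndarray, dis: float) -> np.ndarray:
    """Number of other features in the same subset within Euclidean distance dis (ASSUMED form)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise Euclidean distances
    return (d <= dis).sum(axis=1) - 1  # subtract 1 to exclude the point itself


def differential_sample(X: np.ndarray, dis: float, alpha: int) -> np.ndarray:
    """Sort by descending density, number 1..n, keep numbers 1 + alpha*s (s = 0, 1, 2, ...)."""
    order = np.argsort(-local_density(X, dis), kind="stable")
    return order[::alpha]  # 1-based number 1 + alpha*s == 0-based position alpha*s


def build_sampled_training_set(X_vuln: np.ndarray, X_clean: np.ndarray, dis: float):
    """Return (features, labels) of data' = data_1 (label 0) followed by data_2 (label 1)."""
    K, K_prime = len(X_vuln), len(X_clean)
    rate = K / K_prime                     # web page imbalance rate
    alpha1 = 2 if rate > 2 else 3          # training set 0 (no vulnerability)
    alpha2 = 3 if rate > 2 else 2          # training set 1 (vulnerability)
    data1 = X_clean[differential_sample(X_clean, dis, alpha1)]
    data2 = X_vuln[differential_sample(X_vuln, dis, alpha2)]
    X = np.vstack([data1, data2])
    y = np.concatenate([np.zeros(len(data1), dtype=int), np.ones(len(data2), dtype=int)])
    return X, y
```

With the typical case of few vulnerable pages ($rate\le 2$), the non-vulnerable set is thinned more aggressively ($\alpha_1=3$) than the vulnerable set ($\alpha_2=2$), which reduces the imbalance in data'.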
Optionally, in step S3, optimizing the training of the constructed web page vulnerability detection model to obtain the optimized model comprises:
optimizing the training of the constructed model on the sampled training set data', where the parameters to be optimized are the weight matrices and biases of the convolution layers and the pooling units; the optimization training flow is as follows:
S31: let the current number of optimization training rounds of the model be d and the maximum number be $d_{max}$, with d initialized to 0;
S32: set the loss function $Loss(\theta_{d+1})$:
[Formula images: loss function $Loss(\theta_{d+1})$ and its associated definitions (three formulas)]
where:
$W_{d+1}$ is the loss function weight of the (d+1)-th optimization training round, with $W_0=K/K'$;
u denotes the u-th web page in the sampled training set data', and $n_{data'}$ is the total number of web pages in the sampled training set;
[Formula image: model output for the u-th web page]
denotes the web page vulnerability detection result output by the model with parameters $\theta_{d+1}$ when the web page vulnerability text feature of the u-th web page in data' is input;
$y_u\in\{0,1\}$: $y_u=0$ means the u-th web page in data' has no vulnerability, and $y_u=1$ means it has a vulnerability;
$\epsilon_d$ is the error rate of the model after the d-th optimization training round, calculated as the number of misdetected web pages divided by the total number of web pages in the sampled training set;
S33: set the learning rate to 0.01 and the maximum number of iterations $d_{max}$ to 200, and use the Adam optimizer; obtain the optimal model parameters through optimization training, and build the trained and optimized web page vulnerability detection model from them.
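The training loop can be sketched as below, with heavy assumptions because the loss formulas are images in the source: a logistic-regression scorer stands in for the CNN; the loss is a weighted binary cross-entropy in which the label-0 (non-vulnerable, majority) samples are weighted by $W_0=K/K'$; the weight is held fixed because the update rule for $W_{d+1}$ is not recoverable; and Adam is written out by hand with its standard defaults ($\beta_1=0.9$, $\beta_2=0.999$). The learning rate 0.01, $d_{max}=200$, and the error rate $\epsilon_d$ (misdetected pages divided by total pages) follow the text. Function names are hypothetical.

```python
import numpy as np


def weighted_bce(theta: np.ndarray, X: np.ndarray, y: np.ndarray, w0: float):
    """Weighted binary cross-entropy and its gradient (ASSUMED loss form)."""
    p = 1.0 / (1.0 + np.exp(-(X @ theta)))  # predicted probability of "vulnerable"
    sw = np.where(y == 0, w0, 1.0)          # ASSUMPTION: weight applied to label-0 samples
    eps = 1e-12
    loss = -np.mean(sw * (y * np.log(p + eps) + (1 - y) * np.log(1 - p + eps)))
    grad = X.T @ (sw * (p - y)) / len(y)
    return loss, grad


def error_rate(theta: np.ndarray, X: np.ndarray, y: np.ndarray) -> float:
    """epsilon_d: misdetected web pages / total web pages in data'."""
    pred = (X @ theta >= 0).astype(int)     # equivalent to p >= 0.5
    return float(np.mean(pred != y))


def train(X, y, K, K_prime, lr=0.01, d_max=200, b1=0.9, b2=0.999, eps=1e-8):
    theta = np.zeros(X.shape[1])
    m = np.zeros_like(theta)
    v = np.zeros_like(theta)
    w0 = K / K_prime                        # W_0 = K / K'
    for d in range(1, d_max + 1):
        _, g = weighted_bce(theta, X, y, w0)
        m = b1 * m + (1 - b1) * g           # Adam first-moment estimate
        v = b2 * v + (1 - b2) * g * g       # Adam second-moment estimate
        m_hat = m / (1 - b1 ** d)           # bias corrections
        v_hat = v / (1 - b2 ** d)
        theta -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, error_rate(theta, X, y)
```

In practice the same loop would update the convolution and pooling weights through a framework's automatic differentiation and built-in Adam optimizer rather than a hand-written update.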
Optionally, in step S4, collecting the web page vulnerability text features of the web page to be detected, inputting them into the optimized web page vulnerability detection model, and outputting whether the web page has a vulnerability comprises:
collecting the web page data stream of the web page to be detected, inputting it into the web page vulnerability text feature detection model to extract its web page vulnerability text features, and inputting these features into the optimized web page vulnerability detection model, which outputs whether the web page has a vulnerability: an output of 0 means the web page to be detected has no vulnerability, and an output of 1 means it has a vulnerability, in which case alarm information is generated.
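Step S4 reduces to a small piece of glue logic, sketched below. The feature extractor, the trained model, and the alarm channel are passed in as hypothetical callables, since the patent does not specify their interfaces; only the 0/1 convention and the alarm-on-1 rule come from the text.

```python
from typing import Callable, Sequence


def monitor_page(
    stream: str,
    extract_features: Callable[[str], Sequence],
    predict: Callable[[Sequence], int],
    raise_alarm: Callable[[str], None],
) -> int:
    """Extract features, classify, and raise an alarm when the label is 1."""
    features = extract_features(stream)
    label = predict(features)  # 0 = no vulnerability, 1 = vulnerability
    if label == 1:
        raise_alarm("web page vulnerability detected")
    return label


# Usage with stand-in callables: collect alarms in a list instead of a real alerting system
alarms: list[str] = []
monitor_page("POST /login ...", lambda s: [len(s)], lambda f: 1, alarms.append)
```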
To solve the above problems, the invention further provides a device for monitoring web page faults and web page vulnerabilities, comprising:
a feature extraction module, configured to extract the web page vulnerability text features of a web page based on the web page vulnerability text feature detection model;
a sampling device, configured to differentially sample the web page vulnerability text features of the web pages in the training set using the differential sampling strategy;
a web page detection module, configured to construct the web page vulnerability detection model, input the collected web page vulnerability text features into the optimized model, and output whether the web page to be detected has a vulnerability.
To solve the above problems, the invention further provides an electronic apparatus, comprising:
a memory storing at least one instruction; and
a processor that executes the instructions stored in the memory to implement the above monitoring method for coping with web page faults and web page vulnerabilities.
To solve the above problems, the invention further provides a computer-readable storage medium storing at least one instruction, which is executed by a processor in an electronic device to implement the above monitoring method for coping with web page faults and web page vulnerabilities.
Compared with the prior art, the monitoring method for coping with web page faults and web page vulnerabilities provided by the invention has the following advantages:
First, the scheme provides a web page vulnerability text feature detection model whose input is a web page data stream and whose output is the corresponding web page vulnerability text feature. Its detection flow is: set the length of the sliding window to N and slide a window of length N across the web page data stream, treating the data inside the window as word characters; count the occurrences of each word character and of each context in the data stream, where for consecutive word characters $w_{n-1}$, $w_n$ the occurrences of $w_n$ are $\mathrm{count}(w_n)$ and those of the context are $\mathrm{count}(w_{n-1}w_n)$; and calculate the frequency distribution of each word character, where the frequency distribution of $w_n$ is:
[Formula image: frequency distribution of word character $w_n$]
The m word characters with the largest frequency distribution in the data stream are then selected and one-hot encoded, and the encoding result is taken as the web page vulnerability text feature of the data stream. This works because the communication traffic generated by web page vulnerabilities contains distinctive characteristic fields, such as version identifiers, special fields, and an unusually large number of id-field requests and responses; examples include login, submit, params and seed_hash.
Second, the scheme provides a data sampling and model training method for imbalanced samples. The web page imbalance rate of the training set data is calculated as $rate=K/K'$, where K is the number of web pages with vulnerabilities and K' the number without. Within training set 0 (web pages without vulnerabilities) and within training set 1 (web pages with vulnerabilities), the Euclidean distance between the web page vulnerability text features of every two web pages is calculated, followed by the local density of each web page's vulnerability text feature. The local density of the web page vulnerability text feature of any web page $html_i(j)$ is calculated as:
[Formula image: local density of the web page vulnerability text feature of $html_i(j)$]
where dis is the distance threshold, and the formula counts the web page vulnerability text features in training set i whose distance from the feature of $html_i(j)$ is within dis. The web pages in training set 0 are sorted in descending order of local density and numbered 1, 2, 3, …, K'; if $rate>2$, the differential sampling coefficient is $\alpha_1=2$, otherwise $\alpha_1=3$; the web pages numbered $1+\alpha_1 s$ ($s=0,1,2,\dots$) form the differential sampling result $data_1$ of training set 0. The web pages in training set 1 are likewise sorted and numbered 1, 2, 3, …, K; if $rate>2$, $\alpha_2=3$, otherwise $\alpha_2=2$; the web pages numbered $1+\alpha_2 s$ ($s=0,1,2,\dots$) form the differential sampling result $data_2$ of training set 1. $data_1$ and $data_2$ together form the sampled training set data' for the web page vulnerability detection model. The constructed model is then optimized on data'; the parameters to be optimized are the weight matrices and biases of the model, and the optimization training flow is: let d be the current number of optimization training rounds and $d_{max}$ the maximum, with d initialized to 0; set the loss function $Loss(\theta_{d+1})$:
[Formula images: loss function $Loss(\theta_{d+1})$ and its associated definitions (three formulas)]
where: $W_{d+1}$ is the loss function weight of the (d+1)-th optimization training round, with $W_0=K/K'$; u denotes the u-th web page in the sampled training set data', and $n_{data'}$ is the total number of web pages in the sampled training set;
[Formula image: model output for the u-th web page]
denotes the web page vulnerability detection result output by the model with parameters $\theta_{d+1}$ when the web page vulnerability text feature of the u-th web page in data' is input; $y_u\in\{0,1\}$, where $y_u=0$ means the u-th web page in data' has no vulnerability and $y_u=1$ means it has a vulnerability; $\epsilon_d$ is the error rate of the model after the d-th optimization training round, calculated as the number of misdetected web pages divided by the total number of web pages in the sampled training set. The learning rate is set to 0.01, the maximum number of iterations $d_{max}$ to 200, and the optimizer is Adam; the optimal model parameters obtained through optimization training are used to build the trained and optimized web page vulnerability detection model. Because the scheme accounts for sample imbalance and differentially samples the web page vulnerability text features of the training-set web pages, it avoids the overfitting caused by the model depending too heavily on limited data samples and improves the detection accuracy of the web page vulnerability detection model. Since vulnerable web pages are far fewer than normal web pages in practice, conventional model training easily biases the trained model toward the normal samples; assigning different weights to sample data according to the imbalance rate shifts the model's attention toward the minority samples.
Drawings
Fig. 1 is a flowchart of a method for monitoring web page faults and web page vulnerabilities according to an embodiment of the present invention;
Fig. 2 is a flowchart illustrating one of the steps of the embodiment of Fig. 1;
Fig. 3 is a functional block diagram of a device for monitoring web page faults and web page vulnerabilities according to an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of an electronic device for implementing a method for monitoring web page faults and web page vulnerabilities according to an embodiment of the present invention.
The objects, functional features, and advantages of the present invention will be further described with reference to the embodiments and the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
An embodiment of the present application provides a monitoring method for coping with web page faults and web page vulnerabilities. The method may be executed by at least one of a server, a terminal, or another electronic device that can be configured to execute the method provided by the embodiments of the present application. In other words, the method may be executed by software or hardware installed in a terminal device or a server device, and the software may be a blockchain platform. The server side includes, but is not limited to, a single server, a server cluster, a cloud server, or a cloud server cluster.
Example 1:
S1: Collect web pages that contain vulnerabilities and web pages that do not, respectively, to form a training set; collect the web page data streams of the web pages in the training set; construct a web page vulnerability text feature detection model; and input the collected web page data streams into this model, which outputs the web page vulnerability text features. The web page vulnerability text feature detection model is implemented with an N-gram-based method for finding abnormal word transitions.
In step S1, collecting web pages that contain vulnerabilities and web pages that do not to form a training set comprises:
collecting web pages that contain vulnerabilities and web pages that do not, respectively, to form a training set data, wherein the training set contains $K + K'$ web pages in total: $K$ web pages containing vulnerabilities and $K'$ web pages containing no vulnerabilities;
the storage format of the web pages in the training set is as follows:
$data = \{html_1(1), \ldots, html_1(K), html_0(K+1), \ldots, html_0(K+K')\}$
wherein:
$html_1(K)$ denotes the $K$-th web page containing a vulnerability in the training set data, and $html_0(K+K')$ denotes the $K'$-th web page containing no vulnerability in the training set data.
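As a minimal sketch of this storage format, assuming each web page is held as a raw HTML string paired with its label $i$ (the `Page` type and the helper names are hypothetical, not from the source), the training set can be built with the vulnerable pages first and addressed with the text's 1-based index $j$:

```python
from typing import List, Tuple

# Hypothetical record: (label i, raw HTML of html_i(j)); i = 1 vulnerable, i = 0 not vulnerable.
Page = Tuple[int, str]


def build_training_set(vulnerable: List[str], clean: List[str]) -> List[Page]:
    """data = {html_1(1..K), html_0(K+1..K+K')}: vulnerable pages first, then clean ones."""
    return [(1, h) for h in vulnerable] + [(0, h) for h in clean]


def page_at(data: List[Page], j: int) -> Page:
    """Return html_i(j) using the 1-based index j used in the text."""
    if not 1 <= j <= len(data):
        raise IndexError(f"j must be in 1..{len(data)}")
    return data[j - 1]
```

With $K = 2$ and $K' = 1$, `page_at(data, 3)` returns the single clean page, i.e. $html_0(K+1)$.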
In step S1, collecting the web page data streams comprises:
constructing a virtual system on a server and using the constructed virtual system as a sandbox;
running each web page in the sandbox and collecting the web page data stream that the virtual system in the sandbox generates while running the web page; the web page data stream of the $j$-th web page $html_i(j)$ in the training set is denoted $flow_j$, where $i \in \{0, 1\}$;
In detail, referring to Fig. 2, the web page data stream is collected as follows:
S11: set up a monitoring program in the sandbox;
S12: start the monitoring program when a web page is run in the sandbox;
S13: the monitoring program requests, in POST mode, the communication message data generated while the web page runs, and uses this data as the web page data stream.
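One possible reading of S13 is that the monitor keeps the POST-mode communication messages captured during a page run and joins them into the data stream $flow_j$. The sketch below assumes that reading; the `CapturedMessage` record and the way the sandbox produces it are hypothetical, and real traffic capture is out of scope:

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class CapturedMessage:
    """Hypothetical record emitted by the sandbox monitoring program."""
    method: str
    url: str
    body: str


def collect_data_stream(messages: Iterable[CapturedMessage]) -> str:
    """Join the bodies of POST messages, in capture order, into the data stream flow_j."""
    return "\n".join(m.body for m in messages if m.method.upper() == "POST")
```

Keeping capture order matters because the later N-gram step looks at consecutive windows of this stream.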
In step S1, constructing the web page vulnerability text feature detection model comprises:
constructing a web page vulnerability text feature detection model whose input is a web page data stream and whose output is the web page vulnerability text features corresponding to that data stream;
the detection flow of the web page vulnerability text feature detection model is as follows:
setting the sliding-window length for the web page data stream to $N$, traversing the web page data stream with a sliding window of length $N$, and treating the data inside the window as a word character;
counting the number of occurrences of each word character in the web page data stream and the number of occurrences of its context (the preceding word character followed by it): for consecutive word characters $w_{n-1}, w_n$, the number of occurrences of $w_n$ is $count(w_n)$ and the number of occurrences of the context pair is $count(w_{n-1} w_n)$;
calculating the frequency distribution of each word character; the frequency distribution of word character $w_n$ is:
[Formula image: frequency distribution of $w_n$]
selecting the $m$ word characters with the largest frequency distribution in the web page data stream, one-hot encoding the selected word characters, and using the encoding result as the web page vulnerability text features corresponding to the web page data stream.
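The frequency-distribution formula appears only as an image, so the sketch below assumes the standard bigram estimate $count(w_{n-1} w_n) / count(w_{n-1})$ and scores each word character by the largest such value observed for it. It slides a length-$N$ window one character at a time, keeps the top $m$ word characters, and one-hot encodes them against a hypothetical fixed vocabulary `vocab`. The window step, the scoring rule, the tie-breaking and the vocabulary are all assumptions:

```python
from collections import Counter
from typing import Dict, List


def window_tokens(stream: str, n: int) -> List[str]:
    """Slide a length-n window over the stream one character at a time."""
    return [stream[k:k + n] for k in range(len(stream) - n + 1)]


def token_scores(tokens: List[str]) -> Dict[str, float]:
    """ASSUMED score: the largest bigram estimate count(w_{n-1} w_n) / count(w_{n-1})
    observed for each token w_n (0.0 if it never follows another token)."""
    unigram = Counter(tokens)
    bigram = Counter(zip(tokens, tokens[1:]))
    scores = {w: 0.0 for w in unigram}
    for (prev, cur), c in bigram.items():
        scores[cur] = max(scores[cur], c / unigram[prev])
    return scores


def one_hot_features(stream: str, n: int, m: int, vocab: List[str]) -> List[List[int]]:
    """One-hot rows over the hypothetical `vocab` for the m highest-scoring tokens."""
    scores = token_scores(window_tokens(stream, n))
    top = sorted(scores, key=lambda w: (-scores[w], w))[:m]  # ties broken alphabetically
    index = {w: k for k, w in enumerate(vocab)}
    rows = []
    for w in top:
        row = [0] * len(vocab)
        if w in index:  # tokens outside the vocabulary map to an all-zero row
            row[index[w]] = 1
        rows.append(row)
    return rows
```

For the stream `"abab"` with $N = 2$, the windows are `ab`, `ba`, `ab`; `ba` follows `ab` in one of two cases (score 0.5) and `ab` always follows `ba` (score 1.0), so `ab` is ranked first.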
In the embodiment of the present invention, the communication traffic generated by web page vulnerabilities contains distinctive characteristic fields, including version identifiers, special fields, and a larger number of id-field requests and responses; examples of such characteristic fields are login, submit, params, and seed_hash.
In step S1, inputting the collected web page data streams into the web page vulnerability text feature detection model, which outputs the web page vulnerability text features, comprises:
inputting the collected web page data streams into the web page vulnerability text feature detection model one by one, the model outputting the web page vulnerability text features; the web page vulnerability text feature of any web page $html_i(j)$ in the training set data is:
[Formula image: web page vulnerability text feature of $html_i(j)$]
S2: Construct a web page vulnerability detection model that takes web page vulnerability text features as input and outputs a web page vulnerability detection result.
In step S2, constructing the web page vulnerability detection model comprises:
constructing a web page vulnerability detection model that takes the web page vulnerability text features of a web page data stream as input and outputs a web page vulnerability detection result;
the web page vulnerability detection model comprises an input layer, convolution layers, a pooling layer, and a fully connected layer;
the web page detection flow of the web page vulnerability detection model is as follows:
S21: input the web page vulnerability text features into the input layer;
S22: the input layer passes the received web page vulnerability text features to the convolution layers. The model contains two convolution layers: convolution layer 1 has a kernel of size $c_1 \times z_1$ and convolution depth $dp_1$, convolution layer 2 has a kernel of size $c_2 \times z_2$ and convolution depth $dp_2$, and both layers use the convolution stride step. The convolution formulas of the two layers are:
$F_1 = \sigma(W_1 f + b_1)$
$F_2 = \sigma(W_2 F_1 + b_2)$
wherein:
$f$ is the web page vulnerability text feature;
$\sigma(\cdot)$ denotes the activation function; in the embodiment of the present invention, the ReLU activation function is selected;
$W_1$ and $b_1$ are the weight matrix and bias of convolution layer 1, and $W_2$ and $b_2$ are the weight matrix and bias of convolution layer 2;
$F_1$ denotes the output of convolution layer 1 and $F_2$ the output of convolution layer 2; the output of convolution layer 2 is passed to the pooling layer;
S23: the pooling layer contains $T$ pooling units connected in series. The first pooling unit receives $F_2$ and performs a pooling operation on it; each pooling unit passes its pooling result together with $F_2$ to the next pooling unit, and the output of the last pooling unit is the pooling result $h_{max}$ of the pooling layer, which is input into the fully connected layer;
the pooling formula of the $t$-th pooling unit is:
$h_t = \sigma(W_t [h_{t-1}, F_2]) \cdot \tanh(h_{t-1})$
wherein:
$W_t$ is the weight matrix of the $t$-th pooling unit, $t \in [1, T]$;
$h_t$ is the pooling result of the $t$-th pooling unit;
S24: the fully connected layer receives the pooling result $h_{max}$, uses the softmax function to calculate the probabilities that the pooling result belongs to class 0 and to class 1, and outputs the class with the larger probability as the web page vulnerability detection result, where 0 indicates that the web page has no vulnerability and 1 indicates that the web page has a vulnerability.
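Steps S21 to S24 can be traced with a small pure-Python sketch. It assumes a single input channel (depth 1), treats the convolution as the "valid" cross-correlation used by most CNN libraries, applies $\sigma$ = ReLU in the pooling units as the text defines $\sigma$, flattens $F_2$ into a vector before the pooling layer, and reads "·" as an elementwise product. Function names are illustrative only:

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def relu(x: float) -> float:
    """sigma(.) in the text: the ReLU activation."""
    return x if x > 0 else 0.0


def conv2d_relu(f: Matrix, kernel: Matrix, bias: float, step: int = 1) -> Matrix:
    """One output channel of F = sigma(W f + b): valid cross-correlation with stride `step`."""
    kh, kw = len(kernel), len(kernel[0])
    out: Matrix = []
    for r in range(0, len(f) - kh + 1, step):
        row = []
        for c in range(0, len(f[0]) - kw + 1, step):
            acc = bias + sum(kernel[i][j] * f[r + i][c + j]
                             for i in range(kh) for j in range(kw))
            row.append(relu(acc))
        out.append(row)
    return out


def pooling_layer(weights: List[Matrix], h0: Vector, F2: Vector) -> Vector:
    """T chained units: h_t = sigma(W_t [h_{t-1}, F2]) * tanh(h_{t-1}), elementwise."""
    h = h0
    for W_t in weights:  # each W_t has len(h) rows and len(h) + len(F2) columns
        concat = h + F2  # [h_{t-1}, F_2]
        gate = [relu(sum(w * x for w, x in zip(row, concat))) for row in W_t]
        h = [g * math.tanh(hp) for g, hp in zip(gate, h)]
    return h  # h_max: output of the last pooling unit


def softmax(z: Vector) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def detect(W_fc: Matrix, b_fc: Vector, h_max: Vector) -> int:
    """Fully connected layer + softmax over {0, 1}; return the more probable class."""
    logits = [sum(w * x for w, x in zip(row, h_max)) + b for row, b in zip(W_fc, b_fc)]
    p = softmax(logits)
    return 0 if p[0] >= p[1] else 1
```

For one channel, `F1 = conv2d_relu(f, W1, b1, step)` followed by `F2 = conv2d_relu(F1, W2, b2, step)` mirrors the two convolution formulas. Note that the pooling formula as written needs a nonzero $h_0$: if $h_0$ is all zeros, $\tanh(h_0) = 0$ and every $h_t$ stays zero. The text does not state how $h_0$ is initialized, so the sketch takes it as an argument.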
S3: Apply a differential sampling strategy to the web page vulnerability text features in the training set, and perform optimization training on the constructed web page vulnerability detection model to obtain an optimized web page vulnerability detection model.
In step S3, differentially sampling the web page vulnerability text features of the web pages in the training set with the differential sampling strategy comprises:
calculating the web page imbalance rate $rate = K/K'$ of the training set data, where $K$ is the number of web pages containing vulnerabilities and $K'$ is the number of web pages containing no vulnerabilities in the training set data;
calculating, within training set 0 and within training set 1 of the training set data respectively, the distance between the web page vulnerability text features of any two web pages, where training set 0 is the set of web pages in data that contain no vulnerability and training set 1 is the set of web pages in data that contain a vulnerability, and calculating the local density of the web page vulnerability text feature of each web page; the distance is the Euclidean distance. For any web page $html_i(j)$, the local density of its web page vulnerability text feature is calculated by:
[Formula image: local density of the web page vulnerability text feature of $html_i(j)$]
wherein the formula involves the local density of that feature, the distance threshold dis, and the number of web page vulnerability text features in training set $i$ whose distance to that feature is within dis;
sorting the web pages in training set 0 in descending order of the local density of their web page vulnerability text features and numbering the sorted web pages $1, 2, 3, \ldots, K'$; if $rate > 2$, setting the differential sampling coefficient $\alpha_1$ to 2, otherwise setting $\alpha_1$ to 3; and sampling the web pages numbered $1 + \alpha_1 s$, where $s = 0, 1, 2, \ldots$, as the differential sampling result $data_1$ of training set 0;
sorting the web pages in training set 1 in descending order of the local density of their web page vulnerability text features and numbering the sorted web pages $1, 2, 3, \ldots, K$; if $rate > 2$, setting the differential sampling coefficient $\alpha_2$ to 3, otherwise setting $\alpha_2$ to 2; and sampling the web pages numbered $1 + \alpha_2 s$, where $s = 0, 1, 2, \ldots$, as the differential sampling result $data_2$ of training set 1;
combining the differential sampling results $data_1$ and $data_2$ into the sampled training set data' of the web page vulnerability detection model.
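The differential sampling procedure above can be sketched as follows, assuming each web page vulnerability text feature has been flattened into a numeric vector. Two details are assumptions: the local density here counts the other features of the same training subset within Euclidean distance dis (excluding the feature itself, which adds the same 1 to every count and so does not change the ranking), and ties in density keep their input order. The 1-based numbers $1 + \alpha s$ correspond to every $\alpha$-th element of the ranking, starting with the first:

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def local_density(features: Sequence[Vector], dis: float) -> List[int]:
    """For each feature, count the OTHER features of the same subset within distance dis."""
    dens = [0] * len(features)
    for a, fa in enumerate(features):
        for b, fb in enumerate(features):
            if a != b and math.dist(fa, fb) <= dis:
                dens[a] += 1
    return dens


def sampling_coefficients(K: int, K_prime: int) -> Tuple[int, int]:
    """(alpha_1 for training set 0, alpha_2 for training set 1) from rate = K / K'."""
    rate = K / K_prime
    return (2, 3) if rate > 2 else (3, 2)


def differential_sample(features: Sequence[Vector], dis: float, alpha: int) -> List[Vector]:
    """Rank by local density (descending), number 1..n, keep numbers 1 + alpha * s."""
    dens = local_density(features, dis)
    order = sorted(range(len(features)), key=lambda k: -dens[k])  # stable sort
    ranked = [features[k] for k in order]
    return ranked[::alpha]  # 1-based 1, 1+alpha, ... == 0-based 0, alpha, ...


def build_sampled_set(set0: Sequence[Vector], set1: Sequence[Vector],
                      dis: float) -> List[Tuple[Vector, int]]:
    """data' = data_1 (label 0) + data_2 (label 1); set1 holds the K vulnerable pages."""
    alpha1, alpha2 = sampling_coefficients(len(set1), len(set0))
    data1 = differential_sample(set0, dis, alpha1)
    data2 = differential_sample(set1, dis, alpha2)
    return [(x, 0) for x in data1] + [(x, 1) for x in data2]
```

With $K = 2$ and $K' = 6$, $rate = 1/3$, so $\alpha_1 = 3$ keeps ranks 1 and 4 of training set 0, and $\alpha_2 = 2$ keeps rank 1 of training set 1.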
In step S3, performing optimization training on the constructed web page vulnerability detection model to obtain the optimized web page vulnerability detection model comprises:
performing optimization training on the constructed web page vulnerability detection model based on the sampled training set data', where the parameters to be trained are the weight matrices and biases of the convolution layers and pooling units of the model; the optimization training flow of the web page vulnerability detection model is as follows:
S31: let $d$ denote the current number of optimization training rounds of the web page vulnerability detection model and $d_{max}$ the maximum number of training rounds, with $d$ initialized to 0;
S32: define the loss function $Loss(\theta_{d+1})$:
[Formula images: loss function $Loss(\theta_{d+1})$ and associated definitions]
Wherein:
$W_{d+1}$ denotes the loss-function weight of the $(d+1)$-th optimization training round, with $W_0 = K/K'$;
$u$ denotes the $u$-th web page in the sampled training set data', and $n_{data'}$ denotes the total number of web pages in the sampled training set;
[Formula image: detection result output by the model for web page $u$]
denotes the web page vulnerability detection result output by the web page vulnerability detection model with parameters $\theta_{d+1}$ when the web page vulnerability text features of the $u$-th web page in data' are input;
$y_u \in \{0, 1\}$, where $y_u = 0$ indicates that the $u$-th web page in data' contains no vulnerability and $y_u = 1$ indicates that it contains a vulnerability;
$\varepsilon_d$ denotes the error rate of the web page vulnerability detection model after the $d$-th optimization training round, calculated as the number of incorrectly detected web pages divided by the total number of web pages in the sampled training set;
S33: set the learning rate to 0.01 and the maximum number of iterations $d_{max}$ to 200, and use the Adam optimizer for parameter training; obtain the optimal model parameters through optimization training, and construct the trained and optimized web page vulnerability detection model from them.
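Because the loss function and the weight update appear only as images, the sketch below covers only what the text states explicitly: the hyperparameters of S33, the initial weight $W_0 = K/K'$, and the error rate $\varepsilon_d$. All names are hypothetical, and `optimizer` is only a label, not bound to any framework:

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TrainingConfig:
    """Hyperparameters stated in S33."""
    learning_rate: float = 0.01
    d_max: int = 200         # maximum number of optimization training rounds
    optimizer: str = "Adam"  # label only; no framework is bound here


def initial_loss_weight(K: int, K_prime: int) -> float:
    """W_0 = K / K' (vulnerable pages over non-vulnerable pages)."""
    return K / K_prime


def error_rate(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """epsilon_d = incorrectly detected pages / total pages in data'."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equally long")
    wrong = sum(1 for p, y in zip(predictions, labels) if p != y)
    return wrong / len(labels)
```

For instance, one wrong prediction out of four pages gives $\varepsilon_d = 0.25$.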
S4: Collect the web page data to be detected, input it into the web page vulnerability text feature detection model to extract the web page vulnerability text features, and input the extracted features into the optimized web page vulnerability detection model, which outputs whether the web page to be detected has a web page vulnerability; if it does, generate an alarm.
In step S4, collecting the web page vulnerability text features of the web page to be detected, inputting them into the optimized web page vulnerability detection model, and having the model output whether the web page to be detected has a vulnerability comprises:
collecting the web page data stream of the web page to be detected; inputting this data stream into the web page vulnerability text feature detection model to extract the web page vulnerability text features; and inputting these features into the optimized web page vulnerability detection model, which outputs the detection result. An output of 0 indicates that the web page to be detected has no vulnerability; an output of 1 indicates that it has a vulnerability, in which case alarm information is generated.
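The S4 flow can be sketched as a thin function that wires together the two trained models and an alarm callback. The three callables are hypothetical placeholders for the web page vulnerability text feature detection model, the optimized web page vulnerability detection model, and whatever alarm channel a deployment uses:

```python
from typing import Callable, List

Features = List[List[int]]


def monitor_page(data_stream: str,
                 extract: Callable[[str], Features],
                 model: Callable[[Features], int],
                 alarm: Callable[[str], None]) -> int:
    """Extract features, run the detector, and raise an alarm when the result is 1."""
    features = extract(data_stream)  # text feature detection model
    result = model(features)         # optimized vulnerability detection model
    if result not in (0, 1):
        raise ValueError(f"unexpected detection result: {result!r}")
    if result == 1:
        alarm("web page vulnerability detected")
    return result
```

Passing the models in as callables keeps the monitoring loop independent of how either model is implemented or loaded.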
Example 2:
Fig. 3 is a functional block diagram of a device for monitoring web page faults and web page vulnerabilities according to an embodiment of the present invention; the device can implement the monitoring method for web page faults and web page vulnerabilities of Example 1.
The monitoring device 100 for coping with web page faults and web page vulnerabilities can be installed in an electronic device. According to the functions it implements, the device may include a feature extraction module 101, a sampling device 102, and a web page detection module 103. A module in the present invention, which may also be called a unit, is a series of computer program segments that are stored in the memory of the electronic device and can be executed by the processor of the electronic device to perform a fixed function.
The feature extraction module 101 is configured to extract a web page vulnerability text feature of a web page based on a web page vulnerability text feature detection model;
the sampling device 102 is used for differentially sampling the webpage vulnerability text characteristics of the webpage in the training set by utilizing a differential sampling strategy;
the web page detection module 103 is configured to construct a web page vulnerability detection model, input the collected text features of the web page vulnerability to the optimized web page vulnerability detection model, and output a detection result of whether the web page to be detected has a vulnerability.
In detail, the modules of the monitoring device 100 for coping with web page faults and web page vulnerabilities in the embodiment of the present invention use the same technical means as the monitoring method described with reference to Fig. 1 and produce the same technical effects, which are not described again here.
Example 3:
fig. 4 is a schematic structural diagram of an electronic device for implementing a method for monitoring web page faults and web page vulnerabilities according to an embodiment of the present invention.
The electronic device 1 may comprise a processor 10, a memory 11 and a bus, and may further comprise a computer program, such as program 12, stored in the memory 11 and executable on the processor 10.
The memory 11 includes at least one type of readable storage medium, including flash memory, a mobile hard disk, a multimedia card, a card memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, etc. The memory 11 may in some embodiments be an internal storage unit of the electronic device 1, such as a removable hard disk of the electronic device 1. The memory 11 may in other embodiments also be an external storage device of the electronic device 1, such as a plug-in mobile hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the electronic device 1. Further, the memory 11 may also include both an internal storage unit and an external storage device of the electronic device 1. The memory 11 may be used not only for storing application software installed in the electronic device 1 and various data, such as codes of the program 12, but also for temporarily storing data that has been output or is to be output.
The processor 10 may in some embodiments consist of integrated circuits, for example a single packaged integrated circuit, or of multiple packaged integrated circuits with the same or different functions, including one or more central processing units (CPUs), microprocessors, digital processing chips, graphics processors, combinations of various control chips, and the like. The processor 10 is the control unit of the electronic device: it connects the components of the entire electronic device through various interfaces and lines, runs or executes the programs or modules stored in the memory 11 (for example, the program 12 for web page vulnerability monitoring), and calls the data stored in the memory 11 to perform the various functions of the electronic device 1 and process data.
The bus may be a peripheral component interconnect standard (peripheral component interconnect, PCI) bus or an extended industry standard architecture (extended industry standard architecture, EISA) bus, among others. The bus may be classified as an address bus, a data bus, a control bus, etc. The bus is arranged to enable a connection communication between the memory 11 and at least one processor 10 etc.
Fig. 4 shows only an electronic device with some of its components. A person skilled in the art will understand that the structure shown in Fig. 4 does not limit the electronic device 1, which may include fewer or more components than shown, combine certain components, or use a different arrangement of components.
For example, although not shown, the electronic device 1 may further include a power source (such as a battery) for supplying power to each component, and preferably, the power source may be logically connected to the at least one processor 10 through a power management device, so that functions of charge management, discharge management, power consumption management, and the like are implemented through the power management device. The power supply may also include one or more of any of a direct current or alternating current power supply, recharging device, power failure detection circuit, power converter or inverter, power status indicator, etc. The electronic device 1 may further include various sensors, bluetooth modules, wi-Fi modules, etc., which will not be described herein.
Further, the electronic device 1 may also comprise a network interface, optionally the network interface may comprise a wired interface and/or a wireless interface (e.g. WI-FI interface, bluetooth interface, etc.), typically used for establishing a communication connection between the electronic device 1 and other electronic devices.
The electronic device 1 may optionally further include a user interface, which may comprise a display and an input unit such as a keyboard; the user interface may also include a standard wired interface or a wireless interface. Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch display, or the like. The display may also be called a display screen or display unit, and is used to display information processed in the electronic device 1 and to present a visual user interface.
It should be understood that the described embodiments are for illustration only, and the scope of the patent application is not limited to this structure.
The program 12 stored in the memory 11 of the electronic device 1 is a combination of instructions that, when executed in the processor 10, may implement:
Collecting web pages containing vulnerabilities and web pages containing no vulnerabilities, respectively, to form a training set; collecting the web page data streams of the web pages in the training set; constructing a web page vulnerability text feature detection model; and inputting the collected web page data streams into the web page vulnerability text feature detection model, which outputs the web page vulnerability text features;
constructing a webpage vulnerability detection model, wherein the webpage vulnerability detection model takes webpage vulnerability text characteristics as input and takes a webpage vulnerability detection result as output;
performing differential sampling on webpage vulnerability text features in a training set by utilizing a differential sampling strategy, and performing optimization training on the constructed webpage vulnerability detection model to obtain an optimized webpage vulnerability detection model;
collecting webpage data to be detected, inputting the webpage data to be detected into a webpage vulnerability text feature detection model, extracting webpage vulnerability text features, inputting the extracted webpage vulnerability text features into an optimized webpage vulnerability detection model, outputting whether the webpage to be detected has webpage vulnerabilities by the optimized webpage vulnerability detection model, and generating an alarm if the webpage to be detected has webpage vulnerabilities.
Specifically, the specific implementation method of the above instruction by the processor 10 may refer to descriptions of related steps in the corresponding embodiments of fig. 1 to 4, which are not repeated herein.
It should be noted that the serial numbers of the foregoing embodiments of the present invention are for description only and do not indicate that any embodiment is better or worse than another. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, so that a process, apparatus, article, or method that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, apparatus, article, or method that comprises that element.
From the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the preferred implementation. Based on this understanding, the technical solution of the present invention, or the part of it that contributes to the prior art, may be embodied as a software product stored in a storage medium (e.g., ROM/RAM, magnetic disk, or optical disk) as described above, comprising instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the method according to the embodiments of the present invention.
The foregoing description covers only preferred embodiments of the present invention and does not limit the scope of the invention. Any equivalent structure or equivalent process transformation made using the contents of this specification and the accompanying drawings, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (7)

1. A method for monitoring web page faults and web page vulnerabilities, the method comprising:
S1: collecting web pages containing vulnerabilities and web pages containing no vulnerabilities, respectively, to form a training set; collecting the web page data streams of the web pages in the training set; constructing a web page vulnerability text feature detection model; and inputting the collected web page data streams into the web page vulnerability text feature detection model, which outputs the web page vulnerability text features;
S2: constructing a web page vulnerability detection model, wherein the web page vulnerability detection model takes web page vulnerability text features as input and web page vulnerability detection results as output, and comprises an input layer, convolution layers, a pooling layer, and a fully connected layer;
the webpage detection flow of the webpage vulnerability detection model is as follows:
S21: inputting the web page vulnerability text features into the input layer;
S22: the input layer passing the received web page vulnerability text features to the convolution layers, wherein the web page vulnerability detection model contains two convolution layers: convolution layer 1 has a kernel of size $c_1 \times z_1$ and convolution depth $dp_1$, convolution layer 2 has a kernel of size $c_2 \times z_2$ and convolution depth $dp_2$, and both layers use the convolution stride step; the convolution formulas of the two layers are:
$F_1 = \sigma(W_1 f + b_1)$
$F_2 = \sigma(W_2 F_1 + b_2)$
wherein:
$f$ is the web page vulnerability text feature;
$\sigma(\cdot)$ denotes the activation function; in the embodiment of the present invention, the ReLU activation function is selected;
$W_1$ and $b_1$ are the weight matrix and bias of convolution layer 1, and $W_2$ and $b_2$ are the weight matrix and bias of convolution layer 2;
$F_1$ denotes the output of convolution layer 1 and $F_2$ the output of convolution layer 2; the output of convolution layer 2 is passed to the pooling layer;
S23: the pooling layer containing $T$ pooling units connected in series, wherein the first pooling unit receives $F_2$ and performs a pooling operation on it, each pooling unit passes its pooling result together with $F_2$ to the next pooling unit, and the output of the last pooling unit is the pooling result $h_{max}$ of the pooling layer, which is input into the fully connected layer;
the pooling formula of the t-th pooling unit is as follows:
$h_t = \sigma(W_t [h_{t-1}, F_2]) \cdot \tanh(h_{t-1})$
wherein:
$W_t$ is the weight matrix of the $t$-th pooling unit, $t \in [1, T]$;
$h_t$ is the pooling result of the $t$-th pooling unit;
S24: the fully connected layer receiving the pooling result $h_{max}$, calculating with the softmax function the probabilities that the pooling result belongs to class 0 and to class 1, and outputting the class with the larger probability as the web page vulnerability detection result, wherein 0 indicates that the web page has no vulnerability and 1 indicates that the web page has a vulnerability;
S3: performing differential sampling on the web page vulnerability text features in the training set by using a differential sampling strategy, and performing optimization training on the constructed web page vulnerability detection model to obtain an optimized web page vulnerability detection model, wherein the differential sampling strategy comprises:
calculating the web page imbalance rate $rate = K/K'$ of the training set data, wherein $K$ is the number of web pages containing vulnerabilities and $K'$ is the number of web pages containing no vulnerabilities in the training set data;
calculating, within training set 0 and within training set 1 of the training set data respectively, the distance between the web page vulnerability text features of any two web pages, wherein training set 0 is the set of web pages in data that contain no vulnerability and training set 1 is the set of web pages in data that contain a vulnerability, and calculating the local density of the web page vulnerability text feature of each web page, the distance being the Euclidean distance; for any web page $html_i(j)$, the local density of its web page vulnerability text feature is calculated by:
[Formula image: local density of the web page vulnerability text feature of $html_i(j)$]
wherein the formula involves the local density of that feature, the distance threshold dis, and the number of web page vulnerability text features in training set $i$ whose distance to that feature is within dis;
arranging the web pages in training set 0 in descending order of the local density of their web page vulnerability text features and numbering the arranged web pages 1, 2, 3, …, K'; if rate > 2, setting the differential sampling coefficient α_1 to 2, otherwise setting α_1 to 3; taking the web pages numbered 1 + α_1·s as the differential sampling result data_1 of training set 0, wherein s = 0, 1, 2, …;
arranging the web pages in training set 1 in descending order of the local density of their web page vulnerability text features and numbering the arranged web pages 1, 2, 3, …, K; if rate > 2, setting the differential sampling coefficient α_2 to 3, otherwise setting α_2 to 2; taking the web pages numbered 1 + α_2·s as the differential sampling result data_2 of training set 1, wherein s = 0, 1, 2, …;
taking the differential sampling results data_1 and data_2 together as the sampling training set data' of the web page vulnerability detection model;
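The differential sampling strategy can be sketched in Python as follows, under stated assumptions. The claim gives the local density formula only as an image, so the sketch assumes the density of a feature is the plain count of other features in the same training set whose Euclidean distance to it is at most dis; the inclusive boundary, the function names and the toy feature vectors are likewise hypothetical. The coefficient choice ((α_1, α_2) = (2, 3) when rate = K/K' > 2, and (3, 2) otherwise) and the selection of the pages numbered 1 + α·s after sorting by descending density follow the text.

```python
import math
from typing import List, Sequence, Tuple

Feature = Sequence[float]


def euclidean(a: Feature, b: Feature) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def local_densities(features: List[Feature], dis: float) -> List[int]:
    """Assumed density: number of other features in the same set within dis (inclusive)."""
    n = len(features)
    dens = [0] * n
    for a in range(n):
        for b in range(a + 1, n):
            if euclidean(features[a], features[b]) <= dis:
                dens[a] += 1
                dens[b] += 1
    return dens


def sampling_coefficients(k_vuln: int, k_clean: int) -> Tuple[int, int]:
    """Return (alpha_1, alpha_2) from the imbalance rate rate = K / K'."""
    rate = k_vuln / k_clean
    return (2, 3) if rate > 2 else (3, 2)


def differential_sample(features: List[Feature], dis: float, alpha: int) -> List[Feature]:
    """Sort by local density (descending), number 1, 2, 3, ..., keep numbers 1 + alpha * s."""
    dens = local_densities(features, dis)
    # sorted() with reverse=True stays stable, so ties keep their original order
    order = sorted(range(len(features)), key=lambda k: dens[k], reverse=True)
    ranked = [features[k] for k in order]
    return ranked[::alpha]  # number 1 + alpha*s  <->  0-based index alpha*s


def build_sampled_training_set(set0: List[Feature], set1: List[Feature], dis: float) -> List[Feature]:
    alpha_1, alpha_2 = sampling_coefficients(len(set1), len(set0))
    data_1 = differential_sample(set0, dis, alpha_1)  # training set 0: no vulnerability
    data_2 = differential_sample(set1, dis, alpha_2)  # training set 1: vulnerability
    return data_1 + data_2                            # sampling training set data'


set0 = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [9, 9]]  # hypothetical features
set1 = [[2, 2], [2, 3], [8, 8]]
sampled = build_sampled_training_set(set0, set1, dis=1.0)
print(sampled)
```

Numbering from 1 means the page numbered 1 + α·s sits at 0-based index α·s, so the slice `ranked[::alpha]` keeps the densest page and then every α-th page after it.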
performing optimization training on the constructed web page vulnerability detection model to obtain the optimized web page vulnerability detection model comprises the following steps:
performing optimization training on the constructed web page vulnerability detection model based on the sampling training set data', wherein the parameters to be optimized are the weight matrices and biases of the convolution layer and of the pooling units in the web page vulnerability detection model, and the optimization training flow of the web page vulnerability detection model is as follows:
S31: setting the current number of optimization training rounds of the web page vulnerability detection model as d and the maximum number of training rounds as d_max, wherein the initial value of d is 0;
S32: setting the loss function Loss(θ_{d+1}):
[Formula images QLYQS_7, QLYQS_8 and QLYQS_9]
wherein:
W_{d+1} represents the loss function weight of the (d+1)-th optimization training, and W_0 = K/K';
u denotes the u-th web page in the sampling training set data', and n_data' represents the total number of web pages in the sampling training set;
ŷ_u represents the web page vulnerability detection result output by the web page vulnerability detection model with parameters θ_{d+1} when the web page vulnerability text feature of the u-th web page in the sampling training set data' is input into it;
y_u ∈ {0, 1}: y_u = 0 indicates that the u-th web page in the sampling training set data' does not contain vulnerabilities, and y_u = 1 indicates that it contains vulnerabilities;
ε_d represents the error rate of the web page vulnerability detection model after the d-th optimization training, calculated as: number of incorrectly detected web pages / total number of web pages in the sampling training set;
S33: setting the learning rate to 0.01 and the maximum number of iterations d_max to 200, using the Adam optimizer as the parameter training optimizer, obtaining the optimal model parameters through optimization training, and constructing the trained and optimized web page vulnerability detection model based on the optimal model parameters;
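Two quantities in S32 and S33 are fully specified in the text, the initial loss weight W_0 = K/K' and the error rate ε_d, and the S33 optimizer settings (Adam, learning rate 0.01, d_max = 200) can be shown on a toy problem. The loss function itself appears only as formula images, so this sketch does not reproduce it: the Adam loop below minimizes a hypothetical one-parameter objective (θ − 0.5)² instead, and β1 = 0.9, β2 = 0.999, ε = 1e-8 are the commonly used Adam defaults, assumed here because the text does not state them.

```python
import math
from typing import Callable, Sequence


def initial_loss_weight(k_vuln: int, k_clean: int) -> float:
    """W_0 = K / K' (vulnerable pages over pages without vulnerabilities)."""
    return k_vuln / k_clean


def error_rate(predictions: Sequence[int], labels: Sequence[int]) -> float:
    """epsilon_d = wrongly detected pages / total pages in the sampling training set."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and of equal length")
    return sum(p != y for p, y in zip(predictions, labels)) / len(labels)


def adam_minimize(grad: Callable[[float], float], theta: float,
                  lr: float = 0.01, d_max: int = 200,
                  beta1: float = 0.9, beta2: float = 0.999, eps: float = 1e-8) -> float:
    """Run d_max Adam steps on one scalar parameter and return the final value."""
    m = v = 0.0
    for t in range(1, d_max + 1):
        g = grad(theta)
        m = beta1 * m + (1 - beta1) * g        # first-moment (mean) estimate
        v = beta2 * v + (1 - beta2) * g * g    # second-moment estimate
        m_hat = m / (1 - beta1 ** t)           # bias-corrected moments
        v_hat = v / (1 - beta2 ** t)
        theta -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta


# Hypothetical toy objective standing in for Loss(theta): (theta - 0.5) ** 2
def toy_loss(theta: float) -> float:
    return (theta - 0.5) ** 2


def toy_grad(theta: float) -> float:
    return 2 * (theta - 0.5)


theta_final = adam_minimize(toy_grad, theta=0.0)
print(initial_loss_weight(3, 6), error_rate([1, 0, 1, 1], [1, 0, 0, 1]), theta_final)
```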
S4: collecting the web page data to be detected, inputting it into the web page vulnerability text feature detection model to extract the web page vulnerability text features, and inputting the extracted features into the optimized web page vulnerability detection model, which outputs whether the web page to be detected has a web page vulnerability; if it does, an alarm is generated.
2. The method for monitoring web page faults and web page vulnerabilities according to claim 1, wherein collecting, in step S1, web pages containing vulnerabilities and web pages not containing vulnerabilities to form a training set comprises:
collecting web pages containing vulnerabilities and web pages not containing vulnerabilities to form the training set data, wherein the training set contains K + K' web pages in total: K web pages containing vulnerabilities and K' web pages not containing vulnerabilities;
the storage format of the web pages in the training set is as follows:
data = {html_1(1), html_1(2), …, html_1(K), html_0(K+1), …, html_0(K+K')}
wherein:
html_1(K) represents the K-th web page containing vulnerabilities in the training set data, and html_0(K+K') represents the K'-th web page not containing vulnerabilities in the training set data.
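A minimal sketch of the storage format in claim 2, assuming each page is held as a string (for example a file name; the names used here are hypothetical): vulnerable pages receive indices 1 to K with label i = 1, and pages without vulnerabilities receive indices K+1 to K+K' with label i = 0.

```python
from typing import Dict, List, Tuple


def build_training_set(vuln_pages: List[str], clean_pages: List[str]) -> Dict[int, Tuple[int, str]]:
    """Map index j to (i, page): j = 1..K gives html_1(j), j = K+1..K+K' gives html_0(j)."""
    data: Dict[int, Tuple[int, str]] = {}
    for j, page in enumerate(vuln_pages, start=1):
        data[j] = (1, page)                    # i = 1: page contains a vulnerability
    k = len(vuln_pages)
    for offset, page in enumerate(clean_pages, start=1):
        data[k + offset] = (0, page)           # i = 0: page contains no vulnerability
    return data


data = build_training_set(["v1.html", "v2.html"], ["c1.html"])
print(data)
```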
3. The method for monitoring web page faults and web page vulnerabilities according to claim 2, wherein collecting the web page data stream in step S1 comprises:
constructing a virtual system in a server, and taking the constructed virtual system as a sandbox;
running the web page in the sandbox and collecting the web page data stream generated by the virtual system in the sandbox while the web page runs; the web page data stream of the j-th web page html_i(j) in the training set is flow_j, i ∈ {0, 1};
The collecting steps of the webpage data stream are as follows:
S11: setting a monitoring program in the sandbox;
S12: starting the monitoring program when the web page runs in the sandbox;
S13: the monitoring program uses the POST method to request the communication message data generated while the web page runs, and the communication message data is taken as the web page data stream.
4. The method for monitoring web page faults and web page vulnerabilities according to claim 1, wherein step S1 of constructing a web page vulnerability text feature detection model comprises:
constructing a webpage vulnerability text feature detection model, wherein the input of the webpage vulnerability text feature detection model is a webpage data stream, and the output is a webpage vulnerability text feature corresponding to the webpage data stream;
The detection flow of the webpage vulnerability text feature detection model is as follows:
setting the sliding window length for the web page data stream to N, traversing the web page data stream with a sliding window of length N, and taking the data in each window position as a word character;
counting the number of occurrences of each word character in the web page data stream and the number of occurrences of each context (pair of consecutive word characters) in the web page data stream: for consecutive word characters w_{n-1}, w_n, the number of occurrences of word character w_n is count(w_n), and the number of occurrences of the context w_{n-1} w_n is count(w_{n-1} w_n);
calculating the frequency distribution of each word character, wherein the frequency distribution of word character w_n is:
[Formula image QLYQS_12]
selecting the m word characters with the largest frequency distribution in the web page data stream, performing one-hot encoding on the selected word characters, and taking the encoding result as the web page vulnerability text feature corresponding to the web page data stream.
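The feature extraction of claim 4 can be sketched as below. The frequency distribution formula of w_n is given only as an image, so the sketch assumes the usual bigram estimate count(w_{n-1} w_n) / count(w_{n-1}); it also assumes that each distinct word character is scored by its highest such value over all contexts and that the one-hot vectors are taken over the sorted set of distinct word characters in the stream. These choices and the function names are hypothetical, not the patent's definition.

```python
from collections import Counter
from typing import Dict, List, Tuple


def word_characters(stream: str, n: int) -> List[str]:
    """Slide a window of length n over the data stream; each window is one word character."""
    return [stream[k:k + n] for k in range(len(stream) - n + 1)]


def text_feature(stream: str, n: int, m: int) -> Tuple[List[str], List[List[int]]]:
    tokens = word_characters(stream, n)
    unigram = Counter(tokens)                      # count(w_n)
    bigram = Counter(zip(tokens, tokens[1:]))      # count(w_{n-1} w_n)
    # Assumed frequency of w_n in context: count(w_{n-1} w_n) / count(w_{n-1});
    # each distinct word character keeps its highest value over all contexts.
    # Word characters that never follow another one receive no score.
    score: Dict[str, float] = {}
    for (prev, cur), c in bigram.items():
        score[cur] = max(score.get(cur, 0.0), c / unigram[prev])
    top = sorted(score, key=lambda w: (-score[w], w))[:m]   # m highest, ties broken alphabetically
    vocab = sorted(unigram)                        # one-hot positions over distinct word characters
    one_hot = [[1 if v == w else 0 for v in vocab] for w in top]
    return top, one_hot


selected, vectors = text_feature("abab", n=2, m=1)
print(selected, vectors)
```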
5. The method for monitoring web page faults and web page vulnerabilities according to claim 4, wherein step S1 of inputting the collected web page data streams into the web page vulnerability text feature detection model and outputting web page vulnerability text features by that model comprises:
sequentially inputting the collected web page data streams into the web page vulnerability text feature detection model, which outputs the web page vulnerability text features; the web page vulnerability text feature of any web page html_i(j) in the training set data is denoted x_i(j).
6. The method for monitoring web page faults and web page vulnerabilities according to claim 1, wherein step S4 of collecting the web page vulnerability text features of the web page to be detected, inputting the collected features into the optimized web page vulnerability detection model, and outputting, by the optimized model, a detection result of whether the web page to be detected has a vulnerability comprises:
collecting the web page data stream of the web page to be detected, inputting it into the web page vulnerability text feature detection model to extract the web page vulnerability text features, and inputting these features into the optimized web page vulnerability detection model, which outputs a detection result indicating whether the web page to be detected has a vulnerability; an output of 0 indicates that the web page to be detected has no vulnerability, and an output of 1 indicates that it has a vulnerability, in which case alarm information is generated.
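The decision logic of claim 6 can be sketched as a small pipeline. The feature extractor, the optimized detection model and the alarm channel are passed in as callables because the text does not specify their interfaces; `monitor_page` and the stub functions used with it are hypothetical.

```python
from typing import Callable, Sequence


def monitor_page(flow: str,
                 extract_features: Callable[[str], Sequence[float]],
                 detect: Callable[[Sequence[float]], int],
                 alarm: Callable[[str], None]) -> int:
    """Data stream -> text features -> 0/1 detection result -> alarm when the result is 1."""
    features = extract_features(flow)
    result = detect(features)
    if result not in (0, 1):
        raise ValueError(f"detector must return 0 or 1, got {result!r}")
    if result == 1:
        alarm("web page vulnerability detected")
    return result
```

For example, `monitor_page(flow, extractor, model, print)` prints the alarm message only when the model returns 1; returning the result as well lets the caller log pages that passed.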
7. A device for monitoring web page faults and web page vulnerabilities, the device comprising:
the feature extraction module is used for extracting webpage vulnerability text features of the webpage based on the webpage vulnerability text feature detection model;
the sampling device is used for differentially sampling the webpage vulnerability text characteristics of the webpage in the training set by utilizing a differential sampling strategy;
the webpage detection module is used for constructing a webpage vulnerability detection model, inputting the collected webpage vulnerability text characteristics into the optimized webpage vulnerability detection model, and outputting a detection result of whether the webpage to be detected has the vulnerability or not by the optimized webpage vulnerability detection model so as to realize the monitoring method for coping with the webpage faults and the webpage vulnerabilities according to any one of claims 1-6.
CN202210895518.0A 2022-07-28 2022-07-28 Monitoring method and device for coping with webpage faults and webpage loopholes Active CN115314267B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210895518.0A CN115314267B (en) 2022-07-28 2022-07-28 Monitoring method and device for coping with webpage faults and webpage loopholes

Publications (2)

Publication Number Publication Date
CN115314267A CN115314267A (en) 2022-11-08
CN115314267B true CN115314267B (en) 2023-07-07

Family

ID=83857982

Country Status (1)

Country Link
CN (1) CN115314267B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant