CN113660244A - Website availability detection method, system, readable storage medium and device

Publication number
CN113660244A
Authority
CN
China
Prior art keywords
waf
learning model
target
modeling
key information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110917616.5A
Other languages
Chinese (zh)
Other versions
CN113660244B (en)
Inventor
李祯
范渊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
DBAPPSecurity Co Ltd
Original Assignee
DBAPPSecurity Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by DBAPPSecurity Co Ltd
Priority to CN202110917616.5A
Publication of CN113660244A
Application granted
Publication of CN113660244B
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/02 Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L43/00 Arrangements for monitoring or testing data switching networks
    • H04L43/50 Testing arrangements
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/02 Protocols based on web technology, e.g. hypertext transfer protocol [HTTP]
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/60 Scheduling or organising the servicing of application requests, e.g. requests for application data transmissions using the analysis and optimisation of the required network resources

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides a method, a system, a readable storage medium and a device for detecting website availability. The method comprises the following steps: acquiring a site IP to be detected, and establishing a learning model corresponding to that site IP; extracting the target IP of daily Web service access at the external network interface of the WAF, and judging whether the target IP is consistent with the modeling IP of the learning model; if the target IP is consistent with the modeling IP, controlling the WAF to collect key information statistics on the Web request information matching the modeling IP, and judging whether the key information is within a preset normal range; and if the key information is not within the preset normal range, performing non-proxy processing on the target IP so that its traffic is transparently forwarded in Linux bridge layer-2 forwarding mode. The invention uses the machine learning function of the WAF to learn and monitor the key HTTP response information of the web server and the key timing information of WAF processing, and improves the continuity of service access by combining reasonable modeling with the bridge forwarding mechanism.

Description

Website availability detection method, system, readable storage medium and device
Technical Field
The present application relates to the field of network security technologies, and in particular, to a method, a system, a readable storage medium, and an apparatus for detecting website availability.
Background
At present, application-layer attacks are increasingly diverse, so web servers need network security defenses, and a WAF (Web application firewall) is therefore deployed. Its deployment scenarios include transparent proxy and reverse proxy. In transparent proxy mode the WAF is physically connected in series in the service line and works as a layer-7 proxy, but it has no service address of its own, so the health of the back-end server cannot be detected directly through a TCP session; nor can the health of the WAF's own proxy function be sensed, so problems in the WAF itself can affect service access.
Existing methods for detecting the availability of a service site are mostly based on domain-name access. A load balancing device (or nginx) performs layer-7 HTTP access checks on a designated service server at regular intervals, and judges the server's health and availability from the results returned within a calculation period. The judgment rests on two main kinds of result: 1. whether the source detection end can establish a TCP connection with the server, i.e. whether the service port is reachable; 2. whether, after the source detection end sends an HTTP request URL, it receives the HTTP status code 200 returned by the server.
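The two related-art checks can be sketched as follows. This is a minimal illustration under stated assumptions, not the health-check code of any particular load balancer: the function names `tcp_port_open` and `http_probe`, the timeout values, and the choice of Python's standard library are all hypothetical. The probe also returns the elapsed time, a quantity the related-art checks do not evaluate.

```python
import http.client
import socket
import time
from typing import Optional, Tuple


def tcp_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Check 1: can the probe establish a TCP connection to the service port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def http_probe(host: str, port: int, path: str = "/",
               timeout: float = 2.0) -> Tuple[Optional[int], float]:
    """Check 2: send an HTTP GET; return (status code or None, elapsed seconds)."""
    start = time.monotonic()
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        status: Optional[int] = conn.getresponse().status
    except (OSError, http.client.HTTPException):
        status = None  # no usable HTTP response was received
    finally:
        conn.close()
    return status, time.monotonic() - start


def is_healthy(host: str, port: int) -> bool:
    """Related-art style verdict: port reachable and status code 200."""
    if not tcp_port_open(host, port):
        return False
    status, _elapsed = http_probe(host, port)
    return status == 200
```

Note that when a transparent-proxy WAF sits in the path, both checks terminate at the WAF rather than at the back-end server, which is the first disadvantage the text goes on to list.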
The related art has the following disadvantages: 1. when the WAF is connected in series between the load balancer and the web server as a transparent proxy, the source detection end establishes its TCP connection with the port of the WAF rather than the port of the back-end server; 2. after sending an HTTP request URL, the source detection end checks only whether a correct HTTP status code is received and ignores how long each response takes, so delays in service access cannot be identified promptly; 3. the detection technique cannot distinguish whether a service abnormality is caused by the WAF or by the web server; 4. in the WAF's current transparent proxy working mode there is no method for detecting the availability of the web server.
Disclosure of Invention
Embodiments of the present application provide a method, a system, a readable storage medium, and an apparatus for detecting website availability, so as to at least solve the above-mentioned deficiencies in the related art.
In a first aspect, an embodiment of the present application provides a method for detecting website availability, where the method includes:
acquiring a site IP to be detected, and establishing a learning model corresponding to the site IP to be detected;
extracting a target IP of daily Web service access at an external network interface of the WAF, and judging whether the target IP is consistent with a modeling IP of the learning model or not;
if the target IP is consistent with the modeling IP of the learning model, controlling the WAF to perform key information statistics on Web request information conforming to the modeling IP, and judging whether the key information is in a preset normal range;
and if the key information is not in the preset normal range, performing non-proxy processing on the target IP so that its traffic is transparently forwarded in Linux bridge layer-2 forwarding mode.
In some embodiments, after the step of determining whether the target IP is consistent with the modeling IP of the learning model, the method further comprises:
if the target IP is not consistent with the modeling IP of the learning model, proxying the target IP to a back-end server according to the normal flow.
In some embodiments, the step of acquiring the site IP to be detected and establishing the learning model corresponding to the site IP to be detected includes:
inputting the site IP selected for health availability detection into a machine learning model for learning, and setting parameters of the machine learning model, wherein the parameters comprise the number of learning cycles, the static/dynamic URL type, and the length of each learning cycle unit;
and collecting the URL requests that meet the conditions within each learning cycle unit, collecting relevant information of a plurality of URL requests, and determining a baseline of the machine learning model according to the relevant information to obtain the learning model corresponding to the site IP.
In some embodiments, the step of collecting information related to a plurality of url requests comprises:
recording a time stamp of each url request reaching the WAF external network port as a first time stamp;
recording, as the response code, the response code carried in the server response message corresponding to each url request when that message reaches the WAF internal network port;
recording a timestamp of the server response message corresponding to each url request reaching the WAF internal network port as a second timestamp;
and recording a timestamp of the response message corresponding to each url request leaving the WAF external network port as a third timestamp.
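The four recorded items can be pictured as one record per URL request. The sketch below is an assumption-laden illustration, not the WAF's actual data structure: the class name `RequestSample` and its field names are hypothetical, timestamps are taken to be integers in milliseconds, and "most critical response code" is read as the 50x family named in the embodiments.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RequestSample:
    """One URL request as observed by the WAF (timestamps in milliseconds)."""
    t_request_in: int    # first timestamp: request reaches the WAF external port
    response_code: int   # response code carried in the server response message
    t_response_in: int   # second timestamp: response reaches the WAF internal port
    t_response_out: int  # third timestamp: response leaves the WAF external port

    @property
    def client_latency_ms(self) -> int:
        """First time difference: third timestamp minus first timestamp."""
        return self.t_response_out - self.t_request_in

    @property
    def waf_forward_ms(self) -> int:
        """Second time difference: third timestamp minus second timestamp."""
        return self.t_response_out - self.t_response_in

    @property
    def is_server_error(self) -> bool:
        """True for 5xx codes (assumed reading of 'most critical' codes)."""
        return 500 <= self.response_code <= 599


# Hypothetical values, for illustration only.
sample = RequestSample(t_request_in=1000, response_code=502,
                       t_response_in=1040, t_response_out=1045)
```

Keeping the raw timestamps and deriving the two time differences on demand means the same record serves both the learning phase and later monitoring.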
In some of these embodiments, the step of determining a baseline of the machine learning model from the relevant information comprises:
in one learning period unit, acquiring the distribution proportion of the most critical response codes in the response codes;
obtaining, according to the first time stamp and the third time stamp, a first time difference value from the client sending the request to receiving the response, and calculating the median of all the first time difference values in the period;
obtaining, according to the second time stamp and the third time stamp, a second time difference value consumed inside the WAF in forwarding the server response message, and calculating the median of all the second time difference values in the period;
calculating, over the total number of learning periods, the average value of the occurrence rate of the most critical response codes, the average value of the medians of the first time difference values and the average value of the medians of the second time difference values, from the per-period distribution proportion of the most critical response codes, the per-period median of the first time difference values and the per-period median of the second time difference values;
and determining a machine learning model baseline according to the average value of the most critical response code occurrence rates, the average value of the median of all the first time difference values and the average value of the median of all the second time difference values.
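The baseline steps above can be sketched as follows. This is a minimal illustration, assuming that each learning period contains at least one sample and that the most critical response codes are the 5xx codes; the names `Sample`, `Baseline`, `period_stats` and `build_baseline` are hypothetical and do not come from the source.

```python
from statistics import mean, median
from typing import NamedTuple, Sequence, Tuple

# (response_code, first_time_difference_ms, second_time_difference_ms)
Sample = Tuple[int, float, float]


class Baseline(NamedTuple):
    error_rate: float         # average occurrence rate of critical codes
    client_latency_ms: float  # average of per-period medians, first difference
    waf_forward_ms: float     # average of per-period medians, second difference


def period_stats(samples: Sequence[Sample]) -> Tuple[float, float, float]:
    """Per-period values: critical-code proportion and the two medians."""
    if not samples:
        raise ValueError("a learning period must contain at least one sample")
    critical = sum(1 for code, _, _ in samples if 500 <= code <= 599)
    rate = critical / len(samples)
    first_median = median(first for _, first, _ in samples)
    second_median = median(second for _, _, second in samples)
    return rate, first_median, second_median


def build_baseline(periods: Sequence[Sequence[Sample]]) -> Baseline:
    """Average the per-period values over the total number of periods."""
    if not periods:
        raise ValueError("at least one learning period is required")
    rates, first_medians, second_medians = zip(*(period_stats(p) for p in periods))
    return Baseline(mean(rates), mean(first_medians), mean(second_medians))


# Hypothetical data: two learning periods.
periods = [
    [(200, 40, 4), (500, 60, 6), (200, 50, 5), (200, 70, 7)],
    [(200, 30, 3), (200, 50, 5)],
]
baseline = build_baseline(periods)
```

Using the median within a period and the mean across periods keeps a single slow request from shifting the baseline, while still reflecting every learning period equally.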
In some embodiments, after the step of determining whether the key information is within a preset normal range, the method further includes:
and if the key information is in the preset normal range, the target IP is proxied to a back-end server according to a normal flow.
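Taken together, the first-aspect steps describe a three-way decision per request. The sketch below shows only that control flow; `Action`, `choose_action` and the `key_info_normal` callback are hypothetical names, and the IP addresses are documentation-range placeholders.

```python
from enum import Enum
from typing import Callable, Set


class Action(Enum):
    PROXY = "proxy"    # normal flow: proxy the request to the back-end server
    BRIDGE = "bridge"  # non-proxy processing: layer-2 bridge pass-through


def choose_action(target_ip: str,
                  modeling_ips: Set[str],
                  key_info_normal: Callable[[str], bool]) -> Action:
    """Pick the forwarding path for a request addressed to target_ip."""
    if target_ip not in modeling_ips:
        return Action.PROXY   # not a modeled site: no statistics needed
    if key_info_normal(target_ip):
        return Action.PROXY   # modeled site, key information in normal range
    return Action.BRIDGE      # modeled site, key information out of range


modeled = {"192.0.2.10"}
```

For example, `choose_action("192.0.2.10", modeled, lambda ip: False)` selects the bridge path, while any unmodeled destination is proxied without consulting the callback.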
In a second aspect, an embodiment of the present application provides a website availability detection system, where the system includes:
the learning module is used for acquiring the site IP to be detected and establishing a learning model corresponding to the site IP to be detected;
the judging module is used for extracting a target IP of daily Web service access at an external network interface of the WAF and judging whether the target IP is consistent with the modeling IP of the learning model or not;
the control module is used for controlling the WAF to carry out key information statistics on Web request information conforming to the modeling IP if the target IP is consistent with the modeling IP of the learning model, and judging whether the key information is in a preset normal range or not;
and the processing module is used for performing non-proxy processing on the target IP if the key information is not in the preset normal range, so that its traffic is transparently forwarded in Linux bridge layer-2 forwarding mode.
In some embodiments, the determining module comprises:
and the first processing unit is used for proxying the target IP to a back-end server according to the normal flow if the target IP is inconsistent with the modeling IP of the learning model.
In a third aspect, the present application provides a readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the website availability detection method according to the first aspect.
In a fourth aspect, an embodiment of the present application provides a website availability detection apparatus, including a server, where the server includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor, when executing the computer program, implements the website availability detection method according to the first aspect.
Compared with the related art, in the website availability detection method, system, readable storage medium and device provided by the embodiments of the application, a Web application firewall (WAF) is connected in series in front of a Web server in a layer-7 proxy working mode. Based on the machine learning capability of the WAF, a large model is established whose baselines are the range distribution of the HTTP status codes of the website service and the delay consumed inside the WAF in forwarding those status codes. The machine learning function of the WAF is used to learn and monitor the key HTTP response information of the Web server and the key timing information of WAF processing, and reasonable modeling combined with the bridge forwarding mechanism improves the continuity of service access: when a client requests the website service and the time from sending the request to receiving the HTTP status code exceeds the alert threshold, the WAF stops proxying for that server and forwards the traffic in bridge pass-through mode, so that the WAF itself does not impair the availability of the website. This solves the problems in the prior art that the health of the back-end server cannot be detected directly through a TCP session, and that the health of the WAF's own proxy function cannot be sensed, so that problems in the WAF itself affect service access.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flowchart of a website availability detection method according to a first embodiment of the present invention;
FIG. 2 is a flowchart of a website availability detection method according to a second embodiment of the present invention;
FIG. 3 is a flowchart illustrating a website availability detection method according to a third embodiment of the present invention;
FIG. 4 is a block diagram of a website availability detection system according to a fourth embodiment of the present invention;
FIG. 5 is a block diagram of a website availability detecting apparatus according to a fifth embodiment of the present invention.
Description of the main element symbols:
[Table of main element symbols: images BDA0003206207320000041 and BDA0003206207320000051, not reproduced]
The following detailed description will further illustrate the invention in conjunction with the above-described figures.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application.
It is obvious that the drawings in the following description are only examples or embodiments of the present application, and that it is also possible for a person skilled in the art to apply the present application to other similar contexts on the basis of these drawings without inventive effort. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms used herein have the ordinary meaning understood by those of ordinary skill in the art to which this application belongs. The words "a," "an," "the," and similar words in this application do not limit number, and may refer to the singular or the plural. The terms "including," "comprising," "having," and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. The words "connected," "coupled," and the like are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "plurality" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The character "/" generally indicates an "or" relationship between the objects before and after it. The terms "first," "second," "third," and the like merely distinguish similar objects and do not denote a particular ordering of the objects.
Example one
Referring to FIG. 1, a website availability detection method for a transparent proxy according to a first embodiment of the present invention is shown. The method specifically includes steps S101 to S104:
S101, acquiring a site IP to be detected, and establishing a learning model corresponding to the site IP to be detected;
In a specific implementation, a site IP whose health availability is to be detected is selected from the sites protected by the WAF and entered in the function menu of the machine learning model for learning. The number of learning cycles, the static/dynamic URL type, and the length of each learning cycle unit (5 minutes, 10 minutes, 20 minutes, or 30 minutes) are set in the machine learning model. In each preset learning cycle unit the system tracks the URL requests that meet the conditions and collects relevant information about them, where collecting the information comprises the following steps:
S11, recording the timestamp (accurate to the millisecond) at which each URL request reaches the WAF external network port;
S12, recording the response code, such as HTTP code 200, 30x, 40x or 50x, carried in the server response message corresponding to each URL request when that message travels from the server to the WAF internal network port;
S13, recording the timestamp (accurate to the millisecond) at which the server response message corresponding to each URL request reaches the WAF internal network port;
S14, recording the timestamp (accurate to the millisecond) at which the response message corresponding to each URL request leaves the WAF external network port.
The traffic path from the requesting client to the server is, in order: the requesting client, the WAF external network port, the WAF internal network port, and the server IP.
The machine then calculates the following results:
In one learning cycle unit, from the response codes recorded in step S12, the proportion A of the most critical response code 50x (e.g. 500, 502, 503, which indicate that the URL request failed) is counted, where
$$A = \frac{N_{50x}}{N_{\mathrm{total}}}$$
with $N_{50x}$ the number of 50x response codes and $N_{\mathrm{total}}$ the total number of response codes recorded in the period.
The average value A1 of the occurrence rate of the most critical response code 50x is calculated over the total number of learning cycles $n$, where $A_i$ is the value of A in the $i$-th cycle:
$$A1 = \frac{1}{n}\sum_{i=1}^{n} A_i$$
From the difference between step S14 and step S11, i.e. the time difference B between the client sending the request and receiving the response, the median B1 of all B values in the period is calculated (all B values in a single period are sorted in ascending order and the median B1 is located by bisection), and the average value B2 of the per-period medians B1 is calculated over the total number of learning cycles, where
$$B2 = \frac{1}{n}\sum_{i=1}^{n} B1_i$$
From the difference between step S14 and step S13, i.e. the time difference C consumed inside the WAF in forwarding the server response message, the median C1 of all C values in the period is calculated (all C values in a single period are sorted in ascending order and the median C1 is located by bisection), and the average value C2 of the per-period medians C1 is calculated over the total number of learning cycles, where
$$C2 = \frac{1}{n}\sum_{i=1}^{n} C1_i$$
A1, B2 and C2 are combined to form a three-dimensional machine learning model baseline for the same site IP. The first dimension is the occurrence rate of URL request failures, i.e. the final A value (A1); the second dimension is the time difference from the URL request to receipt of the response, i.e. the final B value (B2); the third dimension is the time the WAF spends internally processing the URL response message, i.e. the final C value (C2). An upper limit of the normal dynamic floating interval is set for each of the A, B and C values, for example a maximum normal floating rate of +10%.
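As a small worked example of the floating interval, the sketch below derives an upper limit for each baseline dimension from the +10% rate mentioned above. The baseline numbers are hypothetical and chosen only for illustration, and the names `MAX_FLOAT_RATE` and `upper_limits` are assumptions, not terms from the source.

```python
from typing import Dict

MAX_FLOAT_RATE = 0.10  # the "+10%" example rate from the text


def upper_limits(baseline: Dict[str, float],
                 rate: float = MAX_FLOAT_RATE) -> Dict[str, float]:
    """Upper limit of the normal interval: baseline value plus the floating rate."""
    return {name: value * (1.0 + rate) for name, value in baseline.items()}


# Hypothetical baseline: 2% failure rate, 120 ms end to end, 8 ms inside the WAF.
baseline = {"A1": 0.02, "B2": 120.0, "C2": 8.0}
limits = upper_limits(baseline)
```

With these illustrative numbers the limits are roughly a 2.2% failure rate, 132 ms end to end, and 8.8 ms inside the WAF.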
S102, extracting a target IP of daily Web service access at an external network interface of the WAF, and judging whether the target IP is consistent with a modeling IP of the learning model or not;
In a specific implementation, when a client URL request packet reaches the WAF external network port, the WAF extracts the source IP, destination IP, source port and destination port from the flow, and determines whether the destination IP is consistent with the modeling IP of the learning model.
S103, if the target IP is consistent with the modeling IP of the learning model, controlling the WAF to perform key information statistics on Web request information conforming to the modeling IP, and judging whether the key information is in a preset normal range;
In a specific implementation, after learning is completed, the WAF records the A, B and C values of the URL requests and response messages that meet the information collection requirements, and then computes three A averages, three B averages and three C averages over the latest 200 requests, the latest 300 requests and the latest 400 requests respectively.
It is then judged whether the three values of the same type all exceed the upper limit of the normal interval; that is, if the three A averages, the three B averages, or the three C averages all exceed the corresponding upper limit, the state is judged to be abnormal.
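The window check can be sketched as follows, under stated assumptions: the per-request A value is taken to be 1 for a 50x response and 0 otherwise, so that its window average is a failure rate; a history shorter than a window is averaged over whatever is available; and the names `WINDOWS`, `all_windows_exceed` and `is_abnormal` are hypothetical.

```python
from statistics import mean
from typing import Sequence, Tuple

WINDOWS: Tuple[int, ...] = (200, 300, 400)  # latest N requests, per the text


def all_windows_exceed(values: Sequence[float], limit: float,
                       windows: Sequence[int] = WINDOWS) -> bool:
    """True when the average over every window is above the upper limit."""
    if not values:
        return False
    return all(mean(values[-w:]) > limit for w in windows)


def is_abnormal(a_values: Sequence[float],
                b_values: Sequence[float],
                c_values: Sequence[float],
                limits: Tuple[float, float, float]) -> bool:
    """Abnormal when all three averages of any one type exceed its limit."""
    series = (a_values, b_values, c_values)
    return any(all_windows_exceed(v, lim) for v, lim in zip(series, limits))


# Hypothetical history: latency doubles for the most recent 200 requests.
a_hist = [0] * 400
b_hist = [100] * 200 + [200] * 200
c_hist = [5] * 400
```

Requiring all three window averages of one type to exceed the limit means a short burst that raises only the 200-request average does not by itself trigger the abnormal verdict.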
S104, if the key information is not in the preset normal range, performing non-proxy processing on the target IP so that its traffic is transparently forwarded in Linux bridge layer-2 forwarding mode.
If the judgment in step S103 finds the access experience abnormal, the WAF assumes that the problem may lie in its own proxy path. It therefore passes subsequently arriving URL requests straight through in bridge mode, bypassing WAF processing.
To sum up, in the website availability detection method of the above embodiment of the present invention, a Web application firewall is connected in series in front of a Web server in a layer-7 proxy working mode. Based on the machine learning capability of the WAF, a large model is established whose baselines are the distribution of the HTTP status codes of the website service and the delay consumed inside the WAF in forwarding those status codes. The machine learning function of the WAF is used to learn and monitor the key HTTP response information of the Web server and the key timing information of WAF processing, and reasonable modeling combined with the bridge forwarding mechanism improves the continuity of service access: when a client requests the website service and the time from sending the request to receiving the HTTP status code exceeds the warning threshold, the WAF stops proxying for that server and forwards the traffic in bridge pass-through mode, so that the WAF itself does not impair the availability of the website.
Example two
Referring to FIG. 2, a website availability detection method for a transparent proxy according to a second embodiment of the present invention is shown. The method specifically includes steps S201 to S203:
S201, acquiring a site IP to be detected, and establishing a learning model corresponding to the site IP to be detected;
In a specific implementation, a site IP whose health availability is to be detected is selected from the sites protected by the WAF and entered in the function menu of the machine learning model for learning. The number of learning cycles, the static/dynamic URL type, and the length of each learning cycle unit (5 minutes, 10 minutes, 20 minutes, or 30 minutes) are set in the machine learning model. In each preset learning cycle unit the system tracks the URL requests that meet the conditions and collects relevant information about them, where collecting the information comprises the following steps:
S11, recording the timestamp (accurate to the millisecond) at which each URL request reaches the WAF external network port;
S12, recording the response code, such as HTTP code 200, 30x, 40x or 50x, carried in the server response message corresponding to each URL request when that message travels from the server to the WAF internal network port;
S13, recording the timestamp (accurate to the millisecond) at which the server response message corresponding to each URL request reaches the WAF internal network port;
S14, recording the timestamp (accurate to the millisecond) at which the response message corresponding to each URL request leaves the WAF external network port.
The traffic path from the requesting client to the server is, in order: the requesting client, the WAF external network port, the WAF internal network port, and the server IP.
The machine then calculates the following results:
In one learning cycle unit, from the response codes recorded in step S12, the proportion A of the most critical response code 50x (e.g. 500, 502, 503, which indicate that the URL request failed) is counted, where
$$A = \frac{N_{50x}}{N_{\mathrm{total}}}$$
with $N_{50x}$ the number of 50x response codes and $N_{\mathrm{total}}$ the total number of response codes recorded in the period.
The average value A1 of the occurrence rate of the most critical response code 50x is calculated over the total number of learning cycles $n$, where $A_i$ is the value of A in the $i$-th cycle:
$$A1 = \frac{1}{n}\sum_{i=1}^{n} A_i$$
From the difference between step S14 and step S11, i.e. the time difference B between the client sending the request and receiving the response, the median B1 of all B values in the period is calculated (all B values in a single period are sorted in ascending order and the median B1 is located by bisection), and the average value B2 of the per-period medians B1 is calculated over the total number of learning cycles, where
$$B2 = \frac{1}{n}\sum_{i=1}^{n} B1_i$$
From the difference between step S14 and step S13, i.e. the time difference C consumed inside the WAF in forwarding the server response message, the median C1 of all C values in the period is calculated (all C values in a single period are sorted in ascending order and the median C1 is located by bisection), and the average value C2 of the per-period medians C1 is calculated over the total number of learning cycles, where
$$C2 = \frac{1}{n}\sum_{i=1}^{n} C1_i$$
A1, B2 and C2 are combined to form a three-dimensional machine learning model baseline for the same site IP. The first dimension is the occurrence rate of URL request failures, i.e. the final A value (A1); the second dimension is the time difference from the URL request to receipt of the response, i.e. the final B value (B2); the third dimension is the time the WAF spends internally processing the URL response message, i.e. the final C value (C2). An upper limit of the normal dynamic floating interval is set for each of the A, B and C values, for example a maximum normal floating rate of +10%.
S202, extracting a target IP of daily Web service access at an external network interface of the WAF, and judging whether the target IP is consistent with a modeling IP of the learning model or not;
In specific implementation, when a client URL request packet reaches the WAF external network port, the WAF extracts the source IP, destination IP, source port and destination port from the traffic, and determines whether the destination IP is consistent with the modeling IP of the learning model.
S203, if the target IP is not consistent with the modeling IP of the learning model, the target IP is proxied to a back-end server according to the normal process.
In specific implementation, when the destination IP in step S202 is not in the monitoring list, the WAF does not need to collect the A, B and C values, and proxies and forwards the traffic to the back-end server according to the normal flow. An IP in the monitoring list is a site IP that has been configured through the WAF interface as requiring protection and site learning.
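The check against the monitoring list can be sketched as follows. This is a minimal Python illustration, assuming the monitoring list is held as a set of IP strings; `FlowKey`, `is_modeled` and the example addresses are hypothetical names and values, not part of the patent.

```python
import ipaddress
from typing import NamedTuple, Set


class FlowKey(NamedTuple):
    """The four fields the WAF extracts from an incoming request packet."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int


def is_modeled(flow: FlowKey, monitored_ips: Set[str]) -> bool:
    """Return True if the destination IP is a site IP in the monitoring list."""
    # Normalise through ipaddress so equivalent spellings compare equal.
    dst = str(ipaddress.ip_address(flow.dst_ip))
    return dst in monitored_ips


monitored = {"192.0.2.10"}  # assumed monitoring list with one learned site
flow_a = FlowKey("198.51.100.7", "192.0.2.10", 51514, 443)
flow_b = FlowKey("198.51.100.7", "192.0.2.99", 51515, 443)
```

Here `flow_a` targets a learned site and would go on to A, B, C collection, while `flow_b` would simply be proxied according to the normal flow.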
In summary, in the website availability detection method of the above embodiment of the present invention, a Web application firewall is connected in series outside a Web server in a layer-7 proxy working mode. Based on the machine learning capability of the WAF, a large model is established that takes as its baseline the range distribution of the HTTP status codes of the website service and the delay consumed inside the WAF when forwarding them. The WAF machine learning function is used to learn and monitor the key HTTP response information of the Web server and the key timing information of WAF processing, and the continuity of service access is improved through reasonable modeling linked with a bridge forwarding mechanism.
EXAMPLE III
Referring to fig. 3, a website availability detection method for a transparent proxy according to a third embodiment of the present invention is shown, and the method specifically includes steps S301 to S304:
S301, acquiring a site IP to be detected, and establishing a learning model corresponding to the site IP to be detected;
In specific implementation, a site IP to be checked for health availability is selected from the sites protected by the WAF and entered into the function menu of the machine learning model for learning. The number of learning cycles, the static/dynamic URL type, and the length of each learning cycle unit (5 minutes, 10 minutes, 20 minutes, or 30 minutes) are set in the machine learning model. For each preset learning cycle unit, the system watches the URL requests that meet the conditions within that unit of time and collects related information on these URL requests, where the information collection steps are:
S11, recording the timestamp (accurate to the millisecond) at which each URL request reaches the WAF external network port;
S12, recording the response code carried in the server response message that corresponds to each URL request and travels from the server to the WAF internal network port, such as HTTP codes 200, 30x, 40x and 50x;
S13, recording the timestamp (accurate to the millisecond) at which the server response message corresponding to each URL request reaches the WAF internal network port;
S14, recording the timestamp (accurate to the millisecond) at which the response message corresponding to each URL request leaves the WAF external network port.
Traffic from the requesting client to the server passes, in order, through the requesting client, the WAF external network port, the WAF internal network port and the server IP.
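The four items collected in steps S11 to S14 can be pictured as one record per URL request. The following Python sketch is only illustrative: the class `RequestSample` and its field names are assumptions, and timestamps are taken to be integer milliseconds.

```python
from dataclasses import dataclass


@dataclass
class RequestSample:
    """One collected URL request; field names are illustrative assumptions."""
    t_request_in_ms: int    # S11: request reaches the WAF external port
    status_code: int        # S12: HTTP code carried in the server response
    t_response_in_ms: int   # S13: response reaches the WAF internal port
    t_response_out_ms: int  # S14: response leaves the WAF external port

    @property
    def b_ms(self) -> int:
        """Time difference B: S14 minus S11."""
        return self.t_response_out_ms - self.t_request_in_ms

    @property
    def c_ms(self) -> int:
        """Time difference C: S14 minus S13 (time spent inside the WAF)."""
        return self.t_response_out_ms - self.t_response_in_ms

    @property
    def is_failure(self) -> bool:
        """True for the 50x codes the text treats as request failures."""
        return 500 <= self.status_code <= 599


sample = RequestSample(t_request_in_ms=1000, status_code=200,
                       t_response_in_ms=1110, t_response_out_ms=1113)
```

Deriving B and C from the raw timestamps, rather than storing them, keeps the record aligned with what the WAF actually observes at its two ports.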
The machine calculates the following results:
In one learning cycle unit, according to the response codes recorded in step S12, the distribution ratio A of the most critical response code 50x (e.g. 500, 502, 503, representing URL request failure) is counted, where
$A = \frac{n_{50x}}{n_{\text{total}}}$, with $n_{50x}$ the number of responses in the cycle carrying a 50x code and $n_{\text{total}}$ the number of all responses in the cycle.
According to the total number of learning cycles N, the average value A1 of the occurrence rate of the most critical response code 50x is calculated, where
$A1 = \frac{1}{N}\sum_{i=1}^{N} A_i$
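A small worked example may make the two statistics concrete. The Python sketch below computes the per-cycle 50x ratio A and its mean A1 over all cycles; the function names, the treatment of an empty cycle as ratio 0, and the sample status codes are assumptions for illustration.

```python
from typing import List, Sequence


def failure_ratio(status_codes: Sequence[int]) -> float:
    """Ratio A: share of 50x codes among all response codes of one cycle."""
    if not status_codes:
        return 0.0  # assumed convention for a cycle without any request
    failures = sum(1 for code in status_codes if 500 <= code <= 599)
    return failures / len(status_codes)


def mean_failure_ratio(cycles: Sequence[Sequence[int]]) -> float:
    """Value A1: mean of the per-cycle ratios A over all learning cycles."""
    if not cycles:
        raise ValueError("at least one learning cycle is required")
    ratios: List[float] = [failure_ratio(cycle) for cycle in cycles]
    return sum(ratios) / len(ratios)


# Three made-up learning cycles with four responses each.
cycles = [
    [200, 200, 500, 404],  # A = 1/4
    [200, 502, 503, 200],  # A = 2/4
    [200, 200, 200, 200],  # A = 0
]
a1 = mean_failure_ratio(cycles)
```

With these sample cycles the per-cycle ratios are 0.25, 0.5 and 0, so A1 comes out at 0.25.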
According to the difference between the timestamps of step S14 and step S11, that is, the time difference B between the requesting client sending the request and receiving the response, the median B1 of all B values in the cycle is calculated (all B values in a single cycle are sorted from small to large, and the median B1 is found by bisection), and, according to the total number of learning cycles N, the average B2 of the N median values B1 is calculated, where
$B2 = \frac{1}{N}\sum_{i=1}^{N} B1_i$
According to the difference between the timestamps of step S14 and step S13, that is, the time difference C consumed inside the WAF when forwarding the server response packet, the median C1 of all C values in the cycle is calculated (all C values in a single cycle are sorted from small to large, and the median C1 is found by bisection), and, according to the total number of learning cycles N, the average C2 of the N median values C1 is calculated, where
$C2 = \frac{1}{N}\sum_{i=1}^{N} C1_i$
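The "median per cycle, then mean of the medians" calculation used for both B and C can be sketched in a few lines of Python. The standard-library `statistics.median` stands in for the sort-and-bisect procedure described in the text; the function name and the sample timings are assumptions.

```python
from statistics import mean, median
from typing import Sequence


def mean_of_cycle_medians(cycles: Sequence[Sequence[float]]) -> float:
    """Median of each learning cycle, then the mean of those medians."""
    # statistics.median sorts internally; for an even count it averages the
    # two middle values, a detail the patent text does not specify.
    medians = [median(cycle) for cycle in cycles]
    return mean(medians)


# Made-up B values (ms) for three learning cycles.
b_cycles = [[100, 120, 110], [130, 90, 140, 150], [115]]
b2 = mean_of_cycle_medians(b_cycles)  # medians: 110, 135, 115

# Made-up C values (ms) for the same cycles.
c_cycles = [[3, 5, 4], [4, 6], [2]]
c2 = mean_of_cycle_medians(c_cycles)  # medians: 4, 5, 2
```

Using the median inside each cycle limits the effect of a few very slow requests, and averaging across cycles smooths out differences between one cycle and the next.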
A1, B2 and C2 are combined to form a machine learning model baseline for the same site IP in three dimensions: the first dimension is the rate of URL request failures, namely the final A value (A1); the second dimension is the time from the URL request to the receipt of the response, namely the final B value (B2); the third dimension is the time the WAF spends internally processing the URL response message, namely the final C value (C2). An upper limit of the normal dynamic floating interval is set for each of the A, B and C values; for example, the normal maximum floating rate is +10%.
S302, extracting a target IP of daily Web service access at an external network interface of the WAF, and judging whether the target IP is consistent with a modeling IP of the learning model or not;
In specific implementation, when a client URL request packet reaches the WAF external network port, the WAF extracts the source IP, destination IP, source port and destination port from the traffic, and determines whether the destination IP is consistent with the modeling IP of the learning model.
S303, if the target IP is consistent with the modeling IP of the learning model, controlling the WAF to perform key information statistics on Web request information conforming to the modeling IP, and judging whether the key information is in a preset normal range;
In specific implementation, after learning is completed, the WAF collects the A, B and C values of the URL requests and response packets that meet the information acquisition requirements, and then computes three A averages, three B averages and three C averages over the latest 200 requests, the latest 300 requests and the latest 400 requests, respectively.
It is then judged whether the three values of the same type all exceed the upper limit of the normal range; that is, if the three A averages, the three B averages or the three C averages all exceed the corresponding upper limit, the state is judged to be abnormal.
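The three-window check described above can be sketched as follows. This Python example assumes that "abnormal" means all three window averages of one dimension exceed that dimension's upper limit; the function names, the handling of histories shorter than a window, and the sample data are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

WINDOWS: Tuple[int, int, int] = (200, 300, 400)  # latest N requests


def window_means(values: Sequence[float],
                 windows: Sequence[int] = WINDOWS) -> List[float]:
    """Average of the latest 200, 300 and 400 values (fewer if history is short)."""
    vals = list(values)
    if not vals:
        raise ValueError("no samples collected yet")
    return [sum(vals[-w:]) / len(vals[-w:]) for w in windows]


def is_abnormal(values: Sequence[float], upper_limit: float,
                windows: Sequence[int] = WINDOWS) -> bool:
    """True only if every window average exceeds the upper limit."""
    return all(m > upper_limit for m in window_means(values, windows))


upper_b = 132.0  # assumed upper limit for B in ms (baseline 120 ms plus 10%)

# Sustained slowdown: the last 200 of 400 requests take 200 ms.
sustained = [100.0] * 200 + [200.0] * 200
# Short spike: only the last 10 of 400 requests are slow.
spike = [100.0] * 390 + [1000.0] * 10

sustained_abnormal = is_abnormal(sustained, upper_b)
spike_abnormal = is_abnormal(spike, upper_b)
```

In this sketch the sustained slowdown pushes all three averages above the limit, whereas the short spike lifts the 200-request average but not the 300-request one, so it is not flagged. Requiring agreement across window sizes is one way to avoid switching to bridge mode on transient outliers.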
S304, if the key information is in the preset normal range, the target IP is proxied to a back-end server according to the normal process.
In specific implementation, after the determination in step S303, if the upper limit is not exceeded, the WAF considers the access normal and continues to use the conventional data forwarding method.
In summary, in the website availability detection method of the above embodiment of the present invention, a Web application firewall is connected in series outside a Web server in a layer-7 proxy working mode. Based on the machine learning capability of the WAF, a large model is established that takes as its baseline the distribution of the HTTP status codes of the website service and the delay consumed inside the WAF when forwarding them. The WAF machine learning function is used to learn and monitor the key HTTP response information of the Web server and the key timing information of WAF processing, and the continuity of service access is improved through reasonable modeling linked with a bridge forwarding mechanism. When a requesting client accesses the website service and the time consumed from sending the request to receiving the HTTP status code exceeds the warning threshold, the WAF stops proxying for the server and forwards the traffic in bridge pass-through mode, so that the WAF itself does not impair the availability of the website.
Example four
Referring to fig. 4, a website availability detection system according to a fourth embodiment of the present invention is shown for a transparent proxy, and the system includes:
the learning module 11 is configured to acquire a site IP to be detected, and establish a learning model corresponding to the site IP to be detected;
further, the learning module 11 includes:
the learning unit 111 is configured to input the site IP detected by the health availability into a machine learning model for learning, and set parameters of the machine learning model, where the parameters include the number of learning cycles, a static/dynamic type url, and each learning cycle unit;
an obtaining unit 112, configured to obtain a url request meeting a condition in each learning cycle unit, and collect relevant information of a plurality of url requests.
Further, the obtaining unit 112 further includes:
a first recording sub-unit 1121, configured to record, as a first timestamp, a timestamp of when each url request reaches the WAF external network port;
a second recording subunit 1122, configured to record, as a response code, the response code carried in the server response message that corresponds to each url request and travels from the server to the WAF internal network port;
a third recording subunit 1123, configured to record, as a second timestamp, a timestamp when the server response packet corresponding to each url request reaches the WAF internal network interface;
a fourth recording subunit 1124, configured to record, as a third timestamp, a timestamp when the response packet corresponding to each url request leaves the WAF external network port.
And the constructing unit 113 is configured to determine a baseline of the machine learning model according to the relevant information, so as to obtain a learning model corresponding to the site IP.
Further, the building unit 113 includes:
an acquisition subunit 1131, configured to acquire, in one learning period unit, a distribution ratio of the most critical response codes in the response codes;
a first calculating subunit 1132, configured to obtain, according to the first timestamp and the third timestamp, a first time difference value from the requesting client sending a request to receiving a response, and calculate a median of all first time difference values in this period;
a second calculating subunit 1133, configured to obtain, according to the second timestamp and the third timestamp, a second time difference value consumed inside the WAF when forwarding the server response packet, and calculate a median of all the second time difference values in this period;
a third calculating subunit 1134, configured to calculate an average value of the occurrence rates of the most critical response codes, an average value of the medians of all the first time difference values, and an average value of the medians of all the second time difference values according to the distribution proportion of the most critical response codes, the median of all the first time difference values, the median of all the second time difference values, and the total number of the learning periods;
a constructing subunit 1135, configured to determine a machine learning model baseline according to the average value of the most critical response code occurrence rates, the average value of the median of all the first time difference values, and the average value of the median of all the second time difference values.
The judging module 12 is configured to extract a destination IP of daily Web service access at an external port of the WAF, and judge whether the destination IP is consistent with the modeling IP of the learning model;
further, the determining module 12 includes:
and the first processing unit 121 is configured to proxy the destination IP to a back-end server according to the normal flow if the destination IP is inconsistent with the modeling IP of the learning model.
The control module 13 is configured to control the WAF to perform key information statistics on the Web request information conforming to the modeling IP if the target IP is consistent with the modeling IP of the learning model, and determine whether the key information is within a preset normal range;
and the processing module 14 is configured to perform non-proxy processing on the destination IP if the key information is not within the preset normal range, so as to transparently transmit the key information in a Linux bridge two-layer forwarding manner.
Further, the processing module 14 includes:
and the second processing unit 141 is configured to proxy the destination IP to a back-end server according to the normal flow if the key information is within the preset normal range.
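The way the judging, control and processing modules combine into one forwarding decision can be sketched as a single function. This Python example is a simplified illustration: `ForwardMode`, `choose_mode` and the boolean `key_info_normal` input are hypothetical names, and the actual switch to Linux bridge layer-2 forwarding is outside the scope of the sketch.

```python
from enum import Enum
from typing import Set


class ForwardMode(Enum):
    PROXY = "layer-7 proxy"
    BRIDGE = "layer-2 bridge pass-through"


def choose_mode(dst_ip: str, modeled_ips: Set[str],
                key_info_normal: bool) -> ForwardMode:
    """Pick the forwarding mode for one destination IP."""
    # Judging module / first processing unit: unlearned IPs are proxied.
    if dst_ip not in modeled_ips:
        return ForwardMode.PROXY
    # Second processing unit: statistics within the normal range, keep proxying.
    if key_info_normal:
        return ForwardMode.PROXY
    # Processing module: abnormal statistics, stop proxying and pass through.
    return ForwardMode.BRIDGE


modeled = {"192.0.2.10"}  # assumed set of learned site IPs
mode_unlearned = choose_mode("192.0.2.99", modeled, key_info_normal=False)
mode_normal = choose_mode("192.0.2.10", modeled, key_info_normal=True)
mode_abnormal = choose_mode("192.0.2.10", modeled, key_info_normal=False)
```

Only the last case, a learned site whose key information falls outside the normal range, leads to bridge pass-through; every other combination keeps the normal proxy path.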
In summary, in the website availability detection system of the above embodiment of the present invention, a Web application firewall is connected in series outside the Web server in a layer-7 proxy working mode. Based on the machine learning capability of the WAF, the learning module 11 establishes a large model that takes as its baseline the range distribution of the HTTP status codes of the website service and the delay consumed inside the WAF when forwarding them. The judging module 12 judges whether the target IP is one that has been learned; if so, the control module 13 controls the WAF to collect statistics on the key timing information. When a requesting client accesses the website service and the processing module 14 determines that the time consumed from sending the request to receiving the HTTP status code exceeds the warning threshold, the WAF stops proxying for the server and forwards the traffic in bridge pass-through mode, so that the WAF itself does not impair the availability of the website.
EXAMPLE five
Referring to fig. 5, the website availability detecting apparatus according to a fifth embodiment of the present invention is shown, which includes a server, where the server includes a memory 10, a processor 20, and a computer program 30 stored in the memory 10 and running on the processor 20, and the processor 20 implements the website availability detecting method when executing the computer program 30.
In specific implementation, the processor 20 obtains the site IP to be detected, and establishes a learning model corresponding to the site IP to be detected;
the processor 20 extracts a target IP of daily Web service access at the external network interface of the WAF, and judges whether the target IP is consistent with the modeling IP of the learning model;
if the target IP is consistent with the modeling IP of the learning model, the processor 20 controls the WAF to perform key information statistics on Web request information conforming to the modeling IP, and judges whether the key information is in a preset normal range;
if the key information is not in the preset normal range, the processor 20 performs non-proxy processing on the destination IP so as to transparently transmit the key information according to a Linux bridge two-layer forwarding mode.
The memory 10 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory, etc.), a magnetic memory, a magnetic disk, an optical disk, and the like. The memory 10 may in some embodiments be an internal storage unit of the device, such as a hard disk of the device. The memory 10 may also be an external storage device of the device in other embodiments, such as a plug-in hard disk provided on the device, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like. Further, the memory 10 may also include both an internal storage unit and an external storage device of the device. The memory 10 may be used not only to store application software installed in the device and various types of data, but also to temporarily store data that has been output or is to be output.
In some embodiments, the processor 20 may be an Electronic Control Unit (ECU), a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor or other data Processing chip, and is configured to run program codes stored in the memory 10 or process data, such as executing an access restriction program.
It should be noted that the structure shown in fig. 5 does not constitute a limitation of the website availability detection apparatus, and in other embodiments, the website availability detection apparatus may include fewer or more components than those shown, or some components may be combined, or a different arrangement of components.
In the website availability detection device, a Web application firewall is connected in series outside a Web server in a layer-7 proxy working mode. Based on the machine learning capability of the WAF, a large model is established that takes as its baseline the range distribution of the HTTP status codes of the website service and the delay consumed inside the WAF when forwarding them. The WAF machine learning function is used to learn and monitor the key HTTP response information of the Web server and the key timing information of WAF processing, and the continuity of service access is improved through reasonable modeling linked with a bridge forwarding mechanism.
An embodiment of the present invention further provides a readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the website availability detection method as described above.
Those of skill in the art will understand that the logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be viewed as implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A website availability detection method, the method comprising:
acquiring a site IP to be detected, and establishing a learning model corresponding to the site IP to be detected;
extracting a target IP of daily Web service access at an external network interface of the WAF, and judging whether the target IP is consistent with a modeling IP of the learning model or not;
if the target IP is consistent with the modeling IP of the learning model, controlling the WAF to perform key information statistics on Web request information conforming to the modeling IP, and judging whether the key information is in a preset normal range;
and if the key information is not in the preset normal range, performing non-proxy processing on the target IP so as to transparently transmit the key information according to a Linux bridge two-layer forwarding mode.
2. The website availability detection method of claim 1, wherein after the step of determining whether the destination IP is consistent with the modeled IP of the learning model, the method further comprises:
and if the target IP is not consistent with the modeling IP of the learning model, the target IP is proxied to a back-end server according to a normal process.
3. The website availability detection method according to claim 1, wherein the step of obtaining the to-be-detected site IP and establishing the learning model corresponding to the to-be-detected site IP comprises:
inputting the site IP detected by the health availability into a machine learning model for learning, and setting parameters of the machine learning model, wherein the parameters comprise the number of learning cycles, static/dynamic types url and each learning cycle unit;
and acquiring the url request meeting the conditions in each learning cycle unit, acquiring relevant information of a plurality of url requests, and determining a base line of a machine learning model according to the relevant information to obtain the learning model corresponding to the site IP.
4. The website availability detection method of claim 3, wherein the step of collecting information related to the url requests comprises:
recording a time stamp of each url request reaching the external network port of the WAF as a first time stamp;
recording, as a response code, the response code carried in the server response message that corresponds to each url request and reaches the WAF internal network port;
recording a timestamp of the server response message corresponding to each url request reaching the internal network port of the WAF as a second timestamp;
and recording a timestamp of the response message corresponding to each url request leaving the external network port of the WAF as a third timestamp.
5. The website usability detection method according to claim 4, wherein the step of determining a baseline of a machine learning model according to the relevant information comprises:
in one learning period unit, acquiring the distribution proportion of the most critical response codes in the response codes;
obtaining a first time difference value from the requesting client sending a request to receiving a response according to the first time stamp and the third time stamp, and calculating the median of all the first time difference values in the period;
obtaining a second time difference value consumed inside the WAF when forwarding the server response message according to the second time stamp and the third time stamp, and calculating the median of all the second time difference values in the period;
calculating the average value of the occurrence rate of the most critical response codes, the average value of the median of all the first time difference values and the average value of the median of all the second time difference values according to the distribution proportion of the occurrence of the most critical response codes, the median of all the first time difference values and the median of all the second time difference values and the total number of the learning periods;
and determining a machine learning model baseline according to the average value of the most critical response code occurrence rates, the average value of the median of all the first time difference values and the average value of the median of all the second time difference values.
6. The website availability detection method according to claim 1, wherein after the step of determining whether the key information is within a preset normal range, the method further comprises:
and if the key information is in the preset normal range, the target IP is proxied to a back-end server according to a normal flow.
7. A website availability detection system, the system comprising:
the learning module is used for acquiring the site IP to be detected and establishing a learning model corresponding to the site IP to be detected;
the judging module is used for extracting a target IP of daily Web service access at an external network interface of the WAF and judging whether the target IP is consistent with the modeling IP of the learning model or not;
the control module is used for controlling the WAF to carry out key information statistics on Web request information conforming to the modeling IP if the target IP is consistent with the modeling IP of the learning model, and judging whether the key information is in a preset normal range or not;
and the processing module is used for carrying out non-proxy processing on the target IP if the key information is not in the preset normal range so as to transmit the key information in a two-layer forwarding mode of the Linux network bridge.
8. The website availability detection system of claim 7, wherein the determination module comprises:
and the first processing unit is used for proxying the target IP to a back-end server according to the normal process if the target IP is inconsistent with the modeling IP of the learning model.
9. A readable storage medium on which a computer program is stored, which when executed by a processor implements a website availability detection method according to any one of claims 1 to 6.
10. A website availability detection apparatus comprising a server comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the website availability detection method according to any one of claims 1 to 6 when executing the computer program.
CN202110917616.5A 2021-08-11 2021-08-11 Website availability detection method, system, readable storage medium and device Active CN113660244B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110917616.5A CN113660244B (en) 2021-08-11 2021-08-11 Website availability detection method, system, readable storage medium and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110917616.5A CN113660244B (en) 2021-08-11 2021-08-11 Website availability detection method, system, readable storage medium and device

Publications (2)

Publication Number Publication Date
CN113660244A true CN113660244A (en) 2021-11-16
CN113660244B CN113660244B (en) 2023-02-24

Family

ID=78491326

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110917616.5A Active CN113660244B (en) 2021-08-11 2021-08-11 Website availability detection method, system, readable storage medium and device

Country Status (1)

Country Link
CN (1) CN113660244B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115994070A (en) * 2023-03-21 2023-04-21 深圳市明源云科技有限公司 System availability detection method and device, electronic equipment and readable storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107528749A (en) * 2017-08-28 2017-12-29 杭州安恒信息技术有限公司 Website Usability detection method, apparatus and system based on cloud protection daily record
CN109325193A (en) * 2018-10-16 2019-02-12 杭州安恒信息技术股份有限公司 WAF normal discharge modeling method and device based on machine learning
CN111464376A (en) * 2020-03-05 2020-07-28 奇安信科技集团股份有限公司 Website availability monitoring method and device, storage medium and computer equipment

Also Published As

Publication number Publication date
CN113660244B (en) 2023-02-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant