CN116800518A - Method and device for adjusting network protection strategy - Google Patents

Publication number
CN116800518A
Authority
CN
China
Prior art keywords
alarm
determining
logs
similarity
degree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310862199.8A
Other languages
Chinese (zh)
Inventor
钟良志
严劲
蔡锋
王彦婷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Telecom Technology Innovation Center
China Telecom Corp Ltd
Original Assignee
China Telecom Technology Innovation Center
China Telecom Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Telecom Technology Innovation Center and China Telecom Corp Ltd
Priority to CN202310862199.8A
Publication of CN116800518A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/20 Network architectures or network communication protocols for network security for managing network security; network security policies in general
    • H04L63/02 Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The application discloses a method and a device for adjusting a network protection strategy, which automatically adjust the network protection strategy to reduce the false alarm rate and save the labor and time otherwise spent on reducing it. The method comprises the following steps: acquiring a plurality of alarm logs generated by a firewall for a first client in a set time period, each alarm log being generated by the firewall after intercepting one piece of abnormal data from the first client; determining a false alarm rate of the plurality of pieces of abnormal data from the first client according to the degree of similarity of the contents of the plurality of alarm logs, wherein the degree of similarity is positively correlated with the false alarm rate; and when the false alarm rate is greater than a set threshold, adjusting the network protection strategy according to the protection rules matched by the plurality of pieces of abnormal data.

Description

Method and device for adjusting network protection strategy
Technical Field
The present application relates to the field of network information security technologies, and in particular, to a method and an apparatus for adjusting a network protection policy.
Background
A Web application firewall (WAF) is deployed in front of a Web application. It analyzes and checks user requests sent to the Web application based on regular expressions and machine learning models, and identifies abnormal traffic. Current WAF protection relies on feature matching: when traffic contains a feature that matches a configured rule, an alarm or an interception is triggered. Normal traffic is therefore easily marked as abnormal by mistake, producing false alarms. As attacks continue to evolve and Web applications grow more complex, the probability of WAF false alarms keeps rising, and so does their impact on the normal business of Web applications. As a result, a large number of specialized technicians are needed to analyze false alarms and to maintain and correct the WAF protection strategy.
Disclosure of Invention
The application provides a method and a device for adjusting a network protection strategy, which are used for realizing automatic adjustment of the protection strategy to reduce the false alarm rate.
In a first aspect, the present application provides a method for adjusting a network protection policy, where the network protection policy includes a plurality of protection rules, and the method includes:
acquiring a plurality of alarm logs generated by a firewall for a first client in a set time period; each alarm log is generated by the firewall after intercepting one piece of abnormal data from the first client;
determining false alarm rates of a plurality of pieces of abnormal data from the first client according to the similarity degree of the contents of the plurality of alarm logs; wherein the degree of similarity is positively correlated with the false positive rate;
and when the false alarm rate is larger than a set threshold value, adjusting the network protection strategy according to the protection rule matched with the plurality of abnormal data.
In some embodiments, the method further comprises:
determining a periodic characteristic value of the plurality of alarm logs according to the generation times of the plurality of alarm logs; the periodic characteristic value being a first value indicates that the plurality of alarm logs are generated periodically, and the periodic characteristic value being a second value indicates that the plurality of alarm logs are not generated periodically;
the determining the false alarm rate of the plurality of pieces of abnormal data from the first client according to the degree of similarity of the contents of the plurality of alarm logs specifically includes:
determining the false alarm rate according to the degree of similarity and the periodic characteristic value; the false alarm rate corresponding to the periodic characteristic value being the first value is greater than the false alarm rate corresponding to the periodic characteristic value being the second value.
In some embodiments, determining the degree of similarity of the contents of the plurality of alarm logs includes:
extracting alarm types included in each alarm log, and determining the number of alarm types related to the alarm logs;
determining the similarity degree of the contents of the plurality of alarm logs according to the quantity; wherein the number is inversely related to the degree of similarity.
In some embodiments, determining the degree of similarity of the contents of the plurality of alarm logs includes:
extracting the Uniform Resource Identifier (URI) included in each alarm log, and determining a reference URI from the extracted URIs; the reference URI is the longest of the extracted URIs;
calculating the length of the longest common substring (LCS) between each extracted URI and the reference URI, and determining the number of LCSs whose length is greater than a length threshold;
determining the degree of similarity of the contents of the plurality of alarm logs according to the number; wherein the number is positively correlated with the degree of similarity.
In some embodiments, determining the degree of similarity of the contents of the plurality of alarm logs includes:
extracting description texts included in each alarm log, and determining statement vectors corresponding to each description text;
determining the similarity degree of the contents of the plurality of alarm logs according to the distance between the plurality of sentence vectors; wherein the distance is positively correlated with the degree of similarity.
In some embodiments, acquiring the plurality of alarm logs generated by the firewall for the first client in the set time period includes:
acquiring the alarm logs generated by the firewall in the set time period;
and identifying, from the acquired alarm logs, the alarm logs that include the source IP address of the first client and the user agent (UA) of the first client.
In some embodiments, adjusting the network protection policy according to the protection rules matched by the plurality of pieces of abnormal data includes:
stopping use of the protection rules, included in the network protection policy, that are matched by the plurality of pieces of abnormal data.
In a second aspect, the present application provides an adjustment device for a network protection policy, where the network protection policy includes a plurality of protection rules, and the device includes:
the obtaining unit is used for obtaining a plurality of alarm logs generated by the firewall for the first client in a set time period; each alarm log is generated by the firewall after intercepting one piece of abnormal data from the first client;
a processing unit configured to perform:
determining false alarm rates of a plurality of pieces of abnormal data from the first client according to the similarity degree of the contents of the plurality of alarm logs; wherein the degree of similarity is positively correlated with the false positive rate;
and when the false alarm rate is larger than a set threshold value, adjusting the network protection strategy according to the protection rule matched with the plurality of abnormal data.
In some embodiments, the processing unit is further configured to:
determining a periodic characteristic value of the plurality of alarm logs according to the generation times of the plurality of alarm logs; the periodic characteristic value being a first value indicates that the plurality of alarm logs are generated periodically, and the periodic characteristic value being a second value indicates that the plurality of alarm logs are not generated periodically;
the processing unit is specifically configured to, when determining the false alarm rate of the plurality of pieces of abnormal data from the first client according to the degree of similarity of the contents of the plurality of alarm logs:
determine the false alarm rate according to the degree of similarity and the periodic characteristic value; the false alarm rate corresponding to the periodic characteristic value being the first value is greater than the false alarm rate corresponding to the periodic characteristic value being the second value.
In some embodiments, the processing unit is specifically configured to:
extracting alarm types included in each alarm log, and determining the number of alarm types related to the alarm logs;
determining the similarity degree of the contents of the plurality of alarm logs according to the quantity; wherein the number is inversely related to the degree of similarity.
In some embodiments, the processing unit is specifically configured to:
extracting Uniform Resource Identifiers (URIs) included in each alarm log, and determining a reference URI from the extracted URIs; the reference URI is the URI with the longest length in the URIs;
calculating the length of the longest common substring (LCS) between each extracted URI and the reference URI, and determining the number of LCSs whose length is greater than a length threshold;
determining the degree of similarity of the contents of the plurality of alarm logs according to the number; wherein the number is positively correlated with the degree of similarity.
In some embodiments, the processing unit is specifically configured to:
extracting description texts included in each alarm log, and determining statement vectors corresponding to each description text;
determining the similarity degree of the contents of the plurality of alarm logs according to the distance between the plurality of sentence vectors; wherein the distance is positively correlated with the degree of similarity.
In some embodiments, the acquiring unit is specifically configured to:
acquiring an alarm log generated by the firewall in the set time period;
and identifying, from the acquired alarm logs, the alarm logs that include the source IP address of the first client and the user agent (UA) of the first client.
In some embodiments, the processing unit is specifically configured to:
stopping use of the protection rules, included in the network protection policy, that are matched by the plurality of pieces of abnormal data.
In a third aspect, an electronic device is provided that includes a controller and a memory. The memory is used for storing computer-executable instructions, and the controller executes the computer-executable instructions in the memory to perform the operational steps of any one of the possible implementations of the method of the first aspect using hardware resources in the controller.
In a fourth aspect, there is provided a computer readable storage medium having instructions stored therein which, when run on a computer, cause the computer to perform the methods of the above aspects.
Malicious attack data is random and variable, whereas normal data is uniform and regular. Based on these characteristics, the application analyzes the alarm logs generated for a plurality of pieces of abnormal data from the same client and, according to the degree of similarity of these alarm logs, determines whether the plurality of pieces of abnormal data match the characteristics of malicious attack data, and thus whether they are false alarms. If they are determined to be false alarms, the network protection strategy can be adjusted in time to reduce the false alarm rate. Compared with the traditional approach, in which professionals analyze the logs to reduce the false alarm rate, the scheme of the application effectively saves labor cost and time.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a schematic diagram of a system architecture according to an embodiment of the present application;
FIG. 2 is a flowchart of a method for adjusting a network protection policy according to an embodiment of the present application;
FIG. 3 is a flowchart of another method for adjusting a network protection policy according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of an adjustment device for network protection policy according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the technical solutions of the present application, but not all embodiments. All other embodiments, based on the embodiments described in the present document, which can be obtained by a person skilled in the art without any creative effort, are within the scope of protection of the technical solutions of the present application.
The terms first, second and the like in the description and in the claims and in the above-described figures, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be capable of being practiced otherwise than as specifically illustrated and described. In addition, the term "and/or" herein is merely an association relationship describing an association object, and means that three relationships may exist, for example, a and/or B may mean: a exists alone, A and B exist together, and B exists alone. The character "/" herein generally indicates that the associated object is an "or" relationship unless otherwise specified.
In order to facilitate understanding of the solution proposed by the present application, technical terms related to the embodiments of the present application are first described:
(1) User Agent (User-Agent, UA): a header field of the Hypertext Transfer Protocol (HTTP). From the UA, a website server determines information such as the operating system and its version, the processor type, and the browser version used by each user, so that it can send different pages to different user terminals based on that information.
(2) Uniform Resource Identifier (URI): a string that identifies an Internet resource and allows a user to interact with any resource (local or on the Internet) via a specific protocol. In brief, each resource available on the Web, such as a document, image, video clip, or program, is located by a URI.
(3) Term Frequency-Inverse Document Frequency (TF-IDF) algorithm: a statistical method for evaluating how important a word is to one document in a document set or corpus. The importance of a word increases in proportion to the number of times it appears in the document, but decreases as the frequency with which it appears across the corpus increases. The main idea of TF-IDF is that if a word or phrase appears frequently in one article and rarely in other articles, it is considered to have good class-distinguishing capability and to be suitable for classification.
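As an illustration of the TF-IDF idea described in (3), the following is a minimal sketch and not part of the application. It assumes the plain, unsmoothed variant (term frequency = count / document length, inverse document frequency = log(N / document frequency)); the function name `tf_idf` and the sample texts are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumed variant: tf = count / len(doc), idf = log(N / df), no smoothing.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical description texts, already split into words.
docs = [
    "sql injection detected in query".split(),
    "sql injection detected in form".split(),
    "cross site scripting detected in form".split(),
]
weights = tf_idf(docs)
```

In this sketch a word that occurs in every text (such as "detected") gets weight 0, while a word unique to one text (such as "query") gets the largest weight in that text.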
The WAF serves as a protection module of the Web application and verifies the traffic sent to the Web application server, mainly based on a preset protection strategy. The protection strategy comprises a plurality of protection rules, and each protection rule is provided with one or more characteristic fields. The WAF performs feature matching on received traffic, that is, it determines whether the traffic contains the characteristic fields included in a protection rule. If so, the traffic is determined to be abnormal traffic to be intercepted, and the attack type of the abnormal traffic is then determined according to the protection rule that the traffic matched, so that a corresponding alarm log or interception log is generated. A protection strategy based on feature matching easily marks normal traffic as abnormal traffic and has a high false alarm rate. To reduce the false alarm rate, a professional technician has to extract the alarm logs or interception logs generated by the WAF at regular intervals for analysis, determine which abnormal traffic was falsely reported and which protection rules it matched, and then adjust the protection strategy to stop using the protection rules that generate false alarms. This solution consumes a great deal of time and labor and is inefficient.
Network attacks are random and variable, whereas conventional access is regular and uniform. Based on these characteristics, the application provides a method and a device for adjusting a network protection strategy, which automatically acquire and analyze the alarm logs generated by a firewall for a plurality of pieces of abnormal data (or abnormal traffic) from the same source over a period of time, and determine whether the plurality of pieces of abnormal data are false alarms based on the degree of similarity among the logs. The higher the degree of similarity, the higher the false alarm rate. If the data are determined to be false alarms, the protection rules that they matched can be determined, and the network protection strategy is adjusted accordingly to reduce the false alarm rate. Compared with the traditional scheme, in which identifying false alarms requires a large investment of labor and time, the scheme of the application identifies false alarms and adjusts the protection strategy automatically, which saves labor cost and improves the efficiency of identifying false alarms and adjusting the protection strategy.
In order to facilitate understanding of the solution of the present application, a system architecture to which the solution of the present application is applicable will be first described. Referring to fig. 1, a schematic diagram of a system architecture according to an embodiment of the present application includes a client, a protection engine, and an analysis engine. The protection engine may also be referred to as a firewall, and is configured to determine whether data from the client is abnormal data according to a pre-configured network protection policy. Specifically, the network protection policy configured in the protection engine includes a plurality of protection rules, each protection rule includes at least one feature field, and the protection engine determines whether the data is abnormal data according to whether the received data includes the feature field. The protection engine is also used for determining the attack type of the data according to the protection rule matched with the data when determining that the data is abnormal data, and further generating an interception log or an alarm log corresponding to the data.
The analysis engine included in the system shown in FIG. 1 is configured to periodically acquire, from the protection engine, the intercepted abnormal data and the alarm log generated for each piece of abnormal data. The analysis engine aggregates the acquired abnormal data by source, extracts a plurality of pieces of abnormal data belonging to the same source, and determines whether the extracted pieces of abnormal data are false alarms according to their sending times and the contents of the corresponding alarm logs. If they are determined to be false alarms, the analysis engine further instructs the protection engine to adjust the network protection strategy. It should be noted that FIG. 1 is only an example, and the present application does not limit the number of analysis engines, protection engines, or clients.
The scheme of the application will be described in detail with reference to the system shown in fig. 1. Referring to fig. 2, a flowchart of a method for adjusting a network protection policy according to an embodiment of the present application is provided. The method flow may be performed by a server of the Web application or by a specific processor, processing chip or processing module in the Web application server, for example. For example, the method flow may be performed by an analysis engine included in the system shown in FIG. 1. The following description will take an analysis engine as an execution body, and the method flow shown in fig. 2 specifically includes:
Step 201: acquiring a plurality of alarm logs generated by a protection engine of the Web application server for the first client in a set time period.
Each alarm log is generated by the protection engine after intercepting one piece of abnormal data from the first client. The alarm log includes fields such as the corresponding abnormal data, the source of the abnormal data (which may include, for example, a source IP address and a UA), the destination IP address, the HOST, the generation timestamp, the alarm type, and the payload of the abnormal data.
Optionally, the analysis engine may obtain all alarm logs generated by the protection engine in the set time period, where these logs include alarm logs for abnormal data sent by multiple clients. The analysis engine may aggregate the alarm logs by source, that is, identify the alarm logs belonging to the same client among all the alarm logs. Illustratively, the analysis engine may aggregate the alarm logs belonging to the same source using the source IP address and the UA as the reference (which may also be called the primary key). The analysis engine can thus identify, among all the alarm logs, those whose source IP address is the IP address of the first client and whose UA is the UA of the first client. It should be noted that the first client in the present application is any one of the plurality of clients that send data to the Web application server, not a specific client.
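The aggregation by source described above can be sketched as follows. This is a minimal illustration, not the application's implementation; the field names `src_ip`, `user_agent`, and `alarm_type` and the sample records are assumptions made for the example.

```python
from collections import defaultdict

def group_logs_by_source(logs):
    """Group alarm logs under the key (source IP address, User-Agent).

    The dictionary keys "src_ip" and "user_agent" are hypothetical field names.
    """
    groups = defaultdict(list)
    for log in logs:
        key = (log["src_ip"], log["user_agent"])
        groups[key].append(log)
    return dict(groups)

# Hypothetical alarm logs from three distinct sources.
logs = [
    {"src_ip": "10.0.0.1", "user_agent": "agent-a", "alarm_type": "sqli"},
    {"src_ip": "10.0.0.1", "user_agent": "agent-a", "alarm_type": "sqli"},
    {"src_ip": "10.0.0.1", "user_agent": "agent-b", "alarm_type": "xss"},
    {"src_ip": "10.0.0.2", "user_agent": "agent-a", "alarm_type": "xss"},
]
groups = group_logs_by_source(logs)
first_client_logs = groups[("10.0.0.1", "agent-a")]
```

Using both fields as the key keeps two clients behind the same IP address (for example, behind a shared gateway) in separate groups.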
Step 202: determining the false alarm rate of the plurality of pieces of abnormal data according to the degree of similarity of the contents of the plurality of alarm logs.
The false alarm rate is positively correlated with the degree of similarity: the more similar the contents of the plurality of alarm logs, the more regular and uniform the corresponding pieces of abnormal data, and thus the higher their false alarm rate.
Alternatively, one or more of the alert type, URI, and descriptive text may be extracted from each alert log, and the false alarm rate may be determined based on the degree of similarity of the extracted content. For example, when the number of alarm logs having the same alarm type is large, it may be determined that the false alarm rate is high. Or when the semantic similarity of the description texts of the plurality of alarm logs is higher, the false alarm rate can be determined to be higher. Of course, in the scheme of the present application, the information for determining the false alarm rate may include more than the three items, for example, the destination IP address in the alarm log may also be included, and the greater the number of alarm logs with the same destination IP address, the higher the corresponding false alarm rate.
Step 203: when the false alarm rate is greater than a set threshold, adjusting the network protection strategy according to the protection rules matched by the plurality of pieces of abnormal data.
A successfully matched protection rule is a protection rule that erroneously identified data from the first client as abnormal data, that is, a protection rule used by the protection engine when intercepting the plurality of pieces of abnormal data from the first client.
Optionally, when the analysis engine adjusts the protection strategy according to the successfully matched protection rules, it may send a network protection policy adjustment indication to the protection engine, where the indication instructs the protection engine to stop using the successfully matched protection rules.
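A minimal sketch of building such an adjustment indication is given below. The application does not specify the message format, so everything here is hypothetical: the `rule_id` log field, the `disable_rules` action name, and the dictionary layout are assumptions for illustration only.

```python
def build_adjustment_indication(alarm_logs):
    """Collect the rules matched by the false-alarm logs into one indication.

    Assumes each alarm log carries a hypothetical "rule_id" field naming the
    protection rule that triggered it.
    """
    rule_ids = sorted({log["rule_id"] for log in alarm_logs})
    # Hypothetical message layout: tell the protection engine which rules to stop using.
    return {"action": "disable_rules", "rule_ids": rule_ids}

indication = build_adjustment_indication(
    [{"rule_id": "R-7"}, {"rule_id": "R-3"}, {"rule_id": "R-7"}]
)
```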
Malicious attack data is random and variable, whereas normal data is uniform and regular. Based on these characteristics, the above scheme analyzes the alarm logs generated for a plurality of pieces of abnormal data from the same client and, according to the degree of similarity of these alarm logs, determines whether the plurality of pieces of abnormal data match the characteristics of malicious attack data, and thus whether they are false alarms. If they are determined to be false alarms, the network protection strategy can be adjusted in time to reduce the false alarm rate. Compared with the traditional approach, in which professionals analyze the logs to reduce the false alarm rate, the scheme of the application effectively saves labor cost and time.
In some scenarios, when determining whether the intercepted pieces of abnormal data from the first client are false alarms, the analysis engine may also consider whether the sending times of the pieces of abnormal data are periodic. Periodic data is generally generated when a user accesses the Web application server normally at regular intervals, whereas malicious attack data is not sent periodically, so the application uses periodicity as one of the features for judging false alarms. For example, after the analysis engine obtains from the protection engine the alarm logs generated for the plurality of pieces of abnormal data from the first client, it may determine the generation time of each alarm log from the timestamp carried in the log, and then determine the periodic characteristic value of the alarm logs from these generation times. When the periodic characteristic value is a first value, the plurality of alarm logs are generated periodically; when the periodic characteristic value is a second value, the plurality of alarm logs are not generated periodically.
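The application does not state how periodicity is detected. As one possible sketch, the example below treats the alarms as periodic when the gaps between consecutive timestamps vary little, measured by the coefficient of variation. This criterion, the `max_cv` tolerance, and the choice of 1 and 0 as the first and second values are all assumptions.

```python
from statistics import mean, pstdev

def period_feature(timestamps, max_cv=0.1):
    """Return 1 (assumed first value) if the alarms look periodic, else 0.

    Assumed criterion: the coefficient of variation of the gaps between
    consecutive timestamps is at most max_cv.
    """
    if len(timestamps) < 3:
        return 0  # too few alarms to judge periodicity
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    average_gap = mean(gaps)
    if average_gap <= 0:
        return 0  # all alarms share one timestamp
    return 1 if pstdev(gaps) / average_gap <= max_cv else 0

periodic = period_feature([0, 60, 120, 180, 240])   # one alarm per minute
irregular = period_feature([0, 5, 200, 210, 900])   # bursts at random times
```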
Further, when the analysis engine calculates the false alarm rate, the false alarm rate of a plurality of pieces of abnormal data can be determined together according to the similarity degree of the alarm logs and the period characteristic value. Under the condition that the similarity is unchanged, the false alarm rate corresponding to the period characteristic value which is the first value is larger than the false alarm rate corresponding to the period characteristic value which is the second value. As an alternative way, when the analysis engine calculates the false alarm rates of the plurality of pieces of abnormal data based on the similarity degree and the period characteristic value of the plurality of pieces of alarm log contents, the weight corresponding to the similarity degree and the weight corresponding to the period characteristic value which are configured in advance can be obtained, and the weighted sum of the similarity degree and the period characteristic value is used as the false alarm rate of the plurality of pieces of abnormal data.
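The weighted sum described above can be sketched as follows. The weights 0.7 and 0.3, the threshold 0.8, and the encoding of the first and second values as 1 and 0 are illustrative assumptions; the application only requires that the weights be configured in advance.

```python
def false_alarm_rate(similarity, period_feature, w_sim=0.7, w_period=0.3):
    """Weighted sum of the similarity degree and the periodic characteristic value.

    similarity is assumed to lie in [0, 1]; period_feature is assumed to be
    1 (periodic) or 0 (not periodic). The default weights are hypothetical.
    """
    return w_sim * similarity + w_period * period_feature

def should_adjust(rate, threshold=0.8):
    """Adjust the protection policy only when the rate exceeds the set threshold."""
    return rate > threshold

rate_periodic = false_alarm_rate(0.9, 1)
rate_aperiodic = false_alarm_rate(0.9, 0)
```

With the similarity held at 0.9, the periodic case scores higher than the non-periodic one, which matches the ordering required in the text.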
In one possible implementation, the analysis engine, in determining the degree of similarity of the content of the plurality of alert logs, may be implemented by determining one or more of a degree of similarity of attack types, a degree of similarity of URIs, and a degree of similarity of descriptive text included in the plurality of alert logs. In the following, for convenience of description, the description will be given by taking the example of determining the similarity degree together with three contents of attack type, URI and descriptive text. As an alternative, the analysis engine may calculate an alarm type analogy according to the number of alarm types involved in the plurality of alarm logs, calculate a URI repetition ratio according to the length of the longest common substring (Longest Common Substring, LCS) in the URIs included in the alarm logs, and determine a semantic similarity according to the distance between statement vectors corresponding to the descriptive text in the alarm logs, thereby determining the similarity of the contents of the plurality of alarm logs together according to the alarm type analogy, the URI repetition ratio and the semantic similarity. Optionally, corresponding weights can be configured for each parameter, and a weighted sum of the alarm type analogy, the URI repetition ratio and the semantic similarity is used as the similarity of the contents of the plurality of alarm logs. The following describes the process of calculating the alarm type analogy, URI repetition ratio and semantic similarity in detail.
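A short sketch of combining the three indicators into one content similarity is shown below. The weight values are hypothetical, and the check that the weights sum to 1 is an added assumption that keeps the result in the same 0 to 1 range as its inputs.

```python
def content_similarity(type_analogy, uri_repetition, semantic_similarity,
                       weights=(0.3, 0.3, 0.4)):
    """Weighted sum of the three indicators; each is assumed to lie in [0, 1].

    The default weights are hypothetical and must sum to 1.
    """
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    w_type, w_uri, w_sem = weights
    return (w_type * type_analogy
            + w_uri * uri_repetition
            + w_sem * semantic_similarity)

similarity = content_similarity(0.5, 0.75, 1.0)
```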
Embodiment one: calculating the URI repetition ratio.
For example, the longest URI among the URIs respectively included in the plurality of alarm logs may first be determined as the base URI. Further, the LCS between each of the remaining URIs and the base URI may be obtained, and the length of each LCS determined. Still further, the similarity between each URI and the base URI may be determined according to the length of each LCS, so that the repetition ratio of the URIs can be determined from the similarity between each URI and the base URI. As an example, the similarity between each URI and the base URI may be calculated using the following formula (1).
Wherein, $URI_{sim}$ is the similarity ratio between the i-th URI and the base URI, $LCS_i$ is the length of the LCS between the i-th URI and the base URI, and $Length$ is the length of the base URI.
In determining the repetition ratio of URIs from the similarity between the respective URIs and the base URIs, the repetition ratio of URIs may be calculated using the following equation (2).
Wherein, $Rep_{uri}$ is the URI repetition ratio, $N$ is the total number of URIs, and $m$ is the number of URIs whose similarity to the base URI exceeds a preset similarity threshold.
It should be noted that the higher the URI repetition ratio is, the higher the false alarm rate of the abnormal data corresponding to the plurality of alarm logs is.
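The procedure of embodiment one can be sketched as follows. This is a minimal illustration only: formulas (1) and (2) are not reproduced in this text, so the sketch assumes that formula (1) is the ratio of the LCS length to the base URI length and that formula (2) is m divided by N, which is what the symbol definitions suggest. The function names, the default threshold of 0.8, and the choice to count the base URI itself among the N URIs are all assumptions.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of the longest common (contiguous) substring of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                # Extend the common substring that ended at the previous characters.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def uri_repetition_ratio(uris: list[str], sim_threshold: float = 0.8) -> float:
    """Share of URIs whose similarity to the longest (base) URI exceeds the threshold."""
    if not uris:
        return 0.0
    base = max(uris, key=len)  # the longest URI is taken as the base URI
    if not base:
        return 0.0
    # Assumed form of formula (1): LCS length divided by the base URI length.
    sims = [lcs_length(u, base) / len(base) for u in uris]
    m = sum(1 for s in sims if s > sim_threshold)
    # Assumed form of formula (2): m / N.
    return m / len(uris)


uris = ["/api/v1/users/1001", "/api/v1/users/1002", "/api/v1/users/10", "/login"]
ratio = uri_repetition_ratio(uris)
```

In this example three of the four URIs share a long prefix with the base URI, so the ratio is high, which under the scheme above points toward a false alarm.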
Embodiment two: an analogy of the alarm type is calculated.
Based on the characteristics of diversified attack types of malicious attack data and single attack type triggered by normal data, the attack type analogy is calculated through the number of attack types related to the abnormal data, so that whether the abnormal data are misjudged is judged. Illustratively, the following equation (3) may be used to calculate the attack type analogy for a plurality of pieces of anomalous data:
In equation (3), $Rate_{black}$ is the attack type analogy of the plurality of pieces of abnormal data, $N$ is the number of pieces of abnormal data, and $m$ is the number of attack types involved in the pieces of abnormal data. It can be seen that the value of the attack type analogy is between 0 and 1, and the larger the value is, the higher the false alarm rate of the abnormal data is.
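As a rough illustration of embodiment two, the sketch below counts the distinct attack types and maps that count into a value between 0 and 1. Equation (3) itself is not reproduced in this text, so the normalization `1 - m / N` is only an assumed form chosen because it satisfies the stated properties (it lies between 0 and 1 and grows as the number of attack types shrinks); the actual equation may differ. The function name is hypothetical.

```python
def attack_type_analogy(attack_types: list[str]) -> float:
    """Assumed stand-in for equation (3): fewer distinct attack types give a larger value."""
    n = len(attack_types)       # N: number of pieces of abnormal data
    if n == 0:
        return 0.0
    m = len(set(attack_types))  # m: number of distinct attack types involved
    # Assumption: 1 - m / N; this is not taken from the source equation.
    return 1.0 - m / n


single_type = attack_type_analogy(["sqli", "sqli", "sqli", "sqli"])
mixed_types = attack_type_analogy(["sqli", "xss", "rce", "lfi"])
```

A client that always triggers the same attack type scores higher than one that triggers many different types, matching the observation that normal data tends to trigger a single attack type.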
Embodiment three: calculating the semantic similarity.
For example, the descriptive text included in each alert log may first be converted into a sentence vector. For example, the TF-IDF algorithm may be employed to convert descriptive text into sentence vectors. Further, a cosine distance between any two sentence vectors can be calculated, and the cosine distance is used as the semantic similarity between any two description texts. Still further, the average value of the plurality of semantic similarities obtained through calculation can be used for determining the semantic similarity between the plurality of descriptive texts. Illustratively, the following equation (4) may be used to calculate semantic similarity of descriptive text of the plurality of alert logs:
In formula (4), $sim$ is the semantic similarity of the descriptive text of the plurality of alarm logs, $n$ is the number of alarm logs, and $s_{i,j}$ is the semantic similarity between the descriptive text of the i-th alarm log and the descriptive text of the j-th alarm log.
The higher the semantic similarity is, the higher the similarity degree of the description texts of the plurality of alarm logs is represented, and the higher the false alarm rate of the corresponding plurality of abnormal data is.
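The steps of embodiment three can be sketched with the standard library alone. The sketch makes several assumptions that are not stated in the text: whitespace tokenization, a smoothed IDF term, and averaging over unordered pairs (i < j), since formula (4) is not reproduced here. The cosine of the angle between two sentence vectors is used as the pairwise similarity. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def tfidf_vectors(texts: list[str]) -> list[dict[str, float]]:
    """Turn each text into a sparse TF-IDF vector (term -> weight)."""
    docs = [Counter(t.lower().split()) for t in texts]
    n_docs = len(docs)
    # Document frequency: in how many texts each term occurs.
    doc_freq = Counter(term for doc in docs for term in doc)
    vectors = []
    for doc in docs:
        total = sum(doc.values())
        vectors.append({
            # Smoothed IDF (an assumption) keeps weights positive for shared terms.
            term: (count / total) * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0)
            for term, count in doc.items()
        })
    return vectors


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine of the angle between two sparse vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def semantic_similarity(texts: list[str]) -> float:
    """Average pairwise similarity over all unordered pairs of descriptive texts."""
    pairs = list(combinations(tfidf_vectors(texts), 2))
    if not pairs:
        return 0.0
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


same = semantic_similarity(["sql injection detected"] * 3)
different = semantic_similarity(["sql injection", "path traversal"])
```

Identical descriptions give a value close to 1 and descriptions with no shared terms give 0, so the average behaves as the paragraph above describes.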
In some embodiments, after the attack type analogy, the URI repetition ratio, and the semantic similarity of the descriptive text are determined, whether the plurality of pieces of abnormal data are false alarms may be determined in combination with the period characteristic value of the plurality of alarm logs. As a possible implementation, preconfigured weights corresponding to the URI repetition ratio, the attack type analogy, the semantic similarity of the descriptive text, and the period characteristic value may be obtained, and the weighted sum of these four quantities is used as the false alarm comprehensive score of the plurality of pieces of abnormal data. When the false alarm comprehensive score is higher than a set threshold, the plurality of pieces of abnormal data are determined to be false alarms. Illustratively, the false alarm comprehensive score may be determined using formula (5) below.
$$score = c_t \cdot Rep_{uri} + c_s \cdot Rate_{black} + c_q \cdot T_{per} + c_p \cdot sim \tag{5}$$
Wherein, $score$ is the false alarm comprehensive score, $Rep_{uri}$ is the URI repetition ratio, $c_t$ is the weight corresponding to the URI repetition ratio, $Rate_{black}$ is the attack type analogy of the plurality of pieces of abnormal data, $c_s$ is the weight corresponding to the attack type analogy, $T_{per}$ is the period characteristic value, $c_q$ is the weight corresponding to the period characteristic value, $sim$ is the semantic similarity of the descriptive text of the plurality of alarm logs, and $c_p$ is the weight corresponding to the semantic similarity of the descriptive text of the plurality of alarm logs.
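Formula (5) and the threshold comparison can be illustrated with a short sketch. The weight values and the threshold of 0.7 are placeholder assumptions, since the text only says that they are configured in advance; the function names are hypothetical.

```python
def false_alarm_score(
    rep_uri: float,
    rate_black: float,
    t_per: float,
    sim: float,
    weights: tuple[float, float, float, float] = (0.3, 0.3, 0.2, 0.2),
) -> float:
    """Weighted sum of formula (5); weights are ordered (c_t, c_s, c_q, c_p)."""
    c_t, c_s, c_q, c_p = weights
    return c_t * rep_uri + c_s * rate_black + c_q * t_per + c_p * sim


def is_false_alarm(score: float, threshold: float = 0.7) -> bool:
    """The abnormal data is treated as a false alarm when the score exceeds the threshold."""
    return score > threshold


score = false_alarm_score(rep_uri=0.75, rate_black=0.75, t_per=1.0, sim=0.5)
decision = is_false_alarm(score)
```

With these placeholder weights the example score is 0.3 * 0.75 + 0.3 * 0.75 + 0.2 * 1.0 + 0.2 * 0.5 = 0.75, which exceeds the assumed threshold.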
In the following, for a further understanding of the aspects of the application, reference is made to specific examples. Referring to fig. 3, a flow chart of a method for adjusting a network protection policy provided by the present application is shown, where the flow chart of the method may be executed by an analysis engine, and specifically includes:
301, acquiring an alarm log generated by a protection engine for intercepted abnormal data in a set time period.
Wherein the intercepted exception data may be from a plurality of clients.
302, an alarm log generated from a plurality of pieces of abnormal data from the first client is identified from the acquired alarm log.
Wherein the first client is any one of a plurality of clients.
303, determining the URI repetition ratio of the plurality of alarm logs according to URIs included in the plurality of alarm logs.
The specific process of determining the URI repetition ratio may refer to the first embodiment, and will not be described herein.
304, determining an analogy of the alarm types according to the attack types included in the plurality of alarm logs.
The specific process of determining the alarm type analogy can be referred to the second embodiment, and will not be described herein.
305, determining semantic similarity of the descriptive text according to the descriptive text included in the plurality of alarm logs.
The process of determining the semantic similarity can refer to the third embodiment, and will not be described herein.
306, determining the cycle characteristic values of the alarm logs according to the time stamps included in the alarm logs.
307, determining false positive comprehensive scores of a plurality of pieces of abnormal data according to the URI repetition ratio, the attack type analogy, the semantic similarity of descriptive text and the period characteristic value.
The process of calculating the false positive composite score can be seen in equation (5) above.
308, when the false alarm comprehensive score is higher than the set threshold, determining a target protection rule matched with the plurality of abnormal data, and indicating the protection module to disable the target protection rule.
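One possible way to realize step 306 is sketched below. The sketch is an assumption and not the method defined by the application: it treats the alarm logs as periodic when the gaps between consecutive timestamps are nearly constant, measured by the coefficient of variation of the gaps. The encoding of the first value as 1 and the second value as 0, the tolerance `max_cv`, and the function name are likewise hypothetical.

```python
from statistics import mean, pstdev


def period_feature(timestamps: list[float], max_cv: float = 0.1) -> int:
    """Return 1 (assumed first value) if the logs look periodic, else 0 (assumed second value)."""
    if len(timestamps) < 3:
        return 0  # too few logs to judge periodicity
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    average_gap = mean(gaps)
    if average_gap <= 0:
        return 0
    # Coefficient of variation: small when the gaps are nearly equal.
    return 1 if pstdev(gaps) / average_gap <= max_cv else 0


regular = period_feature([0, 60, 120, 180, 240])
irregular = period_feature([0, 5, 100, 103, 400])
```

Alarm logs produced by a scheduled job of a legitimate client would typically yield evenly spaced timestamps, whereas manually driven attack traffic tends to be irregular.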
Based on the same concept as the above method, referring to fig. 4, an adjustment apparatus 400 of a network protection policy provided for an embodiment of the present application is used to implement each step included in the above method embodiment, and in order to avoid repetition, a detailed description is omitted here. The apparatus 400 comprises: an acquisition unit 401 and a processing unit 402.
An obtaining unit 401, configured to obtain a plurality of alarm logs generated by the firewall for the first client in a set period of time; each alarm log is generated by the firewall after intercepting one piece of abnormal data from the first client;
a processing unit 402 configured to perform:
determining false alarm rates of a plurality of pieces of abnormal data from the first client according to the similarity degree of the contents of the plurality of alarm logs; wherein the degree of similarity is positively correlated with the false positive rate;
and when the false alarm rate is larger than a set threshold value, adjusting the network protection strategy according to the protection rule matched with the plurality of abnormal data.
In some embodiments, the processing unit 402 is further configured to:
determining the cycle characteristic values of the plurality of alarm logs according to the generation time of the plurality of alarm logs; the periodic characteristic value is a first value to indicate that the plurality of alarm logs are generated periodically, and the periodic characteristic value is a second value to indicate that the plurality of alarm logs are not generated periodically;
the processing unit 402 is specifically configured to, when determining a false alarm rate of the plurality of abnormal data from the first client according to the similarity degree of the content of the plurality of alarm logs:
Determining the false alarm rate according to the similarity degree and the period characteristic value; the false alarm rate corresponding to the period characteristic value being the first value is larger than the false alarm rate corresponding to the period characteristic value being the second value.
In some embodiments, the processing unit 402 is specifically configured to:
extracting alarm types included in each alarm log, and determining the number of alarm types related to the alarm logs;
determining the similarity degree of the contents of the plurality of alarm logs according to the quantity; wherein the number is inversely related to the degree of similarity.
In some embodiments, the processing unit 402 is specifically configured to:
extracting Uniform Resource Identifiers (URIs) included in each alarm log, and determining a reference URI from the extracted URIs; the reference URI is the URI with the longest length in the URIs;
calculating the length of the longest common substring LCS between each extracted URI and the reference URI, and determining the number of LCS with the length greater than a length threshold;
determining the similarity degree of the contents of the plurality of alarm logs according to the quantity; wherein the number is positively correlated with the degree of similarity.
In some embodiments, the processing unit 402 is specifically configured to:
Extracting description texts included in each alarm log, and determining statement vectors corresponding to each description text;
determining the similarity degree of the contents of the plurality of alarm logs according to the distance between the plurality of sentence vectors; wherein the distance is positively correlated with the degree of similarity.
In some embodiments, the obtaining unit 401 is specifically configured to:
acquiring an alarm log generated by the firewall in the set time period;
and identifying, from the acquired alarm logs, the alarm logs that include the source IP address of the first client and the user agent (UA) of the first client.
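The identification of the alarm logs belonging to one client can be sketched as a grouping by source IP address and UA. The log representation as a dictionary and the field names `src_ip` and `user_agent` are assumptions made for illustration; the actual log schema is not given in the text.

```python
from collections import defaultdict


def group_logs_by_client(logs: list[dict]) -> dict[tuple[str, str], list[dict]]:
    """Group alarm logs by (source IP, user agent), which together identify a client."""
    groups: dict[tuple[str, str], list[dict]] = defaultdict(list)
    for log in logs:
        # Hypothetical field names; missing fields fall back to an empty string.
        key = (log.get("src_ip", ""), log.get("user_agent", ""))
        groups[key].append(log)
    return dict(groups)


logs = [
    {"src_ip": "10.0.0.5", "user_agent": "backup-agent/1.2", "uri": "/sync?id=1"},
    {"src_ip": "10.0.0.5", "user_agent": "backup-agent/1.2", "uri": "/sync?id=2"},
    {"src_ip": "10.0.0.9", "user_agent": "curl/8.0", "uri": "/admin"},
]
by_client = group_logs_by_client(logs)
first_client_logs = by_client[("10.0.0.5", "backup-agent/1.2")]
```

Each group can then be analyzed separately, so that the similarity and periodicity of the alarm logs are evaluated per client rather than across unrelated traffic.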
In some embodiments, the processing unit 402 is specifically configured to:
and stopping using the protection rules matched with the plurality of abnormal data and included in the network protection rules.
Fig. 5 shows a schematic structural diagram of an electronic device 500 according to an embodiment of the present application. The electronic device 500 in the embodiment of the present application may further include a communication interface 503, where the communication interface 503 is, for example, a network port, and the electronic device may transmit data through the communication interface 503.
In the embodiment of the present application, the memory 502 stores instructions executable by the at least one controller 501, and the at least one controller 501 may be configured to perform each step in the above method by executing the instructions stored in the memory 502, for example, the controller 501 may implement the functions of the obtaining unit 401 and the processing unit 402 in fig. 4.
The controller 501 is the control center of the electronic device; it may use various interfaces and lines to connect the various parts of the electronic device, and it runs or executes the instructions stored in the memory 502 and invokes the data stored in the memory 502. Optionally, the controller 501 may include one or more processing units, and the controller 501 may integrate an application controller and a modem controller, wherein the application controller primarily handles the operating system, application programs, and the like, and the modem controller primarily handles wireless communications. It will be appreciated that the modem controller described above may also not be integrated into the controller 501. In some embodiments, the controller 501 and the memory 502 may be implemented on the same chip; in other embodiments, they may be implemented on separate chips.
The controller 501 may be a general purpose controller, such as a central processing unit (CPU), a digital signal controller, an application specific integrated circuit, a field programmable gate array or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. The general purpose controller may be a microcontroller or any conventional controller or the like. The steps executed by the data statistics platform disclosed in connection with the embodiments of the application may be executed directly by a hardware controller, or may be executed by a combination of hardware and software modules in the controller.
The memory 502, as a non-volatile computer readable storage medium, may be used to store non-volatile software programs, non-volatile computer executable programs, and modules. The memory 502 may include at least one type of storage medium, for example, flash memory, a hard disk, a multimedia card, a card-type memory, a random access memory (RAM), a static random access memory (SRAM), a programmable read-only memory (PROM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a magnetic memory, a magnetic disk, an optical disk, and the like. The memory 502 may also be any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited thereto. The memory 502 in embodiments of the present application may also be circuitry or any other device capable of performing a storage function, for storing program instructions and/or data.
By programming the controller 501, for example, code corresponding to the methods described in the foregoing embodiments may be burned into the chip, so that the chip can execute the steps of the foregoing methods when running. How to program the controller 501 is a technique known to those skilled in the art and is not repeated herein.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a controller of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the controller of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (16)

1. A method for adjusting a network protection policy, wherein the network protection policy includes a plurality of protection rules, the method comprising:
acquiring a plurality of alarm logs generated by a firewall for a first client in a set time period; each alarm log is generated by the firewall after intercepting one piece of abnormal data from the first client;
determining false alarm rates of a plurality of pieces of abnormal data from the first client according to the similarity degree of the contents of the plurality of alarm logs; wherein the degree of similarity is positively correlated with the false positive rate;
and when the false alarm rate is larger than a set threshold value, adjusting the network protection strategy according to the protection rule matched with the plurality of abnormal data.
2. The method according to claim 1, wherein the method further comprises:
determining the cycle characteristic values of the plurality of alarm logs according to the generation time of the plurality of alarm logs; the periodic characteristic value is a first value to indicate that the plurality of alarm logs are generated periodically, and the periodic characteristic value is a second value to indicate that the plurality of alarm logs are not generated periodically;
The determining the false alarm rate of the plurality of abnormal data from the first client according to the similarity degree of the content of the plurality of alarm logs specifically includes:
determining the false alarm rate according to the similarity degree and the period characteristic value; the false alarm rate corresponding to the period characteristic value being the first value is larger than the false alarm rate corresponding to the period characteristic value being the second value.
3. The method of claim 1 or 2, wherein determining the degree of similarity of the content of the plurality of alert logs comprises:
extracting alarm types included in each alarm log, and determining the number of alarm types related to the alarm logs;
determining the similarity degree of the contents of the plurality of alarm logs according to the quantity; wherein the number is inversely related to the degree of similarity.
4. The method of claim 1 or 2, wherein determining the degree of similarity of the content of the plurality of alert logs comprises:
extracting Uniform Resource Identifiers (URIs) included in each alarm log, and determining a reference URI from the extracted URIs; the reference URI is the URI with the longest length in the URIs;
calculating the length of the longest common substring LCS between each extracted URI and the reference URI, and determining the number of LCS with the length greater than a length threshold;
Determining the similarity degree of the contents of the plurality of alarm logs according to the quantity; wherein the number is positively correlated with the degree of similarity.
5. The method of claim 1 or 2, wherein determining the degree of similarity of the content of the plurality of alert logs comprises:
extracting description texts included in each alarm log, and determining statement vectors corresponding to each description text;
determining the similarity degree of the contents of the plurality of alarm logs according to the distance between the plurality of sentence vectors; wherein the distance is positively correlated with the degree of similarity.
6. The method according to claim 1 or 2, wherein the obtaining a plurality of alert logs generated by the firewall for the first client in a set period of time comprises:
acquiring an alarm log generated by the firewall in the set time period;
and identifying, from the acquired alarm logs, the alarm logs that include the source IP address of the first client and the user agent (UA) of the first client.
7. The method of claim 1 or 2, wherein adjusting the network protection policy according to protection rules matching the plurality of pieces of exception data comprises:
And stopping using the protection rules matched with the plurality of abnormal data and included in the network protection rules.
8. An apparatus for adjusting a network protection policy, wherein the network protection policy includes a plurality of protection rules, the apparatus comprising:
the obtaining unit is used for obtaining a plurality of alarm logs generated by the firewall for the first client in a set time period; each alarm log is generated by the firewall after intercepting one piece of abnormal data from the first client;
a processing unit configured to perform:
determining false alarm rates of a plurality of pieces of abnormal data from the first client according to the similarity degree of the contents of the plurality of alarm logs; wherein the degree of similarity is positively correlated with the false positive rate;
and when the false alarm rate is larger than a set threshold value, adjusting the network protection strategy according to the protection rule matched with the plurality of abnormal data.
9. The apparatus of claim 8, wherein the processing unit is further configured to:
determining the cycle characteristic values of the plurality of alarm logs according to the generation time of the plurality of alarm logs; the periodic characteristic value is a first value to indicate that the plurality of alarm logs are generated periodically, and the periodic characteristic value is a second value to indicate that the plurality of alarm logs are not generated periodically;
The processing unit is specifically configured to, when determining a false alarm rate of the plurality of pieces of abnormal data from the first client according to the similarity degree of the content of the plurality of alarm logs:
determining the false alarm rate according to the similarity degree and the period characteristic value; the false alarm rate corresponding to the period characteristic value being the first value is larger than the false alarm rate corresponding to the period characteristic value being the second value.
10. The device according to claim 8 or 9, characterized in that the processing unit is specifically configured to:
extracting alarm types included in each alarm log, and determining the number of alarm types related to the alarm logs;
determining the similarity degree of the contents of the plurality of alarm logs according to the quantity; wherein the number is inversely related to the degree of similarity.
11. The device according to claim 8 or 9, characterized in that the processing unit is specifically configured to:
extracting Uniform Resource Identifiers (URIs) included in each alarm log, and determining a reference URI from the extracted URIs; the reference URI is the URI with the longest length in the URIs;
calculating the length of the longest common substring LCS between each extracted URI and the reference URI, and determining the number of LCS with the length greater than a length threshold;
Determining the similarity degree of the contents of the plurality of alarm logs according to the quantity; wherein the number is positively correlated with the degree of similarity.
12. The device according to claim 8 or 9, characterized in that the processing unit is specifically configured to:
extracting description texts included in each alarm log, and determining statement vectors corresponding to each description text;
determining the similarity degree of the contents of the plurality of alarm logs according to the distance between the plurality of sentence vectors; wherein the distance is positively correlated with the degree of similarity.
13. The apparatus according to claim 8 or 9, wherein the acquisition unit is specifically configured to:
acquiring an alarm log generated by the firewall in the set time period;
and identifying, from the acquired alarm logs, the alarm logs that include the source IP address of the first client and the user agent (UA) of the first client.
14. The device according to claim 8 or 9, characterized in that the processing unit is specifically configured to:
and stopping using the protection rules matched with the plurality of abnormal data and included in the network protection rules.
15. An electronic device, comprising: a memory and a controller;
A memory for storing program instructions;
a controller for invoking program instructions stored in the memory to perform the method of any of claims 1-7 in accordance with the obtained program.
16. A computer storage medium storing computer executable instructions for performing the method of any one of claims 1-7.
CN202310862199.8A 2023-07-13 2023-07-13 Method and device for adjusting network protection strategy Pending CN116800518A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310862199.8A CN116800518A (en) 2023-07-13 2023-07-13 Method and device for adjusting network protection strategy

Publications (1)

Publication Number Publication Date
CN116800518A true CN116800518A (en) 2023-09-22

Family

ID=88043719

Country Status (1)

Country Link
CN (1) CN116800518A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination