WO2022174379A1

WO2022174379A1 - Risk assessment of firewall rules in a datacenter

Info

Publication number: WO2022174379A1
Application number: PCT/CN2021/076792
Authority: WO
Inventors: Zhuosi XIE; Zishuo ZHENG; Daiqian HU; Yi Zeng; Matthew Jared LOPEZ
Original assignee: Microsoft Technology Licensing, Llc
Priority date: 2021-02-19
Filing date: 2021-02-19
Publication date: 2022-08-25
Also published as: CN115885273A

Abstract

The present disclosure provides methods and apparatuses for risk assessment of firewall rules in a datacenter. The datacenter may have a plurality of devices and be configured with a plurality of firewall rules. Firewall rule configuration information of the plurality of devices may be obtained. A plurality of device groups formed by the plurality of devices may be identified. Count distributions of the plurality of firewall rules over the plurality of device groups may be determined. The plurality of firewall rules may be clustered into a plurality of firewall rule groups based on the count distributions, firewall rules in each firewall rule group having the same risk attribute.

Description

RISK ASSESSMENT OF FIREWALL RULES IN A DATACENTER

BACKGROUND

Firewall rules are widely deployed in datacenters for purposes such as ensuring the safety of network and data access and defending against network attacks. Herein, a datacenter may broadly refer to a set of devices, or a platform consisting of a number of devices, that operates for various purposes or scenarios. Devices in a datacenter may comprise various network or computing devices, which are often referred to as hosts. For example, the devices in a datacenter may comprise databases, file servers, application servers, cloud processing units, gateways, etc.

SUMMARY

This Summary is provided to introduce a selection of concepts that are further described below in the Detailed Description. It is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.

Embodiments of the present disclosure propose methods and apparatuses for risk assessment of firewall rules in a datacenter. The datacenter may have a plurality of devices and be configured with a plurality of firewall rules. Firewall rule configuration information of the plurality of devices may be obtained. A plurality of device groups formed by the plurality of devices may be identified. Count distributions of the plurality of firewall rules over the plurality of device groups may be determined. The plurality of firewall rules may be clustered into a plurality of firewall rule groups based on the count distributions, firewall rules in each firewall rule group having the same risk attribute.

It should be noted that the above one or more aspects comprise the features hereinafter fully described and particularly pointed out in the claims. The following description and the drawings set forth in detail certain illustrative features of the one or more aspects. These features are only indicative of the various ways in which the principles of various aspects may be employed, and this disclosure is intended to include all such aspects and their equivalents.

BRIEF DESCRIPTION OF THE DRAWINGS

The disclosed aspects will hereinafter be described in connection with the appended drawings that are provided to illustrate and not to limit the disclosed aspects.

FIG. 1 illustrates an exemplary process for performing risk assessment of firewall rules in a datacenter according to an embodiment.

FIG. 2 illustrates an exemplary process for clustering firewall rules into firewall rule groups according to an embodiment.

FIG. 3 illustrates an example of obtaining firewall rule groups according to an embodiment.

FIG. 4 illustrates an exemplary process for automatically labeling firewall rules in a firewall rule group according to an embodiment.

FIG. 5 illustrates an exemplary process of historical risk attribute label inheriting mechanism according to an embodiment.

FIG. 6 illustrates a flowchart of an exemplary method for risk assessment of firewall rules in a datacenter according to an embodiment.

FIG. 7 illustrates an exemplary apparatus for risk assessment of firewall rules in a datacenter according to an embodiment.

FIG. 8 illustrates an exemplary apparatus for risk assessment of firewall rules in a datacenter according to an embodiment.

DETAILED DESCRIPTION

The present disclosure will now be discussed with reference to several example implementations. It is to be understood that these implementations are discussed only for enabling those skilled in the art to better understand and thus implement the embodiments of the present disclosure, rather than suggesting any limitations on the scope of the present disclosure.

Usually, a large number of firewall rules are deployed in a datacenter. Besides normal or reasonably configured firewall rules, the deployed firewall rules may also include some risky firewall rules. Herein, risky firewall rules broadly refer to firewall rules that may hinder normal operations of the datacenter, provide very limited or no protection for the datacenter, pose threats to network connections or data in the datacenter, etc. For example, risky firewall rules may comprise invalid or misconfigured firewall rules, casually applied or locally applied firewall rules that are deployed on only a limited number of devices, malicious firewall rules deployed in the datacenter by illegitimate entities, etc. Therefore, there is a need to identify risky firewall rules among all the firewall rules configured in the datacenter. Existing approaches to identifying risky firewall rules rely on manual checking, e.g., manually checking the deployed firewall rules one by one to determine whether each of them is risky. However, since a datacenter may be configured with a very large number of firewall rules, such manual approaches are very inefficient and can hardly check all the firewall rules configured in the datacenter in a timely manner. Moreover, some devices or services may require relevant firewall rules to be added, removed or updated frequently, which further increases the difficulty of manual checking.

Embodiments of the present disclosure propose schemes that facilitate risk assessment of firewall rules in a datacenter. Herein, risk assessment of a firewall rule may refer to determining a risk attribute of the firewall rule, e.g., determining whether the firewall rule is risky or riskless, determining a risk level of the firewall rule, etc.

The embodiments of the present disclosure propose to cluster firewall rules into firewall rule groups based on count distributions of the firewall rules over device groups in the datacenter. It is assumed that most firewall rules in the datacenter are reasonably configured, and that a group of related firewall rules serves the same purpose. For example, the firewall rules “Remote Desktop - Shadow (TCP-In)”, “Remote Desktop - User Mode (TCP-In)”, “Remote Desktop - User Mode (UDP-In)”, etc. all serve the same purpose of “Remote Desktop”. Firewall rules serving the same purpose would have equivalent count distributions in the datacenter, and thus each firewall rule group obtained by using count distributions would comprise a plurality of firewall rules that serve the same purpose and have the same risk attribute. In other words, if a firewall rule has a certain risk attribute, other firewall rules having equivalent count distributions, e.g., the other firewall rules in the same firewall rule group, would also have that risk attribute. Therefore, with the firewall rule groups obtained through the embodiments of the present disclosure, risk assessment may be performed efficiently, e.g., the same risk attribute label may be attached to all firewall rules in one firewall rule group. In one aspect, if manual labeling is adopted, only the risk attribute label of one or a few firewall rules in a firewall rule group needs to be determined manually, and the determined label can then be attached to all firewall rules in that group without checking each of them. In another aspect, the embodiments of the present disclosure may automatically attach risk attribute labels to firewall rules in the firewall rule groups by, e.g., utilizing historical risk attribute labels.

By clustering firewall rules into firewall rule groups, the embodiments of the present disclosure facilitate efficient, precise and automatic risk assessment of firewall rules. All firewall rules in a datacenter may be assessed efficiently and in a time-saving manner. Moreover, benefiting from this efficiency, centralized control of firewall rules in a datacenter becomes easier to achieve, and risky firewall rules can be identified in a timely manner. The processes proposed by the embodiments of the present disclosure may be performed automatically, either periodically or in response to any type of triggering event, so that firewall rules in a datacenter can be monitored timely and continuously.

FIG. 1 illustrates an exemplary process 100 for performing risk assessment of firewall rules in a datacenter according to an embodiment. It is assumed that a datacenter 102 is configured with a plurality of firewall rules, and risk attributes of these firewall rules are to be assessed. The datacenter 102 may comprise a plurality of devices, e.g., network or computing devices.

Firewall rule configuration information 112 of the devices in the datacenter 102 may be obtained. The firewall rule configuration information 112 may comprise firewall rules configured at each device. For example, assuming that there are 25 firewall rules configured at a certain database, firewall rule configuration information of this database may comprise or indicate names of the 25 firewall rules.

A plurality of device groups 114 formed by the devices in the datacenter 102 may be identified. Usually, devices in a datacenter may be divided into a plurality of device groups based on functions or purposes, with the devices in each device group having the same function or purpose. For example, if there are 10 devices providing a certain file accessing service in a datacenter, these 10 devices may be divided into one device group. Since devices in one device group have the same function or purpose, they may also have the same or similar firewall rule requirements. Accordingly, firewall rules serving the same purpose are very likely to be configured at the devices in the same device group. It should be understood that identifying the device groups may refer to determining the device groups from all devices in the datacenter 102 based on various predetermined functions or purposes, or to receiving indications of device groups that have previously been identified in any manner.

At 120, count distributions of the plurality of firewall rules configured in the datacenter 102 over the plurality of device groups 114 may be determined. The count distribution of each firewall rule may comprise the number of devices configured with the firewall rule in each of the plurality of device groups. Assume, for example, that a firewall rule Rᵢ is configured at 10 of the 25 devices in device group 1, at 8 of the 12 devices in device group 2, at 15 of the 30 devices in device group 3, etc. Then the count distribution of Rᵢ may indicate that this firewall rule is configured at 10 devices in device group 1, at 8 devices in device group 2, at 15 devices in device group 3, etc. The firewall rule configuration information 112 and the identified device groups 114 may be used together for determining the count distribution of each firewall rule.
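A minimal Python sketch of the counting at 120, assuming the configuration information 112 has already been collected as a mapping from device ID to the names of the rules configured on it, and the device groups 114 as a mapping from device ID to group ID. The names `count_distributions`, `rule_config` and `device_group` are hypothetical, and every device is assumed to belong to exactly one group.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable


# Hypothetical helper; names are illustrative, not from the disclosure.
def count_distributions(
    rule_config: Dict[str, Iterable[str]],
    device_group: Dict[str, str],
) -> Dict[str, Counter]:
    """Return rule name -> Counter(group id -> number of devices configured with the rule).

    rule_config:  device id -> names of the firewall rules configured on that device
    device_group: device id -> id of the device group the device belongs to
    """
    dist: Dict[str, Counter] = defaultdict(Counter)
    for device, rules in rule_config.items():
        group = device_group[device]
        # set() ensures a rule listed twice on one device is counted once
        for rule in set(rules):
            dist[rule][group] += 1
    return dict(dist)


# Hypothetical example: two database hosts in group G1, one web host in G2.
rule_config = {"db1": ["R1", "R2"], "db2": ["R1"], "web1": ["R1", "R3"]}
device_group = {"db1": "G1", "db2": "G1", "web1": "G2"}
print(count_distributions(rule_config, device_group)["R1"])  # Counter({'G1': 2, 'G2': 1})
```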

In an implementation, the firewall rules in the datacenter 102 may be represented as a list of tuples, each tuple corresponding to one firewall rule. An exemplary tuple format is [RuleName, Location, Count], wherein RuleName is the name of the firewall rule represented by the tuple, and Location and Count together represent the count distribution of this firewall rule. For example, Location may be a vector listing all the device groups 114 in the datacenter, and Count may be a vector listing the number of devices configured with this firewall rule in each of the device groups 114. It should be understood that this representation is exemplary, and the firewall rules may also be represented in other ways.
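The tuple format can be sketched in Python as a `NamedTuple`, assuming per-rule counts arrive as `{group: count}` maps (as in a sparse count table). Sharing one Location vector across all rules and zero-filling missing groups keeps every Count vector aligned, so they can later be compared position by position. `RuleTuple` and `to_tuples` are hypothetical names.

```python
from typing import Dict, List, NamedTuple, Tuple


# Hypothetical type mirroring [RuleName, Location, Count]; not from the disclosure.
class RuleTuple(NamedTuple):
    RuleName: str
    Location: Tuple[str, ...]  # every device group in the datacenter, in a fixed order
    Count: Tuple[int, ...]     # devices configured with the rule, aligned with Location


def to_tuples(dist: Dict[str, Dict[str, int]], groups: List[str]) -> List[RuleTuple]:
    """Turn per-rule {group: count} maps into [RuleName, Location, Count] tuples."""
    location = tuple(groups)
    return [
        # groups where the rule is absent get an explicit 0
        RuleTuple(name, location, tuple(per_group.get(g, 0) for g in location))
        for name, per_group in sorted(dist.items())
    ]


rules = to_tuples({"R1": {"G1": 2, "G2": 1}, "R3": {"G2": 1}}, ["G1", "G2", "G3"])
print(rules[1])
# RuleTuple(RuleName='R3', Location=('G1', 'G2', 'G3'), Count=(0, 1, 0))
```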

At 130, the plurality of firewall rules in the datacenter 102 may be clustered into a plurality of firewall rule groups 140 based on the count distributions determined at 120. In an implementation, the clustering operation at 130 may aim to cluster firewall rules that have equivalent count distributions into the same firewall rule group. Herein, equivalent count distributions refer to the same or similar count distributions. Firewall rules in each of the firewall rule groups 140 would have the same risk attribute. In an implementation, an unsupervised learning algorithm may be adopted for performing the clustering operation at 130, which will be discussed in detail in connection with FIG. 2 and FIG. 3.

The firewall rule groups obtained through the process 100 can greatly improve the efficiency of risk assessment of firewall rules. For example, since firewall rules in each firewall rule group have the same risk attribute, their risk attribute labels may be unified into the same risk attribute label. Accordingly, there is no need to check every firewall rule in a firewall rule group.

Optionally, additional operations of attaching risk attribute labels to firewall rules may be further performed based at least on the firewall rule groups 140.

In an implementation, manual labeling may be performed at 150. For each of the firewall rule groups 140, the risk attribute label of one firewall rule in the group may be determined manually, and the determined label may then be attached to all firewall rules in the group. Alternatively, risk attribute labels of a few firewall rules in the group may first be determined manually, and a representative or combined risk attribute label may then be selected from these labels and attached to all firewall rules in the group.

In another implementation, automatic labeling may be performed at 160. Assuming that some or all of the firewall rules in the datacenter have already been attached with risk attribute labels before the process 100 is performed, these existing risk attribute labels may be used as historical risk attribute labels and applied at 160 for automatically determining and attaching risk attribute labels to firewall rules in the firewall rule groups 140. The automatic labeling operation at 160 will be discussed in detail in connection with FIG. 4 and FIG. 5.

It should be understood that the process 100 may be performed periodically according to a predefined period, which may be of any length, e.g., daily, weekly, etc. Alternatively, the process 100 may be performed in response to any type of triggering event. The triggering events may be any predefined events, e.g., device failures, system failures, network attacks, deployment of new services or devices, service updates, predetermined time points, etc. By performing the process 100 periodically or in response to triggering events, the firewall rules in the datacenter may be monitored timely and continuously, and risky firewall rules can thus be identified in a timely manner through risk assessment of the firewall rules.

It should be understood that risk attributes of firewall rules may be classified in different ways. In one case, risk attributes of firewall rules may be classified as risky or riskless, and a risk attribute label attached to a firewall rule may accordingly indicate whether the firewall rule is risky or riskless. In another case, risk attributes of firewall rules may be classified into various risk levels, e.g., high risk, low risk, riskless, etc., and a risk attribute label attached to a firewall rule may accordingly indicate a certain risk level. The embodiments of the present disclosure are not restricted to any specific classification of risk attributes.

FIG. 2 illustrates an exemplary process 200 for clustering firewall rules into firewall rule groups according to an embodiment. The process 200 is an exemplary implementation of the clustering operation at 130 in FIG. 1. An unsupervised learning algorithm for clustering firewall rules is discussed in connection with operations in the process 200. The unsupervised learning algorithm intends to cluster firewall rules that have equivalent count distributions into the same firewall rule group.

It is assumed that count distributions 202 of firewall rules in a datacenter have been obtained through, e.g., the operation at 120 in FIG. 1.

At 210, distribution vector representation may be performed for the firewall rules based on the count distributions 202. For example, the count distribution of each firewall rule may be represented as a distribution vector of the firewall rule. The distribution vector may be in a format of, e.g.,

$$\vec{V}_i = [C_{i,1}, C_{i,2}, \ldots, C_{i,n}]$$

in which $\vec{V}_i$ denotes the distribution vector of the i-th firewall rule $R_i$, $n$ corresponds to the number of device groups, and $C_{i,j}$ ($j = 1, \ldots, n$) denotes the count of $R_i$ in the j-th device group, i.e., the number of devices configured with $R_i$ in the j-th device group. Taking the distribution vector

$$\vec{V}_3 = [200, 150, \ldots, 300]$$

as an example, $\vec{V}_3$ denotes the distribution vector of the third firewall rule $R_3$, which indicates that 200 devices in the first device group are configured with $R_3$, 150 devices in the second device group are configured with $R_3$, …, and 300 devices in the n-th device group are configured with $R_3$. Through the operation at 210, distribution vectors 212 of the firewall rules in the datacenter can be obtained.

At 220, similarity calculation may be performed among the firewall rules based on the distribution vectors 212. For example, the similarity of each pair of firewall rules may be calculated from the distribution vectors of the two firewall rules, and a similarity matrix 222 may be formed from the calculated similarities. Various approaches may be adopted for calculating the similarity of two firewall rules.

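In an implementation, cosine similarity may be adopted for calculating the similarity of two firewall rules. For example, assuming that $\vec{V}_1$ is a distribution vector of the firewall rule $R_1$ and $\vec{V}_2$ is a distribution vector of the firewall rule $R_2$, the similarity of $R_1$ and $R_2$ may be calculated as:

$$\mathrm{cos\_sim}(\vec{V}_1, \vec{V}_2) = \frac{\vec{V}_1 \cdot \vec{V}_2}{\lVert \vec{V}_1 \rVert \, \lVert \vec{V}_2 \rVert} = \frac{\sum_{j=1}^{n} C_{1,j} \, C_{2,j}}{\sqrt{\sum_{j=1}^{n} C_{1,j}^{2}} \; \sqrt{\sum_{j=1}^{n} C_{2,j}^{2}}}$$

wherein cos_sim(·) denotes a cosine similarity function.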
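The cosine formula translates directly into a short Python sketch over two equal-length count vectors. Returning 0.0 for an all-zero vector (a rule configured on no device) is an assumption, since the formula is undefined there; `cos_sim` here is a hypothetical helper named after the function in the text.

```python
import math
from typing import Sequence


# Hypothetical helper implementing the standard cosine similarity formula.
def cos_sim(v1: Sequence[float], v2: Sequence[float]) -> float:
    """Cosine similarity of two distribution vectors over the same n device groups."""
    if len(v1) != len(v2):
        raise ValueError("distribution vectors must cover the same device groups")
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    # Assumption: a rule deployed on no device is treated as similar to nothing.
    return dot / norm if norm else 0.0


print(round(cos_sim([200, 150, 300], [20, 15, 30]), 6))  # 1.0
print(cos_sim([5, 0], [0, 5]))  # 0.0
```

The first example shows that cosine similarity compares only the shape of the distributions: a rule deployed on ten times as many devices in every group still scores 1.0.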

In an implementation, maximum relative distance similarity may be adopted for calculating the similarity of two firewall rules. For example, assuming that $\vec{V}_1$ is a distribution vector of the firewall rule $R_1$ and $\vec{V}_2$ is a distribution vector of the firewall rule $R_2$, the similarity of $R_1$ and $R_2$ may be calculated as $\mathrm{max\_sim}(\vec{V}_1, \vec{V}_2)$, wherein max_sim(·) denotes a maximum relative distance similarity function whose definition involves min(·), a minimum value extraction function, truncate(·), a truncating function, and abs(·), which takes an absolute value.

The maximum relative distance similarity is proposed by the embodiments of the present disclosure to provide a sharper function shape than the cosine similarity. Compared with the cosine similarity, the maximum relative distance similarity may achieve higher performance in clustering firewall rules, because it pays more attention to the distribution characteristics of the count distributions indicated in the distribution vectors.
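The exact max_sim formula is not reproduced in this text, so the Python sketch below uses a clearly hypothetical stand-in, `worst_group_relative_sim`, which is NOT the disclosure's formula. It only illustrates the general idea of a sharper, relative measure: taking the worst per-group relative difference lets a single mismatched device group pull the similarity down, whereas cosine similarity barely reacts.

```python
import math
from typing import Sequence


def cos_sim(v1: Sequence[float], v2: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(v1, v2))
    norm = math.sqrt(sum(a * a for a in v1)) * math.sqrt(sum(b * b for b in v2))
    return dot / norm if norm else 0.0


def worst_group_relative_sim(v1: Sequence[float], v2: Sequence[float]) -> float:
    """HYPOTHETICAL stand-in, not the disclosure's max_sim formula.

    Returns 1 minus the largest per-group relative difference |a - b| / max(a, b),
    so a single strongly mismatched device group is enough to pull similarity down.
    """
    worst = 0.0
    for a, b in zip(v1, v2):
        top = max(a, b)
        if top > 0:  # groups where neither rule is deployed carry no evidence
            worst = max(worst, abs(a - b) / top)
    return 1.0 - worst


pairs = [([10, 10], [100, 100]), ([10, 10], [10, 20])]
for v1, v2 in pairs:
    print(v1, v2, round(cos_sim(v1, v2), 3), round(worst_group_relative_sim(v1, v2), 3))
# [10, 10] [100, 100] 1.0 0.1
# [10, 10] [10, 20] 0.949 0.5
```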

It should be understood that the similarity calculation at 220 may adopt either or both of the cosine similarity and the maximum relative distance similarity, or any other approaches capable of calculating similarity of two firewall rules.

The similarity matrix 222 may be formed with the calculated similarities of every pair of firewall rules among all the firewall rules. The similarity matrix 222 may be an m×m matrix, wherein m is the number of firewall rules in the datacenter. Each item $l_{p,q}$ in the similarity matrix 222 is the calculated similarity of the p-th firewall rule and the q-th firewall rule.
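A Python sketch of building the m×m matrix, assuming the pairwise similarity is passed in as a function (cosine similarity or any other measure from 220) and is symmetric, so each pair is computed once. `similarity_matrix` and the toy `exact_match` similarity are hypothetical and used only for illustration.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


# Hypothetical helper; names are illustrative, not from the disclosure.
def similarity_matrix(
    vectors: List[Vector], sim: Callable[[Vector, Vector], float]
) -> List[List[float]]:
    """Build the m x m matrix whose item [p][q] is sim(vectors[p], vectors[q]).

    Indices are 0-based here, while the text's l_{p,q} counts from 1.
    Assumes `sim` is symmetric, so each pair is computed only once.
    """
    m = len(vectors)
    matrix = [[0.0] * m for _ in range(m)]
    for p in range(m):
        matrix[p][p] = sim(vectors[p], vectors[p])
        for q in range(p + 1, m):
            matrix[p][q] = matrix[q][p] = sim(vectors[p], vectors[q])
    return matrix


def exact_match(a: Vector, b: Vector) -> float:
    """Toy similarity for the example only: 1.0 for identical vectors, else 0.0."""
    return 1.0 if list(a) == list(b) else 0.0


print(similarity_matrix([[1, 2], [1, 2], [3, 0]], exact_match))
# [[1.0, 1.0, 0.0], [1.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
```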

At 230, matrix conversion may be performed on the similarity matrix 222. For example, the similarity matrix 222 may be converted to an adjacency matrix 232 by applying a similarity threshold. Items with values equal to or above the similarity threshold in the similarity matrix 222 may be converted to items with value “1” in the adjacency matrix 232, while items with values below the similarity threshold may be converted to items with value “0”. Optionally, diagonal items in the similarity matrix 222 may also be converted to items with value “0” in the adjacency matrix 232. The similarity threshold may be predetermined empirically or experimentally. A higher similarity threshold ensures that firewall rules in one group have high similarity with each other, but may result in fewer firewall rules being included in each group. A lower similarity threshold causes a group to include more firewall rules, but may cluster risky and riskless firewall rules into the same group.
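The thresholding at 230 is a single pass over the matrix. In the Python sketch below, the optional diagonal zeroing is exposed as a flag. `to_adjacency` is a hypothetical name, and the example matrix values are made up.

```python
from typing import List


# Hypothetical helper; names are illustrative, not from the disclosure.
def to_adjacency(
    sim: List[List[float]], threshold: float, zero_diagonal: bool = True
) -> List[List[int]]:
    """Binarize a similarity matrix: 1 where similarity >= threshold, else 0."""
    m = len(sim)
    adj = [[1 if sim[p][q] >= threshold else 0 for q in range(m)] for p in range(m)]
    if zero_diagonal:  # optional step: a rule is not treated as its own neighbor
        for p in range(m):
            adj[p][p] = 0
    return adj


print(to_adjacency([[1.0, 0.85, 0.2], [0.85, 1.0, 0.79], [0.2, 0.79, 1.0]], 0.8))
# [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
```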

At 240, graph building may be performed with the adjacency matrix 232. For example, a graph representation 242 of the adjacency matrix 232 may be built at 240. Value “1” in the adjacency matrix 232 indicates that there is an edge between two nodes in the graph representation 242, wherein the two nodes correspond to two firewall rules.

At 250, subgraph extraction may be performed on the graph representation 242. For example, a plurality of connected subgraphs 252 may be extracted from the graph representation 242. Each connected subgraph may comprise a plurality of nodes that have high similarity with each other. Therefore, the plurality of connected subgraphs 252 may correspond to a plurality of firewall rule groups 260, respectively. It should be understood that a connected subgraph may contain only one node, which indicates that the similarities between the firewall rule corresponding to this node and all other firewall rules are below the similarity threshold; accordingly, this firewall rule alone forms a firewall rule group.
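Operations 240 and 250 together amount to finding the connected components of the graph that the adjacency matrix describes. The Python sketch below does this with a breadth-first search directly on the 0/1 matrix, without building a separate graph object. `connected_subgraphs` is a hypothetical name.

```python
from collections import deque
from typing import List


# Hypothetical helper; names are illustrative, not from the disclosure.
def connected_subgraphs(adj: List[List[int]]) -> List[List[int]]:
    """Return the node sets of the connected subgraphs of an undirected 0/1 adjacency matrix.

    Each returned list is one candidate firewall rule group (0-based rule indices).
    """
    m = len(adj)
    seen = [False] * m
    groups: List[List[int]] = []
    for start in range(m):
        if seen[start]:
            continue
        seen[start] = True
        queue, component = deque([start]), []
        while queue:  # breadth-first search from `start`
            p = queue.popleft()
            component.append(p)
            for q in range(m):
                if adj[p][q] and not seen[q]:
                    seen[q] = True
                    queue.append(q)
        groups.append(sorted(component))
    return groups


# Rules 0-1 and 1-2 are linked; rule 3 has no neighbor and forms its own group.
adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 0], [0, 0, 0, 0]]
print(connected_subgraphs(adj))  # [[0, 1, 2], [3]]
```

Rules 0 and 2 land in the same group although they are not directly adjacent, because connectivity is transitive. This is why the text prefers similarity thresholds that make the subgraphs close to complete graphs.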

Through the process 200, firewall rules in a datacenter may be clustered into a plurality of firewall rule groups based on their count distributions. It should be understood that all the operations in the process 200 are exemplary, and the embodiments of the present disclosure may cover any other approaches or processes capable of clustering firewall rules based on count distributions. Moreover, since every node in a complete graph is directly connected to every other node, i.e., the similarity of every pair of the corresponding firewall rules reaches the similarity threshold, a complete graph indicates higher mutual similarity than other types of connected graph and accordingly leads to higher clustering precision. The similarity threshold may therefore also be predetermined so that the plurality of connected subgraphs approximate complete graphs. For example, the similarity threshold may be selected so that the extracted connected subgraphs 252 are complete graphs, or as close to complete graphs as possible.
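One way to quantify how close a connected subgraph is to a complete graph is its edge density: the fraction of node pairs that are directly linked. The Python sketch below computes it. Comparing the least complete subgraph across candidate thresholds is an assumed selection heuristic, not a procedure stated in the text, and `completeness` and `least_complete` are hypothetical names.

```python
from itertools import combinations
from typing import List


# Hypothetical helpers; names are illustrative, not from the disclosure.
def completeness(adj: List[List[int]], nodes: List[int]) -> float:
    """Fraction of node pairs inside `nodes` that share an edge (1.0 means a complete graph)."""
    pairs = list(combinations(nodes, 2))
    if not pairs:
        return 1.0  # a single-rule group is trivially complete
    return sum(adj[p][q] for p, q in pairs) / len(pairs)


def least_complete(adj: List[List[int]], groups: List[List[int]]) -> float:
    """Completeness of the worst subgraph; compare this across candidate thresholds."""
    return min((completeness(adj, g) for g in groups), default=1.0)


chain = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]     # 0-1-2, but 0 and 2 not linked
triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]  # every pair linked
print(round(completeness(chain, [0, 1, 2]), 3), completeness(triangle, [0, 1, 2]))  # 0.667 1.0
```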

Assuming that a firewall rule is originally represented as a tuple of [RuleName, Location, Count] , when the firewall rule is clustered into a certain firewall rule group through the process 200, the firewall rule may be represented as an updated tuple of [RuleName, Location, Count, GroupID] , wherein the item of GroupID is an identification of the firewall rule group into which the firewall rule is clustered.

FIG. 3 illustrates an example of obtaining firewall rule groups according to an embodiment. The example in FIG. 3 is proposed based on the process 200 in FIG. 2.

For the sake of simplicity, it is assumed that there are six firewall rules in total to be clustered in a datacenter, namely R₁, R₂, R₃, R₄, R₅ and R₆. The similarity matrix may have an exemplary format 310, wherein an item $l_{p,q}$ denotes the calculated similarity of the p-th firewall rule and the q-th firewall rule. As an example, a similarity matrix 312 is shown in FIG. 3, in which each item is filled with a calculated similarity value.

According to the operation 230 in FIG. 2, the similarity matrix 312 may be converted to an adjacency matrix 322 through applying an exemplary similarity threshold “0.8” . Those items with values equal to or above the similarity threshold “0.8” in the similarity matrix 312 are converted to items with value “1” in the adjacency matrix 322, those items with values below the similarity threshold “0.8” in the similarity matrix 312 are converted to items with value “0” in the adjacency matrix 322, and diagonal items in the similarity matrix 312 are converted to items with value “0” in the adjacency matrix 322.

According to the operation 240 in FIG. 2, a graph representation 332 is built for the adjacency matrix 322, in which edges among nodes are set based on those items with value “1” in the adjacency matrix 322.

According to the operation 250 in FIG. 2, two connected subgraphs 342 and 344 are extracted from the graph representation 332. The connected subgraph 342 contains three nodes corresponding to the firewall rules R₁, R₂ and R₃ respectively, and the connected subgraph 344 contains three nodes corresponding to the firewall rules R₄, R₅ and R₆ respectively.

Based on the extracted connected subgraphs 342 and 344, two firewall rule groups are obtained. For example, Group 1 containing the firewall rules R₁, R₂ and R₃ may be determined based on the connected subgraph 342, and Group 2 containing the firewall rules R₄, R₅ and R₆ may be determined based on the connected subgraph 344.
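The FIG. 3 walk-through can be reproduced end to end in Python. The similarity values below are hypothetical, because the text does not list the values of matrix 312, but they are chosen so that the threshold 0.8 yields the same two groups. Instead of an explicit adjacency matrix and graph search, this sketch uses union-find, a swapped-in technique that merges rules whenever an off-diagonal similarity reaches the threshold and therefore produces the same connected subgraphs. `cluster_rules` is a hypothetical name.

```python
from typing import Dict, List, Sequence


# Hypothetical helper; names are illustrative, not from the disclosure.
def cluster_rules(
    names: List[str], sim: Sequence[Sequence[float]], threshold: float = 0.8
) -> List[List[str]]:
    """Group rules whose similarity chains reach `threshold` (union-find variant)."""
    parent = list(range(len(names)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for p in range(len(names)):
        for q in range(p + 1, len(names)):  # diagonal skipped, as in FIG. 3
            if sim[p][q] >= threshold:
                parent[find(p)] = find(q)  # merge the two subgraphs

    groups: Dict[int, List[str]] = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return list(groups.values())


# Hypothetical similarity values; the text does not list those of matrix 312.
S = [
    [1.00, 0.92, 0.85, 0.10, 0.05, 0.12],
    [0.92, 1.00, 0.88, 0.15, 0.08, 0.11],
    [0.85, 0.88, 1.00, 0.20, 0.09, 0.07],
    [0.10, 0.15, 0.20, 1.00, 0.95, 0.83],
    [0.05, 0.08, 0.09, 0.95, 1.00, 0.90],
    [0.12, 0.11, 0.07, 0.83, 0.90, 1.00],
]
print(cluster_rules(["R1", "R2", "R3", "R4", "R5", "R6"], S))
# [['R1', 'R2', 'R3'], ['R4', 'R5', 'R6']]
```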

FIG. 4 illustrates an exemplary process 400 for automatically labeling firewall rules in a firewall rule group according to an embodiment. The process 400 is an exemplary implementation of the operation at 160 in FIG. 1. The process 400 utilizes historical or existing risk attribute labels of firewall rules for automatically labeling firewall rules in each firewall rule group. The historical risk attribute labels may have been labeled manually, or automatically by previously performing the process 400. In FIG. 4, firewall rules that have been attached with historical risk attribute labels may also be referred to as historical firewall rules. For example, the process 100 in FIG. 1 may be performed iteratively or repeatedly, and the firewall rules processed in the last iteration of the process 100 may be deemed historical firewall rules.

It is assumed that firewall rules in a target firewall rule group 402 are to be automatically labeled, wherein the target firewall rule group 402 may come from the firewall rule groups 140 in FIG. 1 or the firewall rule groups 260 in FIG. 2. The process 400 intends to attach the same risk attribute label to all firewall rules in the target firewall rule group 402 automatically.

At 410, an intermediate risk attribute label may be assigned to each firewall rule in the target firewall rule group 402. The intermediate risk attribute label may be a historical risk attribute label. In an implementation, the intermediate risk attribute labels may be assigned through a historical risk attribute label inheriting mechanism. FIG. 5 illustrates an exemplary process 500 of the historical risk attribute label inheriting mechanism according to an embodiment.

For the current firewall rule 502 in the target firewall rule group 402, a corresponding historical firewall rule may be identified at 510. In an implementation, a historical firewall rule having the same name as the current firewall rule 502 may be identified at 510.

At 520, it is determined whether the current firewall rule 502 and the identified historical firewall rule have equivalent count distributions. In an implementation, similarity of the current firewall rule 502 and the identified historical firewall rule may be calculated according to the operation 220 in FIG. 2, and the calculated similarity may be compared with a predetermined inheriting threshold.

In response to determining at 520 that the current firewall rule 502 and the identified historical firewall rule have equivalent count distributions, e.g., the calculated similarity is equal to or above the inheriting threshold, an operation of label inheriting may be performed at 530, e.g., assigning a historical risk attribute label of the identified historical firewall rule to the current firewall rule 502 as an intermediate risk attribute label of the current firewall rule 502.

In response to determining at 520 that the current firewall rule 502 and the identified historical firewall rule do not have equivalent count distributions, e.g., the calculated similarity is below the inheriting threshold, the current firewall rule 502 may be labeled as unknown, wherein the “unknown” risk attribute label is an intermediate risk attribute label of the current firewall rule 502.
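Steps 510 to 530 can be sketched in Python as a single lookup-and-compare. The sketch assumes the history is a mapping from rule name to (historical distribution vector, historical label). Returning "unknown" when no historical rule has the same name is an assumption, since the text does not cover that case. `inherit_label` and the toy `exact_match` similarity are hypothetical.

```python
from typing import Callable, Dict, Sequence, Tuple

Vector = Sequence[float]
UNKNOWN = "unknown"


# Hypothetical helper; names are illustrative, not from the disclosure.
def inherit_label(
    name: str,
    vector: Vector,
    history: Dict[str, Tuple[Vector, str]],
    sim: Callable[[Vector, Vector], float],
    inherit_threshold: float,
) -> str:
    """Intermediate risk attribute label for one current rule (process 500).

    history: rule name -> (historical distribution vector, historical label).
    """
    record = history.get(name)  # 510: historical rule with the same name
    if record is None:
        return UNKNOWN  # assumption: no same-named historical rule -> unknown
    old_vector, old_label = record
    if sim(vector, old_vector) >= inherit_threshold:  # 520: equivalent distributions?
        return old_label  # 530: inherit the historical label
    return UNKNOWN  # distributions diverged, so the old label is not trusted


def exact_match(a: Vector, b: Vector) -> float:
    """Toy similarity for the example; any similarity from FIG. 2 could be used."""
    return 1.0 if list(a) == list(b) else 0.0


history = {"R1": ([10, 4], "riskless"), "R2": ([0, 2], "risky")}
print(inherit_label("R1", [10, 4], history, exact_match, 0.9))  # riskless
print(inherit_label("R2", [5, 5], history, exact_match, 0.9))   # unknown
```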

Through performing the process 500 for each firewall rule in the target firewall rule group 402, all the firewall rules in the target firewall rule group 402 would be assigned with respective intermediate risk attribute labels.

Returning to FIG. 4, the intermediate risk attribute labels of the firewall rules in the target firewall rule group 402 may be used for determining whether to attach one unified risk attribute label to all firewall rules in the target firewall rule group 402. In an implementation, at 420, it is determined whether the ratio of firewall rules assigned with a given type of intermediate risk attribute label in the target firewall rule group 402 is above a ratio threshold. Assuming that the types of intermediate risk attribute label include risky and riskless, and taking a ratio threshold of 65% as an example, it may be determined at 420 whether the ratio of firewall rules assigned with a risky label in the target firewall rule group 402 is above 65%, or whether the ratio of firewall rules assigned with a riskless label is above 65%.

In response to determining at 420 that the ratio of firewall rules assigned with a type of intermediate risk attribute label in the target firewall rule group 402 is above the ratio threshold, this type of intermediate risk attribute label may be broadcast in the target firewall rule group 402 at 430, e.g., by attaching it to all firewall rules in the target firewall rule group 402. For example, if the ratio of firewall rules assigned with a risky label in the target firewall rule group 402 is above the ratio threshold of 65%, all the firewall rules in the target firewall rule group 402 may be attached with the risky label. The process 400 then ends at 440.

In response to determining at 420 that the ratio of firewall rules assigned with any type of intermediate risk attribute label in the target firewall rule group 402 is not above the ratio threshold, no intermediate risk attribute label is broadcast in the target firewall rule group 402, and the process 400 ends at 440 directly.

It should be understood that, in FIG. 4, the ratio threshold ensures that a label is broadcast only when a majority, or a predetermined portion, of the firewall rules in a firewall rule group have the same risk attribute label, i.e., the same risk attribute. In that case, it can be inferred that all firewall rules in the firewall rule group should have that risk attribute, and they can therefore all be given the same risk attribute label.
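The ratio-threshold decision at 420 and the broadcast at 430 can be sketched as follows. This is a minimal illustration, not the disclosed implementation. It assumes that labels are plain strings such as "risky" and "riskless", that a rule which received no intermediate label is represented by None and still counts toward the group size, and that "above" means strictly greater than. The function name `broadcast_label` is hypothetical.

```python
from collections import Counter
from typing import Optional, Sequence


def broadcast_label(
    intermediate_labels: Sequence[Optional[str]],
    ratio_threshold: float = 0.65,
) -> Optional[str]:
    """Return the label to attach to every rule in one group, or None.

    intermediate_labels holds one entry per rule in the group; None marks a
    rule without an intermediate label (an assumption of this sketch).
    """
    if not intermediate_labels:
        return None
    total = len(intermediate_labels)
    counts = Counter(label for label in intermediate_labels if label is not None)
    for label, count in counts.most_common():
        # Step 420: is the ratio of rules carrying this label above the threshold?
        if count / total > ratio_threshold:
            # Step 430: this label is broadcast to the whole group.
            return label
    # No label type qualifies: nothing is broadcast and the process ends at 440.
    return None
```

With the 65% threshold from the example above, a group in which 7 of 10 rules carry "risky" is labeled "risky" as a whole, whereas a 6-of-10 split yields no broadcast. For thresholds of at least 50% only one label type can qualify; for lower thresholds the sketch picks the most frequent qualifying label, which is an extra assumption.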

Assume that a firewall rule in the target firewall rule group 402 is originally represented as a tuple [RuleName, Location, Count, GroupID], where the GroupID item corresponds to the target firewall rule group 402. When the firewall rule is attached a risk attribute label through the process 400, it may be represented as an updated tuple [RuleName, Location, Count, GroupID, Label], where the Label item is the risk attribute label attached through the process 400. It should be understood that, in the next iteration of the process 100 in FIG. 1 together with the process 400 in FIG. 4, the current firewall rule becomes a historical firewall rule, and the current Label in the tuple becomes a historical risk attribute label.
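The tuple update described above can be illustrated with two record types. This sketch assumes Python `NamedTuple` records whose field names mirror [RuleName, Location, Count, GroupID, Label]; the types given to Location and Count are placeholders because their exact contents are not specified in this section, and GroupID is assumed to be an integer. `FirewallRuleRecord`, `LabeledFirewallRuleRecord`, and `attach_label` are hypothetical names.

```python
from typing import Any, NamedTuple


class FirewallRuleRecord(NamedTuple):
    """[RuleName, Location, Count, GroupID]; Location/Count types are placeholders."""
    rule_name: str
    location: Any
    count: Any
    group_id: int


class LabeledFirewallRuleRecord(NamedTuple):
    """[RuleName, Location, Count, GroupID, Label] after process 400 attaches a label."""
    rule_name: str
    location: Any
    count: Any
    group_id: int
    label: str


def attach_label(rule: FirewallRuleRecord, label: str) -> LabeledFirewallRuleRecord:
    # Copy the four original items unchanged and append the attached label.
    return LabeledFirewallRuleRecord(*rule, label)
```

In the next iteration, records of the labeled form would play the role of historical firewall rules, with their `label` field serving as the historical risk attribute label.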

FIG. 6 illustrates a flowchart of an exemplary method 600 for risk assessment of firewall rules in a datacenter according to an embodiment. The datacenter may have a plurality of devices and be configured with a plurality of firewall rules.

At 610, firewall rule configuration information of the plurality of devices may be obtained.

At 620, a plurality of device groups formed by the plurality of devices may be identified.

At 630, count distributions of the plurality of firewall rules over the plurality of device groups may be determined.

At 640, the plurality of firewall rules may be clustered into a plurality of firewall rule groups based on the count distributions, firewall rules in each firewall rule group having the same risk attribute.

In an implementation, the firewall rule configuration information may comprise firewall rules configured at each device.

In an implementation, devices in each device group may have the same function or purpose.

In an implementation, a count distribution of each firewall rule may comprise the number of devices configured with the firewall rule in each of the plurality of device groups.
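One way to compute such count distributions is sketched below. It assumes that the firewall rule configuration information is available as a mapping from each device to the names of the firewall rules configured on it, and that device groups are given as a mapping from each device to a group identifier; these input shapes and the function name `count_distributions` are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping, Tuple


def count_distributions(
    rules_by_device: Mapping[str, Iterable[str]],
    group_of_device: Mapping[str, str],
    group_order: List[str],
) -> Dict[str, Tuple[int, ...]]:
    """Map each rule to the number of devices configured with it in each device group."""
    slot_of_group = {group: i for i, group in enumerate(group_order)}
    counts: Dict[str, List[int]] = defaultdict(lambda: [0] * len(group_order))
    for device, rules in rules_by_device.items():
        slot = slot_of_group[group_of_device[device]]
        # A device contributes at most once to a given rule's count.
        for rule in set(rules):
            counts[rule][slot] += 1
    return {rule: tuple(vector) for rule, vector in counts.items()}
```

For example, if web1 and web2 form one device group and db1 another, a rule configured on web1 and db1 gets the distribution (1, 1), while a rule configured on both web servers only gets (2, 0).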

In an implementation, the clustering may comprise: clustering firewall rules that have equivalent count distributions into the same firewall rule group.
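If "equivalent" is read as "identical", this clustering can be done by using each count distribution as a dictionary key. The sketch below works under that assumption; its input is a mapping from each rule name to its count distribution as a tuple of per-group counts, and `group_by_equal_distribution` is a hypothetical name.

```python
from collections import defaultdict
from typing import Dict, List, Mapping, Tuple


def group_by_equal_distribution(
    distributions: Mapping[str, Tuple[int, ...]],
) -> List[List[str]]:
    """Put rules whose count distributions are identical into the same group."""
    buckets: Dict[Tuple[int, ...], List[str]] = defaultdict(list)
    for rule, distribution in distributions.items():
        buckets[distribution].append(rule)
    # Each bucket of identical distributions is one firewall rule group.
    return [sorted(rules) for rules in buckets.values()]
```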

In an implementation, the clustering may comprise: representing the count distribution of each firewall rule as a distribution vector of the firewall rule; calculating the similarity between every two firewall rules among the plurality of firewall rules from their distribution vectors, to form a similarity matrix; converting the similarity matrix to an adjacency matrix by applying a similarity threshold; building a graph representation of the adjacency matrix; and extracting, from the graph representation, a plurality of connected subgraphs corresponding respectively to the plurality of firewall rule groups.
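The graph-based clustering steps can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the disclosed implementation: it uses cosine similarity as the similarity measure, treats a pair as adjacent when its similarity is greater than or equal to the threshold, uses the adjacency matrix itself as the graph representation, and finds connected subgraphs by breadth-first search. `cosine` and `cluster_rules` are hypothetical names.

```python
import math
from collections import deque
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two distribution vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def cluster_rules(vectors: Dict[str, Sequence[float]], threshold: float) -> List[List[str]]:
    """Group rules into the connected subgraphs of a thresholded similarity graph."""
    names = sorted(vectors)
    n = len(names)
    # Similarity matrix: similarity of every two rules.
    sim = [[cosine(vectors[a], vectors[b]) for b in names] for a in names]
    # Adjacency matrix: an edge wherever the similarity reaches the threshold.
    adj = [[i != j and sim[i][j] >= threshold for j in range(n)] for i in range(n)]
    # Each connected subgraph, found by BFS, becomes one firewall rule group.
    seen = [False] * n
    groups: List[List[str]] = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        queue = deque([start])
        component: List[str] = []
        while queue:
            i = queue.popleft()
            component.append(names[i])
            for j in range(n):
                if adj[i][j] and not seen[j]:
                    seen[j] = True
                    queue.append(j)
        groups.append(sorted(component))
    return groups
```

Lowering the threshold adds edges and tends to merge rules into fewer, larger groups; raising it removes edges and splits groups apart.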

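The similarity threshold may be predetermined so that the plurality of connected subgraphs approximate complete graphs, i.e., so that nearly every pair of firewall rules within a connected subgraph is directly connected by an edge.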
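Because connected subgraphs can chain together rules that are only indirectly similar, one hypothetical way to judge whether a candidate threshold makes the subgraphs approximate complete graphs is to measure each subgraph's edge density, i.e., the fraction of its rule pairs that are directly adjacent (1.0 for a complete graph). The sketch below assumes cosine similarity and a greater-than-or-equal adjacency rule; `component_densities` is a hypothetical name, and the groups are passed in explicitly so the block stands alone.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two distribution vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def component_densities(
    vectors: Dict[str, Sequence[float]],
    groups: List[List[str]],
    threshold: float,
) -> List[float]:
    """Fraction of rule pairs in each group that are directly adjacent (1.0 = complete graph)."""
    densities: List[float] = []
    for group in groups:
        pairs = list(combinations(group, 2))
        if not pairs:
            densities.append(1.0)  # a single rule is trivially a complete graph
            continue
        edges = sum(1 for a, b in pairs if cosine(vectors[a], vectors[b]) >= threshold)
        densities.append(edges / len(pairs))
    return densities
```

For distributions (1, 0), (1, 1) and (0, 1) with a threshold of 0.7, the three rules form one connected chain, but only 2 of the 3 pairs are adjacent, giving a density of about 0.67 and suggesting that the threshold is too permissive for that data.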

The similarity between two firewall rules may be calculated based on cosine similarity or maximum relative distance similarity.
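For reference, the standard cosine similarity between two distribution vectors can be written as below, using assumed notation: $u_k$ and $v_k$ denote the numbers of devices configured with each of the two rules in device group $k$, and $K$ is the number of device groups. The maximum relative distance similarity is not defined in this section, so it is not reconstructed here.

```latex
\operatorname{sim}_{\cos}(\mathbf{u}, \mathbf{v})
  = \frac{\sum_{k=1}^{K} u_k v_k}
         {\sqrt{\sum_{k=1}^{K} u_k^{2}}\;\sqrt{\sum_{k=1}^{K} v_k^{2}}}
```

For example, with u = (2, 0, 3) and v = (4, 0, 6), the numerator is 2·4 + 0·0 + 3·6 = 26 and the denominator is √13 · √52 = 26, so the similarity is 1. Cosine similarity therefore treats two rules deployed in the same proportions across device groups as maximally similar, even when one of them is configured on twice as many devices.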

In an implementation, the method 600 may further comprise: automatically attaching the same risk attribute label to all firewall rules in each firewall rule group.

A risk attribute label attached to a firewall rule may indicate whether the firewall rule is risky or riskless.

The attaching of the same risk attribute label may comprise: assigning an intermediate risk attribute label to each firewall rule in the firewall rule group through a historical risk attribute label inheriting mechanism; determining whether the ratio of firewall rules assigned a given type of intermediate risk attribute label in the firewall rule group is above a ratio threshold; and in response to determining that the ratio is above the ratio threshold, attaching that type of intermediate risk attribute label to all firewall rules in the firewall rule group.

The historical risk attribute label inheriting mechanism may comprise, for a current firewall rule: identifying a historical firewall rule having the same name as the current firewall rule; determining whether the current firewall rule and the historical firewall rule have equivalent count distributions; and in response to determining that they have equivalent count distributions, assigning the historical risk attribute label of the historical firewall rule to the current firewall rule as the intermediate risk attribute label of the current firewall rule.
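The inheriting mechanism can be sketched as follows. The sketch assumes that historical firewall rules are stored as a mapping from rule name to a pair of (historical count distribution, historical label), and it reads "equivalent count distributions" as identical tuples. `inherit_label` and `intermediate_labels` are hypothetical names; a rule that inherits nothing gets None.

```python
from typing import Dict, List, Mapping, Optional, Tuple

Distribution = Tuple[int, ...]


def inherit_label(
    rule_name: str,
    current: Distribution,
    history: Mapping[str, Tuple[Distribution, str]],
) -> Optional[str]:
    """Return the historical label if a same-named historical rule has an equal distribution."""
    past = history.get(rule_name)
    if past is None:
        return None  # no historical rule with the same name
    past_distribution, past_label = past
    # Inherit only when the count distributions match exactly.
    return past_label if past_distribution == current else None


def intermediate_labels(
    group: List[str],
    distributions: Mapping[str, Distribution],
    history: Mapping[str, Tuple[Distribution, str]],
) -> Dict[str, Optional[str]]:
    """Assign an intermediate label (or None) to each rule in one firewall rule group."""
    return {name: inherit_label(name, distributions[name], history) for name in group}
```

The resulting per-rule labels are the input that the ratio-threshold check then evaluates for the whole group.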

It should be understood that the method 600 may further comprise any steps or processes for risk assessment of firewall rules in a datacenter according to the embodiments of the present disclosure as mentioned above.

FIG. 7 illustrates an exemplary apparatus 700 for risk assessment of firewall rules in a datacenter according to an embodiment. The datacenter may have a plurality of devices and be configured with a plurality of firewall rules.

The apparatus 700 may comprise: a configuration information obtaining module 710, for obtaining firewall rule configuration information of the plurality of devices; a device group identifying module 720, for identifying a plurality of device groups formed by the plurality of devices; a count distribution determining module 730, for determining count distributions of the plurality of firewall rules over the plurality of device groups; and a clustering module 740, for clustering the plurality of firewall rules into a plurality of firewall rule groups based on the count distributions, firewall rules in each firewall rule group having the same risk attribute.

Moreover, the apparatus 700 may also comprise any other modules configured for risk assessment of firewall rules in a datacenter according to the embodiments of the present disclosure as mentioned above.

FIG. 8 illustrates an exemplary apparatus 800 for risk assessment of firewall rules in a datacenter according to an embodiment. The datacenter may have a plurality of devices and be configured with a plurality of firewall rules.

The apparatus 800 may comprise: at least one processor 810; and a memory 820 storing computer-executable instructions. When executing the computer-executable instructions, the at least one processor 810 may: obtain firewall rule configuration information of the plurality of devices; identify a plurality of device groups formed by the plurality of devices; determine count distributions of the plurality of firewall rules over the plurality of device groups; and cluster the plurality of firewall rules into a plurality of firewall rule groups based on the count distributions, firewall rules in each firewall rule group having the same risk attribute.

In an implementation, the computer-executable instructions stored in the memory 820 may be further executed to cause the at least one processor 810 to: automatically attach the same risk attribute label to all firewall rules in each firewall rule group.

Moreover, the at least one processor 810 may perform any other operations of the methods for risk assessment of firewall rules in a datacenter according to the embodiments of the present disclosure as mentioned above.

The embodiments of the present disclosure propose a computer program product for risk assessment of firewall rules in a datacenter. The datacenter may have a plurality of devices and be configured with a plurality of firewall rules. The computer program product may comprise a computer program that is executed by at least one processor for: obtaining firewall rule configuration information of the plurality of devices; identifying a plurality of device groups formed by the plurality of devices; determining count distributions of the plurality of firewall rules over the plurality of device groups; and clustering the plurality of firewall rules into a plurality of firewall rule groups based on the count distributions, firewall rules in each firewall rule group having the same risk attribute. Moreover, the computer program in the computer program product may be further executed by the at least one processor to perform any other operations of the methods for risk assessment of firewall rules in a datacenter according to the embodiments of the present disclosure as mentioned above.

The embodiments of the present disclosure may be embodied in a non-transitory computer-readable medium. The non-transitory computer-readable medium may comprise instructions that, when executed, cause one or more processors to perform any operations of the methods for risk assessment of firewall rules in a datacenter according to the embodiments of the present disclosure as mentioned above.

It should be appreciated that all the operations in the methods described above are merely exemplary; the present disclosure is not limited to any particular operations in the methods or to the order of these operations, and should cover all other equivalents under the same or similar concepts.

It should also be appreciated that all the modules in the apparatuses described above may be implemented in various approaches. These modules may be implemented as hardware, software, or a combination thereof. Moreover, any of these modules may be further functionally divided into sub-modules or combined together.

Processors have been described in connection with various apparatuses and methods. These processors may be implemented using electronic hardware, computer software, or any combination thereof. Whether such processors are implemented as hardware or software will depend upon the particular application and overall design constraints imposed on the system. By way of example, a processor, any portion of a processor, or any combination of processors presented in the present disclosure may be implemented with a microprocessor, microcontroller, digital signal processor (DSP), a field-programmable gate array (FPGA), a programmable logic device (PLD), a state machine, gated logic, discrete hardware circuits, and other suitable processing components configured to perform the various functions described throughout the present disclosure. The functionality of a processor, any portion of a processor, or any combination of processors presented in the present disclosure may be implemented with software being executed by a microprocessor, microcontroller, DSP, or other suitable platform.

Software shall be construed broadly to mean instructions, instruction sets, code, code segments, program code, programs, subprograms, software modules, applications, software applications, software packages, routines, subroutines, objects, threads of execution, procedures, functions, etc. The software may reside on a computer-readable medium. A computer-readable medium may include, by way of example, memory such as a magnetic storage device (e.g., hard disk, floppy disk, magnetic strip), an optical disk, a smart card, a flash memory device, random access memory (RAM), read only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), a register, or a removable disk. Although memory is shown separate from the processors in the various aspects presented throughout the present disclosure, the memory may be internal to the processors, e.g., cache or register.

The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects. Thus, the claims are not intended to be limited to the aspects shown herein. All structural and functional equivalents to the elements of the various aspects described throughout the present disclosure that are known or later come to be known to those of ordinary skill in the art are intended to be encompassed by the claims.

Claims

1. A method for risk assessment of firewall rules in a datacenter, the datacenter having a plurality of devices and being configured with a plurality of firewall rules, the method comprising:

obtaining firewall rule configuration information of the plurality of devices;

identifying a plurality of device groups formed by the plurality of devices;

determining count distributions of the plurality of firewall rules over the plurality of device groups; and

clustering the plurality of firewall rules into a plurality of firewall rule groups based on the count distributions, firewall rules in each firewall rule group having the same risk attribute.

2. The method of claim 1, wherein

the firewall rule configuration information comprises firewall rules configured at each device.

3. The method of claim 1, wherein

devices in each device group have the same function or purpose.

4. The method of claim 1, wherein

a count distribution of each firewall rule comprises the number of devices being configured with the firewall rule in each of the plurality of device groups.

5. The method of claim 1, wherein the clustering comprises:

clustering firewall rules that have equivalent count distributions into the same firewall rule group.

6. The method of claim 1, wherein the clustering comprises:

representing a count distribution of each firewall rule as a distribution vector of the firewall rule;

calculating similarity of every two firewall rules among the plurality of firewall rules with distribution vectors of the two firewall rules, to form a similarity matrix;

converting the similarity matrix to an adjacency matrix by applying a similarity threshold;

building a graph representation of the adjacency matrix; and

extracting, from the graph representation, a plurality of connected subgraphs corresponding to the plurality of firewall rule groups respectively.

7. The method of claim 6, wherein

the similarity threshold is predetermined to cause the plurality of connected subgraphs to approximate complete graphs.

8. The method of claim 6, wherein

the similarity of every two firewall rules is calculated based on cosine similarity or maximum relative distance similarity.

9. The method of claim 1, further comprising:

attaching the same risk attribute label to all firewall rules in each firewall rule group automatically.

10. The method of claim 9, wherein

a risk attribute label attached to a firewall rule indicates whether the firewall rule is risky or riskless.

11. The method of claim 9, wherein the attaching the same risk attribute label comprises:

assigning an intermediate risk attribute label to each firewall rule in the firewall rule group through historical risk attribute label inheriting mechanism;

determining whether the ratio of firewall rules being assigned with a type of intermediate risk attribute label in the firewall rule group is above a ratio threshold; and

in response to determining that the ratio is above the ratio threshold, attaching the type of intermediate risk attribute label to all firewall rules in the firewall rule group.

12. The method of claim 11, wherein the historical risk attribute label inheriting mechanism comprises:

identifying a historical firewall rule having the same name as the current firewall rule;

determining whether the current firewall rule and the historical firewall rule have equivalent count distributions; and

in response to determining that the current firewall rule and the historical firewall rule have equivalent count distributions, assigning a historical risk attribute label of the historical firewall rule to the current firewall rule as an intermediate risk attribute label of the current firewall rule.

13. An apparatus for risk assessment of firewall rules in a datacenter, the datacenter having a plurality of devices and being configured with a plurality of firewall rules, the apparatus comprising:

at least one processor; and

a memory storing computer-executable instructions that, when executed, cause the at least one processor to:

obtain firewall rule configuration information of the plurality of devices,

identify a plurality of device groups formed by the plurality of devices,

determine count distributions of the plurality of firewall rules over the plurality of device groups, and

cluster the plurality of firewall rules into a plurality of firewall rule groups based on the count distributions, firewall rules in each firewall rule group having the same risk attribute.

14. The apparatus of claim 13, wherein

a count distribution of each firewall rule comprises the number of devices being configured with the firewall rule in each of the plurality of device groups.

15. The apparatus of claim 13, wherein the clustering comprises:

clustering firewall rules that have equivalent count distributions into the same firewall rule group.

16. The apparatus of claim 13, wherein the clustering comprises:

representing a count distribution of each firewall rule as a distribution vector of the firewall rule;

calculating similarity of every two firewall rules among the plurality of firewall rules with distribution vectors of the two firewall rules, to form a similarity matrix;

converting the similarity matrix to an adjacency matrix by applying a similarity threshold;

building a graph representation of the adjacency matrix; and

extracting, from the graph representation, a plurality of connected subgraphs corresponding to the plurality of firewall rule groups respectively.

17. The apparatus of claim 13, wherein the computer-executable instructions stored in the memory are further executed to cause the at least one processor to:

attach the same risk attribute label to all firewall rules in each firewall rule group automatically.

18. The apparatus of claim 17, wherein the attaching the same risk attribute label comprises:

assigning an intermediate risk attribute label to each firewall rule in the firewall rule group through historical risk attribute label inheriting mechanism;

determining whether the ratio of firewall rules being assigned with a type of intermediate risk attribute label in the firewall rule group is above a ratio threshold; and

in response to determining that the ratio is above the ratio threshold, attaching the type of intermediate risk attribute label to all firewall rules in the firewall rule group.

19. The apparatus of claim 18, wherein the historical risk attribute label inheriting mechanism comprises:

identifying a historical firewall rule having the same name as the current firewall rule;

determining whether the current firewall rule and the historical firewall rule have equivalent count distributions; and

in response to determining that the current firewall rule and the historical firewall rule have equivalent count distributions, assigning a historical risk attribute label of the historical firewall rule to the current firewall rule as an intermediate risk attribute label of the current firewall rule.

20. A computer program product for risk assessment of firewall rules in a datacenter, the datacenter having a plurality of devices and being configured with a plurality of firewall rules, the computer program product comprising a computer program that is executed by at least one processor for:

obtaining firewall rule configuration information of the plurality of devices;

identifying a plurality of device groups formed by the plurality of devices;

determining count distributions of the plurality of firewall rules over the plurality of device groups; and

clustering the plurality of firewall rules into a plurality of firewall rule groups based on the count distributions, firewall rules in each firewall rule group having the same risk attribute.