CN116049842A

CN116049842A - Access log-based ABAC strategy extraction and optimization method

Info

Publication number: CN116049842A
Application number: CN202211153020.3A
Authority: CN
Inventors: 孙伟; 袁晓亚; 杨玚
Original assignee: Xinyang Normal University
Current assignee: Xinyang Normal University
Priority date: 2022-09-21
Filing date: 2022-09-21
Publication date: 2023-05-02

Abstract

The invention discloses an ABAC policy extraction and optimization method based on access logs, comprising the following steps. Step one, data preprocessing: formalize the entities and relations in the access log using sets, relations, functions and the like, and convert every numerical variable into a corresponding discrete variable. Step two, clustering and dividing access logs: determine all authorized access logs related to the policy rules, and divide the access log set into different clusters using data mining techniques, so that each cluster, containing a number of records, corresponds to one ABAC rule. Step three, rule extraction: extract rules from the record features. For a given access log set containing user access requests and system authorization decisions, the invention uses cluster partitioning to determine the initial number of policy rules, which reduces the scale of policy extraction; it also supports both positive and negative attribute conditions, which makes policy description more flexible and convenient and enhances the interpretability of the policy.

Description

Access log-based ABAC strategy extraction and optimization method

Technical Field

The invention relates to the technical field of access control, in particular to an ABAC strategy extraction and optimization method based on an access log.

Background

With the rapid development of emerging computing and information technologies such as edge computing, social networks and blockchain, traditional access control models can no longer meet the fine-grained functional requirements of practical application scenarios. Attribute-Based Access Control (ABAC) provides a flexible way to address complex, dynamic system authorization requirements. To implement an ABAC mechanism successfully, it is important to determine an appropriate authorization policy and to build a sound ABAC system. To this end, ABAC-based policy engineering techniques have been proposed, with two construction approaches: top-down and bottom-up. Compared with the top-down manual approach, which is time-consuming, laborious and error-prone, the bottom-up approach mines policy rules automatically or semi-automatically and migrates a non-ABAC model to an ABAC system. This reduces the cost, time and errors involved in policy development and management, and the approach has received wide attention from academia and industry in recent years.

The bottom-up policy engineering method is also called policy mining. It was first proposed by Kuhlmann et al., who used data mining techniques to construct a role set from a given permission assignment relation, that is, role mining. Although many role mining methods have been proposed, they are not suitable for extracting ABAC policies. For this reason, Xu and Stoller, starting from a given access control log (or list) and the attribute data of its entities, were the first to formulate the ABAC policy mining problem and propose a mining method for it. Researchers have since proposed a variety of policy engineering methods. However, existing methods have the following main problems: (1) ABAC policy rules may contain both positive and negative conditions, and negative attribute conditions make authorization policies more flexible and convenient, yet existing methods do not support policy mining with negative conditions; (2) ABAC policy rules should be as concise and accurate as possible, since inconsistent or erroneous policy decisions cause originally authorized access requests to be denied or originally unauthorized requests to be allowed, yet existing methods do not optimize the initially mined policy, which leaves a large number of redundant and erroneous policy rules.

Disclosure of Invention

The invention aims to provide an ABAC policy extraction and optimization method based on access logs. For a given access log set comprising user access requests and system authorization decisions, it uses cluster partitioning to determine the initial number of policy rules, which reduces the scale of policy extraction; it also supports both positive and negative attribute conditions, which makes policy description more flexible and convenient and enhances the interpretability of the policy.

The aim of the invention can be achieved by the following technical scheme:

an ABAC strategy extraction and optimization method based on access logs comprises the following steps:

step one, data preprocessing: formalizing the entities and relations in the access log using sets, relations, functions and the like, and converting every numerical variable into a corresponding discrete variable;

step two, clustering and dividing access logs: determining all authorized access logs related to the policy rules; dividing the access log set into different clusters by using a data mining technology, so that each cluster containing a plurality of records corresponds to one ABAC rule;

step three, rule extraction: extracting the attribute conditions in each rule, namely combinations of different attribute-value pairs, by searching for similar patterns in the attribute conditions of the record features, namely the access authorization tuples;

step four, policy optimization: compared with the original rules, the rules extracted from the access log may be too strict or too relaxed; an extracted rule is considered strict if it contains more, and more complex, attribute conditions than the original rule; conversely, a rule is considered relaxed if it contains only a few simple attribute conditions; the extracted policy is repeatedly corrected against the original access log to further improve the quality of the ABAC policy.

As a further scheme of the invention: in the first step, the specific steps of the data preprocessing stage are as follows:

step 1a: $U$, $O$, $S$ and $OP$ denote the user (or subject) set, the object set, the session set and the operation set in the system, respectively; $A_u$, $A_o$ and $A_s$ denote the attributes of a user $u$, an object $o$ and a session $s$, respectively; $E$ and $A$ denote the set of all entities and the set of all entity attributes in the system, respectively, where $E = U \cup O \cup S$ and $A = A_U \cup A_O \cup A_S$; $V_a$ denotes the set of all possible values of attribute $a$ in the system; $f_{a_e}(e, a)$ denotes the value function of attribute $a$ of entity $e$;

step 1b: an attribute-value pair expression is represented by a tuple of the form $\langle a, \oplus v \rangle$, where $a$ is the attribute name, $v$ is the attribute value, and $\oplus \in \{=, !, >, <\}$ is a relational operator describing the value relation between $a$ and $v$; for example, $\langle a, =v \rangle$ means that $a$ takes the value $v$ and is called a positive attribute expression, abbreviated $\langle a, v \rangle$; $\langle a, !v \rangle$ means that $a$ may take any value other than $v$ and is called a negative attribute expression; for convenience of description, only the first two value relations are discussed; $AC$ denotes the set of all attribute conditions, and $EAV$ denotes the assignment relation between all entities and attribute conditions;

step 1c: in ABAC, session attributes involve dynamic factors such as time, location or access control context; during preprocessing, these continuous attribute variables are decomposed into discrete ones, for example by converting the access time into a working duration or a discrete working period, so that an ABAC policy $\Pi$ of the form $\langle E, A, OP, EAV, AC \rangle$ can be extracted.
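The preprocessing in steps 1b and 1c can be illustrated with a minimal Python sketch. It is not taken from the patent: the class name `AttrExpr`, the function `discretize_time`, the 09:00 to 17:00 working window and the rule that an absent attribute satisfies a negative expression are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class AttrExpr:
    """Attribute expression <a, v> (positive) or <a, !v> (negative)."""
    attr: str
    value: str
    negated: bool = False

    def holds(self, attrs: dict) -> bool:
        # Positive: the attribute equals v. Negative: it takes some other value.
        # Assumption: an attribute that is absent satisfies a negative expression.
        matches = attrs.get(self.attr) == self.value
        return not matches if self.negated else matches


def discretize_time(t: time, start: time = time(9, 0), end: time = time(17, 0)) -> str:
    """Map a continuous access time onto a discrete period label.

    The default 09:00-17:00 window is an illustrative assumption.
    """
    return "working" if start <= t < end else "non_working"


# A session whose access time has been converted into a discrete attribute.
session = {"period": discretize_time(time(10, 30)), "location": "office"}
print(AttrExpr("period", "working").holds(session))
print(AttrExpr("location", "home", negated=True).holds(session))
```

Representing an expression as an immutable value makes it hashable, so sets of expressions can later serve directly as attribute conditions of a rule.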

As a further scheme of the invention: in the second step, the access log set AL is divided by using a clustering technology, and the specific steps are as follows:

step 2a: dividing the access log data set $AL$ into $k$ different partitions $c_1, c_2, \dots, c_k$ using the Partitioning Around Medoids (PAM) method, which is similar to the k-means algorithm, with the $k$ initial center points of the partitions selected at random;

step 2b: for any cluster $c$, calculating the distance from its center point $al_i$ to the other non-center points:

wherein,

$associate(al_i)$ denotes all other records in the cluster that are associated with the center point $al_i$;

step 2c: comparing $dis(al_i, associate(al_i))$ with $dis(al_j, associate(al_i) \setminus \{al_j\} \cup \{al_i\})$ to decide whether to swap $al_i$ and $al_j$, and determining the new center point so that it satisfies:

step 2d: for different values of $k$, repeatedly running the clustering algorithm and calculating the accuracy and error rate of the model; the value of $k$ that best balances policy accuracy against policy complexity is selected as the initial number of policy rules.
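Steps 2a to 2c can be sketched as follows. Because the distance and swap formulas are not reproduced in this text, the sketch makes two assumptions: the distance between two records is the Hamming distance over their categorical attributes, and a swap is accepted when it lowers the total distance over all records, as in the classic PAM formulation. The function names are illustrative.

```python
import random


def hamming(r1, r2):
    """Number of attribute positions in which two records differ (assumed distance)."""
    return sum(a != b for a, b in zip(r1, r2))


def assign(records, medoids):
    """Map each medoid index to the indices of the records closest to it."""
    clusters = {m: [] for m in medoids}
    for idx, rec in enumerate(records):
        nearest = min(medoids, key=lambda m: hamming(rec, records[m]))
        clusters[nearest].append(idx)
    return clusters


def total_cost(records, medoids):
    """Sum of distances from every record to its nearest medoid."""
    return sum(min(hamming(rec, records[m]) for m in medoids) for rec in records)


def pam(records, k, seed=0):
    rng = random.Random(seed)
    medoids = rng.sample(range(len(records)), k)  # step 2a: random initial centers
    best = total_cost(records, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for cand in range(len(records)):
                if cand in medoids:
                    continue
                trial = medoids[:pos] + [cand] + medoids[pos + 1:]
                cost = total_cost(records, trial)
                if cost < best:  # step 2c: keep the swap only if it lowers the cost
                    medoids, best, improved = trial, cost, True
    return medoids, assign(records, medoids)


# Toy log of (role, resource type, operation) records.
log = [
    ("doctor", "record", "read"),
    ("doctor", "record", "write"),
    ("doctor", "record", "read"),
    ("student", "course", "read"),
    ("student", "course", "enroll"),
    ("student", "course", "read"),
]
medoids, clusters = pam(log, k=2)
print(clusters)
```

For step 2d, `pam` could be rerun for several values of `k` and the resulting policies compared on accuracy and complexity; that selection loop is left out here.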

As a further scheme of the invention: in the third step, extracting attribute conditions in each rule, namely, combinations of different attribute-value pairs; the method comprises the following specific steps:

step 3a: defining valid positive or negative attribute-value pairs; an attribute-value pair $\langle a, v \mid !v \rangle$ is said to be valid for the rule $\rho$ corresponding to cluster $c_i$ if and only if, for a given threshold $T_p$ or $T_n$, the frequency with which the attribute value $v$ appears in $c_i$ is higher or lower, respectively, than the frequency with which it appears in the original log set; $\langle a, v \mid !v \rangle$ is then added to the attribute conditions of the authorization rule $\rho = \langle AC, op \rangle$, which are called effective attribute conditions and denoted $EAC^{\rho}$; the set of all rules is denoted $P$;

step 3b: based on the definition of effective attribute conditions, the extraction process of the effective attribute conditions for any given cluster $c_i$ is given in Algorithm 1;

algorithm 1. Effective attribute condition extraction:
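The listing of Algorithm 1 is not reproduced at this point, so the following Python sketch only illustrates the definition in step 3a. The exact threshold tests are assumptions: a value yields a positive condition when its frequency in the cluster exceeds its frequency in the whole log by at least `t_p`, and a negative condition when it falls short by at least `t_n`. The function names and default thresholds are hypothetical.

```python
from collections import Counter


def value_frequencies(records, attr):
    """Relative frequency of each value of `attr` among the records."""
    counts = Counter(rec[attr] for rec in records)
    return {v: n / len(records) for v, n in counts.items()}


def extract_eac(cluster, access_log, attributes, t_p=0.3, t_n=0.3):
    """Return effective attribute conditions as (attr, value, negated) triples."""
    eac = set()
    for a in attributes:
        in_cluster = value_frequencies(cluster, a)
        in_log = value_frequencies(access_log, a)
        for v, f_log in in_log.items():
            f_cluster = in_cluster.get(v, 0.0)
            if f_cluster - f_log >= t_p:
                eac.add((a, v, False))  # positive condition <a, v>
            elif f_log - f_cluster >= t_n:
                eac.add((a, v, True))   # negative condition <a, !v>
    return eac


access_log = [
    {"role": "doctor", "dept": "A"},
    {"role": "doctor", "dept": "B"},
    {"role": "doctor", "dept": "A"},
    {"role": "nurse", "dept": "A"},
    {"role": "nurse", "dept": "B"},
    {"role": "nurse", "dept": "A"},
]
cluster = access_log[:3]  # a cluster made up of the doctor records
print(extract_eac(cluster, access_log, ["role", "dept"]))
```

In this toy log the `dept` values are distributed identically in the cluster and in the whole log, so only the `role` attribute produces conditions.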

As a further scheme of the invention: in the fourth step, the specific steps are as follows:

step 4a: the original access log $AL = \{\langle rq, d \rangle\}$ is divided into a positive log and a negative log:

$AL^{+} = \{\langle rq, d \rangle \mid \langle rq, d \rangle \in AL \wedge d = \text{permitted}\}$;

$AL^{-} = \{\langle rq, d \rangle \mid \langle rq, d \rangle \in AL \wedge d = \text{denied}\}$;

$AL = AL^{+} \cup AL^{-}$,

where $\langle rq, d \rangle$ denotes an authorization (or access) record describing the system's authorization decision $d$ for the access request $rq$; $d$ takes the value "permitted" (access allowed) or "denied" (access denied);

step 4b: based on the original logs $AL^{+}$ and $AL^{-}$ and the initially extracted policy $\Pi_m$, the records of type true positive (TP), false positive (FP), true negative (TN) and false negative (FN) are determined, where:

$TP_{\Pi|AL} = \{\langle rq, d \rangle \mid \langle rq, d \rangle \in AL^{+} : d_{\Pi}(rq) = \text{permitted}\}$, meaning that for an access request $rq$ in the positive log $AL^{+}$, the authorization decision $d_{\Pi}(rq)$ made by $\Pi$ also allows access;

$FP_{\Pi|AL} = \{\langle rq, d \rangle \mid \langle rq, d \rangle \in AL^{-} : d_{\Pi}(rq) = \text{permitted}\}$, meaning that for an access request $rq$ in the negative log $AL^{-}$, the authorization decision $d_{\Pi}(rq)$ made by $\Pi$ allows access;

$TN_{\Pi|AL} = \{\langle rq, d \rangle \mid \langle rq, d \rangle \in AL^{-} : d_{\Pi}(rq) = \text{denied}\}$, meaning that for an access request $rq$ in the negative log $AL^{-}$, the authorization decision $d_{\Pi}(rq)$ made by $\Pi$ also denies access;

$FN_{\Pi|AL} = \{\langle rq, d \rangle \mid \langle rq, d \rangle \in AL^{+} : d_{\Pi}(rq) = \text{denied}\}$, meaning that for an access request $rq$ in the positive log $AL^{+}$, the authorization decision $d_{\Pi}(rq)$ made by $\Pi$ nevertheless denies access;
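Steps 4a and 4b translate directly into set comprehensions. In the sketch below, `decide` stands in for the decision function $d_{\Pi}$ of the extracted policy; it and the toy log are hypothetical, and requests are modeled as plain tuples for brevity.

```python
def split_log(access_log):
    """Split AL into AL+ (permitted records) and AL- (denied records)."""
    al_pos = [(rq, d) for rq, d in access_log if d == "permitted"]
    al_neg = [(rq, d) for rq, d in access_log if d == "denied"]
    return al_pos, al_neg


def confusion_sets(access_log, decide):
    """Classify log records by comparing the logged decision with decide(rq)."""
    al_pos, al_neg = split_log(access_log)
    return {
        "TP": [r for r in al_pos if decide(r[0]) == "permitted"],
        "FN": [r for r in al_pos if decide(r[0]) == "denied"],
        "FP": [r for r in al_neg if decide(r[0]) == "permitted"],
        "TN": [r for r in al_neg if decide(r[0]) == "denied"],
    }


# Toy log: each request is a (role, operation) pair with its logged decision.
access_log = [
    (("doctor", "read"), "permitted"),
    (("nurse", "read"), "permitted"),
    (("doctor", "write"), "denied"),
    (("guest", "read"), "denied"),
]


def decide(rq):
    """Hypothetical extracted policy: permit every request made by a doctor."""
    return "permitted" if rq[0] == "doctor" else "denied"


sets = confusion_sets(access_log, decide)
print({name: len(records) for name, records in sets.items()})
```

The FN and FP lists are the records that step 4c uses as training data for the policy patterns.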

step 4c: taking the FN and FP log records as training data sets, and extracting the policy patterns $\Pi_{FN}$ and $\Pi_{FP}$ from them, respectively;

step 4d: comparing $\Pi_{FN}$ and $\Pi_{FP}$ with $\Pi_m$, and removing redundant attribute conditions from strict rules or adding missing attribute conditions to relaxed rules; in each optimization pass, the rules in $\Pi_m$ that are similar to rules in $\Pi_{FN}$ or $\Pi_{FP}$ are selected and processed in two ways:

for any rule $\rho_i$ of $\Pi_{FN}$, if $\Pi_m$ contains a rule $\rho_j$ similar to $\rho_i$, the redundant attribute conditions are deleted from $\rho_j$; if $\Pi_m$ contains no rule similar to $\rho_i$, then $\rho_i$ is a missing rule and is added directly to $\Pi_m$;

for any rule $\rho_i$ of $\Pi_{FP}$, if $\Pi_m$ contains a rule $\rho_j$ similar to $\rho_i$, the missing attribute conditions are added to $\rho_j$.
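The two cases of step 4d can be sketched in Python under several assumptions that the text does not fix: rules are modeled as frozensets of attribute conditions with the operation omitted, similarity is the Jaccard index with a hypothetical threshold of 0.5, "redundant" conditions are those in $\rho_j$ that $\rho_i$ lacks, and "missing" conditions are those in $\rho_i$ that $\rho_j$ lacks.

```python
def jaccard(c1, c2):
    """Jaccard similarity of two condition sets (assumed similarity measure)."""
    union = c1 | c2
    return len(c1 & c2) / len(union) if union else 1.0


def most_similar(rule, policy, threshold):
    """Index of the rule in `policy` most similar to `rule`, or None if none qualifies."""
    best_idx, best_sim = None, threshold
    for idx, other in enumerate(policy):
        sim = jaccard(rule, other)
        if sim >= best_sim:
            best_idx, best_sim = idx, sim
    return best_idx


def optimize(policy_m, policy_fn, policy_fp, threshold=0.5):
    """Refine policy_m using the patterns mined from FN and FP records."""
    refined = list(policy_m)
    for rho_i in policy_fn:
        j = most_similar(rho_i, refined, threshold)
        if j is None:
            refined.append(rho_i)            # missing rule: add it directly
        else:
            refined[j] = refined[j] & rho_i  # drop conditions rho_i does not require
    for rho_i in policy_fp:
        j = most_similar(rho_i, refined, threshold)
        if j is not None:
            refined[j] = refined[j] | rho_i  # add the missing conditions
    return refined


policy_m = [
    frozenset({("role", "doctor"), ("dept", "A"), ("period", "working")}),
    frozenset({("role", "student")}),
]
policy_fn = [
    frozenset({("role", "doctor"), ("period", "working")}),
    frozenset({("role", "auditor")}),
]
policy_fp = [frozenset({("role", "student"), ("enrolled", "yes")})]

for rule in optimize(policy_m, policy_fn, policy_fp):
    print(sorted(rule))
```

In this example the overly strict doctor rule loses its `dept` condition, the auditor rule is added as a missing rule, and the overly relaxed student rule gains the `enrolled` condition.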

As a further scheme of the invention: in the fourth step, the optimization process is as shown in algorithm 2:

algorithm 2 policy optimization

The invention has the beneficial effects that:

(1) For a given access log set containing user access requests and system authorization decisions, determining the initial strategy rule number by using a cluster division technology, so that the strategy extraction scale can be reduced;

(2) The attribute conditions such as affirmative type and negative type are supported, so that the strategy description is more flexible and convenient, and the interpretability of the strategy is enhanced;

(3) Based on the principles of correctness and conciseness, policy quality evaluation criteria are given; the effectiveness and efficiency of the method are verified on synthetic and real data sets; the extracted policies are of higher quality, with remarkable economic and social benefits.

Drawings

The present invention is further described below with reference to the accompanying drawings for the convenience of understanding by those skilled in the art.

FIG. 1 is a block diagram of an ABAC policy extraction and optimization process based on access logs;

FIG. 2 is a graph showing the comparison of the effects of the extraction strategy of the present invention;

FIG. 3 is a graph comparing the effects of the optimization strategy of the present invention;

Detailed Description

The technical solutions of the present invention will be clearly and completely described in connection with the embodiments, and it is obvious that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

As shown in fig. 1-3, an ABAC policy extraction and optimization method based on access logs includes the following steps:

step one, data preprocessing: formalizing the entities and relations in the access log using sets, relations, functions and the like, and converting every numerical variable into a corresponding discrete variable; the specific steps of the data preprocessing stage are as follows:

step 1a: $U$, $O$, $S$ and $OP$ denote the user (or subject) set, the object set, the session set and the operation set in the system, respectively; $A_u$, $A_o$ and $A_s$ denote the attributes of a user $u$, an object $o$ and a session $s$, respectively; $E$ and $A$ denote the set of all entities and the set of all entity attributes in the system, respectively, where $E = U \cup O \cup S$ and $A = A_U \cup A_O \cup A_S$; $V_a$ denotes the set of all possible values of attribute $a$ in the system; $f_{a_e}(e, a)$ denotes the value function of attribute $a$ of entity $e$;

step 1c: in ABAC, session attributes involve dynamic factors such as time, location or access control context; during preprocessing, these continuous attribute variables are decomposed into discrete ones, for example by converting the access time into a working duration or a discrete working period, so that an ABAC policy $\Pi$ of the form $\langle E, A, OP, EAV, AC \rangle$ can be extracted;

step two, clustering and dividing access logs: determining all authorized access logs related to the policy rules; dividing the access log set into different clusters using data mining techniques, so that each cluster, containing a number of records, corresponds to one ABAC rule; the access log set $AL$ is divided using a clustering technique, with the following specific steps:

step 2a: dividing the access log data set $AL$ into $k$ different partitions $c_1, c_2, \dots, c_k$ using the Partitioning Around Medoids (PAM) method, which is similar to the k-means algorithm, with the $k$ initial center points of the partitions selected at random;

step 2b: for any cluster $c$, calculating the distance from its center point $al_i$ to the other non-center points:

wherein ,

step 2d: for different values of $k$, repeatedly running the clustering algorithm and calculating the accuracy and error rate of the model; the value of $k$ that best balances policy accuracy against policy complexity is selected as the initial number of policy rules;

step three, rule extraction: extracting the attribute conditions in each rule, namely combinations of different attribute-value pairs, by searching for similar patterns in the attribute conditions of the record features, namely the access authorization tuples; the specific steps are as follows:

step 3a: defining valid positive or negative attribute-value pairs; an attribute-value pair $\langle a, v \mid !v \rangle$ is said to be valid for the rule $\rho$ corresponding to cluster $c_i$ if and only if, for a given threshold $T_p$ or $T_n$, the frequency with which the attribute value $v$ appears in $c_i$ is higher or lower, respectively, than the frequency with which it appears in the original log set; $\langle a, v \mid !v \rangle$ is then added to the attribute conditions of the authorization rule $\rho = \langle AC, op \rangle$, which are called effective attribute conditions and denoted $EAC^{\rho}$; the set of all rules is denoted $P$;

algorithm 1. Effective attribute condition extraction:

step four, policy optimization: compared with the original rules, the rules extracted from the access log may be too strict or too relaxed; an extracted rule is considered strict if it contains more, and more complex, attribute conditions than the original rule; conversely, a rule is considered relaxed if it contains only a few simple attribute conditions; the extracted policy is repeatedly corrected against the original access log to further improve the quality of the ABAC policy; the specific steps are as follows:

$AL^{+} = \{\langle rq, d \rangle \mid \langle rq, d \rangle \in AL \wedge d = \text{permitted}\}$;

$AL^{-} = \{\langle rq, d \rangle \mid \langle rq, d \rangle \in AL \wedge d = \text{denied}\}$;

$AL = AL^{+} \cup AL^{-}$.

$FP_{\Pi|AL} = \{\langle rq, d \rangle \mid \langle rq, d \rangle \in AL^{-} : d_{\Pi}(rq) = \text{permitted}\}$, meaning that for an access request $rq$ in the negative log $AL^{-}$, the authorization decision $d_{\Pi}(rq)$ made by $\Pi$ allows access;

$TN_{\Pi|AL} = \{\langle rq, d \rangle \mid \langle rq, d \rangle \in AL^{-} : d_{\Pi}(rq) = \text{denied}\}$, meaning that for an access request $rq$ in the negative log $AL^{-}$, the authorization decision $d_{\Pi}(rq)$ made by $\Pi$ also denies access;

$FN_{\Pi|AL} = \{\langle rq, d \rangle \mid \langle rq, d \rangle \in AL^{+} : d_{\Pi}(rq) = \text{denied}\}$, meaning that for an access request $rq$ in the positive log $AL^{+}$, the authorization decision $d_{\Pi}(rq)$ made by $\Pi$ nevertheless denies access;

step 4d: comparing $\Pi_{FN}$ and $\Pi_{FP}$ with $\Pi_m$, and removing redundant attribute conditions from strict rules or adding missing attribute conditions to relaxed rules; in each optimization pass, the rules in $\Pi_m$ that are similar to rules in $\Pi_{FN}$ or $\Pi_{FP}$ are selected and processed in two ways:

for any rule $\rho_i$ of $\Pi_{FN}$, if $\Pi_m$ contains a rule $\rho_j$ similar to $\rho_i$, the redundant attribute conditions are deleted from $\rho_j$; if $\Pi_m$ contains no rule similar to $\rho_i$, then $\rho_i$ is a missing rule and is added directly to $\Pi_m$;

for any rule $\rho_i$ of $\Pi_{FP}$, if $\Pi_m$ contains a rule $\rho_j$ similar to $\rho_i$, the missing attribute conditions are added to $\rho_j$; the optimization process is shown in Algorithm 2:

algorithm 2 policy optimization

Working principle: the effectiveness and efficiency of the invention are further verified through experimental evaluation;

the policy extraction and optimization method provided by the invention is executed on several policy data sets, both synthetic and real; the synthetic access logs are derived from randomly created policy sets, including parts of the UniversityP, healthcareP, projectManagementP, universityPN, healthcarePN and projectManagementPN data sets; the authorization rules of these policies are built over random attributes and their value sets, so the policy extraction effect can be evaluated on access logs of different scales and varying structural characteristics; to construct the input data, for each ABAC policy a set of authorization tuples is created and the ABAC policy corresponding to each access right is evaluated; the real data sets are derived from the public access log data sets provided by Amazon Kaggle and Amazon UCI; Amazon Kaggle records employees' access requests to resources and whether they were authorized, and also describes the employees' attribute values and the resource identifiers; the data set contains 12000 users and 7000 object resources in total; Amazon UCI contains more than 36000 users, 27000 permissions and 33000 attribute features;

the hardware environment for all experiments was an Intel i5-7400 CPU, 8 GB of memory and the 64-bit Windows 10 operating system; policy extraction and optimization were implemented in a Python 3 development environment;

Precision, Recall, Accuracy and the F1 value are used to evaluate how well the extracted policy matches the original policy; using the record sets TP, FP, TN and FN defined in step 4b, they are calculated as follows:

$Precision = \frac{|TP|}{|TP| + |FP|}$, $Recall = \frac{|TP|}{|TP| + |FN|}$, $Accuracy = \frac{|TP| + |TN|}{|TP| + |FP| + |TN| + |FN|}$, $F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}$

Accuracy can be misleading on imbalanced data sets, so the F1 value is chosen as the evaluation index for how close the extracted policy is to the original one: the larger the F1 value, the higher the quality of the extracted policy and the more consistent it is with the original access log;
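A short Python sketch makes the point about imbalanced data concrete. It uses the standard definitions of the four measures; the function name `evaluate` and the counts in the example are illustrative and are not results from the patent.

```python
def evaluate(tp, fp, tn, fn):
    """Return (precision, recall, accuracy, f1) from confusion counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, accuracy, f1


# Imbalanced toy case: 91 denied records against only 9 permitted ones.
precision, recall, accuracy, f1 = evaluate(tp=5, fp=1, tn=90, fn=4)
print(f"precision={precision:.3f} recall={recall:.3f} "
      f"accuracy={accuracy:.3f} f1={f1:.3f}")
```

Here the accuracy is 0.95 because the many correctly denied requests dominate, while the F1 value is only about 0.67 because 4 of the 9 permitted requests are wrongly denied.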

weighted structural complexity (Weighted Structural Complexity, WSC) is another method of evaluating policy quality; for a given ABAC strategy, WSC makes a generalized assessment of strategy size, calculated as follows:

$WSC(\rho) = WSC(EAC, op) = w_1 \times WSC(EAC_u) + w_2 \times WSC(EAC_o) + w_3 \times WSC(EAC_s)$

where $WSC(EAC_e) = |EAC_e|$, $|EAC_e|$ denotes the number of attribute tuples contained in the attribute conditions of entity $e$, and $w_i$ denotes a specified weight; clearly, the smaller the WSC value, the more concise the policy and the easier it is to manage;
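The per-rule formula above can be computed as in the following sketch. Summing the rule values to obtain a policy-level WSC is an assumption, since the text defines WSC only for a single rule; the function names and the example weights are likewise illustrative.

```python
def wsc_rule(eac_u, eac_o, eac_s, weights=(1.0, 1.0, 1.0)):
    """Weighted structural complexity of one rule.

    eac_u, eac_o, eac_s: user, object and session attribute conditions.
    """
    w1, w2, w3 = weights
    return w1 * len(eac_u) + w2 * len(eac_o) + w3 * len(eac_s)


def wsc_policy(rules, weights=(1.0, 1.0, 1.0)):
    """Policy-level WSC, assumed here to be the sum over all rules."""
    return sum(wsc_rule(u, o, s, weights) for u, o, s in rules)


rule = (
    {("role", "doctor"), ("dept", "A")},  # user conditions
    {("type", "record")},                 # object conditions
    {("period", "working")},              # session conditions
)
print(wsc_rule(*rule))
print(wsc_rule(*rule, weights=(1.0, 2.0, 0.5)))
print(wsc_policy([rule, rule]))
```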

the method of the invention is run 10 times on each synthetic or real data set; the best result under the evaluation indexes of running time, accuracy, F1 value and complexity is taken and compared with the methods proposed in "Mining attribute-based access control policies" (Xu Z, Stoller S D. IEEE Transactions on Dependable and Secure Computing, 2014, 12(5): 533-545) and "Mining ABAC rules from sparse logs" (Cotrini C, Weghorn T, Basin D. 2018 IEEE European Symposium on Security and Privacy. IEEE, 2018: 31-46), denoted Xu-Stoller and Cotrini respectively; the results are shown in Table 1, where "\" indicates that the experimental result is unknown or unsatisfactory; the results in Table 1 show that on more than half of the data sets the overall performance of the invention (shown in bold) is better than that of the other two methods, with the best performance on the UniversityP, projectManagementP, Amazon Kaggle and Amazon UCI policy sets;

Table 1. Comparison of different policy extraction methods

Furthermore, Figs. 2 and 3 compare the F1 value and the complexity of the three methods, respectively; the results in Fig. 2 show that the F1 value of the policy extracted by the invention follows a trend similar to that of the Cotrini method, and both are superior to the Xu-Stoller method; the results in Fig. 3 show that the complexity of the policy extracted by the invention changes gently, following a trend very close to that of the Xu-Stoller method, and the quality of both is significantly higher than that of the policy extracted by the Cotrini method.

The preferred embodiments of the invention disclosed above are intended only to assist in the explanation of the invention. The preferred embodiments are not exhaustive or to limit the invention to the precise form disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best understand and utilize the invention. The invention is limited only by the claims and the full scope and equivalents thereof.

Claims

1. An ABAC strategy extraction and optimization method based on access logs is characterized by comprising the following steps:

step four, policy optimization: the rules extracted from the access log have problems of being too strict or too relaxed compared to the original rules; an extraction rule is considered strict if it contains more and more complex attribute conditions than the original rule; conversely, if a rule contains only some simple attribute conditions, then it is considered relaxed; and repeatedly correcting the extracted strategy based on the original access log, so as to further improve the quality of the ABAC strategy.

2. The ABAC policy extraction and optimization method based on access logs according to claim 1, wherein in the first step, the specific steps of the data preprocessing stage are as follows:

step 1a: $U$, $O$, $S$ and $OP$ denote the user (or subject) set, the object set, the session set and the operation set in the system, respectively; $A_u$, $A_o$ and $A_s$ denote the attributes of a user $u$, an object $o$ and a session $s$, respectively; $E$ and $A$ denote the set of all entities and the set of all entity attributes in the system, respectively, where $E = U \cup O \cup S$ and $A = A_U \cup A_O \cup A_S$; $V_a$ denotes the set of all possible values of attribute $a$ in the system; $f_{a_e}(e, a)$ denotes the value function of attribute $a$ of entity $e$;

3. The ABAC policy extraction and optimization method based on the access log according to claim 1, wherein in the second step, the access log set AL is divided by using a clustering technology, and the specific steps are as follows:

wherein ,

step 2c: comparing $dis(al_i, associate(al_i))$ with $dis(al_j, associate(al_i) \setminus \{al_j\} \cup \{al_i\})$ to decide whether to swap $al_i$ and $al_j$, and determining the new center point so that it satisfies:

4. The method for extracting and optimizing ABAC policy based on access log according to claim 1, wherein in the third step, the attribute condition in each rule, that is, the combination of different attribute-value pairs is extracted; the method comprises the following specific steps:

Algorithm 1. Effective attribute condition extraction:

Input: cluster $c_i$, access log set $AL$, attribute set $A$, attribute value set $V$, thresholds $T_p$, $T_n$

Output:

Initialization

Determine the set of entities, such as subjects and objects, contained in cluster $c_i$;

Determine the set of entities, such as subjects and objects, contained in $AL$, denoted $E_{AL}$;

for each $a$ in $A$ do

for each $v$ in $V_a$ do

then

$EAC^{\rho} \leftarrow \{\langle a, v \rangle\}$;

end if

then

$EAC^{\rho} \leftarrow \{\langle a, !v \rangle\}$;

end if

end for

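5. The ABAC policy extraction and optimization method based on access logs according to claim 1, wherein in the fourth step, the specific steps are as follows:

$AL^{+} = \{\langle rq, d \rangle \mid \langle rq, d \rangle \in AL \wedge d = \text{permitted}\}$;

$AL^{-} = \{\langle rq, d \rangle \mid \langle rq, d \rangle \in AL \wedge d = \text{denied}\}$;

$AL = AL^{+} \cup AL^{-}$.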

$TN_{\Pi|AL} = \{\langle rq, d \rangle \mid \langle rq, d \rangle \in AL^{-} : d_{\Pi}(rq) = \text{denied}\}$, meaning that for an access request $rq$ in the negative log $AL^{-}$, the authorization decision $d_{\Pi}(rq)$ made by $\Pi$ also denies access;

$FN_{\Pi|AL} = \{\langle rq, d \rangle \mid \langle rq, d \rangle \in AL^{+} : d_{\Pi}(rq) = \text{denied}\}$, meaning that for an access request $rq$ in the positive log $AL^{+}$, the authorization decision $d_{\Pi}(rq)$ made by $\Pi$ nevertheless denies access;

step 4c: taking the FN and FP log records as training data sets, and extracting the policy patterns $\Pi_{FN}$ and $\Pi_{FP}$ from them, respectively;

for any rule $\rho_i$ of $\Pi_{FN}$, if $\Pi_m$ contains a rule $\rho_j$ similar to $\rho_i$, the redundant attribute conditions are deleted from $\rho_j$; if $\Pi_m$ contains no rule similar to $\rho_i$, then $\rho_i$ is a missing rule and is added directly to $\Pi_m$;

6. The ABAC strategy extraction and optimization method based on the access log according to claim 1, wherein in the fourth step, the optimization process is as follows in algorithm 2:

Algorithm 2. Policy optimization

Input: initial policy $\Pi_m$, policy patterns $\Pi_{FN}$ and $\Pi_{FP}$, attribute condition set $EAC$

Output: optimized policy $\Pi_m'$

Initialize $\Pi_m' = \Pi_m$;

for each $\rho_i$ in $\Pi_{FN}.P$ do

for each $\rho_j$ in $\Pi_m'.P$ do

Calculate the similarity value of $\rho_i$ and $\rho_j$,