Disclosure of Invention
To solve the above problems, the invention provides a distributed adaptive access control method for geographic big data. First, a topological relation analysis method based on the cross model is used to analyze the relation model of geographic big data spatial entities in terms of the boundary, interior, and exterior of each geospatial entity. Second, spatial big data access control policies with different spatio-temporal relationships and different attribute characteristics are classified using a decision tree, forming a targeted classification model for geospatial big data access control policies. Finally, for each class of geospatial big data access control policy, a BD-ABAC model is combined to construct a distributed adaptive access control mechanism that classifies and grades geographic spatio-temporal big data, thereby achieving secure access to geographic spatio-temporal big data in an open sharing environment.
To achieve this purpose, the invention provides a distributed adaptive access control method for geographic big data, which comprises the following specific steps:
step 1: initializing the geospatial data, which is acquired in large batches, spans many dimensions, and has a complex composition, and entering step 2;
step 2: using a topological analysis method based on the cross model, dividing each geographic big data entity into three characteristics: time, space, and attribute; the geographic big data entities are represented by the set {A, B, ...}, the set {t, s, a} represents the time, space, and attribute characteristics of a geographic big data entity, and the association relations among {t, s, a} can be expressed in matrix form; entering step 3;
step 3: constructing a geographic big data entity under the Class option of the knowledge graph modeling tool Protege, then constructing three attribute objects for time, space, and geographic big data attributes, setting the corresponding parameter names in each attribute object, and constructing the association relation prototype $\{\{t_1, t_2, \ldots, t_n\}, \{l_1, l_2, \ldots, l_n\}, \{p_{11}, p_{12}, \ldots, p_{nn}\}\}$ of the geographic big data entity over its time, space, and attribute characteristics; entering step 4;
step 4: dividing and classifying the data $\{\{t_1, t_2, \ldots, t_n\}, \{l_1, l_2, \ldots, l_n\}, \{p_{11}, p_{12}, \ldots, p_{nn}\}\}$ obtained in Protege, and entering step 5;
step 5: letting the training set be U, containing n samples in total; the attribute set is C, with c different classes; the decision attribute set is D, which divides the training set into d different classes $D_i$ ($i = 1, 2, \ldots, d$), the number of samples in each class being $n_i$; the probability that class $D_i$ appears in the set U is then $P_i = n_i / n$; entering step 6;
step 6: according to the probability $P_i$, computing the information entropy of the d classes into which the set U is divided as $H(U) = -\sum_{i=1}^{d} P_i \log_2 P_i$; the information gain of attribute C relative to the training sample set U is $\mathrm{Gain}(U, C) = H(U) - H(U, C)$; entering step 7;
step 7: using the information gain as the splitting criterion for the training sample set, and selecting the attribute with the maximum information gain as the node that splits the training samples, since this attribute minimizes the amount of information needed to classify the split samples, thereby generating the decision tree; entering step 8;
step 8: setting the expected error rate $E(S) = (N - n + K - 1)/(N + K)$, wherein: S is the set of all training examples contained in the subtree; K is the number of classes; N is the number of training examples in S; C is the class with the largest proportion in S; and n is the number of examples of class C in S; the prepared error rate is $\mathrm{Error}(Node) = \min\bigl(E(Node), \sum_i P_i \cdot \mathrm{Error}(Node_i)\bigr)$; entering step 9;
step 9: starting from the second layer of the decision tree, examining every internal node of each layer; if $\sum_i P_i \cdot \mathrm{Error}(Node_i)$ of a node is larger than E, the class with the largest proportion in the subtree is used as a leaf node to replace the subtree, pruning all of its subtrees, and the next node in the same layer is then considered; if $\sum_i P_i \cdot \mathrm{Error}(Node_i)$ of the node is smaller than E, each child node of that node is examined in the same way; and so on until the whole tree has been checked; entering step 10;
step 10: establishing a BD-ABAC model for the access control policies of geographic big data, and defining $SAV_t$, $RAV_m$, $EAV_n$, and $AAV_k$ as the subject attribute assignment set, the resource attribute assignment set, the environment attribute assignment set, and the action attribute assignment set, respectively; setting up a resource server, wherein a user applies to a security certification authority for a security certificate when accessing data resources for the first time; entering step 11;
step 11: when an end user accesses a data resource, the access request $Req(SAV_t, RAV_m, EAV_n, AAV_k)$ is forwarded to a domain positioning server; the domain positioning server analyzes the subject attributes and the resource attributes and determines whether the access request is a local-domain access or a cross-domain access; if it is a cross-domain access request, the domain positioning module quickly looks up the corresponding resource domain according to the resource attributes and forwards the access request; entering step 12;
step 12: using the attribute-based access control method, making a policy decision on the end user's access request $Req(SAV_t, RAV_m, EAV_n, AAV_k)$ and sending the decision result to the policy enforcement point, wherein each data resource domain provides a reliable basis for the policy decision according to the merging algorithm of its attribute policy set; entering step 13;
step 13: the domain positioning server forwards the access request to the corresponding resource domain, and the decision mechanism of that resource domain evaluates the access request and returns the result to the end user, thereby achieving secure access to geographic spatio-temporal big data in an open sharing environment; entering step 14;
step 14: the cycle ends.
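As a worked illustration of the pruning test in steps 8 and 9, consider a hypothetical two-class node. All counts below are invented for illustration and are not data from the invention. The weights $P_i$ are assumed here to be the fraction of the node's training examples that fall into child $i$, which the text does not state explicitly.

```latex
% Parent node S: N = 10, n = 9, K = 2
E(S) = \frac{10 - 9 + 2 - 1}{10 + 2} = \frac{2}{12} \approx 0.167

% Child S_1: N = 5, n = 4; child S_2: N = 5, n = 5
E(S_1) = \frac{5 - 4 + 2 - 1}{5 + 2} = \frac{2}{7} \approx 0.286,
\qquad
E(S_2) = \frac{5 - 5 + 2 - 1}{5 + 2} = \frac{1}{7} \approx 0.143

% Both children are leaves, so Error(Node_i) = E(S_i), with P_1 = P_2 = 0.5
\sum_i P_i \cdot \mathrm{Error}(Node_i)
  = 0.5 \cdot \frac{2}{7} + 0.5 \cdot \frac{1}{7}
  = \frac{3}{14} \approx 0.214

\mathrm{Error}(Node) = \min(0.167,\ 0.214) = 0.167
```

Because 0.214 is larger than E(S) = 0.167, the subtree in this example would be replaced by a leaf labelled with the majority class.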
Further, the control part of the distributed adaptive access control method for geographic big data comprises: a cross model analyzer, a decision tree classifier, and a BD-ABAC model controller.
Further, the cross model analyzer divides each geographic big data entity into three characteristics (time, space, and attribute) and takes the association relations among these characteristics into account when describing the relations between geographic big data entities; it also uses the knowledge graph modeling tool Protege to construct association relation prototypes of geographic big data entities A and B over the time, space, and attribute characteristics, so as to clearly show the relations among the different characteristics of each entity. The specific flow is as follows:
firstly, using a topological analysis method based on the cross model, each geographic big data entity is divided into three characteristics: time, space, and attribute; the geographic big data entities are represented by the set {A, B, ...}, the set {t, s, a} represents the time, space, and attribute characteristics of a geographic big data entity, and the association relations among {t, s, a} can be expressed in matrix form;
then, a geographic big data entity is constructed under the Class option of the knowledge graph modeling tool Protege, three attribute objects for time, space, and geographic big data attributes are constructed, the corresponding parameter names are set in each attribute object, and the association relation prototype $\{\{t_1, t_2, \ldots, t_n\}, \{l_1, l_2, \ldots, l_n\}, \{p_{11}, p_{12}, \ldots, p_{nn}\}\}$ of the geographic big data entity over its time, space, and attribute characteristics is constructed and then divided and classified.
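The association relation prototype described above can be pictured as a small data structure. The following Python sketch is only an illustration under stated assumptions: the class `GeoEntity`, the function `relation_matrix`, and the rule "two entities are related on a feature if they share at least one value" are all hypothetical, since the text does not reproduce the actual matrix or the Protege model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class GeoEntity:
    """Hypothetical container for one entity's prototype {{t...}, {l...}, {p...}}."""
    name: str
    times: List[str] = field(default_factory=list)       # {t1, ..., tn}
    locations: List[str] = field(default_factory=list)   # {l1, ..., ln}
    properties: List[str] = field(default_factory=list)  # {p11, ..., pnn}


def relation_matrix(entities: List[GeoEntity], feature: str) -> List[List[int]]:
    """Return a 0/1 matrix over entities for one feature.

    Assumption: entry [i][j] is 1 when entities i and j share at least one
    value of the given feature ('times', 'locations' or 'properties').
    """
    values = [set(getattr(e, feature)) for e in entities]
    return [[int(bool(vi & vj)) for vj in values] for vi in values]


# Two illustrative entities A and B (values are invented).
entity_a = GeoEntity("A", times=["2020"], locations=["cell_1"], properties=["road"])
entity_b = GeoEntity("B", times=["2020"], locations=["cell_2"], properties=["road"])

time_relation = relation_matrix([entity_a, entity_b], "times")
space_relation = relation_matrix([entity_a, entity_b], "locations")
```

In this toy case A and B coincide in time but not in space, so the two matrices differ; a real model would replace the set-overlap rule with the topological relations of the cross model.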
Further, the decision tree classifier uses the time, space, and attribute dimensions of geographic big data as the attributes of the decision tree and constructs an adaptive access control policy classification model for geographic spatio-temporal big data based on a decision tree algorithm. The information gain serves as the splitting criterion: the attribute with the maximum information gain is selected as a node, and the procedure is repeated to obtain a complete decision tree. Because the tree may grow too many branches during splitting and thus overfit, the decision tree is then pruned. The specific flow is as follows:
letting the training set be U, containing n samples in total; the attribute set is C, with c different classes; the decision attribute set is D, which divides the training set into d different classes $D_i$ ($i = 1, 2, \ldots, d$), the number of samples in each class being $n_i$; the probability that class $D_i$ appears in the set U is then $P_i = n_i / n$. According to the probability $P_i$, the information entropy of the d classes into which the set U is divided is $H(U) = -\sum_{i=1}^{d} P_i \log_2 P_i$. The information gain of attribute C relative to the training sample set U is $\mathrm{Gain}(U, C) = H(U) - H(U, C)$. The information gain is used as the splitting criterion for the training sample set, and the attribute with the maximum information gain is selected as the node that splits the training samples, since this attribute minimizes the amount of information needed to classify the split samples. The expected error rate is set as $E(S) = (N - n + K - 1)/(N + K)$, wherein: S is the set of all training examples contained in the subtree; K is the number of classes; N is the number of training examples in S; C is the class with the largest proportion in S; and n is the number of examples of class C in S. The prepared error rate is $\mathrm{Error}(Node) = \min\bigl(E(Node), \sum_i P_i \cdot \mathrm{Error}(Node_i)\bigr)$. Starting from the second layer of the decision tree, every internal node of each layer is examined. If $\sum_i P_i \cdot \mathrm{Error}(Node_i)$ of a node is larger than E, the class with the largest proportion in the subtree is used as a leaf node to replace the subtree, pruning all of its subtrees, and the next node in the same layer is then considered; if $\sum_i P_i \cdot \mathrm{Error}(Node_i)$ of the node is smaller than E, each child node of that node is examined in the same way, and so on until the whole tree has been checked.
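The probability, entropy, and information gain computations in this flow can be sketched in a few lines of Python. This is a minimal sketch, assuming base-2 Shannon entropy and reading H(U, C) as the entropy of U after partitioning by attribute C; the sample records and the attribute names `time`, `space`, and `policy` are invented for illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence

Sample = Dict[str, str]


def entropy(labels: Sequence[str]) -> float:
    """H(U) = -sum(P_i * log2(P_i)) with P_i = n_i / n."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def conditional_entropy(samples: List[Sample], attribute: str, target: str) -> float:
    """H(U, C): weighted entropy of the target after splitting U on one attribute."""
    groups: Dict[str, List[str]] = {}
    for s in samples:
        groups.setdefault(s[attribute], []).append(s[target])
    n = len(samples)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def information_gain(samples: List[Sample], attribute: str, target: str) -> float:
    """Gain(U, C) = H(U) - H(U, C)."""
    labels = [s[target] for s in samples]
    return entropy(labels) - conditional_entropy(samples, attribute, target)


def best_attribute(samples: List[Sample], attributes: List[str], target: str) -> str:
    """Pick the attribute with the maximum information gain as the split node."""
    return max(attributes, key=lambda a: information_gain(samples, a, target))


# Invented policy records: 'time' fully determines the policy class, 'space' does not.
samples = [
    {"time": "day", "space": "urban", "policy": "open"},
    {"time": "day", "space": "rural", "policy": "open"},
    {"time": "night", "space": "urban", "policy": "restricted"},
    {"time": "night", "space": "rural", "policy": "restricted"},
]
split_on = best_attribute(samples, ["time", "space"], "policy")
```

Here the gain of `time` equals H(U) because splitting on it leaves no uncertainty, so it is chosen as the node.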
Further, the BD-ABAC model controller implements multi-domain fine-grained access control through an attribute-based fine-grained access control method combined with a multi-domain attribute table synchronization technology. The BD-ABAC model makes attribute-based fine-grained authorization decisions; to this end, BD-ABAC defines a fine-grained, flexible authorization method that adapts to cross-domain resource access and to dynamic changes of entity attributes in a geographic big data environment. The specific flow is as follows:
establishing a BD-ABAC model for the access control policies of geographic big data, and defining $SAV_t$, $RAV_m$, $EAV_n$, and $AAV_k$ as the subject attribute assignment set, the resource attribute assignment set, the environment attribute assignment set, and the action attribute assignment set, respectively. A resource server is set up, and a user applies to a security certification authority for a security certificate when accessing data resources for the first time. When an end user accesses a data resource, the access request $Req(SAV_t, RAV_m, EAV_n, AAV_k)$ is forwarded to the domain positioning server. The domain positioning server analyzes the subject attributes and the resource attributes and determines whether the access request is a local-domain access or a cross-domain access; if it is a cross-domain access request, the domain positioning module quickly looks up the corresponding resource domain according to the resource attributes and forwards the access request. Using the attribute-based access control method, a policy decision is made on the end user's access request $Req(SAV_t, RAV_m, EAV_n, AAV_k)$. The domain positioning server forwards the access request to the corresponding resource domain, and the decision mechanism of that resource domain evaluates the access request and returns the result to the end user, thereby achieving secure access to geographic spatio-temporal big data in an open sharing environment.
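The request flow above (build Req, decide local versus cross-domain, route to the resource domain, evaluate the policy) can be sketched as follows. This is a hedged illustration, not the BD-ABAC implementation: the `Request` class, the `domain` attribute used for domain positioning, the rule `surveyor_reads_roads`, and all attribute values are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Attrs = Dict[str, str]


@dataclass
class Request:
    """Req(SAV, RAV, EAV, AAV) as four attribute-assignment dictionaries."""
    sav: Attrs  # subject attribute assignments
    rav: Attrs  # resource attribute assignments
    eav: Attrs  # environment attribute assignments
    aav: Attrs  # action attribute assignments


Rule = Callable[[Request], bool]


def access_type(req: Request) -> str:
    """Assumed domain check: compare a 'domain' attribute of subject and resource."""
    same = req.sav.get("domain") == req.rav.get("domain")
    return "local" if same else "cross-domain"


def decide(req: Request, rules: List[Rule]) -> str:
    """Permit when any rule of the resource domain matches, otherwise deny."""
    return "Permit" if any(rule(req) for rule in rules) else "Deny"


def handle(req: Request, domain_rules: Dict[str, List[Rule]]) -> Tuple[str, str]:
    """Route the request to the resource's domain; return (access type, decision)."""
    rules = domain_rules.get(req.rav.get("domain", ""), [])
    return access_type(req), decide(req, rules)


def surveyor_reads_roads(req: Request) -> bool:
    """Hypothetical rule: surveyors may read the roads layer from the intranet."""
    return (req.sav.get("role") == "surveyor"
            and req.rav.get("layer") == "roads"
            and req.eav.get("network") == "intranet"
            and req.aav.get("action") == "read")


domain_rules = {"domain_b": [surveyor_reads_roads]}
read_req = Request(
    sav={"role": "surveyor", "domain": "domain_a"},
    rav={"layer": "roads", "domain": "domain_b"},
    eav={"network": "intranet"},
    aav={"action": "read"},
)
write_req = Request(read_req.sav, read_req.rav, read_req.eav, {"action": "write"})

read_result = handle(read_req, domain_rules)
write_result = handle(write_req, domain_rules)
```

Keeping the rules per resource domain mirrors the idea that the decision mechanism of the owning domain, not the requester's domain, evaluates the request.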
The beneficial effects of the distributed adaptive access control method for geographic big data are as follows: the method improves the secure sharing capability of geographic spatio-temporal data through quantitative calculation and data mining. The method combines theoretical methods such as topology analysis, UML modeling, decision trees, and the BD-ABAC model to realize a distributed adaptive access control mechanism that classifies and grades geographic spatio-temporal big data.
Detailed Description
The invention is described in further detail below with reference to the following detailed description and accompanying drawings:
The invention provides a distributed adaptive access control method for geographic big data, which addresses the problem of multi-dimensional, high-precision secure access to geographic spatio-temporal big data.
FIG. 1 is the structure diagram of the distributed adaptive access control for geographic big data, which mainly includes a cross model analyzer, a decision tree classifier, and a BD-ABAC model controller; the architecture diagram in FIG. 2 illustrates the components involved in the method of the invention; and FIG. 3 shows the flow diagram of the method.
As a specific embodiment of the present invention, for convenience of description, we assume the following application examples:
Suppose a user accesses a large geographic database. The system responds as follows: first, the relation model of the geographic big data spatial entities is analyzed through the topological relation analysis method of the cross model, and the relations among the space, time, and attributes of the geographic data entities are then constructed using the knowledge graph modeling tool Protege; second, spatial big data access control policies with different spatio-temporal relationships and different attribute characteristics are classified based on a decision tree; finally, a distributed adaptive access control mechanism that classifies and grades geographic spatio-temporal big data is constructed by combining the BD-ABAC model, achieving secure access to geographic spatio-temporal big data in an open sharing environment.
The specific implementation scheme is as follows:
(1) Initialize the geospatial data, which is acquired in large batches, spans many dimensions, and has a complex composition. Based on the topological relation analysis method of the cross model, analyze the relation model of the geographic big data spatial entities in terms of the boundary, interior, and exterior of each geospatial entity; on this basis, construct the relations among the space, time, and attributes of the geographic data entities with a knowledge graph modeling tool.
(2) Classify the spatial big data access control policies with different spatio-temporal relationships and different attribute characteristics based on a decision tree to form a targeted classification model for geospatial big data access control policies. Because too many branches may appear during splitting and cause overfitting, the decision tree is pruned, mainly by setting an expected error rate function and a prepared error rate function and comparing them to cut redundant branches.
(3) Establish a BD-ABAC model for each class of geospatial big data access control policy. A user applies to a security certification authority for a security certificate when accessing data resources for the first time. When the end user accesses a data resource, the access request is forwarded to the domain positioning server, which analyzes the subject attributes and the resource attributes, evaluates the access request, and forwards it. A policy decision is made on the end user's access request and the decision result is sent to the policy enforcement point; each data resource domain provides a reliable basis for the policy decision according to the merging algorithm of its attribute policy set. The domain positioning server forwards the access request to the corresponding resource domain, and the decision mechanism of that resource domain evaluates the access request and returns the result to the end user, thereby achieving secure access to geographic spatio-temporal big data in an open sharing environment.
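The pruning comparison mentioned in item (2) can be sketched in Python. This is a minimal sketch under assumptions: it evaluates the tree recursively from the leaves upward (rather than layer by layer), it takes $P_i$ as the share of a node's training examples that reach child $i$, and the `Node` class and class counts are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Node:
    """Hypothetical tree node holding per-class counts of its training examples."""
    counts: Dict[str, int]
    children: List["Node"] = field(default_factory=list)

    @property
    def total(self) -> int:
        return sum(self.counts.values())


def expected_error(node: Node, k: int) -> float:
    """E(S) = (N - n + K - 1) / (N + K), with n the majority-class count."""
    big_n = node.total
    n_major = max(node.counts.values())
    return (big_n - n_major + k - 1) / (big_n + k)


def prune(node: Node, k: int) -> float:
    """Prune in place; return Error(node) = min(E(node), sum(P_i * Error(node_i)))."""
    e = expected_error(node, k)
    if not node.children:
        return e
    backed_up = sum(c.total / node.total * prune(c, k) for c in node.children)
    if backed_up > e:
        node.children = []  # subtree replaced by a majority-class leaf
        return e
    return backed_up


# Subtree whose split does not pay off: it is collapsed into a leaf.
weak = Node({"open": 9, "restricted": 1},
            [Node({"open": 4, "restricted": 1}), Node({"open": 5})])
weak_error = prune(weak, k=2)

# Subtree whose split separates the classes cleanly: it is kept.
useful = Node({"open": 6, "restricted": 4},
              [Node({"open": 6}), Node({"restricted": 4})])
useful_error = prune(useful, k=2)
```

The recursion returns the smaller of the two error estimates at each node, so a parent always compares against the already pruned version of its children.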
The main work flow is as follows:
(1) First, the geospatial data, which is acquired in large batches, spans many dimensions, and has a complex composition, is initialized. Using the topological analysis method based on the cross model, each geographic big data entity is divided into three characteristics (time, space, and attribute), and the association relations among the three characteristics can be expressed in matrix form.
(2) Under the Class option of the knowledge graph modeling tool Protege, the geographic big data entity and the three attribute objects for time, space, and geographic big data attributes are constructed, together with their corresponding parameters, to obtain the association relation prototype $\{\{t_1, t_2, \ldots, t_n\}, \{l_1, l_2, \ldots, l_n\}, \{p_{11}, p_{12}, \ldots, p_{nn}\}\}$ of the geographic big data entity over its time, space, and attribute characteristics. The association relation prototype is then divided and classified.
(3) Using a decision tree algorithm, the training set U and the attribute set are set, and the probability that a class of the decision attribute set appears in the training set is obtained as $P_i = n_i / n$, where $n_i$ is the number of samples in class $D_i$ and n is the total number of samples in U. According to the probability $P_i$, the information entropy of the d classes into which the set U is divided is $H(U) = -\sum_{i=1}^{d} P_i \log_2 P_i$. The information gain of attribute C relative to the training sample set U is $\mathrm{Gain}(U, C) = H(U) - H(U, C)$. The information gain is used as the splitting criterion for the training sample set, and the attribute with the maximum information gain is selected as the node that splits the training samples, since this attribute minimizes the amount of information needed to classify the split samples.
(4) Subtrees with the same attribute are pruned through a pruning function to reduce the size of the decision tree. The expected error rate function is set as $E(S) = (N - n + K - 1)/(N + K)$, and the prepared error rate function as $\mathrm{Error}(Node) = \min\bigl(E(Node), \sum_i P_i \cdot \mathrm{Error}(Node_i)\bigr)$. For every internal node of each layer of the decision tree, $\sum_i P_i \cdot \mathrm{Error}(Node_i)$ of the node is compared with E to decide whether to prune, and so on until the entire tree has been checked.
(5) A BD-ABAC model is established for the access control policies of geographic big data, a resource server is set up, and a user applies to a security certification authority for a security certificate when accessing data resources for the first time. When an end user accesses a data resource, the access request $Req(SAV_t, RAV_m, EAV_n, AAV_k)$ is forwarded to the domain positioning server. The domain positioning server analyzes the subject attributes and the resource attributes and determines whether the access request is a local-domain access or a cross-domain access. If it is a cross-domain access request, the domain positioning module quickly looks up the corresponding resource domain according to the resource attributes and forwards the access request.
(6) Using the attribute-based access control method, a policy decision is made on the end user's access request $Req(SAV_t, RAV_m, EAV_n, AAV_k)$ and the decision result is sent to the policy enforcement point. Each data resource domain provides a reliable basis for the policy decision according to the merging algorithm of its attribute policy set. The domain positioning server forwards the access request to the corresponding resource domain, and the decision mechanism of that resource domain evaluates the access request and returns the result to the end user, thereby achieving secure access to geographic spatio-temporal big data in an open sharing environment.
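The text does not name the merging algorithm that each resource domain applies to its attribute policy set. As one possible illustration only, the Python sketch below shows two strategies that are common in attribute-based access control, deny-overrides and permit-overrides; the function names and decision labels are assumptions, not part of the described method.

```python
from typing import Iterable

PERMIT, DENY, NOT_APPLICABLE = "Permit", "Deny", "NotApplicable"


def deny_overrides(decisions: Iterable[str]) -> str:
    """Any Deny wins; otherwise any Permit wins; otherwise no policy applied."""
    results = list(decisions)
    if DENY in results:
        return DENY
    if PERMIT in results:
        return PERMIT
    return NOT_APPLICABLE


def permit_overrides(decisions: Iterable[str]) -> str:
    """Any Permit wins; otherwise any Deny wins; otherwise no policy applied."""
    results = list(decisions)
    if PERMIT in results:
        return PERMIT
    if DENY in results:
        return DENY
    return NOT_APPLICABLE


# Individual policy results for one request (invented values).
policy_results = [PERMIT, NOT_APPLICABLE, DENY]
strict = deny_overrides(policy_results)
lenient = permit_overrides(policy_results)
```

A deny-overrides strategy is the more conservative choice for shared geographic data, since a single restrictive policy is enough to block access.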
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention in any way, but any modifications or equivalent variations made according to the technical spirit of the present invention are within the scope of the present invention as claimed.