CN114065217A

CN114065217A - SELinux policy optimization method based on a knowledge base

Info

Publication number: CN114065217A
Application number: CN202111405715.1A
Authority: CN
Inventors: 李晋; 王世强; 王涵钰; 于爱民; 肖丽芳; 白玉; 程建华
Original assignee: Harbin Engineering University
Current assignee: Harbin Engineering University
Priority date: 2021-11-24
Filing date: 2021-11-24
Publication date: 2022-02-18
Anticipated expiration: 2041-11-24
Also published as: CN114065217B

Abstract

The invention discloses a knowledge-base-based SELinux policy optimization method. It aims to solve the problem that existing SELinux policy files lead to low system security. The process is as follows: first, obtain the preprocessed policy set, audit log, attribute-to-type mapping data file and type-to-file-full-path mapping data file; second, construct a knowledge base; third, obtain the classification result for the unknown access patterns; fourth, convert the newly identified access patterns into the policy rule form of the SELinux system policy set and perform conflict detection against the known access patterns or rules. When no conflict occurs, the newly identified access patterns are merged into the database and step three is executed again; when a conflict occurs, the conflict is resolved, the resolved newly identified access patterns are merged into the database, and step three is executed again. The invention is used in the technical field of information security.

Description

SELinux policy optimization method based on a knowledge base

Technical Field

The invention relates to the technical field of information security, in particular to a SELinux policy optimization method based on a knowledge base.

Background

SELinux (Security-Enhanced Linux) is a security module integrated into the Linux kernel that can be enabled or disabled as required. It differs from the traditional DAC (discretionary access control) concept, in which a process in principle has the same rights as the user who runs it. SELinux is a security model based on MAC (mandatory access control): the access rights of a process to specific files are defined in the SELinux security policy file. When a user runs a program that accesses file system resources, the decision is made using the security contexts of those resources, and a resource can be accessed only when the context of the process (the subject) matches the context of the accessed object.

SELinux refers to processes (a process is a running instance of a program) as subjects, and each process has a type identifier, called a domain. Everything that can be accessed, including directories, files, processes and devices, is called an object, and each object also has a type identifier. Subjects and objects are related through these type identifiers: to access an object, the subject's type must be authorized for the object's type. This fine-grained access control minimizes the privileges of users and processes, so that even if the system is attacked, the impact can be confined to a limited scope. How effectively SELinux prevents security vulnerabilities and controls system security depends on the quality of the access control rules configured in the system.

Existing access control research mainly addresses the dynamic and flexible access control requirements of upper-layer applications and has proposed many optimized access control models, such as attribute-based, trust-level-based and risk-level-based access control; a small number of studies also use machine learning to optimize access control. However, there is very little research on access control in the underlying system. Because a SELinux policy file contains a large number of rules, writing and maintaining it manually is difficult and error-prone, and incomplete policy rules may cause normal accesses to be blocked and harmful accesses to be allowed, which reduces the security of the system.

Disclosure of Invention

The invention aims to solve the problems that existing SELinux policy files contain a large number of rules, that writing and maintaining them manually is difficult and error-prone, and that incomplete policy rules can block normal accesses and allow harmful ones, so that system security is low. To this end it provides a SELinux policy optimization method based on a knowledge base.

A SELinux policy optimization method based on a knowledge base comprises the following specific process:

step one, acquire a policy set, an audit log, an attribute-to-type mapping data file and a type-to-file-full-path mapping data file from the SELinux system, and preprocess them to obtain the preprocessed policy set, audit log, attribute-to-type mapping data file and type-to-file-full-path mapping data file;

step two, construct a knowledge base based on step one;

step three, classify the unknown access pattern list in the knowledge base constructed in step two using a neighbor classifier, a pattern-rule distance measurement classifier, a co-occurrence learner and a decision tree classifier respectively, and fuse the four classification results to obtain the final classification result for the unknown access patterns; the part of the unknown access patterns that is recognized as benign or malignant is called the newly identified access patterns, and the remaining part is still of unknown type;

step four, convert the newly identified access patterns into the policy rule form of the SELinux system policy set, and perform conflict detection between the converted policy rules and the known access patterns or rules;

when no conflict occurs, merge the newly identified access patterns into the database and execute step three again to continue identifying the vector samples in the unidentified access pattern list;

when a conflict occurs, resolve the conflict, merge the resolved newly identified access patterns into the database, and execute step three again to continue identifying the vector samples in the unidentified access pattern list;

this completes the optimization process of the SELinux policy rules.

The invention has the beneficial effects that:

To solve the above technical problems, the invention provides a knowledge-base-based SELinux policy optimization method that analyzes the policy rules of the underlying system, gives modification guidance for policy rule optimization, reduces the difficulty of maintaining policy files, and improves the access control capability of the underlying operating system.

The invention provides a knowledge-base-based SELinux policy optimization method. First, the policy rules, audit logs and other configuration files in the SELinux system are preprocessed, and policies, access patterns and other dictionary data are obtained from this raw data to construct a knowledge base. The learning balance module uses four unknown-access-pattern classifiers to compute the degree of correlation between the unknown access patterns and the known access patterns, introduces the idea of ensemble learning, and fuses the results of the classifiers to obtain the final recognition result. The policy generation module converts the recognition result into new policy rules and merges them with the existing policy rules in the knowledge base, completing the optimization of the SELinux policy rules. By introducing four learning classifiers that compute the similarity between unknown and known access patterns from different dimensions, the invention accurately identifies the class to which an unknown access pattern belongs, completes the optimization of the SELinux policy rules automatically, greatly reduces the work of writing and maintaining SELinux policy rules, can detect and fix system vulnerabilities in near real time, and improves the security of the system.

Drawings

FIG. 1 is a schematic diagram of knowledge-base-based SELinux policy optimization;

FIG. 2 is a schematic diagram of a data pre-processing module;

FIG. 3 is a schematic diagram of a neighbor classifier;

FIG. 4 is a schematic diagram of a pattern-rule distance measurer;

FIG. 5 is a schematic diagram of a co-occurrence learner;

FIG. 6 is a schematic diagram of a decision tree;

FIG. 7 is a schematic diagram of a learning balancer and combiner;

FIG. 8 is a schematic diagram of a knowledge-base-based classification flow of unknown access patterns;

FIG. 9 is a flow chart of rule generation, conflict detection and resolution based on a newly identified access pattern;

FIG. 10 is an exemplary diagram of a rule matching tree constructed from policy rules.

Detailed Description

The following detailed description of specific embodiments of the invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the present invention, but are not intended to limit the scope of the present invention.

The first embodiment: the knowledge-base-based SELinux policy optimization method of this embodiment comprises the following specific process:

step one, acquire data files such as the policy set, the audit log, the attribute-to-type mapping and the type-to-file-full-path mapping from the SELinux system, and preprocess them to obtain the corresponding preprocessed data files;

because the data in these files is not in a format that the subsequent algorithm can use directly, data preprocessing is required.

Step two, construct a knowledge base based on step one;

step three, classify the unknown access pattern list in the knowledge base constructed in step two using a neighbor classifier, a pattern-rule distance measurement classifier, a co-occurrence learner and a decision tree classifier respectively, and fuse the four classification results to obtain the final classification result for the unknown access patterns; the part of the unknown access patterns that is recognized as benign or malignant is called the newly identified access patterns, and the remaining part is still of unknown type and will be identified in subsequent iterations;

this completes the optimization process of the SELinux policy rules.

The second embodiment: this embodiment differs from the first embodiment in that, in step one, the acquired data files such as the policy set, the audit log, the attribute-to-type mapping and the type-to-file-full-path mapping are preprocessed as follows:

step 1.1, data cleaning;

step 1.2, extract the TE rules of the policy set one by one based on step 1.1;

step 1.3, process the mapping relations based on step 1.2;

step 1.4, generate the access patterns based on step 1.3.

Other steps and parameters are the same as those in the first embodiment.

The third embodiment: this embodiment differs from the first or second embodiment in the data cleaning of step 1.1; the specific process is as follows:

To prevent irrelevant data from affecting the subsequent classification, data that the algorithm does not use is cleaned. This mainly consists of deleting audit log records with missing values; removing audit log entries whose type field is not AVC (Access Vector Cache; an AVC entry represents a system request denied through the access vector cache); and removing from the policy set the rules related to type transition, that is, rules beginning with type_transition, such as type_transition user_t passwd_exec_t:process passwd_t;. (type_transition is a type transition statement: when its conditions are met, the type in the security context of a subject or object is changed to the specified type. It generally applies to two scenarios:

1. when a process creates a new object, the type of the new object is specified by type_transition;

2. after a process performs the exec system call, the new type of the process is defined by type_transition.)
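The cleaning step above can be sketched as two small filters. This is only an illustration under stated assumptions: the function names are hypothetical, audit records are assumed to be one per line in the usual `type=AVC msg=audit(...)` layout, and a record is treated as having "missing values" when it lacks the `scontext=`, `tcontext=` or `tclass=` field.

```python
def clean_audit_log(lines):
    """Keep only complete AVC records from raw audit log lines."""
    required = ("scontext=", "tcontext=", "tclass=")
    kept = []
    for line in lines:
        line = line.strip()
        # Drop every record whose type is not AVC.
        if not line.startswith("type=AVC "):
            continue
        # Assumption: a record missing any context field counts as incomplete.
        if not all(field in line for field in required):
            continue
        kept.append(line)
    return kept


def clean_policy(rules):
    """Drop blank lines and rules that begin with type_transition."""
    cleaned = []
    for rule in rules:
        rule = rule.strip()
        if rule and not rule.startswith("type_transition"):
            cleaned.append(rule)
    return cleaned
```

Both functions return new lists and leave their inputs unchanged, so the raw log and policy remain available for later inspection.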

Other steps and parameters are the same as those in the first or second embodiment.

The fourth embodiment: this embodiment differs from one of the first to third embodiments in that, in step 1.2, the TE rules of the policy set are extracted one by one based on step 1.1; the specific process is as follows:

The Type Enforcement (TE) policy rules of the SELinux system are of the types dontaudit, auditallow, allow and neverallow. Rules of the dontaudit and auditallow types are mainly used for auditing and have no direct influence on access control.

The allow-type policy set acquired from the SELinux system consists of multiple TE rules. Each TE rule consists of five values: the authorization type, the subject domain type, the object type, the object class and the operation set, separated from one another by a space, ":" or "{ }". An example of an allow-type TE rule is:

allow my_chrome_t abc_t:dir { getattr read open create write }

This allow-type TE rule indicates that the domain type my_chrome_t is allowed to perform the getattr (get attributes), read, open, create and write operations on directories (class dir) whose object type is abc_t. Because this format is inconvenient for subsequent use, the rules of the original policy set are extracted;

the allow-type TE rule is converted into the required policy rule five-tuple form, which consists of the subject domain type subject_domain, the object type object_type, the object class object_class, the operation set permissions and the authorization type action, expressed as follows:

(subject_domain, object_type, object_class, permissions, action)

The allow-type TE rule allow my_chrome_t abc_t:dir { getattr read open create write } is extracted into the five-tuple (my_chrome_t, abc_t, dir, [getattr, read, open, create, write], allow). Finally, the allow-type TE rules in five-tuple form are saved in JSON format to a known policy rule list file used for constructing the knowledge base; the content of the JSON file has the following form:

{

"subject_domain": "subject domain type, such as my_chrome_t",

"object_type": "object type, such as abc_t",

"object_class": "object class, such as dir",

"permissions": "permission set, such as [getattr, read, open, create, write]",

"action": "authorization type, such as allow"

}

The neverallow-type policy set acquired from the SELinux system consists of multiple TE rules. Each TE rule consists of five values: the authorization type, the subject domain type, the object type, the object class and the operation set, separated from one another by a space, ":" or "{ }". An example of a neverallow-type TE rule is:

neverallow user_t abc_t:file { open read }

This neverallow-type TE rule indicates that the domain type user_t is not allowed to perform the open and read operations on ordinary files (class file) whose object type is abc_t. Because this format is inconvenient for subsequent use, the rules of the original policy set are extracted;

the neverallow-type TE rule is converted into the required policy rule five-tuple form, which consists of the subject domain type subject_domain, the object type object_type, the object class object_class, the operation set permissions and the authorization type action, expressed as follows:

(subject_domain, object_type, object_class, permissions, action)

The neverallow-type TE rule neverallow user_t abc_t:file { open read } is extracted into the five-tuple (user_t, abc_t, file, [open, read], neverallow). Finally, the neverallow-type TE rules in five-tuple form are saved in JSON format to a known policy rule list file used for constructing the knowledge base; the JSON file has the same form as for the allow rules.
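As an illustration of the extraction described above, the following sketch converts one allow or neverallow rule into the five-tuple record and serializes it as JSON. The regular expression and the function name te_rule_to_record are assumptions made for this example; the sketch only handles rules whose permission set is written in braces, with or without a trailing semicolon.

```python
import json
import re

# Matches "<action> <subject_domain> <object_type>:<object_class> { perms }".
TE_RULE = re.compile(
    r"^\s*(allow|neverallow)\s+(\S+)\s+(\S+?)\s*:\s*(\S+?)\s*\{\s*([^}]*)\}\s*;?\s*$"
)


def te_rule_to_record(rule):
    """Convert one TE rule string into the five-tuple record (as a dict)."""
    match = TE_RULE.match(rule)
    if match is None:
        raise ValueError(f"unsupported TE rule: {rule!r}")
    action, subject_domain, object_type, object_class, perms = match.groups()
    return {
        "subject_domain": subject_domain,
        "object_type": object_type,
        "object_class": object_class,
        "permissions": perms.split(),
        "action": action,
    }


if __name__ == "__main__":
    record = te_rule_to_record(
        "allow my_chrome_t abc_t:dir { getattr read open create write }"
    )
    print(json.dumps(record, indent=2))
```

A list of such records can be written with json.dump to obtain a known policy rule list file of the kind described above.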

Other steps and parameters are the same as those in one of the first to third embodiments.

The fifth embodiment: this embodiment differs from one of the first to fourth embodiments in that, in step 1.3, the mapping relations are processed based on step 1.2; the specific process is as follows:

step 1.3.1, processing the attribute-to-type mapping; the specific process is as follows:

In the TE rules of a policy set in the SELinux system, the type of a process (subject) is generally called a domain, and the type of a file or directory is generally called a type. The subject domain type subject_domain and the object type object_type may each be a single type or a group of multiple types, that is, an attribute. Attributes and types are in a many-to-many relationship, which can be formally represented by the following functions:

AttrToType(a) = {t | t ∈ T, a ∈ A}

TypeToAttr(t) = {a | a ∈ A, t ∈ T}

where A represents the set of attributes and T represents the set of types; AttrToType(a) is the attribute-to-type mapping function, used to compute which types an attribute contains, and TypeToAttr(t) is the type-to-attribute mapping function, used to compute which attributes a type is assigned to;

the attribute-to-type mapping file uses indentation to represent the inclusion and hierarchy relations between data, where indentation means the blank area before the start of each line of data. The mapping between attributes and types can therefore be constructed from the indentation of the data, and it is finally stored in a JSON format file;
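A minimal sketch of building both mapping functions from an indentation-structured file follows. The layout is an assumption for illustration: an unindented line names an attribute and each indented line below it names a type belonging to that attribute. The function name parse_attribute_file is hypothetical.

```python
def parse_attribute_file(text):
    """Build AttrToType and TypeToAttr dictionaries from indented text."""
    attr_to_type = {}
    type_to_attr = {}
    current_attr = None
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        if line[0].isspace():
            # Indented line: a type that belongs to the current attribute.
            if current_attr is None:
                continue  # indented line before any attribute; ignore it
            type_name = line.strip()
            attr_to_type[current_attr].add(type_name)
            type_to_attr.setdefault(type_name, set()).add(current_attr)
        else:
            # Unindented line: start of a new attribute.
            current_attr = line.strip()
            attr_to_type.setdefault(current_attr, set())
    return attr_to_type, type_to_attr
```

Sets are used because the relation is many-to-many; they need to be converted to lists (for example with sorted) before being written to a JSON file.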

step 1.3.2, processing the type-to-file-full-path mapping; the specific process is as follows:

In the SELinux system, every process of an executable program is assigned a unique subject domain type, and every file or directory is assigned a unique object type, but one subject domain type or object type may cover multiple files. Because a file in the Linux system is uniquely identified by its full path, the relation between types and files can be converted into a relation between types and file full paths. The mapping between types and specific file paths is as follows:

TypeToPath(t) = {p | t ∈ T, p ∈ P}

PathToType(p) = t, t ∈ T, p ∈ P

where T represents the set of types and P represents the set of file full paths; TypeToPath(t) is the type-to-file-full-path mapping function, used to compute the set of files contained in a type, and PathToType(p) is the file-full-path-to-type mapping function, used to compute the type to which a file belongs. Types and files are in a one-to-many relationship.

The data in the type-to-file mapping file is stored as key-value pairs. Each line is a record consisting of a file path and a type separated by a space, so the mapping can be obtained by parsing the file line by line, and it is finally stored in a JSON format file.
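The parsing of the type-to-file mapping file can be sketched as below. The function name parse_type_path_file is hypothetical, and the sketch assumes exactly the record layout described above (path, one space, type); paths that themselves contain spaces are skipped rather than handled.

```python
def parse_type_path_file(text):
    """Return (TypeToPath, PathToType) built from 'path type' records."""
    type_to_path = {}
    path_to_type = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # blank or malformed record
        path, type_name = parts
        path_to_type[path] = type_name                        # one path, one type
        type_to_path.setdefault(type_name, []).append(path)   # one type, many paths
    return type_to_path, path_to_type
```

Both dictionaries contain only strings and lists, so they can be passed to json.dump directly.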

Other steps and parameters are the same as in one of the first to fourth embodiments.

The sixth embodiment: this embodiment differs from one of the first to fifth embodiments in that, in step 1.4, the access patterns are generated based on step 1.3; the specific process is as follows:

An access pattern is the event of a user or process performing certain operations on an object file. Access patterns are divided into two kinds: those generated from the known policy rules, called known access patterns, and the user or process access behaviors extracted by analyzing the audit log, called unknown access patterns;

an access pattern consists of the attributes common to known and unknown access patterns and the attributes specific to unknown access patterns;

the common attributes of known and unknown access patterns are the subject path subject, the subject domain type subject_domain, the object path object, the object type object_type, the object class object_class and the operation set permissions;

the attributes specific to unknown access patterns are the log number log_id and the timestamp timestamp;

an access pattern is represented in the form:

(log_id, subject, subject_domain, object, object_type, object_class, permissions, timestamp)

For example, the known access pattern (null, /opt/google/chrome/chrome, my_chrome_t, /home/file/abc.txt, abc_t, file, read, null) represents the access behavior of the executable program /opt/google/chrome/chrome reading the file /home/file/abc.txt. Because a known access pattern comes from the policy, it has no log number or timestamp, so these two attribute values are null. Because the domain type and the object type may be attributes, they are first uniformly converted into simple types through the attribute-to-type mapping to simplify the subsequent algorithm; the subject path and object path are then generated from the type-to-file-path mapping. Finally, the generated known and unknown access patterns are saved separately as access pattern list files in JSON format, called the known access pattern list file and the unknown access pattern list file. The content of the JSON files is as follows:

{

"log_id": "log number to which the access pattern belongs, such as 1",

"subject": "subject, full path of the executable file, such as /opt/google/chrome/chrome",

"subject_domain": "subject domain type, such as my_chrome_t",

"object": "object, full path of the file, such as /home/file/abc.txt",

"object_type": "object type, such as abc_t",

"object_class": "object class, such as dir",

"permissions": "permission set, such as [getattr, read, open, create, write]",

"timestamp": "timestamp in seconds, such as 1637463403"

}

Access pattern example

(1, "/usr/bin/cat", "user_t", "/home/abc.txt", "abc_t", "file", ["read"], 1637463403)

This access pattern indicates that, in the SELinux system, the user Tom uses the cat command to read the txt file named abc.

Here log_id = 1 is the log number, indicating that the access pattern comes from the audit log record numbered 1;

subject = "/usr/bin/cat" is the subject path, the full path of the executed command;

subject_domain = "user_t" is the subject domain type, the domain type to which the process of the cat command belongs;

object = "/home/abc.txt" is the object path, the full path of the abc.txt file;

object_type = "abc_t" is the object type, the object type of the abc.txt file;

object_class = "file" is the object class, meaning that abc.txt is an ordinary file;

permissions = ["read"] is the set of operations performed, here the read operation by which the user reads abc.txt;

timestamp = 1637463403 indicates the time at which the access request was generated;
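The example above can be reproduced from a single audit record with a sketch like the following. Several points are assumptions rather than part of the source: the record is taken to follow the usual `type=AVC msg=audit(<seconds>.<ms>:<serial>)` layout with `scontext`, `tcontext` and `tclass` fields, the audit serial number is used as log_id, and when a type maps to several paths the first one is used. The names AVC_RECORD, context_type and avc_to_access_pattern are hypothetical.

```python
import re

# Fields of an AVC record needed for an access pattern (assumed layout).
AVC_RECORD = re.compile(
    r"msg=audit\((?P<ts>\d+)\.\d+:(?P<log_id>\d+)\).*?"
    r"\{\s*(?P<perms>[^}]*)\}.*?"
    r"scontext=(?P<scontext>\S+)\s+tcontext=(?P<tcontext>\S+)\s+tclass=(?P<tclass>\S+)"
)


def context_type(context):
    """Return the type field of a user:role:type[:level] security context."""
    return context.split(":")[2]


def avc_to_access_pattern(line, type_to_path):
    """Build an unknown access pattern from one AVC record, or return None."""
    match = AVC_RECORD.search(line)
    if match is None:
        return None
    subject_domain = context_type(match["scontext"])
    object_type = context_type(match["tcontext"])
    # Assumption: use the first path mapped to each type, if any.
    subject = (type_to_path.get(subject_domain) or [None])[0]
    obj = (type_to_path.get(object_type) or [None])[0]
    return {
        "log_id": int(match["log_id"]),
        "subject": subject,
        "subject_domain": subject_domain,
        "object": obj,
        "object_type": object_type,
        "object_class": match["tclass"],
        "permissions": match["perms"].split(),
        "timestamp": int(match["ts"]),
    }
```

Choosing the first mapped path is a simplification; a fuller implementation would need some way to pick among several paths of the same type.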

other steps and parameters are the same as those in one of the first to fifth embodiments.

The seventh embodiment: this embodiment differs from one of the first to sixth embodiments in that, in step two, the knowledge base is constructed based on step one; the specific process is as follows:

Preprocessing the original files produces JSON format files for the policy rules, the access patterns and the mapping relations; the JSON files stored in step one are loaded into memory (the memory of the running software) using the JSON loader of python;

the contents of the policy rule and access pattern JSON files are converted into corresponding object lists according to their attribute fields (the subject path subject, subject domain type subject_domain, object path object, object type object_type, object class object_class, operation set permissions, log number log_id, timestamp timestamp and authorization type action). Loading these files with the JSON loading function stores each single rule or access pattern as an object, which yields the corresponding object lists: the known policy rule list, the known access pattern list and the unknown access pattern list (the policy rule list is the policy rule set, the known access pattern list is generated from the policy rules, and the unknown access pattern list is extracted from the audit log);

data dictionaries are constructed in python from the JSON data of the attribute-to-type mapping file and of the type-to-file-full-path mapping file;

finally, the knowledge base is constructed from the object lists and the data dictionaries:

knowledgebase = {SP, KAP, UAP, ATTR_TYPE_DICT, TYPE_PATH_DICT}

where SP represents the policy rule list of the SELinux system, KAP represents the known access pattern list, UAP represents the unknown access pattern list, ATTR_TYPE_DICT represents the data dictionary of the attribute-to-type mapping, and TYPE_PATH_DICT represents the data dictionary of the type-to-file-full-path mapping.
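A minimal sketch of assembling the knowledge base with Python's standard json module is shown below. The file names in FILES and the function name build_knowledge_base are hypothetical; the source only states that the preprocessed data is stored as JSON files and loaded into memory.

```python
import json
from pathlib import Path

# Hypothetical file names for the five preprocessed JSON files.
FILES = {
    "SP": "policy_rules.json",
    "KAP": "known_access_patterns.json",
    "UAP": "unknown_access_patterns.json",
    "ATTR_TYPE_DICT": "attr_type.json",
    "TYPE_PATH_DICT": "type_path.json",
}


def build_knowledge_base(directory):
    """Load the five JSON files from a directory into one dictionary."""
    base = Path(directory)
    knowledge_base = {}
    for key, name in FILES.items():
        with open(base / name, encoding="utf-8") as handle:
            # Lists stay lists (SP, KAP, UAP); mappings become dicts.
            knowledge_base[key] = json.load(handle)
    return knowledge_base
```

Keeping the five parts under the names used in the definition above makes later steps such as classification read directly as lookups, for example knowledge_base["UAP"].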

Other steps and parameters are the same as those in one of the first to sixth embodiments.

The eighth embodiment: in step three, the unknown access pattern list in the knowledge base constructed in step two is classified using a neighbor classifier, a pattern-rule distance measurement classifier, a co-occurrence learner and a decision tree classifier respectively, and the four classification results are fused to obtain the final classification result for the unknown access patterns; the part of the unknown access patterns that is recognized as benign or malignant is called the newly identified access patterns, and the remaining part is still of unknown type and will be identified in subsequent iterations; the specific process is as follows:

The knowledge base stores knowledge such as the known policy rules, access patterns and mapping relations. How to use this known knowledge to compute the category of an unknown access pattern is the key to the algorithm. An access request is either permitted or illegal: permitted behavior indicates that the access pattern is benign, and illegal behavior indicates that it is malignant.

To improve the accuracy of computing the categories of unknown access patterns, the algorithm designs four different classifiers for unknown access patterns from different dimensions such as similarity, difference and spatio-temporal properties: the neighbor classifier, the pattern-rule distance measurer, the co-occurrence learner and the decision tree classifier. The core idea of the four classifiers is to compute the similarity between an unknown access pattern and the known access patterns or policy rules; the known access pattern or policy rule with the highest similarity decides the category of the unknown pattern. Finally, following the idea of ensemble learning, the classification results of the four classifiers are fused to obtain the final classification result for the unknown access patterns. Part of the unknown access patterns is identified as newly identified access patterns; the remaining part is still of unknown type and will be identified in subsequent iterations;

step three, after the policy rules, the known access patterns, the unknown access patterns and other data have been obtained, the same unknown access pattern vector sample x in the knowledge base is classified using the neighbor classifier, the pattern-rule distance measurer, the co-occurrence learner and the decision tree classifier respectively; the specific process is as follows:

A. classify the vector sample x in the unknown access pattern list using the neighbor classifier;

the input of the neighbor classifier is a training set T = {(x_1, y_1), (x_2, y_2), ..., (x_N, y_N)} containing N access patterns, where x_i is an access pattern feature vector and y_i is the sample class (benign or malignant). Using the Hamming distance as the distance measure, the k sample points closest to sample x are found; these k points are called the neighborhood of x and are denoted N_k(x). The class y of x is then determined within N_k(x) according to a classification decision rule (for example, majority voting), computed as follows:

y_knn = argmax_{c_j} Σ_{x_i ∈ N_k(x)} I(y_i = c_j), i = 1, 2, …, N; j = 1, 2, …, K

where N represents the total number of samples, K represents the number of samples nearest to sample x, and I is the indicator function, that is, I = 1 when y_i = c_j and I = 0 otherwise; y_knn is the class of sample x (benign or malignant) determined by the classification decision rule (majority voting); x_i represents a data point in the neighborhood of sample x, y_i represents the class of that data point, and c_j represents a class present in the neighborhood, here benign or malignant;
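The neighbor classifier described above can be sketched as follows. The encoding of a feature vector as a tuple of categorical values (subject domain, object type, object class, permission) and the function names are assumptions made for illustration; the source does not specify the feature encoding. Ties are broken in favor of the class of the nearer neighbor.

```python
from collections import Counter


def hamming_distance(a, b):
    """Number of positions at which two equal-length vectors differ."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(1 for u, v in zip(a, b) if u != v)


def knn_classify(x, training_set, k=3):
    """Majority vote over the k training samples nearest to x.

    training_set is a list of (feature_vector, label) pairs.
    """
    # sorted() is stable, so equally distant samples keep their input order.
    ranked = sorted(training_set, key=lambda pair: hamming_distance(x, pair[0]))
    neighborhood = ranked[:k]  # N_k(x)
    votes = Counter(label for _, label in neighborhood)
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    train = [
        (("user_t", "abc_t", "file", "read"), "benign"),
        (("user_t", "abc_t", "file", "write"), "benign"),
        (("user_t", "shadow_t", "file", "read"), "malignant"),
    ]
    print(knn_classify(("user_t", "abc_t", "file", "open"), train, k=3))
```

Hamming distance suits this data because every field is categorical: two values are either equal or different, with no meaningful numeric gap between them.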

B. Carrying out classification calculation on vector samples x in the same unknown access mode list by using a mode-rule distance measurement classifier;

the mode-rule distance measurement classifier calculates the distance between a new access mode and the existing policy rules and judges the class of the new access mode from that distance: the authorization type of the policy rule at the minimum distance determines the class of the new access mode.

The classifier builds the existing policy rules into a tree data structure whose levels, from the root node to the leaf nodes, are the subject domain type, the object type, the object file class and the permission set. It then measures the depth at which an unknown access mode matches this rule tree and classifies the vector sample x of the unknown access mode accordingly; the calculation is as follows:

dist(x)＝TotalLayerDepth-MatchedDepth(x)

where TotalLayerDepth represents the total depth of the policy tree built from the policy rules in quintuple form (at most 4), and MatchedDepth(x) represents the maximum depth (at most 4) at which sample x from the unknown access mode list matches the policy rule tree. The tree is obtained as follows: step one obtains the policy set; step two converts that policy set into policy rules in quintuple form; the policy tree is then built from these rules, with the subject domain type, the object type, the object file class and the permission set as its levels from the root node to the leaf nodes;

dist(x) represents the distance between sample x and the existing policy rules;

if the minimum value of dist(x) is less than or equal to the set threshold, the class y_dist of sample x is taken from the class of the policy rule at which dist(x) attains its minimum: allow indicates the benign class and neverallow indicates the malignant class;

if the minimum value of dist(x) is greater than the set threshold, sample x remains in the unknown class, to preserve the security of the policy.

Tree structure built by the mode-rule distance measurement classifier

Because SELinux type enforcement policy rules are matched on only four dimensions (subject domain type, object type, object file class and permissions), the mode-rule distance measurement classifier builds the existing policy rule set into a rule matching tree. The tree is built in descending order of dimension priority, so from the root node to the leaf nodes the levels are the subject domain type (subject_domain), the object type (object_type), the object file class (object_class, such as file) and the permission set (permissions, such as read). A semantic relevance extension mechanism is applied when the tree nodes are built; it handles access modes that differ in form but have the same semantics and enriches the semantic relevance of the rule matching tree. FIG. 10 shows an example of a rule matching tree constructed from policy rules.
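The rule matching tree and the distance dist(x) = TotalLayerDepth - MatchedDepth(x) can be sketched in Python as follows. This is a minimal sketch under assumptions: the tree is a nested dictionary, one permission is matched per leaf, and a partial match is labeled only when every rule below the matched node has the same authorization. The helper names and the default threshold are hypothetical, and the semantic relevance extension is not modeled.

```python
TOTAL_LAYER_DEPTH = 4  # subject_domain -> object_type -> object_class -> permission


def build_rule_tree(rules):
    """Nest quintuples as subject -> object type -> object class -> permission -> action."""
    tree = {}
    for subject, obj_type, obj_class, perms, action in rules:
        level = tree.setdefault(subject, {}).setdefault(obj_type, {}).setdefault(obj_class, {})
        for perm in perms:
            level[perm] = action
    return tree


def leaf_actions(node):
    """Collect the authorization types found below a tree node."""
    if not isinstance(node, dict):
        return {node}
    found = set()
    for child in node.values():
        found |= leaf_actions(child)
    return found


def classify_by_distance(tree, pattern, threshold=1):
    """Return (dist, label) for pattern = (subject, object type, object class, permission)."""
    node, matched = tree, 0
    for key in pattern:
        if key not in node:
            break
        node = node[key]
        matched += 1
    dist = TOTAL_LAYER_DEPTH - matched
    if matched == 0 or dist > threshold:
        return dist, "unknown"
    actions = leaf_actions(node)
    if actions == {"allow"}:
        return dist, "benign"
    if actions == {"neverallow"}:
        return dist, "malignant"
    return dist, "unknown"  # assumption: mixed authorizations stay unknown


rules = [
    ("my_chrome_t", "abc_t", "dir", ["getattr", "read"], "allow"),
    ("user_t", "abc_t", "file", ["open", "read"], "neverallow"),
]
tree = build_rule_tree(rules)
print(classify_by_distance(tree, ("user_t", "abc_t", "file", "write")))
```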

C. Use a co-occurrence learner to classify the vector samples x in the same unknown access mode list;

the co-occurrence learner infers the correlation between access modes from the statistical relationship between new access modes and known access modes that frequently occur together in the audit logs;

the calculation is as follows:

the co-occurrence matrix is built from the statistical relationship as follows:

$$CO_{AP} = \left(c_{ij}\right), \qquad c_{ij} = \frac{\mathrm{OccurNum}(ap_i, ap_j)}{\mathrm{OccurNum}(ap_i)}$$

where CO_AP represents the co-occurrence matrix of the audit log access modes, ap_i represents the i-th access mode, ap_j represents the j-th access mode, c_ij represents the ratio at which access mode ap_i and access mode ap_j co-occur, OccurNum(ap_i, ap_j) represents the number of times ap_i and ap_j occur together, and OccurNum(ap_i) represents the total number of occurrences of ap_i;

the co-occurrence matrix counts how often access modes occur together, and the co-occurrence profile of the access mode corresponding to sample x can be read from it. To ensure the accuracy and security of policy generation, not every co-occurring access mode is allowed to determine the class of sample x: only access modes whose co-occurrence frequency is higher than a threshold take part in the calculation, and the calculation formula is as follows:

$$y_{coo} = \arg\max_{c_j} \sum_{x_i \in CO_k(x)} I(y_i = c_j)$$

where CO_k(x) represents the access modes whose co-occurrence frequency with the access mode ap reaches the threshold, and I is an indicator function, i.e., I = 1 when y_i = c_j and I = 0 otherwise; the class y_coo of sample x, benign or malignant, is determined by the classification decision rule (majority voting); x_i represents a data point whose co-occurrence frequency with sample x is above the threshold, y_i represents the class of that data point, and c_j represents a class occurring in the co-occurrence set, here benign or malignant;
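A co-occurrence learner of this kind can be sketched in Python as follows. This is a minimal sketch under assumptions: "occurring together" is taken to mean appearing in the same log window, each window is a set of access mode identifiers, and the names `co_occurrence`, `classify_by_co_occurrence` and the threshold value are hypothetical.

```python
from collections import Counter
from itertools import permutations


def co_occurrence(windows):
    """Return c[(i, j)] = OccurNum(ap_i, ap_j) / OccurNum(ap_i)."""
    single, pair = Counter(), Counter()
    for window in windows:
        items = set(window)
        single.update(items)                  # OccurNum(ap_i)
        pair.update(permutations(items, 2))   # OccurNum(ap_i, ap_j), ordered pairs
    return {(i, j): n / single[i] for (i, j), n in pair.items()}


def classify_by_co_occurrence(x, ratios, known_labels, threshold=0.5):
    """Majority vote among known access modes that co-occur with x above the threshold."""
    votes = Counter(
        known_labels[j]
        for (i, j), ratio in ratios.items()
        if i == x and ratio > threshold and j in known_labels
    )
    ranked = votes.most_common(2)
    if not ranked or (len(ranked) == 2 and ranked[0][1] == ranked[1][1]):
        return "unknown"  # assumption: no evidence or a tie leaves x unclassified
    return ranked[0][0]


# Made-up log windows and labels, for illustration only.
windows = [{"ap1", "ap2", "ap9"}, {"ap1", "ap9"}, {"ap3", "ap9"}, {"ap2", "ap3"}]
known_labels = {"ap1": "benign", "ap2": "benign", "ap3": "malignant"}
ratios = co_occurrence(windows)
print(classify_by_co_occurrence("ap9", ratios, known_labels))
```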

D. Use a decision tree classifier to classify the vector samples x in the same unknown access mode list;

the decision tree classifier relies on the consistency of the subject and object types of access requests: if a known access control policy allows a subject of a certain class to perform an operation on an object of a certain class, then an unknown access mode in which a subject of that class again tries to access an object of that class is benign. Thus, if an unknown access mode finds a matching known access control policy, i.e. one with the same subject domain type, object type and object class, the class of the unknown access mode can be inferred from the known access mode. If no matching known access mode is found, the unclassified state is kept.

First, from the known access mode rules, a decision tree decision_tree is built with the tuples <subject domain type, object type>, i.e. <sub, obj_type>, <subject domain type, operation set>, i.e. <sub, per>, and <object class, operation set>, i.e. <obj_class, per>, as root nodes.

Finally, decision_tree is traversed to check whether the <subject domain type, object type> tuple, i.e. <sub, obj_type>, the <subject domain type, operation set> tuple, i.e. <sub, per>, or the <object class, operation set> tuple, i.e. <obj_type, per>, of sample x matches a tuple in the decision tree (the tuples are identical or matching). If so, the class of sample x is computed as y_dec, benign or malignant; if not, the sample is judged unclassified;
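The tuple matching performed by the decision tree classifier can be sketched in Python as follows. This is a minimal sketch, not the patented structure: a flat index keyed by the three tuple kinds stands in for decision_tree, the third tuple is keyed on the object class (the text uses both obj_class and obj_type for it), and a sample whose matches disagree is left unclassified. All three choices are assumptions, as are the field and function names.

```python
def tuple_keys(pattern):
    """Build the three lookup tuples named in the text for one access mode."""
    perms = frozenset(pattern["permissions"])
    return [
        ("sub,obj_type", pattern["subject_domain"], pattern["object_type"]),
        ("sub,per", pattern["subject_domain"], perms),
        ("obj_class,per", pattern["object_class"], perms),
    ]


def build_decision_index(known_patterns):
    """Map every tuple of every known access mode to the labels seen for it."""
    index = {}
    for pattern in known_patterns:
        for key in tuple_keys(pattern):
            index.setdefault(key, set()).add(pattern["label"])
    return index


def classify_by_decision_tree(index, x):
    """Return the single label shared by all matching tuples, else 'unclassified'."""
    labels = set()
    for key in tuple_keys(x):
        labels |= index.get(key, set())
    if len(labels) == 1:
        return next(iter(labels))
    return "unclassified"


known = [{
    "subject_domain": "my_chrome_t", "object_type": "abc_t",
    "object_class": "dir", "permissions": ["read", "open"], "label": "benign",
}]
index = build_decision_index(known)
sample = {"subject_domain": "my_chrome_t", "object_type": "abc_t",
          "object_class": "file", "permissions": ["write"]}
print(classify_by_decision_tree(index, sample))
```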

step two, fusing the classification results of the neighbor classifier, the mode-rule distance measurement classifier, the co-occurrence learner and the decision tree classifier; the specific process is as follows:

after the four classifiers in the classifier module have processed the unknown access modes, the same unknown access mode in the list may have been assigned different classes, namely y_knn, y_dist, y_coo and y_dec. A classification result fusion algorithm merges the classifier outputs: when most classifiers give the same result, that result is output as the final classification, i.e. the rule is judged to be allowed or rejected.

The classification fusion algorithm uses the idea of ensemble learning to combine the different classifiers into a meta classifier that generalizes better than any single classifier, which improves the accuracy of identifying access mode classes and further safeguards the security of the generated policies. Because classifying an access mode is a binary classification problem, an absolute majority voting ensemble algorithm is used to fuse the classification results: when several classifiers predict a class, the final result is the class predicted by more than half of the classifiers, expressed by the following formula:

$$H(x) = \begin{cases} c_j, & \text{if } \sum_{i=1}^{T} h_i^j(x) > \frac{1}{2} \sum_{k=1}^{N} \sum_{i=1}^{T} h_i^k(x) \\ \text{unknown}, & \text{otherwise} \end{cases}$$

where T represents the number of classifiers, here T = 4: the neighbor classifier, the mode-rule distance measurement classifier, the co-occurrence learner and the decision tree classifier; N represents the number of classes of access modes, here N = 2: benign and malignant; c_j represents a class, either benign (allowed) or malignant (not allowed);

h_i^j(x) represents the classification result of the i-th classifier, i.e. its vote for class c_j;

if the votes of the T classifiers for class j exceed half of the total votes, the prediction of the ensemble is class j, and the access mode is called a newly identified access mode;

otherwise, the unknown access mode cannot be identified accurately from the existing knowledge; it is set to unknown, is called an unidentified access mode, and is classified again in the next round of iterative learning.
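The absolute majority vote can be sketched in Python as follows. This is a minimal sketch; the function name `fuse` is hypothetical, and it assumes that a classifier which abstains (returns "unknown" or "unclassified") still counts toward the total, so a class needs more than T/2 votes. The text leaves open whether abstentions count.

```python
from collections import Counter


def fuse(votes):
    """Return the class voted for by more than half of all classifiers, else 'unknown'."""
    total = len(votes)
    counts = Counter(v for v in votes if v in ("benign", "malignant"))
    for label, n in counts.items():
        if n > total / 2:
            return label
    return "unknown"


# Votes in the order y_knn, y_dist, y_coo, y_dec (made-up values).
print(fuse(["benign", "benign", "benign", "unknown"]))
print(fuse(["benign", "benign", "malignant", "unknown"]))
```

Under this assumption a 2-to-1 split with one abstention is not an absolute majority, so the access mode is deferred to the next round of learning.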

Other steps and parameters are the same as those in one of the first to seventh embodiments.

Embodiment nine: this embodiment differs from embodiments one to eight in that, in step four, the newly identified access mode is converted into the policy rule form of the SELinux system policy set, and the converted policy rule is checked for conflicts against the known access modes or rules;

the specific process is as follows:

drawing on the knowledge in the knowledge base (the known policy rules, the known access modes, the mapping relations and so on), the learning balance and fusion module identifies the class of the unknown access modes in the knowledge base, and a policy generation module is introduced to convert the newly identified access modes into the rule form of the SELinux system policy set.

The policy generation module converts each newly identified access mode into the quintuple form of a policy rule in the SELinux system: a benign access mode yields a policy rule of the allow type and a malignant access mode yields a policy rule of the reject type. Policy conflict detection and resolution are built into the generation process: the relationships among the SELinux policy rules are analyzed to find conflicting rules, the conflicts are resolved, and the new policy rules produced by the conversion are merged with the existing policy rules in the knowledge base. SELinux policy optimization is thereby completed automatically; the specific process is as follows:

the newly identified access mode is converted into the policy rule quintuple form of the SELinux system and, before it is merged with the existing policy rules, the converted quintuple is checked for conflicts against the known access modes or rules;

the conflicts are divided into seven types: rule redundancy conflicts, direct conflicts, object file category conflicts, access right conflicts, indirect read unauthorized file conflicts, indirect modification unauthorized file conflicts and indirect call unauthorized program conflicts;

(1) rule redundancy conflict: if a known access mode or rule already in the policy set and the newly identified access mode have subject domain types, object types (object_type) and operations that are identical or in a containment relationship, have the same object file class (object_class), and carry the same authorization (allowed or not allowed), the rules are in a rule redundancy conflict;

(2) direct conflict:

if a known access mode or rule in the policy set and the newly identified access mode have subject domain types, object types and operations that are identical or in a containment relationship and have the same object file class, but the two rules carry opposite authorizations, the rules are in direct conflict;

(3) object file category conflict:

if a known access mode or rule in the policy set allows a subject domain type to access a directory (dir) of a specified object type, that subject domain type should also be able to access an ordinary file (file) of the same object type: semantically a directory contains ordinary files, so the directory is the higher-level object class and the file is the lower-level object class. If the new rule gives a negative authorization for the lower-level object class, an object file category conflict exists;

(4) access right conflict:

if a known access mode or rule in the policy set allows a subject domain type to perform the write operation on an object type, that subject domain type may also perform the append operation on that object type. The SELinux system contains macro definitions that group operation permissions with similar semantics, and if a user is allowed to write a file, a subsequent append to that file should also be allowed. When the newly identified access mode conflicts with the access permission semantics of a known access mode or rule in the policy set, an access right conflict exists;

(5) indirect read unauthorized file conflict:

if a known access mode or rule in the policy set refuses a subject domain type the read operation on an object type, and the newly identified access mode, once added to the policy set, lets that subject domain type read files of another object type which can in turn read the unauthorized object type, then the unauthorized object is read indirectly. The resulting conflict is called an indirect read unauthorized file conflict;

(6) indirect modification unauthorized file conflicts:

if a known access mode or rule in the policy set refuses a subject domain type the write (modification) operation on an object type, and the newly identified access mode, once added to the policy set, lets that subject domain type modify files of another object type which can in turn write to the unauthorized object type, then the unauthorized object is modified indirectly. The resulting conflict is called an indirect modification unauthorized file conflict;

(7) indirect call unauthorized program conflict:

if a known access mode or rule in the policy set refuses a subject domain type call operations on an object type, such as execute and delete, and the newly identified access mode, once added to the policy set, lets that subject domain type call files of another object type which can in turn call the unauthorized object type, then the unauthorized object is called indirectly. The resulting conflict is called an indirect call unauthorized program conflict.
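The first two conflict types can be sketched in Python as follows. This is a minimal sketch under assumptions: rules are quintuples, the containment relationship is checked only on the permission sets (containment between types through attributes is not modeled), a direct conflict is read as opposite authorizations on an otherwise matching rule, and the function name `detect_conflict` is hypothetical. The five remaining conflict types need the class hierarchy, permission semantics and indirect access paths, which this sketch leaves out.

```python
def detect_conflict(old, new):
    """Compare two quintuples (subject_domain, object_type, object_class, permissions, action).

    Returns 'redundancy', 'direct', or None when neither conflict applies.
    """
    same_target = old[:3] == new[:3]
    old_perms, new_perms = set(old[3]), set(new[3])
    related = old_perms <= new_perms or new_perms <= old_perms
    if not (same_target and related):
        return None
    return "redundancy" if old[4] == new[4] else "direct"


old_rule = ("user_t", "abc_t", "file", ["open", "read"], "neverallow")
print(detect_conflict(old_rule, ("user_t", "abc_t", "file", ["read"], "neverallow")))
print(detect_conflict(old_rule, ("user_t", "abc_t", "file", ["read"], "allow")))
```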

Other steps and parameters are the same as those in one to eight of the embodiments.

Embodiment ten: this embodiment differs from embodiments one to nine in that, when a conflict occurs, the conflict is resolved, and the specific resolution process is as follows:

the input of the algorithm is the original rule set RS_old and the new rule set RS_new, and the output is the resolution result set RES, which holds the messages the administrator is asked to confirm or is prompted with. The algorithm determines the type of conflict between each new rule and the existing rules using the policy conflict detection algorithm described in the previous section, and then resolves each rule according to its conflict type.

(1) for a rule redundancy conflict, the newly identified access mode is already covered by a known access mode or rule in the policy set, so it is resolved by simply ignoring the newly identified access mode;

(2) for a direct conflict, it cannot be judged whether the newly identified access mode is valid, so it is added to the resolution result set, and an administrator must decide whether to accept the new rule and remove the old rule;

(3) for an object file category conflict or an access right conflict, based on the hierarchical relationships and semantic correlations in the SELinux system, the conflict is resolved by ignoring the newly identified access mode and prompting the administrator;

(4) for an indirect read unauthorized file conflict, an indirect modification unauthorized file conflict or an indirect call unauthorized program conflict, indirect access is in most cases cooperation and communication between subjects and is usually acceptable, so the conflict is resolved by accepting the newly identified access mode and prompting the administrator;

and finally, new rules without any conflict are accepted directly.
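The resolution rules above amount to a lookup from conflict type to action, which can be sketched in Python as follows. This is a minimal sketch: the conflict type labels, the action names and the function `resolve` are hypothetical, and the input is assumed to be a list of (new rule, detected conflict type) pairs produced by an earlier detection step.

```python
# conflict type -> (decision for the new rule, whether the administrator is notified)
RESOLUTION = {
    None: ("accept", False),              # no conflict
    "redundancy": ("ignore", False),
    "direct": ("ask_admin", True),        # administrator chooses between old and new rule
    "object_category": ("ignore", True),
    "access_right": ("ignore", True),
    "indirect_read": ("accept", True),
    "indirect_modify": ("accept", True),
    "indirect_call": ("accept", True),
}


def resolve(checked_rules):
    """Split new rules into accepted rules and the result set RES of administrator messages."""
    accepted, res = [], []
    for rule, conflict in checked_rules:
        decision, notify = RESOLUTION[conflict]
        if decision == "accept":
            accepted.append(rule)
        if notify:
            res.append((rule, conflict, decision))
    return accepted, res


checked = [("r1", None), ("r2", "redundancy"), ("r3", "direct"), ("r4", "indirect_read")]
accepted, res = resolve(checked)
print(accepted)
print(res)
```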

Different resolution methods are given for different types of conflicts, and a specific strategy conflict resolution algorithm is as follows:

for conflict detection on the SELinux type enforcement policy, seven types of conflict are proposed, namely rule redundancy conflict, direct conflict, object file category conflict, access right conflict, indirect read unauthorized file conflict, indirect modification unauthorized file conflict and indirect call unauthorized program conflict, and the conflict types are described in detail below.

Other steps and parameters are the same as those in one of the first to ninth embodiments.

The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.

Claims

1. A SELinux strategy optimization method based on a knowledge base is characterized in that: the method comprises the following specific processes:

step two, constructing a knowledge base based on the step one;

and finishing the optimization process of the SELinux strategy rule.

2. The knowledge-base-based SELinux policy optimization method of claim 1, wherein: the specific process of preprocessing the obtained strategy set, the audit log, the mapping relation between the attributes and the types and the mapping relation data file between the types and the whole file path in the first step is as follows:

step one, data cleaning;

step three, processing the mapping relation based on the step two;

3. A SELinux policy optimization method based on a knowledge base according to claim 2, characterized in that: the data cleaning of step one proceeds as follows:

deleting data with missing values in the audit log, removing log entries with types not equal to the access vector cache type in the audit log, and removing the policy rules related to the type conversion rules in the policy set.

4. A SELinux policy optimization method based on a knowledge base according to claim 2 or 3, characterized in that: in the second step, the TE rules of the policy set are extracted one by one based on the first step; the specific process is as follows:

allow my_chrome_t abc_t:dir{getattr read open create write}

this TE rule of the allow type indicates that the domain type my_chrome_t is allowed to perform the getattr (get attribute), read, open, create and write operations on directories (dir) whose object type is abc_t;

the TE rule of the allow type is converted into policy rule quintuple form, where the quintuple consists of the subject domain type subject_domain, the object type object_type, the object file class object_class, the operation set permissions and the authorization action, and is expressed as follows:

(subject_domain,object_type,object_class,permissions,action)

the TE rule of the allow type, i.e. allow my_chrome_t abc_t:dir{getattr read open create write}, is extracted into the quintuple (my_chrome_t, abc_t, dir, [getattr, read, open, create, write], allow), and finally the TE rules of the allow type in quintuple form are saved as a known policy rule list file in JSON format;

neverallow user_t abc_t:file{open read}

this TE rule of the neverallow type indicates that the domain type user_t is not allowed to perform the open and read operations on files (file) whose object type is abc_t;

the TE rule of the neverallow type is converted into policy rule quintuple form, where the quintuple consists of the subject domain type subject_domain, the object type object_type, the object file class object_class, the operation set permissions and the authorization action, and is expressed as follows:

(subject_domain,object_type,object_class,permissions,action)

the TE rule of the neverallow type, i.e. neverallow user_t abc_t:file{open read}, is extracted into the quintuple (user_t, abc_t, file, [open, read], neverallow), and finally the TE rules of the neverallow type in quintuple form are saved as a known policy rule list file in JSON format.
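The extraction of a TE rule into a quintuple can be sketched in Python as follows. This is a minimal sketch that handles only the single-type form shown in the two examples above; type sets, attributes, complements and other syntax of real SELinux policy are not parsed. The regular expression, the function names and the JSON field layout are hypothetical.

```python
import json
import re

# Matches e.g. "allow my_chrome_t abc_t:dir{getattr read open create write}"
TE_RULE = re.compile(
    r"^\s*(allow|neverallow)\s+(\S+)\s+([^\s:]+)\s*:\s*([^\s{]+)\s*\{([^}]*)\}\s*;?\s*$"
)


def te_rule_to_quintuple(line):
    """Return (subject_domain, object_type, object_class, permissions, action)."""
    match = TE_RULE.match(line)
    if match is None:
        raise ValueError(f"not a simple allow/neverallow rule: {line!r}")
    action, subject, obj_type, obj_class, perms = match.groups()
    return (subject, obj_type, obj_class, perms.split(), action)


def quintuple_to_json(rule):
    """Serialize one quintuple as a JSON object (field names follow the quintuple)."""
    keys = ("subject_domain", "object_type", "object_class", "permissions", "action")
    return json.dumps(dict(zip(keys, rule)))


rule = te_rule_to_quintuple("allow my_chrome_t abc_t:dir{getattr read open create write}")
record = quintuple_to_json(rule)
print(record)
```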

5. The SELinux policy optimization method based on a knowledge base according to claim 4, wherein: processing the mapping relation based on the first step and the second step in the third step; the specific process is as follows:

in the TE rules of a policy set in the SELinux system, the type of a process is called a domain and the type of a file or directory is called a type. The subject domain type subject_domain and the object type object_type may each be a single type or a group formed by several types, i.e. an attribute. Attributes and types are in a many-to-many relationship, which can be formalized by the following functions:

AttrToType(a)＝{t|t∈T,a∈A}

TypeToAttr(t)＝{a|a∈A,t∈T}

storing the mapping relation between the attributes and the types into a JSON format file;

TypeToPath(t)＝{p|t∈T,p∈P}

PathToType(p)＝t,t∈T,p∈P

where T represents the set of types and P represents the set of full file paths; TypeToPath(t) is the mapping function from a type to full file paths and is used to compute the set of files contained in the type; PathToType(p) is the mapping function from a full file path to a type and is used to compute the type to which a file belongs; a type and its files are in a one-to-many relationship;

and saving the mapping relation to a JSON format file.
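The two pairs of mapping functions can be sketched in Python as dictionary inversions. This is a minimal sketch: the dictionary layout, the function names, and the sample attributes, types and paths are made up for illustration and are not taken from any policy.

```python
from collections import defaultdict


def invert_many_to_many(attr_to_type):
    """Derive TypeToAttr from AttrToType (attributes and types are many-to-many)."""
    type_to_attr = defaultdict(set)
    for attr, types in attr_to_type.items():
        for t in types:
            type_to_attr[t].add(attr)
    return dict(type_to_attr)


def invert_one_to_many(type_to_path):
    """Derive PathToType from TypeToPath (each full path belongs to exactly one type)."""
    path_to_type = {}
    for t, paths in type_to_path.items():
        for p in paths:
            if p in path_to_type:
                raise ValueError(f"path {p!r} is mapped to more than one type")
            path_to_type[p] = t
    return path_to_type


attr_to_type = {"file_type": {"abc_t", "bin_t"}, "exec_type": {"bin_t"}}
type_to_path = {"abc_t": ["/opt/abc/data", "/opt/abc/log"], "bin_t": ["/usr/bin/demo"]}
type_to_attr = invert_many_to_many(attr_to_type)
path_to_type = invert_one_to_many(type_to_path)
print(type_to_attr["bin_t"], path_to_type["/opt/abc/log"])
```

Sets are not JSON-serializable, so they would need converting to lists before being saved in JSON format.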

6. The SELinux policy optimization method based on a knowledge base according to claim 5, wherein: generating an access mode based on the step one in the step four; the specific process is as follows:

the access modes are divided into two types, one is a known access mode, and the other is an unknown access mode;

the access pattern is represented in the form:

(log_id,subject,subject_domain,object,object_type,object_class,permissions,timestamp)

the generated known and unknown access patterns are respectively saved as access pattern list files in the JSON format, which are called known access pattern list files and unknown access pattern list files.

7. The SELinux policy optimization method based on a knowledge base according to claim 6, wherein: in the second step, a knowledge base is constructed based on the first step; the specific process is as follows:

loading the files saved in JSON format in step one into memory using Python's JSON loader;

converting the policy rule and access mode JSON files into the corresponding object lists according to the meaning of their attribute fields, wherein the object lists comprise a known policy rule list, a known access mode list and an unknown access mode list;

knowledgebase＝{SP,KAP,UAP,ATTR_TYPE_DICT,TYPE_PATH_DICT}
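Assembling the knowledge base from the JSON data can be sketched in Python as follows. This is a minimal sketch: the function name, the reading of SP, KAP and UAP as the known policy rule list, known access mode list and unknown access mode list, and the sample records are assumptions. It parses JSON text with `json.loads` so that it runs without files; reading from file objects would use `json.load` instead.

```python
import json

KB_KEYS = ("SP", "KAP", "UAP", "ATTR_TYPE_DICT", "TYPE_PATH_DICT")


def build_knowledge_base(documents):
    """Parse one JSON document per knowledge base component into a dictionary."""
    missing = [key for key in KB_KEYS if key not in documents]
    if missing:
        raise KeyError(f"missing knowledge base components: {missing}")
    return {key: json.loads(documents[key]) for key in KB_KEYS}


# Made-up JSON text standing in for the files produced in step one.
documents = {
    "SP": '[{"subject_domain": "my_chrome_t", "object_type": "abc_t", '
          '"object_class": "dir", "permissions": ["read"], "action": "allow"}]',
    "KAP": "[]",
    "UAP": "[]",
    "ATTR_TYPE_DICT": '{"file_type": ["abc_t"]}',
    "TYPE_PATH_DICT": '{"abc_t": ["/opt/abc/data"]}',
}
knowledgebase = build_knowledge_base(documents)
print(sorted(knowledgebase))
```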

8. The SELinux strategy optimization method based on the knowledge base according to claim 7, characterized in that: in the third step, the unknown access mode list in the knowledge base constructed in the second step is classified by a neighbor classifier, a mode-rule distance measurement classifier, a co-occurrence learner and a decision tree classifier respectively, and the classification results of the four classifiers are fused to obtain the final classification result of the unknown access modes; in the classification result, part of the unknown access modes are identified as newly identified access modes, and the rest remain access modes of unknown type; the specific process is as follows:

step three, the same unknown access mode vector sample x in the knowledge base is classified by the neighbor classifier, the mode-rule distance measurement classifier, the co-occurrence learner and the decision tree classifier respectively, and the specific process is as follows:

A. Use a neighbor classifier to classify the vector samples x in the unknown access mode list; the specific calculation method is as follows:

$$y_{knn} = \arg\max_{c_j} \sum_{x_i \in N_k(x)} I(y_i = c_j), \qquad i = 1, 2, \ldots, N; \; j = 1, 2, \ldots, K$$

where N represents the total number of samples, K represents the number of samples nearest to sample x, and I is an indicator function, i.e., I = 1 when y_i = c_j and I = 0 otherwise; y_knn is the class of sample x, benign or malignant, determined by the classification decision rule; x_i represents a data point belonging to the neighborhood of sample x, y_i represents the class of that data point, and c_j represents a class occurring in the neighborhood, here benign or malignant;

using the Hamming distance as the metric, the k sample points nearest to sample x are found; these k sample points are called the neighborhood of sample x, denoted N_k(x);

B. Use a mode-rule distance measurement classifier to classify the vector samples x in the unknown access mode list; the calculation is as follows:

dist(x)＝TotalLayerDepth-MatchedDepth(x)

where dist(x) represents the distance between sample x and the existing policy rules; TotalLayerDepth represents the total depth of the policy tree built from the policy rules, and MatchedDepth(x) represents the maximum depth at which sample x from the unknown access mode list matches the policy rule tree;

if the minimum value of dist(x) is greater than the set threshold, sample x remains in the unknown class;

C. Use a co-occurrence learner to classify the vector samples x in the unknown access mode list; the calculation is as follows:

the co-occurrence matrix is built as follows:

$$CO_{AP} = \left(c_{ij}\right), \qquad c_{ij} = \frac{\mathrm{OccurNum}(ap_i, ap_j)}{\mathrm{OccurNum}(ap_i)}$$

where CO_AP represents the co-occurrence matrix of the audit log access modes, ap_i represents the i-th access mode, ap_j represents the j-th access mode, c_ij represents the ratio at which access mode ap_i and access mode ap_j co-occur, OccurNum(ap_i, ap_j) represents the number of times ap_i and ap_j occur together, and OccurNum(ap_i) represents the total number of occurrences of ap_i;

$$y_{coo} = \arg\max_{c_j} \sum_{x_i \in CO_k(x)} I(y_i = c_j)$$

where CO_k(x) represents the access modes whose co-occurrence frequency with the access mode ap reaches the threshold, and I is an indicator function, i.e., I = 1 when y_i = c_j and I = 0 otherwise; the class y_coo of sample x, benign or malignant, is determined by the classification decision rule; x_i represents a data point whose co-occurrence frequency with sample x is above the threshold, y_i represents the class of that data point, and c_j represents a class occurring in the co-occurrence set, here benign or malignant;

D. Use a decision tree classifier to classify the vector samples x in the unknown access mode list;

first, from the known access mode rules, a decision tree decision_tree is built with the tuples <subject domain type, object type>, i.e. <sub, obj_type>, <subject domain type, operation set>, i.e. <sub, per>, and <object class, operation set>, i.e. <obj_class, per>, as root nodes;

finally, decision_tree is traversed to check whether the <subject domain type, object type> tuple, i.e. <sub, obj_type>, the <subject domain type, operation set> tuple, i.e. <sub, per>, or the <object class, operation set> tuple, i.e. <obj_type, per>, of sample x matches a tuple in the decision tree; if so, the class y_dec of sample x is computed; if not, the sample is judged unclassified;

where T represents the number of classifiers, namely the neighbor classifier, the mode-rule distance measurement classifier, the co-occurrence learner and the decision tree classifier; N represents the number of classes of access modes, N = 2, namely benign and malignant; c_j represents a class, either benign (allowed) or malignant (not allowed);

representing the classification result of the ith classifier;

if the votes of the T classifiers for class j exceed half of the total votes, the prediction result is class j, and the access mode is called a newly identified access mode;

otherwise, the unknown access mode cannot be identified accurately from the existing knowledge; it is set to unknown and is called an unidentified access mode.

9. The SELinux strategy optimization method based on the knowledge base according to claim 8, characterized in that: in the fourth step, the newly identified access mode is converted into a policy rule form of the SELinux system policy set, and the converted policy rule form of the SELinux system policy set is subjected to conflict detection with the known access mode or rule;

the specific process is as follows:

converting the newly identified access mode into a policy rule quintuple form in the SELinux system, and performing conflict detection on the converted policy rule quintuple form in the SELinux system and a known access mode or rule;

(1) rule redundancy conflict: if a known access mode or rule already in the policy set and the newly identified access mode have subject domain types, object types and operations that are identical or in a containment relationship, have the same object file class object_class, and carry the same authorization, the rules are in a rule redundancy conflict;

(2) direct conflict:

(3) object file category conflict:

if a known access mode or rule in the policy set allows a subject domain type to access a directory (dir) of a specified object type, that subject domain type should also be able to access an ordinary file (file) of the same object type: semantically a directory contains ordinary files, so the directory is the higher-level object class and the file is the lower-level object class; if the new rule gives a negative authorization for the lower-level object class, an object file category conflict exists;

(4) access right conflict:

if a known access mode or rule in the policy set allows a subject domain type to perform a write operation on an object type, the subject domain can also perform an append operation on that object type; when the newly identified access mode conflicts semantically with the access permissions of a known access mode or rule in the policy set, an access right conflict exists;

(5) indirect read unauthorized file conflict:

if a known access mode or rule in the policy set refuses a subject domain type permission to read an object type, and the newly identified access mode, once added to the policy set, allows that subject domain type to read files of another object type, and that other object type can in turn read the unauthorized object type, then the unauthorized object is read indirectly; this conflict is called an indirect read unauthorized file conflict;

(6) indirect modification unauthorized file conflicts:

if a known access mode or rule in the policy set refuses a subject domain type permission to modify (write to) an object type, and the newly identified access mode, once added to the policy set, allows that subject domain type to modify files of another object type, and that other object type can in turn write to the unauthorized object type, then the unauthorized object is modified indirectly; this conflict is called an indirect modification unauthorized file conflict;

(7) indirect call unauthorized program conflict:

if a known access mode or rule in the policy set refuses a subject domain type permission to call an object type, and the newly identified access mode, once added to the policy set, allows that subject domain type to call files of another object type, and that other object type can in turn call the unauthorized object type, then the unauthorized object is called indirectly; this conflict is called an indirect call unauthorized program conflict.
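The conflict checks of this claim can be sketched as follows. This is an illustrative sketch under stated assumptions, not the claimed implementation: the quintuple is assumed to be (subject domain type, object type, object class, operation, authorization); redundancy is simplified to exact equality, although the claim also covers containing or related types; the object class and access right checks assume matching subject and object types; item (2) is left out because the claim text does not define it. The `Rule` class, all function names, and the operation strings `read`, `write`, `append`, and `execute` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Assumed quintuple form of a policy rule."""
    subject: str    # subject domain type
    obj: str        # object type
    obj_class: str  # object class, e.g. "file" or "dir"
    operation: str  # e.g. "read", "write", "append", "execute"
    allow: bool     # True = positive authorization, False = negative


# Assumed mapping from operation to the indirect conflict it can cause.
INDIRECT = {"read": "indirect_read", "write": "indirect_modify",
            "execute": "indirect_call"}


def object_class_conflict(known, new):
    # Known rule allows the directory; new rule denies files of the same type.
    return (known.allow and not new.allow
            and (known.subject, known.obj, known.operation)
            == (new.subject, new.obj, new.operation)
            and known.obj_class == "dir" and new.obj_class == "file")


def access_right_conflict(known, new):
    # Known rule allows write (which implies append); new rule denies append.
    return (known.allow and not new.allow
            and (known.subject, known.obj, known.obj_class)
            == (new.subject, new.obj, new.obj_class)
            and known.operation == "write" and new.operation == "append")


def indirect_conflict(policy, new, operation):
    # New rule lets the subject reach an intermediate type that is itself
    # allowed to perform `operation` on a type the subject is denied.
    if not new.allow or new.operation != operation:
        return False
    denied = {r.obj for r in policy
              if not r.allow and r.subject == new.subject
              and r.operation == operation}
    return any(r.allow and r.subject == new.obj
               and r.operation == operation and r.obj in denied
               for r in policy)


def detect_conflicts(policy, new):
    """Return the names of all conflicts between `new` and `policy`."""
    found = []
    if any(known == new for known in policy):
        found.append("redundancy")
    if any(object_class_conflict(k, new) for k in policy):
        found.append("object_class")
    if any(access_right_conflict(k, new) for k in policy):
        found.append("access_right")
    for operation, name in INDIRECT.items():
        if indirect_conflict(policy, new, operation):
            found.append(name)
    return found
```

For example, if the policy denies a hypothetical type `a_t` read access to `secret_t` but allows `tmp_t` to read `secret_t`, then a new rule allowing `a_t` to read `tmp_t` would be reported as an indirect read conflict.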

10. The SELinux strategy optimization method based on the knowledge base according to claim 9, characterized in that: when a conflict occurs, the conflict is resolved, and the specific resolution process is as follows:

(2) for a direct conflict, because it cannot be determined whether the newly identified access mode is valid, the newly identified access mode is added to a resolution result set;

(3) for object file class conflicts and access right conflicts, the newly identified access mode is ignored;

(4) for indirect read unauthorized file conflicts, indirect modification unauthorized file conflicts, and indirect call unauthorized program conflicts, the newly identified access mode is accepted.
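The resolution rules listed in this claim amount to a lookup from conflict kind to action, which the following sketch illustrates. It covers only the conflict kinds named in items (2) to (4); the enum names, the `RESOLUTION` table, and the `resolve` function are hypothetical, and treating "accepted" as appending the rule to the policy list is an assumption made for illustration.

```python
from enum import Enum, auto


class Conflict(Enum):
    DIRECT = auto()
    OBJECT_CLASS = auto()
    ACCESS_RIGHT = auto()
    INDIRECT_READ = auto()
    INDIRECT_MODIFY = auto()
    INDIRECT_CALL = auto()


class Action(Enum):
    DEFER = auto()   # put the rule into the resolution result set
    IGNORE = auto()  # drop the newly identified access mode
    ACCEPT = auto()  # keep the newly identified access mode


# Conflict kind -> action, following items (2) to (4) of the claim.
RESOLUTION = {
    Conflict.DIRECT: Action.DEFER,
    Conflict.OBJECT_CLASS: Action.IGNORE,
    Conflict.ACCESS_RIGHT: Action.IGNORE,
    Conflict.INDIRECT_READ: Action.ACCEPT,
    Conflict.INDIRECT_MODIFY: Action.ACCEPT,
    Conflict.INDIRECT_CALL: Action.ACCEPT,
}


def resolve(conflict, new_rule, policy, resolution_set):
    """Apply the resolution action for `conflict` and return that action."""
    action = RESOLUTION[conflict]
    if action is Action.DEFER:
        resolution_set.append(new_rule)
    elif action is Action.ACCEPT:
        policy.append(new_rule)  # assumption: accepting means merging the rule
    return action
```

Keeping the mapping in a table rather than in branching code makes it straightforward to adjust the action for a conflict kind without touching `resolve`.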