CN113546426B

CN113546426B - Security policy generation method for data access event in game service

Info

Publication number: CN113546426B
Application number: CN202110824250.7A
Authority: CN
Inventors: 朱磊; 皎玖圆; 黑新宏; 王一川; 姬文江; 孟海宁; 盘隆; 何萍; 姚燕妮
Original assignee: Xian University of Technology; Shenzhen Tencent Computer Systems Co Ltd
Current assignee: Xian University of Technology; Shenzhen Tencent Computer Systems Co Ltd
Priority date: 2021-07-21
Filing date: 2021-07-21
Publication date: 2023-08-22
Anticipated expiration: 2041-07-21
Also published as: CN113546426A

Abstract

The invention relates to a security policy generation method for data access events in game services. The method first preprocesses a source game data table and deletes meaningless data to reduce the computational load of the algorithm. The large-scale data set is then mined in parallel on Spark with the FP-Growth algorithm, treating each column of the game data table as one data item. Data frequency characteristics are derived from the frequent items produced by FP-Growth, and strong association rules are derived from the same frequent items. Finally, the frequency characteristics and the association relationships of the data are converted into permission rule configurations that act on the rows or columns of the game data table. The scheme can be integrated into an existing system to refine permission control. Because it generates the security policy by data mining from the perspective of the accessed data, it provides a well-founded data access control policy.

Description

Security policy generation method for data access event in game service

Technical Field

The invention belongs to the technical field of data access permission control for big data. In particular, it relates to a security policy generation method for data access events in game services that achieves fine-grained access control of data.

Background

In recent years, improvements in computer performance have ushered in the big data era. Conclusions drawn from analyzing large volumes of data have become an important aid to enterprise development, and data has gradually become a key factor in competition between enterprises. To make better use of data, it must be analyzed and shared, and therefore opened to specific groups of users, which in turn creates data protection problems. Game data protection is a particular case: role relationships give rise to many kinds of shared data access, and data permissions are more fine-grained. Access permission control over specific data in a game data table is therefore one of the important means of ensuring data security during data sharing. Because service scenarios differ, data protection techniques must meet different requirements in practice, such as the controlled objects and the scope of control. The overall framework of access control is now mature, but research on access control based on the characteristics of the protected data itself is relatively scarce. This invention mines frequent item sets from the data's own features in order to generate a security policy for protecting game data.

Disclosure of Invention

The object of the invention is to provide a security policy generation method for game data access events, in particular access events on shared game data, that uses the access characteristics of the data to generate a security protection policy. The method addresses the data security problem that arises in the prior art when game data is shared, and it protects the data at a finer granularity.

The technical solution adopted by the invention is to obtain association rules between data through data mining and then generate a security policy from them. First, the game data table is preprocessed and mined with FP-Growth to obtain frequent items. The frequent items are then analyzed and screened to extract data frequency characteristics and association relationship characteristics. Finally, a security policy is formulated from these two kinds of characteristics to achieve fine-grained access to the game data table.

In the security policy generation method for data access events in game services, a source game data table is preprocessed, and the data are then mined with an optimized FP-Growth algorithm to obtain frequent items and association relationships. The frequency characteristics are used to configure column permissions of the game data table, and the association relationships are used to configure row permissions. This enriches the data feature library so that the data are protected at a finer granularity. Finally, a user's permissions at access time are judged according to the definition model and the permission judgment rules.

The method specifically comprises the following steps:

step 1, clean the game data table and, given the complexity of game data, select a suitable minimum support;

step 2, mine the game data with the optimized FP-Growth;

step 3, obtain the occurrence frequency of the data item values in the frequent item sets as the input for configuring column permissions of the game data table;

step 4, obtain the permission configuration from the result of step 3 and apply it to the columns of the game data table;

step 5, select the confidence, screen out the strong association relationships, and use them as the input for configuring row permissions of the data table;

step 6, obtain the permission configuration from the result of step 5 and apply it to the rows of the game data table;

the specific steps of step 1 are as follows:

step 1.1: processing source data;

define how database concepts correspond to data mining concepts: the analysis object, i.e. the data source, is a database table; a data item is a column of the game data table; a data item value is the concrete value of one record in that data item; and a transaction corresponds to a row of data in the database;

delete a data item if its values occur with a frequency close to zero, or if none of its values repeats, which includes but is not limited to a primary key; if a data item contains a large number of null values and their proportion exceeds 90%, judge it to be a meaningless data item;
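The two deletion rules can be illustrated with the following Python sketch. It assumes the table has been loaded as a list of dicts, one dict per row, with `None` for null cells. The function name `clean_columns` and that input layout are hypothetical. A column is dropped when none of its non-null values repeats, which covers primary keys, or when more than 90% of its cells are null.

```python
def clean_columns(rows, null_ratio=0.9):
    """Return the names of the columns worth mining (hypothetical helper).

    rows: list of dicts, one per table row; None marks a null cell.
    """
    columns = list(rows[0]) if rows else []
    kept = []
    for col in columns:
        values = [row.get(col) for row in rows]
        non_null = [v for v in values if v is not None]
        # Rule 2: more than 90% null cells -> meaningless data item
        if len(values) - len(non_null) > null_ratio * len(values):
            continue
        # Rule 1: no value repeats (e.g. a primary key) -> no frequency information
        if len(set(non_null)) == len(non_null):
            continue
        kept.append(col)
    return kept
```

In the embodiment described later, these two rules remove 7 and 4 of the 18 columns respectively.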

step 1.2: mapping of data values to data items

To complete the conversion into rule configurations, the mapping between each data item value and its data item must remain clear while the association relationships among data item values are being obtained. Therefore, the corresponding data item identifier is attached to each data item value in the data source, and this step must be performed before encoding;
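The following minimal sketch illustrates the mapping. It assumes each value is prefixed with its column name and a hypothetical `=` separator before encoding, so that any mined value can later be decoded back to its data item. The helper names are illustrative and do not come from the source.

```python
SEP = "="  # hypothetical separator; column names must not contain it


def encode_rows(rows, columns):
    """Turn each row into a transaction of 'column=value' tokens, skipping null cells."""
    return [
        [f"{col}{SEP}{row[col]}" for col in columns if row.get(col) is not None]
        for row in rows
    ]


def decode_token(token):
    """Recover (column, value) from a token; the value comes back as a string."""
    column, value = token.split(SEP, 1)
    return column, value
```

Decoded values come back as strings. If the original types matter, they have to be restored from the table schema.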

step 1.3: selection of minimum support

The FP-Growth algorithm is selected for frequent item mining. The algorithm builds an FP-Tree that retains only items meeting the minimum support, so a minimum support must be chosen. First, the frequencies of the data item values within each data item are calculated and averaged; the average is denoted $A_I$, where $I$ denotes a data item. Next, the ratio of $A_I$ to $\sum_{i=1}^{n} A_i$ is computed, where $n$ is the total number of data items, and the proportion $M_I$ of the data volume in data item $I$ relative to the whole data set is calculated. Using $M_I$ as the weight of this ratio, their product is denoted $G_I = M_I \cdot A_I / \sum_{i=1}^{n} A_i$. The $G_I$ values are sorted, and their median is taken as the dividing point. If the high-frequency side holds the larger volume of data, those data items are selected; otherwise the low-frequency data items are selected. The $A_I$ of each selected data item is taken as the support for frequent item mining. The minimum support is selected in this way in order to obtain more frequent items.
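The selection rule above leaves some details open, so the following is only one hedged reading of it. It assumes:

- $A_I$ is the mean relative frequency of the distinct values in column $I$;
- $M_I$ is the column's share of all non-null cells;
- $G_I = M_I \cdot A_I / \sum_i A_i$;
- ties at the median go to the high side.

The name `select_min_support` is hypothetical.

```python
from collections import Counter
from statistics import median


def select_min_support(rows, columns):
    """One assumed reading of step 1.3; returns {column: A_I} for the selected columns.

    rows: list of dicts, one per table row; None marks a null cell.
    """
    n_rows = len(rows)
    A, volume = {}, {}
    for col in columns:
        values = [r[col] for r in rows if r.get(col) is not None]
        if not values:
            continue  # an all-null column carries no frequency information
        freqs = Counter(values)
        # A_I: average relative frequency of the column's distinct values
        A[col] = sum(c / n_rows for c in freqs.values()) / len(freqs)
        volume[col] = len(values)
    total_A = sum(A.values())
    total_volume = sum(volume.values())
    # G_I = M_I * (A_I / sum_i A_i), where M_I is the column's share of all non-null cells
    G = {col: (volume[col] / total_volume) * (A[col] / total_A) for col in A}
    cut = median(G.values())
    high = [c for c in G if G[c] >= cut]
    low = [c for c in G if G[c] < cut]
    # keep whichever side of the median holds more data (ties go to the high side)
    chosen = high if sum(volume[c] for c in high) >= sum(volume[c] for c in low) else low
    return {col: A[col] for col in chosen}
```

A standard FP-Growth implementation takes a single threshold. One assumed way to use the per-column result is `min(select_min_support(rows, cols).values())`, which leans toward obtaining more frequent items, as the text intends.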

The specific steps of step 2 are as follows:

the selected FP-Growth algorithm is used to mine frequent items. When processing a large-scale data set, the algorithm may fail to store a large FP-Tree. Mining efficiency also becomes very low as the number of recursions during FP-Tree mining grows. This scheme therefore uses an optimized FP-Growth for data mining;

the FP-Growth optimization in this scheme consists of running the algorithm in parallel on Spark. First, the original data are read and randomly divided into different partitions. FP-Growth is then run on each partition. Finally, the mining results of all partitions are merged to obtain and output the mining result for the whole data set.
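Merging per-partition results needs care, because an itemset that is frequent in one partition may not be frequent overall. One standard way to make the merge exact is the SON two-pass scheme. It is sketched below in plain Python as an assumption about how the merge could be done. A brute-force local miner stands in for FP-Growth on each partition, and `partitioned_mining` is a hypothetical name. Any globally frequent itemset must be locally frequent in at least one partition, so the union of the local results is a complete candidate set. A second pass then counts each candidate over the full data.

```python
import random
from collections import Counter
from itertools import combinations


def local_frequent(partition, min_support, max_len):
    """Brute-force local miner (a stand-in for FP-Growth on one partition)."""
    counts = Counter()
    for t in partition:
        items = sorted(set(t))
        for k in range(1, min(max_len, len(items)) + 1):
            for combo in combinations(items, k):
                counts[combo] += 1
    threshold = min_support * len(partition)
    return {s for s, c in counts.items() if c >= threshold}


def partitioned_mining(transactions, min_support, num_partitions=4, max_len=3, seed=0):
    """SON-style two-pass mining: local candidates, then an exact global count."""
    shuffled = list(transactions)
    random.Random(seed).shuffle(shuffled)  # random division into partitions
    parts = [shuffled[i::num_partitions] for i in range(num_partitions)]
    # Pass 1: the union of locally frequent itemsets forms the global candidate set
    candidates = set()
    for part in parts:
        if part:
            candidates |= local_frequent(part, min_support, max_len)
    # Pass 2: count every candidate over the whole data set and apply the global threshold
    min_count = min_support * len(transactions)
    result = {}
    for cand in candidates:
        count = sum(1 for t in transactions if set(cand) <= set(t))
        if count >= min_count:
            result[cand] = count
    return result
```

On an actual Spark cluster, MLlib already provides a distributed FP-Growth (`pyspark.ml.fpm.FPGrowth`). Note that it is based on PFP, which distributes the work of growing FP-trees by transaction suffixes rather than by random row partitions.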

The specific steps of step 3 are as follows:

step 3.1: acquiring frequent item sets according to minimum support

The constructed FP-Tree exposes all associated data item values. Once the frequency of each single data item value meets the minimum support, the associated data item values hidden in the FP-Tree must be mined. Each frequent item must also meet the minimum support, i.e. its occurrence frequency must be above the set threshold. The resulting associations among data item values serve as the input for conversion into security feature configurations.

Step 3.2: acquiring the occurrence frequency of data items in a frequent item set

The data produced by preprocessing and frequent item mining flow into the permission configuration conversion. The chosen algorithm analyzes association relationships among data items. When its output is converted into permission rule configurations, the most evident basis for dividing permissions is therefore the occurrence frequency of the data items, and the corresponding security feature is "frequency division of data item use".

To give the output this permission meaning, the algorithm output is converted as follows to obtain the security configuration. The encoded data item values are first decoded to recover their original representation. Because each data item value $V$ was associated with its data item $C$ before encoding, each decoded value is also mapped to its data item. The data item values in each frequent item are replaced with their corresponding data items, and duplicate data items are removed. This gives data of the form $\{\{C\}, \mathrm{support}(n)\}$, denoted $S(n)$, where $n$ indexes the current frequent item. Let $I$ be the number of data items in the current frequent item and $N$ the total number of frequent items. The support of each frequent item is calculated, and from these supports the occurrence frequency $R_c$ of each data item in the frequent items is calculated.
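The formula for $R_c$ is not reproduced in this text. The sketch below assumes one plausible form, a support-weighted share: $R_c$ is the summed support of the frequent items whose column set contains $c$, divided by the summed support of all frequent items. This keeps $R_c$ within [0, 1]. The sketch also assumes values were tagged as `column=value` tokens before encoding. The name `column_frequencies` is hypothetical.

```python
def column_frequencies(frequent_items):
    """frequent_items: dict mapping a frozenset of 'column=value' tokens to its support.

    Returns {column: R_c} under the assumed form
    R_c = (sum of supports of items whose column set contains c) / (sum of all supports).
    """
    # S(n): decode each value to its column and drop duplicate columns
    S = [({token.split("=", 1)[0] for token in items}, sup) for items, sup in frequent_items.items()]
    total = sum(sup for _, sup in S)
    weighted = {}
    for columns, sup in S:
        for c in columns:
            weighted[c] = weighted.get(c, 0.0) + sup
    return {c: w / total for c, w in weighted.items()}
```
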

The specific steps of step 4 are as follows. In $R_c$ from step 3, $c$ denotes a data item, namely data item $c$ in the $n$-th frequent item. The value represents how frequently that data item is associated within a given game data table. Based on expert experience, the interval [0, 1] is divided into four frequency intervals, denoted R1, R2, R3 and R4 respectively:

- [0, 0.25]: low frequency;
- [0.25, 0.5]: lower frequency;
- [0.5, 0.75]: higher frequency;
- [0.75, 1]: high frequency.

The occurrence frequency $R_c$ of each data item falls into one of these four intervals;

according to its description, the security feature represents "frequency division of data item use". It comprises four security feature elements, each representing the data items in one frequency interval: F1: high frequency, F2: higher frequency, F3: lower frequency, F4: low frequency. Data items that occur frequently have high priority: they occupy a more important position in data mining, so their protection level is raised accordingly. The relationships among the security feature elements are {F1} → {F1, F2, F3, F4}, {F2} → {F2, F3, F4}, {F3} → {F3, F4} and {F4} → {F4}, where → means "can access". That is, a user holding the highest permission F1 can access every column of the game data table.
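The two mappings of step 4 can be sketched as follows. The sketch assumes that values lying exactly on an interval boundary (0.25, 0.5, 0.75) belong to the lower interval. It also assumes the access relation is nested, so a holder of Fk may read columns labelled Fj for every j ≥ k. Levels are stored as the integers 1 to 4 for F1 to F4, and the function names are illustrative.

```python
def frequency_level(r_c):
    """Map R_c in [0, 1] to a security feature level: 1 = F1 (high) ... 4 = F4 (low)."""
    if not 0.0 <= r_c <= 1.0:
        raise ValueError("R_c must lie in [0, 1]")
    if r_c <= 0.25:
        return 4  # R1, low frequency -> F4
    if r_c <= 0.5:
        return 3  # R2, lower frequency -> F3
    if r_c <= 0.75:
        return 2  # R3, higher frequency -> F2
    return 1      # R4, high frequency -> F1


def accessible_columns(user_level, column_levels):
    """Columns a holder of F<user_level> may read, assuming Fk grants every Fj with j >= k."""
    return sorted(col for col, lvl in column_levels.items() if lvl >= user_level)
```

For example, with levels computed from `{"a": 0.9, "b": 0.6, "c": 0.3, "d": 0.1}`, a user holding F3 sees only columns `c` and `d`.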

The specific steps of step 5 are as follows:

step 5.1: to convert the frequent items obtained by the mining algorithm into permission rule configurations that act on rows of the game data table, association relationships must be found within each frequent item. After they are evaluated, the row permission configuration, i.e. the conditions restricting data rows, is obtained from the strong association relationships;

assume a frequent item $V = \{v_1, v_2, \ldots, v_m\}$ is a set of $m$ data item values with support $\mathrm{support}(V)$; in association analysis it is called an $m$-item set. An association relationship within the frequent item is defined as

$$X \Rightarrow Y,\quad X = \{v_p, \ldots\} \subset V,\quad Y = \{v_q, \ldots\} \subset V,\quad X \cup Y = V,\quad X \cap Y = \varnothing$$

This means that data item values are taken from the frequent item to form two sets, a left set $X$ and a right set $Y$. The two sets are composed of different data item values, so they are not equal. The association relationship is interpreted as follows: in a transaction where all data item values of the left set appear, all data item values of the right set also appear. The evaluation criterion for an association relationship is its confidence, calculated as

$$\mathrm{confidence}(X \Rightarrow Y) = \frac{\mathrm{support}(V)}{\mathrm{support}(X)}$$

i.e. the ratio of the support of the frequent item to the support of the left set alone. In frequent item mining, if a set of data item values meets the minimum support, every subset of it must also meet the minimum support, so those subsets and their supports are also present in the mining result. The support of the left item set can therefore be read directly from the frequent item results;
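Rule enumeration and the confidence computation can be sketched as follows. The sketch assumes frequent items are given as a dict from a `frozenset` of values to its support. By the downward-closure property stated above, the support of every left set $X$ is found in the same dict. The name `association_rules` is hypothetical.

```python
from itertools import combinations


def association_rules(frequent_items):
    """Return (X, Y, confidence) for every split of every frequent item with >= 2 values.

    frequent_items: dict mapping frozenset of data item values -> support.
    """
    rules = []
    for itemset, sup in frequent_items.items():
        if len(itemset) < 2:
            continue  # a single-value frequent item has no association relationship
        items = sorted(itemset)
        for k in range(1, len(items)):
            for left in combinations(items, k):
                X = frozenset(left)
                Y = itemset - X
                # downward closure: X is frequent too, so its support is already available
                rules.append((X, Y, sup / frequent_items[X]))
    return rules
```
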

step 5.2: among all association relationships, if the right set contains more data item values in total than the left set, i.e. the confidence is greater than 1, the relationship is regarded as the inverse of an association relationship. Because this method mines a large amount of data, 10% of the data is randomly selected as a sample to estimate the overall confidence. The distribution of association confidences over the interval [0, 1] is then checked:

- if it follows a normal distribution, the 95% confidence interval of the confidence values is obtained from prior knowledge, extreme high and low values are removed to reduce useless data, and the association relationships within that interval are taken as the input to the next operation;
- if it does not follow a normal distribution, the association relationships whose confidence lies in [0, 1] are taken as the input to the next operation.

This step mainly screens out weakly associated rules to reduce the workload of computing the evaluation parameters;
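One way to express the step 5.2 screen in code follows. It assumes the normality check itself is done elsewhere (for example with a Shapiro-Wilk test) and passed in as a flag. It also reads the "95% confidence interval" as the central interval mean ± 1.96·sd, estimated from a 10% random sample. The function name and its parameters are hypothetical.

```python
import random
from statistics import NormalDist, mean, stdev


def screen_by_confidence(confidences, is_normal, sample_ratio=0.1, level=0.95, seed=0):
    """Return the indices of the association rules kept by the step 5.2 screen.

    is_normal: result of a normality check done by the caller (not implemented here).
    """
    if not is_normal or len(confidences) < 2:
        # non-normal case (or too few rules to estimate a spread): keep confidences in [0, 1]
        return [i for i, c in enumerate(confidences) if 0.0 <= c <= 1.0]
    # estimate the distribution from a random 10% sample (at least two values)
    k = min(len(confidences), max(2, int(len(confidences) * sample_ratio)))
    sample = random.Random(seed).sample(confidences, k)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for level = 0.95
    mu, sd = mean(sample), stdev(sample)
    lo, hi = mu - z * sd, mu + z * sd
    # drop the extreme high and low confidences outside the central interval
    return [i for i, c in enumerate(confidences) if lo <= c <= hi]
```
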

in the association evaluation system, confidence ignores the total number of transactions and does not account for data that fail the condition. Confidence alone therefore cannot fully establish an association relationship, so the correlation coefficient (lift) is introduced to screen association relationships. For an association relationship $A \Rightarrow B$, it is calculated as

$$\mathrm{lift}(A \Rightarrow B) = \frac{\mathrm{confidence}(A \Rightarrow B)}{\mathrm{support}(B)} = \frac{\mathrm{support}(A \cup B)}{\mathrm{support}(A)\,\mathrm{support}(B)}$$

Lift represents the correlation from A to B. A and B are positively correlated if lift is greater than 1, negatively correlated if lift is less than 1, and independent of each other if lift equals 1. This scheme is interested in both positive and negative correlations, but not in independent ones;
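The lift classification above reduces to a few lines. This sketch takes supports as fractions of all transactions. It uses a small tolerance for the "equal to 1" case, which is an assumption, since a floating-point lift is rarely exactly 1. The function names are illustrative.

```python
def lift(sup_ab, sup_a, sup_b):
    """lift(A => B) = confidence(A => B) / support(B) = support(A u B) / (support(A) * support(B))."""
    return (sup_ab / sup_a) / sup_b


def correlation(sup_ab, sup_a, sup_b, tol=1e-9):
    """Classify A => B; independent rules are the ones this scheme discards."""
    value = lift(sup_ab, sup_a, sup_b)
    if abs(value - 1.0) <= tol:
        return "independent"
    return "positive" if value > 1.0 else "negative"
```
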

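step 5.3: to determine whether the relationship is balanced, the imbalance ratio IR is used, calculated as

$$\mathrm{IR}(A, B) = \frac{\lvert \mathrm{support}(A) - \mathrm{support}(B) \rvert}{\mathrm{support}(A) + \mathrm{support}(B) - \mathrm{support}(A \cup B)}$$

The numerator is the difference between the supports of the two parts of the association relationship. The denominator is the support of the transactions containing A or B. The smaller the IR, the more balanced and stable the association relationship, i.e. the more likely it is to occur.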
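The imbalance ratio follows directly from three supports. Here is a minimal sketch with supports given as fractions, where A ∪ B denotes the itemset containing both sides. The function name is illustrative.

```python
def imbalance_ratio(sup_a, sup_b, sup_ab):
    """IR(A, B) = |sup(A) - sup(B)| / (sup(A) + sup(B) - sup(A u B)); 0 means perfectly balanced."""
    return abs(sup_a - sup_b) / (sup_a + sup_b - sup_ab)
```
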

The specific steps of step 6 are as follows. For the association rules screened in step 5, the data item values in each association relationship can be considered linked in some way, and these links may lead to data leakage. To protect the data, data involved in an association relationship must not be freely accessible. A security feature is therefore defined that indicates whether the associated data may be accessed. Assume that an association relationship obtained through screening and evaluation is

$$v_p \Rightarrow v_q,\quad v_p, v_q \in V$$

Here $v_p$ and $v_q$ are data item values in the frequent item $V$, and the data items (game data table columns) corresponding to them are $c_p$ and $c_q$. The security feature elements are defined as E1: {field: $c_p$, operator: =, value: $v_p$} and E2: {field: $c_q$, operator: =, value: $v_q$};

if the elements $v_p$ and $v_q$ in a transaction, i.e. a row of the game data table, have this association relationship, the transaction is given the labels E1: {field: $c_p$, operator: =, value: $v_p$} and E2: {field: $c_q$, operator: =, value: $v_q$}. If the user holds the access permission for only one of the labels E1 and E2, only the corresponding part of the data in the association relationship can be accessed. All of the object's data can be accessed only when the user holds the permissions for both E1 and E2.
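Here is a minimal sketch of the row labelling and the resulting access check. It assumes that:

- rows are dicts;
- each screened relationship is given as `((c_p, v_p), (c_q, v_q))`;
- a user's grants are a set of `(field, value)` pairs.

Columns not covered by any label are treated as unrestricted here, which is also an assumption. The function names are hypothetical.

```python
def row_labels(row, relations):
    """Attach E1/E2 labels to a row for every screened relation v_p => v_q it contains."""
    labels = []
    for (c_p, v_p), (c_q, v_q) in relations:
        if row.get(c_p) == v_p and row.get(c_q) == v_q:
            labels.append({"field": c_p, "operator": "=", "value": v_p})  # E1
            labels.append({"field": c_q, "operator": "=", "value": v_q})  # E2
    return labels


def visible_fields(row, labels, granted):
    """Return the part of the row the user may read.

    granted: set of (field, value) pairs whose label permission the user holds.
    A labelled cell is shown only if its label is granted; unlabelled cells are shown.
    """
    protected = {(lab["field"], lab["value"]) for lab in labels}
    return {f: v for f, v in row.items() if (f, v) not in protected or (f, v) in granted}
```

A user holding only E1 sees the labelled `c_p` cell but not the `c_q` cell. Holding both labels reveals the whole row.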

The beneficial effects of the invention are as follows:

The invention provides a method for generating a security policy by mining data features. It links the extracted data features to the basis for permission judgment, and it performs row-level or column-level access control by configuring permissions according to the access characteristics of the game data table. The method works in three steps. First, the game data table is processed to mine frequent items and obtain strong association rules. Next, the relationships among data "values" are analyzed from the data itself, and the organizational structure of those "values" is mined. Finally, rule configurations are used to implement access control over the rows and columns of the game data table, which generates a new security policy. For permission control, the method starts from the characteristics of the data and generates the security policy from the results of analyzing the data, so that fine-grained access control of game data is carried out more reasonably.

Drawings

FIG. 1 is a general flow chart of a security policy generation method for data access events in gaming services according to the present invention;

FIG. 2 is a flow chart of a method for selecting minimum support in a security policy generation method for data access events in game services according to the present invention;

FIG. 3 is a flowchart of an optimization method of the FP-Growth algorithm in the security policy generation method for data access events in game business according to the present invention when processing a large-scale game data set;

FIG. 4 is a flow chart of converting frequent items into column-level security features in the security policy generation method for data access events in game services according to the present invention;

FIG. 5 is a flow chart of selecting the confidence and converting association relationships into row-level security features in the security policy generation method for data access events in game services according to the present invention;

FIG. 6 is a diagram of the confidence distribution of the association relationships (S) in the security policy generation method for data access events in game services.

Detailed Description

The invention will be described in detail below with reference to the drawings and the detailed description.

Referring to FIG. 1, the present invention generates security policies based on data mining. FP-Growth is used for data mining, and the related concepts are defined before the FP-Tree is constructed. The analysis object, i.e. the data source, is a database table; data items are the columns of the game data table; and a data item value is one piece of data of a data item. Meaningful data items are screened out as the input of FP-Growth. Frequent item mining is then performed with the FP-Tree to generate frequent items, and association rules among the data are obtained from the frequent items. Finally, the frequent items and the association rules are used to formulate permission configurations and generate a new security policy. The method is implemented according to the following steps:

step 1, data processing: clean the data of each game data table to be analyzed and select a suitable minimum support;

the specific steps of step 1 are as follows:

step 1.1: processing source data;

The related concepts are defined as follows. The analysis object, i.e. the data source, is a database table; a data item is a column of the game data table; a data item value is the concrete value of one record in that data item; and a transaction corresponds to a row of data in the database.

If the values of a data item occur with a frequency near zero, or none of its values repeats, the data item is deleted; this includes but is not limited to primary keys. If a data item contains a large number of null values and their proportion exceeds 90%, the data item is judged to be meaningless.

Step 1.2: mapping of data values to data items

To complete the conversion into rule configurations, the mapping between each data item value and its data item must remain clear while the association relationships among data item values are being obtained. The corresponding data item identifier is therefore attached to each data item value in the data source, and this step must be performed before encoding.

Step 1.3: selection of minimum support

The FP-Growth algorithm selected here performs frequent item mining. The algorithm builds an FP-Tree that retains only items meeting the minimum support, so a minimum support must be chosen. First, the frequencies of the data item values within each data item are calculated and averaged; the average is denoted $A_I$, where $I$ denotes a data item. Next, the ratio of $A_I$ to $\sum_{i=1}^{n} A_i$ is computed, where $n$ is the total number of data items, and the proportion $M_I$ of the data volume in data item $I$ relative to the whole data set is calculated. Using $M_I$ as the weight of this ratio, their product is denoted $G_I = M_I \cdot A_I / \sum_{i=1}^{n} A_i$. The $G_I$ values are sorted, and their median is taken as the dividing point. If the high-frequency side holds the larger volume of data, those data items are selected; otherwise the low-frequency data items are selected. The $A_I$ of each selected data item is taken as the support for frequent item mining. The minimum support is selected in this way in order to obtain more frequent items; refer to FIG. 2.

Step 2, data mining;

step 2.1: acquiring data item values existing in data sources and corresponding occurrence frequencies thereof

To construct the FP-Tree, the data source is first scanned. All data item values in the whole table are traversed to obtain each value present in the data source together with its occurrence frequency. The values are then sorted in descending order of frequency to form the data set (D) of single frequent items.

Step 2.2: construction of the FP-Tree

The FP-Tree is constructed with the data item values (V) in data set (D) and their frequencies (F) as input.
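To make steps 2.1 to 2.3 concrete, here is a compact single-machine FP-Growth in Python. It is a textbook version written for illustration, not the patent's optimized implementation. One scan counts value frequencies, each transaction is inserted in descending-frequency order, and mining recurses on the conditional pattern base of each value. Transactions are lists of hashable, sortable values such as `column=value` strings. The name `mine` and the other names are hypothetical.

```python
from collections import defaultdict
from math import ceil


class FPNode:
    __slots__ = ("item", "count", "parent", "children")

    def __init__(self, item, parent):
        self.item, self.count, self.parent, self.children = item, 0, parent, {}


def build_fp_tree(weighted_transactions, min_count):
    # Step 2.1: one scan for value frequencies, keeping only frequent values (data set D)
    freq = defaultdict(int)
    for items, weight in weighted_transactions:
        for item in set(items):
            freq[item] += weight
    freq = {i: c for i, c in freq.items() if c >= min_count}
    # Step 2.2: insert each transaction with its values in descending frequency order
    root, header = FPNode(None, None), defaultdict(list)  # header: value -> its tree nodes
    for items, weight in weighted_transactions:
        node = root
        for item in sorted((i for i in set(items) if i in freq), key=lambda i: (-freq[i], i)):
            child = node.children.get(item)
            if child is None:
                child = node.children[item] = FPNode(item, node)
                header[item].append(child)
            child.count += weight
            node = child
    return freq, header


def fp_growth(weighted_transactions, min_count, suffix=frozenset()):
    # Step 2.3: each frequent value extends the suffix; recurse on its conditional pattern base
    freq, header = build_fp_tree(weighted_transactions, min_count)
    patterns = {}
    for item, count in freq.items():
        itemset = suffix | {item}
        patterns[itemset] = count
        base = []
        for node in header[item]:
            prefix, p = [], node.parent
            while p.item is not None:  # walk up to (but excluding) the root
                prefix.append(p.item)
                p = p.parent
            if prefix:
                base.append((prefix, node.count))
        patterns.update(fp_growth(base, min_count, itemset))
    return patterns


def mine(transactions, min_support):
    """Return {frozenset(values): count} for all itemsets meeting min_support (a fraction)."""
    return fp_growth([(t, 1) for t in transactions], ceil(min_support * len(transactions)))
```

For example, `mine([["a", "b"], ["a", "c"], ["a", "b", "c"], ["b"]], 0.5)` keeps {a}, {b}, {c}, {a, b} and {a, c}, and drops {b, c} and {a, b, c}, which occur only once.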

The FP-Growth optimization in this scheme consists of running the algorithm in parallel on Spark. First, the original data are read and randomly divided into different partitions. FP-Growth is then run on each partition. Finally, the mining results of the partitions are merged to obtain and output the mining result for the whole data set; refer to FIG. 3.

Step 2.3: frequent item acquisition

The constructed FP-Tree exposes all associated data item values. Once the frequency of each single data item value meets the minimum support, the associated data item values hidden in the FP-Tree must be mined, and, as the minimum support requires, their occurrence frequency must be above the set threshold. The resulting associations among data item values serve as the input for conversion into security feature configurations.

Step 3, obtain the frequency of the data items in the frequent item sets as the input for configuring column permissions of the game data table;

step 3.1: acquiring frequent item sets according to minimum support

Step 3.2: acquiring the frequency of data items in the frequent item sets

The data produced by preprocessing and frequent item mining flow into the permission configuration conversion. The chosen algorithm analyzes association relationships among data items. When its output is converted into permission rule configurations, the most evident basis for dividing permissions is therefore "the occurrence frequency of the data items", and the corresponding security feature is "frequency division of data item use"; refer to FIG. 4.

To give the output this permission meaning, the algorithm output is converted as follows to obtain the security configuration. The encoded data item values are first decoded to recover their original representation. Because each data item value (V) was mapped to its data item (C) before encoding, each decoded value is also mapped to its data item. The data item values in each frequent item are replaced with their corresponding data items, duplicate data items are removed, and the support of each frequent item is calculated. The resulting data can be expressed as $\{\{C\}, \mathrm{support}(n)\}$, denoted $S(n)$, where $n$ indexes the current frequent item. Let $I$ be the number of data items in the current frequent item and $N$ the total number of frequent items.

The occurrence frequency $R_c$ of each data item in the frequent items is then calculated from the supports of the frequent items.

Step 4: in $R_c$ from step 3, $c$ denotes a data item, namely data item $c$ in the $n$-th frequent item. The value represents how frequently that data item is associated within a given game data table. Based on expert experience, the interval [0, 1] is divided into four frequency intervals, denoted R1, R2, R3 and R4 respectively:

- [0, 0.25]: low frequency;
- [0.25, 0.5]: lower frequency;
- [0.5, 0.75]: higher frequency;
- [0.75, 1]: high frequency.

The occurrence frequency $R_c$ of each data item falls into one of these four intervals.

According to its description, the security feature represents "frequency division of data item use". It comprises four security feature elements, each representing the data items in one frequency interval: F1: high frequency, F2: higher frequency, F3: lower frequency, F4: low frequency. The resulting security feature configuration differs depending on the type of relationship chosen among the security feature elements. Data items that occur frequently have high priority: they occupy a more important position in data mining, so their protection level is raised accordingly.

The specific steps of step 5 are as follows:

step 5.1: to convert the frequent items obtained by the mining algorithm into permission rule configurations that act on rows of the game data table, association relationships must be found within each frequent item. After they are evaluated, the row permission configuration, i.e. the conditions restricting data rows, is obtained from the strong association relationships.

Assume a frequent item $V = \{v_1, v_2, \ldots, v_m\}$ is a set of $m$ data item values with support $\mathrm{support}(V)$; in association analysis it is called an $m$-item set. An association relationship within the frequent item is defined as

$$X \Rightarrow Y,\quad X = \{v_p, \ldots\} \subset V,\quad Y = \{v_q, \ldots\} \subset V,\quad X \cup Y = V,\quad X \cap Y = \varnothing$$

This means that data item values are taken from the frequent item to form two sets, a left set $X$ and a right set $Y$. The two sets are composed of different data item values, so they are not equal. The association relationship is interpreted as follows: in a transaction where all data item values of the left set appear, all data item values of the right set also appear. The evaluation criterion for an association relationship is its confidence, calculated as

$$\mathrm{confidence}(X \Rightarrow Y) = \frac{\mathrm{support}(V)}{\mathrm{support}(X)}$$

i.e. the ratio of the support of the frequent item to the support of the left set alone. In frequent item mining, if a set of data item values meets the minimum support, every subset of it must also meet the minimum support, so those subsets and their supports are also present in the mining result. The support of the left item set can therefore be read directly from the frequent item results. A frequent item that contains only one value has no association relationship.

Step 5.2: among all association relationships, if the right set contains more data item values in total than the left set, i.e. the confidence is greater than 1, the relationship is regarded as the inverse of an association relationship. Because this method mines a large amount of data, 10% of the data is randomly selected as a sample to estimate the overall confidence. The distribution of association confidences over the interval [0, 1] is then checked:

- if it follows a normal distribution, the 95% confidence interval of the confidence values is obtained from prior knowledge, extreme high and low values are removed to reduce useless data, and the association relationships within that interval are taken as the input to the next operation;
- if it does not follow a normal distribution, the association relationships whose confidence lies in [0, 1] are taken as the input to the next operation.

This step mainly screens out weakly associated rules to reduce the workload of computing the evaluation parameters; refer to FIG. 5.

In the association evaluation system, confidence ignores the total number of transactions and does not account for the transactions in which the condition is not satisfied. Confidence alone therefore cannot establish an association, so a correlation coefficient is introduced to screen the associations. For an association relationship A ⇒ B, the correlation coefficient is calculated as

$$\mathrm{lift}(A \Rightarrow B) = \frac{\mathrm{sup}(A \cup B)}{\mathrm{sup}(A)\,\mathrm{sup}(B)}$$

where sup(·) is the fraction of transactions that contain the given set of data item values, so sup(A ∪ B) is the fraction containing both A and B.

Lift is the correlation coefficient from A to B. A and B are positively correlated if lift > 1, negatively correlated if lift < 1, and independent of each other if lift = 1. This scheme is interested in both positive and negative correlations, but not in independent pairs.
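A small sketch of the lift test, assuming absolute support counts and the total number of transactions are available. Multiplying through by the transaction count gives the same value as the fractional formula. The function names are illustrative.

```python
def lift(sup_ab, sup_a, sup_b, n_transactions):
    """lift(A => B) = P(A and B) / (P(A) * P(B)), from absolute counts.

    Equivalent to (sup_ab/n) / ((sup_a/n) * (sup_b/n)).
    """
    return sup_ab * n_transactions / (sup_a * sup_b)


def correlation(lift_value, tol=1e-9):
    """Classify the relation; independent pairs are the ones the scheme drops."""
    if abs(lift_value - 1.0) <= tol:
        return "independent"
    return "positive" if lift_value > 1.0 else "negative"
```

For instance, with 10 transactions, sup(A) = 4, sup(B) = 5 and sup(A ∪ B) = 2, lift is exactly 1, so A and B are independent. Raising sup(A ∪ B) to 3 gives 1.5, a positive correlation.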

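Step 5.3: to determine whether the association relationship is balanced, an imbalance factor IR is introduced, calculated as

$$\mathrm{IR}(A, B) = \frac{|\mathrm{sup}(A) - \mathrm{sup}(B)|}{\mathrm{sup}(A) + \mathrm{sup}(B) - \mathrm{sup}(A \cup B)}$$

The numerator is the difference between the supports of the two sides of the relationship. The denominator is the share of transactions that contain A or B, where sup(A ∪ B) is the share containing both. The smaller IR is, the more stable the association relationship.
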
The data item values in the association relations obtained above are related to one another, and exposing them together could leak data. To protect the data, associated data should not be freely accessible. A security feature is therefore defined that indicates whether the associated data may be accessed. Assume the association relationship obtained through screening and evaluation is

$$\{v_p\} \Rightarrow \{v_q\}, \quad v_p, v_q \in V$$

Here v_p and v_q are data item values in the frequent item V, and their data items (game data table columns) are c_p and c_q. The security feature elements are defined as E1: {field: c_p, operator: =, value: v_p} and E2: {field: c_q, operator: =, value: v_q}.

If the values v_p and v_q in a transaction (a game data table row) are associated, the transaction is given the labels E1: {field: c_p, operator: =, value: v_p} and E2: {field: c_q, operator: =, value: v_q}. A user who holds the access right for only one of E1 and E2 can access only the corresponding part of the associated data. Only a user who holds the rights for both E1 and E2 can access all data of the transaction.
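A minimal sketch of how the tags E1 and E2 might be built and enforced when a row is read. The dictionary layout mirrors the {field, operator, value} notation above. Three choices are illustrative assumptions, not the method's specified behaviour: the function names, identifying held rights by tag index, and masking unauthorized cells with `None`.

```python
def security_elements(c_p, v_p, c_q, v_q):
    """Build the tags E1 and E2 for an association between v_p and v_q."""
    e1 = {"field": c_p, "operator": "=", "value": v_p}
    e2 = {"field": c_q, "operator": "=", "value": v_q}
    return [e1, e2]


def matches(tag, row):
    """True if the row carries the tagged value (only '=' is modelled)."""
    return tag["operator"] == "=" and row.get(tag["field"]) == tag["value"]


def visible_row(row, tags, held):
    """Return the row as the user may see it.

    held: indices of the tags the user has rights for. Fields of matching
    tags the user does not hold are masked with None, so only a user who
    holds every tag sees the whole row.
    """
    view = dict(row)
    for i, tag in enumerate(tags):
        if matches(tag, row) and i not in held:
            view[tag["field"]] = None
    return view
```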

Examples

The embodiment of the invention uses a game data table of size 17008 × 18, i.e., 17008 transactions (game data table rows) and 18 data items (game data table columns). The table is reduced in two steps:

1. Following the data-processing rule for uniquely identifying data items, the 7 columns that act as unique identifiers are filtered out.
2. Of the remaining columns, the 4 in which more than 90% of the values are null are also filtered out.

By definition, these 11 columns carry no reference value for frequent item mining. The data item values in the remaining 7 columns are therefore used as the data source to be analyzed.
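The two column filters of this example can be sketched over a list of row dictionaries. The 90% threshold comes from the text. Two details are assumptions: a column counts as a unique identifier when all of its non-null values are distinct, and nulls are represented as `None` or empty strings.

```python
def columns_to_mine(rows, null_threshold=0.9):
    """Drop unique-identifier columns and mostly-null columns.

    rows: list of dicts, one per game data table row.
    Returns the names of the columns kept as the data source.
    """
    columns = list(rows[0].keys()) if rows else []
    kept = []
    for col in columns:
        values = [r.get(col) for r in rows]
        non_null = [v for v in values if v not in (None, "")]
        if non_null and len(set(non_null)) == len(non_null):
            continue  # every value distinct: acts as a unique identifier
        if len(values) - len(non_null) > null_threshold * len(values):
            continue  # more than 90% of the values are null
        kept.append(col)
    return kept
```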

A minimum support must be chosen before frequent item mining. Following the parameter-selection rule in data processing, the frequencies of the distinct data item values in each data item are averaged, and these averages are the input for choosing the minimum support. The average frequencies of the data item values in each data item are obtained from the data source.

Each data item's average frequency has a frequency of occurrence among all data items, and this is multiplied by the corresponding weight. Based on these products, the low-frequency group is selected. The minimum support is therefore the average of the average frequencies of the data items in the low-frequency interval. These data items and their average frequencies are {Languages: 4, Genres: 16, Original Release: 5, Current Version Release: 6}, so the minimum support is (4 + 16 + 5 + 6) / 4 = 7.75.

With this minimum support, an FP-Tree is built from the data item values and all frequent items are mined from it.
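A compact, single-machine FP-Growth is sketched below for readers unfamiliar with the algorithm. It builds an FP-Tree of the items that meet the minimum support, then mines it recursively through conditional pattern bases. Transactions are `(itemset, count)` pairs and `min_support` is an absolute count. This is a textbook version, not the optimized implementation the method uses.

```python
from collections import defaultdict


class Node:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}


def build_tree(transactions, min_support):
    """Build an FP-Tree; returns (header table, counts of frequent items)."""
    counts = defaultdict(int)
    for items, c in transactions:
        for item in items:
            counts[item] += c
    frequent = {i: c for i, c in counts.items() if c >= min_support}
    root = Node(None, None)
    header = defaultdict(list)  # item -> tree nodes holding that item
    for items, c in transactions:
        # insert frequent items in descending frequency order (ties by name)
        path = sorted((i for i in items if i in frequent),
                      key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += c
    return header, frequent


def fp_growth(transactions, min_support, suffix=frozenset()):
    """Return {frozenset(itemset): support} for every frequent itemset."""
    header, frequent = build_tree(transactions, min_support)
    result = {}
    for item, sup in frequent.items():
        itemset = suffix | {item}
        result[itemset] = sup
        # conditional pattern base: prefix path above each node of `item`
        base = []
        for node in header[item]:
            path, parent = [], node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        result.update(fp_growth(base, min_support, itemset))
    return result
```

With the transactions {a, b}, {a}, {b}, {a, b} (each with count 1) and a minimum support of 2, the sketch returns support 3 for {a}, 3 for {b} and 2 for {a, b}.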

To convert the frequent items into a security configuration for the game data table columns, the following is done:

1. The support ratio of each frequent item is calculated.
2. Each data item value in a frequent item is replaced by its data item (game data table column).
3. The occurrence frequency of each data item is computed.

Each data item is assigned to one of the defined frequency intervals according to its frequency. This forms groups of data items that represent how often the data items are used. In the data source:

- "Current Version Release" and "Original Release Date" correspond to "low frequency".
- "Price", "Languages", "Genres" and "Age Rating" correspond to "high frequency".
- "Primary Genre" corresponds to "high frequency".

The columns filtered out of the data source can be left unprotected: their security feature element level is empty, so they are accessible to users.

Next, to convert the frequent items into a security configuration for game data table rows, all association relations (R) in the frequent items are found. The strongly associated data item values are identified from four criteria: the calculated confidence (C), the confidence interval selected after checking the distribution, the KULC coefficient, and the IR imbalance factor. These values and their data items are then mapped to concrete security feature configurations.

Among the association relations (S), the confidence values follow a normal distribution, as shown in FIG. 6. The 95% confidence interval [0.553495, 0.566024] is therefore used, and only the association relations whose confidence falls inside it are kept.

A higher KULC coefficient indicates that an association relation is more likely to exist, and a lower IR factor indicates a more stable one. The KULC and IR distributions of the relations whose confidence lies in the interval are plotted. The relations with IR = 0 are selected as the input to the security feature conversion.
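The final selection might be sketched as follows. The KULC coefficient is not defined in this part of the text, so the sketch assumes the standard Kulczynski measure, the mean of P(B | A) and P(A | B). IR follows the imbalance-factor formula of Step 5.3. The rule dictionary layout and the ordering by KULC are illustrative assumptions.

```python
def imbalance_ratio(sup_a, sup_b, sup_ab):
    """IR(A, B) = |sup(A) - sup(B)| / (sup(A) + sup(B) - sup(A u B)).

    The denominator counts the transactions that contain A or B.
    """
    return abs(sup_a - sup_b) / (sup_a + sup_b - sup_ab)


def kulc(sup_a, sup_b, sup_ab):
    """Kulczynski measure (assumed definition): mean of P(B|A) and P(A|B)."""
    return 0.5 * (sup_ab / sup_a + sup_ab / sup_b)


def select_stable_rules(rules, interval, tol=1e-12):
    """Keep relations with confidence inside `interval` and IR == 0,
    strongest KULC first.

    rules: list of dicts with keys "conf", "sup_a", "sup_b", "sup_ab".
    """
    lo, hi = interval
    kept = []
    for r in rules:
        ir = imbalance_ratio(r["sup_a"], r["sup_b"], r["sup_ab"])
        if lo <= r["conf"] <= hi and ir <= tol:  # IR is never negative
            kept.append(dict(r, ir=ir, kulc=kulc(r["sup_a"], r["sup_b"], r["sup_ab"])))
    return sorted(kept, key=lambda r: r["kulc"], reverse=True)
```

In the embodiment, the interval argument would be (0.553495, 0.566024).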

The finally obtained stable, strong association relations yield the following security feature elements, which describe whether the associated data may be accessed:

- {field: Age Rating, operator: =, value: 4}
- {field: Original Release Date, operator: =, value: 2/09/2016}
- {field: extended, operator: =, value: EN}
- {field: Current Version Release Date, operator: =, value: 2/09/2016}
- {field: Original Release Date, operator: =, value: 20/10/2016}
- {field: Current Version Release Date, operator: =, value: 20/10/2016}

The invention provides a data-mining-based security policy generation method that links extracted data features to the basis for permission decisions. It performs row- and column-level access control by configuring permissions on the access features of the game data table. The method works in three stages:

1. The game data table is processed to extract frequent items.
2. The relations among data "values" are analyzed and their organizational structure is mined.
3. Rule configurations then implement access control over the rows and columns of the table.

This produces a new security policy and finer-grained protection of the game data table.

Claims

1. A security policy generation method for data access events in game services, characterized in that: a source game data table is preprocessed, and the data are then mined with an optimized FP-Growth algorithm to obtain frequent items and association relations; the column and row permissions of the game data table are configured using the obtained frequency features and association relations; the data feature library is populated to achieve refined protection of the data; finally, when a user accesses the data, permission decisions are made according to the defined model and permission judgment rules;

the method specifically comprises the following steps:

the specific steps of the step 1 are as follows:

step 1.1: processing source data;

defining the correspondence between database and data-mining terms: the analysis object, i.e., the data source, is a database table; data items are game data table columns; data item values are the concrete values a data item takes in a given record; and transactions correspond to rows of the table;
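One way to realize this correspondence, and the value-to-item mapping of step 1.2, is to encode each cell as a "column=value" string. Equal values in different columns then stay distinct items and can later be decoded back to their data item. The separator and function names are assumptions for illustration.

```python
SEP = "="  # assumption: column names never contain this separator


def encode_rows(rows, columns):
    """Turn table rows into transactions of encoded data item values."""
    transactions = []
    for row in rows:
        items = {f"{col}{SEP}{row[col]}" for col in columns
                 if row.get(col) not in (None, "")}  # skip null cells
        transactions.append(items)
    return transactions


def decode_item(item):
    """Recover (data item, data item value); the value comes back as a string."""
    col, value = item.split(SEP, 1)
    return col, value
```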

step 1.2: mapping of data values to data items

step 1.3: selection of minimum support

the FP-Growth algorithm is used for frequent item mining, and building its FP-Tree requires a minimum support, which is selected as follows:

first, the frequencies of the data item values in each data item are calculated and averaged, and the average is denoted A_I, where I denotes a data item;

next, the frequency with which A_I occurs among the averages of all n data items is found, n being the total number of data items, and the proportion M_I of the data in data item I relative to the whole data set is calculated;

M_I is used as the weight of A_I, and the product of the frequency and this weight is denoted G_I;

the G_I values are sorted and their median is taken as the demarcation point; if the high-frequency side holds the larger amount of data, all data items are selected, otherwise the low-frequency data items are selected;

the average of A_I over the selected data items is taken as the minimum support for frequent item mining; selecting the minimum support in this way yields more frequent items;
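The selection rule can be sketched as follows. The text leaves several details open, so every one of the following is an assumption:

- A_I is the mean absolute frequency of the distinct values in column I.
- The "frequency" of A_I is the share of columns whose rounded average equals its own.
- M_I is the column's share of all non-null cells.
- The "high" side is the set of columns whose G_I is above the median.

The function name is also hypothetical.

```python
from statistics import mean, median


def select_min_support(columns):
    """Pick a minimum support from per-column value frequencies.

    columns: dict mapping data item I -> non-empty list of its non-null values.
    """
    n = len(columns)
    total = sum(len(v) for v in columns.values())
    # A_I: average frequency of the distinct values in data item I
    a = {c: len(v) / len(set(v)) for c, v in columns.items()}
    g = {}
    for c in columns:
        # assumption: how often this (rounded) average occurs among all n averages
        freq = sum(1 for x in a.values() if round(x) == round(a[c])) / n
        m = len(columns[c]) / total  # M_I: share of the data set held by I
        g[c] = freq * m              # G_I: frequency weighted by M_I
    cut = median(g.values())         # demarcation point
    high = [c for c in columns if g[c] > cut]
    low = [c for c in columns if g[c] <= cut]
    high_data = sum(len(columns[c]) for c in high)
    low_data = sum(len(columns[c]) for c in low)
    selected = list(columns) if high_data > low_data else low
    return mean(a[c] for c in selected)  # average of A_I over the selection
```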

step 2, mining game data based on the optimized FP-Growth;

the specific steps of the step 2 are as follows:

FP-Growth is used for frequent item mining; on large data sets a single machine may be unable to store the large FP-Tree, and the growing number of recursions while mining the FP-Tree makes mining very slow; this scheme therefore uses an optimized FP-Growth for data mining;

the optimization runs FP-Growth in parallel on Spark: first, the original data are read and randomly split into different partitions; then, FP-Growth is run on each partition; finally, the per-partition mining results are merged to produce and output the mining result for the whole data set;
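Merging per-partition results is exact only if global supports are recounted. This sketch therefore follows the two-pass SON partition scheme. A set that is frequent in the whole data set must be frequent in at least one partition, at a threshold scaled to that partition's size. The union of the local results is thus a complete candidate set, and a second pass counts the candidates globally.

Several stand-ins are assumptions:

- A brute-force local miner, capped at `max_len` items for brevity, replaces FP-Growth.
- Plain Python lists replace Spark partitions. On Spark, the first pass would typically run per partition (for example via `RDD.mapPartitions`) and the second as a global count.
- All names are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations


def local_frequent(partition, threshold, max_len=3):
    """Brute-force stand-in for running FP-Growth on one partition."""
    counts = Counter()
    for items in partition:
        for k in range(1, min(max_len, len(items)) + 1):
            for combo in combinations(sorted(items), k):
                counts[frozenset(combo)] += 1
    return {s for s, c in counts.items() if c >= threshold}


def partitioned_mining(transactions, min_support, n_parts=4, seed=0, max_len=3):
    """Two-pass SON mining; min_support is an absolute count over all data."""
    data = list(transactions)
    random.Random(seed).shuffle(data)  # random split into partitions
    parts = [data[i::n_parts] for i in range(n_parts)]
    total = len(data)
    # pass 1: mine each partition with the threshold scaled to its size
    candidates = set()
    for part in parts:
        if part:
            candidates |= local_frequent(part, min_support * len(part) / total, max_len)
    # pass 2: count every candidate over the whole data set and filter
    result = {}
    for cand in candidates:
        sup = sum(1 for items in data if cand <= items)
        if sup >= min_support:
            result[cand] = sup
    return result
```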

the specific steps of the step 3 are as follows:

step 3.1: acquiring frequent item sets according to minimum support

the constructed FP-Tree contains all associated data item values whose individual frequency meets the minimum support; the associations hidden in the FP-Tree are mined while ensuring that every frequent item meets the minimum support, i.e., occurs more often than the set threshold; the resulting data item value associations are the input for security feature configuration conversion;

the processed and mined data are converted into a permission configuration; because the chosen algorithm analyzes association relations among data items, the most natural basis for dividing permissions when converting its output into permission rules is "data item occurrence frequency", and the corresponding security feature is "frequency division of data item use";

to express this permission meaning, the algorithm output is converted into a security configuration as follows:

first, the encoded data item values are decoded to restore their original representation; since each data item value V was mapped to its data item C before encoding, the decoded value remains mapped to that data item;

the data item values in each frequent item are replaced by their data items and repeated data items are removed, giving records of the form {{C}, support(N)}, denoted S(N), where N denotes the current frequent item;

with I the number of data items in the current frequent item and N the total number of frequent items, the support rate of each frequent item is calculated, and from these support rates the occurrence frequency R_c of each data item across the frequent items is calculated;
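The conversion of frequent items into per-column frequencies could be sketched as below. The text does not give the exact formula for R_c, so this sketch makes two assumptions:

- A frequent item's support rate is its support divided by the sum of all supports.
- R_c is the sum of the support rates of the frequent items that contain column c.

The "column=value" encoding and the function name are also assumptions.

```python
from collections import defaultdict


def column_frequencies(frequent_items, sep="="):
    """frequent_items: dict frozenset of encoded values -> support.

    Returns (S, R): S lists the ({C}, support) records, and R maps each
    data item (column) to its occurrence frequency R_c.
    """
    # decode each value to its column; the frozenset drops repeated columns
    s = [(frozenset(v.split(sep, 1)[0] for v in item), sup)
         for item, sup in frequent_items.items()]
    total = sum(sup for _, sup in s)
    r = defaultdict(float)
    for cols, sup in s:
        for c in cols:
            r[c] += sup / total  # support rate of this frequent item (assumed)
    return s, dict(r)
```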

according to the description of the security feature, it represents "frequency division of data item use" and comprises four security feature elements, each representing the data items in one frequency interval: F1: high frequency, F2: higher frequency, F3: lower frequency, F4: low frequency;

frequently occurring data items have high priority, meaning they occupy a more important position in data mining, so their protection level is raised accordingly;

the relation among the security feature elements is {F1} → {F1, F2, F3, F4}, {F2} → {F2, F3, F4}, {F3} → {F3, F4}, {F4} → {F4}, {} → {}, where "→" means "can access"; i.e., a user holding the highest permission F1 can access every column of the game data table;
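The containment relation among F1 to F4 amounts to a numeric ordering, which the following sketch uses. Two choices are assumptions: mapping the levels to 1..4, and using `None` both for unprotected columns and for a user with no level. The function names and example columns are hypothetical.

```python
LEVELS = {"F1": 1, "F2": 2, "F3": 3, "F4": 4}  # F1 = high frequency, most protected


def accessible(user_level, column_level):
    """True if a user holding `user_level` may read a column at `column_level`.

    A column with level None (filtered out of the data source) is open to all;
    a user with no level (None) can read only such columns.
    """
    if column_level is None:
        return True
    if user_level is None:
        return False
    return LEVELS[user_level] <= LEVELS[column_level]


def visible_columns(user_level, column_levels):
    """List the columns a user may read, given a column -> level mapping."""
    return [c for c, lvl in column_levels.items() if accessible(user_level, lvl)]
```

For example, with the hypothetical levels {"Price": "F1", "Genres": "F2", "Original Release Date": "F4", "ID": None}, an F2 user sees Genres, Original Release Date and ID, but not Price.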

the specific steps of the step 5 are as follows:

data item values are taken from a frequent item to form two sets, left {v_p, …} and right {v_q, …}, made up of different data item values, i.e., the two sets are disjoint; the association relationship left ⇒ right means that, in a transaction, whenever all data item values in the left set appear, all data item values in the right set also appear; the criterion evaluating the association relationship is defined as the confidence, calculated as

$$\mathrm{conf}(\mathrm{left} \Rightarrow \mathrm{right}) = \frac{\mathrm{sup}(\mathrm{left} \cup \mathrm{right})}{\mathrm{sup}(\mathrm{left})}$$

that is, the ratio of the support of the frequent item to the support of the left set alone; in frequent item mining, if a set of data item values meets the minimum support, every combination of those values also meets it, so each such subset and its support also appear in the frequent item mining result; therefore, the support of the left item set can be read directly from the frequent item results;

in the association evaluation system, confidence ignores the total number of transactions and does not account for the transactions in which the condition is not satisfied, so confidence alone cannot establish an association; a correlation coefficient is therefore introduced to screen the associations; for an association relationship A ⇒ B, the correlation coefficient is calculated as

$$\mathrm{lift}(A \Rightarrow B) = \frac{\mathrm{sup}(A \cup B)}{\mathrm{sup}(A)\,\mathrm{sup}(B)}$$

where sup(·) is the fraction of transactions containing the given set of data item values, so sup(A ∪ B) is the fraction containing both A and B; lift is the correlation coefficient from A to B: A and B are positively correlated if lift > 1, negatively correlated if lift < 1, and independent of each other if lift = 1;

the imbalance factor IR of the association relationship is calculated as

$$\mathrm{IR}(A, B) = \frac{|\mathrm{sup}(A) - \mathrm{sup}(B)|}{\mathrm{sup}(A) + \mathrm{sup}(B) - \mathrm{sup}(A \cup B)}$$

where the numerator is the difference between the supports of the two parts of the association relationship and the denominator is the share of transactions containing A or B; the smaller IR is, the more stable the association relationship, i.e., the more likely it is to appear;

step 6: obtaining the permission configuration and applying the permission configuration finally obtained in step 5 to the game data table;

if the values v_p and v_q in a transaction, i.e., a row of the game data table, are associated, the transaction is given the labels E1: {field: c_p, operator: =, value: v_p} and E2: {field: c_q, operator: =, value: v_q}; a user holding the access right for only one of E1 and E2 can access only the corresponding part of the associated data, and only a user holding the rights for both E1 and E2 can access all data of the transaction.