CN110135167A

CN110135167A - Random-forest-based method for evaluating the security level of edge computing terminals

Info

Publication number: CN110135167A
Application number: CN201910399303.8A
Authority: CN
Inventors: 雷文鑫; 文红; 侯文静; 刘文洁
Original assignee: University of Electronic Science and Technology of China; Research Institute of Southern Power Grid Co Ltd
Current assignee: University of Electronic Science and Technology of China; Research Institute of Southern Power Grid Co Ltd
Priority date: 2019-05-14
Filing date: 2019-05-14
Publication date: 2019-08-16
Anticipated expiration: 2039-05-14
Also published as: CN110135167B

Abstract

The invention discloses a random-forest-based method for evaluating the security level of edge-computing-side terminals, comprising the following steps: S1. setting the terminal security test items and the possible result of each test item; S2. testing the connected intelligent terminals; S3. determining the correspondence between an intelligent terminal's security level and its set of individual test results; S4. calculating the security level of each edge terminal to obtain a data set; S5. dividing the data set into a training set and a test set; S6. training a random forest on the training set to obtain a trained classifier model; S7. feeding the test set into the trained random forest classifier model and comparing the predictions with the security levels from step S4 to obtain a qualified classifier; S8. evaluating the security level of newly connected terminals with the qualified classifier model. The invention grades the data security requirements of edge terminals and, according to the security risks faced and the system complexity, evaluates the security of edge-computing-side terminals against quantified, objective criteria.

Description

A random-forest-based method for evaluating the security level of edge computing terminals

Technical field

The present invention relates to methods for evaluating the security level of edge computing terminals, and more particularly to a random-forest-based method for evaluating the security level of edge computing terminals.

Background art

With the rapid development and widespread adoption of the Internet of Everything, intelligent terminals will become its key nodes and will generate massive amounts of real-time data. According to IDC statistics, more than 50 billion terminals and devices will be connected to networks by 2020, and more than 50% of their data will need to be analyzed, processed and stored at the network edge. The massive data generated by large numbers of edge devices calls for faster connections, more efficient data processing and better data protection. As large numbers of heterogeneous terminals join the Internet of Things, the edge computing side also faces greater data security threats and hidden dangers, including untrusted terminals and illegal access by mobile edge application developers. It is therefore necessary to grade the data security requirements of edge computing terminals and to establish new secure access mechanisms among terminals, edge nodes and edge computing services, so as to guarantee data confidentiality, data integrity and user privacy. Against this background, the security performance of edge computing terminals is evaluated as follows: individual security test items are first run against the terminal on the edge computing side; a security level is then computed from the results of the individual test items; and requirements of different security levels are handled accordingly, so that intelligent terminals operate safely and effectively.

Backed by the computing resources of the edge side, more sophisticated computation can be used to evaluate terminal security performance and to divide terminal security levels objectively, effectively and accurately. This patent proposes grading terminal and data security requirements and, according to the security risks faced, the system complexity and so on, evaluating the security level of edge-computing-side terminals against quantified, objective criteria.

Random forest is a machine learning algorithm proposed by Leo Breiman in 2001, mainly used for regression and classification. Its basic idea is to build multiple decision trees using bootstrap resampling and random node splitting: k samples are repeatedly drawn at random, with replacement, from the original training sample set N to form new training sample sets; k classification trees are then grown from the bootstrap sample sets to form the random forest; and new data are classified by a vote among the classification trees.
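As a rough illustration of the two ideas named above, drawing samples with replacement and classifying by a vote among trees, the following sketch uses only the Python standard library. The function names `bootstrap_sample` and `majority_vote` and the example data are illustrative assumptions, not part of the patent.

```python
import random
from collections import Counter

def bootstrap_sample(samples, size, rng):
    """Draw `size` items from `samples` at random, with replacement."""
    return [rng.choice(samples) for _ in range(size)]

def majority_vote(predictions):
    """Return the class predicted by the most trees (ties: first seen wins)."""
    return Counter(predictions).most_common(1)[0][0]

rng = random.Random(0)
original = list(range(10))            # stand-in for the original sample set
resampled = bootstrap_sample(original, len(original), rng)

votes = [2, 3, 2, 1, 2]               # hypothetical per-tree outputs for one terminal
winner = majority_vote(votes)         # class 2 receives three of the five votes
```

Because sampling is with replacement, `resampled` usually contains repeated items and omits others, which is what makes the individual trees differ from one another.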

Backed by edge computing capability, grading the data security requirements of intelligent terminals with the random forest algorithm is of great significance for maximizing the security performance of edge computing systems.

Summary of the invention

The object of the present invention is to overcome the deficiencies of the prior art and to provide a random-forest-based method for evaluating the security level of edge computing terminals, in which test results are obtained by testing each individual security performance item of an intelligent terminal and the random forest algorithm is used to assign the terminal's security level, improving the accuracy of security level classification.

The object of the present invention is achieved through the following technical solution: a random-forest-based method for evaluating the security level of edge-computing-side terminals, comprising the following steps:

S1. Build a security test platform on the edge computing side and set k test items for the terminal; the result of each test item is 0 or 1, where 0 means fail and 1 means pass;

S2. On the edge-side security test platform, test m+n intelligent terminals against the k test items to obtain the set of individual security performance test results of each intelligent terminal, where the result set of the i-th intelligent terminal is:

X_i=[x_i1,x_i2,...,x_ik], i=1,2,...,m+n;

where x_ij is the score of the i-th intelligent terminal on the j-th test item, j=1,2,...,k. The individual test results of all intelligent terminals are represented by an (m+n)×k matrix X:

$$X=\begin{bmatrix}x_{11}&x_{12}&\cdots&x_{1k}\\x_{21}&x_{22}&\cdots&x_{2k}\\\vdots&\vdots&\ddots&\vdots\\x_{(m+n)1}&x_{(m+n)2}&\cdots&x_{(m+n)k}\end{bmatrix}$$

S3. Determine the correspondence between the security level of an intelligent terminal and its set of individual test results;

S4. According to the correspondence in step S3, calculate the security level y_i corresponding to each X_i=[x_i1,x_i2,...,x_ik], obtaining the data set D={(X₁,y₁),(X₂,y₂),...,(X_m+n,y_m+n)};

S5. Divide the data set D, taking its first m items as the training set T and its last n items as the test set S:

Training set T={(X₁,y₁),(X₂,y₂),...,(X_m,y_m)}, which accounts for a fraction $\frac{m}{m+n}$ of the data set;

Test set S={(X_m+1,y_m+1),(X_m+2,y_m+2),...,(X_m+n,y_m+n)}, which accounts for a fraction $\frac{n}{m+n}$ of the data set;

Preferably, the sizes of the training set T and the test set S are adjustable; the larger the data set and the more training data available, the better the training effect and the more accurate the classification of the test set;

S6. Feed the training set T={(X₁,y₁),(X₂,y₂),...,(X_m,y_m)} as the sample set into the random forest classifier model for training, obtaining a trained classifier model;

S7. After training is complete, feed the test set S={(X_m+1,y_m+1),(X_m+2,y_m+2),...,(X_m+n,y_m+n)} into the trained random forest classifier model and compare the resulting predictions with the security levels from step S4 to obtain a qualified classifier;

S8. Connect a newly added edge-computing-side intelligent terminal under test to the security test platform to obtain its test results, and feed them into the qualified classifier model for evaluation, obtaining the corresponding security level.
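The split in step S5 (first m items for training, last n for testing) can be sketched as below. This is a minimal illustration under assumed data: the helper name `split_dataset` and the five example records are hypothetical, and the labels are made up rather than computed by step S4.

```python
def split_dataset(dataset, m):
    """Take the first m items as training set T and the remaining items as test set S."""
    if not 0 < m < len(dataset):
        raise ValueError("m must satisfy 0 < m < len(dataset)")
    return dataset[:m], dataset[m:]

# Hypothetical data set D of (X_i, y_i) pairs with k = 3 test items per terminal.
D = [([1, 0, 1], 1), ([1, 1, 1], 2), ([0, 0, 0], 0), ([0, 1, 0], 0), ([1, 1, 0], 1)]

T, S = split_dataset(D, m=3)
train_fraction = len(T) / len(D)   # m / (m + n)
test_fraction = len(S) / len(D)    # n / (m + n)
```

Keeping m as an explicit parameter mirrors the statement that the sizes of T and S are adjustable.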

Further, step S3 comprises the following sub-steps:

S31. Divide the security levels of intelligent terminals into y classes;

S32. Let the total test score of the i-th intelligent terminal be $sum_i=\sum_{j=1}^{k}x_{ij}$, with 0≤sum_i≤k;

S33. Determine the security level ranges using a fixed interval of the total score: terminals whose sum_i falls in the lowest range have security level 0, those in the next range have security level 1, then 2, and so on, with the t-th range above the lowest corresponding to security level t, t=1,2,...,y-1; a larger sum_i indicates better security performance of the intelligent terminal.

Further, step S6 comprises the following sub-steps:

S61. Select the random forest algorithm to build the random forest classifier model. Random forest is a Bagging-type method: it combines multiple weak classifiers and produces the final result by voting or averaging, so that the overall model achieves high accuracy and good generalization;

S62. Divide the training set T={(X₁,y₁),(X₂,y₂),...,(X_m,y_m)} into a minority class sample set T_min and a majority class sample set T_max, where $T_{min}\cap T_{max}=\varnothing$ and $T_{min}\cup T_{max}=T$;

S63. Randomly draw two-thirds of the sample points from the original sample set to obtain a training set T′, and note the minority class data set T_min′ and the majority class data set T_max′ within T′;

S64. it calculatesValue, provides conditionAnd

S65. If the training set T′ satisfies the conditions in S64, keep the drawn training set; if it does not, discard it;

S66. Repeat steps S63 to S65 until N_tree training sets satisfying the conditions have been obtained, where N_tree is the number of decision trees to be built; the resulting N_tree training sets are denoted T_i, where i=1,2,...,N_tree;

S67. For i=1,2,...,N_tree, train a CART decision tree H_i on the training set T_i, selecting the optimal feature according to the Gini index.
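The draw, check and keep-or-discard loop of sub-steps S63 to S66 can be sketched as follows. The acceptance condition of S64 is not reproduced in this text, so the sketch takes it as a caller-supplied predicate; `has_all_classes` is only a hypothetical stand-in for it. Sampling without replacement and the `max_tries` guard are likewise assumptions.

```python
import random

def draw_training_sets(samples, n_tree, accept, rng, max_tries=10_000):
    """Repeat S63-S65: draw 2/3 of the samples and keep the draw only if
    accept(draw) is true, until n_tree draws have been kept (S66)."""
    size = (2 * len(samples)) // 3
    kept = []
    for _ in range(max_tries):
        if len(kept) == n_tree:
            break
        draw = rng.sample(samples, size)   # without replacement (assumption)
        if accept(draw):
            kept.append(draw)
    if len(kept) < n_tree:
        raise RuntimeError("could not draw enough acceptable training sets")
    return kept

# Hypothetical stand-in for the S64 condition: every class must be present.
def has_all_classes(draw, classes=frozenset({0, 1})):
    return {y for _, y in draw} >= classes

rng = random.Random(1)
T = [([i % 2, 1], i % 2) for i in range(12)]   # made-up (X, y) samples
training_sets = draw_training_sets(T, n_tree=5, accept=has_all_classes, rng=rng)
```

Each kept draw would then be used to train one decision tree, as in S67.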

Wherein, step S62 comprises the following sub-steps:

S621. Count the number of samples of each security level in the training set T={(X₁,y₁),(X₂,y₂),...,(X_m,y_m)};

S622. For each security level, if its number of samples is greater than a preset threshold H, add all samples of that security level to the majority class sample set T_max; if its number of samples is less than or equal to the preset threshold H, add all samples of that security level to the minority class sample set T_min.
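Sub-steps S621 and S622 amount to counting samples per security level and comparing each count with the threshold H. A minimal sketch, assuming samples are stored as (X, y) pairs; the function name and the example data are illustrative only:

```python
from collections import Counter

def partition_by_class_size(training_set, threshold):
    """Split (X, y) samples into (T_min, T_max) by per-level sample count."""
    counts = Counter(y for _, y in training_set)          # S621: count per level
    t_max = [(x, y) for x, y in training_set if counts[y] > threshold]
    t_min = [(x, y) for x, y in training_set if counts[y] <= threshold]
    return t_min, t_max

# Made-up training set: level 2 occurs three times, levels 0 and 1 once each.
T = [([1, 1], 2), ([1, 0], 1), ([1, 1], 2), ([0, 0], 0), ([1, 1], 2)]
T_min, T_max = partition_by_class_size(T, threshold=2)
```

With a threshold of 2, the three level-2 samples form T_max and the remaining two samples form T_min; every sample lands in exactly one of the two sets.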

Wherein, step S67 comprises the following sub-steps:

S671. For the training set T_i, calculate the Gini index $Gini(T_i)=1-\sum_{k}P_k^2$. A smaller Gini index means that a sample selected from the set is less likely to be misclassified, that is, the set is purer; conversely, a larger Gini index means the set is less pure. Here P_k denotes the frequency with which the k-th class occurs in the classification results;

S672. For the training set T_i containing N samples, split T_i into two parts T_i1 and T_i2 according to the i-th value of attribute A, and calculate $Gain\_GINI_{A,i}(T_i)=\frac{n_1}{N}Gini(T_{i1})+\frac{n_2}{N}Gini(T_{i2})$, where n₁ and n₂ are the numbers of samples in T_i1 and T_i2;

S673. For attribute A, calculate Gain_GINI for the two-part split induced by each attribute value and take the minimum as the optimal binary split for attribute A: $\min_{i}\,Gain\_GINI_{A,i}(T_i)$;

S674. For the sample set T_i, calculate the optimal binary split of every attribute and take the minimum among them as the optimal binary split of T_i: $\min_{A}\,\min_{i}\,Gain\_GINI_{A,i}(T_i)$.
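The Gini-based choice of a binary split in S671 to S674 can be sketched with the standard CART definitions: the Gini index of a set is 1 minus the sum of squared class frequencies, and a split is scored by the size-weighted Gini of its two parts. The function names and the four-sample data set below are illustrative assumptions; this is not a full CART implementation.

```python
from collections import Counter

def gini(labels):
    """Gini index 1 - sum(P_k^2) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_gini(rows, feature, value):
    """Size-weighted Gini after splitting on x[feature] == value."""
    left = [y for x, y in rows if x[feature] == value]
    right = [y for x, y in rows if x[feature] != value]
    n = len(rows)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_split(rows):
    """Return (score, feature, value) with the smallest weighted Gini."""
    n_features = len(rows[0][0])
    candidates = [
        (split_gini(rows, f, v), f, v)
        for f in range(n_features)
        for v in sorted({x[f] for x, _ in rows})
    ]
    return min(candidates)

# Made-up samples: feature 0 separates the two classes perfectly.
rows = [([0, 1], 0), ([0, 0], 0), ([1, 1], 1), ([1, 0], 1)]
score, feature, value = best_split(rows)
```

Here the split on feature 0 yields two pure parts and a score of 0.0, while splitting on feature 1 leaves both parts mixed with a score of 0.5, so feature 0 is chosen.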

Further, step S7 comprises the following sub-steps:

S71. The test set S={(X_m+1,y_m+1),(X_m+2,y_m+2),...,(X_m+n,y_m+n)} serves as the samples to be tested;

S72. For i=1,2,...,N_tree, the initial voting weight of each decision tree is 1; let R_i=T_imax′/T_imin′;

The ballot weight for updating every decision tree is

S73. For j=m+1,m+2,...,m+n and i=1,2,...,N_tree, input the sample under test X_j; decision tree H_i from S66 outputs H_i(X_j), and the final predicted class, determined by the weighted vote of the decision trees, is taken as the security level corresponding to test sample X_j;

S74. Set a threshold θ, 0≤θ≤1, for judging classifier error.

IfM+1≤j≤m+n, then classifier meets predetermined threshold value, is classification up to standard Device, the return step S5 re -training if being unsatisfactory for, wherein
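A weighted vote over tree outputs and a threshold check on the test set might look like the sketch below. The weight-update formula and the exact threshold condition are not reproduced in this text, so the weights are simply passed in, and reading the check as "fraction of correctly classified test samples is at least θ" is an assumption. Function names and example values are hypothetical.

```python
from collections import defaultdict

def weighted_vote(tree_outputs, weights):
    """Return the class with the largest total voting weight."""
    totals = defaultdict(float)
    for label, weight in zip(tree_outputs, weights):
        totals[label] += weight
    return max(totals, key=totals.get)

def is_qualified(predicted, expected, theta):
    """True if the fraction of matching labels is at least theta (assumed reading)."""
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected) >= theta

# Hypothetical outputs of three trees for one sample, with made-up weights:
# class 1 collects weight 1.0, class 2 collects roughly 0.7.
level = weighted_vote([1, 2, 2], weights=[1.0, 0.3, 0.4])

# Three of four made-up predictions match the labels from step S4.
ok = is_qualified([0, 1, 2, 3], [0, 1, 2, 2], theta=0.75)
```

In this example the single heavily weighted tree outvotes the two lightly weighted ones, which an unweighted majority vote would not allow.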

Further, step S8 comprises the following sub-steps:

S81. Connect the newly added edge-computing-side intelligent terminal under test to the security test platform and obtain its k individual test results X=[x₁,x₂,...,x_k];

S82. Feed the test results into the qualified classifier model, which combines the outputs H_i(X) of the decision trees, i=1,2,...,N_tree, into f(X); f(X) is the corresponding security level.

The beneficial effects of the present invention are: (1) based on tests of each individual security performance item of edge computing intelligent terminals, the invention uses the random forest classification algorithm to divide intelligent terminal security levels objectively and accurately, maximizing the security performance of the edge computing system; (2) the invention builds the classification model with the random forest algorithm; the randomness introduced makes the random forest resistant to overfitting and to noise, training is fast, variable-level classification results can be obtained, and a relatively accurate quantified objective standard results; (3) the invention runs security tests on different edge intelligent terminal devices and uses each terminal's test result data set as feedback to train the classifier and assign security levels, improving the credibility of the security level classification results.

Description of the drawings

Fig. 1 is a flow chart of the method of the present invention;

Fig. 2 is a flow chart of the random-forest-based method for evaluating the security level of edge computing terminals in the embodiment.

Detailed description of the embodiments

The technical solution of the present invention is described in further detail below with reference to the accompanying drawings, but the scope of protection of the present invention is not limited to the following.

As shown in Fig. 1, a random-forest-based method for evaluating the security level of edge-computing-side terminals comprises steps S1 to S8, together with the sub-steps of steps S3, S6, S7 and S8, as set out in the Summary of the invention above.

In the embodiments of this application, the sizes of the training set T and the test set S are adjustable; the larger the data set and the more training data available, the better the training effect and the more accurate the classification of the test set.

As shown in Fig. 2, in the embodiment of this application, the process of using the trained random forest to obtain the security level of an edge terminal under test is as follows:

1. First connect 10 edge intelligent terminals to the security test platform on the edge computing side, with 22 terminal test items designed. The individual test results of each edge intelligent terminal are X_i=[x₁,x₂,...,x₂₂], i=1,2,...,10, and the individual test results of all edge intelligent terminals form a 10×22 matrix X, where x_ij=0 or x_ij=1.

2. Determine the correspondence between edge terminal security level and the set of individual test results.

1) In this evaluation, the security level of edge intelligent terminals is divided into four classes: 0, 1, 2 and 3;

2) Let the total test score of the i-th intelligent terminal be $sum_i=\sum_{j=1}^{22}x_{ij}$, with 0≤sum_i≤22;

3) Determine the security level from the value of sum: the security level is 0 when 0≤sum≤5, 1 when 6≤sum≤10, 2 when 11≤sum≤15 and 3 when 16≤sum≤22. A higher security level indicates better terminal security performance. The correspondence is shown in the following table:

Total score sum	0~5	6~10	11~15	16~22
Security level y_i	0	1	2	3
Degree of security	Very poor	Poor	Average	Safe
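The score-to-level mapping of this embodiment (22 test items, four levels) can be written out directly from the ranges in the table. The function name `security_level` is illustrative; the thresholds 5, 10 and 15 are those stated for this embodiment and are not a general formula.

```python
def security_level(results):
    """Map 22 pass/fail results (1 = pass, 0 = fail) to a security level 0-3."""
    if len(results) != 22 or any(r not in (0, 1) for r in results):
        raise ValueError("expected 22 results, each 0 or 1")
    total = sum(results)          # total score sum, 0 <= total <= 22
    if total <= 5:
        return 0                  # 0~5
    if total <= 10:
        return 1                  # 6~10
    if total <= 15:
        return 2                  # 11~15
    return 3                      # 16~22

# Hypothetical terminal that passes 16 of the 22 test items.
level = security_level([1] * 16 + [0] * 6)
```

A terminal passing 16 items lands in the top range and receives level 3.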

3. Calculate the security level y_i of each X_i=[x₁,x₂,...,x₂₂], obtaining the data set:

D={(X₁,y₁),(X₂,y₂),...,(X₁₀,y₁₀)}.

4. Because the data set is not large enough, expand the data set D proportionally using the Monte Carlo method.

5. Divide the data set D into a training set T={(X₁,y₁),(X₂,y₂),...,(X_m,y_m)} and a test set S={(X_m+1,y_m+1),(X_m+2,y_m+2),...,(X_m+n,y_m+n)}; the test set serves as the samples to be tested.

6. Randomly draw two-thirds of the sample points from the original sample set to obtain the training set T′. Note the minority class data set T_min′ and the majority class data set T_max′ of T′.

7. calculatingValue: if training set T ' satisfactionAndThen repeat step 6, repeat N_treeIt is secondary, N_treeFor quasi- construction decision tree quantity.Training set T after obtaining stochastical sampling_i, i=1,2 ..., N_tree。

8. For i=1,2,...,N_tree, use the training set T_i to grow an unpruned tree H_i: randomly select M of the 22 features and, at each node, choose the optimal feature among the M according to the Gini index, splitting until the tree is fully grown.

9. for i=1,2 ..., N_tree, the initial ballot weight of decision tree is 1, enables R_i=T_imax′/T_imin', update every The ballot weight of decision tree is

10. For j=m+1,m+2,...,m+n and i=1,2,...,N_tree, input the sample under test X_j; decision tree H_i outputs H_i(X_j), and the predicted class of the test sample, determined by the weighted vote of the decision trees, is taken as its security level.

11. setting judgement classifier error threshold value θ=0.98.M+1≤j≤m+n, point Class device meets predetermined threshold value, is classifier up to standard.

12. Connect the newly added edge-computing-side intelligent terminal under test to the security test platform and obtain its 22 individual test results X=[x₁,x₂,...,x₂₂].

13. Feed the test results X=[x₁,x₂,...,x₂₂] into the qualified classifier model, which combines the outputs H_i(X) of the decision trees, i=1,2,...,N_tree, into f(X); f(X) is the security level of the edge-computing-side intelligent terminal under test.

In the embodiments of this application, besides building the classification model with the random forest machine learning algorithm, step S6 may also use the k-nearest neighbor algorithm, the naive Bayes algorithm, the SVM algorithm or a decision tree algorithm; alternatively, a convolutional neural network, feedforward neural network or radial basis function neural network algorithm may be used to build the corresponding neural network, which is trained on the training set to obtain the corresponding trained model.

In summary, the present invention proposes, on the basis of a machine learning classification model, a random-forest-based method for evaluating the security level of edge computing terminals. Based on tests of each individual security performance item of edge computing intelligent terminals, it uses the random forest classification algorithm to divide intelligent terminal security levels objectively and accurately, maximizing the security performance of the edge computing system. Building the classification model with the random forest algorithm introduces randomness, which makes the model resistant to overfitting and to noise; training is fast, variable-level classification results can be obtained, and a relatively accurate quantified objective standard results. Security tests are run on different edge intelligent terminal devices, and each terminal's test result data set is used as feedback to train the classifier and assign security levels, improving the credibility of the classification results. In addition, the invention improves the resampling process used to obtain the training sets: sampling results are screened by added constraints, which ensures that the randomly drawn training sets better represent the original training set. For the process of combining decision trees into a forest, the invention changes the voting weights of the decision trees, which effectively mitigates shortcomings of the random forest algorithm itself; the handling of imbalanced data distributions in particular is markedly improved, and the method also performs well when the amount of data is small.

The above is a preferred embodiment of the present invention. It should be understood that the invention is not limited to the form disclosed herein, which should not be regarded as excluding other embodiments; the invention may be used in various other combinations, modifications and environments, and may be modified within the scope contemplated herein through the above teachings or the skill or knowledge of the related art. Modifications and changes made by those skilled in the art that do not depart from the spirit and scope of the invention shall all fall within the scope of protection of the appended claims.

Claims

1. A random-forest-based method for evaluating the security level of edge-computing-side terminals, characterized by comprising the following steps:

S1. Build a security test platform on the edge computing side and set k test items for the terminal; the result of each test item is 0 or 1, where 0 means fail and 1 means pass;

S2. On the edge-side security test platform, test m+n intelligent terminals against the k test items to obtain the set of individual security performance test results of each intelligent terminal, where the result set of the i-th intelligent terminal is:

X_i=[x_i1, x_i2, ..., x_ik], i=1,2,...,m+n;

where x_ij is the score of the i-th intelligent terminal on the j-th test item, j=1,2,...,k. The individual test results of all intelligent terminals are represented by an (m+n)×k matrix X:

$$X=\begin{bmatrix}x_{11}&x_{12}&\cdots&x_{1k}\\x_{21}&x_{22}&\cdots&x_{2k}\\\vdots&\vdots&\ddots&\vdots\\x_{(m+n)1}&x_{(m+n)2}&\cdots&x_{(m+n)k}\end{bmatrix}$$

S3. Determine the correspondence between the security level of an intelligent terminal and its set of individual test results;

S4. According to the correspondence in step S3, calculate the security level y_i corresponding to each X_i=[x_i1, x_i2, ..., x_ik], obtaining the data set D={(X₁, y₁), (X₂, y₂), ..., (X_m+n, y_m+n)};

S5. Divide the data set D, taking its first m items as the training set T and its last n items as the test set S:

Training set T={(X₁, y₁), (X₂, y₂), ..., (X_m, y_m)}, which accounts for a fraction $\frac{m}{m+n}$ of the data set;

Test set S={(X_m+1, y_m+1), (X_m+2, y_m+2), ..., (X_m+n, y_m+n)}, which accounts for a fraction $\frac{n}{m+n}$ of the data set;

S6. Feed the training set T={(X₁, y₁), (X₂, y₂), ..., (X_m, y_m)} as the sample set into the random forest classifier model for training, obtaining a trained classifier model;

S7. After training is complete, feed the test set S={(X_m+1, y_m+1), (X_m+2, y_m+2), ..., (X_m+n, y_m+n)} into the trained random forest classifier model and compare the resulting predictions with the security levels from step S4 to obtain a qualified classifier;

S8. Connect a newly added edge-computing-side intelligent terminal under test to the security test platform to obtain its test results, and feed them into the qualified classifier model for evaluation, obtaining the corresponding security level.

2. The random-forest-based method for evaluating the security level of edge-computing-side terminals according to claim 1, characterized in that step S3 comprises the following sub-steps:

S31. Divide the security levels of intelligent terminals into y classes;

S32. Let the total test score of the i-th intelligent terminal be $sum_i=\sum_{j=1}^{k}x_{ij}$, with 0≤sum_i≤k;

S33. Determine the security level ranges using a fixed interval of the total score: terminals whose sum_i falls in the lowest range have security level 0, those in the next range have security level 1, then 2, and so on, with the t-th range above the lowest corresponding to security level t, t=1,2,...,y-1; a larger sum_i indicates better security performance of the intelligent terminal.

3. The random-forest-based method for evaluating the security level of edge-computing-side terminals according to claim 1, characterized in that step S6 comprises the following sub-steps:

S61. Select the random forest algorithm to build the random forest classifier model. Random forest is a Bagging-type method: it combines multiple weak classifiers and produces the final result by voting or averaging, so that the overall model achieves high accuracy and good generalization;

S62. Divide the training set T={(X₁, y₁), (X₂, y₂), ..., (X_m, y_m)} into a minority class sample set T_min and a majority class sample set T_max, where $T_{min}\cap T_{max}=\varnothing$ and $T_{min}\cup T_{max}=T$;

S63. Randomly draw two-thirds of the sample points from the original sample set to obtain a training set T′, and note the minority class data set T_min′ and the majority class data set T_max′ within T′;

S64. it calculatesValue, provides conditionAnd

S65. If the training set T′ satisfies the conditions in S64, keep the drawn training set; if it does not, discard it;

S66. Repeat steps S63 to S65 until N_tree training sets satisfying the conditions have been obtained, where N_tree is the number of decision trees to be built; the resulting N_tree training sets are denoted T_i, where i=1,2,...,N_tree;

S67. For i=1,2,...,N_tree, train a CART decision tree H_i on the training set T_i, selecting the optimal feature according to the Gini index.

4. The random-forest-based method for evaluating the security level of edge-computing-side terminals according to claim 1, characterized in that step S7 comprises the following sub-steps:

S71. The test set S={(X_m+1, y_m+1), (X_m+2, y_m+2), ..., (X_m+n, y_m+n)} serves as the samples to be tested;

S72. For i=1,2,...,N_tree, the initial voting weight of each decision tree is 1; let R_i=T_imax′/T_imin′;

The ballot weight for updating every decision tree is

S73. For j=m+1,m+2,...,m+n and i=1,2,...,N_tree, input the sample under test X_j; decision tree H_i from S66 outputs H_i(X_j), and the final predicted class, determined by the weighted vote of the decision trees, is taken as the security level corresponding to test sample X_j;

S74. Set a threshold θ, 0≤θ≤1, for judging classifier error;

IfThen classifier meets predetermined threshold value, is classification up to standard Device, the return step S5 re -training if being unsatisfactory for, wherein

5. The random-forest-based method for evaluating the security level of edge-computing-side terminals according to claim 1, characterized in that step S8 comprises the following sub-steps:

S81. Connect the newly added edge-computing-side intelligent terminal under test to the security test platform and obtain its k individual test results X=[x₁, x₂, ..., x_k];

S82. Feed the test results into the qualified classifier model, which combines the outputs H_i(X) of the decision trees, i=1,2,...,N_tree, into f(X); f(X) is the corresponding security level.

6. The random-forest-based method for evaluating the security level of edge-computing-side terminals according to claim 3, characterized in that step S62 comprises the following sub-steps:

S621. Count the number of samples of each security level in the training set T={(X₁, y₁), (X₂, y₂), ..., (X_m, y_m)};

S622. For each security level, if its number of samples is greater than a preset threshold H, add all samples of that security level to the majority class sample set T_max; if its number of samples is less than or equal to the preset threshold H, add all samples of that security level to the minority class sample set T_min.

7. The random-forest-based method for evaluating the security level of edge-computing-side terminals according to claim 3, characterized in that step S67 comprises the following sub-steps:

S671. For the training set T_i, calculate the Gini index $Gini(T_i)=1-\sum_{k}P_k^2$. A smaller Gini index means that a sample selected from the set is less likely to be misclassified, that is, the set is purer; conversely, a larger Gini index means the set is less pure. Here P_k denotes the frequency with which the k-th class occurs in the classification results;

S672. For the training set T_i containing N samples, split T_i into two parts T_i1 and T_i2 according to the i-th value of attribute A, and calculate $Gain\_GINI_{A,i}(T_i)=\frac{n_1}{N}Gini(T_{i1})+\frac{n_2}{N}Gini(T_{i2})$, where n₁ and n₂ are the numbers of samples in T_i1 and T_i2;

S673. For attribute A, calculate Gain_GINI for the two-part split induced by each attribute value and take the minimum as the optimal binary split for attribute A: $\min_{i}\,Gain\_GINI_{A,i}(T_i)$;

S674. For the sample set T_i, calculate the optimal binary split of every attribute and take the minimum among them as the optimal binary split of T_i: $\min_{A}\,\min_{i}\,Gain\_GINI_{A,i}(T_i)$.