CN113076541B - Vulnerability scoring model and method of operating system based on back propagation neural network

Info

Publication number: CN113076541B
Application number: CN202110253083.5A
Authority: CN
Inventors: 王丽星; 罗飞; 崔雷; 刘艳彬; 刘涛; 韩乃平; 魏立峰; 张铎; 齐璇; 战茅
Original assignee: Kirin Software Co Ltd
Current assignee: Kirin Software Co Ltd
Priority date: 2021-03-09
Filing date: 2021-03-09
Publication date: 2023-06-27
Anticipated expiration: 2041-03-09
Also published as: WO2022188066A1; CN113076541A

Abstract

The invention relates to a vulnerability scoring model and method for an operating system based on a back propagation neural network. The method comprises the following steps: S1, determining a scoring index system and collecting data; S2, constructing a neural network model; S3, preprocessing the data; S4, training the neural network. The model includes a data collection module and a neural network scoring module. The invention addresses three problems of existing scoring schemes: the linear scoring formula is inaccurate, no vulnerability score can be provided when information is insufficient, and no scoring method is provided for a specific operating system.

Description

Vulnerability scoring model and method of operating system based on back propagation neural network

Technical Field

The patent application belongs to the technical field of general vulnerability scoring, and particularly relates to a model and a method for scoring vulnerabilities of an operating system based on a back propagation neural network.

Background

The currently widely used vulnerability risk assessment standard CVSS gives a score to the security vulnerability risk degree of the computer system according to indexes such as vulnerability attack difficulty degree, severity degree of influence caused after attack and the like, so that responders can prioritize responses and resources according to threats. The scoring method measures the influence of various factors on the severity of the vulnerability by using a linear modeling mode, and ignores the nonlinear relation among the influence factors.

The CVSS-based scoring standard evaluates only the basic properties of a vulnerability. It does not consider the repair status of the vulnerability or user requirements in different environments, so it provides no vulnerability risk assessment targeted at an operating system environment, which to some extent limits the rapid development of enterprises building independent and controllable technology.

In the prior art, there are methods for evaluating the severity of security vulnerabilities of computer systems that are based on the CVSS2.0 or CVSS3.0 standards. These scoring schemes define a linear scoring formula for each of the three metric groups, in which the weight of each metric in the vulnerability score is constant. In addition, existing scoring schemes provide only a single CVSS base score for a vulnerability, regardless of differences in platform, version, and even software compilation.

The disadvantages of the prior art are in particular:

1) Inaccurate linear scoring formula

While the prior art gives a base metric group score for the severity of a security vulnerability, these scoring schemes all use linear scoring formulas that do not adequately account for the nonlinear relationships among the metrics.

2) No scoring method for platform (the way to collect two sets of data needs to be novel)

The threat brought by the vulnerability may change with time, but the existing scheme does not provide the score of the time measurement set, and cannot meet the requirement that some domestic and foreign enterprises (such as financial enterprises) need to know the severity of the vulnerability in real time.

Furthermore, different environments may have different impact on the risk that vulnerabilities bring to the organization and its stakeholders. Existing solutions do not provide environmental metric group vulnerability scores for a certain operating system.

3) Failure to provide vulnerability scores in time when insufficient information is available

Existing scoring schemes cannot evaluate a vulnerability when information about it is insufficient. For example, for a zero-day vulnerability, detailed announcements and scores often become available only a month or even several months after the vulnerability is discovered. Meanwhile, an attacker who discovers the security hole in the code can exploit it to attack the system before it is repaired or a patch is available. In this case, a risk assessment of the vulnerability gives security administrators a reference for deciding whether additional mitigation measures are needed.

Disclosure of Invention

To address the problems of existing scoring schemes, namely that the linear scoring formula is inaccurate, that no vulnerability score can be provided when information is insufficient, and that no scoring method targets an operating system, the invention provides a neural-network-based dynamic vulnerability scoring model and method for an operating system.

In order to solve the problems, the invention adopts the following technical scheme:

A vulnerability scoring method for an operating system based on a back propagation neural network comprises the following steps:

S1, determining a scoring index system and collecting data;

S2, constructing a neural network model;

S3, preprocessing the collected data;

S4, training the neural network;

S5, obtaining CVE scores.

The technical scheme of the invention is further improved as follows: the specific flow of step S1 is as follows,

S11, determining a metric index system: the metric index system is built on CVSS3.0 and comprises three metric groups, namely a basic metric group, a time metric group and an environment metric group. To meet the scoring requirements of an operating system, a feature named unrepaired duration is added to the time metric group. When a user queries the score of a vulnerability that has not yet been repaired, this feature is the time, in days, from the release of the vulnerability to the moment of the query; if the vulnerability has been repaired, the unrepaired duration is 0;

S12, collecting basic metric group data;

S13, collecting time metric group data;

S14, collecting environment metric group data.
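As a minimal sketch of the unrepaired-duration feature defined in S11, the following Python function counts whole days from publication to the moment of the query and returns 0 once the vulnerability is repaired. The function name, its parameters, and the use of calendar dates are assumptions made for illustration; the patent does not specify how partial days or out-of-order dates are handled.

```python
from datetime import date
from typing import Optional


def unrepaired_duration(published: date, query_day: date,
                        repaired_on: Optional[date] = None) -> int:
    """Days from publication to the query if still unrepaired, else 0."""
    # A vulnerability repaired on or before the query day counts as repaired.
    if repaired_on is not None and repaired_on <= query_day:
        return 0
    # Assumption: a query that predates publication yields 0, not a negative.
    return max((query_day - published).days, 0)


print(unrepaired_duration(date(2021, 3, 1), date(2021, 3, 9)))  # 8
```

Because the value depends on the query date, the feature has to be recomputed at every query rather than stored once.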

The technical scheme of the invention is further improved as follows: the specific flow of the step S2 is as follows;

S21, designing a network structure: a BP neural network is constructed using the open-source Keras machine learning framework on top of TensorFlow. The BP neural network has 1 input layer, 2 or more hidden layers and 1 output layer. The input layer takes the data features of the three metric groups collected in step S1, and one node is created per feature; the number of nodes in the hidden layers decreases from one hidden layer to the next; the output layer outputs a single score and therefore has one node;

S22, setting an activation function: the ReLU activation function is used in the hidden layers and the output layer of the BP neural network;

S23, setting a learning rate: the initial learning rate is 0.01;

S24, setting an optimizer and a loss function: the optimizer is the Adam optimizer, and the loss function is the mean square error loss function. The loss function, also called the objective function, is one of the two parameters required to compile a neural network model; the other is the optimizer. The loss function computes the difference between the label value and the predicted value, and various loss functions are available in machine learning, typically distance-based and absolute-value-based ones;

S25, training scale: the batch size is 256 or more.
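For reference, the mean square error named in S24 is conventionally written as below. The symbols are notation assumed here rather than taken from the patent: N is the number of samples in a batch, y_i is the expert label of sample i, and ŷ_i is the score output by the network.

```latex
\mathrm{MSE} = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2
```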

The technical scheme of the invention is further improved as follows: step S3 is used for formatting the collected data with different formats, and the specific flow is as follows,

s31, formatting processing data: collecting a plurality of (e.g. 10000) CVE sample data according to the measurement index system determined in the step S11, wherein each CVE sample data corresponds to a group of data characteristics comprising three types of measurement groups;

s32, complement missing data: for the case that partial value is missing during collection, filling the missing value by using the average value of all (10000) CVE sample data under the data characteristic;

s33, selecting a training data set: randomly selecting a plurality of CVE sample data as a training data set, and using the rest samples as a test data set;

s34, manually scoring the training data set: scoring the training data set by an expert, marking a base metric group score as base_score, a time metric group score as time_score, an environmental metric group score as environmental_score, and a final score as risk_score; the three groups of measurement indexes respectively calculate scores, and the calculation formula is the same as the CVSS3.0 standard; the expert scoring formula is:

risk_score = base_score*0.4 + time_score*0.2 + environmental_score*0.4.

the technical scheme of the invention is further improved as follows: the number of training data sets is more than 60%, such as 70%, of the CVE sample data, and correspondingly, 7000 CVE sample data in 10000 sets of data are selected as training data sets at random, and the remaining 3000 samples are used as test data sets.
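The missing-value filling of S32 and the random 70/30 split of S33 can be sketched in plain Python as follows. The function names, the `None` marker for a missing value, the fixed random seed, and the fallback of 0.0 for a feature with no observed value are all assumptions made for illustration.

```python
import random
from typing import List, Optional

Row = List[Optional[float]]


def fill_missing_with_mean(rows: List[Row]) -> List[List[float]]:
    """Replace None in each feature column with that column's mean."""
    n_features = len(rows[0])
    means = []
    for j in range(n_features):
        present = [r[j] for r in rows if r[j] is not None]
        # Assumption: a column with no observed value falls back to 0.0.
        means.append(sum(present) / len(present) if present else 0.0)
    return [[means[j] if r[j] is None else r[j] for j in range(n_features)]
            for r in rows]


def split_train_test(rows, train_fraction=0.7, seed=0):
    """Shuffle a copy of the rows and split it into training and test sets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


samples = [[1.0, None], [3.0, 4.0], [None, 8.0], [5.0, 6.0]]
filled = fill_missing_with_mean(samples)
train, test = split_train_test(filled * 25)  # 100 rows -> 70 / 30
print(filled[0], len(train), len(test))
```

Note that this sketch computes the means over all samples before splitting, as S32 describes; the means therefore also reflect the samples that later end up in the test set.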

The technical scheme of the invention is further improved as follows: step S4 is used for training out a neural network capable of calculating the vulnerability score, and the specific process is as follows,

s41, inputting a training data set into a BP neural network for training, wherein the size of each batch of processing is 256 samples;

s42, after training for 5000 batches, reducing the learning rate to 0.005;

s43, stopping iteration when the loss function value loss is less than or equal to 0.1.

The technical scheme of the invention is further improved as follows: in step S5, a real-time score is obtained by inputting any one of the data (BVC data) into the BP neural network.

The invention also discloses a vulnerability scoring model of the operating system based on the back propagation neural network, which is used for realizing the method and comprises the following modules:

Data collection module: comprises an operating system CVE (Common Vulnerabilities and Exposures) platform and a vulnerability tracking system. The operating system CVE platform is used to acquire basic metric group data, and the vulnerability tracking system is used to acquire time metric group data and environment metric group data, thereby implementing step S1 of the vulnerability scoring method;

Neural network scoring module: comprises a data preprocessing module, a neural network module and a CVE scoring module connected in sequence for information exchange, which implement steps S2-S5 of the vulnerability scoring method.

The data collection module further comprises a data transmission module, the neural network scoring module further comprises a data collection module, and the data collection module is in information connection with the data transmission module.

Due to the adoption of the technical scheme, the beneficial effects obtained by the invention are as follows:

1. evaluating vulnerability risks by using a back propagation neural network, and providing a nonlinear scoring modeling scheme; the existing linear scoring method is expanded to a nonlinear scoring scheme, and the nonlinear relation among each measurement index is fully considered.

2. A vulnerability scoring scheme customized for the operating system is provided. The scheme takes into account the indexes of the basic metric group, the time metric group and the environment metric group, and is both timely and individualized. Timeliness means that the time metric group information is updated in real time, so that a score incorporating this information can be provided for operating systems with high timeliness requirements. Individualization means that the scheme can analyze the values of the environment metric group indexes for a specific operating system and provide a score that incorporates the environment metric group information.

3. A scoring scheme is provided that can still provide predictive scoring when vulnerability information is insufficient (e.g., zero-day vulnerability) for security administrators to reference and decide whether to take corresponding mitigation measures.

4. In the process of collecting training data, a method of complementing missing data is provided.

5. The invention fills the gap left by existing CVSS-based scoring methods, which cannot score vulnerabilities for a specific operating system, and extends the existing linear scoring scheme to a nonlinear one. The neural network scoring model produced by this process is reusable on the operating system platform and supports secondary development under the operating system.

Drawings

FIG. 1 is a general flow chart of the method of the present invention;

FIG. 2 is a flow chart of a method of determining a scoring index system according to the present invention;

FIG. 3 is a schematic diagram of a BP neural network topology constructed by the method of the present invention;

FIG. 4 is a flow chart of preprocessing data in the method of the present invention;

FIG. 5 is a diagram showing the structural relationship of the parts of the model of the present invention.

Detailed Description

The present invention will be described in further detail with reference to examples.

The invention discloses a vulnerability scoring model and method for an operating system based on a back propagation neural network.

Neural network: is a network structure model made up of many artificial neurons, the strength of the connection between which is a learnable parameter, commonly used to deal with artificial intelligence problems, e.g. reasoning, decision.

Back propagation neural network: the BP (Back Propagation) neural network is a multi-layer feedforward neural network and consists of two learning stages of signal forward propagation and error reverse propagation. Given the training samples, the BP neural network aims to learn parameters that minimize the mean square error loss function.
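The two learning stages can be illustrated with a single linear neuron trained on one sample. This is a deliberately tiny sketch, not the network of the invention; the function name `train_step`, the sample values and the learning rate of 0.1 are assumptions chosen so that the arithmetic is easy to follow.

```python
def train_step(w, b, x, y, lr=0.1):
    # Forward propagation: compute the prediction and the squared error.
    y_hat = w * x + b
    loss = (y - y_hat) ** 2
    # Error back propagation: gradients of the loss with respect to w and b.
    grad_w = -2.0 * (y - y_hat) * x
    grad_b = -2.0 * (y - y_hat)
    # Gradient descent update of the learnable parameters.
    return w - lr * grad_w, b - lr * grad_b, loss


w, b = 0.0, 0.0
losses = []
for _ in range(20):
    w, b, loss = train_step(w, b, x=1.0, y=2.0)
    losses.append(loss)
print(losses[0], losses[-1])
```

With these values the prediction error shrinks by a constant factor at every step, so the recorded loss decreases monotonically toward zero.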

Common Vulnerability Scoring System: abbreviated CVSS, a free and open industry standard and the most widely used standard for quantifying the severity of security vulnerabilities. A CVSS score includes a basic metric group score, a time metric group score and an environment metric group score, which consider, respectively, the fundamental characteristics of the vulnerability, the availability of mitigation measures, and the breadth of vulnerable systems in an organization.

Linear model: the most widely used model in machine learning; it predicts by a linear combination of sample features. Given a D-dimensional sample $x = [x_1, \dots, x_D]^T$, the linear combination function is:

$$f(x; w) = w_1 x_1 + w_2 x_2 + \dots + w_D x_D + b$$

where $w = [w_1, \dots, w_D]^T$ is the D-dimensional weight vector and $b$ is the bias.

Nonlinear model: a generalized nonlinear model can be written as a linear combination of multiple nonlinear basis functions $\phi(x)$:

$$f(x; \theta) = w^T \phi(x) + b,$$

where $\phi(x) = [\phi_1(x), \phi_2(x), \dots, \phi_K(x)]^T$ is a vector of K nonlinear basis functions and the parameter $\theta$ contains the weight vector $w$ and the bias $b$.
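The difference between the two definitions can be made concrete with a short Python sketch. The two basis functions used here (a product of two features and a square) are hypothetical examples, as are the weights; a neural network learns its nonlinear features rather than fixing them in advance.

```python
def linear_model(x, w, b):
    """f(x; w) = w_1 x_1 + ... + w_D x_D + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def basis_model(x, w, b, basis):
    """f(x; theta) = w^T phi(x) + b, with phi given as K functions."""
    phi = [phi_k(x) for phi_k in basis]
    return linear_model(phi, w, b)


x = [2.0, 3.0]
print(linear_model(x, w=[0.5, 1.0], b=1.0))  # 5.0

# Hypothetical basis: a product term and a square, neither linear in x.
basis = [lambda v: v[0] * v[1], lambda v: v[0] ** 2]
print(basis_model(x, w=[1.0, 0.5], b=0.0, basis=basis))  # 8.0
```

The product term lets the effect of one metric depend on the value of another, which a purely linear formula with constant weights cannot express.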

Vulnerability tracking system: is a database that gathers defects. It allows the user to report the bugs of the software and thus pass them on to the appropriate developer. The developer can use the vulnerability tracking system to maintain a priority table of things to do, as well as schedule and tracking dependencies.

The following detailed description is provided with reference to the accompanying drawings.

The invention discloses a vulnerability scoring method for an operating system based on a back propagation neural network. Its general flow, shown in FIG. 1, includes:

1. determining a scoring index system and collecting data;

2. constructing a neural network model;

3. preprocessing data;

4. training a neural network;

5. a CVE score is obtained.

The present application will be described in detail with reference to specific examples.

Specific examples are as follows.

Step 1: determining a scoring index system and collecting data:

This step determines the operating-system-based vulnerability scoring index system and collects data. As shown in FIG. 2, it comprises the following detailed steps:

(1) Determining a metric index system: the metric index system designed for this scoring scheme is based on CVSS3.0, with some indexes adapted to the scoring requirements of an operating system. There are three metric groups: a basic metric group, a time metric group, and an environment metric group.

The basic metric group reflects the inherent characteristics of a vulnerability, which do not change over time or with the user's environment. It consists of two sets of indexes: exploitability indexes and impact indexes;

The time metric group reflects the characteristics of a vulnerability that may change over time but not across user environments. For example, the availability of an easy-to-use exploit kit increases the vulnerability risk score, while the release of an official patch reduces it. Unlike the CVSS3.0 standard, the time metric group here adds an unrepaired duration index: when a user clicks the vulnerability scoring query module and the vulnerability has not yet been repaired, the index is the time, in days, from the release of the vulnerability to the moment of the click. If the vulnerability has been repaired, the unrepaired duration is 0;

The environment metric group describes the characteristics of a vulnerability that are relevant and unique to a particular user environment; these typically take different values as the user environment changes.

(2) Collecting basic metric group data: the basic metric group data comes from two sources: data provided by a security product vendor or application vendor when a vulnerability bulletin is issued, and analysis reports from the operating system vendor's vulnerability analysis team. The operating system vendor collects and organizes both and publishes them on its CVE platform. For example, the operating system vendor Kylin Software has a public CVE publishing platform. A basic metric group data crawler is run to crawl the basic metric group data about vulnerabilities from the operating system CVE platform.

(3) Collecting time metric group data: the operating system vendor maintains vulnerability information using a vulnerability tracking system. For example, Kylin Software provides the ZenTao vulnerability management software. The platform updates the repair status of each vulnerability in real time, so the time metric group data collected from it is also updated in real time.

(4) Collecting environmental metric group data: since the user is most capable of evaluating the potential impact of a vulnerability in his own environment, the user interface provided in the vulnerability tracking system may be filled by the user with data of the vulnerability in his own environment.

The indices of the basic metric set are shown in table 1 below:

TABLE 1

The indicators of the time metric sets are shown in table 2 below:

TABLE 2

The metrics of the set of environmental metrics are shown in table 3 below:

TABLE 3

Step 2: building a neural network model:

This step constructs the neural network model for scoring security vulnerabilities on the operating system platform. The topology of the constructed BP neural network is shown in FIG. 3.

The construction process comprises a plurality of detailed steps, and the specific flow is as follows:

(1) Designing a network structure: the BP neural network is constructed using the open-source Keras machine learning framework on top of TensorFlow. A neural network with 1 input layer, 2 hidden layers and 1 output layer is constructed. The input layer has 14 nodes, one for each of the 14 features in the data of the 3 metric groups collected in step 1. The first hidden layer has 14 nodes, the second hidden layer has 8 nodes, and the output layer, which outputs a single score, has 1 node. This embodiment uses two hidden layers, but 3 or more hidden layers may be used as needed. With three hidden layers, for example, the first has 14 nodes, the second 9 nodes, and the third 6 nodes; that is, the number of nodes decreases from one hidden layer to the next.

(2) Setting an activation function: the ReLU activation function is used in the hidden layer and the output layer of the BP network.

(3) Setting a learning rate as follows: the initial learning rate was 0.01.

(4) Setting an optimizer: the Adam optimizer. Adam is an adaptive learning rate optimization algorithm that updates different parameters with different learning rates. Its strong performance has been widely confirmed in prior research.

The loss function is set to the mean square error loss function, i.e., the mean of the squared differences between the label values and the predicted values.

(5) Training scale: batch size 256.
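To make the 14-14-8-1 topology concrete, the sketch below implements the forward pass in plain Python with ReLU on both hidden layers and the output layer. It is a framework-free illustration, not the Keras implementation the embodiment refers to; the random weight initialization, the seed, and all function names are assumptions, and no training is performed here.

```python
import random


def relu(v):
    return [max(0.0, a) for a in v]


def dense(x, weights, biases):
    """One fully connected layer: weights has one row per output node."""
    return [sum(w * a for w, a in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def init_layer(n_in, n_out, rng):
    # Assumption: small uniform random weights and zero biases.
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


LAYER_SIZES = [14, 14, 8, 1]  # input, hidden 1, hidden 2, output
rng = random.Random(0)
layers = [init_layer(n_in, n_out, rng)
          for n_in, n_out in zip(LAYER_SIZES, LAYER_SIZES[1:])]


def forward(features):
    out = features
    for weights, biases in layers:
        out = relu(dense(out, weights, biases))  # ReLU on hidden and output
    return out[0]


n_params = sum(len(w) * len(w[0]) + len(b) for w, b in layers)
print(n_params, forward([0.5] * 14))
```

The network has 339 trainable parameters (210 + 120 + 9), and because ReLU is also applied at the output, the predicted score can never be negative.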

Step 3: preprocessing data:

this step formats the collected data of different formats, including a plurality of detailed steps, and a flowchart of preprocessing the data is shown in fig. 4.

The specific flow is as follows:

(1) Formatting the processed data: 10000 CVE sample data are collected according to the measurement index system determined in the step 1, and each CVE sample corresponds to a group of characteristics containing 14 numerical values.

(2) Complement missing data: for the case of partial value missing at the time of collection, the missing value is filled with the average of 10000 CVEs under the feature.

(3) Selecting a training data set: 7000 samples in 10000 sets of data are randomly selected as training data sets, and 3000 samples are selected as test data sets.

(4) Manually scoring the training set: the training set is scored by an expert. The basic metric group score is denoted base_score, the time metric group score time_score, the environment metric group score environmental_score, and the final score risk_score. The score of each of the three metric groups is calculated separately, using the same formulas as the CVSS3.0 standard. The expert scoring formula is risk_score = base_score*0.4 + time_score*0.2 + environmental_score*0.4.
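As a worked example of the expert scoring formula, the snippet below combines three group scores into the label used for training. The input values are hypothetical scores on the CVSS 0-10 scale, not data from the patent.

```python
def risk_score(base_score, time_score, environmental_score):
    """Weighted combination of the three group scores (weights sum to 1)."""
    return (base_score * 0.4
            + time_score * 0.2
            + environmental_score * 0.4)


# Hypothetical group scores: 7.5*0.4 + 6.0*0.2 + 8.0*0.4 = 3.0 + 1.2 + 3.2
print(round(risk_score(7.5, 6.0, 8.0), 2))  # 7.4
```

Since the weights sum to 1, the label stays within the same 0-10 range as the three group scores.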

Step 4: training a neural network:

the step is to train out a neural network capable of calculating the vulnerability score, and comprises a plurality of detailed steps, wherein the specific flow is as follows:

(1) The training data set was input into a neural network for training, with each batch being 256 samples in size.

(2) After 5000 batches of training, the learning rate was reduced to 0.005.

(3) And stopping iteration when the loss function value loss is less than or equal to 0.1.
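The control flow of these three steps, a learning rate that drops from 0.01 to 0.005 after 5000 batches and a stop once the loss reaches 0.1, can be sketched as follows. The names `learning_rate`, `train` and `fake_batch`, the `max_batches` safety limit, and the stand-in loss curve are assumptions; a real run would replace `fake_batch` with one optimizer step on 256 samples.

```python
from typing import Callable

INITIAL_LR = 0.01
REDUCED_LR = 0.005
DECAY_AFTER_BATCHES = 5000
LOSS_THRESHOLD = 0.1


def learning_rate(batches_done: int) -> float:
    """Step schedule: 0.01 for the first 5000 batches, then 0.005."""
    return INITIAL_LR if batches_done < DECAY_AFTER_BATCHES else REDUCED_LR


def train(train_on_batch: Callable[[float], float],
          max_batches: int = 100_000) -> int:
    """Run batches until the loss is <= 0.1; return the batches used."""
    for batches_done in range(max_batches):
        loss = train_on_batch(learning_rate(batches_done))
        if loss <= LOSS_THRESHOLD:
            return batches_done + 1
    return max_batches


# Stand-in for a real training step: the loss simply shrinks each batch.
state = {"loss": 5.0}


def fake_batch(lr: float) -> float:
    state["loss"] *= 1.0 - lr
    return state["loss"]


print(train(fake_batch))
```

The `max_batches` limit is an addition of this sketch: the source states only the loss threshold as the stopping condition, which would never terminate if the loss plateaued above 0.1.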

Step 5: obtaining a CVE score:

this step will obtain the score of the CVE by inputting CVE metric set data, i.e. inputting certain CVE data directly into the neural network model, a real-time score will be obtained.

The BP-neural-network-based vulnerability scoring method provided by this patent has the following characteristics:

1. The existing linear scoring method is extended to a nonlinear scoring scheme that fully considers the nonlinear relationships among the metrics.

2. A customized vulnerability scoring scheme is proposed for the operating system.

1) Timeliness

The scoring method provided by the patent updates the time measurement set information in real time, and can provide scoring of the fusion time measurement set information for an operating system with higher timeliness requirements.

2) Personalisation

The scoring scheme provided by the patent can analyze the values of the indexes of the environment measurement group aiming at a specific operating system and provide scoring fused with the information of the environment measurement group.

3. The scoring scheme provided by this patent can evaluate the risk of a vulnerability even when information about it is insufficient (e.g., a zero-day vulnerability), and provides a predictive score that security administrators can consult when deciding whether to take mitigation measures.

The patent also provides a vulnerability scoring model of an operating system based on a neural network, and the structural relationship of each part is shown in figure 5 and comprises the following parts:

1. Data collection module: comprises an operating system CVE platform and a vulnerability tracking system. The operating system CVE platform is used to collect basic metric group data. The vulnerability tracking system is used to maintain vulnerability information and to collect time metric group data and environment metric group data through the user interface it provides. The collected basic, time and environment metric group data are then transmitted to the neural network scoring module through a data transmission module.

2. Neural network scoring module: receives the transmitted data through its data collection module and then performs data preprocessing, neural network training and score calculation in sequence. Inputting the data of a given CVE into the neural network model yields a real-time CVE score.
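One way to picture the hand-off between the two modules is a join of the records from the CVE platform and the vulnerability tracking system on the CVE identifier. The sketch below is an assumption about how such a merge could look; the function name, the record layout, the identifiers and the field names are all placeholders, since the patent does not describe the transmitted data format.

```python
def merge_metric_groups(cve_platform, tracker):
    """Join per-CVE records from the two sources on the CVE identifier."""
    merged = {}
    for cve_id, base in cve_platform.items():
        tracked = tracker.get(cve_id, {})
        merged[cve_id] = {
            "base": dict(base),
            # A CVE missing from the tracker gets empty groups, to be
            # filled later by the missing-data step of preprocessing.
            "time": dict(tracked.get("time", {})),
            "environment": dict(tracked.get("environment", {})),
        }
    return merged


# Hypothetical records; identifiers and field names are placeholders.
cve_platform = {"CVE-A": {"attack_complexity": 0.77},
                "CVE-B": {"attack_complexity": 0.44}}
tracker = {"CVE-A": {"time": {"unrepaired_days": 8},
                     "environment": {"confidentiality_req": 1.0}}}

records = merge_metric_groups(cve_platform, tracker)
print(records["CVE-A"]["time"], records["CVE-B"]["time"])
```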

The full English name of CVE is "Common Vulnerabilities & Exposures". CVE works like a dictionary, giving a common name to each widely recognized information security vulnerability or exposed weakness. Using a common name helps users share data across otherwise independent vulnerability databases and vulnerability assessment tools.

Claims

1. A vulnerability scoring method for an operating system based on a back propagation neural network, characterized by comprising the following steps:

s1, determining a scoring index system and collecting data;

s2, constructing a neural network model;

s3, preprocessing data;

s4, training a neural network;

s5, obtaining a CVE score; the specific flow of step S1 is as follows,

s11, determining a measurement index system, wherein the measurement index system is constructed based on CVSS3.0 and comprises three measurement groups, namely a basic measurement group, a time measurement group and an environment measurement group, and simultaneously adding a characteristic named unrepaired duration into the time measurement group according to the grading requirement of an operating system, wherein the characteristic represents that when a user inquires about a vulnerability grading, if the vulnerability is not repaired yet, the time from the vulnerability release to the clicking moment is counted; if the vulnerability is repaired, the value of the unrepaired time length is 0;

s12, collecting basic measurement set data; the source of the basic measurement group data is data provided by a security product provider or an application program provider when a vulnerability bulletin is issued and an analysis report of a vulnerability analysis team of an operating system provider, and the two parts of data are collected, tidied and issued on a CVE platform by the operating system provider;

s13, collecting time measurement group data; the time measurement group data is used for maintaining vulnerability information for an operating system provider by using a vulnerability tracking system;

s14, collecting environment measurement group data; the environment measurement group data is vulnerability data in the own environment filled by the user;

the specific flow of the step S2 is as follows;

s23, setting a learning rate: the initial learning rate is 0.01;

s24, setting an optimizer and a loss function: the optimizer is an Adam optimizer, and the loss function is a mean square error loss function;

s25, training scale: the batch size is 256 or more;

step S3 is used for formatting the collected data with different formats, and the specific flow is as follows,

s31, formatting processing data: collecting a plurality of CVE sample data according to the measurement index system determined in the step S11, wherein each CVE sample data corresponds to a group of data characteristics comprising three types of measurement groups;

s32, complement missing data: for the situation that partial value is missing in the collection, filling the missing value by using the average value of all CVE sample data under the data characteristic;

risk_score = base_score*0.4 + time_score*0.2 + environmental_score*0.4.

2. the method for scoring vulnerabilities of an operating system based on a back propagation neural network according to claim 1, wherein: the number of training data sets accounts for more than 60% of the CVE sample data.

3. The method for scoring vulnerabilities of an operating system based on a back propagation neural network according to claim 2, wherein: step S4 is used for training out a neural network capable of calculating the vulnerability score, and the specific process is as follows,

s42, after training for 5000 batches, reducing the learning rate to 0.005;

4. A method for scoring vulnerabilities of a back-propagation neural network-based operating system according to claim 3, wherein: in step S5, a real-time score is obtained by inputting any one data into the BP neural network.

5. A vulnerability scoring system of an operating system based on a back propagation neural network for implementing the method of any one of claims 1 to 4, comprising the following modules:

6. The vulnerability scoring system of claim 5, wherein the vulnerability scoring system comprises: the data collection module further comprises a data transmission module, the neural network scoring module further comprises a data collection module, and the data collection module is in information connection with the data transmission module.