CN112016097B - Method for predicting network security vulnerability time to be utilized - Google Patents

Method for predicting network security vulnerability time to be utilized

Info

Publication number
CN112016097B
Authority
CN
China
Prior art keywords
time
network security
utilized
data
vulnerability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010889524.6A
Other languages
Chinese (zh)
Other versions
CN112016097A (en)
Inventor
殷娇
游明山
雷丽
安建梅
彭玲玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Hongyue Information Technology Co ltd
Original Assignee
Shenzhen Hongyue Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Hongyue Information Technology Co ltd
Priority to CN202010889524.6A
Publication of CN112016097A
Application granted
Publication of CN112016097B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577 Assessing vulnerabilities and evaluating computer system security
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133 Distances to prototypes
    • G06F18/24137 Distances to cluster centroïds
    • G06F18/2414 Smoothing the distance, e.g. radial basis function networks [RBFN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"

Abstract

The invention relates to the technical field of computer network security and discloses a method for predicting the exploitation time of network security vulnerabilities, comprising the following steps: raw data d^(t) related to a network security vulnerability acquired at time t is converted into feature data x^(t) through data preprocessing, feature extraction and feature selection; the feature data x^(t) is passed through the classifier model f^(t)(·) of time t to obtain a predicted value of the exploitation time of the vulnerability; the true exploitation-time label y^(t) corresponding to the vulnerability is acquired; a sliding-window imbalance factor algorithm calculates, for each exploitation-time class of the vulnerabilities in the current sliding window, the corresponding imbalance factor and class weight; and the classifier model f^(t)(·) is retrained and its parameters updated according to the feature data x^(t), the exploitation-time label y^(t) and the class weight, yielding the classifier model f^(t+1)(·) of time t+1. The technical scheme improves the performance of models that predict vulnerability exploitation time when the class distribution of the data is dynamically imbalanced.

Description

Method for predicting network security vulnerability time to be utilized
Technical Field
The invention relates to the technical field of computer network security, and in particular to a method for predicting the exploitation time of network security vulnerabilities.
Background
A network vulnerability is generally understood as a defect in the concrete implementation of hardware, software or a protocol, or in a system security policy, that allows an attacker to access or damage a system without authorization. Network vulnerabilities affect a wide range of software and hardware, including the system itself and its supporting software, network client and server software, network routers and security firewalls. Different security vulnerabilities exist across different types of software and hardware devices, across different versions of the same device, across different systems composed of different devices, and across different configurations of the same system.
Vulnerabilities are closely tied to time. From the day a system is released, vulnerabilities in it are continually exposed as it is used, and those found earlier are continually repaired by patches released by the system vendor or corrected in later versions. A new version corrects vulnerabilities in the old version but also introduces new vulnerabilities and errors. Thus, over time, old vulnerabilities are patched and new ones keep appearing.
Because human resources such as software developers, network security experts and system maintainers are limited, as are funds and technology, it is impossible to repair every vulnerability in time as their number grows; only the vulnerabilities most likely to be attacked, and attacked first, can be selected for repair. Predicting when a network security vulnerability is likely to be exploited is therefore very important to vulnerability management: it helps decision makers identify the vulnerabilities likely to be attacked earliest, provide patches and other fixes for them first, and minimize the resulting losses.
One challenge in predicting vulnerability exploitation time is that the statistics of vulnerability data drift dynamically: the functional relationship y = f(x) between the features x of the data and the exploitation-time label y changes over time as new samples appear. The exploitation time of future vulnerabilities therefore cannot be predicted by relying only on a static model trained on fixed historical data. Instead, online learning is used to continually retrain the prediction model and update its parameters with new vulnerability samples.
Another challenge in predicting vulnerability exploitation time is dynamic class imbalance. Let x^(t) be an n-dimensional feature vector in the feature space X acquired at time t, and let y^(t) ∈ {c^[1], c^[2], …, c^[N]} be the label corresponding to the feature data x^(t), where c^[1], c^[2], …, c^[N] are the N classes of the classification problem and N ≥ 2. The pair (x^(t), y^(t)) is called a labeled sample. Dynamic class imbalance means that the classes c^[1], c^[2], …, c^[N] account for different proportions of the total number of samples and that each proportion changes dynamically over time. For the problem of predicting vulnerability exploitation time, the exploitation-time labels include, but are not limited to, y ∈ {c^[1] = 'before public disclosure of the vulnerability', c^[2] = 'on the day of public disclosure', c^[3] = 'within one month after public disclosure', c^[4] = 'one year after public disclosure', c^[5] = 'never exploited'}. Real historical data show that the proportions of samples belonging to the classes c^[k] (k = 1, 2, …, N) are imbalanced: some classes have many samples and others few, and the imbalance changes dynamically over time. Traditional data-driven classifier models work well only on data with balanced samples. When the samples are imbalanced, the classes with more samples must be downsampled, or the classes with fewer samples oversampled, according to the imbalance, in order to balance the samples.
One existing solution to sample imbalance is shown in formula (3).
The method of formula (3) calculates the proportion of each class among all samples up to the current time t, that is, a global imbalance factor for each class. Because of the influence of old data, an imbalance factor calculated this way is insensitive to the imbalance present in new data. When the imbalance of the samples changes dynamically, the method cannot respond to the latest imbalance in time.
To reduce the influence of old data, an improved algorithm introduces a time delay coefficient θ on the basis of formula (3) and calculates the imbalance factor according to formula (4).
The calculation in formula (4) assumes that the time delay coefficient θ is the same at all times; it does not fundamentally solve the problem of a dynamically changing imbalance and still cannot capture, in real time, the imbalance of the samples in the most recent period.
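The two prior-art calculations can be contrasted with a small sketch. Formulas (3) and (4) are not reproduced in this text, so both functions are assumptions: `global_factors` follows the verbal description of formula (3) (proportion of each class among all samples so far), while `decayed_factors` uses an exponentially decayed update with a fixed coefficient `theta`, a form common in the online class-imbalance literature that may differ from the source's formula (4).

```python
def global_factors(labels, classes):
    """Proportion of each class among all labels seen so far."""
    counts = {c: 0 for c in classes}
    for y in labels:
        counts[y] += 1
    return {c: counts[c] / len(labels) for c in classes}


def decayed_factors(labels, classes, theta):
    """Assumed decayed proportions: w = theta * w + (1 - theta) * indicator."""
    w = {c: 0.0 for c in classes}
    for y in labels:
        for c in classes:
            w[c] = theta * w[c] + (1.0 - theta) * (1.0 if y == c else 0.0)
    return w


# Synthetic stream: 90 old 'Neg' samples followed by 10 recent 'Pos' samples.
labels = ["Neg"] * 90 + ["Pos"] * 10
classes = ["Neg", "Pos"]
print(global_factors(labels, classes))
print(decayed_factors(labels, classes, theta=0.9))
```

On this synthetic stream the global proportion of 'Pos' stays low even though every recent sample is 'Pos', which is the insensitivity described above; the decayed variant reacts faster, but only at the single rate fixed by `theta`.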
Disclosure of Invention
To solve the technical problem that, in predicting the exploitation time of network security vulnerabilities, the performance of the prediction model gradually deteriorates over time because of concept drift and the dynamic imbalance of sample data, the invention provides a method for predicting the exploitation time of network security vulnerabilities.
The basic scheme of the invention is as follows:
A method for predicting the exploitation time of network security vulnerabilities comprises: step S1, acquiring raw data d^(t) related to a network security vulnerability at time t, where t = 1, 2, 3, …; and step S2, sequentially performing data preprocessing, feature extraction and feature selection on the raw data d^(t) acquired at time t to obtain feature data x^(t). The method further comprises the following steps:
Step S3, passing the feature data x^(t) through the classifier model f^(t)(·) of time t to obtain a predicted value of the exploitation time of the vulnerability acquired at time t, for use by downstream applications, where the predicted value is the output of f^(t)(·) applied to x^(t);
Step S4, acquiring the true exploitation-time label y^(t) corresponding to the network security vulnerability at time t, where y^(t) ∈ {c^[1], c^[2], …, c^[k], …, c^[N]}; c^[1], c^[2], …, c^[k], …, c^[N] denote the N classes of vulnerability exploitation time, and N is the total number of classes, with N ≥ 2;
Step S5, calculating, through a sliding-window imbalance factor algorithm, the imbalance factor and the class weight corresponding to each exploitation-time class of the vulnerabilities in the current sliding window, where k = 1, 2, …, N;
Step S6, retraining the current classifier model f^(t)(·) and updating its parameters according to the feature data x^(t), the exploitation-time label y^(t) and the class weight of the vulnerability at time t, to obtain the classifier model f^(t+1)(·) of time t+1.
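The order of steps S3 to S6 (predict first, then receive the label, then compute the weight, then update) can be sketched as a loop. Everything concrete here is an assumption made for illustration: the scalar features, the `WeightedCentroidModel` toy classifier, the inverse-proportion weight `1 / rho`, and the choice to divide by the number of samples seen so far while the window is still filling.

```python
from collections import Counter, deque


class WeightedCentroidModel:
    """Toy classifier (assumption): one weighted running-mean centroid per class."""

    def __init__(self, classes):
        self.sums = {c: 0.0 for c in classes}
        self.totals = {c: 0.0 for c in classes}

    def predict(self, x):
        seen = [c for c in self.sums if self.totals[c] > 0]
        if not seen:
            return next(iter(self.sums))  # cold start: arbitrary first class
        return min(seen, key=lambda c: abs(x - self.sums[c] / self.totals[c]))

    def update(self, x, y, weight):
        self.sums[y] += weight * x
        self.totals[y] += weight


def run_online(stream, model, z):
    """Apply steps S3 to S6 to each (feature, label) pair in arrival order."""
    window = deque(maxlen=z)
    predictions = []
    for x, y in stream:
        predictions.append(model.predict(x))    # S3: predict before the label is known
        window.append(y)                        # S4: the true label arrives
        rho = Counter(window)[y] / len(window)  # S5: windowed proportion of class y
        weight = 1.0 / rho                      # S5: assumed inverse-proportion weight
        model.update(x, y, weight)              # S6: weighted update gives f^(t+1)
    return predictions


stream = [(0.1, "Neg"), (0.2, "Neg"), (0.9, "Pos"), (0.15, "Neg"), (0.8, "Pos")]
preds = run_online(stream, WeightedCentroidModel(["Neg", "Pos"]), z=4)
print(preds)
```

The first 'Pos' sample is misclassified because the model has not yet seen that class; once it has been absorbed with a large weight, the later 'Pos' sample is predicted correctly.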
The basic scheme has the following beneficial effects: 1. Through online learning, the technical scheme trains the classifier model on the raw data of historical network security vulnerabilities and their exploitation-time labels, and uses it to predict the exploitation time of the vulnerability acquired at the current moment. The prediction gives the approximate time range in which that vulnerability will be exploited, providing network security experts, system developers and maintainers with decision support for allocating time, funds, personnel and other resources.
2. Through the sliding-window imbalance factor algorithm, the technical scheme calculates the imbalance factor of each exploitation-time class of the vulnerabilities in the most recent sliding window, so the dynamic class imbalance in the exploitation-time prediction problem can be tracked in real time. From the tracked imbalance, the scheme further calculates a class weight for each class with the same algorithm and uses the class weight to control the retraining and parameter updating of the classifier model, which effectively mitigates the loss of prediction performance caused by dynamic class imbalance.
3. Whenever a labeled vulnerability sample is obtained, the technical scheme retrains the classifier model and updates its parameters according to the feature data x^(t), the exploitation-time label y^(t) and the class weight of the vulnerability, realizing online learning of the classifier model. Compared with the static prediction models of the prior art, which are trained on fixed historical data, the scheme effectively mitigates the gradual deterioration of prediction performance over time caused by concept drift.
Further, in step S5, the imbalance factor is calculated by formula (1),
where z is the total number of samples contained in the current sliding window, with z ≥ N; c^[k] (k = 1, 2, …, N) is the k-th class of vulnerability exploitation time; the imbalance factor is the ratio of the number of samples in the current sliding window that belong to class c^[k] to the total number of samples z in the window; and [(x^(t), c^[k])] = 1 when the exploitation time of the vulnerability corresponding to feature data x^(t) belongs to class c^[k], and [(x^(t), c^[k])] = 0 otherwise.
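Formula (1) itself is not reproduced in this text. From the definitions above it can plausibly be reconstructed as follows; the symbol \rho_k^{(t)} for the imbalance factor and the summation index range (the z most recent samples ending at time t) are assumptions, since the original notation is not available here.

```latex
\rho_k^{(t)} = \frac{1}{z} \sum_{i=t-z+1}^{t} \left[ \left( x^{(i)}, c^{[k]} \right) \right],
\qquad k = 1, 2, \dots, N
```

Under this reading the factors of the N classes sum to 1 over a full window. How the source treats the start-up phase, when fewer than z samples have been seen, is not stated.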
Beneficial effects: compared with the sample-imbalance solutions of formulas (3) and (4), the method tracks in real time the imbalance factor of each class over the most recent z samples (z ≥ N) in the sliding window. The sensitivity of the classifier model to time can be tuned through the window size z: the smaller z is, the more sensitive the factor is to time and the better it reflects the real-time imbalance of each class; the larger z is, the less sensitive it is to time and the more it reflects the average imbalance of each class over a relatively long period.
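A minimal sketch of the sliding-window calculation, comparing a small and a large window on the same label stream. The class name `SlidingWindowImbalance` is hypothetical, the labels 'Neg', 'ZeroDay' and 'Pos' are the class names used in the embodiment, and dividing by the number of samples seen so far while the window is still filling is an assumption.

```python
from collections import Counter, deque


class SlidingWindowImbalance:
    """Tracks per-class proportions over the most recent z labels."""

    def __init__(self, classes, z):
        if z < len(classes):
            raise ValueError("window size z must be at least the number of classes N")
        self.classes = list(classes)
        self.window = deque(maxlen=z)  # the oldest label is dropped automatically

    def update(self, label):
        """Add the newest label and return the imbalance factor of every class."""
        self.window.append(label)
        counts = Counter(self.window)
        n = len(self.window)  # assumption: partial window before z samples arrive
        return {c: counts[c] / n for c in self.classes}


labels = ["Neg"] * 6 + ["Pos"] * 4
small = SlidingWindowImbalance(["Neg", "ZeroDay", "Pos"], z=4)
large = SlidingWindowImbalance(["Neg", "ZeroDay", "Pos"], z=10)
for y in labels:
    rho_small = small.update(y)
    rho_large = large.update(y)
print(rho_small)  # computed over the last 4 labels only
print(rho_large)  # computed over all 10 labels
```

With z = 4 the window contains only the recent 'Pos' labels, while with z = 10 the older 'Neg' labels still dominate, which illustrates the trade-off between responsiveness and smoothing.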
Further, in step S5, the class weight is calculated by formula (2),
where the imbalance factor is the one corresponding to class c^[k].
Because the current sample (x^(t), y^(t)) itself belongs to class c^[k], the proportion (imbalance factor) of class c^[k] is necessarily greater than 0.
Beneficial effects: for a class with a small sample proportion, the imbalance factor is small and the class weight calculated by this method is large; likewise, for a class with a large sample proportion, the imbalance factor is large and the class weight is small. Using the class weight to adjust the retraining of the classifier model strengthens the data of classes with few samples and weakens the data of classes with many samples, thereby balancing the sample data and improving the performance of the classifier model.
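Formula (2) is not reproduced in this text; the source states only that the weight grows as the imbalance factor shrinks and that the factor is greater than 0. The sketch below therefore uses an assumed inverse-proportion weight, scaled by the number of classes so that a perfectly balanced window gives every class a weight of 1. It should not be read as the source's formula.

```python
def class_weight(rho_k, num_classes):
    """Assumed inverse-proportion weight; not the source's formula (2)."""
    if rho_k <= 0:
        raise ValueError("imbalance factor must be greater than 0")
    return 1.0 / (num_classes * rho_k)


# Static proportions of the three classes reported in the embodiment.
rho = {"Neg": 0.7773, "ZeroDay": 0.0563, "Pos": 0.1664}
weights = {c: class_weight(r, len(rho)) for c, r in rho.items()}
for c, w in weights.items():
    print(f"{c}: {w:.3f}")
```

The rarest class ('ZeroDay') receives the largest weight and the most common class ('Neg') the smallest, matching the qualitative behaviour described above.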
Further, the value of z in formula (1) can be optimized by any hyperparameter determination method, including random search, grid search and Bayesian optimization.
Beneficial effects: the window size z tunes the sensitivity of the classifier model to time. The smaller z is, the more sensitive the model is to time and the better it reflects the real-time imbalance of each class; the larger z is, the less sensitive it is to time and the more it reflects the average imbalance of each class over a relatively long period. Determining the value of z by hyperparameter optimization lets users select the value best suited to their engineering application according to their own data.
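Grid search over z can be sketched as follows. The objective `window_majority_accuracy` is a hypothetical stand-in: a real application would train the full pipeline for each candidate z and score it on held-out data, whereas this toy objective only predicts each label as the majority class of the preceding window.

```python
from collections import Counter, deque


def window_majority_accuracy(labels, z):
    """Toy objective (assumption): predict each label as the majority class
    of the previous z labels and return the fraction predicted correctly."""
    window = deque(maxlen=z)
    correct = 0
    scored = 0
    for y in labels:
        if window:
            guess = Counter(window).most_common(1)[0][0]
            correct += guess == y
            scored += 1
        window.append(y)
    return correct / scored if scored else 0.0


def grid_search_z(labels, candidates):
    """Return the candidate z with the best score, plus all scores."""
    scores = {z: window_majority_accuracy(labels, z) for z in candidates}
    best = max(scores, key=scores.get)
    return best, scores


# Synthetic label stream whose majority class flips halfway through.
stream = ["Neg"] * 30 + ["Pos"] * 30
best_z, scores = grid_search_z(stream, candidates=[3, 10, 25])
print(best_z, scores)
```

On a stream with an abrupt change the smallest window wins because it adapts fastest; on noisier, slowly changing data a larger window may score better, which is why z is left as a tunable hyperparameter.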
Further, the data preprocessing in step S2 comprises any one or a combination of general algorithms including data deduplication, outlier detection, regularization, normalization, word segmentation and one-hot encoding.
Beneficial effects: the raw data can be preprocessed in many ways, which makes selection convenient and the method highly adaptable.
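Three of the listed preprocessing operations (deduplication, one-hot encoding and normalization) can be sketched in plain Python. The record layout, with an access-vector category and a base score, is a hypothetical example and not the source's data schema; min-max scaling is one common choice of normalization and is assumed here.

```python
def one_hot(value, categories):
    """Encode a categorical value as a 0/1 vector over a fixed category list."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == c else 0.0 for c in categories]


def min_max_normalize(values):
    """Scale a list of numbers linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # a constant column carries no information
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical records: (access vector, base score); field names are illustrative.
records = [("NETWORK", 7.5), ("LOCAL", 4.6), ("NETWORK", 10.0), ("LOCAL", 4.6)]
unique = list(dict.fromkeys(records))  # deduplicate while keeping order
vectors = ["NETWORK", "ADJACENT_NETWORK", "LOCAL"]
scores = min_max_normalize([score for _, score in unique])
features = [one_hot(v, vectors) + [s] for (v, _), s in zip(unique, scores)]
print(features)
```

Each resulting row has three one-hot entries followed by one normalized score, so categorical and numerical fields end up in a single numerical vector.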
Further, the raw data comprises a multi-dimensional combination of vulnerability data, including the vulnerability number, vulnerability description, vulnerability release time and security level score, or any single one of these dimensions.
Beneficial effects: the scheme applies to both single-dimensional and multi-dimensional raw data and therefore has a wide range of application.
Further, the feature extraction in step S2 uses a general manual feature extraction algorithm or an automatic feature extraction algorithm, chosen according to the form and content of the raw data.
Beneficial effects: features can be extracted from the raw data in many ways, which makes selection convenient.
Further, feature selection comprises principal component analysis, the correlation coefficient method and recursive feature elimination.
Beneficial effects: feature selection can be done in many ways, which makes it easy to choose according to the downstream application.
Further, the classifier models used to predict vulnerability exploitation time include, but are not limited to, fully connected neural networks, convolutional neural networks and recurrent neural networks; when t = 1, the parameters of f^(1)(·) are initialized randomly or from a known pre-trained model.
Beneficial effects: many classifier models are available, which makes it easy to choose according to the downstream application. When no historical data has accumulated, f^(1)(·) is initialized randomly, so the scheme can cold-start without historical data. When some historical data exists, initialization from a known pre-trained model lets the scheme make effective use of that data and improves the performance of the algorithm in its initial stage of operation.
Further, if the exploitation-time label y^(t) of time t has not yet been obtained when the raw data d^(t+1) of the vulnerability at time t+1 is acquired, steps S5 and S6 are skipped and f^(t+1)(·) = f^(t)(·) is used to predict the vulnerability exploitation time at time t+1.
Beneficial effects: in practice, the true exploitation-time label y^(t) becomes available only after the vulnerability corresponding to x^(t) has actually been exploited. When that vulnerability has not yet been exploited and a new vulnerability at time t+1 must be predicted, setting f^(t+1)(·) = f^(t)(·) ensures that the algorithm continues to operate.
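The skip rule can be sketched as a small wrapper around the update. The function names and the label-counting "model" are hypothetical stand-ins for steps S5 and S6; a missing label is represented by `None`, which is an assumption about how unavailability would be signalled.

```python
def next_model(model, x, y, retrain):
    """Return f^(t+1): the unchanged model when the label y is not yet available."""
    if y is None:
        return model  # skip S5 and S6, so f^(t+1) = f^(t)
    return retrain(model, x, y)


def count_retrain(model, x, y):
    """Hypothetical stand-in for S5 and S6: a 'model' that only counts labels."""
    updated = dict(model)
    updated[y] = updated.get(y, 0) + 1
    return updated


model = {}
stream = [([0.1], "Neg"), ([0.4], None), ([0.9], "Pos")]
for x, y in stream:
    model = next_model(model, x, y, count_retrain)
print(model)
```

The second sample has no label yet, so the model passes through unchanged and prediction can continue without interruption.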
Drawings
FIG. 1 is a flow chart of an embodiment of the method for predicting the exploitation time of network security vulnerabilities;
FIG. 2 is an experimental result diagram of an embodiment of the method for predicting the exploitation time of network security vulnerabilities;
FIG. 3 is an experimental result diagram of an embodiment of the method for predicting the exploitation time of network security vulnerabilities;
FIG. 4 is an experimental result diagram of an embodiment of the method for predicting the exploitation time of network security vulnerabilities;
FIG. 5 is an experimental result diagram of an embodiment of the method for predicting the exploitation time of network security vulnerabilities;
FIG. 6 is an experimental result diagram of an embodiment of the method for predicting the exploitation time of network security vulnerabilities.
Detailed Description
The following is a further detailed description of the embodiments:
Example
A method for predicting the exploitation time of network security vulnerabilities, as shown in FIG. 1, comprises:
Step S1, acquiring raw data d^(t) related to a network security vulnerability at time t, where t = 1, 2, 3, …. The raw data d^(t) comprises a multi-dimensional combination of vulnerability data, such as the vulnerability number, vulnerability description, vulnerability release time and security level score, or any single one of these dimensions. In this embodiment, the raw data d^(t) comprises the vulnerability number, the vulnerability description, the vulnerability release time, and the metrics and scores provided by the Common Vulnerability Scoring System (CVSS) 2.0.
Step S2, sequentially performing data preprocessing, feature extraction and feature selection on the raw data d^(t) acquired at time t to obtain feature data x^(t). The data preprocessing comprises any one or a combination of general algorithms including data deduplication, outlier detection, regularization, normalization, word segmentation and one-hot encoding. Feature extraction uses a general manual feature extraction algorithm or an automatic feature extraction algorithm, chosen according to the form and content of the raw data. Feature selection comprises general feature selection algorithms such as principal component analysis, the correlation coefficient method and recursive feature elimination.
In this embodiment, step S2 specifically comprises the following steps:
Step S201, taking the vulnerability description as the first part of the raw data, extracting semantic features with a BERT deep learning model from natural language processing so that natural-language data such as the description is converted into high-dimensional numerical features, and then using principal component analysis as the feature selection method to reduce these to 10-dimensional feature data for the first part;
Step S202, taking the remaining multi-dimensional raw data, other than the vulnerability description, as the second part, converting non-numerical features into numerical features by one-hot encoding, normalizing the numerical features, and then using principal component analysis as the feature selection method to reduce these to 10-dimensional feature data for the second part;
Step S203, concatenating the 10-dimensional feature data of the first part with the 10-dimensional feature data of the second part into the 20-dimensional feature data x^(t).
Step S3, passing the feature data x^(t) through the classifier model f^(t)(·) of time t to obtain a predicted value of the exploitation time of the vulnerability acquired at time t. The prediction provides network security experts, system developers and maintainers with decision support for allocating time, funds, personnel and other resources. The classifier model f^(t)(·) used to predict vulnerability exploitation time can be any machine learning algorithm capable of handling classification problems, including but not limited to fully connected neural networks, convolutional neural networks and their variants, and recurrent neural networks and their variants. When t = 1, the parameters of f^(1)(·) can be initialized randomly or from a known pre-trained model. In this embodiment, the classifier model is a three-layer fully connected neural network in which the number of input-layer neurons equals the dimension of x^(t), namely 20, and the number of output-layer neurons equals the number of classes N, with N = 3 in this embodiment; f^(1)(·) is initialized randomly.
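The forward pass of such a network (20 inputs, one hidden layer, 3 outputs) can be sketched in plain Python. The source fixes only the input and output sizes and the random initialization; the hidden width of 16, the ReLU activation, the uniform initialization range and the softmax output are all assumptions made for illustration.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Random initialization: small uniform weights and zero biases."""
    limit = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-limit, limit) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense(x, layer):
    """Affine map: one output per row of the weight matrix."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, biases)]


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x, hidden_layer, output_layer):
    """Input layer -> hidden layer (ReLU, assumed) -> output layer (softmax)."""
    h = [max(0.0, v) for v in dense(x, hidden_layer)]
    return softmax(dense(h, output_layer))


rng = random.Random(0)
INPUT_DIM, HIDDEN_DIM, NUM_CLASSES = 20, 16, 3  # hidden width 16 is an assumption
hidden_layer = init_layer(INPUT_DIM, HIDDEN_DIM, rng)
output_layer = init_layer(HIDDEN_DIM, NUM_CLASSES, rng)
x = [rng.random() for _ in range(INPUT_DIM)]  # stand-in for a 20-dimensional x^(t)
probs = forward(x, hidden_layer, output_layer)
predicted_class = max(range(NUM_CLASSES), key=probs.__getitem__)
print(probs, predicted_class)
```

The index of the largest probability plays the role of the predicted exploitation-time class; with untrained random weights the prediction is of course arbitrary.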
Step S4, acquiring the true exploitation-time label y^(t) corresponding to the network security vulnerability at time t. The label y^(t) is the true class of the exploitation time of the vulnerability corresponding to the feature data x^(t), where y^(t) ∈ {c^[1], c^[2], …, c^[k], …, c^[N]}; c^[1], c^[2], …, c^[k], …, c^[N] denote the N classes of vulnerability exploitation time, N is the total number of classes with N ≥ 2, and the division into classes can be customized. In this embodiment, exploitation times are divided into 3 classes, i.e. N = 3, specifically y^(t) ∈ {c^[1] = 'before public disclosure of the vulnerability', c^[2] = 'on the day of public disclosure', c^[3] = 'after public disclosure'}.
Step S5, calculating, through the sliding-window imbalance factor algorithm, the imbalance factor and the class weight corresponding to each exploitation-time class of the vulnerabilities in the current sliding window, where k = 1, 2, …, N.
The imbalance factor is calculated by formula (1),
where z is the total number of samples in the current sliding window, with z ≥ N. The value of z is optimized by any hyperparameter determination method, including random search, grid search and Bayesian optimization; in this embodiment, grid search yields z = 50. c^[k] (k = 1, 2, …, N) is the k-th class of vulnerability exploitation time. The imbalance factor is the ratio of the number of samples in the current sliding window that belong to class c^[k] to the total number of samples z in the window. In formula (1), [(x^(t), c^[k])] = 1 when the exploitation time of the vulnerability corresponding to feature data x^(t) belongs to class c^[k], and [(x^(t), c^[k])] = 0 otherwise. Because the current sample (x^(t), y^(t)) itself belongs to class c^[k], the proportion (imbalance factor) of class c^[k] is necessarily greater than 0.
The class weight is calculated by formula (2),
where the imbalance factor is the one corresponding to class c^[k].
Step S6, retraining the current classifier model f^(t)(·) and updating its parameters according to the feature data x^(t), the exploitation-time label y^(t) and the class weight of the vulnerability at time t, to obtain the classifier model f^(t+1)(·) of time t+1.
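One plausible way for the class weight to enter the retraining in step S6 is as a multiplier on the per-sample loss. The source does not say how the weight is applied, so this is an assumption, as are the linear softmax classifier (used instead of the three-layer network to keep the sketch short), the learning rate and the function names.

```python
import math


def predict_proba(W, b, x):
    """Softmax probabilities of a linear classifier with weights W and biases b."""
    logits = [sum(w * v for w, v in zip(row, x)) + bk for row, bk in zip(W, b)]
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sgd_step(W, b, x, y_index, class_weight, lr=0.1):
    """One in-place gradient step on class_weight * cross_entropy(x, y)."""
    probs = predict_proba(W, b, x)
    for k in range(len(W)):
        # Gradient of the weighted loss with respect to logit k.
        grad = class_weight * (probs[k] - (1.0 if k == y_index else 0.0))
        b[k] -= lr * grad
        for j, xj in enumerate(x):
            W[k][j] -= lr * grad * xj


def prob_after_one_step(class_weight):
    """Probability of the true class after one step from zero parameters."""
    W = [[0.0, 0.0] for _ in range(3)]
    b = [0.0, 0.0, 0.0]
    x, y_index = [1.0, 0.5], 1
    weighted_sgd_step(W, b, x, y_index, class_weight)
    return predict_proba(W, b, x)[y_index]


print(prob_after_one_step(1.0), prob_after_one_step(5.0))
```

A sample from a rare class, carrying a larger weight, moves the model further toward its label in a single update than the same sample would with weight 1.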
In the above steps, if the exploitation-time label y^(t) of time t has not yet been obtained when the raw data d^(t+1) of the vulnerability at time t+1 is acquired, steps S5 and S6 are skipped and f^(t+1)(·) = f^(t)(·) is used to predict the vulnerability exploitation time at time t+1.
The specific implementation process is as follows. In this embodiment, 23302 vulnerability records with recorded exploitation times, dated between 1988 and 2020, are taken from the open database NVD for simulation verification. The exploitation time of each vulnerability is shown in FIG. 2: each dot is a single vulnerability, and the larger the dot, the longer the interval between the vulnerability's exploitation and its disclosure. The dynamic imbalance of the 3 exploitation-time classes in this embodiment is shown in FIG. 3. In FIG. 3, the legend 'Neg' denotes class c^[1] = 'before public disclosure of the vulnerability', 18113 samples in total, a static proportion of 77.73%; 'ZeroDay' denotes class c^[2] = 'on the day of public disclosure', 1312 samples in total, a static proportion of 5.63%; and 'Pos' denotes class c^[3] = 'after public disclosure', 3877 samples in total, a static proportion of 16.64%.
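The static proportions quoted above follow directly from the class counts, as a short check shows. The dictionary keys are the legend names from FIG. 3; nothing else is assumed.

```python
# Class counts reported for the NVD sample in this embodiment.
counts = {"Neg": 18113, "ZeroDay": 1312, "Pos": 3877}
total = sum(counts.values())
shares = {c: round(100 * n / total, 2) for c, n in counts.items()}
print(total, shares)  # 23302 {'Neg': 77.73, 'ZeroDay': 5.63, 'Pos': 16.64}
```

The counts sum to the stated 23302 records, and the largest class is roughly fourteen times the size of the smallest, which is the static imbalance the class weights are meant to offset.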
The prediction performance of this technical solution is evaluated on the categories 'Neg', 'ZeroDay' and 'Pos'. Figs. 4, 5 and 6 show how the four most widely used classification metrics (accuracy, precision, recall and F1 score) change over time for these three categories. Comparison with Fig. 3 shows that the prediction performance trend of each category is consistent with its dynamic imbalance trend, and that the classification performance of this technical solution in predicting the utilized time of network security vulnerabilities is clearly better than random guessing.
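For readers who want to reproduce this kind of per-category evaluation, the four metrics can be computed one category at a time by treating that category as positive and the rest as negative. The sketch below is an assumption about the evaluation protocol, which the text does not spell out; the function name and the toy labels are hypothetical.

```python
def one_vs_rest_metrics(y_true, y_pred, category):
    """Accuracy, precision, recall and F1 for one category against all others."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == category and p == category for t, p in pairs)
    fp = sum(t != category and p == category for t, p in pairs)
    fn = sum(t == category and p != category for t, p in pairs)
    tn = sum(t != category and p != category for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    # Guard against empty denominators when a category is never predicted or never occurs.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels, not data from the patent.
y_true = ["Neg", "Neg", "Pos", "ZeroDay", "Neg", "Pos"]
y_pred = ["Neg", "Pos", "Pos", "Neg", "Neg", "Pos"]

pos_metrics = one_vs_rest_metrics(y_true, y_pred, "Pos")
zero_day_metrics = one_vs_rest_metrics(y_true, y_pred, "ZeroDay")
print(pos_metrics)
print(zero_day_metrics)
```

Evaluating these metrics over successive time windows, rather than once over the whole stream, would give curves of the kind plotted in Figs. 4 to 6.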
The foregoing is merely an exemplary embodiment of the present invention; specific structures and features that are well known in the art are not described in detail here. Those skilled in the art may make modifications and improvements without departing from the structure of the present invention; these also fall within the scope of protection of the present invention and do not affect the effect of implementing the invention or the utility of the patent. The scope of protection of the present application is defined by the claims, and the specific embodiments and other description in the specification may be used to interpret the claims.

Claims (8)

1. A method for predicting the utilized time of a network security vulnerability, comprising: step S1, obtaining raw data d^(t) related to the network security vulnerability at time t, where t = 1, 2, 3, …; step S2, sequentially performing data preprocessing, feature extraction and feature selection on the raw data d^(t) obtained at time t to obtain feature data x^(t); characterized in that the method further comprises the following steps:
step S3, predicting from the feature data x^(t) with the classifier model f^(t)(·) at time t to obtain the predicted value of the utilized time of the network security vulnerability obtained at time t, for use by downstream applications;
step S4, obtaining the true utilized time label y^(t) corresponding to the network security vulnerability at time t, where y^(t) ∈ {c^[1], c^[2], …, c^[k], …, c^[N]}; c^[1], c^[2], …, c^[k], …, c^[N] denote the N categories of network security vulnerability utilized time, N is the total number of such categories, and N ≥ 2;
step S5, calculating, by a sliding-window imbalance factor algorithm, the imbalance factor and the category weight corresponding to each category of network security vulnerability utilized time in the current sliding window, where k = 1, 2, …, N;
in step S5, the imbalance factor is calculated by formula (1), in which z represents the total number of samples in the current sliding window, with z ≥ N; c^[k] (k = 1, 2, …, N) represents the k-th category of the network security vulnerability utilized time; the imbalance factor is the ratio of the number of samples in the current sliding window that belong to category c^[k] to the total number of samples z in the window; and [(x^(t), c^[k])] = 1 when the utilized time of the network security vulnerability corresponding to feature data x^(t) belongs to category c^[k], and [(x^(t), c^[k])] = 0 otherwise;
in step S5, the category weight of category c^[k] is calculated from the imbalance factor corresponding to that category;
step S6, retraining the current classifier model f^(t)(·) using the feature data x^(t) of the network security vulnerability at time t, the utilized time label y^(t) and the category weights, and updating its parameters to obtain the classifier model f^(t+1)(·) for time t+1.
2. The method for predicting the utilized time of a network security vulnerability according to claim 1, characterized in that: the value of z in formula (1) is optimized by any hyperparameter determination method, including random search, grid search and Bayesian optimization.
3. The method for predicting the utilized time of a network security vulnerability according to claim 1, characterized in that: the data preprocessing in step S2 includes any one or a combination of several general algorithms, including data deduplication, outlier detection, regularization, normalization, word segmentation and one-hot encoding.
4. The method for predicting the utilized time of a network security vulnerability according to claim 1, characterized in that: the raw data d^(t) is a multi-dimensional combination of raw data on the network security vulnerability, including the vulnerability number, vulnerability description information, vulnerability publication time and security level score, or any single one of these dimensions.
5. The method for predicting the utilized time of a network security vulnerability according to claim 1, characterized in that: the feature extraction in step S2 uses a general manual feature extraction algorithm or an automatic feature extraction algorithm, chosen according to the form and content of the raw data.
6. The method for predicting the utilized time of a network security vulnerability according to claim 1, characterized in that: the feature selection includes principal component analysis, correlation coefficients and recursive feature elimination.
7. The method for predicting the utilized time of a network security vulnerability according to claim 1, characterized in that: the classifier models used to predict the utilized time of network security vulnerabilities include, but are not limited to, fully connected neural network algorithms, convolutional neural network algorithms and recurrent neural network algorithms; when t = 1, the parameters of the model f^(1)(·) are initialized randomly or from a known pre-trained model.
8. The method for predicting the utilized time of a network security vulnerability according to claim 1, characterized in that: if the utilized time label y^(t) for time t has not yet been obtained when the raw data d^(t+1) of the network security vulnerability at time t+1 arrives, steps S5 and S6 are skipped and f^(t+1)(·) = f^(t)(·) is used to predict the utilized time of the network security vulnerability at time t+1.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010889524.6A CN112016097B (en) 2020-08-28 2020-08-28 Method for predicting network security vulnerability time to be utilized

Publications (2)

Publication Number Publication Date
CN112016097A CN112016097A (en) 2020-12-01
CN112016097B CN112016097B (en) 2024-02-27

Family

ID=73503285

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010889524.6A Active CN112016097B (en) 2020-08-28 2020-08-28 Method for predicting network security vulnerability time to be utilized

Country Status (1)

Country Link
CN (1) CN112016097B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113792300B (en) * 2021-11-17 2022-02-11 山东云天安全技术有限公司 System for predicting industrial control network bugs based on internet and industrial control network bug parameters
CN114021149B (en) * 2021-11-17 2022-06-03 山东云天安全技术有限公司 System for predicting industrial control network bugs based on correction parameters
CN114329500B (en) * 2022-03-09 2022-06-17 山东卓朗检测股份有限公司 Server cluster security vulnerability detection method based on artificial intelligence
CN116980065B (en) * 2023-08-17 2024-03-19 辽宁天衡智通防务科技有限公司 Clock calibration method, clock calibration device, terminal equipment and storage medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001090921A2 (en) * 2000-05-25 2001-11-29 Kanisa, Inc. System and method for automatically classifying text
CN102185735A (en) * 2011-04-26 2011-09-14 华北电力大学 Network security situation prediction method
CN104809226A (en) * 2015-05-07 2015-07-29 武汉大学 Method for early classifying imbalance multi-variable time sequence data
CN109347801A (en) * 2018-09-17 2019-02-15 武汉大学 A kind of vulnerability exploit methods of risk assessment based on multi-source word insertion and knowledge mapping
CN110018670A (en) * 2019-03-28 2019-07-16 浙江大学 A kind of industrial process unusual service condition prediction technique excavated based on dynamic association rules
WO2019150343A1 (en) * 2018-02-05 2019-08-08 Telefonaktiebolaget Lm Ericsson (Publ) Resource needs prediction in virtualized systems: generic proactive and self-adaptive solution
CN110109969A (en) * 2019-04-16 2019-08-09 公安部第三研究所 A kind of integrated data stream method for digging and system for the unbalanced application of class
CN110321940A (en) * 2019-06-24 2019-10-11 清华大学 The feature extraction of aircraft telemetry and classification method and device
CN110636020A (en) * 2019-08-05 2019-12-31 北京大学 Neural network equalization method for adaptive communication system
CN111401808A (en) * 2020-03-12 2020-07-10 重庆文理学院 Material agreement inventory demand prediction method based on hybrid model

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Fast Sliding Window Classification with Convolutional Neural Networks;Henry G. R. Gouk,Anthony M. Blake;《Proceedings of the 29th International Conference on Image and Vision Computing New Zealand》;114-118 *
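Design and Implementation of a Student Course Selection System Based on ASP.NET; 雷丽; Journal of Chongqing University of Arts and Sciences (Natural Science Edition) (No. 2); 72-74+77 *
Research on Dynamic Modeling of Enterprise Financial Distress Prediction Based on Class-Imbalanced Time-Series Data Batches; 刘欣; China Master's Theses Full-text Database, Economics and Management Sciences (No. 02); J152-3485 *
Research on Classification Algorithms for Concept-Drifting Data Streams; 孙艳歌; China Doctoral Dissertations Full-text Database, Information Science and Technology (No. 01); I138-41 *
Online Data Stream Learning Algorithm for Applications with Severe Class Imbalance; 赵强利, 蒋艳凰; Computer Science (No. 06); 255-259 *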

Also Published As

Publication number Publication date
CN112016097A (en) 2020-12-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20240126

Address after: 518000 1104, Building A, Zhiyun Industrial Park, No. 13, Huaxing Road, Henglang Community, Longhua District, Shenzhen, Guangdong Province

Applicant after: Shenzhen Hongyue Information Technology Co.,Ltd.

Country or region after: China

Address before: No. 319, Honghe Avenue, Yongchuan District, Chongqing 402160

Applicant before: Chongqing University of Arts and Sciences

Country or region before: China

GR01 Patent grant