Disclosure of Invention
The present invention is directed to a computer network security prediction system based on big data information analysis, so as to solve the problems set forth in the background art.
In order to achieve the above object, a computer network security prediction system based on big data information analysis is provided, which comprises a data generation unit, a security detection unit, an early warning feedback unit and a data updating unit;
the data generation unit is used for collecting the existing computer network data and classifying the collected data when predicting the existing computer network environment, screening according to the influence on network safety, and extracting the network characteristics of the reserved data;
the security detection unit is used for detecting the network characteristics extracted by the data generation unit, analyzing detected vulnerabilities combined with network attack means, simulating corresponding defending measures according to the detected vulnerabilities, and simultaneously evaluating the defending measures combined with the existing computer network environment;
the early warning feedback unit is used for conveying the defensive measure scheme simulated by the security detection unit to the cloud according to the evaluation result and collecting cloud feedback data;
the data updating unit is used for judging the defensive measure scheme simulated by the security detection unit according to the cloud feedback data collected by the early warning feedback unit, conveying the defensive measure scheme simulated by the security detection unit to the existing computer network environment according to the judging result, thereby completing the updating of the computer network environment, and simultaneously performing periodic prediction on the computer network environment.
As a further improvement of the technical scheme, the data generating unit comprises a network data acquisition module and a data processing module;
the network data acquisition module comprises an information collection module and an information classification module;
the information collection module is used for collecting real-time network environment data of a network to be predicted;
the information classification module is used for carrying out property analysis on the network environment data collected by the information collection module and packaging and classifying the data according to different properties.
As a further improvement of the technical scheme, the data processing module comprises a data cleaning module and a data extraction module;
the data cleaning module is used for cleaning the data in the data packets classified by the information classification module according to the influence on the network security environment, and reserving the network environment data with influence;
the data extraction module is used for extracting characteristics of the network environment data reserved by the data cleaning module by using a statistical analysis algorithm.
As a further improvement of the technical scheme, the data extraction module statistical analysis algorithm comprises the following steps:
performing feature extraction and normalization processing on the data to enable the data to be suitable for clustering analysis, so as to obtain class labels of each sample;
calculating the center point of each category according to the category label, visually displaying and explaining the clustering center, and finding out the characteristics and regularity of each category;
and obtaining the network environment data characteristics according to the characteristic category.
As a further improvement of the technical scheme, the security detection unit comprises a threat assessment module and a defense generation module;
the threat assessment module is used for assessing the analysis data of the data processing module by combining with the network environment to be predicted, and searching a corresponding attack means in the network according to the assessment result;
the defense generation module is used for analyzing the attack means collected by the threat assessment module, obtaining a defense correction scheme, and assessing the defense correction scheme according to the network environment to be predicted.
As a further improvement of the technical scheme, the threat assessment module comprises a vulnerability extraction module and an attack analysis module;
the vulnerability extraction module is used for analyzing the characteristic data extracted by the data extraction module in combination with the real-time network environment so as to obtain network vulnerability information;
the attack analysis module is used for collecting corresponding network data attack means in the network according to the network vulnerability information acquired by the vulnerability extraction module and carrying out combination classification on the collected network data attack means.
As a further improvement of the technical scheme, the defense generating module comprises a defense evaluating module and an implementation analyzing module;
the defense evaluation module is used for analyzing the network data attack means collected by the attack analysis module by combining with the existing network vulnerability restoration means so as to obtain a corresponding system updating scheme;
the implementation analysis module is used for analyzing the system updating scheme acquired by the defense evaluation module by combining the network environment to be predicted.
As a further improvement of the technical scheme, the early warning feedback unit comprises a report generating module and a feedback collecting module;
the report generation module is used for screening invalid data in the system updating scheme according to the analysis result of the implementation analysis module, and sending the screened system updating scheme to the cloud;
the filtering expression of invalid data in the system updating scheme is as follows:
;
wherein the computed value is the data validity index: when the value is greater than 1, the data is invalid, and when the value is less than 1, the data is valid; the remaining terms denote, respectively, the acquired system update scheme, the analysis of the system update scheme, the combined implementation of the system update scheme, and the evaluation of the implementation effect. The validity of the data is finally judged by the magnitude of the index, so that data of better value is provided.
The feedback collection module is used for collecting cloud feedback data of the system updating scheme.
As a further improvement of the technical scheme, the data updating unit comprises a system optimizing module and a report updating module;
the system optimization module is used for judging the system updating scheme screened by the report generation module according to the feedback data collected by the feedback collection module, and transmitting the system updating scheme to the real-time network environment data according to the judging result to finish the network environment safety data updating;
the report updating module is used for continuously predicting the real-time network environment data and keeping the real-time network environment data continuously updated.
Compared with the prior art, the invention has the beneficial effects that:
in the computer network security prediction system based on big data information analysis, various data sources such as network traffic, log information and user behavior are fully mined, so that a more comprehensive and accurate network security prediction service is provided. Various threat types are identified from the existing network characteristics, and the severity and potential hazard of each threat are evaluated, so that an accurate threat prediction result is provided. An accurate defensive measure scheme is generated by combining the threat evaluation result with the existing network attack means, and the network environment is automatically detected at regular intervals, so that the capability of coping with network security vulnerabilities is improved.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1: referring to FIGS. 1-7, the present embodiment provides a computer network security prediction system based on big data information analysis, which includes a data generating unit 1, a security detection unit 2, an early warning feedback unit 50, and a data updating unit 60;
the data generating unit 1 is used for collecting the existing computer network data and classifying the collected data when predicting the existing computer network environment, screening according to the influence on network safety, and extracting the network characteristics of the reserved data;
the security detection unit 2 is used for detecting the network characteristics extracted by the data generation unit 1, analyzing the detected vulnerabilities combined with network attack means, simulating corresponding defending measures according to the detected vulnerabilities, and simultaneously evaluating the defending measures combined with the existing computer network environment;
the early warning feedback unit 50 is used for conveying the defensive measure scheme simulated by the security detection unit 2 to the cloud according to the evaluation result and collecting cloud feedback data;
the data updating unit 60 is configured to determine the defensive measure scheme simulated by the security detection unit 2 according to the cloud feedback data collected by the early warning feedback unit 50, and transmit the defensive measure scheme simulated by the security detection unit 2 to the existing computer network environment according to the determination result, thereby completing the updating of the computer network environment and simultaneously performing periodic prediction on the computer network environment.
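The data generating unit 1 includes a network data acquisition module 10 and a data processing module 20;
the network data acquisition module 10 includes an information collection module 11 and an information classification module 12;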
the information collection module 11 is used for collecting real-time network environment data of a network needing prediction;
the information classification module 12 is used for performing property analysis on the network environment data collected by the information collection module 11, and packaging and classifying the data according to different properties. Data traffic flowing through the network is monitored and analyzed by network devices using network packet capture techniques. System logs and application logs are collected periodically from network devices and servers using a log collector and a monitoring agent. A proxy server and a monitoring tool are used to record the behaviors and operations of users on the computer network, and the data are classified and stored according to traffic, logs, behaviors and operations.
The data processing module 20 includes a data cleaning module 21 and a data extraction module 22;
the data cleaning module 21 is configured to clean the data in the data packets classified by the information classification module 12 according to the influence on the network security environment, and retain the network environment data that has an influence. The collected data is denoised and preprocessed using a data cleaning algorithm to remove invalid and redundant data. A common data cleaning algorithm is as follows:
missing value filling algorithm: common algorithms include mean filling, median filling, mode filling, interpolation, and the like. In mean filling, each missing value is replaced with the mean of the column in which the missing value is located.
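By way of non-limiting illustration, the mean filling described above may be sketched as follows. The function name `fill_missing_with_mean`, the use of `None` to mark a missing value, and the sample packet sizes are assumptions made only for this example and are not taken from the embodiment.

```python
from statistics import mean
from typing import List, Optional


def fill_missing_with_mean(column: List[Optional[float]]) -> List[float]:
    """Replace each missing entry (None) with the mean of the observed values."""
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values to average")
    column_mean = mean(observed)
    return [column_mean if v is None else v for v in column]


# Hypothetical column of packet sizes with two missing readings.
packet_sizes = [100.0, None, 300.0, None, 200.0]
filled = fill_missing_with_mean(packet_sizes)
print(filled)
```

Median or mode filling could be obtained in the same way by substituting `statistics.median` or `statistics.mode` for `mean`.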
The data extraction module 22 is configured to perform feature extraction on the network environment data retained by the data cleansing module 21 using a statistical analysis algorithm.
The data extraction module 22 performs the following statistical analysis algorithm steps:
performing feature extraction and normalization processing on the data to enable the data to be suitable for clustering analysis, so as to obtain class labels of each sample;
calculating the center point of each category according to the category label, visually displaying and explaining the clustering center, and finding out the characteristics and regularity of each category;
and obtaining the network environment data characteristics according to the characteristic category.
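As a minimal sketch of the normalization and center-point steps above, the following example scales each feature to the range [0, 1] and then computes the center point of each category from given class labels. Min-max scaling is only one possible normalization and is an assumption here, as are the function names and the sample data; the class labels are supplied by hand rather than produced by a clustering step.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def min_max_normalize(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale every feature column to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def class_centers(rows: Sequence[Sequence[float]],
                  labels: Sequence[int]) -> Dict[int, List[float]]:
    """Compute the center point (feature-wise mean) of each labelled category."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[label].append(row)
    return {
        label: [sum(col) / len(col) for col in zip(*members)]
        for label, members in groups.items()
    }


# Hypothetical samples: (connections per minute, mean packet size).
samples = [[0.0, 10.0], [2.0, 30.0], [4.0, 50.0], [4.0, 30.0]]
labels = [0, 0, 1, 1]  # assumed class labels for illustration
normalized = min_max_normalize(samples)
centers = class_centers(normalized, labels)
print(centers)
```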
Key information and patterns, such as abnormal behavior, potential threats and attack signatures, are extracted from the cleaned data using statistical analysis methods. For descriptive statistical analysis, the mean formula can be used:
$$\mathrm{mean} = \frac{1}{n}\sum_{i=1}^{n} x_i$$
where mean represents the mean, $x_i$ represents the value of the $i$-th data point, and $n$ represents the number of data points.
For correlation analysis, the formula of the Pearson correlation coefficient can be used:
$$r = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{n}(y_i-\bar{y})^2}}$$
wherein $r$ represents the correlation coefficient, $x_i$ and $y_i$ represent the values of the two variables at the $i$-th observation point, and $\bar{x}$ and $\bar{y}$ represent the mean values of the two variables. In this way, key information in different indexes is extracted, and accurate prediction and evaluation of network security threats are realized.
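The mean and the Pearson correlation coefficient described above may, for example, be computed as in the following sketch. The variable names and the two sample series (failed logins and alerts per hour) are hypothetical and serve only to exercise the formulas.

```python
import math
from typing import Sequence


def mean(values: Sequence[float]) -> float:
    """Arithmetic mean: the sum of the data points divided by their number n."""
    return sum(values) / len(values)


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    covariation = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    spread_x = sum((x - x_bar) ** 2 for x in xs)
    spread_y = sum((y - y_bar) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return covariation / math.sqrt(spread_x * spread_y)


# Hypothetical hourly counts used only for illustration.
failed_logins = [1.0, 2.0, 3.0, 4.0]
alerts = [2.0, 4.0, 6.0, 8.0]
r = pearson_r(failed_logins, alerts)
print(mean(failed_logins), r)
```

A value of r close to 1 or -1 indicates that the two indexes move together and may be treated as related signals; a value near 0 indicates no linear relationship.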
The security detection unit 2 includes a threat assessment module 30 and a defense generation module 40;
the threat assessment module 30 is configured to assess the analysis data of the data processing module 20 in combination with a network environment to be predicted, and find a corresponding attack means in the network according to the assessment result;
the defense generating module 40 is configured to analyze the attack means collected by the threat assessment module 30, obtain a defense modification scheme, and assess the defense modification scheme according to a network environment that needs to be predicted.
Threat assessment module 30 includes a vulnerability extraction module 31 and an attack analysis module 32;
the vulnerability extraction module 31 is configured to analyze the feature data extracted by the data extraction module 22 in combination with the real-time network environment, so as to obtain network vulnerability information, using machine learning algorithms and pattern matching techniques such as the following:
regular expressions: used for matching text that conforms to certain rules, and commonly used in text processing.
KMP algorithm: used for finding the positions at which a pattern string P occurs in a text string S, and fast in operation.
BM (Boyer-Moore) algorithm: also a string matching algorithm, which can quickly determine whether a text string contains a pattern string.
The extracted key information is compared with known security threats and classified, and the threat type is determined. The threat level is assessed by comprehensively considering the severity, potential hazard and likelihood of the threat in combination with historical data and a rule engine.
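As a non-limiting illustration of the KMP algorithm named above, the following sketch finds every position at which a pattern string occurs in a text string. The log line and the signature "UNION SELECT" are hypothetical examples of a text string S and a pattern string P and are not part of the embodiment.

```python
from typing import List


def build_failure_table(pattern: str) -> List[int]:
    """For each prefix of the pattern, length of its longest proper border."""
    table = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = table[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        table[i] = k
    return table


def kmp_find_all(text: str, pattern: str) -> List[int]:
    """Return the start index of every occurrence of pattern in text."""
    if not pattern:
        return []
    table = build_failure_table(pattern)
    matches = []
    k = 0  # number of pattern characters currently matched
    for i, ch in enumerate(text):
        while k > 0 and ch != pattern[k]:
            k = table[k - 1]  # fall back without re-reading the text
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = table[k - 1]  # continue, allowing overlapping matches
    return matches


# Hypothetical request line and attack signature.
log_line = "GET /index.php?id=1 UNION SELECT a UNION SELECT b"
positions = kmp_find_all(log_line, "UNION SELECT")
print(positions)
```

Because the text is scanned once and never re-read, the search runs in time proportional to the combined length of the text and the pattern.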
The attack analysis module 32 is configured to collect corresponding network data attack means in the network according to the network vulnerability information obtained by the vulnerability extraction module 31, and perform combination classification on the collected network data attack means. The combination classification of the collected network data attack means generally uses a clustering algorithm; common clustering algorithms include hierarchical clustering, K-means clustering, DBSCAN clustering and the like. Taking K-means clustering as an example, the procedure is as follows:
given a data set X and a number of clusters K, the goal of the K-means clustering algorithm is to divide the data set into K clusters such that the similarity of the data points within each cluster is as high as possible and the similarity between different clusters is as low as possible, where the similarity can be defined according to the data set and the problem at hand.
Randomly selecting k data points as an initial centroid;
assigning all data points to clusters in which the closest centroid is located;
next, for each cluster, calculating the center point of all data points therein as a new centroid;
finally, the above process is repeated until the centroid does not change significantly any more or reaches a preset number of iterations. By clustering the network data attack means, similar attack means can be put together, and meanwhile, differences, rules and characteristics among different attack means are found, so that powerful support is provided for network security defense.
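The K-means steps above can be sketched as follows, assuming Euclidean distance as the similarity measure. The function name `k_means`, the `seed` and `max_iter` parameters, and the two-dimensional sample points are assumptions made for illustration only; an empty cluster simply keeps its previous centroid in this sketch.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def k_means(points: Sequence[Point], k: int, max_iter: int = 100,
            seed: int = 0) -> Tuple[List[Point], List[int]]:
    """Cluster points into k groups; returns (centroids, cluster index per point)."""
    rng = random.Random(seed)
    centroids = rng.sample(list(points), k)  # random initial centroids
    assignments = [0] * len(points)
    for _ in range(max_iter):
        # Assign every point to the cluster with the closest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Recompute each centroid as the center point of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])
        if new_centroids == centroids:  # centroids no longer change
            break
        centroids = new_centroids
    return centroids, assignments


# Hypothetical feature vectors for six attack samples forming two groups.
attack_features = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
                   (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, cluster_of = k_means(attack_features, k=2, seed=1)
print(centroids, cluster_of)
```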
The defense generating module 40 includes a defense evaluating module 41 and an implementation analyzing module 42;
the defense evaluation module 41 is configured to analyze the network data attack means collected by the attack analysis module 32 in combination with the existing network vulnerability restoration means, so as to obtain a corresponding system update scheme;
the procedure for analyzing the detected vulnerabilities in combination with the network attack means is as follows:
detect the vulnerability -> analyze the influence of the vulnerability on network attacks -> formulate corresponding countermeasures.
Network topology analysis and path inference algorithms are used to analyze the possible intrusion paths and potential attack targets of an attacker, providing a basis for formulating defensive measures. The network topology analysis includes:
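path algorithm: common path algorithms include Dijkstra's algorithm, the Bellman-Ford algorithm, and the like. The core update (relaxation) step of Dijkstra's algorithm is as follows:
$$d(v) \leftarrow \min\bigl(d(v),\ d(u) + w(u, v)\bigr)$$
wherein $d(v)$ is the currently known shortest distance from the source node to node $v$, $u$ is the node being expanded, and $w(u, v)$ is the weight of the edge from $u$ to $v$.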
page ranking algorithm: common page ranking algorithms include the PageRank algorithm, the HITS algorithm, and the like. The formula of the PageRank algorithm is as follows:
$$PR(p_i) = \frac{1-d}{N} + d \sum_{p_j \in M(p_i)} \frac{PR(p_j)}{L(p_j)}$$
wherein $PR(p_i)$ represents the ranking value of web page $p_i$, $d$ is the damping factor, $N$ is the total number of pages in the website, $M(p_i)$ is the set of web pages linking to $p_i$, and $L(p_j)$ is the number of outgoing links of web page $p_j$.
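As a minimal sketch of the PageRank algorithm named above, the following example iterates the ranking computation on a small directed graph until the values stop changing. Applying the ranking to a graph of hosts rather than web pages, the host names, the tolerance `tol`, and the even redistribution of rank from nodes without outgoing links are all assumptions of this example.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], d: float = 0.85,
             tol: float = 1e-10, max_iter: int = 200) -> Dict[str, float]:
    """Iteratively compute ranking values for a directed graph of links."""
    nodes = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(max_iter):
        # Nodes without outgoing links share their rank with all nodes.
        dangling = sum(rank[u] for u in nodes if not links.get(u))
        new_rank = {v: (1.0 - d) / n + d * dangling / n for v in nodes}
        for u in nodes:
            outs = links.get(u, [])
            for v in outs:
                new_rank[v] += d * rank[u] / len(outs)
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical connectivity between hosts: key -> hosts it connects to.
host_links = {
    "web": ["app"],
    "app": ["db"],
    "db": ["app"],
    "admin": ["app", "db"],
}
ranks = pagerank(host_links)
most_central = max(ranks, key=ranks.get)
print(most_central, ranks)
```

In such a host graph, a high ranking value marks a node that many other nodes lead to, which may make it a candidate for closer monitoring.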
The path inference algorithm expression is as follows:
;
wherein the terms of the expression are, in order, the timestamp, the source IP address, the target IP address, the source port, the destination port, and the protocol type.
Flow information analysis: the flow information is modeled using algorithms such as a support vector machine, so as to accurately analyze attack types and attack sources and identify potential attack targets;
TTL-based Traceroute algorithm: the router hops in the network are probed by varying the TTL field of the probe packets, as done by the Traceroute tool. The algorithm has no explicit formula, but path information in the network can be inferred by capturing network data packets and analyzing the TTL fields therein.
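By way of illustration of the path algorithm mentioned in the topology analysis above, the following sketch uses Dijkstra's algorithm to find the lowest-cost route from an entry point to a target node. The topology, the node names and the interpretation of edge weights as "attack effort" are hypothetical and are used only to show the technique.

```python
import heapq
import math
from typing import Dict, List, Tuple

Graph = Dict[str, Dict[str, float]]


def dijkstra(graph: Graph, source: str) -> Tuple[Dict[str, float], Dict[str, str]]:
    """Shortest distances from source, plus each node's predecessor on its path."""
    dist = {source: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, weight in graph.get(u, {}).items():
            candidate = d + weight
            if candidate < dist.get(v, math.inf):  # relaxation step
                dist[v] = candidate
                prev[v] = u
                heapq.heappush(heap, (candidate, v))
    return dist, prev


def path_to(prev: Dict[str, str], source: str, target: str) -> List[str]:
    """Rebuild the path from source to target; empty list if unreachable."""
    path = [target]
    while path[-1] != source:
        if path[-1] not in prev:
            return []
        path.append(prev[path[-1]])
    return path[::-1]


# Hypothetical topology; weights stand for the effort needed to cross a link.
topology: Graph = {
    "internet": {"web": 1.0, "vpn": 4.0},
    "web": {"app": 2.0},
    "vpn": {"db": 1.0},
    "app": {"db": 3.0},
}
distances, predecessors = dijkstra(topology, "internet")
likely_path = path_to(predecessors, "internet", "db")
print(distances["db"], likely_path)
```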
The implementation analysis module 42 is configured to analyze the system update scheme acquired by the defense evaluation module 41 in combination with the network environment to be predicted, using data analysis methods.
The data analysis methods are mainly used for predicting the change trend and the bandwidth requirement of the future network environment based on historical data or real-time data.
Time series analysis: time series analysis can help us link historical data with future trends, and common methods are ARIMA model and ARMA model.
Regression analysis: regression analysis can help us find correlations between different variables to predict values of future variables. The common methods are linear regression, polynomial regression, etc.
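As a minimal sketch of the linear regression mentioned above, the following example fits a straight line to historical values by ordinary least squares and extrapolates one step ahead. The bandwidth figures, the hourly time index and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length with at least 2 points")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def predict(slope: float, intercept: float, x: float) -> float:
    """Evaluate the fitted line at x."""
    return intercept + slope * x


# Hypothetical bandwidth usage (Mbit/s) observed at hours 0..3.
hours = [0.0, 1.0, 2.0, 3.0]
bandwidth = [10.0, 12.0, 14.0, 16.0]
slope, intercept = fit_line(hours, bandwidth)
forecast = predict(slope, intercept, 4.0)
print(slope, intercept, forecast)
```

A time series model such as ARIMA would replace `fit_line` when the data show seasonality or autocorrelation that a straight line cannot capture.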
The early warning feedback unit 50 includes a report generation module 51 and a feedback collection module 52;
the report generating module 51 is configured to screen invalid data in the system update scheme according to the analysis result of the implementation analysis module 42, and send the screened system update scheme to the cloud;
the filtering expression of invalid data in the system updating scheme is as follows:
;
wherein the computed value is the data validity index: when the value is greater than 1, the data is invalid, and when the value is less than 1, the data is valid; the remaining terms denote, respectively, the acquired system update scheme, the analysis of the system update scheme, the combined implementation of the system update scheme, and the evaluation of the implementation effect. The validity of the data is finally judged by the magnitude of the index, so that data of better value is provided.
The feedback collection module 52 is configured to collect cloud feedback data of the system update scheme, send the system update scheme to the user, and collect user modification information.
Sending the system update scheme to the user: the user may enter "please send system update scheme" in GalaxyBot, and the latest system update scheme is provided according to the input.
Collecting user modification information: to modify the system update scheme, the user may enter "I want to modify some information" in GalaxyBot, and is then asked which content needs to be modified. Specifically: to modify the description of a new function of the system, the user may enter "please modify the description of the new function to ×", where "×" represents the content the user wants to substitute; to modify the update time, the user may enter "please modify the update time to ×", where "×" represents the time the user wants to set.
Modifying the scheme: the system update scheme is modified according to the modification information provided by the user.
Generating a new update scheme: the modified system update scheme is presented to the user for review and confirmation. The general flow is as follows: send the system update scheme -> the user provides modification information -> modify the scheme -> generate a new update scheme.
The data update unit 60 includes a system optimization module 61 and a report update module 62;
the system optimization module 61 is configured to determine the system update scheme screened by the report generation module 51 according to the feedback data collected by the feedback collection module 52, and send the system update scheme to the real-time network environment data according to the determination result, so as to complete the update of the network environment security data;
the report update module 62 is configured to continuously predict the real-time network environment data and to periodically generate a network security report according to a time interval set by the user, the report including contents such as security event statistics, threat trend analysis and defense effect evaluation, so as to help the user understand the network security status and take corresponding countermeasures, and to keep the real-time network environment data continuously updated.
The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the above-described embodiments, and that the above-described embodiments and descriptions are only preferred embodiments of the present invention, and are not intended to limit the invention, and that various changes and modifications may be made therein without departing from the spirit and scope of the invention as claimed. The scope of the invention is defined by the appended claims and equivalents thereof.