CN116776334A - Office software vulnerability analysis method based on big data - Google Patents


Publication number
CN116776334A
Authority
CN
China
Prior art keywords
vulnerability
data
software
historical
algorithm
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202310619236.2A
Other languages
Chinese (zh)
Inventor
和欣彤
胡丽牡
崔海春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Individual
Original Assignee
Individual
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50: Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57: Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577: Assessing vulnerabilities and evaluating computer system security
    • G06F11/00: Error detection; Error correction; Monitoring
    • G06F11/30: Monitoring
    • G06F11/34: Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
    • G06F11/3409: Recording or statistical evaluation of computer activity or of user activity, for performance assessment
    • G06F11/3414: Workload generation, e.g. scripts, playback
    • G06F11/3438: Recording or statistical evaluation of computer activity or of user activity, monitoring of user actions
    • G06F11/3466: Performance evaluation by tracing or monitoring
    • G06F11/3476: Data logging
    • G06F18/00: Pattern recognition
    • G06F18/10: Pre-processing; Data cleansing
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/243: Classification techniques relating to the number of classes
    • G06F18/2431: Multiple classes
    • G06F2123/00: Data types
    • G06F2123/02: Data types in the time domain, e.g. time-series data

Abstract

The invention relates to the technical field of office software security, and in particular to an office software vulnerability analysis method based on big data. The method comprises the following steps: integrating the vulnerability data of the office software with the vulnerability data of a vulnerability information library by using a big data integration algorithm to obtain software vulnerability data; acquiring historical data from the software vulnerability data through a historical backtracking algorithm to obtain software historical vulnerability data; performing text analysis on the software historical vulnerability data by using natural language processing to obtain software historical vulnerability text data; performing user behavior acquisition on the software historical vulnerability text data by using a behavior acquisition technology to obtain a user usage behavior log; and performing association mining analysis on the software historical vulnerability data through an association data mining algorithm to obtain key features of the historical vulnerability data. The method adaptively processes new vulnerability information through incremental learning, thereby improving the accuracy and stability of vulnerability detection.

Description

Office software vulnerability analysis method based on big data
Technical Field
The invention relates to the technical field of office software security, in particular to an office software vulnerability analysis method based on big data.
Background
Office software is widely used in many fields, and its security problems are attracting growing attention. Office software vulnerabilities may be exploited by hackers, leading to serious security incidents such as leakage of personal and business information and theft of funds. Many office software vulnerability analysis methods exist, but they are often limited by data set size, data quality and similar factors, and cannot accurately analyze the cause, impact and risk of a new vulnerability. Detection and repair techniques for software vulnerabilities have also advanced considerably. However, conventional vulnerability detection requires substantial manpower and material resources and often achieves only single-vulnerability detection. In addition, a conventional vulnerability detection model generally needs to be retrained and cannot adaptively handle new vulnerabilities. These limitations pose a significant challenge to vulnerability detection and repair.
Disclosure of Invention
In view of this, the present invention provides an office software vulnerability analysis method based on big data to solve at least one of the above technical problems.
To achieve the above purpose, the office software vulnerability analysis method based on big data comprises the following steps:
Step S1: integrating the vulnerability data of the office software with the vulnerability data of the vulnerability information library by using a big data integration algorithm to obtain software vulnerability data; and acquiring historical data from the software vulnerability data through a historical backtracking algorithm to obtain software historical vulnerability data;
Step S2: performing text analysis on the software historical vulnerability data by using natural language processing to obtain software historical vulnerability text data; and performing user behavior acquisition on the software historical vulnerability text data by using a behavior acquisition technology to obtain a user usage behavior log;
Step S3: performing association mining analysis on the software historical vulnerability data through an association data mining algorithm to obtain key features of the historical vulnerability data; and classifying the key features of the historical vulnerability data by using a preset vulnerability classification algorithm to obtain a vulnerability analysis data set;
Step S4: performing vulnerability detection on the vulnerability analysis data set by using a vulnerability prediction model based on deep learning and incremental learning to obtain a vulnerability data set;
Step S5: merging and associating the user usage behavior log with the vulnerability data set by using a data association technology to obtain vulnerability association data; and analyzing the vulnerability association data through a time series analysis technology to obtain vulnerability formation rule data;
Step S6: performing vulnerability risk prediction on the vulnerability formation rule data through a risk detection algorithm to obtain a vulnerability risk analysis result; and executing corresponding vulnerability repair measures according to the vulnerability risk analysis result.
The method uses the big data integration algorithm to integrate vulnerability data from different sources, yielding more comprehensive and more accurate software vulnerability data. The historical backtracking algorithm acquires historical data from the software vulnerability data, producing the software historical vulnerability data that serves as the basis for subsequent vulnerability analysis. Natural language processing is applied to the software historical vulnerability data to extract key information such as vulnerability descriptions, vulnerability types, vulnerability grades and influence ranges, which forms the software historical vulnerability text data. Meanwhile, a behavior acquisition technology monitors how users operate the office software and produces the user usage behavior log, which supplies key data for subsequent analysis. The association data mining algorithm analyzes the software historical vulnerability data to obtain key features of the historical vulnerability data, such as vulnerability type, influence range, vulnerability grade and vulnerability source. A preset vulnerability classification algorithm then classifies these key features to obtain the vulnerability analysis data set, which supports in-depth analysis of the features and trends of different categories of vulnerabilities. A vulnerability prediction model based on deep learning and incremental learning performs vulnerability detection on the vulnerability analysis data set, detecting potential vulnerabilities and improving the efficiency and accuracy of vulnerability detection.
Next, a data association technology merges and associates the user usage behavior log with the vulnerability data set to obtain vulnerability association data. A time series analysis technology then analyzes the vulnerability association data to obtain data on the rules and trends of vulnerability formation, such as exploitation time, exploitation environment and attack range. This clarifies how vulnerabilities are exploited and helps improve the efficiency and accuracy of vulnerability detection and of defense strategy formulation. Finally, a risk detection algorithm performs vulnerability risk prediction on the vulnerability formation rule data to obtain a vulnerability risk analysis result, so that high-risk vulnerabilities are identified quickly and a corresponding vulnerability repair scheme is formulated. Corresponding vulnerability repair measures are executed according to the vulnerability risk analysis result, improving software security and ensuring normal operation of the office system.
Drawings
Other features, objects and advantages of the invention will become more apparent upon reading the detailed description of non-limiting implementations, made with reference to the accompanying drawings, in which:
FIG. 1 is a schematic flow chart of the steps of the big data based office software vulnerability analysis method of the present invention;
FIG. 2 is a detailed flow chart of step S2 in FIG. 1;
FIG. 3 is a detailed flow chart of step S22 in FIG. 2;
FIG. 4 is a detailed flow chart of step S3 in FIG. 1;
FIG. 5 is a detailed flow chart of step S31 in FIG. 4;
FIG. 6 is a detailed flow chart of step S4 in FIG. 1;
FIG. 7 is a detailed flow chart of step S41 in FIG. 6.
Detailed Description
The technical solution of the present invention is described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently some, but not all, embodiments of the present invention. All other embodiments that those skilled in the art can derive from the embodiments of the present invention without inventive effort fall within the scope of the present invention.
Furthermore, the drawings are merely schematic illustrations of the present invention and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, so repeated descriptions of them are omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. The functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processors and/or microcontrollers.
It will be understood that, although the terms "first," "second," etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another element. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of example embodiments. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items.
To achieve the above objective, referring to FIG. 1 to FIG. 7, the present invention provides an office software vulnerability analysis method based on big data, comprising steps S1 to S6.
In this embodiment, referring to FIG. 1, which is a schematic flow chart of the steps of the big data based office software vulnerability analysis method of the present invention, the method includes the following steps:
Step S1: integrating the vulnerability data of the office software with the vulnerability data of the vulnerability information library by using a big data integration algorithm to obtain software vulnerability data; and acquiring historical data from the software vulnerability data through a historical backtracking algorithm to obtain software historical vulnerability data;
In this embodiment, the vulnerability data of the office software and the vulnerability data of the vulnerability information library are first standardized so that the two share a consistent data format. A suitable big data integration algorithm is then selected and used to integrate the two sets of vulnerability data, yielding the software vulnerability data. Finally, a suitable historical backtracking algorithm is applied to the software vulnerability data to obtain the software historical vulnerability data.
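The standardize-then-integrate flow of step S1 can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the field names (`id`, `product`, `level`, `cve`, `affected`), the `VulnRecord` schema, the "vulnerability information library wins on conflict" rule, and the date-window reading of historical backtracking are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VulnRecord:
    """Common schema shared by both sources (hypothetical)."""
    vuln_id: str
    software: str
    severity: str
    published: str  # ISO date, so string comparison orders by time


def normalize_office(rec: dict) -> VulnRecord:
    # Hypothetical field names for the office software's own vulnerability data.
    return VulnRecord(rec["id"].upper(), rec["product"].lower(),
                      rec["level"].lower(), rec["date"])


def normalize_library(rec: dict) -> VulnRecord:
    # Hypothetical field names for the vulnerability information library.
    return VulnRecord(rec["cve"].upper(), rec["affected"].lower(),
                      rec["severity"].lower(), rec["published"])


def integrate(office: list, library: list) -> list:
    """Merge both sources by vulnerability ID; the library record wins on conflict."""
    merged = {}
    records = [normalize_library(r) for r in library] + [normalize_office(r) for r in office]
    for rec in records:
        merged.setdefault(rec.vuln_id, rec)
    return sorted(merged.values(), key=lambda r: r.published)


def history(records: list, start: str, end: str) -> list:
    """Keep the records published inside a historical window (inclusive)."""
    return [r for r in records if start <= r.published <= end]
```

A real integration would also need conflict resolution between sources that disagree on severity or dates; the sketch sidesteps this by keeping the first record seen.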
Step S2: performing text analysis on the software historical vulnerability data by using natural language processing to obtain software historical vulnerability text data; and performing user behavior acquisition on the software historical vulnerability text data by using a behavior acquisition technology to obtain a user usage behavior log;
In this embodiment, natural language processing is used to analyze the software historical vulnerability data and extract its textual information, yielding the software historical vulnerability text data. A suitable behavior acquisition technology then performs user behavior acquisition on the software historical vulnerability text data, so that the relationship between software vulnerabilities and user usage behavior is understood comprehensively and the usage of the software by its users is collected effectively. A suitable log conversion algorithm compresses the high-dimensional user usage behavior data into a lower-dimensional log form, finally yielding the user usage behavior log.
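As a rough illustration of step S2, the sketch below pulls a severity, coarse vulnerability types and version numbers out of a free-text description with keyword and regular-expression matching, and flattens a usage event into a compact log line. The keyword table, the event fields (`ts`, `user`, `action`, `target`) and the pipe-separated log format are all hypothetical; the patent does not specify which natural language processing or log conversion technique is used, and simple pattern matching stands in for it here.

```python
import re

# Hypothetical vocabularies, chosen only for this illustration.
SEVERITY_WORDS = ("critical", "high", "medium", "low")
TYPE_KEYWORDS = {
    "buffer overflow": "memory-corruption",
    "use after free": "memory-corruption",
    "macro": "macro-execution",
    "remote code execution": "code-execution",
}


def extract_fields(description: str) -> dict:
    """Extract severity, coarse vulnerability types and version strings from text."""
    text = description.lower()
    severity = next((w for w in SEVERITY_WORDS if re.search(rf"\b{w}\b", text)), None)
    types = sorted({label for kw, label in TYPE_KEYWORDS.items() if kw in text})
    versions = re.findall(r"\b\d+\.\d+(?:\.\d+)?\b", text)
    return {"severity": severity, "types": types, "versions": versions}


def to_log_line(event: dict) -> str:
    """Compress a usage event into a compact, fixed-order log line."""
    return f'{event["ts"]}|{event["user"]}|{event["action"]}|{event.get("target", "-")}'
```

Keeping the log line to a fixed field order is one simple way to turn wide event records into a lower-dimensional form that later steps can join on.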
Step S3: performing association mining analysis on the software historical vulnerability data through an association data mining algorithm to obtain key features of the historical vulnerability data; and classifying the key features of the historical vulnerability data by using a preset vulnerability classification algorithm to obtain a vulnerability analysis data set;
In this embodiment, a suitable association data mining algorithm is selected to perform association mining analysis on the software historical vulnerability data, and the key features of the historical vulnerability data are obtained through data mining. A vulnerability classification model is then constructed with a support vector machine algorithm and used to classify the key features of the historical vulnerability data, finally yielding the vulnerability analysis data set.
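The association mining part of step S3 can be sketched with plain support and confidence counting over pairs of attributes. This is a generic association-rule illustration, not the patent's association data mining algorithm; the `type:`/`severity:`/`component:` attribute encoding and the thresholds are assumptions. The support vector machine classification stage is not shown.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5, min_confidence=0.6):
    """Return (lhs, rhs, support, confidence) rules between pairs of attributes."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in set(t))
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(set(t)), 2)
    )
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue
        # Check the rule in both directions: a -> b and b -> a.
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]
            if confidence >= min_confidence:
                rules.append((lhs, rhs, round(support, 2), round(confidence, 2)))
    return sorted(rules)


# Each historical vulnerability is encoded as a set of attribute tags (hypothetical data).
records = [
    {"type:macro", "severity:high", "component:writer"},
    {"type:macro", "severity:high", "component:calc"},
    {"type:overflow", "severity:high", "component:writer"},
    {"type:macro", "severity:low", "component:writer"},
]
rules = pair_rules(records)
```

Attributes that appear on the left or right of high-confidence rules are natural candidates for the "key features" that the subsequent classifier consumes.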
Step S4: performing vulnerability detection on the vulnerability analysis data set by using a vulnerability prediction model based on deep learning and incremental learning to obtain a vulnerability data set;
In this embodiment, a suitable vulnerability prediction model is constructed by combining a deep learning algorithm with an incremental learning algorithm, and the constructed model performs vulnerability detection on the vulnerability analysis data set, finally yielding the vulnerability data set.
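The incremental learning idea in step S4, updating an existing model with new vulnerability samples instead of retraining from scratch, can be shown with a deliberately small stand-in. The sketch below uses an online logistic regression rather than a deep network; the class name `OnlineLogistic`, the three binary features and the learning rate are hypothetical and only illustrate the update pattern.

```python
import math


class OnlineLogistic:
    """Tiny online logistic regression used as a stand-in for the prediction model."""

    def __init__(self, n_features: int, lr: float = 0.5):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x) -> float:
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
        return 1.0 / (1.0 + math.exp(-z))

    def partial_fit(self, batch, epochs: int = 1) -> None:
        """Update the existing weights with a new batch; earlier data is not revisited."""
        for _ in range(epochs):
            for x, y in batch:
                err = self.predict_proba(x) - y
                self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
                self.b -= self.lr * err


model = OnlineLogistic(n_features=3)
# Initial training data: feature 0 marks vulnerable samples, feature 1 benign ones.
model.partial_fit([([1, 0, 0], 1), ([0, 1, 0], 0)], epochs=50)
before = model.predict_proba([0, 0, 1])
# A new vulnerability pattern (feature 2) arrives later and is learned incrementally.
model.partial_fit([([0, 0, 1], 1)], epochs=50)
after = model.predict_proba([0, 0, 1])
```

Updating only on the new batch can shift predictions for older patterns, so practical incremental learners usually mix in a sample of earlier data or regularize the update.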
Step S5: merging and associating the user usage behavior log with the vulnerability data set by using a data association technology to obtain vulnerability association data; and analyzing the vulnerability association data through a time series analysis technology to obtain vulnerability formation rule data;
In this embodiment, the configured data association technology merges and associates the user usage behavior log with the vulnerability data set, revealing the correlations among different vulnerabilities and the association patterns between vulnerabilities and user usage behaviors, which yields the vulnerability association data. A time series analysis technology then analyzes the vulnerability association data to obtain the temporal and periodic rules of vulnerability formation, finally yielding the vulnerability formation rule data.
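Step S5 can be pictured as a join followed by a time bucketed summary. In the sketch below the join key (`component`), the monthly bucketing and the moving average are assumptions chosen for illustration; the patent does not name a specific data association or time series technique.

```python
from collections import Counter, defaultdict


def associate(usage_log, vulnerabilities):
    """Link each usage event to the vulnerabilities of the component it touched."""
    by_component = defaultdict(list)
    for v in vulnerabilities:
        by_component[v["component"]].append(v)
    linked = []
    for event in usage_log:
        for v in by_component.get(event["component"], []):
            linked.append({
                "month": event["ts"][:7],  # "YYYY-MM" prefix of an ISO timestamp
                "vuln_id": v["vuln_id"],
                "action": event["action"],
            })
    return linked


def monthly_counts(linked):
    """Count linked exposures per month, in chronological order."""
    return sorted(Counter(row["month"] for row in linked).items())


def moving_average(series, window=2):
    """Smooth the monthly counts to expose a trend."""
    values = [count for _, count in series]
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]
```

Months with no linked events are simply absent from the series in this sketch; a fuller analysis would insert explicit zeros before looking for periodicity.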
Step S6: performing vulnerability risk prediction on the vulnerability formation rule data through a risk detection algorithm to obtain a vulnerability risk analysis result; and executing corresponding vulnerability repair measures according to the vulnerability risk analysis result.
In this embodiment, a suitable risk detection algorithm performs vulnerability risk prediction on the vulnerability formation rule data to obtain the vulnerability risk analysis result. Based on this result, corresponding vulnerability repair schemes are formulated for the different vulnerability types and grades, and the corresponding vulnerability repair measures are executed according to these schemes to repair the office software, ensuring software security and stability.
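The mapping from a risk analysis result to a repair measure in step S6 can be sketched as a score plus thresholds. The scoring rule (severity weight times exposure times a trend factor), the thresholds and the measure names below are hypothetical; the patent leaves the risk detection algorithm unspecified.

```python
# Hypothetical weights and thresholds, for illustration only.
SEVERITY_WEIGHT = {"low": 1, "medium": 2, "high": 3, "critical": 4}


def risk_score(pattern: dict) -> float:
    """Combine severity, exposure (0..1) and trend into a single score."""
    trend_factor = 1.5 if pattern["trend"] == "rising" else 1.0
    return SEVERITY_WEIGHT[pattern["severity"]] * pattern["exposure"] * trend_factor


def repair_measure(score: float) -> str:
    """Map a risk score to a repair measure."""
    if score >= 3.0:
        return "patch immediately"
    if score >= 1.5:
        return "schedule patch"
    return "monitor"
```

For example, a high-severity vulnerability with exposure 0.8 and a rising trend scores about 3.6 under these assumptions and would be patched immediately.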
Preferably, step S1 comprises the following steps:
Step S11: integrating the vulnerability data of the office software with the vulnerability data of the vulnerability information library by using a big data integration algorithm to obtain software vulnerability initial data;
In this embodiment, the data formats of the vulnerability data of the office software and of the vulnerability information library are first determined, and both are standardized so that their formats are consistent. A suitable big data integration algorithm is then selected, and the integration weight, characteristic coefficient, time distribution function adjustment coefficient, time distribution function and other information of each vulnerability data item are determined in the algorithm, together with a suitable calculation interval and data integration time variable. The selected big data integration algorithm then integrates the vulnerability data of the office software with that of the vulnerability information library, finally yielding the software vulnerability initial data.
The big data integration algorithm function is as follows:
where $D(x)$ is the software vulnerability initial data, $x$ is the structural feature of the vulnerability data, $n$ is the number of vulnerability data items, $w_i(x)$ is the integration weight parameter of the $i$-th vulnerability data, $\alpha_i$ is the characteristic coefficient of the $i$-th vulnerability data, $\beta_i(t)$ is the time distribution function coefficient of the $i$-th vulnerability data, $t$ is the data integration duration, $t'$ is the data integration time variable, $\gamma_i$ is the time distribution function adjustment coefficient of the $i$-th vulnerability data, $f_i(x, t')$ is the time distribution function of the $i$-th vulnerability data, $g_i(x)$ is the $i$-th vulnerability data of the office software, $h_i(x)$ is the $i$-th vulnerability data of the vulnerability information library, and $\mu$ is the correction value of the software vulnerability initial data;
The invention constructs a big data integration algorithm function for integrating the vulnerability data of the office software with the vulnerability data of the vulnerability information library, thereby obtaining the software vulnerability initial data. The algorithm weights the different vulnerability data through several parameters, including the integration weight parameter, the characteristic coefficient and the time distribution function adjustment coefficient, which are set according to actual requirements. This weighting effectively limits the influence of irrelevant or redundant vulnerability data on the overall analysis result, so that more comprehensive and more accurate software vulnerability initial data is obtained. The function takes into account the structural feature $x$ of the vulnerability data, the number $n$ of vulnerability data items, the integration weight parameter $w_i(x)$ of the $i$-th vulnerability data, the characteristic coefficient $\alpha_i$ of the $i$-th vulnerability data, the time distribution function coefficient $\beta_i(t)$ of the $i$-th vulnerability data, the data integration duration $t$, the data integration time variable $t'$, the time distribution function adjustment coefficient $\gamma_i$ of the $i$-th vulnerability data, the time distribution function $f_i(x, t')$ of the $i$-th vulnerability data, the $i$-th vulnerability data $g_i(x)$ of the office software, and the $i$-th vulnerability data $h_i(x)$ of the vulnerability information library. In addition, to prevent deviation during integration, the software vulnerability initial data must be normalized. The software vulnerability initial data $D(x)$ and the above parameters together form a functional relationship that realizes the integration of the vulnerability data of the office software with the vulnerability data of the vulnerability information library. The correction value $\mu$ of the software vulnerability initial data can be adjusted according to the actual situation, which improves the accuracy and applicability of the big data integration algorithm.
Step S12: performing data preprocessing on the software vulnerability initial data to obtain the software vulnerability data;
In this embodiment, the software vulnerability initial data undergoes data preprocessing such as duplicate removal, outlier removal, missing value filling, normalization and standardization, finally yielding the software vulnerability data.
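The preprocessing operations listed for step S12 can be sketched in a few lines of Python. The column name `score`, the valid range used to flag outliers, median filling and min-max normalization are assumptions for illustration; the patent names the operations but not the specific techniques.

```python
from statistics import median


def preprocess(rows, key="score", valid_range=(0.0, 10.0)):
    """Deduplicate, drop outliers, fill missing values and min-max normalize."""
    # 1. Remove duplicates by vulnerability ID, keeping the first occurrence.
    seen, unique = set(), []
    for row in rows:
        if row["vuln_id"] not in seen:
            seen.add(row["vuln_id"])
            unique.append(dict(row))
    # 2. Drop rows whose value lies outside the valid range (treated as outliers).
    lo, hi = valid_range
    kept = [r for r in unique if r[key] is None or lo <= r[key] <= hi]
    # 3. Fill missing values with the median of the remaining known values.
    fill = median(r[key] for r in kept if r[key] is not None)
    for r in kept:
        if r[key] is None:
            r[key] = fill
    # 4. Min-max normalize into [0, 1]; a constant column maps to 0.
    values = [r[key] for r in kept]
    vmin, vmax = min(values), max(values)
    span = (vmax - vmin) or 1.0
    for r in kept:
        r[key + "_norm"] = (r[key] - vmin) / span
    return kept
```

The order matters in this sketch: outliers are removed before the median is computed so that an implausible value does not distort the fill value.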
Step S13: acquiring historical data from the software vulnerability data through a historical backtracking algorithm to obtain the software historical vulnerability data;
In this embodiment, a suitable historical backtracking algorithm is set and its weight function and vulnerability data distribution function are determined. The weight function calculates the weight of each piece of software vulnerability data during historical backtracking, which represents how strongly that data influences the historical vulnerability data. The vulnerability data distribution function calculates how the software vulnerability data is distributed over the historical time period. Historical data acquisition is performed by combining the weight function with the vulnerability data distribution function, finally yielding the software historical vulnerability data.
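The interplay of a weight function and a distribution function over a historical period can be illustrated as follows. The exponential decay weight, the yearly buckets and the names `decay_weight` and `weighted_distribution` are assumptions made purely for illustration and should not be read as the patent's historical backtracking function.

```python
import math
from collections import defaultdict


def decay_weight(age_days: float, scale_days: float = 365.0) -> float:
    """Hypothetical weight function: older records contribute less."""
    return math.exp(-age_days / scale_days)


def weighted_distribution(records, now_year: int) -> dict:
    """Weighted share of vulnerability records per year of the historical period."""
    totals = defaultdict(float)
    for rec in records:
        age_days = (now_year - rec["year"]) * 365
        totals[rec["year"]] += decay_weight(age_days)
    norm = sum(totals.values())
    return {year: weight / norm for year, weight in sorted(totals.items())}


history_records = [{"year": 2021}, {"year": 2022}, {"year": 2022}, {"year": 2023}]
distribution = weighted_distribution(history_records, now_year=2023)
```

With this weighting, the single 2023 record outweighs the two 2022 records, which shows how a weight function changes the picture given by raw counts.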
In the embodiment of the invention, the history backtracking algorithm function is as follows:
where V_j(X) is the software historical vulnerability data obtained by historical tracing of the j-th software vulnerability data, N is the number of software vulnerability data items subjected to historical tracing, X is the software vulnerability data, W_j(X) is the weight function of the j-th software vulnerability data, σ_j is the vulnerability data standard deviation of the j-th software vulnerability data, exp is the exponential function, T is the historical tracing time variable, F_j(X, T) is the vulnerability data distribution function of the j-th software vulnerability data at time T, T_{j-1} is the start time of historical tracing for the j-th software vulnerability data, T_j is the end time of historical tracing for the j-th software vulnerability data, S(T) is the number of vulnerability data items whose historical tracing time exceeds T, and ε is the correction value of the software historical vulnerability data.
The invention constructs a formula for the historical backtracking algorithm function, which performs historical data acquisition on the software vulnerability data to obtain the software historical vulnerability data and to provide a basic data source for subsequent vulnerability analysis. By analyzing the historical vulnerability data of the software, the algorithm helps reveal the patterns in which software vulnerabilities occur and provides a reference for vulnerability prediction and prevention. It consists mainly of a weight function and a vulnerability data distribution function. The weight function calculates the weight of each piece of software vulnerability data in the historical backtracking, which represents how strongly that piece of data influences the historical vulnerability data. Different software vulnerability data may carry different weights, so the weight of each piece must be calculated for software vulnerability data integration and historical backtracking. The vulnerability data distribution function is the core of the historical backtracking algorithm: it calculates how the software vulnerability data is distributed over the historical time period, which improves the accuracy of vulnerability detection.
The algorithm function formula takes into account the number N of software vulnerability data items subjected to historical tracing, the software vulnerability data X, the weight function W_j(X) of the j-th software vulnerability data, the vulnerability data standard deviation σ_j of the j-th software vulnerability data, the historical tracing time variable T, the vulnerability data distribution function F_j(X, T) of the j-th software vulnerability data at time T, the start time T_{j-1} and the end time T_j of historical tracing for the j-th software vulnerability data, and the number S(T) of vulnerability data items whose historical tracing time exceeds T. Together with the exponential function exp, the relationship between the software historical vulnerability data V_j(X) and the above parameters forms the functional relationship of the formula, which realizes the historical data acquisition for the software vulnerability data. The correction value ε of the software historical vulnerability data can be adjusted according to the actual situation, which improves the applicability and stability of the historical backtracking algorithm.
The invention uses the big data integration algorithm to integrate vulnerability data from different data sources, which yields software vulnerability initial data with wider and more comprehensive coverage, mitigates problems such as data inconsistency and sparse vulnerability data, and improves the accuracy and completeness of vulnerability identification. Preprocessing the software vulnerability initial data improves the accuracy and efficiency of vulnerability prediction and helps in finding and repairing vulnerabilities in the software system. The historical backtracking algorithm performs historical data acquisition on the software vulnerability data to obtain the software historical vulnerability data. The historical vulnerability data is then used to predict new vulnerabilities that may occur, so that precautionary and preventive measures can be taken in time, which speeds up vulnerability response and reduces the harm vulnerabilities cause to office software. This helps software developers, administrators, and security specialists assess and manage software vulnerability risk. By combining vulnerability data from different data sources with the collection and analysis of historical vulnerability data, vulnerabilities in the software system can be detected more comprehensively and accurately, which raises the detection rate and supports early discovery, early warning, and quick handling.
Preferably, step S2 comprises the steps of:
step S21: performing text analysis processing on the software historical vulnerability data by using a natural language processing technology to obtain software historical vulnerability text data;
step S22: noise reduction processing is carried out on the software history vulnerability text data through a vulnerability noise reduction algorithm, so that software history vulnerability noise reduction data are obtained;
step S23: performing user behavior acquisition processing on the software historical vulnerability noise reduction data by using a behavior acquisition technology to obtain user behavior data;
step S24: performing log conversion processing on the user use behavior data based on a log conversion algorithm to obtain a user use behavior log;
the log conversion algorithm function is as follows:
In the formula, z is the time variable of log conversion, s is the log space mapping coordinate after log conversion, y is the initial space mapping coordinate of the user usage behavior data, Z is the time range of log conversion, r is the index of the Gaussian kernel of the log conversion algorithm function, m is the number of Gaussian kernels, p(y, z) is the probability density function of the initial space mapping coordinate y of the user usage behavior data at time z, q_r(y) is the shape function of the r-th Gaussian kernel, exp is the exponential function, y_r is the spatial center coordinate of the r-th Gaussian kernel, and ρ is the spatial variance of the Gaussian kernel. The remaining symbols denote the log conversion algorithm function of (z, s), the weight of the r-th Gaussian kernel, the spatial standard deviation of the Gaussian kernel, and the correction value of the log conversion algorithm function.
As an embodiment of the present invention, referring to fig. 2, a detailed step flow chart of step S2 in fig. 1 is shown, in which step S2 includes the following steps:
step S21: performing text analysis processing on the software historical vulnerability data by using a natural language processing technology to obtain software historical vulnerability text data;
according to the embodiment of the invention, the obtained software historical vulnerability data is analyzed and processed by using a natural language processing technology, the text information in the software historical vulnerability data is obtained, and finally the software historical vulnerability text data is obtained.
Step S22: noise reduction processing is carried out on the software history vulnerability text data through a vulnerability noise reduction algorithm, so that software history vulnerability noise reduction data are obtained;
according to the embodiment of the invention, the software history vulnerability text data is converted into the frequency domain data format required by the vulnerability noise reduction algorithm by selecting the proper vulnerability noise reduction algorithm, and the converted software history vulnerability text data is input into the vulnerability noise reduction algorithm to finally obtain the software history vulnerability noise reduction data.
Step S23: performing user behavior acquisition processing on the software historical vulnerability noise reduction data by using a behavior acquisition technology to obtain user behavior data;
according to the embodiment of the invention, the user behavior collection is carried out on the software historical vulnerability noise reduction data through the behavior collection technology, the relation between the software vulnerability and the user use behavior is comprehensively known, the use condition of the software user is effectively collected, and the user use behavior data is finally obtained.
Step S24: performing log conversion processing on the user use behavior data based on a log conversion algorithm to obtain a user use behavior log;
In the embodiment of the invention, a suitable log conversion algorithm is selected and its parameters are set: the number of Gaussian kernels and, for each kernel, its center coordinate, weight, and shape function, together with the spatial standard deviation and spatial variance. A suitable time variable is set for the user usage behavior data, and a probability density function of the data under that time variable is designed to describe the distribution of the user usage behavior data at that moment. The Gaussian kernels and the probability density function are then input into the log conversion algorithm, and suitable space mapping coordinates are selected from the calculation result to describe the spatial state of the user usage behavior data at the current time. Repeating these steps for different time variables yields a sequence of user usage behavior logs that describes how the user usage behavior data evolves over time, and this sequence constitutes the user usage behavior log.
The log conversion algorithm function is as follows:
In the formula, z is the time variable of log conversion, s is the log space mapping coordinate after log conversion, y is the initial space mapping coordinate of the user usage behavior data, Z is the time range of log conversion, r is the index of the Gaussian kernel of the log conversion algorithm function, m is the number of Gaussian kernels, p(y, z) is the probability density function of the initial space mapping coordinate y of the user usage behavior data at time z, q_r(y) is the shape function of the r-th Gaussian kernel, exp is the exponential function, y_r is the spatial center coordinate of the r-th Gaussian kernel, and ρ is the spatial variance of the Gaussian kernel. The remaining symbols denote the log conversion algorithm function of (z, s), the weight of the r-th Gaussian kernel, the spatial standard deviation of the Gaussian kernel, and the correction value of the log conversion algorithm function.
The invention constructs a formula for the log conversion algorithm function, which performs log conversion on the user usage behavior data and thereby compresses the high-dimensional user usage behavior data into a lower-dimensional log form. The algorithm maps the user usage behavior data into a new space for analysis, and it realizes the log conversion by representing the data as a linear combination of Gaussian kernels. This helps in understanding user behavior traces and extracting useful information, so that software vulnerabilities can be identified more reliably. The algorithm function formula takes into account the time variable z of log conversion, the log space mapping coordinate s after log conversion, the initial space mapping coordinate y of the user usage behavior data, the time range Z of log conversion, the Gaussian kernel index r, the number m of Gaussian kernels, the probability density function p(y, z) of the initial space mapping coordinate y at time z, the weight of the r-th Gaussian kernel, the shape function q_r(y) of the r-th Gaussian kernel, the spatial center coordinate y_r of the r-th Gaussian kernel, the spatial standard deviation of the Gaussian kernel, and the spatial variance ρ of the Gaussian kernel. Together with the exponential function exp, the relationship between the log conversion algorithm function of (z, s) and these parameters forms the functional relationship of the formula, which realizes the log conversion of the user usage behavior data. The correction value of the log conversion algorithm function can be adjusted according to abnormal conditions arising during log conversion, which improves the applicability and accuracy of the log conversion algorithm.
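The idea of representing data as a linear combination of Gaussian kernels can be illustrated with a short sketch. It is not the log conversion algorithm function of the invention, whose exact form is not reproduced here. It only evaluates a generic weighted sum of one-dimensional Gaussian kernels, and the function name, the shared standard deviation `sigma`, and the scalar coordinates are assumptions made for the example.

```python
import math


def gaussian_kernel_sum(y, centers, weights, sigma):
    """Evaluate sum_r weights[r] * exp(-(y - centers[r])**2 / (2 * sigma**2)).

    `y` is a scalar coordinate, `centers` holds the kernel centers y_r,
    `weights` holds one weight per kernel, and `sigma` is a standard
    deviation shared by all kernels (an assumption of this sketch).
    """
    if len(centers) != len(weights):
        raise ValueError("centers and weights must have the same length")
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return sum(
        w * math.exp(-((y - c) ** 2) / (2 * sigma ** 2))
        for c, w in zip(centers, weights)
    )
```

Each kernel contributes most strongly near its own center, so the weights control how much each region of the initial coordinate space influences the mapped value.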
The invention performs text analysis on the software historical vulnerability data through natural language processing, which captures keywords and patterns in the vulnerability information, extracts the useful information contained in the software historical vulnerability data in a timely manner, and supplies that information to subsequent vulnerability analysis. The vulnerability noise reduction algorithm reduces noise arising from random errors and adverse influencing factors in the software historical vulnerability text data, removing false-alarm data and noise data and improving the accuracy of subsequent vulnerability detection and analysis. User behavior collection through the behavior acquisition technology then gives a more complete picture of the software vulnerabilities and effectively captures how users use the software. The collected behavior data can reveal the prevalence pattern and the scope of impact of vulnerabilities, which helps in detecting and repairing historical vulnerabilities, in predicting the probability of current or future vulnerabilities, and in managing vulnerabilities. Finally, the log conversion algorithm reduces the dimensionality of the user usage behavior data by compressing the high-dimensional data into a lower-dimensional log form, which simplifies data analysis and prediction. The converted data can also be displayed visually in space, which makes it easier for vulnerability analysts to inspect.
Preferably, step S22 comprises the steps of:
step S221: noise reduction processing is carried out on the software history vulnerability text data through a vulnerability noise reduction algorithm, so that a vulnerability noise value is obtained;
the vulnerability noise reduction algorithm function is as follows:
where E is the vulnerability noise value, K is the number of software history vulnerability text data items subjected to noise reduction, k is the noise frequency variable of the software history vulnerability text data, k_l is the noise frequency variable of the l-th software history vulnerability text data item, a and b are the noise harmonic smoothing parameters of the software history vulnerability text data, U(k - k_l) is the noise frequency domain weighting function of the software history vulnerability text data, ξ is the noise weight parameter of the software history vulnerability text data, ω is the vulnerability noise degree coefficient of the software history vulnerability text data at frequency k - k_l, and θ is the correction value of the vulnerability noise value;
step S222: judging the vulnerability noise value according to a preset vulnerability noise threshold, and if the vulnerability noise value is greater than or equal to the preset vulnerability noise threshold, rejecting software history vulnerability text data corresponding to the vulnerability noise value to obtain software history vulnerability noise reduction data;
step S223: judging the vulnerability noise value according to a preset vulnerability noise threshold, and defining the software history vulnerability text data as software history vulnerability noise reduction data if the vulnerability noise value is smaller than the preset vulnerability noise threshold.
As an embodiment of the present invention, referring to fig. 3, a detailed step flow chart of step S22 in fig. 2 is shown, in which step S22 includes the following steps:
step S221: noise reduction processing is carried out on the software history vulnerability text data through a vulnerability noise reduction algorithm, so that a vulnerability noise value is obtained;
In the embodiment of the invention, a suitable vulnerability noise reduction algorithm is selected and used to filter and smooth the software history vulnerability text data. The text data is converted into frequency domain noise data, the algorithm calculates the frequency domain noise value of each software history vulnerability text data item, and the vulnerability noise value is finally obtained.
The vulnerability noise reduction algorithm function is as follows:
where E is the vulnerability noise value, K is the number of software history vulnerability text data items subjected to noise reduction, k is the noise frequency variable of the software history vulnerability text data, k_l is the noise frequency variable of the l-th software history vulnerability text data item, a and b are the noise harmonic smoothing parameters of the software history vulnerability text data, U(k - k_l) is the noise frequency domain weighting function of the software history vulnerability text data, ξ is the noise weight parameter of the software history vulnerability text data, ω is the vulnerability noise degree coefficient of the software history vulnerability text data at frequency k - k_l, and θ is the correction value of the vulnerability noise value;
The invention constructs a formula for the vulnerability noise reduction algorithm function. To eliminate the influence of noise sources in the software history vulnerability text data on the subsequent vulnerability detection process, the text data must be noise-reduced to obtain cleaner and more accurate software history vulnerability text data. The vulnerability noise reduction algorithm removes noise and interference data from the text data, which improves data quality and accuracy and provides a reliable data basis for subsequent vulnerability detection and repair. The algorithm function formula takes into account the number K of software history vulnerability text data items subjected to noise reduction, the noise frequency variable k of the software history vulnerability text data, the noise frequency variable k_l of the l-th software history vulnerability text data item, the noise harmonic smoothing parameters a and b, the noise frequency domain weighting function U(k - k_l), the noise weight parameter ξ, and the vulnerability noise degree coefficient ω at frequency k - k_l. To prevent deviation during noise reduction, the calculated vulnerability noise value is normalized. The relationship between the vulnerability noise value E and the above parameters forms the functional relationship of the formula, which realizes the noise reduction of the software history vulnerability text data. The correction value θ of the vulnerability noise value can be adjusted according to the actual situation, which improves the accuracy and applicability of the vulnerability noise reduction algorithm.
Step S222: judging the vulnerability noise value according to a preset vulnerability noise threshold, and if the vulnerability noise value is greater than or equal to the preset vulnerability noise threshold, rejecting software history vulnerability text data corresponding to the vulnerability noise value to obtain software history vulnerability noise reduction data;
In the embodiment of the invention, the calculated vulnerability noise value is compared with the preset vulnerability noise threshold. A vulnerability noise value greater than or equal to the threshold indicates that noise sources interfere strongly with the software history vulnerability text data, so the text data corresponding to that noise value is removed, and the software history vulnerability noise reduction data is finally obtained.
Step S223: judging the vulnerability noise value according to a preset vulnerability noise threshold, and defining the software history vulnerability text data as software history vulnerability noise reduction data if the vulnerability noise value is smaller than the preset vulnerability noise threshold.
In the embodiment of the invention, the calculated vulnerability noise value is compared with the preset vulnerability noise threshold. A vulnerability noise value smaller than the threshold indicates that noise sources interfere only slightly with the software history vulnerability text data, so the text data corresponding to that noise value is directly defined as software history vulnerability noise reduction data.
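The decision rule of steps S222 and S223 can be sketched as a simple partition. The sketch assumes the vulnerability noise values E have already been computed by the vulnerability noise reduction algorithm function; the function name `split_by_noise` and the list-based interface are hypothetical.

```python
def split_by_noise(texts, noise_values, threshold):
    """Partition text items by comparing each noise value with the threshold.

    Items whose noise value is greater than or equal to `threshold` are
    rejected (step S222); the rest are kept as noise reduction data
    (step S223). Returns the pair (kept, rejected).
    """
    if len(texts) != len(noise_values):
        raise ValueError("texts and noise_values must have the same length")
    kept, rejected = [], []
    for text, noise in zip(texts, noise_values):
        if noise >= threshold:
            rejected.append(text)
        else:
            kept.append(text)
    return kept, rejected
```

Note that an item whose noise value equals the threshold is rejected, matching the "greater than or equal to" condition in step S222.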
According to the method, noise reduction processing is carried out on the software history vulnerability text data through the vulnerability noise reduction algorithm, noise and error data in the software history vulnerability text data can be removed, and therefore accuracy and quality of the software history vulnerability text data are improved. The noise-reduced software history vulnerability text data is more beneficial to subsequent analysis and decision making, and noise and erroneous input which can mislead the decision making are effectively avoided. And the calculation of the vulnerability noise value can quantitatively evaluate the noise source data of the software history vulnerability text data according to the established parameters and weights, and determine a reasonable vulnerability noise threshold value. By setting a reasonable vulnerability noise threshold value, the credibility of the removed software history vulnerability text data can be judged more accurately, and the reliability and rationality of the noise reduction effect are improved. In addition, noise source data and abnormal data can be effectively removed in the noise reduction and elimination process of the software history vulnerability text data. Through filtering of the vulnerability noise reduction algorithm and calculation of noise values, distribution rules and trends of software historical vulnerability text data can be determined more accurately, and therefore data support with more reference values is provided for vulnerability analysis.
Preferably, step S3 comprises the steps of:
step S31: performing association mining analysis on the software historical vulnerability data through an association data mining algorithm to obtain key features of the historical vulnerability data;
step S32: performing data cleaning treatment on the key features of the historical vulnerability data to obtain a historical vulnerability key feature data set;
step S33: and performing feature classification processing on the historical vulnerability key feature data set by using a vulnerability classification algorithm based on a support vector machine to obtain a vulnerability analysis data set.
As an embodiment of the present invention, referring to fig. 4, a detailed step flow chart of step S3 in fig. 1 is shown, in which step S3 includes the following steps:
step S31: performing association mining analysis on the software historical vulnerability data through an association data mining algorithm to obtain key features of the historical vulnerability data;
according to the embodiment of the invention, the software historical vulnerability data is subjected to association mining analysis by selecting a proper association degree data mining algorithm, and the key characteristics of the historical vulnerability data are finally obtained.
Step S32: performing data cleaning treatment on the key features of the historical vulnerability data to obtain a historical vulnerability key feature data set;
According to the embodiment of the invention, the extracted key features of the historical vulnerability data are subjected to missing value filling, invalid value removal, abnormal value removal, repeated value removal and other processing, and the processed key features of the historical vulnerability data are subjected to data conversion, so that a data set of the key features of the historical vulnerability is finally obtained.
Step S33: and performing feature classification processing on the historical vulnerability key feature data set by using a vulnerability classification algorithm based on a support vector machine to obtain a vulnerability analysis data set.
In the embodiment of the invention, the historical vulnerability key feature data set is divided into a training set and a test set so that the accuracy and stability of the vulnerability classification algorithm model can be verified. A support vector machine classification algorithm is trained on the training set to obtain the vulnerability classification algorithm model. The trained model is then verified on the test set, and indexes such as its accuracy and recall are evaluated. Finally, new historical vulnerability key feature data is classified with the evaluated model to obtain the vulnerability analysis data set.
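The train, verify, and classify workflow described in this embodiment can be sketched with a small linear support vector machine. This is only an illustration under stated assumptions: the patent does not specify the kernel, the solver, or the features, so the sketch uses a linear model trained by sub-gradient descent on the regularized hinge loss, two classes labeled +1 and -1, and hand-made two-dimensional toy feature vectors. All function names and hyperparameter values are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=100):
    """Fit (w, b) by sub-gradient descent on the L2-regularized hinge loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            violated = yi * (dot(w, xi) + b) < 1
            # The regularizer always shrinks w; the hinge term only acts on
            # samples that are misclassified or inside the margin.
            w = [wj * (1 - lr * lam) + (lr * yi * xj if violated else 0.0)
                 for wj, xj in zip(w, xi)]
            if violated:
                b += lr * yi
    return w, b


def predict(model, x):
    w, b = model
    return 1 if dot(w, x) + b >= 0 else -1


def evaluate(model, X, y):
    """Return (accuracy, recall), treating +1 as the positive class."""
    preds = [predict(model, x) for x in X]
    correct = sum(p == t for p, t in zip(preds, y))
    true_pos = sum(p == 1 and t == 1 for p, t in zip(preds, y))
    positives = sum(t == 1 for t in y)
    return correct / len(y), (true_pos / positives if positives else 0.0)


# Toy feature vectors; +1 and -1 stand for two hypothetical vulnerability classes.
X_train = [(2, 2), (-2, -2), (3, 2), (-3, -2), (2, 3), (-2, -3), (3, 3), (-3, -3)]
y_train = [1, -1, 1, -1, 1, -1, 1, -1]
X_test = [(4, 3), (-4, -3), (3, 4), (-3, -4)]
y_test = [1, -1, 1, -1]

model = train_linear_svm(X_train, y_train)
accuracy, recall = evaluate(model, X_test, y_test)
```

In practice a library implementation with a tuned kernel and multi-class support would replace this hand-written trainer; the sketch only shows how the split, training, and evaluation steps fit together.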
The invention uses the association degree data mining algorithm to mine the association relations within the software historical vulnerability data and thereby find the key features related to vulnerabilities. These key features give a deeper understanding of the causes and patterns of vulnerability occurrence, so that software security protection can be strengthened in a targeted way. The data cleaning removes dirty data and abnormal data from the key features of the historical vulnerability data, which improves the accuracy and reliability of the historical vulnerability key feature data set and, once the interfering data is removed, of the vulnerability analysis result. The vulnerability classification algorithm based on a support vector machine classifies the historical vulnerability key feature data set and produces the corresponding vulnerability analysis data set. This data set helps office software administrators understand the type, frequency of occurrence, and degree of impact of vulnerabilities, and it provides a reference for vulnerability repair.
Preferably, step S31 comprises the steps of:
step S311: extracting features of the historical vulnerability data of the software to obtain the features of the historical vulnerability data;
step S312: performing association mining analysis on the historical vulnerability data characteristics through an association data mining algorithm to obtain the association degree of the historical vulnerability data;
the association data mining algorithm function is as follows:
In the formula, the quantities are: the association degree of the historical vulnerability data; the joint probability density function of two historical vulnerability data features, that is, the probability density of the two features occurring simultaneously; the probability density function of each of the two historical vulnerability data features taken individually; the weight factor for the two features occurring simultaneously; the distance function between the two features; the integration variable over d-dimensional Euclidean space; and the correction value of the association degree of the historical vulnerability data;
step S313: and sequencing the historical vulnerability data relevancy according to the sequence from large to small to obtain the historical vulnerability data features corresponding to the historical vulnerability data relevancy ranked at the front so as to obtain the key features of the historical vulnerability data.
As an embodiment of the present invention, referring to fig. 5, a detailed step flow chart of step S31 in fig. 4 is shown, in which step S31 includes the following steps:
step S311: extracting features of the historical vulnerability data of the software to obtain the features of the historical vulnerability data;
according to the embodiment of the invention, the software historical vulnerability data is subjected to feature extraction through a specific feature extraction technology, and is converted into the feature vector, so that the historical vulnerability data features are finally obtained.
Step S312: performing association mining analysis on the historical vulnerability data characteristics through an association data mining algorithm to obtain the association degree of the historical vulnerability data;
according to the embodiment of the invention, the proper association degree data mining algorithm is selected, the association degree calculation method is set according to the selected association degree data mining algorithm, the historical vulnerability data features are input into the selected association degree data mining algorithm, the association degree between the historical vulnerability data features is calculated according to the set association degree calculation method, and finally the association degree of the historical vulnerability data is obtained.
The association data mining algorithm function is as follows:
In the formula, the quantities are: the association degree of the historical vulnerability data; the joint probability density function of two historical vulnerability data features, that is, the probability density of the two features occurring simultaneously; the probability density function of each of the two historical vulnerability data features taken individually; the weight factor for the two features occurring simultaneously; the distance function between the two features; the integration variable over d-dimensional Euclidean space; and the correction value of the association degree of the historical vulnerability data;
The invention constructs the formula of the association degree data mining algorithm function, which performs association mining analysis on the historical vulnerability data features to obtain the association degree of the historical vulnerability data. By analyzing the historical vulnerability data features, the algorithm finds the interrelationships between features and thereby identifies the key features of the historical vulnerability data. The algorithm describes the relationship between features with probability density functions, adjusts the relative importance of different features with a weight factor, and calculates the association degree between features, so that the key features in the historical vulnerability data are obtained accurately. The formula takes into account the joint probability density function of two historical vulnerability data features, the probability density function of each feature, the weight factor for the two features occurring together, the distance function between the two features, and the integration variable over d-dimensional Euclidean space; the association degree of the historical vulnerability data is expressed as a function of these quantities. This realizes the association mining analysis of the historical vulnerability data features and yields the association degree of each item of historical vulnerability data. In addition, the correction value of the association degree can be adjusted according to actual conditions, which improves the accuracy and applicability of the association degree data mining algorithm.
Step S313: sorting the historical vulnerability data association degrees in descending order, and taking the historical vulnerability data features corresponding to the top-ranked association degrees as the key features of the historical vulnerability data.
According to the embodiment of the invention, the calculated historical vulnerability data association degrees are sorted, and the historical vulnerability data features corresponding to the top-ranked association degrees are selected as key features as required, finally yielding the key features of the historical vulnerability data.
Feature extraction on the software historical vulnerability data converts the data into interpretable feature vectors, so that the characteristics and rules of the data can be studied. Specifically, feature extraction may use common data preprocessing methods to obtain features that distinguish different vulnerabilities, and may also consider factors such as software attributes, vulnerability type, scope of impact, and repair difficulty. Feature extraction improves the interpretability and operability of the data and allows deeper analysis of the software historical vulnerability data. Association mining analysis is then performed on the historical vulnerability data features by the association degree data mining algorithm, which finds the association relationships among the features and thereby helps explain the underlying causes and rules of vulnerabilities. The algorithm can also identify and analyze similarities and interactions between vulnerabilities, helping software security administrators analyze and understand the vulnerability data in more depth. Based on the historical vulnerability data features, the association degree data mining algorithm calculates the similarity and weight between vulnerability features and from these derives the association degree between vulnerability data. Performing association mining analysis with a suitable association degree data mining algorithm uncovers the underlying causes and rules of vulnerabilities and guides their prediction and prevention, which improves the security and reliability of the software.
Finally, the calculated association degrees are sorted from high to low and the features corresponding to the top-ranked association degrees are taken, so that the key features in the historical vulnerability data are found. The system can analyze historical vulnerability data to determine the key features that may contribute to vulnerabilities or have a significant impact on them. The features corresponding to the top-ranked association degrees are therefore the most influential ones; they help guide improvements to the software development process and security policy and reduce the probability of future vulnerabilities.
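As a non-limiting illustration of steps S311 to S313, the following Python sketch ranks pairs of categorical features by association and keeps the features from the top-ranked pairs. It substitutes plain discrete mutual information for the association degree function described above, which is an assumption made for illustration; the feature names (`component`, `vuln_type`, `severity`) and the function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Discrete mutual information (in nats) between two equal-length sequences.

    Stand-in for the association degree; NOT the patent's own formula.
    """
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi


def rank_feature_pairs(features):
    """Return (association, feature_a, feature_b) tuples sorted in descending order."""
    pairs = [
        (mutual_information(features[a], features[b]), a, b)
        for a, b in combinations(sorted(features), 2)
    ]
    pairs.sort(key=lambda item: item[0], reverse=True)
    return pairs


def key_features(features, top_n):
    """Collect the features that appear in the top_n most associated pairs."""
    selected = []
    for _, a, b in rank_feature_pairs(features)[:top_n]:
        for name in (a, b):
            if name not in selected:
                selected.append(name)
    return selected


# Hypothetical toy data: four historical vulnerability records.
features = {
    "component": ["macro", "macro", "parser", "parser"],
    "vuln_type": ["rce", "rce", "overflow", "overflow"],
    "severity": ["high", "low", "high", "low"],
}
print(key_features(features, top_n=1))
```

In this toy data `component` and `vuln_type` always change together, so their pair ranks first, while `severity` is independent of both and is not selected.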
Preferably, step S4 comprises the steps of:
step S41: performing vulnerability detection on the vulnerability analysis data set by using a vulnerability prediction model based on deep learning to obtain a vulnerability detection data set;
step S42: performing incremental training on the vulnerability prediction model by using an incremental learning algorithm to obtain a self-adaptive vulnerability prediction model;
wherein, the function of the incremental learning algorithm is as follows:
In the formula, the quantity being computed is the $\eta$-th parameter of the model parameters after incremental training at time $\kappa+1$; it is updated from the $\eta$-th parameter of the model parameters after incremental training at time $\kappa$; $M$ is the number of model parameters; the learning rate is that of model parameter $\zeta$ after incremental training at time $\kappa+1$; the summation runs over the time range of incremental training; $A_{\eta,\kappa}$ is the $\eta$-th vulnerability analysis data set feature at time $\kappa$; $A_{\kappa-\zeta}$ is the vulnerability analysis data set feature at time $\kappa-\zeta$; $P(A_{\eta,\kappa} \mid A_{\kappa-\zeta})$ is the probability density function of the $\eta$-th vulnerability analysis data set feature $A_{\eta,\kappa}$ occurring at time $\kappa$ given the vulnerability analysis data set feature $A_{\kappa-\zeta}$ at time $\kappa-\zeta$; $Q(A_{\eta,\kappa} \mid A_{\kappa-\zeta})$ is a preset probability density function; the remaining density term is the probability density function of the $\eta$-th vulnerability analysis data set feature occurring at time $\kappa$ under the parameter $\theta$; and $\epsilon$ is the correction value of the model parameters;
Step S43: re-inputting the vulnerability detection data set into the self-adaptive vulnerability prediction model for re-detection processing to obtain the vulnerability data set.
As an embodiment of the present invention, FIG. 6 shows a detailed flow chart of step S4 in FIG. 1. In this embodiment, step S4 includes the following steps:
step S41: performing vulnerability detection on the vulnerability analysis data set by using a vulnerability prediction model based on deep learning to obtain a vulnerability detection data set;
According to the embodiment of the invention, a vulnerability analysis data set is collected, which includes information such as vulnerability description, vulnerability type, and vulnerability grade. A vulnerability prediction model is then built with a deep learning algorithm such as a convolutional neural network (CNN) or a recurrent neural network (RNN), and the model is trained on the vulnerability analysis data set to obtain the vulnerability prediction model. Finally, the trained vulnerability prediction model is used to predict each piece of data in the vulnerability analysis data set to be predicted, yielding the vulnerability detection data set.
Step S42: performing incremental training on the vulnerability prediction model by using an incremental learning algorithm to obtain a self-adaptive vulnerability prediction model;
According to the embodiment of the invention, a start time and an end time are set according to the time period over which incremental learning is required. The parameters of the incremental learning algorithm are then set according to the number of parameters that require incremental training. A learning rate is set for each parameter, representing the step size of that parameter's update at each iteration. Finally, at each time step, all data before the current time point (the historical data) and the new data at the current time point are input into the incremental learning algorithm, and the model parameters are updated incrementally to obtain the adaptive vulnerability prediction model after incremental training.
Wherein, the function of the incremental learning algorithm is as follows:
In the formula, the quantity being computed is the $\eta$-th parameter of the model parameters after incremental training at time $\kappa+1$; it is updated from the $\eta$-th parameter of the model parameters after incremental training at time $\kappa$; $M$ is the number of model parameters; the learning rate is that of model parameter $\zeta$ after incremental training at time $\kappa+1$; the summation runs over the time range of incremental training; $A_{\eta,\kappa}$ is the $\eta$-th vulnerability analysis data set feature at time $\kappa$; $A_{\kappa-\zeta}$ is the vulnerability analysis data set feature at time $\kappa-\zeta$; $P(A_{\eta,\kappa} \mid A_{\kappa-\zeta})$ is the probability density function of the $\eta$-th vulnerability analysis data set feature $A_{\eta,\kappa}$ occurring at time $\kappa$ given the vulnerability analysis data set feature $A_{\kappa-\zeta}$ at time $\kappa-\zeta$; $Q(A_{\eta,\kappa} \mid A_{\kappa-\zeta})$ is a preset probability density function; the remaining density term is the probability density function of the $\eta$-th vulnerability analysis data set feature occurring at time $\kappa$ under the parameter $\theta$; and $\epsilon$ is the correction value of the model parameters;
The invention constructs the formula of the incremental learning algorithm function for incremental training of the vulnerability prediction model. The algorithm updates the model parameters by a cumulative gradient descent method: at each time step it calculates the specified probability density functions from the previous training data and the new vulnerability analysis data set features, and then updates the model parameters by gradient descent. In this process, the algorithm adjusts the update rate of the parameters according to the previous learning rate and the new data to keep the model accurate and stable. The formula starts from the $\eta$-th parameter of the model parameters after incremental training at time $\kappa$ and updates it by gradient descent. It also takes into account the number $M$ of model parameters, the learning rate of model parameter $\zeta$ after incremental training at time $\kappa+1$, the time range of incremental training, the $\eta$-th vulnerability analysis data set feature $A_{\eta,\kappa}$ at time $\kappa$, the vulnerability analysis data set feature $A_{\kappa-\zeta}$ at time $\kappa-\zeta$, the probability density function $P(A_{\eta,\kappa} \mid A_{\kappa-\zeta})$ of $A_{\eta,\kappa}$ occurring given $A_{\kappa-\zeta}$, the preset probability density function $Q(A_{\eta,\kappa} \mid A_{\kappa-\zeta})$, and the probability density function of the $\eta$-th vulnerability analysis data set feature occurring at time $\kappa$ under the parameter $\theta$. The $\eta$-th parameter of the model parameters after incremental training at time $\kappa+1$ is expressed as a function of these quantities, which realizes the incremental training of the vulnerability prediction model by the incremental learning algorithm. In addition, the correction value $\epsilon$ of the model parameters allows adjustment for special cases that arise during incremental training, which further improves the applicability and stability of the incremental learning algorithm, improves the generalization capability and robustness of the vulnerability prediction model, and lets the model adapt to a continuously changing vulnerability environment.
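As a non-limiting illustration of the incremental update idea in step S42, the following Python sketch applies one gradient step per time step to a small logistic model, with a separate learning rate for each parameter. It is a generic online gradient descent update, not the incremental learning function described above; it also updates from the new batch only, which is a simplification of the embodiment. The function names, the data layout, and the numbers are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function."""
    return 1.0 / (1.0 + math.exp(-z))


def incremental_update(theta, batch, learning_rates):
    """Return updated parameters after one gradient step on the new batch.

    theta: current parameter list (the model at the previous time step).
    batch: list of (feature_vector, label) pairs with label 0 or 1.
    learning_rates: one step size per parameter.
    """
    if not batch:
        return list(theta)
    grads = [0.0] * len(theta)
    for x, y in batch:
        pred = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
        for j, xi in enumerate(x):
            grads[j] += (pred - y) * xi  # gradient of the log loss
    n = len(batch)
    return [t - lr * g / n for t, lr, g in zip(theta, learning_rates, grads)]


# Hypothetical stream: one small batch of labelled records per time step.
theta = [0.0, 0.0]
stream = [
    [([1.0, 2.0], 1)],
    [([1.0, -1.0], 0), ([1.0, 3.0], 1)],
]
for new_batch in stream:
    theta = incremental_update(theta, new_batch, learning_rates=[0.1, 0.1])
print(theta)
```

Keeping the update a pure function of the previous parameters and the new batch makes it easy to replay or roll back individual time steps.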
Step S43: re-inputting the vulnerability detection data set into the self-adaptive vulnerability prediction model for re-detection processing to obtain the vulnerability data set.
According to the embodiment of the invention, the vulnerability detection data set is input into the incrementally trained adaptive vulnerability prediction model for detection, and the model prediction determines whether each item of vulnerability detection data is a vulnerability, giving the vulnerability detection result. According to the vulnerability detection result, the data determined to be vulnerabilities are integrated into the vulnerability data set and are classified and sorted by vulnerability type, grade, time order, and the like, providing a basis for formulating the subsequent vulnerability repair scheme.
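As a non-limiting illustration of the classification and sorting in step S43, the following Python sketch keeps only the records predicted to be vulnerabilities and groups them by type, ordering each group by grade (highest first) and then by discovery time. The record fields (`type`, `grade`, `found_at`, `is_vulnerability`) and the function name are hypothetical.

```python
from collections import defaultdict


def organize_vulnerabilities(detections):
    """Group confirmed vulnerabilities by type, sorted by grade then time."""
    confirmed = [d for d in detections if d["is_vulnerability"]]
    # Sort by type, then highest grade first, then earliest discovery time.
    confirmed.sort(key=lambda d: (d["type"], -d["grade"], d["found_at"]))
    grouped = defaultdict(list)
    for record in confirmed:
        grouped[record["type"]].append(record)
    return dict(grouped)


# Hypothetical detection results.
detections = [
    {"id": "a", "type": "overflow", "grade": 2, "found_at": "2023-02-01", "is_vulnerability": True},
    {"id": "b", "type": "overflow", "grade": 3, "found_at": "2023-03-01", "is_vulnerability": True},
    {"id": "c", "type": "macro", "grade": 1, "found_at": "2023-01-01", "is_vulnerability": False},
    {"id": "d", "type": "macro", "grade": 1, "found_at": "2023-01-05", "is_vulnerability": True},
]
vulnerability_data_set = organize_vulnerabilities(detections)
print({k: [r["id"] for r in v] for k, v in vulnerability_data_set.items()})
```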
Performing vulnerability detection on the vulnerability analysis data set with the deep-learning-based vulnerability prediction model identifies the vulnerabilities present in the data set and improves the accuracy and efficiency of vulnerability detection. The deep-learning-based vulnerability prediction model can use techniques such as deep neural networks to mine features from the vulnerability analysis data set while classifying and predicting vulnerabilities, which makes it suitable for large-scale data sets. Incremental training of the vulnerability prediction model with the incremental learning algorithm keeps the model updated and optimized so that it adapts to changes in the dynamic vulnerability prediction task. The incremental learning algorithm learns from the historical data and the current data in the vulnerability data set by gradually updating the model parameters, which makes the model self-adaptive and continuously optimized. The main advantage of the incremental learning algorithm is that it can dynamically adjust the update rate of the model, reducing the computational cost of model updates while improving their speed and efficiency. Finally, the vulnerability detection data set is input again into the adaptive vulnerability prediction model obtained by incremental learning for re-detection, which helps evaluate whether the vulnerability prediction model has been effectively updated and optimized. The re-detection process can check performance indexes such as the accuracy and recall of the model and improve the model by feeding back poorly handled data, which improves the effectiveness of the vulnerability prediction model.
In addition, the re-detection process promotes continuous updating and optimization of the model, maintains the practicality and accuracy of the vulnerability detection model, and handles new vulnerability conditions adaptively through incremental learning, which improves the accuracy and stability of vulnerability detection and provides a basic data source for subsequent vulnerability repair work.
Preferably, step S41 comprises the steps of:
step S411: dividing the vulnerability analysis data set into a training data set, a verification data set and a test data set;
step S412: constructing a vulnerability prediction model based on a convolutional neural network, wherein the vulnerability prediction model comprises model training, model verification and model evaluation;
step S413: inputting the training data set into the constructed vulnerability prediction model for model training, and tuning the model parameters by a cross-validation method to obtain a verification model; performing model verification on the verification data set by using the verification model to obtain a test model;
step S414: performing model evaluation on the test data set by using the test model to obtain an optimal vulnerability prediction model; and re-inputting the vulnerability analysis data set into the optimal vulnerability prediction model to perform vulnerability detection to obtain a vulnerability detection data set.
As an embodiment of the present invention, FIG. 7 shows a detailed flow chart of step S41 in FIG. 6. In this embodiment, step S41 includes the following steps:
step S411: dividing the vulnerability analysis data set into a training data set, a verification data set and a test data set;
According to the embodiment of the invention, the vulnerability analysis data set is divided into a training data set, a verification data set, and a test data set in a preset ratio of 7:2:1, that is, 70% training data, 20% verification data, and 10% test data.
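As a non-limiting illustration of the 7:2:1 division in step S411, the following Python sketch shuffles the records with a fixed seed and slices them into three parts. Integer ratios are used to avoid floating-point rounding in the split sizes; the function name and the seed are hypothetical.

```python
import random


def split_dataset(records, ratios=(7, 2, 1), seed=0):
    """Shuffle records and split them into train, validation and test lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    total = sum(ratios)
    n = len(shuffled)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]  # the remainder goes to the test set
    return train, val, test


train, val, test = split_dataset(range(100))
print(len(train), len(val), len(test))
```

For a data set with imbalanced vulnerability classes, a stratified split that preserves class proportions in each part may be preferable.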
Step S412: constructing a vulnerability prediction model based on a convolutional neural network, wherein the vulnerability prediction model comprises model training, model verification and model evaluation;
According to the embodiment of the invention, a vulnerability prediction model is constructed with a convolutional neural network algorithm according to actual conditions. The vulnerability prediction model comprises model training, model verification, and model evaluation: the model is trained on the training data set, verified on the verification data set, and evaluated on the test data set, which improves its generalization performance and robustness.
Step S413: inputting the training data set into the constructed vulnerability prediction model for model training, and tuning the model parameters by a cross-validation method to obtain a verification model; performing model verification on the verification data set by using the verification model to obtain a test model;
According to the embodiment of the invention, the divided training data set is input into the constructed vulnerability prediction model for model training, and the model parameters are tuned by a suitable cross-validation method. First, the training data set is randomly divided into K mutually disjoint subsets, where K is usually 5 or 10. K-1 subsets are used as training data for the model, and the remaining subset is used as verification data for evaluating model performance. This process is repeated K times, each time with a different subset as verification data, giving K evaluation results. The average of the K evaluation results is then taken as the evaluation result of the verification model. Finally, model verification is performed on the divided verification data set with the verification model to obtain the final test model.
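As a non-limiting illustration of the K-fold procedure described above, the following Python sketch generates K disjoint folds, uses each fold once as verification data, and averages the K evaluation results. The `evaluate` callback stands in for training and scoring a model and is hypothetical, as are the function names.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, validation_indices) for each of k disjoint folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


def cross_validate(n_samples, evaluate, k=5):
    """Average the score returned by evaluate(train_idx, val_idx) over k folds."""
    scores = [evaluate(tr, va) for tr, va in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


# Placeholder evaluation: returns the validation fraction instead of a real metric.
mean_score = cross_validate(10, lambda tr, va: len(va) / (len(tr) + len(va)), k=5)
print(mean_score)
```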
Step S414: performing model evaluation on the test data set by using the test model to obtain an optimal vulnerability prediction model; and re-inputting the vulnerability analysis data set into the optimal vulnerability prediction model to perform vulnerability detection to obtain a vulnerability detection data set.
According to the embodiment of the invention, the divided test data set is input into the test model for model evaluation. Indexes such as the accuracy, recall, and F1 score of the model are calculated, and the model parameters are further checked and optimized against them to obtain a more efficient and accurate optimal vulnerability prediction model. The vulnerability analysis data set is then input into the optimal vulnerability prediction model again for vulnerability detection, finally yielding the vulnerability detection data set.
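As a non-limiting illustration of the evaluation indexes named in step S414, the following Python sketch computes accuracy, precision, recall, and F1 score for binary predictions (1 meaning "vulnerability"). Precision is included because the F1 score is defined from it; the function name and the example labels are hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true) if y_true else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical test-set labels and model predictions.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```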
Dividing the vulnerability analysis data set into a training data set, a verification data set, and a test data set allows the vulnerability prediction model to be evaluated and verified comprehensively. The purpose of the division is to avoid over-fitting or under-fitting and to verify whether the vulnerability prediction model generalizes well. When dividing the vulnerability analysis data set, attention must also be paid to the distribution of the data and the balance of sample counts. The vulnerability prediction model is built on a convolutional neural network because convolutional neural networks perform well on image and sequence data and can effectively mine relevant features from the vulnerability analysis data set, so that vulnerabilities can be predicted and classified. The training data set is then input into the constructed vulnerability prediction model for model training, and the model parameters are tuned by a cross-validation method, which improves the generalization capability and accuracy of the model. Cross-validation is a method of validating model performance by partitioning the data set; it helps avoid over-fitting and under-fitting and also helps select appropriate model parameters for the best performance. Finally, the test data set is evaluated with the test model to obtain the optimal vulnerability prediction model. Specifically, performance indexes such as the accuracy, recall, and F1 score of the model can be calculated, and the optimal vulnerability prediction model can be selected according to these indexes.
Inputting the vulnerability analysis data set into the optimal vulnerability prediction model again for vulnerability detection allows the vulnerabilities in the data set to be detected and identified accurately, yielding an accurate vulnerability detection data set and providing reliable data for subsequent vulnerability repair work.
Preferably, step S5 comprises the steps of:
step S51: combining and associating the user use behavior log and the vulnerability data set by using a data association technology to generate a vulnerability association relationship topological graph;
According to the embodiment of the invention, the user use behavior log and the vulnerability data set are combined and associated by a data association technology based on a graph database. According to the correlation between different vulnerabilities and the way vulnerabilities are associated with user use behaviors, vulnerabilities and user use behaviors are represented as nodes and connected by edges that reflect the association between them, finally generating the vulnerability association relationship topological graph.
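As a non-limiting illustration of step S51, the following Python sketch builds an undirected graph in memory in which vulnerabilities and user behaviors are nodes, and an edge links a behavior to every vulnerability in the software component it touched. The embodiment refers to a graph database; the plain adjacency dictionary here, the linking rule (a shared `component` field), and all identifiers are hypothetical.

```python
from collections import defaultdict


def build_association_graph(vulnerabilities, behavior_log):
    """Return an adjacency map linking behavior nodes and vulnerability nodes.

    Nodes are tuples: ("vuln", id) or ("behavior", action).
    """
    graph = defaultdict(set)
    for entry in behavior_log:
        behavior = ("behavior", entry["action"])
        for vuln in vulnerabilities:
            if vuln["component"] == entry["component"]:
                node = ("vuln", vuln["id"])
                graph[behavior].add(node)
                graph[node].add(behavior)
    return graph


# Hypothetical inputs.
vulnerabilities = [
    {"id": "V-1", "component": "macro"},
    {"id": "V-2", "component": "parser"},
]
behavior_log = [
    {"action": "run_macro", "component": "macro"},
    {"action": "open_file", "component": "parser"},
    {"action": "open_file", "component": "macro"},
]
graph = build_association_graph(vulnerabilities, behavior_log)
print(sorted(graph[("vuln", "V-1")]))
```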
Step S52: performing association acquisition processing on the vulnerability association relationship topological graph based on an association processing technology of a graph algorithm to acquire dependence and association between the vulnerability and user use behaviors and obtain vulnerability association data;
According to the embodiment of the invention, a suitable path analysis graph algorithm is selected, and the graph-algorithm-based association processing technology is used to acquire association relationships from the constructed vulnerability association relationship topological graph, so as to obtain the dependence and association between vulnerabilities and user use behaviors, finally yielding the vulnerability association data.
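As a non-limiting illustration of a path analysis graph algorithm for step S52, the following Python sketch uses breadth-first search to find the shortest chain of associations between two nodes of a vulnerability association graph. Breadth-first search is one possible choice, assumed here for illustration; the node labels and the small example graph are hypothetical.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Return the shortest node path from start to goal, or None if unreachable."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        # Sorted iteration keeps the result deterministic.
        for neighbor in sorted(graph.get(node, ())):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


# Hypothetical association graph: two vulnerabilities linked through a shared behavior.
graph = {
    "vuln:V-1": {"behavior:run_macro", "behavior:open_file"},
    "behavior:run_macro": {"vuln:V-1"},
    "behavior:open_file": {"vuln:V-1", "vuln:V-2"},
    "vuln:V-2": {"behavior:open_file"},
}
print(shortest_path(graph, "vuln:V-1", "vuln:V-2"))
```

The returned path shows which user behavior connects the two vulnerabilities, which is one simple form of vulnerability association data.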
Step S53: and performing vulnerability analysis processing on the vulnerability association data through a time sequence analysis technology to obtain vulnerability formation rule data.
According to the embodiment of the invention, the vulnerability association data is analyzed by a time series analysis technology to obtain the temporal and periodic rules of vulnerability formation. This gives a further understanding of how vulnerabilities form and propagate, finally yielding the vulnerability formation rule data.
Associating the user use behavior log with the vulnerability data set through the data association technology generates the vulnerability association relationship topological graph. This process helps reveal the correlation between different vulnerabilities and the correlation between vulnerabilities and user use behaviors. The topological graph displays the connections between vulnerabilities more clearly and facilitates deeper subsequent analysis. Association acquisition processing is then performed on the topological graph by the graph-algorithm-based association processing technology to obtain the dependence and association between vulnerabilities and user use behaviors, yielding the vulnerability association data. This provides a more specific and detailed data basis for subsequent vulnerability analysis. Finally, vulnerability analysis processing is performed on the vulnerability association data by the time series analysis technology to obtain the vulnerability formation rule data. Time series analysis examines data from the perspective of time and can reveal its temporal rules and trends. Analyzing the time series of the vulnerability association data gives a deeper understanding of how vulnerabilities form and how they trend over time, which helps predict and guard against vulnerabilities and formulate countermeasures.
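As a non-limiting illustration of looking for a periodic rule in step S53, the following Python sketch computes the autocorrelation of a series of vulnerability counts per time period and reports the lag with the strongest correlation. Autocorrelation is one simple time series technique, assumed here for illustration; the counts and the function names are hypothetical.

```python
def autocorrelation(series, lag):
    """Autocorrelation of the series at the given lag (0.0 for a constant series)."""
    n = len(series)
    mean = sum(series) / n
    variance = sum((x - mean) ** 2 for x in series)
    if variance == 0:
        return 0.0
    covariance = sum(
        (series[i] - mean) * (series[i + lag] - mean) for i in range(n - lag)
    )
    return covariance / variance


def dominant_period(series, max_lag):
    """Return the lag in 1..max_lag with the highest autocorrelation."""
    return max(range(1, max_lag + 1), key=lambda lag: autocorrelation(series, lag))


# Hypothetical counts of new vulnerabilities per period, spiking every 4th period.
counts = [5, 1, 1, 1] * 4
print(dominant_period(counts, max_lag=8))
```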
Preferably, step S6 comprises the steps of:
step S61: performing vulnerability risk prediction on vulnerability formation rule data through a risk detection algorithm to obtain a vulnerability risk analysis result;
According to the embodiment of the invention, feature extraction is performed on the collected vulnerability formation rule data to obtain vulnerability formation rule data features, where the vulnerability formation rule data includes vulnerability formation time, vulnerability type, vulnerability grade, vulnerability attack mode, and the like. The vulnerability formation rule data is modeled by a kernel function method to determine the correlation kernel function between the vulnerability formation rule data, and a suitable feature weight function and feature distance function are set to predict the vulnerability risk of the vulnerability formation rule data, finally yielding the vulnerability risk analysis result.
The risk detection algorithm function is as follows:
In the formula, the quantities are as follows: the vulnerability risk analysis result; the c-th vulnerability formation rule data; H, the number of vulnerability formation rule data; the vulnerability formation rule data features; the correlation kernel function between the c-th vulnerability formation rule data under the vulnerability formation rule data features; the feature weight function; the feature distance function; and τ, the correction value of the vulnerability risk analysis result;
The invention constructs the formula of the risk detection algorithm function, which predicts the vulnerability risk of the vulnerability formation rule data. The risk detection algorithm takes the vulnerability formation rule data as input, evaluates the similarity and distance between items of vulnerability formation rule data through the correlation kernel function and the feature weight function, and finds the vulnerability with the highest risk. The correlation kernel function and the feature weight function are the two main components of the risk detection algorithm. The correlation kernel function builds connections between different items of vulnerability formation rule data, and the similarity between their features is calculated by defining the kernel function. The feature weight function further adjusts the relative importance of different features to improve the accuracy and robustness of the evaluation. The formula takes into account the c-th vulnerability formation rule data, the number H of vulnerability formation rule data, the vulnerability formation rule data features, the correlation kernel function between the c-th vulnerability formation rule data under those features, the feature weight function, and the feature distance function; the vulnerability risk analysis result is expressed as a function of these quantities. This realizes the vulnerability risk prediction of the vulnerability formation rule data. In addition, the correction value τ of the vulnerability risk analysis result can be adjusted according to actual conditions, which improves the accuracy and applicability of the risk detection algorithm.
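As a non-limiting illustration of combining a kernel function, feature weights, and a feature distance into a risk score, the following Python sketch scores each record by its average Gaussian-kernel similarity to a set of reference records and adds a correction value `tau`. This is a generic kernel similarity score, not the risk detection function described above; the reference set of known high-risk records, the feature encoding, and all names are hypothetical.

```python
import math


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two numeric feature vectors."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def rbf_kernel(a, b, weights, sigma=1.0):
    """Gaussian (RBF) kernel on the weighted distance; 1.0 means identical."""
    d = weighted_distance(a, b, weights)
    return math.exp(-(d ** 2) / (2 * sigma ** 2))


def risk_scores(records, references, weights, tau=0.0):
    """Score each record by mean kernel similarity to the references, plus tau."""
    scores = []
    for record in records:
        similarity = sum(rbf_kernel(record, ref, weights) for ref in references)
        scores.append(similarity / len(references) + tau)
    return scores


# Hypothetical numeric encodings of vulnerability formation rule data.
records = [[1.0, 1.0], [0.0, 0.0]]
known_high_risk = [[1.0, 1.0]]
scores = risk_scores(records, known_high_risk, weights=[1.0, 1.0])
print(scores)
```

Records that resemble the reference set score close to 1 and can be prioritized for repair; the weights control how much each feature contributes to the distance.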
Step S62: analyzing and processing the vulnerability risk analysis result to generate a vulnerability analysis report;
According to the embodiment of the invention, vulnerability analysis reports are generated for different vulnerability types and grades according to the vulnerability risk analysis result. A vulnerability analysis report includes information such as a description of the vulnerability features, an analysis of the cause of the vulnerability, and the damage the vulnerability may cause.
Step S63: formulating a corresponding vulnerability repair scheme according to the vulnerability analysis report, and executing corresponding vulnerability repair measures on the office software through the vulnerability repair scheme.
According to the embodiment of the invention, a corresponding vulnerability repair scheme is formulated according to the vulnerability analysis report, where the vulnerability repair scheme includes information such as the urgency of the repair, the repair measures, and the repair time. Based on the vulnerability risk analysis result, the vulnerabilities in the office software are then repaired through the vulnerability repair scheme to ensure the security and stability of the software.
Performing vulnerability risk prediction on the vulnerability formation rule data with the risk detection algorithm yields the vulnerability risk analysis result. The algorithm analyzes the formation rule data of different vulnerabilities, considers several factors together, such as feature weights and distances, calculates a risk evaluation value for each vulnerability, and gives a corresponding correction value. The vulnerability risk analysis result helps identify high-risk vulnerabilities quickly and treat them first, which improves the efficiency and effect of vulnerability repair. Analyzing the vulnerability risk analysis result then gives a deeper understanding of the formation rules of each vulnerability and of the correlations and trends among vulnerabilities, and provides a useful reference for formulating the vulnerability repair scheme. Formulating the vulnerability repair scheme according to the vulnerability analysis report allows vulnerabilities to be repaired in a targeted manner and improves vulnerability management and security. Executing the corresponding vulnerability repair measures on the office software effectively prevents and stops vulnerability attacks and protects the security of the office software system and its data.
The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.
The foregoing is only a specific embodiment of the invention to enable those skilled in the art to understand or practice the invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. The office software vulnerability analysis method based on big data is characterized by comprising the following steps:
step S1: performing data integration processing on the vulnerability data of the office software and the vulnerability data of the vulnerability information base by using a big data integration algorithm to obtain software vulnerability data; performing historical data acquisition processing on the software vulnerability data through a historical backtracking algorithm to obtain software historical vulnerability data;
Step S2: performing text analysis processing on the software historical vulnerability data by using a natural language processing technology to obtain software historical vulnerability text data; performing user behavior acquisition processing on the software history vulnerability text data by using a behavior acquisition technology to obtain a user use behavior log;
step S3: performing association mining analysis on the software historical vulnerability data through an association data mining algorithm to obtain key features of the historical vulnerability data; performing feature classification processing on key features of the historical vulnerability data by using a preset vulnerability classification algorithm to obtain a vulnerability analysis data set;
step S4: performing vulnerability detection on the vulnerability analysis data set by using a vulnerability prediction model based on deep learning and incremental learning to obtain a vulnerability data set;
step S5: combining and associating the user use behavior log and the vulnerability data set by using a data association technology to obtain vulnerability association data; performing vulnerability analysis processing on the vulnerability association data through a time sequence analysis technology to obtain vulnerability formation rule data;
step S6: performing vulnerability risk prediction on the vulnerability formation rule data through a risk detection algorithm to obtain a vulnerability risk analysis result; and executing corresponding vulnerability repair measures according to the vulnerability risk analysis result.
2. The big data based office software vulnerability analysis method of claim 1, wherein step S1 comprises the steps of:
step S11: performing data integration processing on the vulnerability data of the office software and the vulnerability data of the vulnerability information base by utilizing a big data integration algorithm to obtain software vulnerability initial data;
the big data integration algorithm function is as follows:
wherein D(x) is the software vulnerability initial data, x is the structural feature of the vulnerability data, n is the number of vulnerability data items, w_i(x) is the integration weight parameter of the i-th vulnerability data, α_i is the characteristic coefficient of the i-th vulnerability data, β_i(t) is the time distribution function coefficient of the i-th vulnerability data, t is the data integration duration, t' is the data integration time variable, γ_i is the adjustment coefficient of the time distribution function of the i-th vulnerability data, f_i(x, t') is the time distribution function of the i-th vulnerability data, g_i(x) is the i-th vulnerability data of the office software, h_i(x) is the i-th vulnerability data of the vulnerability information base, and μ is the correction value of the software vulnerability initial data;
step S12: performing data preprocessing on the initial data of the software vulnerability to obtain data of the software vulnerability;
step S13: performing historical data acquisition processing on the software vulnerability data through a historical backtracking algorithm to obtain software historical vulnerability data;
The history backtracking algorithm function is as follows:
wherein V_j(X) is the software historical vulnerability data obtained by historical tracing of the j-th software vulnerability data, N is the number of software vulnerability data items after historical tracing, X is the software vulnerability data, W_j(X) is the weight function of the j-th software vulnerability data, σ_j is the standard deviation of the j-th software vulnerability data, exp is the exponential function, T is the historical tracing time variable, F_j(X, T) is the vulnerability data distribution function of the j-th software vulnerability data at time T, T_{j-1} is the start time of historical tracing of the j-th software vulnerability data, T_j is the end time of historical tracing of the j-th software vulnerability data, S(T) is the number of vulnerability data items whose historical tracing time exceeds T, and ε is the correction value of the software historical vulnerability data.
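The integration and backtracking steps of claim 2 can be pictured with a simplified Python sketch. It is not the claimed weighted integration function or backtracking function, whose formulas are not reproduced here; it only shows merging records from the two sources and selecting a historical window. The record fields `id` and `found`, and both function names, are assumptions made for illustration.

```python
from datetime import date


def integrate(software_records, base_records):
    """Merge two lists of dict records keyed on 'id'.

    Records from the vulnerability information base are applied first, so
    fields reported by the office software override them on conflict.
    """
    merged = {}
    for record in base_records + software_records:
        merged.setdefault(record["id"], {}).update(record)
    return list(merged.values())


def backtrack(records, start, end):
    """Keep records whose 'found' date lies in [start, end], oldest first."""
    window = [r for r in records if start <= r["found"] <= end]
    return sorted(window, key=lambda r: r["found"])
```

A real pipeline would also need the preprocessing of step S12 (for example normalizing identifiers) before the merge is meaningful.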
3. The big data based office software vulnerability analysis method of claim 1, wherein step S2 comprises the steps of:
step S21: performing text analysis processing on the software historical vulnerability data by using a natural language processing technology to obtain software historical vulnerability text data;
step S22: noise reduction processing is carried out on the software history vulnerability text data through a vulnerability noise reduction algorithm, so that software history vulnerability noise reduction data are obtained;
Step S23: performing user behavior acquisition processing on the software historical vulnerability noise reduction data by using a behavior acquisition technology to obtain user use behavior data;
step S24: performing log conversion processing on the user use behavior data based on a log conversion algorithm to obtain a user use behavior log;
the log conversion algorithm function is as follows:
wherein the formula defines the log conversion algorithm function, z is the time variable of log conversion, s is the log space mapping coordinate after log conversion, y is the initial space mapping coordinate of the user use behavior data, Z is the time range of log conversion, r is the index of a Gaussian kernel of the log conversion algorithm function, m is the number of Gaussian kernels, p(y, z) is the probability density function of the initial space mapping coordinate y of the user use behavior data at time z, q_r(y) is the shape function of the r-th Gaussian kernel, exp is the exponential function, y_r is the spatial center coordinate of the r-th Gaussian kernel, and ρ is the spatial variance of the Gaussian kernels; the remaining terms are the weight of the r-th Gaussian kernel, the spatial standard deviation of the r-th Gaussian kernel, and the correction value of the log conversion algorithm function.
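To make the output of step S24 concrete, here is a small Python sketch that serializes one user use behavior record into a log line. It does not implement the Gaussian-kernel log conversion function of the claim; the event fields (`time`, `user`, `action`, `vuln_id`) and the line layout are hypothetical.

```python
from datetime import datetime, timezone


def to_log_line(event):
    """Format one behavior event (a dict) as a UTC-timestamped log line.

    'time' is assumed to be a Unix timestamp in seconds.
    """
    ts = datetime.fromtimestamp(event["time"], tz=timezone.utc).isoformat()
    return (f"{ts} user={event['user']} "
            f"action={event['action']} vuln={event['vuln_id']}")
```

Keeping the vulnerability identifier on every line is what later allows the log to be joined with the vulnerability data set in step S5.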
4. The big data based office software vulnerability analysis method of claim 3, wherein step S22 comprises the steps of:
Step S221: noise reduction processing is carried out on the software history vulnerability text data through a vulnerability noise reduction algorithm, so that a vulnerability noise value is obtained;
the vulnerability noise reduction algorithm function is as follows:
wherein E is the vulnerability noise value, K is the number of software history vulnerability text data items subjected to noise reduction processing, k is the noise frequency variable of the software history vulnerability text data, k_l is the noise frequency variable of the l-th software history vulnerability text data, a and b are noise harmonic smoothing parameters of the software history vulnerability text data, U(k − k_l) is the noise frequency domain weighting function of the software history vulnerability text data, ζ is the noise weight parameter of the software history vulnerability text data, ω is the vulnerability noise degree coefficient of the software history vulnerability text data at frequency k − k_l, and θ is the correction value of the vulnerability noise value;
step S222: judging the vulnerability noise value according to a preset vulnerability noise threshold, and if the vulnerability noise value is greater than or equal to the preset vulnerability noise threshold, rejecting software history vulnerability text data corresponding to the vulnerability noise value to obtain software history vulnerability noise reduction data;
step S223: judging the vulnerability noise value according to a preset vulnerability noise threshold, and defining the software history vulnerability text data as software history vulnerability noise reduction data if the vulnerability noise value is smaller than the preset vulnerability noise threshold.
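The threshold rule of steps S222 and S223 can be sketched in a few lines of Python. The noise values are taken as already computed (the noise function itself is not reproduced here), and the function name `denoise` is an assumption.

```python
def denoise(records, noise_values, threshold):
    """Apply steps S222/S223: drop records with noise >= threshold.

    records and noise_values are parallel sequences, one noise value E
    per software history vulnerability text record.
    """
    if len(records) != len(noise_values):
        raise ValueError("one noise value is required per record")
    return [r for r, e in zip(records, noise_values) if e < threshold]
```

Note that a record whose noise value equals the threshold is rejected, matching the "greater than or equal to" wording of step S222.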
5. The big data based office software vulnerability analysis method of claim 1, wherein step S3 comprises the steps of:
step S31: performing association mining analysis on the software historical vulnerability data through an association data mining algorithm to obtain key features of the historical vulnerability data;
step S32: performing data cleaning treatment on the key features of the historical vulnerability data to obtain a historical vulnerability key feature data set;
step S33: performing feature classification processing on the historical vulnerability key feature data set by using a vulnerability classification algorithm based on a support vector machine to obtain a vulnerability analysis data set.
6. The big data based office software vulnerability analysis method of claim 5, wherein step S31 comprises the steps of:
step S311: extracting features of the historical vulnerability data of the software to obtain the features of the historical vulnerability data;
step S312: performing association mining analysis on the historical vulnerability data characteristics through an association data mining algorithm to obtain the association degree of the historical vulnerability data;
the association data mining algorithm function is as follows:
wherein the terms of the formula are, in order: the historical vulnerability data association degree; the joint probability density function of two historical vulnerability data features; the probability density function of the first feature; the probability density function of the second feature; the weight factor for the two features occurring simultaneously; the distance function between the two features; and the integration variable over Euclidean space of dimension d; and l is the correction value of the historical vulnerability data association degree;
step S313: sorting the historical vulnerability data association degrees in descending order and taking the historical vulnerability data features corresponding to the top-ranked association degrees as the key features of the historical vulnerability data.
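The ranking in steps S312 and S313 can be illustrated with pointwise mutual information (PMI), which also compares a joint probability with the product of the marginals. This is a stand-in chosen for illustration, not the claimed association function (which additionally uses a weight factor, a distance function, and an integral). The function name and the representation of each record as a set of feature names are assumptions.

```python
import math
from itertools import combinations


def rank_feature_pairs(records, top_k):
    """Return the top_k feature pairs by PMI, highest first.

    records: list of sets, each holding the feature names observed in one
    historical vulnerability record.
    """
    n = len(records)
    features = sorted(set().union(*records))
    p = {f: sum(f in r for r in records) / n for f in features}
    scored = []
    for a, b in combinations(features, 2):
        p_ab = sum(a in r and b in r for r in records) / n
        if p_ab > 0:  # pairs that never co-occur carry no association
            scored.append((math.log(p_ab / (p[a] * p[b])), (a, b)))
    scored.sort(reverse=True)  # descending order, as in step S313
    return [pair for _, pair in scored[:top_k]]
```

The features appearing in the returned pairs would play the role of the key features of the historical vulnerability data.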
7. The big data based office software vulnerability analysis method of claim 1, wherein step S4 comprises the steps of:
step S41: performing vulnerability detection on the vulnerability analysis data set by using a vulnerability prediction model based on deep learning to obtain a vulnerability detection data set;
step S42: performing incremental training on the vulnerability prediction model by using an incremental learning algorithm to obtain a self-adaptive vulnerability prediction model;
Wherein, the function of the increment learning algorithm is as follows:
wherein the terms of the formula are as follows: the η-th parameter of the model parameters θ after incremental training at time κ+1; the η-th parameter of the model parameters θ after incremental training at time κ; M, the number of model parameters; the learning rate of the model parameters for ζ after incremental training at time κ+1; the time range of incremental training; A_{η,κ}, the η-th vulnerability analysis data set feature at time κ; A_{κ−ζ}, the vulnerability analysis data set feature at time κ−ζ; P(A_{η,κ} | A_{κ−ζ}), the probability density function of the η-th vulnerability analysis data set feature A_{η,κ} occurring at time κ given the vulnerability analysis data set feature A_{κ−ζ} at time κ−ζ; Q(A_{η,κ} | A_{κ−ζ}), a preset probability density function; the probability density function of the η-th vulnerability analysis data set feature at time κ under the parameters θ; and E, the correction value of the model parameters;
step S43: re-inputting the vulnerability detection data set into the self-adaptive vulnerability prediction model for re-detection processing to obtain the vulnerability data set.
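The idea of incremental training in step S42, updating existing parameters from newly arrived data instead of retraining from scratch, can be sketched as one gradient step of logistic regression in Python. This is a generic stand-in, not the claimed update rule (which involves the probability density functions P and Q); the function name and data layout are assumptions.

```python
import math


def incremental_update(theta, batch, learning_rate):
    """Return new weights after one gradient step on a newly arrived batch.

    theta: list of current weights.
    batch: list of (features, label) pairs, label being 0 or 1.
    """
    grad = [0.0] * len(theta)
    for x, label in batch:
        z = sum(t * xi for t, xi in zip(theta, x))
        pred = 1.0 / (1.0 + math.exp(-z))      # predicted probability
        for i, xi in enumerate(x):
            grad[i] += (pred - label) * xi     # log-loss gradient term
    # Average the gradient over the batch and step against it.
    return [t - learning_rate * g / len(batch) for t, g in zip(theta, grad)]
```

Calling this function once per incoming batch keeps the model adaptive without revisiting earlier training data.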
8. The big data based office software vulnerability analysis method of claim 7, wherein step S41 comprises the steps of:
Step S411: dividing the vulnerability analysis data set into a training data set, a validation data set and a test data set;
step S412: constructing a vulnerability prediction model based on a convolutional neural network, wherein building the vulnerability prediction model comprises model training, model validation and model evaluation;
step S413: inputting the training data set into the constructed vulnerability prediction model for model training, and tuning the model parameters by cross-validation to obtain a validation model; performing model validation on the validation data set by using the validation model to obtain a test model;
step S414: performing model evaluation on the test data set by using the test model to obtain an optimal vulnerability prediction model; and re-inputting the vulnerability analysis data set into the optimal vulnerability prediction model for vulnerability detection to obtain a vulnerability detection data set.
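The three-way division of step S411 can be sketched as follows in Python. The 70/15/15 proportions, the fixed seed, and the function name are assumptions; the claim does not state split ratios.

```python
import random


def split_dataset(samples, train=0.7, validation=0.15, seed=0):
    """Shuffle samples reproducibly and split into three disjoint parts.

    Returns (training, validation, test); the test part receives whatever
    remains after the first two are taken.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_val = round(len(shuffled) * validation)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])
```

Shuffling before splitting avoids ordering effects, for example when the records are sorted by discovery date.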
9. The big data based office software vulnerability analysis method of claim 1, wherein step S5 comprises the steps of:
step S51: combining and associating the user use behavior log and the vulnerability data set by using a data association technology to generate a vulnerability association relationship topological graph;
step S52: extracting, by a graph-algorithm-based association processing technology, the dependencies and associations between vulnerabilities and user use behaviors from the vulnerability association relationship topological graph to obtain vulnerability association data;
Step S53: performing vulnerability analysis processing on the vulnerability association data through a time sequence analysis technology to obtain vulnerability formation rule data.
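One simple way to picture steps S51 to S53 is a graph that links each vulnerability to the user actions observed in the same session, followed by a trailing moving average over per-period vulnerability counts. The Python sketch below rests on assumptions not found in the claim: the session-based join key, the tuple layouts, and the choice of a moving average as the time sequence analysis.

```python
from collections import defaultdict


def build_association_graph(behavior_log, vulnerabilities):
    """Map each vulnerability id to the set of actions seen in its session.

    behavior_log: iterable of (session, action) pairs.
    vulnerabilities: iterable of (session, vuln_id) pairs.
    """
    actions_by_session = defaultdict(set)
    for session, action in behavior_log:
        actions_by_session[session].add(action)
    graph = defaultdict(set)
    for session, vuln_id in vulnerabilities:
        graph[vuln_id] |= actions_by_session.get(session, set())
    return dict(graph)


def moving_average(counts, window):
    """Trailing moving average over a series of per-period counts."""
    return [sum(counts[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(counts))]
```

A rising moving average of vulnerability counts tied to one action would be one candidate signal of a formation pattern.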
10. The big data based office software vulnerability analysis method of claim 1, wherein step S6 comprises the steps of:
step S61: performing vulnerability risk prediction on vulnerability formation rule data through a risk detection algorithm to obtain a vulnerability risk analysis result;
the risk detection algorithm function is as follows:
wherein the terms of the formula are as follows: the vulnerability risk analysis result; the c-th vulnerability formation rule data; H, the number of vulnerability formation rule data items; the vulnerability formation rule data features; the correlation kernel function between the c-th vulnerability formation rule data under a given vulnerability formation rule data feature; the feature weight function; the feature distance function; and τ, the correction value of the vulnerability risk analysis result;
step S62: analyzing and processing the vulnerability risk analysis result to generate a vulnerability analysis report;
step S63: formulating a corresponding vulnerability repair scheme according to the vulnerability analysis report, and executing corresponding vulnerability repair measures on the office software through the vulnerability repair scheme.
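The ingredients named for the risk detection function (a kernel function, feature weights, and a feature distance) can be combined in many ways. The Python sketch below shows one hedged possibility: the mean Gaussian-kernel similarity between a candidate pattern and known risky patterns, using a feature-weighted squared distance. It is not the claimed formula; `risk_score`, `bandwidth`, and the vector representation are assumptions.

```python
import math


def risk_score(target, known_risky, weights, bandwidth=1.0):
    """Mean Gaussian-kernel similarity of target to known risky patterns.

    target and each entry of known_risky are equal-length numeric feature
    vectors; weights holds one non-negative weight per feature.
    Returns a value in [0, 1]; 1.0 means identical to every known pattern.
    """
    if not known_risky:
        return 0.0
    total = 0.0
    for other in known_risky:
        # Feature-weighted squared Euclidean distance.
        dist_sq = sum(w * (a - b) ** 2
                      for w, a, b in zip(weights, target, other))
        total += math.exp(-dist_sq / (2.0 * bandwidth ** 2))
    return total / len(known_risky)
```

A score of this kind could feed directly into the ranking used when the repair scheme of step S63 is drawn up.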
CN202310619236.2A 2023-05-29 2023-05-29 Office software vulnerability analysis method based on big data Withdrawn CN116776334A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310619236.2A CN116776334A (en) 2023-05-29 2023-05-29 Office software vulnerability analysis method based on big data

Publications (1)

Publication Number Publication Date
CN116776334A true CN116776334A (en) 2023-09-19

Family

ID=88009099

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310619236.2A Withdrawn CN116776334A (en) 2023-05-29 2023-05-29 Office software vulnerability analysis method based on big data

Country Status (1)

Country Link
CN (1) CN116776334A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117290851A (en) * 2023-09-21 2023-12-26 广州市动易网络科技有限公司 Vulnerability identification-based reading security enhancement method and system
CN117290851B (en) * 2023-09-21 2024-02-20 广州市动易网络科技有限公司 Vulnerability identification-based reading security enhancement method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20230919