CN111949994A - Vulnerability analysis method and system, electronic device and storage medium

Info

Publication number: CN111949994A
Application number: CN202010837434.2A
Authority: CN
Inventors: 陈坚
Original assignee: Beijing Ziguang Zhanrui Communication Technology Co Ltd
Current assignee: Beijing Ziguang Zhanrui Communication Technology Co Ltd
Priority date: 2020-08-19
Filing date: 2020-08-19
Publication date: 2020-11-17

Abstract

The invention discloses a vulnerability analysis method and system, an electronic device and a storage medium. The vulnerability analysis method comprises the following steps: acquiring a log packet corresponding to a vulnerability to be analyzed, wherein the log packet comprises a plurality of logs; extracting feature information of the vulnerability to be analyzed from a log related to the vulnerability to be analyzed; and inputting the feature information into a machine learning model corresponding to the vulnerability to be analyzed for prediction, to obtain the cause of the vulnerability to be analyzed. Because the machine learning model analyzes the cause of the vulnerability without manual participation, the analysis efficiency is improved. Meanwhile, because the machine learning model is trained on a large number of historical logs, its results for complex vulnerabilities are more accurate than those of manual analysis.

Description

Vulnerability analysis method and system, electronic device and storage medium

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a vulnerability analysis method and system, an electronic device, and a storage medium.

Background

A vulnerability (bug) refers to a defect existing on a computer, such as a flaw in the security policy of the system, which may enable an attacker to access or damage the system without authorization.

During the development of an intelligent terminal, testers submit many bugs. Conventionally, to analyze a bug, an engineer must analyze in detail, and repeatedly, the large number of logs submitted by the tester before the cause of the problem is found. As shown in fig. 1, a conclusion is reached only after manually downloading the logs, manually decompressing them, and manually analyzing them based on personal experience. Such a vulnerability analysis method is very inefficient, and analysis errors may occur when the engineer lacks experience.

Disclosure of Invention

The invention aims to overcome the defects of the prior art, such as the low efficiency and low accuracy of manually analyzing the causes of vulnerabilities, and provides an artificial-intelligence-based vulnerability analysis method and system, an electronic device and a storage medium.

The invention solves the above technical problem through the following technical solutions:

A first aspect of the present invention provides a vulnerability analysis method, which includes the following steps:

acquiring a log packet corresponding to a vulnerability to be analyzed, wherein the log packet comprises a plurality of logs;

extracting feature information of the vulnerability to be analyzed from a log related to the vulnerability to be analyzed;

and inputting the feature information into a machine learning model corresponding to the vulnerability to be analyzed for prediction, to obtain the cause of the vulnerability to be analyzed.

Preferably, the obtaining of the log packet corresponding to the vulnerability to be analyzed specifically includes:

parsing, by using a web crawler technology, a download address of the log packet corresponding to the vulnerability to be analyzed from a webpage of the vulnerability tracking system;

downloading the log packet from the download address;

and decompressing the log packet.

Preferably, the log packet includes a log related to the vulnerability to be analyzed and a log unrelated to the vulnerability to be analyzed, and the vulnerability analysis method further includes:

screening logs of a target type from the log packets;

and collecting logs related to the vulnerability to be analyzed from the screened logs according to the time point of the vulnerability to be analyzed.

Preferably, the machine learning model corresponding to the vulnerability to be analyzed is trained by using the following steps:

obtaining a training sample; the training sample comprises a historical log corresponding to the vulnerability to be analyzed and a real reason label added to the historical log;

acquiring a training set and a test set according to the training sample;

training the machine learning model corresponding to the vulnerability to be analyzed according to the training set;

and testing the trained machine learning model by using a test set, stopping training if the accuracy meets the requirement, and otherwise, adjusting the parameters of the machine learning model to train again.

A second aspect of the present invention provides a vulnerability analysis system, including:

an acquisition module, configured to acquire a log packet corresponding to a vulnerability to be analyzed, wherein the log packet comprises a plurality of logs;

an extraction module, configured to extract feature information of the vulnerability to be analyzed from a log related to the vulnerability to be analyzed;

and a prediction module, configured to input the feature information into a machine learning model corresponding to the vulnerability to be analyzed for prediction, to obtain the cause of the vulnerability to be analyzed.

Preferably, the obtaining module includes:

an analysis unit, configured to parse, by using a web crawler technology, the download address of the log packet corresponding to the vulnerability to be analyzed from a webpage of the vulnerability tracking system;

a download unit, configured to download the log packet from the download address;

and a decompression unit, configured to decompress the log packet.

Preferably, the log packet includes a log related to the vulnerability to be analyzed and a log unrelated to the vulnerability to be analyzed, and the vulnerability analysis system further includes:

a screening module, configured to screen logs of a target type from the log packet;

and a collection module, configured to collect logs related to the vulnerability to be analyzed from the screened logs according to the time point at which the vulnerability to be analyzed was generated.

Preferably, the vulnerability analysis system further comprises a training module, wherein the training module comprises:

a sample acquisition unit, configured to acquire a training sample, wherein the training sample comprises a historical log corresponding to the vulnerability to be analyzed and a real cause label added to the historical log;

a set acquisition unit, configured to acquire a training set and a test set according to the training sample;

a model training unit, configured to train, according to the training set, the constructed machine learning model corresponding to the vulnerability to be analyzed;

and a model testing unit, configured to test the trained machine learning model by using the test set, wherein if the accuracy meets the requirement, the model training unit stops training, and otherwise the model training unit adjusts the parameters of the machine learning model and trains again.

A third aspect of the present invention provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the vulnerability analysis method according to the first aspect when executing the computer program.

A fourth aspect of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the vulnerability analysis method according to the first aspect.

The beneficial effects of the invention are as follows: the machine learning model is used to analyze the cause of a vulnerability, so manual participation is not needed and the analysis efficiency is improved. Meanwhile, because the machine learning model is trained on a large number of historical logs, its results for complex vulnerabilities are more accurate than those of manual analysis.

Drawings

Fig. 1 is a flow chart of a vulnerability analysis method in the prior art.

Fig. 2 is a flowchart of a vulnerability analysis method provided in embodiment 1 of the present invention.

Fig. 3 is a flowchart of a method for training a machine learning model according to embodiment 1 of the present invention.

Fig. 4 is a block diagram of a vulnerability analysis system provided in embodiment 2 of the present invention.

Fig. 5 is a block diagram of an electronic device according to embodiment 3 of the present invention.

Detailed Description

The invention is further illustrated by the following examples, which are not intended to limit the scope of the invention.

The vulnerability referred to in the embodiments of the invention is a bug generated during the testing of electronic devices such as mobile terminals and intelligent terminals.

Example 1

The present embodiment provides a vulnerability analysis method, as shown in fig. 2, which includes the following steps:

Step S100, obtaining a log packet corresponding to the vulnerability to be analyzed, wherein the log packet comprises a plurality of logs.

It should be noted that there is a correspondence between vulnerabilities to be analyzed and log packets: different vulnerabilities to be analyzed correspond to different log packets, and one vulnerability to be analyzed may correspond to one log packet or to multiple log packets. In one example of a specific implementation, the total size of the logs included in a log packet may reach tens of gigabytes.

In an optional embodiment, step S100 specifically includes:

Step S101, parsing, by using a web crawler technology, the download address of the log packet corresponding to the vulnerability to be analyzed from a webpage of the vulnerability tracking system.

A web crawler is a program or script that automatically crawls web information according to certain rules. In one particular example, the content of the vulnerability tracking system web page is crawled by parsing its HTML.

The vulnerability tracking system is used for tracking the vulnerabilities that have been generated. In a specific implementation example, the website of the vulnerability tracking system is configured in a browser driver, so that the corresponding browser is opened automatically and logs in to the vulnerability tracking system. For example, the website of Bugzilla (an open source vulnerability tracking system) is configured in webdriver (a driver for the Google Chrome browser), so that Chrome is opened automatically and logs in to Bugzilla.

Step S102, downloading the log packet from the download address. In one example of the implementation, the download address of the log packet is an FTP download address, which is accessed to download the log packet to the local machine.

Step S103, decompressing the log packet. The log packet downloaded in step S102 is a compressed packet, which is automatically decompressed in step S103.

In the embodiment, the log packets corresponding to the vulnerability to be analyzed are automatically downloaded and decompressed, so that manual participation is not needed, and the vulnerability analysis efficiency is improved.
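
Steps S101 and S103 can be sketched in Python with the standard library alone. This is a minimal sketch, not the implementation of this embodiment: the function names `find_log_links` and `unpack_log_package`, the archive suffixes, and the assumption that download addresses appear as `href` attributes of `<a>` tags are all illustrative. The download of step S102 is omitted because it depends on the server.

```python
import shutil
from html.parser import HTMLParser
from pathlib import Path


class _LinkCollector(HTMLParser):
    """Collects the href attribute of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def find_log_links(page_html, suffixes=(".zip", ".tar.gz")):
    """Step S101 (sketch): parse log packet download addresses out of a page."""
    collector = _LinkCollector()
    collector.feed(page_html)
    # The suffix filter is an assumption about how log packets are named.
    return [link for link in collector.links if link.endswith(suffixes)]


def unpack_log_package(archive_path, target_dir):
    """Step S103 (sketch): decompress a log packet and list the extracted files."""
    shutil.unpack_archive(str(archive_path), str(target_dir))
    return sorted(p.name for p in Path(target_dir).rglob("*") if p.is_file())
```

`shutil.unpack_archive` picks the archive format from the file extension, so the same call handles `.zip` and `.tar.gz` packets.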

In an optional implementation manner, the log packet includes a log related to the vulnerability to be analyzed and a log not related to the vulnerability to be analyzed. In order to improve the efficiency of vulnerability analysis, logs related to vulnerabilities to be analyzed need to be screened from the log packets. The step S100 further includes:

Step S110, screening logs of the target type from the log packet. Step S110 is a coarse screening, which aims to screen out logs of the target type. The target type may be a file type or another type. In one specific example, logs with the file extension ".log" are screened from the log packet.

Step S111, collecting logs related to the vulnerability to be analyzed from the screened logs according to the time point at which the vulnerability to be analyzed was generated. Step S111 is a fine screening, which aims to screen out the logs related to the vulnerability to be analyzed. In a specific implementation, each vulnerability and its corresponding log feature can be used as a scanning rule, and the time point at which the vulnerability to be analyzed was generated is then obtained by scanning the logs. For example, if the log feature corresponding to vulnerability A is statement a1, then when statement a1 is found during log scanning, vulnerability A is considered to have been generated, and the time point at which statement a1 occurs is the time point at which vulnerability A was generated.

In a specific example, log contents related to the vulnerability to be analyzed are collected from the screened logs, that is, contents related to the vulnerability to be analyzed are collected from each screened log file.
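
The coarse and fine screening of steps S110 and S111 might look like the following sketch. The timestamp layout, the five-second window, and all function names are assumptions made for illustration; the embodiment only states that a signature statement fixes the time point and that related logs are collected according to it.

```python
from datetime import datetime, timedelta
from pathlib import PurePath

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp layout at the start of each line


def screen_logs(file_names, extension=".log"):
    """Step S110 (coarse screening): keep only files of the target type."""
    return [name for name in file_names if PurePath(name).suffix == extension]


def parse_time(line):
    """Read the timestamp from the first 19 characters of a line, or return None."""
    try:
        return datetime.strptime(line[:19], TIME_FORMAT)
    except ValueError:
        return None


def find_vulnerability_time(lines, signature):
    """Scan for the signature statement; its timestamp is the time of occurrence."""
    for line in lines:
        if signature in line:
            return parse_time(line)
    return None


def collect_related(lines, occurred_at, window_seconds=5):
    """Step S111 (fine screening): keep lines close to the time of occurrence."""
    window = timedelta(seconds=window_seconds)
    related = []
    for line in lines:
        stamp = parse_time(line)
        if stamp is not None and abs(stamp - occurred_at) <= window:
            related.append(line)
    return related
```

Here `signature` plays the role of statement a1: the first line that contains it gives the time point, and `collect_related` then keeps only the lines whose timestamps fall within the window around that point.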

Step S200, extracting the feature information of the vulnerability to be analyzed from the logs related to the vulnerability to be analyzed.

In an optional implementation manner of step S200, different feature extraction algorithms are selected for different vulnerabilities to be analyzed. For example, for a vulnerability A to be analyzed, a PCA (Principal Component Analysis) algorithm may be selected to extract the feature information, and for a vulnerability B to be analyzed, an LDA (Linear Discriminant Analysis) algorithm may be selected to extract the feature information.
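
As an illustration of the PCA option, the sketch below projects numeric log features onto their leading principal axes using NumPy. It assumes the logs have already been converted to a numeric matrix with one row per sample, a step this embodiment does not detail; the function name `pca_features` is hypothetical.

```python
import numpy as np


def pca_features(samples, n_components):
    """Project samples (one per row) onto their first n_components principal axes."""
    data = np.asarray(samples, dtype=float)
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T
```

The returned columns are ordered by decreasing variance, so keeping the first few columns retains the directions along which the samples differ most.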

Step S300, inputting the feature information into a machine learning model corresponding to the vulnerability to be analyzed for prediction, to obtain the cause of the vulnerability to be analyzed.

It should be noted that different vulnerabilities to be analyzed correspond to different machine learning models, such as a Support Vector Machine (SVM) model corresponding to vulnerability A to be analyzed, a Logistic Regression (LR) model corresponding to vulnerability B to be analyzed, a Naive Bayesian Model (NBM) corresponding to vulnerability C to be analyzed, a Neural Network (NN) model corresponding to vulnerability D to be analyzed, and the like. In this embodiment, the cause of the vulnerability to be analyzed is predicted by using a machine learning model.

In step S300, the cause of the vulnerability to be analyzed is obtained through prediction by the machine learning model, and the analysis process for that vulnerability ends. In a specific implementation, the state of the vulnerability in the vulnerability tracking system can then be changed to "analyzed". In addition, the vulnerability tracking system can be checked periodically, for example every 30 minutes, for new vulnerabilities to be analyzed; if one appears, the method returns to step S101 to obtain the log packet corresponding to the new vulnerability and then analyzes its cause.
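
The periodic check described above can be sketched as a simple polling loop. The callables `fetch_new_ids` and `analyze` are hypothetical stand-ins for querying the vulnerability tracking system and for running steps S100 to S300; `max_rounds` and the injectable `sleep` exist only so that the sketch can be exercised without waiting.

```python
import time


def poll_tracker(fetch_new_ids, analyze, interval_seconds=30 * 60,
                 max_rounds=None, sleep=time.sleep):
    """Check the tracker at a fixed interval and analyze each new vulnerability once."""
    seen = set()
    results = {}
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        for vuln_id in fetch_new_ids():
            if vuln_id not in seen:
                seen.add(vuln_id)
                results[vuln_id] = analyze(vuln_id)  # stands in for steps S100 to S300
        rounds += 1
        # Wait before the next check, unless this was the last round.
        if max_rounds is None or rounds < max_rounds:
            sleep(interval_seconds)
    return results
```

With `max_rounds=None` the loop runs indefinitely, which matches a service that keeps watching the tracker.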

In an optional implementation manner, a machine learning classification algorithm is used to perform learning training on a history log corresponding to a vulnerability to be analyzed to form a machine learning model. Specifically, as shown in fig. 3, the machine learning model corresponding to the vulnerability to be analyzed is trained by using the following steps:

Step S001, obtaining a training sample; the training sample comprises a historical log corresponding to the vulnerability to be analyzed and a real cause label added to the historical log.

Step S002, acquiring a training set and a test set according to the training sample.

In an alternative embodiment of step S002, the training samples obtained in step S001 are respectively used as a training set and a test set, i.e. the training set and the test set are the same.

In another alternative embodiment of step S002, the training samples obtained in step S001 are proportionally divided into training sets and test sets. For example, 80% of the training samples are used as the training set, and 20% of the training samples are used as the test set.
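
A proportional split such as the 80%/20% example can be sketched as follows. Shuffling before the split and the fixed seed are assumptions added for reproducibility; the embodiment only specifies the proportions.

```python
import random


def split_samples(samples, train_ratio=0.8, seed=0):
    """Shuffle the samples and split them into a training set and a test set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```

For 219 samples, this split gives 175 training samples and 44 test samples.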

Step S003, training, according to the training set, the constructed machine learning model corresponding to the vulnerability to be analyzed. In a specific implementation, for each training sample in the training set, logs of the target type are screened from the historical logs corresponding to the vulnerability to be analyzed; logs related to the vulnerability to be analyzed are collected from the screened logs according to the time point at which the vulnerability was generated; feature information is extracted from the related logs; and the feature information is input into the constructed machine learning model for training. Specifically, the parameters of the machine learning model are adjusted according to the cause predicted by the machine learning model and the real cause label in the training sample.

Step S004, testing the trained machine learning model by using the test set, stopping training if the accuracy meets the requirement, and otherwise adjusting the parameters of the machine learning model and training again.

In step S004, the accuracy of the machine learning model trained in step S003 is tested by using the test set. If the accuracy meets the requirement, for example by reaching a preset value, the training is stopped. If the accuracy does not meet the requirement, for example by falling short of the preset value, the parameters of the machine learning model are adjusted and the model is retrained by using the training set, that is, the process returns to step S003. The preset value can be set according to actual conditions; for example, with a preset value of 80%, the requirement is considered to be met once the accuracy reaches 80%.
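
The loop between steps S003 and S004 can be sketched generically. In this sketch, "adjusting the parameters" is modelled as trying a list of candidate settings in order, which is an assumption; `train` is a hypothetical callable that returns a model, and a model is any callable that maps features to a predicted cause.

```python
def accuracy(model, samples):
    """Fraction of (features, label) pairs that the model predicts correctly."""
    hits = sum(1 for features, label in samples if model(features) == label)
    return hits / len(samples)


def train_until_accurate(train, candidates, train_set, test_set, required=0.8):
    """Retrain with adjusted parameters until the test accuracy meets the requirement."""
    for params in candidates:
        model = train(train_set, params)       # step S003
        score = accuracy(model, test_set)      # step S004
        if score >= required:
            return model, params, score
    raise RuntimeError("no candidate parameters reached the required accuracy")
```

The default `required=0.8` mirrors the 80% preset value given as an example above.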

In this embodiment, the machine learning model is used to analyze the cause of the vulnerability, so manual participation is not needed and the analysis efficiency is improved. Meanwhile, because the machine learning model is trained on a large number of historical logs, its results for complex vulnerabilities are more accurate than those of manual analysis.

The effectiveness of the vulnerability analysis method is described below by taking the ANR (Application Not Responding) problem on the Android platform as an example. When an application on Android does not respond promptly enough, the system displays an ANR dialog box to the user.

219 historical logs of the ANR problem are selected as training samples, and real cause labels are manually added to the historical logs; the causes of the ANR problem fall into 6 types. The 219 training samples are used as the training set, and 29 feature values are extracted from each training sample as the feature information of the ANR problem, as shown in table 1.

TABLE 1

Sample number	Feature value 1	Feature value 2	Feature value 3	Feature value 4	Feature value 5	Feature value 6	Feature value 7	Feature value 8	Feature value N	Feature value 29
1	10.52	0	0	0.0072	0.057	0	0.29	0	……	……
2	2.75	0.037	0.011	0	0.018	0	0.34	0.09	……	……
3	0	0	0	0	0	0	0.059	0.08	……	……
4	2.75	0.037	0.011	0.4184	0.018	0	0.34	0.18	……	……
N	2.6	0.056	0.02	0.4751	0.061	0	0.31	0.19	……	……
219	……	……	……	……	……	……	……	……	……	……

The feature information is input into an SVM model for training, to obtain an SVM model that can be used for predicting the cause of the ANR problem. The 219 training samples are also used as the test set to test the trained SVM model, and the accuracy can reach 100%.

The following are the results of testing the SVM model using the test set:

y_pred:

[1 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 0 5 0

5 5 0 5 5 5 2 2 2 2 2 2 0 2 0 2 2 2 0 2 2 2 2 0 2 0 2 2 2 1 1 2 1 2 0 0 0

0 3 1 3 3 3 3 3 3 3 3 3 3 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

0 0 0 0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1 0 0 0 0 0 0 1 0 1 1 1 1 1 1

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0]

where y_pred is the predicted cause of the ANR problem, and the values 0 to 5 each represent a different cause.

In the above example, the vulnerability to be analyzed is an ANR problem, and the machine learning model corresponding to the vulnerability to be analyzed is an SVM model.
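
The shape of this experiment can be reproduced with scikit-learn as a rough sketch. The data below is synthetic placeholder data with the same dimensions as the example (219 samples, 29 feature values, 6 causes); it is not the ANR data of table 1. The linear kernel and the feature scaling step are assumptions, since the example does not state the SVM settings.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

N_SAMPLES, N_FEATURES, N_CAUSES = 219, 29, 6

# Synthetic placeholder data: NOT the ANR feature values from table 1.
rng = np.random.default_rng(0)
centers = rng.normal(scale=5.0, size=(N_CAUSES, N_FEATURES))
y_true = rng.integers(0, N_CAUSES, size=N_SAMPLES)
X = centers[y_true] + rng.normal(size=(N_SAMPLES, N_FEATURES))

# Scale the features, then fit a support vector classifier.
model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
model.fit(X, y_true)

# As in the example, the training samples are reused as the test set.
y_pred = model.predict(X)
train_accuracy = float((y_pred == y_true).mean())
```

Because the same samples are used for fitting and for testing, `train_accuracy` describes how well the model fits the training data.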

Example 2

The present embodiment provides a vulnerability analysis system 40, as shown in fig. 4, which includes an obtaining module 41, an extracting module 42 and a predicting module 43.

The obtaining module 41 is configured to obtain a log packet corresponding to a vulnerability to be analyzed, where the log packet includes a plurality of logs.

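In an optional embodiment, the obtaining module 41 includes:

an analysis unit, configured to parse, by using a web crawler technology, the download address of the log packet corresponding to the vulnerability to be analyzed from a webpage of the vulnerability tracking system;

a download unit, configured to download the log packet from the download address;

and a decompression unit, configured to decompress the log packet.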

The extraction module 42 is configured to extract feature information of the vulnerability to be analyzed from a log related to the vulnerability to be analyzed.

The prediction module 43 is configured to input the feature information into a machine learning model corresponding to the vulnerability to be analyzed for prediction, so as to obtain a reason for generating the vulnerability to be analyzed.
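
One way to picture how the three modules cooperate is the sketch below, in which each module is a plain callable. The class name, the `analyze` method and the type signatures are assumptions for illustration and are not part of system 40 as described.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class VulnerabilityAnalysisSystem:
    """Chains the three modules; each module is supplied as a callable."""

    acquire: Callable[[str], Sequence[str]]               # obtaining module 41
    extract: Callable[[Sequence[str]], Sequence[float]]   # extraction module 42
    predict: Callable[[Sequence[float]], str]             # prediction module 43

    def analyze(self, vulnerability_id: str) -> str:
        logs = self.acquire(vulnerability_id)
        features = self.extract(logs)
        return self.predict(features)
```

Keeping the modules as interchangeable callables reflects the point that different vulnerabilities may use different feature extraction algorithms and different machine learning models.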

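In an optional implementation manner, the log packet includes a log related to the vulnerability to be analyzed and a log unrelated to the vulnerability to be analyzed, and the vulnerability analysis system further includes:

a screening module, configured to screen logs of a target type from the log packet;

and a collection module, configured to collect logs related to the vulnerability to be analyzed from the screened logs according to the time point at which the vulnerability to be analyzed was generated.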

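In an optional implementation manner, the vulnerability analysis system further includes a training module, where the training module includes:

a sample acquisition unit, configured to acquire a training sample, wherein the training sample comprises a historical log corresponding to the vulnerability to be analyzed and a real cause label added to the historical log;

a set acquisition unit, configured to acquire a training set and a test set according to the training sample;

a model training unit, configured to train, according to the training set, the constructed machine learning model corresponding to the vulnerability to be analyzed;

and a model testing unit, configured to test the trained machine learning model by using the test set, wherein if the accuracy meets the requirement, the model training unit stops training, and otherwise the model training unit adjusts the parameters of the machine learning model and trains again.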

Example 3

Fig. 5 is a schematic structural diagram of an electronic device provided in this embodiment. The electronic device comprises a memory, a processor, a computer program stored on the memory and capable of running on the processor, and a plurality of subsystems for realizing different functions, wherein the processor realizes the vulnerability analysis method of embodiment 1 when executing the program. The electronic device 3 shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiment of the present invention.

The components of the electronic device 3 may include, but are not limited to: at least one processor 4, at least one memory 5, and a bus 6 connecting the various system components (including the memory 5 and the processor 4).

The bus 6 includes a data bus, an address bus, and a control bus.

The memory 5 may include volatile memory, such as Random Access Memory (RAM) and/or cache memory, and may further include Read Only Memory (ROM).

The memory 5 may also include a program/utility having a set (at least one) of program modules including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.

The processor 4 executes various functional applications and data processing, such as the vulnerability analysis method of embodiment 1 of the present invention, by running the computer program stored in the memory 5.

The electronic device 3 may also communicate with one or more external devices 7, such as a keyboard, pointing device, etc. Such communication may be via an input/output (I/O) interface 8. Also, the electronic device 3 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 9. As shown in fig. 5, the network adapter 9 communicates with other modules of the electronic device 3 via the bus 6. It should be appreciated that although not shown in FIG. 5, other hardware and/or software modules may be used in conjunction with the electronic device 3, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID (disk array) systems, tape drives, and data backup storage systems, etc.

It should be noted that although several units/modules or sub-units/modules of the electronic device are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, according to embodiments of the invention, the features and functions of two or more of the units/modules described above may be embodied in one unit/module. Conversely, the features and functions of one unit/module described above may be further divided and embodied by a plurality of units/modules.

Example 4

The present embodiment provides a computer-readable storage medium on which a computer program is stored, which when executed by a processor implements the steps of the vulnerability analysis method of embodiment 1.

More specific examples of the readable storage medium may include, but are not limited to: a portable disk, a hard disk, a random access memory, a read only memory, an erasable programmable read only memory, an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

In a possible implementation manner, the present invention can also be implemented in the form of a program product, which includes program code that, when the program product runs on a terminal device, causes the terminal device to execute the steps of the vulnerability analysis method of embodiment 1.

The program code for carrying out the invention may be written in any combination of one or more programming languages, and may be executed entirely on the user device, partly on the user device, as a stand-alone software package, partly on the user device and partly on a remote device, or entirely on the remote device.

While specific embodiments of the invention have been described above, it will be appreciated by those skilled in the art that this is by way of example only, and that the scope of the invention is defined by the appended claims. Various changes and modifications to these embodiments may be made by those skilled in the art without departing from the spirit and scope of the invention, and these changes and modifications are within the scope of the invention.

Claims

1. A vulnerability analysis method is characterized by comprising the following steps:

2. The vulnerability analysis method according to claim 1, wherein the obtaining of the log packet corresponding to the vulnerability to be analyzed specifically comprises:

downloading the log package from the download address;

and decompressing the log packet.

3. The vulnerability analysis method of claim 1, wherein the log package includes logs related to the vulnerability to be analyzed and logs not related to the vulnerability to be analyzed, the vulnerability analysis method further comprising:

screening logs of a target type from the log packets;

4. The vulnerability analysis method of any of claims 1-3, characterized by training a machine learning model corresponding to the vulnerability to be analyzed using the steps of:

acquiring a training set and a test set according to the training sample;

5. A vulnerability analysis system, comprising:

6. The vulnerability analysis system of claim 5, wherein the acquisition module comprises:

and the decompression unit is used for decompressing the log packets.

7. The vulnerability analysis system of claim 5, wherein the log package includes logs related to the vulnerability to be analyzed and logs not related to the vulnerability to be analyzed, the vulnerability analysis system further comprising:

8. The vulnerability analysis system of any of claims 5-7, wherein the vulnerability analysis system further comprises a training module comprising:

9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the vulnerability analysis method of any of claims 1-4 when executing the computer program.

10. A computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, carries out the steps of the vulnerability analysis method according to any of claims 1-4.