CN111049858A - Cross validation based baseline scanning vulnerability duplication removing method, device and equipment - Google Patents

Cross validation based baseline scanning vulnerability duplication removing method, device and equipment

Publication number
CN111049858A
Authority
CN
China
Prior art keywords
vulnerability
baseline
baseline scanning
report
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911370350.6A
Other languages
Chinese (zh)
Other versions
CN111049858B (en
Inventor
林月晴
范渊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
DBAPPSecurity Co Ltd
Original Assignee
DBAPPSecurity Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by DBAPPSecurity Co Ltd
Priority to CN201911370350.6A
Publication of CN111049858A
Application granted
Publication of CN111049858B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1433 Vulnerability analysis

Abstract

The invention discloses a cross-validation-based baseline scanning vulnerability deduplication method, device, equipment, and computer-readable storage medium. The method comprises: performing normalization analysis on each of K baseline scan reports produced for a target host by the scanners of K vendors, to obtain vulnerability field data for each vulnerability in each baseline scan report; extracting features from each baseline scan report with a feature keyword extraction algorithm, to obtain a vulnerability feature vector for each vulnerability in each baseline scan report; training a vulnerability deduplication model with a K-fold cross-validation algorithm on the vulnerability feature vectors of the vulnerabilities in each baseline scan report, to obtain a target vulnerability deduplication model; and inputting the vulnerability feature vector of each vulnerability in a baseline scan report to be deduplicated into the target vulnerability deduplication model, to obtain the deduplicated vulnerabilities. The method, device, equipment, and computer-readable storage medium provided by the invention improve the efficiency and accuracy of baseline vulnerability deduplication.

Description

Cross validation based baseline scanning vulnerability duplication removing method, device and equipment
Technical Field
The invention relates to the technical field of network security, and in particular to a cross-validation-based baseline scanning vulnerability deduplication method, device, equipment, and computer-readable storage medium.
Background
As informatization advances, network security plays an increasingly critical supporting role in the business development of every enterprise and organization. Vulnerability scanning is an important class of network security technology. Used together with firewalls and intrusion detection systems, it can effectively improve network security. By scanning the network, a network administrator can learn the security settings and running application services of the network, find security vulnerabilities in time, objectively evaluate the network risk level, correct security vulnerabilities and misconfigurations in the system according to the scan results, and guard against hacker attacks. As a proactive precaution, security scanning helps prevent attacks before they occur.
There are many vulnerability scanning tools on the market. In routine vulnerability scanning, when the same host is scanned by the vulnerability scanners of different vendors, the reported results differ considerably: vulnerability names, classifications, descriptions, solutions, and so on follow different standards, and a large number of duplicate vulnerabilities appear. This greatly increases the workload of operation and maintenance personnel, who must investigate the same vulnerability repeatedly.
At present, vulnerability deduplication is mainly performed either by checking for duplicates on representative features such as CVE (Common Vulnerabilities and Exposures) numbers and CNNVD (China National Vulnerability Database of Information Security) numbers, or manually. However, compared with vulnerabilities found by other kinds of scanning, baseline scanning vulnerabilities are very numerous and lack representative features such as CVE and CNNVD numbers, which makes deduplication much more difficult.
In summary, how to improve the efficiency and accuracy of baseline scanning vulnerability deduplication is a problem that currently needs to be solved.
Disclosure of Invention
The invention aims to provide a baseline scanning vulnerability deduplication method, device, equipment, and computer-readable storage medium that solve the problems of heavy workload and low efficiency of baseline vulnerability deduplication in the prior art.
To solve this technical problem, the invention provides a cross-validation-based baseline scanning vulnerability deduplication method, which comprises the following steps: acquiring K baseline scan reports produced for a target host by the scanners of K vendors, and performing normalization analysis on each baseline scan report to obtain vulnerability field data for each vulnerability in each baseline scan report; extracting features from the vulnerability field data of each vulnerability in each baseline scan report with a feature keyword extraction algorithm, to obtain a vulnerability feature vector for each vulnerability in each baseline scan report; training a pre-constructed vulnerability deduplication model with a K-fold cross-validation algorithm according to the vulnerability feature vector of each vulnerability in each baseline scan report, to obtain a trained target vulnerability deduplication model; and inputting the vulnerability feature vector of each vulnerability in a baseline scan report to be deduplicated into the target vulnerability deduplication model, outputting the deduplicated target vulnerability feature vectors, and converting the target vulnerability feature vectors into target vulnerabilities.
Preferably, acquiring the K baseline scan reports produced for the target host by the scanners of K vendors, and performing normalization analysis on each baseline scan report to obtain vulnerability field data for each vulnerability in each baseline scan report, includes:
acquiring the K baseline scan reports produced for the target host by the scanners of K vendors;
parsing each baseline scan report to obtain vulnerability field data for each vulnerability in each baseline scan report; the vulnerability field data comprises asset IP, vulnerability level, vulnerability name, check category, judgment basis, vulnerability description, and solution.
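The normalization step can be pictured as mapping each vendor's own field names onto the seven common fields listed above. The following is a minimal sketch under stated assumptions: the record type `VulnRecord`, the vendor names, and the per-vendor key mappings in `VENDOR_FIELD_MAPS` are all hypothetical and are not taken from the patent or from any real scanner export format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VulnRecord:
    """Common record holding the seven fields named in the text."""
    asset_ip: str
    level: str
    name: str
    check_category: str
    judgment_basis: str
    description: str
    solution: str


# Hypothetical per-vendor key mappings; real scanner exports will differ.
VENDOR_FIELD_MAPS = {
    "vendor_a": {
        "asset_ip": "ip", "level": "severity", "name": "title",
        "check_category": "category", "judgment_basis": "evidence",
        "description": "desc", "solution": "fix",
    },
    "vendor_b": {
        "asset_ip": "host", "level": "risk", "name": "vuln_name",
        "check_category": "check_type", "judgment_basis": "basis",
        "description": "detail", "solution": "remediation",
    },
}


def normalize_report(vendor, raw_items):
    """Map each raw vendor item (a dict) onto the common record.

    Missing keys become empty strings so every record has all seven fields.
    """
    field_map = VENDOR_FIELD_MAPS[vendor]
    return [
        VulnRecord(**{field: str(item.get(key, "")).strip()
                      for field, key in field_map.items()})
        for item in raw_items
    ]
```

Once every report is expressed in the same record type, the later steps can compare vulnerabilities field by field regardless of which scanner produced them.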
Preferably, extracting features from the vulnerability field data of each vulnerability in each baseline scan report with a feature keyword extraction algorithm, to obtain the vulnerability feature vector of each vulnerability in each baseline scan report, includes:
deleting the specified Chinese stop words from each item of vulnerability field data of the current vulnerability in the current baseline scan report;
calculating the word frequency of each word in each item of vulnerability field data after the Chinese stop words are removed;
merging repeated words in each item of vulnerability field data so that each word appears once;
and extracting the repeated words in each item of vulnerability field data together with their word frequencies, to obtain the vulnerability feature vector of the current vulnerability in the current baseline scan report.
Preferably, calculating the word frequency of each word in each item of vulnerability field data after the Chinese stop words are removed includes:
calculating the inverse document frequency value of each word in each item of vulnerability field data after the Chinese stop words are removed, and using that value as the word frequency of the word.
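The feature extraction steps above can be sketched as follows. This is an illustration only: the stop-word set is a tiny sample, the whitespace tokenizer assumes the text has already been segmented into words (real Chinese text would first need a word segmenter), and the formula idf(w) = log(N / df(w)) is one common definition, since the text does not give a formula. All function names are hypothetical.

```python
import math
from collections import Counter

# Illustrative subset only; a real stop-word list would be much longer.
STOP_WORDS = {"的", "了", "和", "是", "在"}


def tokenize(text, stop_words=STOP_WORDS):
    """Split pre-segmented text on whitespace and drop stop words."""
    return [tok for tok in text.split() if tok not in stop_words]


def inverse_document_frequency(documents):
    """Return idf(w) = log(N / df(w)) for a list of token lists."""
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each word once per document
    return {word: math.log(n_docs / df) for word, df in doc_freq.items()}


def feature_vector(text, idf=None):
    """Sparse feature vector for one field: word -> weight.

    Counter merges repeated words into a single entry. Without `idf` the
    weight is the raw count; with `idf` the IDF value is used as the weight,
    as in the preferred variant described in the text.
    """
    counts = Counter(tokenize(text))
    if idf is None:
        return dict(counts)
    return {word: idf.get(word, 0.0) for word in counts}
```

A word that occurs in every field value gets an IDF of zero, so it contributes nothing to later distance calculations, while rarer words carry more weight.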
Preferably, training the pre-constructed vulnerability deduplication model with a K-fold cross-validation algorithm according to the vulnerability feature vector of each vulnerability in each baseline scan report, to obtain the trained target vulnerability deduplication model, includes:
S1: selecting the vulnerability feature vectors corresponding to two of the K baseline scan reports as the current test sample, and using the vulnerability feature vectors corresponding to the remaining K-2 baseline scan reports as the current training sample, wherein the current test sample differs from the previous test sample in at least one baseline scan report;
S2: calculating the similarity between each vulnerability in the current training sample and the other vulnerabilities according to the similarity between the vulnerability feature vector of that vulnerability and the vulnerability feature vectors of the other vulnerabilities;
S3: training the current vulnerability deduplication model based on the similarity between each vulnerability in the current training sample and the other vulnerabilities, and testing the trained current vulnerability deduplication model with the current test sample;
S4: repeating S1 to S3 K(K-1)/2 times to obtain the target vulnerability deduplication model.
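The count K(K-1)/2 in S4 is the number of unordered pairs that can be chosen from K reports, so the loop visits every possible two-report test sample once. A minimal sketch of enumerating those splits, using a hypothetical helper `pairwise_splits` and placeholder report identifiers:

```python
from itertools import combinations


def pairwise_splits(report_ids):
    """Yield (test_pair, training_ids) for every unordered pair of reports."""
    ids = list(report_ids)
    for test_pair in combinations(ids, 2):
        training = [r for r in ids if r not in test_pair]
        yield test_pair, training


# Five reports give 5 * 4 / 2 = 10 rounds, each with 3 training reports.
splits = list(pairwise_splits(["A", "B", "C", "D", "E"]))
print(len(splits))  # 10
```

Because `combinations` never repeats a pair, consecutive test samples always differ in at least one report, which matches the condition stated in S1.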
Preferably, calculating the similarity between each vulnerability in the current training sample and the other vulnerabilities according to the similarity between the vulnerability feature vector of that vulnerability and the vulnerability feature vectors of the other vulnerabilities includes:
calculating the feature vector similarity between the vulnerability feature vector of each vulnerability in the current training sample and the vulnerability feature vectors of the other vulnerabilities by using a Euclidean distance algorithm;
calculating the field similarity between the vulnerability field data of each vulnerability in the current training sample and the vulnerability field data of the other vulnerabilities according to the feature vector similarity;
and calculating the similarity between each vulnerability in the current training sample and the other vulnerabilities according to the field similarity.
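The three-level similarity calculation (feature vector, field, vulnerability) can be sketched as below. The text names the Euclidean distance but does not say how a distance becomes a similarity or how field similarities are combined, so the mapping 1/(1+d) and the weighted mean used here are assumptions, as are all function names.

```python
import math


def euclidean_distance(u, v):
    """Euclidean distance between two sparse vectors (word -> weight)."""
    keys = set(u) | set(v)
    return math.sqrt(sum((u.get(k, 0.0) - v.get(k, 0.0)) ** 2 for k in keys))


def field_similarity(u, v):
    """Assumed mapping: distance 0 gives 1.0, large distances approach 0."""
    return 1.0 / (1.0 + euclidean_distance(u, v))


def vulnerability_similarity(fields_a, fields_b, weights=None):
    """Weighted mean of per-field similarities over the shared field names.

    `fields_a` and `fields_b` map a field name (e.g. "name", "solution")
    to that field's sparse feature vector. Unlisted fields get weight 1.0.
    """
    names = sorted(set(fields_a) & set(fields_b))
    if not names:
        return 0.0
    weights = weights or {}
    total_weight = sum(weights.get(n, 1.0) for n in names)
    score = sum(weights.get(n, 1.0) * field_similarity(fields_a[n], fields_b[n])
                for n in names)
    return score / total_weight
```

The optional `weights` argument is one way to let a field such as the vulnerability name count more than, say, the solution text; whether and how fields are weighted is left open by the text.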
The invention also provides a cross-validation-based baseline scanning vulnerability deduplication device, which comprises:
a normalization analysis module, used for acquiring K baseline scan reports produced for the target host by the scanners of K vendors, and performing normalization analysis on each baseline scan report to obtain vulnerability field data for each vulnerability in each baseline scan report;
a feature extraction module, used for extracting features from the vulnerability field data of each vulnerability in each baseline scan report with a feature keyword extraction algorithm, to obtain a vulnerability feature vector for each vulnerability in each baseline scan report;
a training module, used for training a pre-constructed vulnerability deduplication model with a K-fold cross-validation algorithm according to the vulnerability feature vector of each vulnerability in each baseline scan report, to obtain a trained target vulnerability deduplication model;
and an output module, used for inputting the vulnerability feature vector of each vulnerability in the baseline scan report to be deduplicated into the target vulnerability deduplication model, outputting the deduplicated target vulnerability feature vectors, and converting the target vulnerability feature vectors into target vulnerabilities.
Preferably, the normalization analysis module includes:
an acquisition unit, used for acquiring the K baseline scan reports produced for the target host by the scanners of K vendors;
a parsing unit, used for parsing each baseline scan report to obtain vulnerability field data for each vulnerability in each baseline scan report; the vulnerability field data comprises asset IP, vulnerability level, vulnerability name, check category, judgment basis, vulnerability description, and solution.
The invention also provides cross-validation-based baseline scanning vulnerability deduplication equipment, which comprises:
a memory for storing a computer program; and a processor for implementing the steps of the cross-validation-based baseline scanning vulnerability deduplication method when executing the computer program.
The invention further provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the cross-validation-based baseline scanning vulnerability deduplication method are implemented.
In the cross-validation-based baseline scanning vulnerability deduplication method provided by the invention, K baseline scan reports produced for the same host by the scanners of K vendors are acquired, and normalization analysis is performed on the K baseline scan reports to obtain vulnerability field data for each vulnerability in each report. A feature keyword extraction algorithm is used to extract a feature vector from the vulnerability field data of each vulnerability in each baseline scan report, giving a vulnerability feature vector for each vulnerability in each report. A pre-constructed vulnerability deduplication model is trained on the vulnerability feature vectors of the vulnerabilities in the K baseline scan reports with a K-fold cross-validation algorithm, to obtain a trained target vulnerability deduplication model. The vulnerability feature vector of each vulnerability in the baseline scan report to be deduplicated is input into the target vulnerability deduplication model, the deduplicated target vulnerability feature vectors are output, and the target vulnerability feature vectors are converted into target vulnerabilities. The invention extracts the repeated words and word frequencies in the baseline scan reports with a feature keyword technique and uses them as vulnerability feature vectors, and it builds the deduplication model from those vectors with cross-validation. Because cross-validation reuses the sub-samples for both training and validation, it can improve the generalization of the deduplication model and thereby the efficiency and accuracy of baseline vulnerability deduplication.
Drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of a first embodiment of the cross-validation-based baseline scanning vulnerability deduplication method provided by the present invention;
FIG. 2 is a flowchart of a second embodiment of the cross-validation-based baseline scanning vulnerability deduplication method provided by the present invention;
FIG. 3 is a block diagram of a cross-validation-based baseline scanning vulnerability deduplication device according to an embodiment of the present invention.
Detailed Description
The core of the invention is to provide a baseline scanning vulnerability duplicate removal method, a device, equipment and a computer readable storage medium based on cross validation, which effectively improve the duplicate removal efficiency and accuracy of baseline scanning vulnerabilities.
In order that those skilled in the art will better understand the disclosure, the invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIG. 1, FIG. 1 is a flowchart of a first embodiment of the cross-validation-based baseline scanning vulnerability deduplication method provided by the present invention. The specific steps are as follows:
Step S101: acquiring K baseline scan reports produced for a target host by the scanners of K vendors, and performing normalization analysis on each baseline scan report to obtain vulnerability field data for each vulnerability in each baseline scan report;
In this embodiment, the vulnerability field data of each vulnerability includes asset IP, vulnerability level, vulnerability name, check category, judgment basis, vulnerability description, solution, and the like.
Step S102: extracting features from the vulnerability field data of each vulnerability in each baseline scan report with a feature keyword extraction algorithm, to obtain a vulnerability feature vector for each vulnerability in each baseline scan report;
Taking the extraction of the vulnerability feature vector of one vulnerability in one baseline scan report as an example, the specific steps are: deleting the specified Chinese stop words from each item of vulnerability field data of the current vulnerability in the current baseline scan report; calculating the word frequency of each word in each item of vulnerability field data after the Chinese stop words are removed; merging repeated words in each item of vulnerability field data so that each word appears once; and extracting the repeated words in each item of vulnerability field data together with their word frequencies, to obtain the vulnerability feature vector of the current vulnerability in the current baseline scan report.
In this embodiment, the inverse document frequency value of each word in each item of vulnerability field data after the Chinese stop words are removed may be calculated and used as the word frequency of that word.
Step S103: training a pre-constructed vulnerability deduplication model with a K-fold cross-validation algorithm according to the vulnerability feature vector of each vulnerability in each baseline scan report, to obtain a trained target vulnerability deduplication model;
Step S104: inputting the vulnerability feature vector of each vulnerability in the baseline scan report to be deduplicated into the target vulnerability deduplication model, outputting the deduplicated target vulnerability feature vectors, and converting the target vulnerability feature vectors into target vulnerabilities.
The baseline scanning vulnerability deduplication method provided by this embodiment extracts data features from the vulnerability field data with a feature keyword extraction technique and does not depend on representative features such as CVE (Common Vulnerabilities and Exposures) numbers and CNNVD (China National Vulnerability Database of Information Security) numbers. It uses cross-validation to improve the generalization and goodness of fit of the deduplication model, which improves the accuracy of baseline vulnerability deduplication.
In the second embodiment (FIG. 2), which builds on the embodiment above, after the vulnerability feature vector of each vulnerability in the K baseline scan reports is obtained, the vulnerability feature vectors corresponding to the K baseline scan reports are divided into a training sample and a test sample. The similarity between the vulnerability feature vector of each vulnerability in the training sample and the vulnerability feature vectors of the other vulnerabilities is calculated with a Euclidean distance algorithm, which determines the similarity between each vulnerability in the training sample and the other vulnerabilities. The deduplication model is then trained with a K-fold cross-validation algorithm based on those similarities.
Step S201: acquiring K baseline scan reports produced for a target host by the scanners of K vendors, and performing normalization analysis on each baseline scan report to obtain vulnerability field data for each vulnerability in each baseline scan report;
In this embodiment, K may be set to a value greater than or equal to 5; the larger K is, the higher the accuracy of the trained target vulnerability deduplication model.
Step S202: extracting features from the vulnerability field data of each vulnerability in each baseline scan report with a feature keyword extraction algorithm, to obtain a vulnerability feature vector for each vulnerability in each baseline scan report;
Step S203: selecting the vulnerability feature vectors corresponding to two of the K baseline scan reports as the current test sample, and using the vulnerability feature vectors corresponding to the remaining K-2 baseline scan reports as the current training sample, wherein the current test sample differs from the previous test sample in at least one baseline scan report;
Step S204: calculating the feature vector similarity between the vulnerability feature vector of each vulnerability in the current training sample and the vulnerability feature vectors of the other vulnerabilities by using a Euclidean distance algorithm;
Step S205: calculating the field similarity between the vulnerability field data of each vulnerability in the current training sample and the vulnerability field data of the other vulnerabilities according to the feature vector similarity;
Step S206: calculating the similarity between each vulnerability in the current training sample and the other vulnerabilities according to the field similarity;
Step S207: training the current vulnerability deduplication model based on the similarity between each vulnerability in the current training sample and the other vulnerabilities, and testing the trained current vulnerability deduplication model with the current test sample;
After the trained current vulnerability deduplication model is tested with the current test sample, an evaluation index of the current vulnerability deduplication model, namely the deduplication rate, can be calculated and stored.
Step S208: repeating S203 to S207 K(K-1)/2 times to obtain the target vulnerability deduplication model;
It should be noted that in this embodiment the vulnerability deduplication model is optimized K(K-1)/2 times, so that every pair of distinct baseline scan reports is used once as the test sample, with the remaining K-2 baseline scan reports used as the training sample in that round.
Optimizing the vulnerability deduplication model K(K-1)/2 times yields K(K-1)/2 deduplication rates, and their average is used as the evaluation index of the target vulnerability deduplication model.
Step S209: after normalization analysis is performed on the baseline scan report to be deduplicated, extracting features from the resulting vulnerability field data of each vulnerability in that report with the feature keyword extraction algorithm, to obtain the vulnerability feature vector of each vulnerability in the baseline scan report to be deduplicated;
Step S210: inputting the vulnerability feature vector of each vulnerability in the baseline scan report to be deduplicated into the target vulnerability deduplication model, outputting the deduplicated target vulnerability feature vectors, and converting the target vulnerability feature vectors into target vulnerabilities.
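The text does not specify the internal form of the deduplication model used in step S210. One simple way to picture it, offered purely as an assumption, is a similarity threshold applied greedily: a vulnerability is kept only if it is not sufficiently similar to any vulnerability already kept. In the sketch below the function names, the threshold value 0.8, and the Jaccard word-overlap stand-in for the similarity measure are all hypothetical.

```python
def jaccard(a, b):
    """Stand-in similarity: word-set overlap between two strings."""
    sa, sb = set(a.split()), set(b.split())
    union = sa | sb
    return len(sa & sb) / len(union) if union else 1.0


def deduplicate(vulns, similarity=jaccard, threshold=0.8):
    """Keep the first member of each group of near-duplicate entries.

    A candidate is dropped when its similarity to any already-kept entry
    reaches the threshold (an assumed, tunable parameter).
    """
    kept = []
    for candidate in vulns:
        if all(similarity(candidate, seen) < threshold for seen in kept):
            kept.append(candidate)
    return kept


reports = [
    "password min length not set",
    "password min length not set properly",
    "audit log disabled",
]
print(deduplicate(reports))
# ['password min length not set', 'audit log disabled']
```

In a fuller implementation the `similarity` argument would be the field-based vulnerability similarity built on Euclidean distance, and the threshold would be the quantity tuned during the cross-validation rounds.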
A standard K-fold cross-validation algorithm divides the initial sample into K sub-samples, holds out a single sub-sample as validation data, and trains on the other K-1 sub-samples; this is repeated K times so that each sub-sample is used for validation once, and the K results are averaged or otherwise combined into a single estimate. Because the algorithm reuses the sub-samples for both training and validation, and every sub-sample is validated once, the generalization of the model is greatly improved. In this embodiment, the repeated words and word frequencies of each vulnerability in the K baseline scan reports are extracted with a feature keyword technique and used as vulnerability feature vectors; the similarity between vulnerabilities is calculated with a Euclidean distance similarity algorithm; and, based on the K-fold cross-validation algorithm, two of the K baseline scan reports are selected as the test set and the remaining K-2 baseline scan reports are used as the training set to train and validate the vulnerability deduplication model. This effectively improves the generalization of the deduplication model and thereby the accuracy of baseline vulnerability deduplication.
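Combining the per-round results into a single estimate can be sketched as follows. The text names the deduplication rate as the evaluation index but does not define it; the definition used here, the share of records removed as duplicates, is an assumption, and the function names and the counts in the usage line are made up for illustration.

```python
from statistics import mean


def deduplication_rate(total_before, total_after):
    """Assumed definition: fraction of records removed as duplicates."""
    if total_before == 0:
        return 0.0
    return (total_before - total_after) / total_before


def evaluate_rounds(round_results):
    """Return the per-round rates and their mean.

    `round_results` is an iterable of (count_before, count_after) pairs,
    one pair per cross-validation round.
    """
    rates = [deduplication_rate(before, after) for before, after in round_results]
    return rates, mean(rates)


# Three illustrative rounds with made-up counts.
rates, average = evaluate_rounds([(100, 60), (80, 60), (50, 40)])
```

With K reports there would be K(K-1)/2 such pairs, and `average` would serve as the evaluation index of the final model.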
Referring to FIG. 3, FIG. 3 is a block diagram of a cross-validation-based baseline scanning vulnerability deduplication device according to an embodiment of the present invention. The device may include:
a normalization analysis module 100, configured to acquire K baseline scan reports produced for the target host by the scanners of K vendors, and perform normalization analysis on each baseline scan report to obtain vulnerability field data for each vulnerability in each baseline scan report;
a feature extraction module 200, configured to extract features from the vulnerability field data of each vulnerability in each baseline scan report with a feature keyword extraction algorithm, to obtain a vulnerability feature vector for each vulnerability in each baseline scan report;
a training module 300, configured to train a pre-constructed vulnerability deduplication model with a K-fold cross-validation algorithm according to the vulnerability feature vector of each vulnerability in each baseline scan report, to obtain a trained target vulnerability deduplication model;
an output module 400, configured to input the vulnerability feature vector of each vulnerability in the baseline scan report to be deduplicated into the target vulnerability deduplication model, output the deduplicated target vulnerability feature vectors, and convert the target vulnerability feature vectors into target vulnerabilities.
The cross-validation-based baseline scanning vulnerability deduplication device of this embodiment is used to implement the cross-validation-based baseline scanning vulnerability deduplication method described above, so its specific implementation can be found in the method embodiments. For example, the normalization analysis module 100, the feature extraction module 200, the training module 300, and the output module 400 implement steps S101, S102, S103, and S104 of the method, respectively. Their specific implementations can therefore be found in the descriptions of the corresponding embodiments and are not repeated here.
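The correspondence between the four modules and steps S101 to S104 can be pictured as a small pipeline object that holds one callable per module. This is a structural sketch only; the class name `DedupDevice`, its method names, and the callable signatures are hypothetical and are not specified by the text.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class DedupDevice:
    parse: Callable[[Any], list]               # module 100 (step S101)
    extract: Callable[[list], list]            # module 200 (step S102)
    train: Callable[[list], Any]               # module 300 (step S103)
    output: Callable[[Any, list], list]        # module 400 (step S104)
    model: Any = None                          # set by fit()

    def fit(self, reports):
        """Parse and featurize the K training reports, then train the model."""
        vectors = [self.extract(self.parse(report)) for report in reports]
        self.model = self.train(vectors)
        return self

    def run(self, report):
        """Deduplicate one report with the trained model."""
        if self.model is None:
            raise RuntimeError("call fit() before run()")
        return self.output(self.model, self.extract(self.parse(report)))
```

Keeping each module behind a plain callable makes it possible to swap, for example, the parser for a new vendor format without touching the training or output stages.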
An embodiment of the invention also provides cross-validation-based baseline scanning vulnerability deduplication equipment, which comprises: a memory for storing a computer program; and a processor for implementing the steps of the cross-validation-based baseline scanning vulnerability deduplication method when executing the computer program.
The specific embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the baseline scanning vulnerability deduplication method based on cross validation are implemented.
The embodiments are described in a progressive manner: each embodiment focuses on its differences from the others, and the same or similar parts of the embodiments can be referred to one another. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, it is described only briefly, and the relevant details can be found in the description of the method.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The cross-validation-based baseline scanning vulnerability deduplication method, device, equipment, and computer-readable storage medium provided by the present invention have been described in detail above. The principles and embodiments of the present invention are explained herein using specific examples, which are presented only to assist in understanding the method and its core concepts. It should be noted that those skilled in the art may make various improvements and modifications to the present invention without departing from its principle, and those improvements and modifications also fall within the scope of the claims of the present invention.
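As a non-limiting illustration of the feature extraction and similarity calculation summarized above, the following minimal sketch removes stop words from one item of vulnerability field data, counts the remaining words, and compares two resulting feature vectors by Euclidean distance. The stop-word list, the whitespace tokenization, the function names, and the mapping from distance to a similarity score are all assumptions made for the example and are not specified by the invention; raw word counts are used here, and an inverse document frequency value could be substituted as the word weight.

```python
import math
from collections import Counter

# Hypothetical stop-word list; the embodiments only refer to "specified" stop words.
STOP_WORDS = {"the", "a", "of", "is", "to"}


def feature_vector(field_text: str) -> Counter:
    """Drop stop words, then keep each remaining word once with its frequency."""
    # Whitespace tokenization is an assumption; Chinese text would need a word segmenter.
    words = [w for w in field_text.lower().split() if w not in STOP_WORDS]
    return Counter(words)


def euclidean_distance(a: Counter, b: Counter) -> float:
    """Euclidean distance over the union of words; a missing word counts as 0."""
    vocab = set(a) | set(b)
    return math.sqrt(sum((a[w] - b[w]) ** 2 for w in vocab))


def similarity(a: Counter, b: Counter) -> float:
    """Map distance into (0, 1]; identical vectors give 1.0 (assumed mapping)."""
    return 1.0 / (1.0 + euclidean_distance(a, b))
```

Two vulnerabilities reported by different scanners could then be treated as candidate duplicates when the similarity of their field vectors exceeds a threshold chosen by the implementer.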

Claims (10)

1. A baseline scanning vulnerability deduplication method based on cross validation is characterized by comprising the following steps:
acquiring K baseline scanning reports produced for a target host by the scanners of K vendors, and performing normalized parsing on each baseline scanning report to obtain vulnerability field data of each vulnerability in each baseline scanning report;
performing feature extraction on the vulnerability field data of each vulnerability in each baseline scanning report by using a feature keyword extraction algorithm to obtain a vulnerability feature vector of each vulnerability in each baseline scanning report;
training a pre-constructed vulnerability deduplication model by adopting a K-fold cross validation algorithm according to the vulnerability feature vector of each vulnerability in each baseline scanning report to obtain a trained target vulnerability deduplication model;
and inputting the vulnerability feature vector of each vulnerability in a baseline scanning report to be deduplicated into the target vulnerability deduplication model, outputting the deduplicated target vulnerability feature vectors, and converting the target vulnerability feature vectors into target vulnerabilities.
2. The method of claim 1, wherein the obtaining K baseline scan reports of a target host from K vendor scanners, and performing normalized parsing on each baseline scan report to obtain vulnerability field data of each vulnerability in each baseline scan report comprises:
acquiring the K baseline scanning reports produced for the target host by the scanners of the K vendors;
analyzing each baseline scanning report respectively to obtain vulnerability field data of each vulnerability in each baseline scanning report; the vulnerability field data comprises asset IP, vulnerability grade, vulnerability name, inspection classification, judgment basis, vulnerability description and solution.
3. The method of claim 2, wherein the extracting the features of the vulnerability field data of each vulnerability of each baseline scan report by using a feature keyword extraction algorithm to obtain the vulnerability feature vector of each vulnerability of each baseline scan report comprises:
deleting the specified Chinese stop words from each item of vulnerability field data of the current vulnerability in the current baseline scanning report;
calculating the word frequency of each word in each item of vulnerability field data after the Chinese stop words are removed;
merging and removing the duplicate words in each item of vulnerability field data;
and extracting the repeated words in each item of vulnerability field data, together with the word frequencies of the repeated words, to obtain the vulnerability feature vector of the current vulnerability in the current baseline scanning report.
4. The method of claim 3, wherein the calculating the word frequency of each word in each vulnerability field data after removing the Chinese stop word comprises:
calculating the inverse document frequency value of each word in the vulnerability field data after the Chinese stop words are removed, and using that value as the word frequency of the word.
5. The method of claim 1, wherein training a pre-constructed vulnerability deduplication model according to the vulnerability feature vector of each vulnerability in each baseline scanning report by using a K-fold cross validation algorithm to obtain a trained target vulnerability deduplication model comprises:
S1: selecting the vulnerability feature vectors corresponding to two of the K baseline scanning reports as the current test sample, and using the vulnerability feature vectors corresponding to the remaining K-2 baseline scanning reports as the current training sample, wherein the current test sample differs from the previous test sample in at least one baseline scanning report;
S2: calculating the similarity between each vulnerability in the current training sample and the other vulnerabilities according to the similarity between the vulnerability feature vector of each vulnerability in the current training sample and the vulnerability feature vectors of the other vulnerabilities;
S3: training a current vulnerability deduplication model based on the similarity between each vulnerability in the current training sample and the other vulnerabilities, and testing the trained current vulnerability deduplication model by using the current test sample;
S4: repeating S1 to S3 K(K-1)/2 times to obtain the target vulnerability deduplication model.
6. The method of claim 5, wherein the calculating the similarity of each vulnerability in the training samples to other vulnerabilities according to the similarity of the vulnerability feature vector of each vulnerability in the training samples to the vulnerability feature vectors of other vulnerabilities comprises:
calculating the feature vector similarity between the vulnerability feature vector of each vulnerability in the current training sample and the vulnerability feature vectors of the other vulnerabilities by using a Euclidean distance algorithm;
calculating the field similarity between the vulnerability field data of each vulnerability in the current training sample and the vulnerability field data of the other vulnerabilities according to the feature vector similarity;
and calculating the similarity between each vulnerability in the current training sample and the other vulnerabilities according to the field similarity.
7. A cross-validation-based baseline scan vulnerability deduplication device, comprising:
a normalized parsing module, used for acquiring K baseline scanning reports produced for a target host by the scanners of K vendors, and performing normalized parsing on each baseline scanning report to obtain vulnerability field data of each vulnerability in each baseline scanning report;
a feature extraction module, used for performing feature extraction on the vulnerability field data of each vulnerability in each baseline scanning report by using a feature keyword extraction algorithm to obtain a vulnerability feature vector of each vulnerability in each baseline scanning report;
a training module, used for training a pre-constructed vulnerability deduplication model by adopting a K-fold cross validation algorithm according to the vulnerability feature vector of each vulnerability in each baseline scanning report to obtain a trained target vulnerability deduplication model;
and an output module, used for inputting the vulnerability feature vector of each vulnerability in a baseline scanning report to be deduplicated into the target vulnerability deduplication model, outputting the deduplicated target vulnerability feature vectors, and converting the target vulnerability feature vectors into target vulnerabilities.
8. The apparatus of claim 7, wherein the normalized parsing module comprises:
an acquisition unit, used for acquiring the K baseline scanning reports produced for the target host by the scanners of the K vendors;
the analysis unit is used for respectively analyzing each baseline scanning report to obtain vulnerability field data of each vulnerability in each baseline scanning report; the vulnerability field data comprises asset IP, vulnerability grade, vulnerability name, inspection classification, judgment basis, vulnerability description and solution.
9. A cross-validation-based baseline scan vulnerability deduplication apparatus, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the cross-validation based baseline scan vulnerability deduplication method according to any one of claims 1 to 6 when executing the computer program.
10. A computer-readable storage medium, having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the cross-validation-based baseline scan vulnerability deduplication method according to any one of claims 1 to 6.
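For illustration only, the sampling loop recited in S1 to S4 of claim 5 can be sketched as an enumeration of every two-report subset of the K reports, which yields exactly K(K-1)/2 rounds and guarantees that consecutive test samples differ in at least one report. The function name and the vendor labels below are hypothetical; model training and testing are omitted. In general machine-learning terminology this particular enumeration is usually called leave-two-out (leave-p-out with p = 2).

```python
from itertools import combinations


def leave_two_out_splits(reports):
    """Yield (test, train) pairs; every 2-report subset serves as the test sample once."""
    for i, j in combinations(range(len(reports)), 2):
        test = [reports[i], reports[j]]
        # The remaining K-2 reports form the training sample for this round.
        train = [r for k, r in enumerate(reports) if k not in (i, j)]
        yield test, train


if __name__ == "__main__":
    # Hypothetical labels standing in for K = 4 vendors' baseline scanning reports.
    reports = ["vendor_A", "vendor_B", "vendor_C", "vendor_D"]
    splits = list(leave_two_out_splits(reports))
    print(len(splits))  # 6 rounds, i.e. K*(K-1)/2 for K = 4
    for test, train in splits:
        print("test:", test, "train:", train)
```

Enumerating combinations rather than drawing pairs at random is one simple way to satisfy the requirement that each round's test sample differs from the previous one.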
CN201911370350.6A 2019-12-26 2019-12-26 Cross validation based baseline scanning vulnerability duplication removing method, device and equipment Active CN111049858B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911370350.6A CN111049858B (en) 2019-12-26 2019-12-26 Cross validation based baseline scanning vulnerability duplication removing method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911370350.6A CN111049858B (en) 2019-12-26 2019-12-26 Cross validation based baseline scanning vulnerability duplication removing method, device and equipment

Publications (2)

Publication Number Publication Date
CN111049858A CN111049858A (en) 2020-04-21
CN111049858B CN111049858B (en) 2022-05-24

Family

ID=70239156

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911370350.6A Active CN111049858B (en) 2019-12-26 2019-12-26 Cross validation based baseline scanning vulnerability duplication removing method, device and equipment

Country Status (1)

Country Link
CN (1) CN111049858B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170318048A1 (en) * 2016-04-29 2017-11-02 Ciena Corporation System and method for monitoring network vulnerabilities
CN107704763A (en) * 2017-09-04 2018-02-16 中国移动通信集团广东有限公司 Multi-source heterogeneous leak information De-weight method, stage division and device
CN108494727A (en) * 2018-02-06 2018-09-04 成都清华永新网络科技有限公司 A kind of security incident closed-loop process method for network security management
CN108737425A (en) * 2018-05-24 2018-11-02 北京凌云信安科技有限公司 Fragility based on multi engine vulnerability scanning association analysis manages system
CN109376535A (en) * 2018-08-14 2019-02-22 中国信息安全测评中心 A kind of leak analysis method and system based on intelligent semiology analysis
CN110069930A (en) * 2019-04-29 2019-07-30 广东电网有限责任公司 A kind of loophole restorative procedure, device and computer readable storage medium
CN110198319A (en) * 2019-06-03 2019-09-03 电子科技大学 Security protocol bug excavation method based on more counter-examples
CN110443304A (en) * 2019-08-06 2019-11-12 民生科技有限责任公司 A kind of business risk appraisal procedure based on machine learning model
CN110598787A (en) * 2019-09-12 2019-12-20 北京理工大学 Software bug classification method based on self-defined step length learning

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
朱湘等: "基于微博的事件传播分析", 《计算机研究与发展》, no. 02, 15 February 2015 (2015-02-15) *
王彤彤等: "一种基于矢量空间模型的垃圾邮件去重复技术", 《通信技术》, vol. 40, no. 12, 31 December 2007 (2007-12-31), pages 299 - 301 *
王行甫等: "基于余弦相似度和实例加权改进的贝叶斯算法", 《计算机系统应用》, no. 08, 15 August 2016 (2016-08-15) *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113656807A (en) * 2021-08-23 2021-11-16 杭州安恒信息技术股份有限公司 Vulnerability management method, device, equipment and storage medium
CN113656807B (en) * 2021-08-23 2024-04-16 杭州安恒信息技术股份有限公司 Vulnerability management method, device, equipment and storage medium
CN114329485A (en) * 2021-12-24 2022-04-12 中电信数智科技有限公司 Vulnerability duplicate removal method and device based on deep learning
CN114329485B (en) * 2021-12-24 2023-01-10 中电信数智科技有限公司 Vulnerability duplicate removal method and device based on deep learning
US20230281301A1 (en) * 2022-03-03 2023-09-07 Dell Products, L.P. System and method for detecting and reporting system clock attacks within an indicators of attack platform
CN114785574A (en) * 2022-04-07 2022-07-22 国网浙江省电力有限公司宁波供电公司 AI-assisted-based remote vulnerability accurate verification method
CN114785574B (en) * 2022-04-07 2023-09-29 国网浙江省电力有限公司宁波供电公司 AI-assisted remote vulnerability accurate verification method
CN116502241A (en) * 2023-06-29 2023-07-28 中汽智联技术有限公司 Method and system for enhancing vulnerability scanning tool based on PoC load library
CN116502241B (en) * 2023-06-29 2023-10-10 中汽智联技术有限公司 Method and system for enhancing vulnerability scanning tool based on PoC load library

Also Published As

Publication number Publication date
CN111049858B (en) 2022-05-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant