CN111049858A - Cross-validation-based baseline scanning vulnerability deduplication method, device and equipment

Info

Publication number: CN111049858A
Application number: CN201911370350.6A
Authority: CN
Inventors: 林月晴; 范渊
Original assignee: DBAPPSecurity Co Ltd
Current assignee: DBAPPSecurity Co Ltd
Priority date: 2019-12-26
Filing date: 2019-12-26
Publication date: 2020-04-21
Anticipated expiration: 2039-12-26
Also published as: CN111049858B

Abstract

The invention discloses a cross-validation-based baseline scanning vulnerability deduplication method, device, equipment and computer-readable storage medium. The method comprises the following steps: performing normalized parsing on K baseline scan reports produced for a target host by K manufacturer scanners, to obtain the vulnerability field data of each vulnerability in each baseline scan report; extracting features from each baseline scan report with a feature keyword extraction algorithm, to obtain the vulnerability feature vector of each vulnerability in each baseline scan report; training a vulnerability deduplication model with a K-fold cross-validation algorithm on the vulnerability feature vectors of the vulnerabilities in each baseline scan report, to obtain a target vulnerability deduplication model; and inputting the vulnerability feature vector of each vulnerability in a baseline scan report to be deduplicated into the target vulnerability deduplication model, to obtain the deduplicated vulnerabilities. The method, device, equipment and computer-readable storage medium provided by the invention improve the efficiency and accuracy of baseline vulnerability deduplication.

Description

Cross-validation-based baseline scanning vulnerability deduplication method, device and equipment

Technical Field

The invention relates to the technical field of network security, and in particular to a cross-validation-based baseline scanning vulnerability deduplication method, device, equipment and computer-readable storage medium.

Background

As informatization advances, network security plays an increasingly critical supporting role in the business development of every enterprise and organization. Vulnerability scanning is an important class of network security technology. Used together with firewalls and intrusion detection systems, it can effectively improve network security. By scanning the network, a network administrator can learn the security settings and running application services of the network, find security vulnerabilities in time, objectively evaluate the network risk level, correct the security vulnerabilities and misconfigurations in the system according to the scan results, and guard against hacker attacks. As an active precaution, security scanning can effectively prevent hacking and nip attacks in the bud.

There are many vulnerability scanning tools on the market. In daily vulnerability scanning, when the same host is scanned by vulnerability scanning devices from different manufacturers, the reported results differ widely: vulnerability naming, classification, description, solution and so on follow different standards, and a large number of duplicate vulnerabilities appear. This greatly increases the workload of operation and maintenance personnel, who must investigate the same vulnerabilities repeatedly.

At present, vulnerability deduplication is mainly performed either by using representative identifiers such as CVE (Common Vulnerabilities and Exposures) numbers and CNNVD (China National Vulnerability Database of Information Security) numbers for duplicate checking, or manually. However, compared with other kinds of scanned vulnerabilities, baseline scanning vulnerabilities are very numerous and lack representative identifiers such as CVE and CNNVD numbers, which makes deduplication considerably more difficult.

In summary, it can be seen that how to improve the efficiency and accuracy of baseline scanning vulnerability deduplication is a problem to be solved at present.

Disclosure of Invention

The invention aims to provide a baseline scanning vulnerability deduplication method, a baseline scanning vulnerability deduplication device, baseline scanning vulnerability deduplication equipment and a computer readable storage medium, which solve the problems of large workload and low efficiency of baseline vulnerability deduplication in the prior art.

In order to solve the technical problem, the invention provides a cross-validation-based baseline scanning vulnerability duplication removal method, which comprises the following steps: acquiring K parts of baseline scanning reports of a target host by K manufacturer scanners, and respectively carrying out normalization analysis on each part of baseline scanning report to obtain vulnerability field data of each vulnerability in each part of baseline scanning report; respectively extracting the characteristics of the vulnerability field data of each vulnerability in each baseline scanning report by using a characteristic keyword extraction algorithm to obtain a vulnerability characteristic vector of each vulnerability in each baseline scanning report; training a pre-constructed vulnerability duplication removing model by adopting a K-fold cross validation algorithm according to the vulnerability feature vector of each vulnerability in each baseline scanning report to obtain a trained target vulnerability duplication removing model; and inputting the vulnerability characteristic vector of each vulnerability in the to-be-deduplicated baseline scanning report into the target vulnerability deduplication model, outputting the deduplicated target vulnerability characteristic vector, and converting the target vulnerability characteristic vector into a target vulnerability.

Preferably, the obtaining K parts of baseline scan reports of the target host by the K manufacturer scanners, and performing normalization analysis on each part of baseline scan report, respectively, to obtain vulnerability field data of each vulnerability in each part of baseline scan report includes:

acquiring K baseline scanning reports of the target host produced by K manufacturer scanners;

analyzing each baseline scanning report respectively to obtain vulnerability field data of each vulnerability in each baseline scanning report; the vulnerability field data comprises asset IP, vulnerability grade, vulnerability name, inspection classification, judgment basis, vulnerability description and solution.
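By way of non-limiting illustration, the normalized parsing described above can be sketched as a mapping from vendor-specific keys onto the seven listed fields. The sketch assumes each report has already been read into dictionaries; the names `VulnRecord`, `FIELD_ALIASES` and `normalize`, and all vendor key names, are hypothetical and do not come from the source.

```python
from dataclasses import dataclass, fields


@dataclass
class VulnRecord:
    """One vulnerability in the normalized schema (the seven fields named in the text)."""
    asset_ip: str = ""
    level: str = ""
    name: str = ""
    check_category: str = ""
    judgment_basis: str = ""
    description: str = ""
    solution: str = ""


# Hypothetical vendor key names mapped to the normalized field names.
FIELD_ALIASES = {
    "ip": "asset_ip", "host": "asset_ip",
    "severity": "level", "risk": "level",
    "title": "name", "vuln_name": "name",
    "category": "check_category",
    "basis": "judgment_basis",
    "desc": "description",
    "fix": "solution", "remediation": "solution",
}


def normalize(raw: dict) -> VulnRecord:
    """Map one vendor-specific entry onto the normalized schema; unknown keys are dropped."""
    known = {f.name for f in fields(VulnRecord)}
    record = {}
    for key, value in raw.items():
        target = FIELD_ALIASES.get(key.lower(), key.lower())
        if target in known:
            record[target] = str(value).strip()
    return VulnRecord(**record)
```

Fields that a given scanner does not report simply stay empty, so records from different manufacturers remain comparable field by field.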

Preferably, the extracting the features of the vulnerability field data of each vulnerability of each baseline scanning report by using a feature keyword extraction algorithm, and obtaining the vulnerability feature vector of each vulnerability in each baseline scanning report includes:

deleting the specified Chinese stop words from each item of vulnerability field data of the current vulnerability in the current baseline scanning report;

calculating the word frequency of each word in each item of vulnerability field data after the Chinese stop words are removed;

merging the repeated words in each item of vulnerability field data so that each word is kept only once;

and extracting the repeated words in each item of vulnerability field data, together with their word frequencies, to obtain the vulnerability feature vector of the current vulnerability in the current baseline scanning report.

Preferably, the calculating the word frequency of each word in the data of each vulnerability field after the Chinese stop word is removed includes:

and calculating the inverse document frequency value of each word in the vulnerability field data after the Chinese stop word is removed as the word frequency of each word.
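As a hedged illustration of the feature extraction just described, the following sketch removes stop words, merges repeated words into counts, and optionally substitutes an inverse document frequency value for the raw count. It assumes the field text has already been segmented into tokens; the placeholder `STOP_WORDS` set stands in for the specified Chinese stop-word list, and the smoothed IDF formula and all function names are assumptions rather than the source's definitions.

```python
import math
from collections import Counter

# Placeholder stop-word set; it stands in for the specified Chinese stop-word list.
STOP_WORDS = frozenset({"the", "of", "a", "is"})


def term_counts(tokens, stop_words=STOP_WORDS):
    """Remove stop words, then merge repeated words into word -> count."""
    return Counter(t for t in tokens if t not in stop_words)


def idf_weights(documents, stop_words=STOP_WORDS):
    """Inverse document frequency per word over a list of token lists.

    Uses the smoothed form log((1 + N) / (1 + df)) + 1, an assumed choice.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens) - stop_words)
    return {w: math.log((1 + n_docs) / (1 + df)) + 1.0 for w, df in doc_freq.items()}


def feature_vector(tokens, vocabulary, idf=None, stop_words=STOP_WORDS):
    """Fixed-order vector: the count of each vocabulary word, or its IDF value when `idf` is given."""
    counts = term_counts(tokens, stop_words)
    if idf is None:
        return [float(counts[w]) for w in vocabulary]
    return [idf.get(w, 0.0) if counts[w] else 0.0 for w in vocabulary]
```

Using a shared `vocabulary` keeps every vector the same length, which is what a distance-based comparison between vulnerabilities requires.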

Preferably, the training a pre-constructed vulnerability duplication removal model according to the vulnerability feature vector of each vulnerability in each baseline scanning report by using a K-fold cross validation algorithm to obtain a trained target vulnerability duplication removal model includes:

S1: selecting the vulnerability feature vectors corresponding to two of the K baseline scanning reports as the current test sample, and using the vulnerability feature vectors corresponding to the remaining K-2 baseline scanning reports as the current training sample, wherein the current test sample differs from the previous test sample in at least one baseline scanning report;

S2: calculating the similarity between each vulnerability in the current training sample and the other vulnerabilities according to the similarity between the vulnerability feature vector of each vulnerability in the current training sample and the vulnerability feature vectors of the other vulnerabilities;

S3: training the current vulnerability deduplication model based on the similarity between each vulnerability in the current training sample and the other vulnerabilities, and testing the trained current vulnerability deduplication model with the current test sample;

S4: repeating S1 to S3 K(K-1)/2 times to obtain the target vulnerability deduplication model.
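The split schedule in steps S1 to S4 can be illustrated with a short sketch: choosing two of K reports as the test sample in every possible way yields exactly K(K-1)/2 rounds. The function name `pairwise_splits` and the use of report indices are illustrative assumptions.

```python
from itertools import combinations


def pairwise_splits(k):
    """Yield (test_indices, train_indices) for every choice of 2 of the K reports as the test sample."""
    reports = range(k)
    for test in combinations(reports, 2):
        train = tuple(i for i in reports if i not in test)
        yield test, train


if __name__ == "__main__":
    for test, train in pairwise_splits(5):
        print("test reports:", test, "training reports:", train)
```

Because `combinations` never repeats a pair, each round's test sample differs from the previous one in at least one report, as S1 requires.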

Preferably, the calculating the similarity between each vulnerability in the training sample and other vulnerabilities according to the similarity between the vulnerability feature vector of each vulnerability in the training sample and the vulnerability feature vectors of other vulnerabilities includes:

calculating the feature vector similarity between the vulnerability feature vector of each vulnerability in the current training sample and the vulnerability feature vectors of other vulnerabilities by using a Euclidean distance algorithm;

calculating the field similarity between the vulnerability field data of each vulnerability in the current training sample and the vulnerability field data of other vulnerabilities according to the feature vector similarity;

and calculating the similarity of each vulnerability in the current training sample and other vulnerabilities according to the field similarity.
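A minimal sketch of this three-stage similarity calculation follows. The source does not state how a Euclidean distance is converted to a similarity or how field similarities are combined, so the mapping 1 / (1 + d) and the weighted mean used here are assumptions, as are the function names and the optional `weights` parameter.

```python
import math


def euclidean_similarity(u, v):
    """Map the Euclidean distance d to a similarity in (0, 1] via 1 / (1 + d) (an assumed mapping)."""
    return 1.0 / (1.0 + math.dist(u, v))


def vulnerability_similarity(a, b, weights=None):
    """Combine per-field similarities into one score.

    `a` and `b` map a field name to that field's feature vector.
    The weighted mean over shared fields is an assumption.
    """
    shared = [f for f in a if f in b]
    weights = weights or {}
    total = sum(weights.get(f, 1.0) for f in shared)
    if not shared or total == 0:
        return 0.0
    score = sum(weights.get(f, 1.0) * euclidean_similarity(a[f], b[f]) for f in shared)
    return score / total
```

Identical field vectors give a score of 1.0, and the score falls toward 0 as the vectors move apart, so a higher value indicates a more likely duplicate.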

The invention also provides a cross-validation-based baseline scanning vulnerability duplication removal device, which comprises:

the normalized analysis module is used for acquiring K parts of baseline scanning reports of the target host by K manufacturer scanners, and performing normalized analysis on each part of baseline scanning report to obtain vulnerability field data of each vulnerability in each part of baseline scanning report;

the characteristic extraction module is used for respectively carrying out characteristic extraction on the vulnerability field data of each vulnerability in each baseline scanning report by utilizing a characteristic keyword extraction algorithm to obtain a vulnerability characteristic vector of each vulnerability in each baseline scanning report;

the training module is used for training a pre-constructed vulnerability duplication removal model by adopting a K-fold cross validation algorithm according to the vulnerability characteristic vector of each vulnerability in each baseline scanning report to obtain a trained target vulnerability duplication removal model;

and the output module is used for inputting the vulnerability characteristic vector of each vulnerability in the baseline scanning report to be deduplicated into the target vulnerability deduplication model, outputting the deduplicated target vulnerability characteristic vector, and converting the target vulnerability characteristic vector into a target vulnerability.

Preferably, the normalization parsing module includes:

the acquisition unit is used for acquiring K parts of baseline scanning reports of K parts of manufacturer scanners on the target host;

the analysis unit is used for respectively analyzing each baseline scanning report to obtain vulnerability field data of each vulnerability in each baseline scanning report; the vulnerability field data comprises asset IP, vulnerability grade, vulnerability name, inspection classification, judgment basis, vulnerability description and solution.

The invention also provides cross-validation-based baseline scanning vulnerability deduplication equipment, which comprises:

a memory for storing a computer program; and a processor for implementing the steps of the cross-validation-based baseline scanning vulnerability deduplication method when executing the computer program.

The invention further provides a computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the baseline scanning vulnerability deduplication method based on cross validation are realized.

In the cross-validation-based baseline scanning vulnerability deduplication method provided by the invention, K baseline scanning reports produced for the same host by K manufacturer scanners are acquired and then subjected to normalized parsing to obtain the vulnerability field data of each vulnerability in each baseline scanning report. A feature keyword extraction algorithm is applied to the vulnerability field data of each vulnerability in each baseline scanning report to obtain the vulnerability feature vector of each vulnerability. The vulnerability feature vectors of the vulnerabilities in the K baseline scanning reports are then used to train a pre-constructed vulnerability deduplication model with a K-fold cross-validation algorithm, yielding a trained target vulnerability deduplication model. Finally, the vulnerability feature vector of each vulnerability in the baseline scanning report to be deduplicated is input into the target vulnerability deduplication model, which outputs the deduplicated target vulnerability feature vectors, and these are converted into target vulnerabilities. Because the repeated words and word frequencies in the baseline scanning reports are extracted as vulnerability feature vectors by the feature keyword technique, and the deduplication model is built from these vectors with cross-validation, which reuses the sub-samples for both training and validation, the generalization of the deduplication model is improved, and with it the efficiency and accuracy of baseline vulnerability deduplication.

Drawings

In order to more clearly illustrate the embodiments or technical solutions of the present invention, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained based on these drawings without creative efforts.

FIG. 1 is a flowchart of a first embodiment of a cross-validation-based baseline scan vulnerability deduplication method according to the present invention;

FIG. 2 is a flowchart of a second embodiment of a cross-validation-based baseline scan vulnerability deduplication method provided by the present invention;

FIG. 3 is a structural block diagram of a cross-validation-based baseline scanning vulnerability deduplication device according to an embodiment of the present invention.

Detailed Description

The core of the invention is to provide a baseline scanning vulnerability duplicate removal method, a device, equipment and a computer readable storage medium based on cross validation, which effectively improve the duplicate removal efficiency and accuracy of baseline scanning vulnerabilities.

In order that those skilled in the art will better understand the disclosure, the invention will be described in further detail with reference to the accompanying drawings and specific embodiments. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Referring to FIG. 1, FIG. 1 is a flowchart illustrating a first embodiment of a cross-validation-based baseline scan vulnerability deduplication method according to the present invention; the specific operation steps are as follows:

Step S101: acquiring K baseline scanning reports of a target host produced by K manufacturer scanners, and performing normalized parsing on each baseline scanning report to obtain the vulnerability field data of each vulnerability in each baseline scanning report;

In this embodiment, the vulnerability field data of each vulnerability includes asset IP, vulnerability grade, vulnerability name, inspection classification, judgment basis, vulnerability description, solution, and the like.

Step S102: respectively extracting the characteristics of the vulnerability field data of each vulnerability in each baseline scanning report by using a characteristic keyword extraction algorithm to obtain a vulnerability characteristic vector of each vulnerability in each baseline scanning report;

Taking the extraction of the vulnerability feature vector of one vulnerability in one baseline scanning report as an example, the specific steps of extracting the vulnerability feature vector of each vulnerability in each baseline scanning report are as follows: deleting the specified Chinese stop words from each item of vulnerability field data of the current vulnerability in the current baseline scanning report; calculating the word frequency of each word in each item of vulnerability field data after the Chinese stop words are removed; merging the repeated words in each item of vulnerability field data so that each word is kept only once; and extracting the repeated words in each item of vulnerability field data, together with their word frequencies, to obtain the vulnerability feature vector of the current vulnerability in the current baseline scanning report.

In this embodiment, the inverse document frequency value of each word in each vulnerability field data after the Chinese stop word is removed may be calculated as the word frequency of each word.

Step S103: training a pre-constructed vulnerability duplication removing model by adopting a K-fold cross validation algorithm according to the vulnerability feature vector of each vulnerability in each baseline scanning report to obtain a trained target vulnerability duplication removing model;

Step S104: inputting the vulnerability feature vector of each vulnerability in the baseline scanning report to be deduplicated into the target vulnerability deduplication model, outputting the deduplicated target vulnerability feature vector, and converting the target vulnerability feature vector into a target vulnerability.

The baseline scanning vulnerability deduplication method provided by this embodiment extracts data features from the vulnerability field data with a feature keyword extraction technique and does not depend on representative identifiers such as CVE (Common Vulnerabilities and Exposures) numbers or CNNVD (China National Vulnerability Database of Information Security) numbers. Using cross-validation improves the generalization and goodness of fit of the deduplication model, and thus the accuracy of baseline vulnerability deduplication.

Based on the above embodiment, in this embodiment, after the vulnerability feature vector of each vulnerability in the K baseline scanning reports is obtained, the vulnerability feature vectors corresponding to the K baseline scanning reports are divided into a training sample and a test sample. The similarity between the vulnerability feature vector of each vulnerability in the training sample and the vulnerability feature vectors of the other vulnerabilities is calculated with a Euclidean distance algorithm, so as to determine the similarity between each vulnerability in the training sample and the other vulnerabilities. The deduplication model is then trained with a K-fold cross-validation algorithm based on these similarities.

Step S201: acquiring K parts of baseline scanning reports of a target host by K manufacturer scanners, and respectively carrying out normalization analysis on each part of baseline scanning report to obtain vulnerability field data of each vulnerability in each part of baseline scanning report;

In this embodiment, the value of K may be set to be greater than or equal to 5; the larger the value of K, the higher the accuracy of the target vulnerability deduplication model after training is completed.

Step S202: respectively extracting the characteristics of the vulnerability field data of each vulnerability in each baseline scanning report by using a characteristic keyword extraction algorithm to obtain a vulnerability characteristic vector of each vulnerability in each baseline scanning report;

Step S203: selecting the vulnerability feature vectors corresponding to two of the K baseline scanning reports as the current test sample, and using the vulnerability feature vectors corresponding to the remaining K-2 baseline scanning reports as the current training sample, wherein the current test sample differs from the previous test sample in at least one baseline scanning report;

Step S204: calculating the feature vector similarity between the vulnerability feature vector of each vulnerability in the current training sample and the vulnerability feature vectors of other vulnerabilities by using a Euclidean distance algorithm;

Step S205: calculating the field similarity between the vulnerability field data of each vulnerability in the current training sample and the vulnerability field data of other vulnerabilities according to the feature vector similarity;

Step S206: calculating the similarity between each vulnerability in the current training sample and other vulnerabilities according to the field similarity;

Step S207: training the current vulnerability deduplication model based on the similarity between each vulnerability in the current training sample and other vulnerabilities, and testing the trained current vulnerability deduplication model with the current test sample;

After the trained current vulnerability deduplication model is tested with the current test sample, an evaluation index of the current vulnerability deduplication model, namely the deduplication rate, can be calculated and stored.

Step S208: repeating S203 to S207 K(K-1)/2 times to obtain the target vulnerability deduplication model;

It should be noted that in this embodiment the vulnerability deduplication model is optimized K(K-1)/2 times, so that every combination of two different baseline scan reports is used once as the test sample, with the remaining baseline scan reports used as the training sample.

Optimizing the vulnerability deduplication model K(K-1)/2 times yields K(K-1)/2 deduplication rates; their average is calculated and used as the evaluation index of the target vulnerability deduplication model.

Step S209: after normalization analysis is carried out on the to-be-deduplicated baseline scan report, feature extraction is carried out on the acquired vulnerability field data of each vulnerability in the to-be-deduplicated baseline scan report by using a feature keyword extraction algorithm, and vulnerability feature vectors of each vulnerability in the to-be-deduplicated baseline scan report are obtained;

Step S210: inputting the vulnerability feature vector of each vulnerability in the baseline scanning report to be deduplicated into the target vulnerability deduplication model, outputting the deduplicated target vulnerability feature vector, and converting the target vulnerability feature vector into a target vulnerability.
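The source does not disclose the internal form of the target vulnerability deduplication model. Purely as an assumed stand-in for steps S209 and S210, the sketch below keeps a vulnerability only when its similarity to every vulnerability already kept is below a threshold. The function name `deduplicate`, the similarity mapping 1 / (1 + d), and the threshold value 0.8 are all hypothetical.

```python
import math


def deduplicate(vectors, threshold=0.8):
    """Return the indices of the vectors kept after greedy deduplication.

    A vector is kept only if its similarity 1 / (1 + Euclidean distance)
    to every vector already kept is below `threshold` (an assumed rule).
    """
    kept = []
    for i, vec in enumerate(vectors):
        if all(1.0 / (1.0 + math.dist(vec, vectors[j])) < threshold for j in kept):
            kept.append(i)
    return kept


if __name__ == "__main__":
    vectors = [[1, 0, 2], [1, 0, 2], [0, 3, 0], [1, 0, 2.1]]
    print(deduplicate(vectors))
```

The returned indices identify which feature vectors survive; converting them back to target vulnerabilities is then a lookup into the normalized records.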

The K-fold cross-validation algorithm divides the initial sample into K sub-samples; a single sub-sample is held out as the data for validating the model, and the other K-1 sub-samples are used for training. Cross-validation is repeated K times so that each sub-sample is used for validation exactly once, and the K results are averaged, or combined in another way, to obtain a single estimate. Because the algorithm reuses the sub-samples for both training and validation, and every sub-sample is validated once, the generalization of the model is greatly improved.

In this embodiment, the repeated words and word frequencies of each vulnerability in the K baseline scanning reports are extracted by the feature keyword technique as vulnerability feature vectors; the similarity between vulnerabilities is calculated with a Euclidean distance similarity algorithm; and, based on the K-fold cross-validation algorithm, two of the K baseline scanning reports are selected as the test set and the remaining K-2 baseline scanning reports as the training set to train and validate the vulnerability deduplication model. This effectively improves the generalization of the deduplication model and, in turn, the accuracy of baseline vulnerability deduplication.
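For illustration, the evaluation index described for step S208 can be written as follows, where r_i is the deduplication rate stored after the i-th round. The symbols N, r_i and \bar{r} are notation introduced here and do not appear in the source.

```latex
N = \binom{K}{2} = \frac{K(K-1)}{2}, \qquad
\bar{r} = \frac{1}{N} \sum_{i=1}^{N} r_i
```

For example, K = 5 gives N = 10 rounds, so the evaluation index of the target vulnerability deduplication model is the mean of 10 stored deduplication rates.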

Referring to FIG. 3, FIG. 3 is a structural block diagram of a cross-validation-based baseline scanning vulnerability deduplication device according to an embodiment of the present invention; the specific device may include:

the normalized analysis module 100 is configured to obtain K baseline scan reports of the target host from K manufacturer scanners, and perform normalized analysis on each baseline scan report to obtain vulnerability field data of each vulnerability in each baseline scan report;

the feature extraction module 200 is configured to perform feature extraction on vulnerability field data of each vulnerability in each baseline scan report by using a feature keyword extraction algorithm, so as to obtain a vulnerability feature vector of each vulnerability in each baseline scan report;

the training module 300 is configured to train a pre-constructed vulnerability duplication removal model by using a K-fold cross validation algorithm according to the vulnerability feature vector of each vulnerability in each baseline scanning report to obtain a trained target vulnerability duplication removal model;

an output module 400, configured to input the vulnerability feature vector of each vulnerability in the baseline scanning report to be deduplicated to the target vulnerability deduplication model, output the deduplicated target vulnerability feature vector, and convert the target vulnerability feature vector into a target vulnerability.

The cross-validation-based baseline scanning vulnerability deduplication device of this embodiment is used to implement the foregoing cross-validation-based baseline scanning vulnerability deduplication method, so its specific implementation can be found in the method embodiments above. For example, the normalized analysis module 100, the feature extraction module 200, the training module 300 and the output module 400 implement steps S101, S102, S103 and S104 of the method, respectively. Their specific implementation may refer to the descriptions of the corresponding embodiments and is not repeated here.

A specific embodiment of the invention also provides cross-validation-based baseline scanning vulnerability deduplication equipment, which comprises: a memory for storing a computer program; and a processor for implementing the steps of the cross-validation-based baseline scanning vulnerability deduplication method when executing the computer program.

The specific embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the baseline scanning vulnerability deduplication method based on cross validation are implemented.

The embodiments are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is brief, and the relevant points can be found in the description of the method.

Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

The cross-validation-based baseline scanning vulnerability deduplication method, device, equipment and computer-readable storage medium provided by the present invention have been described in detail above. The principles and embodiments of the present invention are explained herein using specific examples, which are presented only to assist in understanding the method and its core concepts. It should be noted that those skilled in the art can make various improvements and modifications to the present invention without departing from its principle, and those improvements and modifications also fall within the scope of the claims of the present invention.

Claims

1. A baseline scanning vulnerability deduplication method based on cross validation is characterized by comprising the following steps:

acquiring K parts of baseline scanning reports of a target host by K manufacturer scanners, and respectively carrying out normalization analysis on each part of baseline scanning report to obtain vulnerability field data of each vulnerability in each part of baseline scanning report;

respectively extracting the characteristics of the vulnerability field data of each vulnerability in each baseline scanning report by using a characteristic keyword extraction algorithm to obtain a vulnerability characteristic vector of each vulnerability in each baseline scanning report;

training a pre-constructed vulnerability duplication removing model by adopting a K-fold cross validation algorithm according to the vulnerability feature vector of each vulnerability in each baseline scanning report to obtain a trained target vulnerability duplication removing model;

and inputting the vulnerability characteristic vector of each vulnerability in the to-be-deduplicated baseline scanning report into the target vulnerability deduplication model, outputting the deduplicated target vulnerability characteristic vector, and converting the target vulnerability characteristic vector into a target vulnerability.

2. The method of claim 1, wherein the obtaining K baseline scan reports of a target host from K vendor scanners, and performing normalized parsing on each baseline scan report to obtain vulnerability field data of each vulnerability in each baseline scan report comprises:

3. The method of claim 2, wherein the extracting the features of the vulnerability field data of each vulnerability of each baseline scan report by using a feature keyword extraction algorithm to obtain the vulnerability feature vector of each vulnerability of each baseline scan report comprises:

combining and deleting repeated words in the data of each vulnerability field;

4. The method of claim 3, wherein the calculating the word frequency of each word in each vulnerability field data after removing the Chinese stop word comprises:

5. The method of claim 1, wherein training a pre-constructed vulnerability deduplication model according to the vulnerability feature vector of each vulnerability in each baseline scanning report by using a K-fold cross validation algorithm to obtain a trained target vulnerability deduplication model comprises:

6. The method of claim 5, wherein the calculating the similarity of each vulnerability in the training samples to other vulnerabilities according to the similarity of the vulnerability feature vector of each vulnerability in the training samples to the vulnerability feature vectors of other vulnerabilities comprises:

7. A cross-validation-based baseline scan vulnerability deduplication device, comprising:

8. The apparatus of claim 7, wherein the normalized parsing module comprises:

9. A cross-validation-based baseline scan vulnerability deduplication apparatus, comprising:

a memory for storing a computer program;

a processor for implementing the steps of the cross-validation based baseline scan vulnerability deduplication method according to any one of claims 1 to 6 when executing the computer program.

10. A computer-readable storage medium, having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the cross-validation-based baseline scan vulnerability deduplication method according to any one of claims 1 to 6.