CN115563619A - Vulnerability similarity comparison method and system based on text pre-training model - Google Patents
Vulnerability similarity comparison method and system based on text pre-training model
- Publication number
- CN115563619A (application number CN202211182151.4A)
- Authority
- CN
- China
- Prior art keywords
- vulnerability
- text
- target
- similarity
- word
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The application discloses a vulnerability similarity comparison method and system based on a text pre-training model. First, a vulnerability text data set of a vulnerability scanning product is acquired and preprocessed to obtain a target vulnerability text. The target vulnerability text is vectorized with a Sentence-BERT model to obtain a vulnerability text vector. Text segmentation and subject-word lexicon filtering are performed on the target vulnerability text to extract its subject words. The target vulnerability text is then processed with vulnerability keyword regular-expression matching and an HMCN model to obtain its vulnerability type. Finally, similarity is calculated separately for the vulnerability text vectors, the subject words, and the vulnerability types, and the results are combined by weighted summation to obtain the vulnerability similarity comparison result. The method judges whether two vulnerability texts describe the same vulnerability along three dimensions (text similarity, subject words, and vulnerability type), which improves the accuracy of vulnerability similarity judgment.
Description
Technical Field
The invention relates to the field of vulnerability data detection, in particular to a vulnerability similarity comparison method and system based on a text pre-training model.
Background
At present, vulnerability scanning and assessment products mainly rely on vulnerability knowledge bases. A vulnerability knowledge base is a vulnerability database established by national information security centers, information security vendors, and organizations, such as CVE (Common Vulnerabilities and Exposures). Existing vulnerability scanning products often support multiple vulnerability databases and may even integrate multiple vulnerability scanning technologies. To improve the accuracy of vulnerability scanning results and to better support vulnerability analysis and risk assessment, a vulnerability similarity comparison technology is needed to normalize similar vulnerabilities.
Existing vulnerability similarity detection technologies mainly comprise rule-matching methods and text-mining methods. Rule-matching methods extract keywords from the vulnerability information and use the keyword overlap ratio as the similarity between vulnerabilities. The keywords are usually extracted from information such as the vulnerability description, vulnerability type, and vulnerability risk level. This approach depends on the completeness and consistency of the vulnerability information and does not capture its deeper semantic information. Because different vulnerability scanning technologies follow different specifications, they often describe vulnerability information differently, so misjudgments occur easily. Text-mining methods model and compare vulnerability information using existing Natural Language Processing (NLP) technology. They mainly convert the vulnerability similarity comparison problem into an NLP text similarity problem: the vulnerability text is vectorized using a Word2Vec word vector model with TF-IDF (Term Frequency-Inverse Document Frequency) weighting, and the vector similarity is taken as the vulnerability similarity. Compared with rule matching, this approach is more flexible and can extract deeper semantic information from the vulnerability text, making up for the shortcomings of rule matching.
However, given the rapid development of NLP, the Word2Vec + TF-IDF technology choice adopted by existing vulnerability similarity comparison is outdated. It is only adequate for simple similarity judgments involving little information, whereas practical vulnerability comparison includes many harder cases: for example, two vulnerability texts that are identical except for the asset type, or two texts that describe different vulnerabilities of the same asset. Because such texts differ only slightly, text-mining techniques still assign them a high similarity even though they do not describe the same vulnerability. A more refined, multidimensional vulnerability similarity comparison technology is therefore needed to judge vulnerability similarity more accurately.
Disclosure of Invention
In view of this, the application provides a vulnerability similarity comparison method and system based on a text pre-training model, which judges whether two vulnerability texts describe the same vulnerability along three dimensions (text similarity, subject words, and vulnerability type), thereby improving the accuracy of vulnerability similarity judgment.
In a first aspect, a vulnerability similarity comparison method based on a text pre-training model is provided, and the method includes:
acquiring a vulnerability text data set of a vulnerability scanning product;
preprocessing a vulnerability text data set to obtain a target vulnerability text;
vectorizing the target vulnerability text based on a pre-trained Sentence-BERT model to obtain a vulnerability text vector, wherein the vulnerability text vector represents the semantic information of a sentence in a vector space;
performing text word segmentation and subject-word lexicon filtering on the target vulnerability text, and extracting subject words of the target vulnerability text;
processing the target vulnerability text based on vulnerability keyword regular-expression matching and an HMCN model to obtain a vulnerability type of the target vulnerability text;
and performing vulnerability similarity calculation separately on the obtained vulnerability text vector, subject words, and vulnerability type, and performing weighted summation on the resulting similarity calculation results to obtain a vulnerability similarity comparison result.
Optionally, the preprocessing the vulnerability text data set includes:
filtering out of the vulnerability text data set those texts whose descriptions are too short and/or too long, and converting English to lowercase.
Optionally, vectorizing the target vulnerability text based on a pre-trained Sentence-BERT model to obtain a vulnerability text vector includes:
generating semantically meaningful sentence embedding vectors by using a Siamese (twin) network model and a triplet network model.
Optionally, performing text word segmentation and subject-word lexicon filtering on the target vulnerability text, and extracting subject words of the target vulnerability text, includes:
extracting the English part of the vulnerability text, performing word segmentation on it, comparing the resulting words with an English subject-word lexicon, and taking the matched words that appear in a preset word list as the subject words of the vulnerability text; wherein the preset word list is a manually curated list of words of interest.
Optionally, the vulnerability similarity calculation is performed on the obtained vulnerability text vector, and includes:
and calculating to obtain a first vulnerability similarity calculation result based on cosine similarity among vulnerability text vectors.
Optionally, performing vulnerability similarity calculation on the obtained subject words includes:
acquiring a subject word list and a position weight list for each vulnerability text;
acquiring the intersection parts of the subject word lists and of the position weight lists of the two vulnerability texts;
and obtaining a second vulnerability similarity calculation result according to the formula
[formula image not reproduced in this text]
wherein A represents the target vulnerability text, B represents the comparison vulnerability text, SPL_A(i) represents the position weight of subject word i in the target vulnerability text, SPL_B(i) represents the position weight of subject word i in the comparison vulnerability text, and n represents the length of the intersected subject word list.
Optionally, the vulnerability similarity calculation is performed on the obtained vulnerability types, and includes:
when the types of the vulnerability text pairs are the same, assigning the third vulnerability similarity calculation result to be 1;
and when the types of the vulnerability text pairs are different, assigning the third vulnerability similarity calculation result to be 0.
In a second aspect, a vulnerability similarity comparison system based on a text pre-training model is provided, and the system includes:
the acquisition module is used for acquiring a vulnerability text data set of a vulnerability scanning product;
the preprocessing module is used for preprocessing the vulnerability text data set to obtain a target vulnerability text;
the vectorization module is used for vectorizing the target vulnerability text based on a pre-trained Sentence-BERT model to obtain a vulnerability text vector, wherein the vulnerability text vector represents the semantic information of a sentence in a vector space;
the extraction module is used for performing text word segmentation and subject-word lexicon filtering on the target vulnerability text and extracting subject words of the target vulnerability text;
the processing module is used for processing the target vulnerability text based on vulnerability keyword regular-expression matching and an HMCN model to obtain the vulnerability type of the target vulnerability text;
and the calculation module is used for performing vulnerability similarity calculation separately on the obtained vulnerability text vectors, subject words, and vulnerability types, and performing weighted summation on the resulting similarity calculation results to obtain a vulnerability similarity comparison result.
Optionally, the preprocessing module is specifically used for:
filtering out of the vulnerability text data set those texts whose descriptions are too short and/or too long.
Optionally, the vectorization module is specifically used for:
generating semantically meaningful sentence embedding vectors by using a Siamese (twin) network model and a triplet network model.
According to the technical scheme provided by the embodiments of the application, a vulnerability text data set of a vulnerability scanning product is first acquired and preprocessed to obtain a target vulnerability text; the target vulnerability text is vectorized based on a pre-trained Sentence-BERT model to obtain a vulnerability text vector; text word segmentation and subject-word lexicon filtering are performed on the target vulnerability text to extract its subject words; the target vulnerability text is then processed based on vulnerability keyword regular-expression matching and the HMCN model to obtain its vulnerability type; finally, similarity is calculated separately for the vulnerability text vectors, subject words, and vulnerability types, and the results are combined by weighted summation to obtain the vulnerability similarity comparison result. The beneficial effects of the present invention include at least the following:
(1) Similarity is compared along multiple dimensions, so calculation accuracy is high;
(2) Extensive rule matching is not needed, so calculation efficiency is high;
(3) The model can be reused once trained, so maintenance labor cost is low;
(4) Similarity calculation is flexible, and the false alarm rate is low;
(5) The degree of encapsulation is high, and little specialist expertise is required.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It should be apparent that the drawings in the following description are merely exemplary, and that other embodiments can be derived from the drawings provided by those of ordinary skill in the art without inventive effort.
Fig. 1 is a flowchart of a vulnerability similarity comparison method based on a text pre-training model according to an embodiment of the present application;
Fig. 2 is a flowchart of vulnerability text subject word extraction provided in the embodiment of the present application;
Fig. 3 is a flowchart of vulnerability text type identification provided in an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more clearly understood, the present application is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of and not restrictive on the broad application.
In the description of the present invention, the meaning of "a plurality" is two or more unless otherwise specified. The terms "first," "second," "third," "fourth," and the like in the description and claims of the present invention and in the above-described drawings are intended to distinguish between the referenced items. For a scheme with a time sequence flow, the expression of the terms is not necessarily understood to describe a specific sequence or order, and for a scheme with a device structure, the expression of the terms does not have distinction of importance degree, position relation and the like.
Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements specifically listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus or added steps as further optimized based on the inventive concept.
The application provides a multi-dimensional vulnerability similarity comparison technology based on a text pre-training model. The technology judges whether two vulnerability texts describe the same vulnerability along three dimensions: text similarity, subject words, and vulnerability type. First, a Sentence-BERT text pre-training model is applied to vectorize the vulnerability text and obtain the semantic information of the sentence in a vector space. Next, the subject word list of the vulnerability description is obtained through text word segmentation and subject-word lexicon filtering. Then, the specific vulnerability type of the vulnerability description is obtained through an HMCN (Hierarchical Multi-Label Classification Networks) model. Finally, the data of the three dimensions are combined by weighted summation to calculate the similarity between the vulnerability texts. Specifically, please refer to Fig. 1, which shows a flowchart of a vulnerability similarity comparison method based on a text pre-training model according to an embodiment of the present application. The method may include the following steps:
S1, acquiring a vulnerability text data set of a vulnerability scanning product.
In this embodiment, vulnerability text data sets obtained by different vulnerability scanning products can be integrated.
S2, preprocessing the vulnerability text data set to obtain a target vulnerability text.
In this embodiment, a series of data preprocessing operations is performed on the vulnerability text data set, including filtering out texts whose descriptions are too short or too long and converting English to lowercase.
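The preprocessing step can be illustrated with a minimal Python sketch. The length bounds `MIN_CHARS` and `MAX_CHARS` and the function name `preprocess` are assumptions made for illustration; the embodiment does not state concrete thresholds.

```python
# Hypothetical length bounds; the source does not give concrete thresholds.
MIN_CHARS = 10
MAX_CHARS = 2000


def preprocess(texts, min_chars=MIN_CHARS, max_chars=MAX_CHARS):
    """Drop descriptions that are too short or too long, then lowercase the rest."""
    cleaned = []
    for text in texts:
        text = text.strip()
        if not (min_chars <= len(text) <= max_chars):
            continue  # description too short or too long
        # Chinese characters have no case, so lower() leaves them unchanged.
        cleaned.append(text.lower())
    return cleaned
```

In practice the bounds would be tuned to the length distribution of the collected vulnerability descriptions.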
S3, vectorizing the target vulnerability text based on a pre-trained Sentence-BERT model to obtain a vulnerability text vector.
The vulnerability text vector represents the semantic information of the sentence in a vector space.
In this embodiment, the vulnerability text is input into a Sentence-BERT (SBERT) pre-trained model to obtain a sentence vector. The model is based on pre-trained BERT and uses Siamese and triplet network structures to generate semantically meaningful sentence embedding vectors. Because Chinese vulnerability texts contain both Chinese and English (assets are mostly described in English), the multilingual SBERT pre-trained model paraphrase-multilingual-MiniLM-L12-v2 is selected as the base model in this embodiment, and it is fine-tuned on a labeled data set to obtain the final SBERT model. The model can map any text to a sentence vector of a specified dimensionality, and the vector carries rich semantic information.
S4, performing text word segmentation and subject-word lexicon filtering on the target vulnerability text, and extracting subject words of the target vulnerability text.
In this embodiment, the subject words (English) of the vulnerability text are extracted through a word segmentation technology and an asset lexicon; the extraction flow is shown in Fig. 2. Because most subject words are English, the method extracts all English parts of the vulnerability text, segments them into words, compares the words with an English subject-word lexicon, and retains only the meaningful words as the subject words of the vulnerability text.
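The extraction flow described above can be sketched as follows. The lexicon contents, the tokenization regex, and the name `extract_subject_words` are assumptions for illustration only; a real lexicon would be curated by hand as the embodiment describes.

```python
import re

# Hypothetical subject-word (asset) lexicon, for illustration only.
SUBJECT_LEXICON = {"apache", "tomcat", "openssl", "nginx", "mysql"}


def extract_subject_words(text, lexicon=SUBJECT_LEXICON):
    """Return lexicon words found in the English part of `text`, in order of first appearance."""
    # Treat runs of ASCII letters/digits that start with a letter as English tokens.
    tokens = re.findall(r"[a-z][a-z0-9]*", text.lower())
    subject_words = []
    for token in tokens:
        if token in lexicon and token not in subject_words:
            subject_words.append(token)
    return subject_words
```

Keeping the words in order of first appearance preserves the positional information that the subject-word similarity step relies on.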
S5, processing the target vulnerability text based on vulnerability keyword regular-expression matching and the HMCN model to obtain the vulnerability type of the target vulnerability text.
In this embodiment, the vulnerability type of the vulnerability text is predicted through vulnerability keyword regular-expression matching and an HMCN (Hierarchical Multi-Label Classification Networks) model. Fig. 3 shows the overall flow of the vulnerability type identification scheme. Keyword regular-expression matching, which is cheap and performs well, is tried first. The vulnerability description or vulnerability name usually contains common keywords of the vulnerability directly, so the vulnerability type can be identified quickly by regular-expression matching. If the text does not contain such keywords, the judgment relies on the CWE number of the vulnerability text: the CWE number indicates a detailed vulnerability type, so the vulnerability type can be determined directly from the correspondence between CWE numbers and vulnerability types. If the vulnerability type cannot be determined by these rule-based means, a classification model is used to predict the CWE number corresponding to the text, and the vulnerability type is then identified from the predicted CWE number.
S6, performing vulnerability similarity calculation separately on the obtained vulnerability text vector, subject words, and vulnerability type, and performing weighted summation on the resulting similarity calculation results to obtain a vulnerability similarity comparison result.
In this embodiment, vulnerability text similarity is calculated from the <vulnerability text vector, subject words, vulnerability type> triple obtained through the above process. For any pair of vulnerability texts, after vulnerability text similarity calculation, subject word similarity calculation, and vulnerability type identification are carried out as described above, scores for three dimensions are obtained (namely a first, a second, and a third vulnerability similarity calculation result). The three scores are then combined by weighting to obtain the final similarity score of the pair of vulnerability texts. The vulnerability text similarity is the cosine similarity between the vulnerability text vectors:
$$\mathrm{TextSimilarity}(x, y) = \cos(x, y) = \frac{x \cdot y}{\lVert x \rVert \, \lVert y \rVert}$$
where x and y are the text vectors of the pair of vulnerability texts whose similarity is to be obtained.
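As a minimal sketch of the first dimension, cosine similarity between two text vectors can be computed with the standard library alone. The function name `cosine_similarity` and the choice to return 0.0 for a zero-norm vector are assumptions, not part of the source.

```python
import math


def cosine_similarity(x, y):
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        return 0.0  # assumption: treat an all-zero vector as dissimilar to everything
    return dot / (norm_x * norm_y)
```

With real sentence embeddings the inputs would be the vectors produced by the fine-tuned SBERT model rather than hand-written lists.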
For the subject word similarity, the present application first acquires, for each vulnerability text, a subject word list WL (Word List) and a position weight list PL (Position List), where PL(i) = len(WL) - i - 1. It then takes the intersection parts SWL (Same Word List) and SPL (Same Position List) of the WL and PL of the two vulnerability texts and calculates the similarity with the following formula, where n = len(SWL). The formula borrows from the cosine similarity formula and measures the similarity between subject word lists in terms of both text overlap and position overlap.
[formula image not reproduced in this text]
In the formula, A represents the target vulnerability text, B represents the comparison vulnerability text, SPL_A(i) represents the position weight of subject word i in the target vulnerability text, SPL_B(i) represents the position weight of subject word i in the comparison vulnerability text, and n represents the length of the intersected subject word list.
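The inputs to the subject word similarity can be sketched in Python as below. Because the similarity formula itself is not reproduced in this text, the sketch stops at building PL, SWL, and the two SPL lists and does not guess the final expression. The function names are hypothetical, and the sketch assumes each subject word list contains no duplicate words.

```python
def position_weights(word_list):
    """PL(i) = len(WL) - i - 1, so earlier subject words get larger weights."""
    n = len(word_list)
    return [n - i - 1 for i in range(n)]


def shared_words_and_weights(wl_a, wl_b):
    """Return SWL and the position weights of each shared word in A and in B."""
    # Assumes no duplicate words within a list.
    pl_a = dict(zip(wl_a, position_weights(wl_a)))
    pl_b = dict(zip(wl_b, position_weights(wl_b)))
    swl = [word for word in wl_a if word in pl_b]  # intersection, in A's order
    spl_a = [pl_a[word] for word in swl]
    spl_b = [pl_b[word] for word in swl]
    return swl, spl_a, spl_b
```

For example, with `wl_a = ["apache", "tomcat", "ajp"]` and `wl_b = ["tomcat", "ajp"]`, the shared list is `["tomcat", "ajp"]` and both weight lists are `[1, 0]`.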
For the vulnerability type similarity, an AND-style comparison is used directly: the score is 1 if the two vulnerability texts have the same type and 0 otherwise. After the scores of the three dimensions are obtained, the final similarity score is calculated as Score(X, Y) = 0.6 × TextSimilarity(X, Y) + 0.2 × EntitySimilarity(X, Y) + 0.2 × (VulType(X) & VulType(Y)), where the weight of each dimension can be adjusted according to actual conditions.
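The weighted combination can be written as a short Python sketch. The function name and signature are assumptions; the default weights (0.6, 0.2, 0.2) are the ones given in the formula above.

```python
def vulnerability_similarity(text_sim, subject_sim, type_a, type_b,
                             weights=(0.6, 0.2, 0.2)):
    """Weighted sum of text, subject word, and type similarity scores."""
    w_text, w_subject, w_type = weights
    # Type similarity is binary: 1 when both texts share a type, else 0.
    type_sim = 1.0 if type_a == type_b else 0.0
    return w_text * text_sim + w_subject * subject_sim + w_type * type_sim
```

With a text similarity of 0.9, a subject word similarity of 0.5, and matching types, the score is 0.54 + 0.10 + 0.20 = 0.84; with differing types it drops to 0.64.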
The embodiment of the application further provides a vulnerability similarity comparison system based on the text pre-training model. The system comprises:
the acquisition module is used for acquiring a vulnerability text data set of a vulnerability scanning product;
the preprocessing module is used for preprocessing the vulnerability text data set to obtain a target vulnerability text;
the vectorization module is used for vectorizing the target vulnerability text based on a pre-trained Sentence-BERT model to obtain a vulnerability text vector, wherein the vulnerability text vector represents the semantic information of a sentence in a vector space;
the extraction module is used for performing text word segmentation and subject-word lexicon filtering on the target vulnerability text and extracting subject words of the target vulnerability text;
the processing module is used for processing the target vulnerability text based on vulnerability keyword regular-expression matching and the HMCN model to obtain the vulnerability type of the target vulnerability text;
and the calculation module is used for performing vulnerability similarity calculation separately on the acquired vulnerability text vector, subject words, and vulnerability type, and performing weighted summation on the resulting similarity calculation results to obtain a vulnerability similarity comparison result.
In an optional embodiment of the present application, the preprocessing module is specifically used for filtering out of the vulnerability text data set those texts whose descriptions are too short and/or too long.
In an optional embodiment of the present application, the vectorization module is specifically used for generating semantically meaningful sentence embedding vectors by using a Siamese (twin) network model and a triplet network model.
The vulnerability similarity comparison system based on the text pre-training model provided by the embodiment of the application is used for realizing the vulnerability similarity comparison method based on the text pre-training model, and for the specific limitation of the vulnerability similarity comparison system based on the text pre-training model, reference can be made to the limitation of the vulnerability similarity comparison method based on the text pre-training model, and the details are not repeated here. All parts of the vulnerability similarity comparison system based on the text pre-training model can be wholly or partially realized through software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the device, and can also be stored in a memory in the device in a software form, so that the processor can call and execute operations corresponding to the modules.
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several implementation modes of the present application, and the description thereof is specific and detailed, but not construed as limiting the scope of the claims. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.
Claims (10)
1. A vulnerability similarity comparison method based on a text pre-training model is characterized by comprising the following steps:
acquiring a vulnerability text data set of a vulnerability scanning product;
preprocessing a vulnerability text data set to obtain a target vulnerability text;
vectorizing the target vulnerability text based on a pre-trained Sentence-BERT model to obtain a vulnerability text vector, wherein the vulnerability text vector represents the semantic information of a sentence in a vector space;
performing text word segmentation and subject-word lexicon filtering on the target vulnerability text, and extracting subject words of the target vulnerability text;
processing the target vulnerability text based on vulnerability keyword regular-expression matching and an HMCN model to obtain a vulnerability type of the target vulnerability text;
and performing vulnerability similarity calculation separately on the obtained vulnerability text vector, subject words, and vulnerability type, and performing weighted summation on the resulting similarity calculation results to obtain a vulnerability similarity comparison result.
2. The method of claim 1, wherein preprocessing the vulnerability text data set comprises:
filtering out of the vulnerability text data set those texts whose descriptions are too short and/or too long, and converting English to lowercase.
3. The method of claim 1, wherein vectorizing the target vulnerability text based on a pre-trained Sentence-BERT model to obtain a vulnerability text vector comprises:
generating semantically meaningful sentence embedding vectors by using a Siamese (twin) network model and a triplet network model.
4. The method according to claim 1, wherein performing text word segmentation and subject-word lexicon filtering on the target vulnerability text to extract subject words of the target vulnerability text comprises:
extracting the English part of the vulnerability text, performing word segmentation on it, comparing the resulting words with an English subject-word lexicon, and taking the matched words that appear in a preset word list as the subject words of the vulnerability text; wherein the preset word list is a manually curated list of words of interest.
5. The method according to claim 1, wherein the vulnerability similarity calculation of the obtained vulnerability text vectors comprises:
and calculating to obtain a first vulnerability similarity calculation result based on cosine similarity among vulnerability text vectors.
6. The method of claim 1, wherein performing vulnerability similarity calculation on the obtained subject words comprises:
acquiring a subject word list and a position weight list for each vulnerability text;
acquiring the intersection parts of the subject word lists and of the position weight lists of the two vulnerability texts;
and obtaining a second vulnerability similarity calculation result according to the formula
[formula image not reproduced in this text]
wherein A represents the target vulnerability text, B represents the comparison vulnerability text, SPL_A(i) represents the position weight of subject word i in the target vulnerability text, SPL_B(i) represents the position weight of subject word i in the comparison vulnerability text, and n represents the length of the intersected subject word list.
7. The method of claim 1, wherein performing vulnerability similarity calculation on the obtained vulnerability types comprises:
when the types of the vulnerability text pairs are the same, assigning the third vulnerability similarity calculation result to be 1;
and when the types of the vulnerability text pairs are different, assigning the third vulnerability similarity calculation result to be 0.
8. A vulnerability similarity comparison system based on a text pre-training model is characterized by comprising:
the acquisition module is used for acquiring a vulnerability text data set of a vulnerability scanning product;
the preprocessing module is used for preprocessing the vulnerability text data set to obtain a target vulnerability text;
the vectorization module is used for vectorizing the target vulnerability text based on a pre-trained Sentence-BERT model to obtain a vulnerability text vector; the vulnerability text vector is used for representing semantic information of a sentence in a vector space;
the extraction module is used for performing text word segmentation and main word bank filtering on the target vulnerability text and extracting main words of the target vulnerability text;
the processing module is used for processing the target vulnerability text based on vulnerability keyword regular matching and an HMCN model to obtain the vulnerability type of the target vulnerability text;
and the calculation module is used for respectively calculating the vulnerability similarity of the obtained vulnerability text vectors, the subject words and the vulnerability types, and weighting and summing the calculation results of the obtained vulnerability similarities to obtain a vulnerability similarity comparison result.
9. The system of claim 8, wherein the preprocessing module specifically comprises:
filtering and describing the vulnerability text data set.
10. The system of claim 9, wherein the vectorization module specifically comprises:
generating a semantically meaningful sentence embedding vector by using a Siamese (twin) network model and a triplet network model.
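As an illustration only, the calculations recited in claims 4, 5, 7 and 8 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the applicant's implementation: the function names, the regular-expression tokenizer, and the example weights are all hypothetical, and the position-weight formula of claim 6 is not reproduced in the text above, so the subject-word similarity is simply passed in as a precomputed number.

```python
import math
import re


def cosine_similarity(u, v):
    """First similarity (claim 5): cosine similarity of two text vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: an all-zero vector is treated as dissimilar
    return dot / (norm_u * norm_v)


def extract_subject_words(text, lexicon):
    """Claim 4, simplified: tokenize the English part of the text and keep
    the words found in a word list (here a plain Python set, in first-seen order)."""
    tokens = re.findall(r"[A-Za-z]+", text.lower())  # hypothetical tokenizer
    subject_words = []
    for token in tokens:
        if token in lexicon and token not in subject_words:
            subject_words.append(token)
    return subject_words


def type_similarity(type_a, type_b):
    """Third similarity (claim 7): 1 if the vulnerability types match, else 0."""
    return 1.0 if type_a == type_b else 0.0


def combined_similarity(sim_vector, sim_subject, sim_type,
                        weights=(0.5, 0.3, 0.2)):
    """Weighted sum of the three similarity results (claim 8).
    The default weights are illustrative; the claims do not specify values."""
    w_vector, w_subject, w_type = weights
    return w_vector * sim_vector + w_subject * sim_subject + w_type * sim_type
```

For example, `combined_similarity(cosine_similarity(vec_a, vec_b), subject_score, type_similarity(type_a, type_b))` would yield the final comparison result for one vulnerability text pair, where `vec_a`, `vec_b`, `subject_score`, `type_a` and `type_b` are placeholders for values produced by the earlier steps.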
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211182151.4A CN115563619B (en) | 2022-09-27 | 2022-09-27 | Vulnerability similarity comparison method and system based on text pre-training model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211182151.4A CN115563619B (en) | 2022-09-27 | 2022-09-27 | Vulnerability similarity comparison method and system based on text pre-training model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115563619A (en) | 2023-01-03 |
CN115563619B (en) | 2024-06-18 |
Family
ID=84743190
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211182151.4A Active CN115563619B (en) | 2022-09-27 | 2022-09-27 | Vulnerability similarity comparison method and system based on text pre-training model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115563619B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116561764A (en) * | 2023-05-11 | 2023-08-08 | 上海麓霏信息技术服务有限公司 | Computer information data interaction processing system and method |
CN116662576A (en) * | 2023-07-26 | 2023-08-29 | 北京天云海数技术有限公司 | Association method and association system for security vulnerabilities and laws and regulations |
CN116663537A (en) * | 2023-07-26 | 2023-08-29 | 中信联合云科技有限责任公司 | Big data analysis-based method and system for processing selected question planning information |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105159822A (en) * | 2015-08-12 | 2015-12-16 | 南京航空航天大学 | Software defect positioning method based on text part of speech and program call relation |
CN110008699A (en) * | 2019-03-19 | 2019-07-12 | 南瑞集团有限公司 | A kind of software vulnerability detection method neural network based and device |
CN112035846A (en) * | 2020-09-07 | 2020-12-04 | 江苏开博科技有限公司 | Unknown vulnerability risk assessment method based on text analysis |
CN112528294A (en) * | 2020-12-21 | 2021-03-19 | 网神信息技术(北京)股份有限公司 | Vulnerability matching method and device, computer equipment and readable storage medium |
CN112560043A (en) * | 2020-12-02 | 2021-03-26 | 江西环境工程职业学院 | Vulnerability similarity measurement method based on context semantics |
CN113343248A (en) * | 2021-07-19 | 2021-09-03 | 北京有竹居网络技术有限公司 | Vulnerability identification method, device, equipment and storage medium |
CN113656807A (en) * | 2021-08-23 | 2021-11-16 | 杭州安恒信息技术股份有限公司 | Vulnerability management method, device, equipment and storage medium |
WO2022023671A1 (en) * | 2020-07-31 | 2022-02-03 | Institut National De Recherche En Informatique Et En Automatique (Inria) | Computer-implemented method for testing the cybersecurity of a target environment |
CN114329482A (en) * | 2021-12-20 | 2022-04-12 | 扬州大学 | C/C++ vulnerability based on sequencing and inter-patch link recovery system and method thereof |
US20220215100A1 (en) * | 2021-01-07 | 2022-07-07 | Servicenow, Inc. | Systems and methods for predicting cybersecurity vulnerabilities |
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105159822A (en) * | 2015-08-12 | 2015-12-16 | 南京航空航天大学 | Software defect positioning method based on text part of speech and program call relation |
CN110008699A (en) * | 2019-03-19 | 2019-07-12 | 南瑞集团有限公司 | A kind of software vulnerability detection method neural network based and device |
WO2022023671A1 (en) * | 2020-07-31 | 2022-02-03 | Institut National De Recherche En Informatique Et En Automatique (Inria) | Computer-implemented method for testing the cybersecurity of a target environment |
CN112035846A (en) * | 2020-09-07 | 2020-12-04 | 江苏开博科技有限公司 | Unknown vulnerability risk assessment method based on text analysis |
CN112560043A (en) * | 2020-12-02 | 2021-03-26 | 江西环境工程职业学院 | Vulnerability similarity measurement method based on context semantics |
CN112528294A (en) * | 2020-12-21 | 2021-03-19 | 网神信息技术(北京)股份有限公司 | Vulnerability matching method and device, computer equipment and readable storage medium |
US20220215100A1 (en) * | 2021-01-07 | 2022-07-07 | Servicenow, Inc. | Systems and methods for predicting cybersecurity vulnerabilities |
CN113343248A (en) * | 2021-07-19 | 2021-09-03 | 北京有竹居网络技术有限公司 | Vulnerability identification method, device, equipment and storage medium |
CN113656807A (en) * | 2021-08-23 | 2021-11-16 | 杭州安恒信息技术股份有限公司 | Vulnerability management method, device, equipment and storage medium |
CN114329482A (en) * | 2021-12-20 | 2022-04-12 | 扬州大学 | C/C++ vulnerability based on sequencing and inter-patch link recovery system and method thereof |
Non-Patent Citations (2)
Title |
---|
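张鹏;谢晓尧: "基于模糊熵特征选择算法的SVM在漏洞分类中的研究" [Research on SVM with a fuzzy-entropy feature selection algorithm for vulnerability classification], 计算机应用研究 [Application Research of Computers], 30 April 2014 (2014-04-30), pages 1145-1148 * |
陈钧衍;陶非凡;张源: "基于序列标注的漏洞信息结构化抽取方法" [Structured extraction of vulnerability information based on sequence labeling], 计算机应用与软件 [Computer Applications and Software], no. 02, pages 272-277 * |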
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116561764A (en) * | 2023-05-11 | 2023-08-08 | 上海麓霏信息技术服务有限公司 | Computer information data interaction processing system and method |
CN116662576A (en) * | 2023-07-26 | 2023-08-29 | 北京天云海数技术有限公司 | Association method and association system for security vulnerabilities and laws and regulations |
CN116663537A (en) * | 2023-07-26 | 2023-08-29 | 中信联合云科技有限责任公司 | Big data analysis-based method and system for processing selected question planning information |
CN116663537B (en) * | 2023-07-26 | 2023-11-03 | 中信联合云科技有限责任公司 | Big data analysis-based method and system for processing selected question planning information |
Also Published As
Publication number | Publication date |
---|---|
CN115563619B (en) | 2024-06-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107798136B (en) | Entity relation extraction method and device based on deep learning and server | |
CN109918673B (en) | Semantic arbitration method and device, electronic equipment and computer-readable storage medium | |
CN115563619B (en) | Vulnerability similarity comparison method and system based on text pre-training model | |
CN107679039B (en) | Method and device for determining statement intention | |
CN105426354B (en) | The fusion method and device of a kind of vector | |
CN109978060B (en) | Training method and device of natural language element extraction model | |
CN108399157B (en) | Dynamic extraction method of entity and attribute relationship, server and readable storage medium | |
CN111460820A (en) | Network space security domain named entity recognition method and device based on pre-training model BERT | |
CN111949802A (en) | Construction method, device and equipment of knowledge graph in medical field and storage medium | |
CN111866004A (en) | Security assessment method, apparatus, computer system, and medium | |
CN112183102A (en) | Named entity identification method based on attention mechanism and graph attention network | |
CN113807073B (en) | Text content anomaly detection method, device and storage medium | |
CN112528294A (en) | Vulnerability matching method and device, computer equipment and readable storage medium | |
CN114448664B (en) | Method and device for identifying phishing webpage, computer equipment and storage medium | |
CN114925702A (en) | Text similarity recognition method and device, electronic equipment and storage medium | |
CN109635810B (en) | Method, device and equipment for determining text information and storage medium | |
KR20200063067A (en) | Apparatus and method for validating self-propagated unethical text | |
CN112307364B (en) | Character representation-oriented news text place extraction method | |
CN110929647B (en) | Text detection method, device, equipment and storage medium | |
CN114118398A (en) | Method and system for detecting target type website, electronic equipment and storage medium | |
CN113836297B (en) | Training method and device for text emotion analysis model | |
CN113343699A (en) | Log security risk monitoring method and device, electronic equipment and medium | |
CN111061924A (en) | Phrase extraction method, device, equipment and storage medium | |
CN115186775B (en) | Method and device for detecting matching degree of image description characters and electronic equipment | |
CN111259237B (en) | Method for identifying public harmful information |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant |