CN115828270A - Vulnerability verification construction system and method based on NLP - Google Patents


Info

Publication number
CN115828270A
Authority
CN
China
Prior art keywords
vulnerability
data
nlp
information
verification
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310135996.6A
Other languages
Chinese (zh)
Other versions
CN115828270B (en)
Inventor
王骕
陈彬
王凯
鲁兆聪
陈瑞
于冬雨
苏林庭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Zhiyu Information Technology Co ltd
Original Assignee
Nanjing Zhiyu Information Technology Co ltd
Application filed by Nanjing Zhiyu Information Technology Co ltd
Priority to CN202310135996.6A
Publication of CN115828270A
Application granted
Publication of CN115828270B
Legal status: Active (current)
Anticipated expiration

Classifications

    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides an NLP-based vulnerability verification construction system and construction method. Data extracted from a plurality of vulnerability databases are fused, so that heterogeneous data from differently sourced vulnerability databases are integrated, described, processed, retrieved and updated under a single framework specification. Data, information, methods and experience are thereby fused into a high-quality fused vulnerability database, which helps users handle security vulnerabilities of related categories more quickly, supplements vulnerability information, and converts a text-form vulnerability database into code-form vulnerability verification strategies. The fused database helps users integrate security vulnerabilities of related categories quickly, accurately and comprehensively and offers diverse, wide-coverage vulnerability data; bulletins, descriptions and other information about security vulnerabilities (usually text) are further converted into strategies (usually in pseudo-code or instruction form) from which vulnerability verification tools can be built, providing high-value guidance for the hands-on stage of vulnerability verification.

Description

Vulnerability verification construction system and method based on NLP
Technical Field
The invention relates to the technical field of network security, in particular to a vulnerability verification construction system and a vulnerability verification construction method based on NLP.
Background
The management, processing and analysis of vulnerability databases have long been central topics in information security, and vulnerability verification strategies are closely tied to vulnerability databases. Security researchers may store verification strategies together with the corresponding vulnerability entries, frequently derive verification strategies and code from the text of vulnerability database entries, and, where conditions allow, build automatic vulnerability verification tools from them. As emerging technologies such as 5G and the Internet of Things continue to be deployed, new vulnerabilities keep appearing in information systems, and many countries have accordingly built their own vulnerability databases.
However, current research is limited to collecting data from a single vulnerability database, and there is no method for extracting and then fusing data from multiple vulnerability databases. This lowers the quality of the resulting vulnerability data and, in turn, affects user security to some extent.
Disclosure of Invention
The invention provides a vulnerability verification construction system and construction method based on NLP (natural language processing). It helps users integrate security vulnerabilities of related categories quickly, accurately and comprehensively, provides a multi-element vulnerability database with wide coverage, and converts bulletins and descriptions of security vulnerabilities (usually text) into strategies (usually in pseudo-code or instruction form) from which vulnerability verification tools can be built, providing high-value guidance for the hands-on stage of vulnerability verification. Furthermore, data extracted from a plurality of vulnerability databases are fused, so that heterogeneous data from differently sourced databases are integrated, described, processed, retrieved and updated under a single framework specification; data, information, methods and experience are fused into a high-quality fused vulnerability database; users can handle security vulnerabilities of related categories more quickly; vulnerability information is supplemented; and the text-form vulnerability database is converted into code-form vulnerability verification strategies.
The invention provides an NLP-based vulnerability verification construction system and construction method, the method comprising the following steps:
step 1: collecting vulnerability data from a plurality of different vulnerability platforms;
step 2: intelligently fusing the vulnerability data of the different platforms to obtain fused vulnerability data items;
step 3: designing a vulnerability verification strategy for the fused vulnerability data items and obtaining the corresponding vulnerability pseudo code.
Preferably, the method further comprises the following steps:
acquiring a vulnerability data list from the vulnerability databases of a plurality of different vulnerability platforms, and classifying and sorting the acquired vulnerability data by vulnerability name and vulnerability version number;
performing similarity measurement on the classified and sorted vulnerability names and version numbers, eliminating similar or identical vulnerability data, and thereby fusing the heterogeneous data of the plurality of platforms.
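The classify, sort and eliminate steps above can be pictured with the following minimal Python sketch. It assumes each entry carries a source, a name and a version string; the normalization (lowercasing and trimming names, reading versions as integer tuples) is an illustrative choice rather than the patent's method, and only exact duplicates are removed here, leaving near-duplicates to the similarity scoring.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    source: str   # platform the entry came from (hypothetical field)
    name: str     # vulnerability name as published
    version: str  # affected version string, e.g. "2.4.49"


def sort_key(entry: Entry) -> tuple:
    """Classify by normalized name, then order by numeric version."""
    # Non-numeric version components (e.g. "1k") are skipped in this sketch.
    parts = tuple(int(p) for p in entry.version.split(".") if p.isdigit())
    return (entry.name.strip().lower(), parts)


def classify_sort_dedupe(entries: list[Entry]) -> list[Entry]:
    """Sort entries and keep only the first entry for each normalized key."""
    kept: list[Entry] = []
    seen: set = set()
    for entry in sorted(entries, key=sort_key):  # stable sort keeps input order on ties
        key = sort_key(entry)
        if key not in seen:
            seen.add(key)
            kept.append(entry)
    return kept


entries = [
    Entry("NVD", "Apache HTTP Server", "2.4.49"),
    Entry("Other", "apache http server ", "2.4.49"),
    Entry("NVD", "OpenSSL", "1.1.1"),
]
print([(e.source, e.name) for e in classify_sort_dedupe(entries)])
# [('NVD', 'Apache HTTP Server'), ('NVD', 'OpenSSL')]
```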
Preferably, step 1 further comprises: collecting vulnerability data from a plurality of vulnerability platforms to obtain a vulnerability data list, the list comprising software vulnerability samples, repair suggestions, vulnerability descriptions, solutions, affected objects and threat intelligence.
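One way to hold an entry of the vulnerability data list is a Python dataclass with one field per item listed above. The field names, the placeholder CVE number and the `missing_fields` helper are hypothetical; the helper only illustrates how empty fields could be flagged as candidates for supplementing from other platforms.

```python
from __future__ import annotations

from dataclasses import asdict, dataclass, field


@dataclass
class VulnRecord:
    """One collected entry; field names are hypothetical, not from the patent."""
    cve_id: str
    name: str
    version: str
    source: str                    # platform the entry was collected from
    sample: str = ""               # software vulnerability sample
    repair_suggestion: str = ""
    description: str = ""
    solution: str = ""
    affected_objects: list[str] = field(default_factory=list)
    threat_intel: list[str] = field(default_factory=list)

    def missing_fields(self) -> list[str]:
        """Names of empty fields, i.e. candidates to supplement from other platforms."""
        return [k for k, v in asdict(self).items() if v in ("", [])]


record = VulnRecord("CVE-0000-0001", "Example Server", "1.0", "NVD",
                    description="Path traversal in the file handler")
print(record.missing_fields())
# ['sample', 'repair_suggestion', 'solution', 'affected_objects', 'threat_intel']
```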
Preferably, the step 2 further comprises:
extracting features from the vulnerability data of the different vulnerability platforms and performing a discretization mapping of the extracted features; performing string matching on the discretized information to measure the similarity of the vulnerability data and obtain similarity scores; and completing multi-platform heterogeneous data fusion to obtain the fused vulnerability data.
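To make the discretization mapping concrete, the sketch below maps raw name and version strings onto normalized forms that can then be matched as strings or compared as numbers. The specific rules (lowercasing, keeping alphanumeric tokens, reading each run of digits as one version component) are assumptions for illustration; the patent does not define the mapping.

```python
from __future__ import annotations

import re


def discretize_name(name: str) -> str:
    """Lowercase and keep only alphanumeric tokens, joined by single spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", name.lower()))


def discretize_version(version: str) -> tuple[int, ...]:
    """Map a version string to integers; every run of digits becomes one component."""
    return tuple(int(n) for n in re.findall(r"\d+", version))


def discretize(entry: dict) -> dict:
    """Map one raw entry's extracted features onto discrete, comparable values."""
    return {
        "cve": entry["cve"].strip().upper(),
        "name": discretize_name(entry["name"]),
        "version": discretize_version(entry["version"]),
    }


raw = {"cve": "cve-0000-0001", "name": "Apache-HTTP  Server", "version": "v2.4.49"}
print(discretize(raw))
# {'cve': 'CVE-0000-0001', 'name': 'apache http server', 'version': (2, 4, 49)}
```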
Preferably, step 3 further comprises: acquiring function call information and effect characteristic information from the fused vulnerability data items through NLP, aggregating the two, and designing the vulnerability pseudo code corresponding to the vulnerability verification strategy.
Preferably, in step 1, vulnerability data acquisition comprises: issuing asynchronous requests for vulnerability information to multiple platforms based on Scapy information acquisition, then downloading the vulnerability data in a distributed manner, with acquisition rules customized to the specific acquisition features of each heterogeneous vulnerability platform, so as to improve acquisition efficiency and support incremental updating of the vulnerability data.
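The acquisition step can be sketched with Python's standard `asyncio` module: each platform has its own acquisition rule, all platforms are queried concurrently, and entries already collected are skipped so repeated runs only add new items. This is an assumption-laden illustration: it does not use Scapy or any crawling framework, the platform names and fetch functions are hypothetical, and distributed downloading across machines is not shown.

```python
from __future__ import annotations

import asyncio


# Hypothetical acquisition rules, one per platform. Real rules would issue
# HTTP requests and parse each platform's own format; these return canned
# data so the sketch runs offline.
async def fetch_nvd() -> list[dict]:
    await asyncio.sleep(0)  # stands in for network I/O
    return [{"cve": "CVE-0000-0001", "name": "Example Server"}]


async def fetch_other() -> list[dict]:
    await asyncio.sleep(0)
    return [{"cve": "CVE-0000-0001", "name": "Example Server httpd"},
            {"cve": "CVE-0000-0002", "name": "Example Library"}]


RULES = {"NVD": fetch_nvd, "OTHER": fetch_other}


async def collect(seen: set) -> list[dict]:
    """Query all platforms concurrently and return only entries not seen before."""
    results = await asyncio.gather(*(rule() for rule in RULES.values()))
    new = []
    for platform, entries in zip(RULES, results):  # gather preserves input order
        for entry in entries:
            key = (platform, entry["cve"])
            if key not in seen:  # incremental update: skip already-collected items
                seen.add(key)
                new.append({**entry, "source": platform})
    return new


seen: set = set()
print(len(asyncio.run(collect(seen))))  # first run collects all 3 entries
print(len(asyncio.run(collect(seen))))  # second run finds nothing new: 0
```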
Preferably, the step 2 further comprises:
the similarity measure is: measuring information similarity between the NVD database and other vulnerability databases from different dimensions;
the similarity measurement comprises: extracting, by CVE number, the (name, version number) pairs of each CVE-numbered vulnerability from the NVD vulnerability library and from the other vulnerability libraries, and cross-comparing them with the NVD database as the reference; the vulnerability names are compared character by character to measure name similarity, and the discretized version numbers are compared numerically to measure version-number similarity;
the heterogeneous data is fused as follows: according to the similarity relation of the heterogeneous vulnerability data, realizing the intelligent fusion process of the vulnerability data;
the heterogeneous data fusion comprises: to address redundancy among vulnerability data from heterogeneous databases, combining the scores obtained for the two fields (the vulnerability name and the corresponding version similarity measurements) into an overall similarity score, judging whether the vulnerability data are similar overall, and accordingly deleting or retaining the vulnerability data from the other vulnerability databases; redundancy judgment and data fusion across the heterogeneous databases are completed in this way.
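The combination of the two field scores and the delete-or-retain decision might look like the Python sketch below. The weighted sum, the weights 0.6 and 0.4, and the threshold 0.9 are hypothetical values chosen for illustration; the patent does not specify how the two scores are combined. The per-field scores are assumed to have been computed already by the name and version similarity measures.

```python
from __future__ import annotations


def overall_score(name_score: float, version_score: float,
                  w_name: float = 0.6, w_version: float = 0.4) -> float:
    """Weighted combination of the two field scores (weights are assumptions)."""
    return w_name * name_score + w_version * version_score


def fuse(reference: dict, candidates: list[dict], threshold: float = 0.9) -> list[dict]:
    """Keep the NVD reference; delete candidates judged redundant, retain the rest."""
    fused = [reference]
    for cand in candidates:
        score = overall_score(cand["name_score"], cand["version_score"])
        if score < threshold:  # not similar overall: retain as distinct data
            fused.append(cand)
        # otherwise the candidate is redundant with the reference and is dropped
    return fused


nvd = {"source": "NVD", "cve": "CVE-0000-0001"}
others = [
    {"source": "A", "cve": "CVE-0000-0001", "name_score": 1.0, "version_score": 1.0},
    {"source": "B", "cve": "CVE-0000-0001", "name_score": 0.5, "version_score": 1.0},
]
print([e["source"] for e in fuse(nvd, others)])
# ['NVD', 'B']
```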
Preferably, the step 3 comprises the following steps:
analyzing, based on NLP, the call relations among function names, file names and component names, and storing the results as function call information; the function call information is used for subsequent cross-function and cross-file vulnerability verification strategy generation;
obtaining effect characteristic information based on NLP technology, the effect characteristic information comprising the following parameters in the vulnerability database: the effect mechanism, the effect impact and the effect environment;
the effect characteristic information is plain-text information other than function names and URLs; the key elements required by the vulnerability verification strategy are formed through NLP-based semantic search;
aggregating the function call information and the effect characteristic information obtained through NLP to form the vulnerability verification strategy, and converting the strategy into pseudo-code form.
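The final aggregation step can be sketched as a function that merges call relations and effect features into a pseudo-code strategy. The REQUIRE/INVOKE/TRIGGER/CHECK layout, the function names and the placeholder CVE number are all invented for this illustration; the patent does not define a pseudo-code syntax.

```python
from __future__ import annotations


def to_pseudocode(cve: str, calls: list[tuple[str, str]], effects: dict[str, str]) -> str:
    """Aggregate call relations and effect features into a pseudo-code strategy.

    The keyword layout (REQUIRE/INVOKE/TRIGGER/CHECK) is an invented format.
    """
    lines = [f"STRATEGY verify_{cve.replace('-', '_')}:"]
    if effects.get("environment"):
        lines.append(f"  REQUIRE environment: {effects['environment']}")
    for caller, callee in calls:  # reproduce the call path leading to the flaw
        lines.append(f"  INVOKE {caller} -> {callee}")
    if effects.get("mechanism"):
        lines.append(f"  TRIGGER {effects['mechanism']}")
    if effects.get("impact"):
        lines.append(f"  CHECK observable impact: {effects['impact']}")
    return "\n".join(lines)


print(to_pseudocode(
    "CVE-0000-0001",
    [("handle_request", "normalize_path")],  # hypothetical function names
    {"environment": "Example Server 1.0",
     "mechanism": "encoded '../' sequences in the request path",
     "impact": "contents of a file outside the web root are returned"},
))
```

Ordering the output as environment, call path, trigger, check mirrors the idea that a strategy first sets up the affected environment, then drives execution to the vulnerable code, then confirms the effect.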
The invention also provides a vulnerability verification construction system based on NLP, which comprises:
the data layer is used for storing initial vulnerability data and fused vulnerability data;
the processing layer is used for completing key element extraction and NLP processing of the vulnerability text and converting text data into a vulnerability verification strategy;
the display layer is used for providing the data statistics, data retrieval and data customization that the user requests through interactive instructions, guiding the user to start acquisition tasks, and determining, for specified entries in the new vulnerability database generated by fusion, whether their text information is converted into a vulnerability verification strategy; it is thereby used for generating vulnerability verification pseudo code from multi-element vulnerability information, and for managing, retrieving and sharing vulnerability data using the generated pseudo code.
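The three-layer split can be outlined with three small Python classes. This is a structural sketch only: the class and method names are hypothetical, the processing layer's NLP step is a stub, and starting acquisition tasks is omitted. It shows the display layer serving statistics and retrieval while leaving conversion to the processing layer on a per-entry user request.

```python
from __future__ import annotations

from typing import Optional


class DataLayer:
    """Stores the initial (raw) and fused vulnerability data."""

    def __init__(self) -> None:
        self.raw: list[dict] = []
        self.fused: list[dict] = []


class ProcessingLayer:
    """Converts vulnerability text into a verification strategy (stubbed)."""

    def to_strategy(self, entry: dict) -> str:
        # Placeholder for key-element extraction and NLP processing.
        return f"VERIFY {entry['cve']}: {entry['description']}"


class DisplayLayer:
    """Serves user commands: statistics, retrieval and on-demand conversion."""

    def __init__(self, data: DataLayer, processing: ProcessingLayer) -> None:
        self.data = data
        self.processing = processing

    def stats(self) -> dict:
        return {"raw": len(self.data.raw), "fused": len(self.data.fused)}

    def search(self, keyword: str) -> list[dict]:
        return [e for e in self.data.fused
                if keyword.lower() in e["description"].lower()]

    def convert(self, cve: str) -> Optional[str]:
        """The user decides which fused entries are turned into strategies."""
        for entry in self.data.fused:
            if entry["cve"] == cve:
                return self.processing.to_strategy(entry)
        return None


data = DataLayer()
data.raw = [{"cve": "CVE-0000-0001"}] * 3
data.fused = [{"cve": "CVE-0000-0001", "description": "Path traversal in file handler"}]
ui = DisplayLayer(data, ProcessingLayer())
print(ui.stats(), ui.convert("CVE-0000-0001"))
```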
Preferably, the data layer stores the acquired vulnerability data through a Redis cache database and performs similarity comparison on the vulnerability data.
The working principle and the beneficial effects of the invention are as follows:
the invention provides an NLP-based vulnerability verification construction system and construction method, the method comprising: step 1: collecting vulnerability data from a plurality of different vulnerability platforms; step 2: intelligently fusing the vulnerability data of the different platforms to obtain fused vulnerability data items; and step 3: designing a vulnerability verification strategy for the fused vulnerability data items and obtaining the corresponding vulnerability pseudo code.
The NLP (natural language processing) based system and method help users integrate security vulnerabilities of related categories quickly, accurately and comprehensively, provide a multi-element vulnerability database with wide coverage, and convert bulletins, descriptions and other (usually textual) information about security vulnerabilities into strategies (usually in pseudo-code or instruction form) from which vulnerability verification tools can be built, providing high-value guidance for the hands-on stage of vulnerability verification. Furthermore, fusing data extracted from a plurality of vulnerability databases lets heterogeneous data from different sources be integrated, described, processed, retrieved and updated under a single framework specification, fusing data, information, methods and experience into a high-quality fused vulnerability database that helps users handle related security vulnerabilities more quickly, supplements vulnerability information, and converts the text-form vulnerability database into code-form vulnerability verification strategies.
Natural Language Processing (NLP) is an important direction in the fields of computer science and artificial intelligence. It studies various theories and methods that enable efficient communication between humans and computers using natural language. Natural language processing is not a general study of natural language but is directed to the development of computer systems, and particularly software systems therein, that can efficiently implement natural language communications.
A vulnerability database collects vulnerability data and related information, such as patching measures, from across the Internet through various channels, without being restricted to any one country, and publishes the collected information promptly so that people can learn about and resolve problems in their information systems as early as possible. Through its working mechanism of vulnerability collection, analysis, notification and application, a vulnerability database greatly improves the security industry's ability to respond to network security threats and its level of risk management.
Fusing multiple vulnerability databases forms a new database that is complete, standardized and complementary. Text information in the new database, such as vulnerability bulletins and descriptions, is then transformed into verification logic, making the fullest and most effective use of the vulnerability information and yielding vulnerability verification strategies that match real-world attack conditions.
The invention provides a vulnerability verification construction system and a construction method based on NLP, which particularly realize the following beneficial effects:
1. Data acquisition from multi-element vulnerability databases is completed. Acquisition rules are customized to the characteristics of each heterogeneous vulnerability platform, so that the redundant and heterogeneous data of vulnerability databases from different sources are integrated, described, processed, retrieved and updated under a single framework specification, fusing data, information, methods and experience into a high-quality fused vulnerability database.
2. A framework for fusing and transforming multiple vulnerability databases is designed, comprising a data layer, a processing layer and a display layer. The data layer stores the initial and fused vulnerability data; the processing layer extracts key elements from the vulnerability text, applies NLP processing, and converts the text data into vulnerability verification strategies; the display layer provides the functional modules needed for interactive instructions, data statistics, data retrieval, data customization and the like, guides the user to start acquisition tasks, and determines, for specified entries in the new fused vulnerability database, whether their text information is converted into a vulnerability verification strategy. The framework thus constructs vulnerability verification pseudo code from multi-element vulnerability information and supports managing, retrieving and sharing vulnerability data.
3. An NLP-based method for processing the text information of vulnerability databases is designed. Acquiring function call information supports subsequent cross-function and cross-file construction of vulnerability verification strategies. Acquiring effect characteristic information covers vulnerability information expressed as plain text rather than as function names, URLs (uniform resource locators) and the like: its semantics are searched through natural language processing to form the remaining key elements required by a vulnerability verification strategy.
4. With the vulnerability verification construction system and method in place, security researchers can collect, understand and convert the text of vulnerability bulletins far more promptly and conveniently; the system fills a gap in the core data available for vulnerability analysis and security research and brings substantial gains in efficiency and value to related work.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a schematic flow chart of the present invention.
Detailed description of the preferred embodiments
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.
As shown in fig. 1, an embodiment of the present invention provides a vulnerability verification construction system and a construction method based on NLP, including the following steps:
step 1: collecting vulnerability data from a plurality of different vulnerability platforms;
step 2: intelligently fusing the vulnerability data of the different platforms to obtain fused vulnerability data items;
step 3: designing a vulnerability verification strategy for the fused vulnerability data items and obtaining the corresponding vulnerability pseudo code.
The method extracts and fuses data from a plurality of vulnerability databases, so that heterogeneous data from differently sourced vulnerability databases are integrated, described, processed, retrieved and updated under a single framework specification. Data, information, methods and experience are fused into a high-quality fused vulnerability database that helps users handle security vulnerabilities of related categories more quickly and supplements vulnerability information, and the text-form vulnerability database is further converted into code-form vulnerability verification strategies. The method helps users integrate security vulnerabilities of related categories quickly, accurately and comprehensively, provides a multi-element vulnerability database with wide coverage, and converts bulletins, descriptions and other (usually textual) information about security vulnerabilities into strategies (usually in pseudo-code or instruction form) from which vulnerability verification tools can be built, providing high-value guidance for the hands-on stage of vulnerability verification.
Further explanation: in the invention, a vulnerability data list is acquired from the vulnerability databases of a plurality of different vulnerability platforms, and the acquired vulnerability data are classified and sorted by vulnerability name and vulnerability version number; similarity measurement is then performed on the classified and sorted names and version numbers, similar or identical vulnerability data are eliminated, and the heterogeneous data of the plurality of platforms are fused.
Specifically, step 1 further includes: collecting vulnerability data from a plurality of vulnerability platforms to obtain a vulnerability data list comprising software vulnerability samples, repair suggestions, vulnerability descriptions, solutions, affected objects and threat intelligence.
In step 1, vulnerability data collection includes: issuing asynchronous requests for vulnerability information to multiple platforms based on Scapy information acquisition and then downloading the vulnerability data in a distributed manner; acquisition rules are customized to the specific acquisition features of each heterogeneous vulnerability platform to improve acquisition efficiency and support incremental updating of the vulnerability data.
Specifically, step 2 further includes: extracting features from the vulnerability data of the different vulnerability platforms and performing a discretization mapping of the extracted features; performing string matching on the discretized information to measure the similarity of the vulnerability data and obtain similarity scores; and completing multi-platform heterogeneous data fusion to obtain the fused vulnerability data.
In step 2, the similarity measurement measures the information similarity between the NVD database and the other vulnerability databases along different dimensions.
The similarity measurement includes: extracting, by CVE number, the (name, version number) pairs of each CVE-numbered vulnerability from the NVD (National Vulnerability Database) vulnerability library and from the other vulnerability libraries, and cross-comparing them with the NVD database as the reference. The vulnerability names are compared character by character to measure name similarity, and the discretized version numbers are compared numerically to measure version-number similarity.
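The name and version measures, together with the cross-comparison against NVD, might be implemented as below. Both scoring rules are assumptions: names are scored by the share of character positions that match after lowercasing, and versions by the share of leading numeric components that agree (missing components count as 0, so "2.4" equals "2.4.0"). A positional match is sensitive to inserted characters; an alignment-based ratio such as Python's `difflib.SequenceMatcher` is a common alternative. The CVE numbers and product names are placeholders.

```python
from __future__ import annotations

import re
from itertools import zip_longest


def name_similarity(a: str, b: str) -> float:
    """Character-by-character match: share of positions holding the same character."""
    a, b = a.strip().lower(), b.strip().lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty names are treated as identical
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / longest


def version_similarity(a: str, b: str) -> float:
    """Share of leading numeric components that agree; missing parts count as 0."""
    va = [int(n) for n in re.findall(r"\d+", a)]
    vb = [int(n) for n in re.findall(r"\d+", b)]
    width = max(len(va), len(vb))
    if width == 0:
        return 0.0  # no numeric information to compare
    common = 0
    for x, y in zip_longest(va, vb, fillvalue=0):
        if x != y:
            break
        common += 1
    return common / width


def cross_measure(nvd: dict, other: dict) -> dict:
    """Compare another library against the NVD reference, keyed by CVE number."""
    scores = {}
    for cve, (name, version) in other.items():
        if cve in nvd:  # only CVE numbers present in the reference are compared
            ref_name, ref_version = nvd[cve]
            scores[cve] = (name_similarity(ref_name, name),
                           version_similarity(ref_version, version))
    return scores


nvd = {"CVE-0000-0001": ("OpenSSL", "1.0.1")}
other = {"CVE-0000-0001": ("openssl", "1.0.1"), "CVE-0000-0002": ("Example", "2.0")}
print(cross_measure(nvd, other))
# {'CVE-0000-0001': (1.0, 1.0)}
```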
The heterogeneous data is fused as follows: and according to the similarity relation of the heterogeneous vulnerability data, realizing the intelligent fusion process of the vulnerability data.
The heterogeneous data fusion comprises: to address redundancy among vulnerability data from heterogeneous databases, the scores obtained for the two fields (the vulnerability name and the corresponding version similarity measurements) are combined into an overall similarity score, whether the vulnerability data are similar overall is judged, and the vulnerability data from the other vulnerability databases are accordingly deleted or retained; redundancy judgment and data fusion across the heterogeneous databases are completed in this way.
Specifically, step 3 further includes: acquiring function call information and effect characteristic information from the fused vulnerability data items through NLP, aggregating the two, and constructing the vulnerability pseudo code corresponding to the vulnerability verification strategy.
In the step 3: analyzing and processing the calling relationship among each function name, file name and component name based on NLP, and storing to obtain function calling information; and the obtained function calling information is used for realizing the subsequent cross-function and cross-file vulnerability verification strategy construction.
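As a rough stand-in for the NLP analysis of call relations, the sketch below uses regular expressions to pull function names, file names and caller/callee pairs out of advisory text. The patterns, the recognized file extensions and the cue verbs "calls"/"invokes" are assumptions; component names are not handled, and a real system would use a proper NLP pipeline rather than fixed patterns.

```python
from __future__ import annotations

import re

# Hypothetical surface patterns standing in for an NLP model.
FUNC = re.compile(r"\b([A-Za-z_]\w*)\(\)")
FILE = re.compile(r"\b([\w/.-]+\.(?:c|h|cpp|py|java|php|js))\b")
CALL = re.compile(r"\b([A-Za-z_]\w*)\(\) (?:calls|invokes) ([A-Za-z_]\w*)\(\)")


def extract_call_info(text: str) -> dict:
    """Pull function names, file names and caller->callee pairs from advisory text."""
    return {
        "functions": sorted(set(FUNC.findall(text))),
        "files": sorted(set(FILE.findall(text))),
        "calls": CALL.findall(text),
    }


advisory = ("In server/request.c, handle_request() calls normalize_path() "
            "without checking the decoded path.")
print(extract_call_info(advisory))
```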
Effect characteristic information is obtained based on NLP technology and comprises the following parameters in the vulnerability database: the effect mechanism, the effect impact and the effect environment. The effect characteristic information is plain-text information other than function names and URLs (uniform resource locators); the key elements required by the vulnerability verification strategy are formed through NLP-based semantic search. The function call information and the effect characteristic information obtained through NLP are aggregated to form the vulnerability verification strategy, which is then converted into pseudo-code form.
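A keyword-cue sketch of effect characteristic extraction follows: function names and URLs are stripped so that only plain descriptive text remains, and each sentence is filed under mechanism, impact or environment when it contains one of the cue phrases. The cue lists are hypothetical and far simpler than the semantic search the text describes; they only show the shape of the output.

```python
from __future__ import annotations

import re

# Hypothetical cue phrases; a real system would use an NLP model instead.
CUES = {
    "mechanism": ("due to", "caused by", "improper", "lack of", "fails to"),
    "impact": ("allows", "leads to", "could result", "disclosure", "execute"),
    "environment": ("version", "running on", "configured", "prior to"),
}
URL = re.compile(r"https?://\S+")
FUNC = re.compile(r"\b\w+\(\)")


def extract_effect_features(text: str) -> dict[str, list[str]]:
    """Sort plain-text sentences into mechanism / impact / environment buckets."""
    plain = FUNC.sub("", URL.sub("", text))  # drop URLs and function names
    features: dict[str, list[str]] = {label: [] for label in CUES}
    for sentence in re.split(r"(?<=[.;])\s+", plain):
        s = sentence.strip()
        if not s:
            continue
        low = s.lower()
        for label, words in CUES.items():
            if any(w in low for w in words):
                features[label].append(s)
    return features


text = ("A path traversal flaw exists due to improper normalization in normalize_path(). "
        "It allows remote attackers to read arbitrary files. "
        "Affects versions 2.4.49 and earlier. See https://example.com/advisory.")
for label, sentences in extract_effect_features(text).items():
    print(label, sentences)
```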
In the invention, asynchronous requests for vulnerability information are issued to multiple platforms based on Scapy information acquisition, after which the vulnerability data are downloaded in a distributed manner; similarity measurement is performed on the data using Redis, and the databases are then fused under a unified framework specification; finally, function call information and effect characteristic information are generated through NLP, aggregated, and converted into pseudo code. In this way the text-form vulnerability database is converted into code-form vulnerability verification strategies, helping users integrate security vulnerabilities of related categories quickly, accurately and comprehensively, providing a diverse vulnerability database with wide coverage, converting bulletins, descriptions and other (usually textual) information about security vulnerabilities into strategies (usually in pseudo-code or instruction form) from which vulnerability verification tools can be built, and providing high-value guidance for the hands-on stage of vulnerability verification.
The invention provides a vulnerability verification construction system and a construction method based on NLP, which can realize the following beneficial effects:
1. Data acquisition from multi-element vulnerability databases is completed. Acquisition rules are customized to the characteristics of each heterogeneous vulnerability platform, so that the redundant and heterogeneous data of vulnerability databases from different sources are integrated, described, processed, retrieved and updated under a single framework specification, fusing data, information, methods and experience into a high-quality fused vulnerability database.
2. A framework for fusing and transforming multi-element vulnerability databases is designed, comprising a data layer, a processing layer and a display layer. The data layer stores the initial and fused vulnerability data; the processing layer extracts key elements from the vulnerability text, applies NLP processing, and converts the text data into vulnerability verification strategies; the display layer provides the required functional modules such as interactive instructions, data statistics, data retrieval and data customization, guides the user to start acquisition tasks, and determines, for specified entries in the new fused vulnerability database, whether their text information is converted into a vulnerability verification strategy. The framework thus constructs vulnerability verification pseudo code from multi-element vulnerability information and supports managing, retrieving and sharing vulnerability data.
3. An NLP-based method for processing the text information of vulnerability databases is designed. Acquiring function call information supports subsequent cross-function and cross-file construction of vulnerability verification strategies. Acquiring effect characteristic information covers vulnerability information expressed as plain text rather than as function names, URLs (uniform resource locators) and the like: its semantics are searched through natural language processing to form the remaining key elements required by a vulnerability verification strategy.
4. With the vulnerability verification construction system and method in place, security researchers can collect, understand and convert the text of vulnerability bulletins far more promptly and conveniently; the system fills a gap in the core data available for vulnerability analysis and security research and brings substantial gains in efficiency and value to related work.
Examples
Compared with existing single vulnerability databases, the method collects data from multiple vulnerability databases, processes and transforms the data with natural language processing, and finally builds a system that produces outputs such as vulnerability verification pseudocode.
To achieve these advantages, an embodiment of the invention provides an NLP-based vulnerability verification construction system and construction method, comprising the following steps:
Step one: collect vulnerability database data, gathering vulnerability data lists from a plurality of known vulnerability platforms.
Step one specifically comprises:
the vulnerability data list includes software vulnerability samples, repair suggestions, vulnerability descriptions, solutions, affected objects, threat intelligence and the like;
the collection method issues asynchronous requests and downloads vulnerability data in a distributed manner based on Scapy information acquisition, customizing different collection rules to the characteristics of each heterogeneous vulnerability platform, which improves collection efficiency and enables incremental updating of the vulnerability data;
the storage method uses a Redis cache database, and similarity comparison is performed on the collected vulnerability data.
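The collection step can be sketched, under stated assumptions, with Python's standard `asyncio`: each platform gets its own rule (an async fetcher plus a parser that maps its records to a common schema), all platforms are fetched concurrently, and a set of already-seen keys makes repeated runs incremental. The text names Scapy for this step; this sketch does not use it. `PlatformRule`, `collect_incremental` and the record fields are hypothetical, the in-memory `seen` set stands in for a shared store such as the Redis cache, and real fetchers would perform HTTP requests.

```python
import asyncio
from dataclasses import dataclass
from typing import Awaitable, Callable


@dataclass
class PlatformRule:
    """Hypothetical per-platform collection rule: how to fetch and how to normalise."""
    name: str
    fetch: Callable[[], Awaitable[list]]  # async fetcher; a real one would issue HTTP requests
    parse: Callable[[dict], dict]          # maps a raw record to {"id", "cve", "name", ...}


async def collect_incremental(rules: list, seen: set) -> list:
    """Fetch every platform concurrently and keep only records not collected before."""
    batches = await asyncio.gather(*(rule.fetch() for rule in rules))
    new_records = []
    for rule, batch in zip(rules, batches):
        for raw in batch:
            record = rule.parse(raw)
            key = f"{rule.name}:{record['id']}"
            if key in seen:  # incremental update: skip already-known entries
                continue
            seen.add(key)
            record["source"] = rule.name
            new_records.append(record)
    return new_records
```

Running `collect_incremental` a second time with the same `seen` set returns only records that appeared since the previous run.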
Step two: intelligently fuse the heterogeneous vulnerability data. Features are extracted from the data of the different vulnerability platforms and mapped to discrete values; the similarity of vulnerability records is measured by string matching to obtain similarity values, and the heterogeneous data are fused on that basis.
Step two specifically comprises the following:
In the first stage, called similarity measurement, the information similarity between the NVD database and the other vulnerability databases is measured at scale along different dimensions.
In the similarity measurement process, the (vulnerability name, version number) pair for each CVE number is extracted from the NVD vulnerability library and from the other vulnerability libraries, and a cross-comparison is performed with the NVD database as the reference. Vulnerability names are compared character by character to measure name similarity; version numbers are discretized and compared numerically to measure version similarity.
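A minimal sketch of the two field-level measures follows, assuming that "character-by-character matching" means comparing aligned positions, and that a discretized version is a fixed-length tuple of integers compared component by component. The function names, the tuple width of 4 and the scoring rules are illustrative assumptions, not the patent's definitions.

```python
import re


def name_similarity(a: str, b: str) -> float:
    """Character-by-character comparison: share of aligned positions that match.

    Positions are compared pairwise from the start; the longer name sets the
    denominator, so extra trailing characters count as mismatches.
    """
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return sum(x == y for x, y in zip(a, b)) / longest


def discretize_version(version: str, width: int = 4) -> tuple:
    """Map a version string such as '2.14.1' to a fixed-length integer tuple."""
    parts = [int(p) for p in re.findall(r"\d+", version)][:width]
    return tuple(parts + [0] * (width - len(parts)))


def version_similarity(v1: str, v2: str, width: int = 4) -> float:
    """Numeric comparison of discretized versions: fraction of leading components that agree."""
    a, b = discretize_version(v1, width), discretize_version(v2, width)
    agree = 0
    for x, y in zip(a, b):
        if x != y:
            break
        agree += 1
    return agree / width
```

For example, `name_similarity("Apache Log4j", "apache log4j2")` evaluates to 12/13 (about 0.92), and `version_similarity("2.14.1", "2.14.0")` to 0.5, since only the first two of four components agree.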
In the second stage, called data fusion, the vulnerability data are intelligently fused according to the similarity relations among the heterogeneous records.
To address the redundancy of vulnerability data across heterogeneous databases, the name and version similarity scores are combined into an overall similarity score, which determines whether two records are similar overall; the corresponding record from the other vulnerability database is then either deleted or retained. Redundancy judgment and data fusion across the heterogeneous databases are completed in this way.
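The fusion decision can be sketched as follows, assuming the overall score is a weighted sum of the name and version scores and that a fixed threshold decides redundancy. The 0.6/0.4 weighting, the 0.9 threshold and the record layout are hypothetical. The field-level similarity functions are passed in, so any measure (for instance the character-by-character and numeric comparisons of the first stage) can be plugged in.

```python
from typing import Callable


def overall_similarity(name_score: float, version_score: float,
                       name_weight: float = 0.6) -> float:
    """Combine the two field scores into one overall score (weighting is an assumption)."""
    return name_weight * name_score + (1.0 - name_weight) * version_score


def fuse_records(nvd: dict, others: list,
                 name_sim: Callable[[str, str], float],
                 version_sim: Callable[[str, str], float],
                 threshold: float = 0.9) -> list:
    """Fuse records from other libraries into the NVD reference set.

    `nvd` maps CVE number -> {"name", "version", ...}; each record in `others`
    carries a "cve" key. A record judged overall similar to the NVD record with
    the same CVE number is deleted as redundant; otherwise it is retained.
    """
    fused = list(nvd.values())
    for rec in others:
        ref = nvd.get(rec["cve"])
        if ref is not None:
            score = overall_similarity(name_sim(ref["name"], rec["name"]),
                                       version_sim(ref["version"], rec["version"]))
            if score >= threshold:
                continue  # redundant with the NVD reference: delete
        fused.append(rec)  # no NVD counterpart, or not similar enough: retain
    return fused
```

Keeping the NVD record as the survivor of each redundant pair follows the text's use of NVD as the reference database.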
Step three: using NLP, obtain function call information and effect characteristic information for the fused vulnerability data items, and aggregate the results to construct pseudocode for the vulnerability verification strategy.
Step three specifically comprises the following:
Function call information is obtained with NLP: to support the later construction of cross-function and cross-file vulnerability verification strategies, the calling relations among function names, file names and component names are analyzed with NLP and stored.
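The source does not say which NLP model extracts call relations. As a deliberately simple stand-in, the sketch below uses hand-written regular expressions to pull function names, file names, component names, caller/callee pairs and function-to-file locations out of one description. All patterns and the `CallInfo` structure are hypothetical and would miss many phrasings that a trained model could handle.

```python
import re
from dataclasses import dataclass, field

# Hypothetical surface patterns standing in for the NLP step.
FUNC_RE = re.compile(r"\b([A-Za-z_]\w*)\(\)")
FILE_RE = re.compile(r"\b([\w/.-]+\.(?:c|cc|cpp|h|java|py|php|js|go))\b")
COMPONENT_RE = re.compile(r"\bthe ([\w.-]+) (?:component|module|library)\b", re.I)
CALL_RE = re.compile(r"\b([A-Za-z_]\w*)\(\) calls ([A-Za-z_]\w*)\(\)")
LOCATED_RE = re.compile(r"\b([A-Za-z_]\w*)\(\) in ([\w/.-]+\.\w+)")


@dataclass
class CallInfo:
    functions: set = field(default_factory=set)
    files: set = field(default_factory=set)
    components: set = field(default_factory=set)
    calls: set = field(default_factory=set)      # (caller, callee) pairs
    locations: set = field(default_factory=set)  # (function, file) pairs


def extract_call_info(text: str) -> CallInfo:
    """Pull function, file and component names and their relations out of one description."""
    return CallInfo(
        functions=set(FUNC_RE.findall(text)),
        files=set(FILE_RE.findall(text)),
        components=set(COMPONENT_RE.findall(text)),
        calls=set(CALL_RE.findall(text)),
        locations=set(LOCATED_RE.findall(text)),
    )
```

Storing `calls` and `locations` as pairs keeps the relations needed to link a function to its file and to its callers across files.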
Effect characteristic information is obtained with NLP. In vulnerability databases this information usually appears as an effect mechanism, effect impact, effect environment and the like. For effect information expressed in plain text rather than as a function, URL or similar, its semantics are extracted with NLP (for example, that the vulnerability is triggered through a particular function, or arises because a particular third-party library was introduced), forming the key elements required by the vulnerability verification strategy.
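Effect characteristic extraction can likewise be approximated with cue phrases, as a sketch only: an impact keyword table, a pattern for "via the X function" (trigger mechanism), a pattern for a named third-party library, and a "when ..." clause for the environment. The cue lists, patterns and output keys are assumptions chosen for illustration, not the method's actual NLP.

```python
import re

# Hypothetical cue lists and patterns; a production system would use a trained model.
IMPACT_CUES = {
    "remote code execution": "code-execution",
    "denial of service": "denial-of-service",
    "sql injection": "sql-injection",
    "information disclosure": "information-disclosure",
}
VIA_FUNCTION_RE = re.compile(r"\bvia (?:the )?([A-Za-z_]\w*) function\b", re.I)
THIRD_PARTY_RE = re.compile(r"\bthird-party (?:library|dependency|component) ([\w.-]+)", re.I)
CONDITION_RE = re.compile(r"\bwhen ([^.;,]+)", re.I)


def extract_effect_features(text: str) -> dict:
    """Map a plain-text description to mechanism / impact / environment elements."""
    lower = text.lower()
    return {
        "impact": sorted(tag for cue, tag in IMPACT_CUES.items() if cue in lower),
        "mechanism_function": VIA_FUNCTION_RE.findall(text),
        "third_party_library": THIRD_PARTY_RE.findall(text),
        "environment": [c.strip() for c in CONDITION_RE.findall(text)],
    }
```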
The function call information and effect characteristic information obtained with NLP are aggregated into a vulnerability verification strategy, which is then converted into pseudocode.
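The aggregation into pseudocode might look like the following sketch. It takes plain dictionaries (call relations as `(caller, callee)` or `(function, file)` pairs, effect elements as lists of strings) and emits one pseudocode step per element. The keyword vocabulary (`VERIFY`, `REQUIRE`, `SETUP`, `LOCATE`, `TRACE`, `INVOKE`, `CHECK`) and the step order are hypothetical; the patent does not define a pseudocode syntax.

```python
def build_verification_pseudocode(vuln_id: str, call_info: dict, effects: dict) -> str:
    """Aggregate call-relation and effect elements into a pseudocode verification strategy."""
    lines = [f"VERIFY {vuln_id}:"]
    # Preconditions first: dependencies and environment taken from the effect information.
    for lib in effects.get("third_party_library", []):
        lines.append(f"  REQUIRE dependency {lib} is present")
    for condition in effects.get("environment", []):
        lines.append(f"  SETUP environment where {condition}")
    # Then the code path taken from the function call information.
    for function, path in sorted(call_info.get("locations", [])):
        lines.append(f"  LOCATE {function} in {path}")
    for caller, callee in sorted(call_info.get("calls", [])):
        lines.append(f"  TRACE call path {caller} -> {callee}")
    # Finally the trigger and the expected observable effect.
    for function in effects.get("mechanism_function", []):
        lines.append(f"  INVOKE {function} with crafted input")
    for impact in effects.get("impact", []):
        lines.append(f"  CHECK observable effect: {impact}")
    return "\n".join(lines)
```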
In Embodiment 1, data extracted from multiple vulnerability databases are fused, so that heterogeneous data from vulnerability databases of different sources are integrated, described, processed, retrieved and updated under the same framework specification. This fuses data, information, methods and experience into a high-quality merged vulnerability database, helps users handle security vulnerabilities of related categories more quickly, supplements vulnerability information, and further converts text-form vulnerability database entries into code-form vulnerability verification strategies. The resulting database helps users integrate security vulnerabilities of related categories quickly, accurately and comprehensively and offers diverse vulnerability data with wide coverage; it further converts bulletin and description information about security vulnerabilities (usually text) into strategies from which vulnerability verification tools can be built (usually pseudocode or instructions), providing high-value guidance for the hands-on steps of vulnerability verification.
Through Embodiment 1, the invention achieves the following technical effects:
1. Data collection from multi-source vulnerability databases is completed. Collection rules are customized to the characteristics of each heterogeneous vulnerability platform, so that redundant and heterogeneous data from vulnerability databases of different sources are integrated, described, processed, retrieved and updated under the same framework specification, fusing data, information, methods and experience into a high-quality merged vulnerability database.
2. A framework is designed for fusing and transforming multi-source vulnerability databases. It comprises a data layer, a processing layer and a display layer. The data layer stores the initial and fused vulnerability data. The processing layer extracts key elements from vulnerability texts, performs NLP processing, and converts the text data into vulnerability verification strategies. The display layer provides the required functional modules, such as interactive instructions, data statistics, data retrieval and data customization. It guides the user in starting collection tasks and in deciding, for specified entries of the new vulnerability database produced by fusion, whether their text information should be converted into a vulnerability verification strategy. In this way, vulnerability verification pseudocode is constructed from multi-source vulnerability information, and vulnerability data can be managed, retrieved and shared.
3. A method is designed for processing the text information of vulnerability databases with NLP. Acquiring function call information supports the later construction of cross-function and cross-file vulnerability verification strategies. Acquiring effect characteristic information covers vulnerability information expressed in plain text rather than as a function, URL (uniform resource locator) or the like: its semantics are extracted through natural language processing to form the other key elements required by a vulnerability verification strategy.
4. The vulnerability verification construction system and method greatly improve the timeliness and convenience with which security researchers collect, understand and convert the text of vulnerability bulletins, fill a gap in the core data available for vulnerability analysis and security research, and bring substantial gains in efficiency and value to related work.
The method further comprises the following steps for comparing the similarity of vulnerability data: based on the optimal parameter [formula SMS_1], an adjacency matrix [formula SMS_2] is constructed; based on the adjacency matrix [formula SMS_3], a node similarity score matrix is computed; the node similarity scores are then sorted to rank the vulnerability data by similarity, and the vulnerability data with the highest similarity are selected from the ranking.
Here the optimal parameter [formula SMS_4] is the parameter value at which the algorithm's evaluation metric reaches its optimum, with value range [formula SMS_5]; the adjacency matrix is [formula SMS_6], where [formula SMS_7], and [formula SMS_8] is a matrix of [formula SMS_9] nodes.
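The equations for the adjacency matrix are not recoverable from this text. Purely as an assumption, the sketch below reads the "optimal parameter" as a similarity threshold θ chosen from candidate values by maximizing an evaluation metric, and links two vulnerability records in the adjacency matrix when their pairwise similarity reaches θ. `choose_theta`, `build_adjacency` and the metric are hypothetical.

```python
from typing import Callable, Sequence


def choose_theta(candidates: Sequence[float], metric: Callable[[float], float]) -> float:
    """Return the candidate value at which the evaluation metric is highest."""
    return max(candidates, key=metric)


def build_adjacency(similarity: Sequence[Sequence[float]], theta: float) -> list:
    """n x n 0/1 adjacency matrix: records i != j are linked when similarity >= theta."""
    n = len(similarity)
    return [[1 if i != j and similarity[i][j] >= theta else 0 for j in range(n)]
            for i in range(n)]
```

The diagonal is left at 0 so that a record is never counted as its own neighbour.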
Further, computing the node similarity score matrix comprises:
initializing the shortest distance matrix [formula SMS_10] and the node similarity score matrix [formula SMS_11];
using the adjacency matrix [formula SMS_12], computing the matrix of shortest paths [formula SMS_14] between node pairs [formula SMS_13];
based on [formula SMS_15], computing the centrality [formula SMS_17] of each node [formula SMS_16] using equation (1):
[formula SMS_18]   (1)
using equation (2), computing the number of optimal paths [formula SMS_19] between nodes whose optimal path length is within 6 orders:
[formula SMS_20]   (2)
using equation (3), computing the similarity scores between nodes [formula SMS_21]:
[formula SMS_22]   (3)
and generating the node similarity score matrix from the similarity scores [formula SMS_23].
where [formula SMS_24] is the optimal path length between nodes [formula SMS_25], and [formula SMS_26] is the number of nodes.
In this embodiment, the maximum optimal path length considered when counting optimal paths is set to 6; optimal paths longer than this highest order are considered to have no influence on the similarity between nodes. Therefore [formula SMS_27] can be decomposed into equation (4), which is used to count the optimal paths within 6 orders:
[formula SMS_28]   (4)
where [formula SMS_29] is the similarity value and g is the maximum order.
Sorting the node similarity score matrix and selecting entries by rank allows data with high similarity to be screened out quickly. In addition, in equation (4) the computational complexity grows as [formula SMS_30] increases; however, in actual testing, optimal paths beyond the maximum order had little or no influence on the similarity between nodes, so restricting the computation to 6 orders under this optimized strategy improves the system's computational efficiency and avoids unnecessary computation, thereby speeding up vulnerability similarity screening.
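Equations (1) to (4) survive only as image placeholders, so the centrality and the final similarity score cannot be reproduced here. The sketch below covers only what the text states: optimal-path lengths and optimal-path counts between node pairs (with "optimal" assumed to mean shortest), limited to a maximum order g = 6 by stopping a breadth-first search at depth g, followed by ranking node pairs by score. The scoring function is supplied by the caller; the `c / d**2` score in the usage note is a hypothetical placeholder, not equation (3).

```python
from collections import deque
from typing import Callable


def bounded_shortest_paths(adj: list, src: int, g: int = 6):
    """BFS from `src` that stops expanding at depth g.

    Returns (dist, count): dist[v] is the shortest-path length to v (None if v is
    farther than g hops or unreachable) and count[v] the number of shortest paths.
    """
    n = len(adj)
    dist = [None] * n
    count = [0] * n
    dist[src], count[src] = 0, 1
    queue = deque([src])
    while queue:
        u = queue.popleft()
        if dist[u] == g:
            continue  # paths longer than the maximum order g are ignored
        for v in range(n):
            if not adj[u][v]:
                continue
            if dist[v] is None:           # first time v is reached: record its distance
                dist[v] = dist[u] + 1
                count[v] = count[u]
                queue.append(v)
            elif dist[v] == dist[u] + 1:  # another shortest path into v
                count[v] += count[u]
    return dist, count


def rank_pairs(adj: list, score: Callable[[int, int], float], g: int = 6) -> list:
    """Score all node pairs within g hops and return (score, i, j) sorted high to low."""
    n = len(adj)
    scored = []
    for i in range(n):
        dist, count = bounded_shortest_paths(adj, i, g)
        for j in range(i + 1, n):
            if dist[j] is not None:
                scored.append((score(dist[j], count[j]), i, j))
    scored.sort(key=lambda t: (-t[0], t[1], t[2]))
    return scored
```

For example, on a four-node cycle, `rank_pairs(adj, lambda d, c: c / d**2)` ranks the four adjacent pairs (score 1.0) above the two opposite pairs, each of which has two shortest paths of length 2 (score 0.5). Because each search stops at depth g, the work per source node is bounded by the g-hop neighbourhood rather than the whole graph.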
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. An NLP-based vulnerability verification construction method, characterized by comprising the following steps:
step 1: collecting vulnerability data from a plurality of different vulnerability platforms;
step 2: intelligently fusing the vulnerability data of the different platforms to obtain fused vulnerability data items;
step 3: designing a vulnerability verification strategy for the fused vulnerability data items, and obtaining the corresponding vulnerability pseudocode.
2. The NLP-based vulnerability verification construction method according to claim 1, further comprising:
acquiring a vulnerability data list from a plurality of vulnerability databases of different vulnerability platforms, and classifying and sorting the vulnerability names and vulnerability version numbers of the acquired vulnerability data;
performing similarity measurement on the classified and sorted vulnerability names and version numbers, eliminating similar or identical vulnerability data, and thereby fusing the heterogeneous data of the plurality of platforms.
3. The NLP-based vulnerability verification construction method according to claim 1, wherein step 1 further comprises: collecting vulnerability data from a plurality of vulnerability platforms to obtain a vulnerability data list, the data list comprising software vulnerability samples, repair suggestions, vulnerability descriptions, solutions, affected objects and threat intelligence.
4. The NLP-based vulnerability verification construction method according to claim 1, wherein step 2 further comprises:
extracting features of the vulnerability data of the different vulnerability platforms, and performing discretization mapping based on the extracted features; performing string matching on the discretized information to measure the similarity of the vulnerability data and obtain similarity scores; and completing multi-platform heterogeneous data fusion to obtain fused vulnerability data.
5. The NLP-based vulnerability verification construction method according to claim 1, wherein step 3 further comprises: acquiring, through NLP, function call information and effect characteristic information from the fused vulnerability data items, aggregating the two, and designing the vulnerability pseudocode corresponding to the vulnerability verification strategy.
6. The NLP-based vulnerability verification construction method according to claim 1, wherein, in step 1, the collection of vulnerability data comprises: issuing asynchronous requests for multi-platform vulnerability data based on Scapy information acquisition and then downloading the vulnerability data in a distributed manner, with collection rules customized to the specific collection features of each heterogeneous vulnerability platform, so as to improve collection efficiency and enable incremental updating of the vulnerability data.
7. The NLP-based vulnerability verification construction method according to claim 4, wherein step 2 further comprises:
the similarity measurement is: measuring the information similarity between the NVD database and other vulnerability databases along different dimensions;
the similarity measurement comprises: extracting, according to the CVE number, the name and version number pairs of the vulnerability data with that CVE number from the NVD vulnerability library and the other vulnerability libraries, and performing cross-comparison with the NVD database as the reference; comparing the vulnerability names character by character to measure name similarity; and numerically comparing the discretized versions to measure version number similarity;
the heterogeneous data fusion is: intelligently fusing the vulnerability data according to the similarity relations among the heterogeneous vulnerability data;
the heterogeneous data fusion comprises: to address the redundancy of vulnerability data across heterogeneous databases, combining the name and version similarity scores into an overall similarity score, judging whether the vulnerability data are similar overall, and deleting or retaining the corresponding vulnerability data of the other vulnerability database accordingly, thereby completing redundancy judgment and data fusion across the heterogeneous databases.
8. The NLP-based vulnerability verification construction method according to claim 1, wherein step 3 comprises:
analyzing and processing, based on NLP, the calling relations among function names, file names and component names, and storing them to obtain function call information, the function call information being used for the subsequent generation of cross-function and cross-file vulnerability verification strategies;
obtaining effect characteristic information based on NLP, wherein the effect characteristic information comprises the following parameters in the vulnerability database: an effect mechanism, an effect impact and an effect environment, and is plain-text information not expressed as a function or URL; and forming the key elements required by the vulnerability verification strategy by extracting semantics with NLP;
aggregating the function call information and effect characteristic information obtained based on NLP to form a vulnerability verification strategy, and converting the strategy into pseudocode.
9. An NLP-based vulnerability verification construction system, characterized by comprising:
a data layer, for storing the initial vulnerability data and the fused vulnerability data;
a processing layer, for extracting key elements from vulnerability texts, performing NLP processing, and converting the text data into a vulnerability verification strategy;
a display layer, for providing the data statistics, data retrieval and data customization required by the user based on interactive instructions, guiding the user to start a collection task, and determining, for specified entries in the new vulnerability database generated after fusion, whether their text information is to be converted into a vulnerability verification strategy; the system is used for generating vulnerability verification pseudocode from multi-source vulnerability information and for managing, retrieving and sharing vulnerability data using the generated pseudocode.
10. The NLP-based vulnerability verification construction system according to claim 9, wherein the data layer stores the collected vulnerability data in a Redis cache database and performs similarity comparison on the vulnerability data.
CN202310135996.6A 2023-02-20 2023-02-20 NLP-based vulnerability verification construction system and construction method Active CN115828270B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310135996.6A CN115828270B (en) 2023-02-20 2023-02-20 NLP-based vulnerability verification construction system and construction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310135996.6A CN115828270B (en) 2023-02-20 2023-02-20 NLP-based vulnerability verification construction system and construction method

Publications (2)

Publication Number Publication Date
CN115828270A true CN115828270A (en) 2023-03-21
CN115828270B CN115828270B (en) 2023-06-09

Family

ID=85521935

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310135996.6A Active CN115828270B (en) 2023-02-20 2023-02-20 NLP-based vulnerability verification construction system and construction method

Country Status (1)

Country Link
CN (1) CN115828270B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190147167A1 (en) * 2017-11-15 2019-05-16 Korea Internet & Security Agency Apparatus for collecting vulnerability information and method thereof
CN110688456A (en) * 2019-09-25 2020-01-14 北京计算机技术及应用研究所 Vulnerability knowledge base construction method based on knowledge graph
CN111859387A (en) * 2019-04-25 2020-10-30 北京九州正安科技有限公司 Automatic construction method for Android platform software vulnerability model
CN113961786A (en) * 2021-10-22 2022-01-21 苏州棱镜七彩信息科技有限公司 Multi-element heterogeneous vulnerability integration and library building method
CN114021156A (en) * 2022-01-05 2022-02-08 北京华云安信息技术有限公司 Method, device and equipment for organizing vulnerability automatic aggregation and storage medium

Also Published As

Publication number Publication date
CN115828270B (en) 2023-06-09

Similar Documents

Publication Publication Date Title
CN110134757B (en) Event argument role extraction method based on multi-head attention mechanism
KR100816934B1 (en) Clustering system and method using search result document
CN109886294A (en) Knowledge fusion method, apparatus, computer equipment and storage medium
CN108984775B (en) Public opinion monitoring method and system based on commodity comments
CN109214004B (en) Big data processing method based on machine learning
CN111950622B (en) Behavior prediction method, device, terminal and storage medium based on artificial intelligence
CN111708774B (en) Industry analytic system based on big data
CN113254630B (en) Domain knowledge map recommendation method for global comprehensive observation results
CN112800172A (en) Code searching method based on two-stage attention mechanism
CN110990718A (en) Social network model building module of company image improving system
CN109241298A (en) Semantic data stores dispatching method
CN111680506A (en) External key mapping method and device of database table, electronic equipment and storage medium
CN117149974A (en) Knowledge graph question-answering method for sub-graph retrieval optimization
CN114897085A (en) Clustering method based on closed subgraph link prediction and computer equipment
CN113378024B (en) Deep learning-oriented public inspection method field-based related event identification method
CN117114105B (en) Target object recommendation method and system based on scientific research big data information
CN112232576B (en) Decision prediction method, device, electronic equipment and readable storage medium
CN115828270A (en) Vulnerability verification construction system and method based on NLP
CN115905705A (en) Industrial algorithm model recommendation method based on industrial big data
CN115147020A (en) Decoration data processing method, device, equipment and storage medium
CN108256083A (en) Content recommendation method based on deep learning
CN108256086A (en) Data characteristics statistical analysis technique
CN109977227B (en) Text feature extraction method, system and device based on feature coding
CN113298448B (en) Lease index analysis method and system based on Internet and cloud platform
CN113536077B (en) Mobile APP specific event content detection method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant