CN116775910B - Automatic vulnerability reproduction knowledge base construction method and medium based on information collection - Google Patents

Info

Publication number
CN116775910B
Authority
CN
China
Prior art keywords
vulnerability
information
intelligence
knowledge base
knowledge
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311041050.XA
Other languages
Chinese (zh)
Other versions
CN116775910A (en)
Inventor
李季
汪晓慧
梁露露
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Yuanbao Technology Co ltd
Original Assignee
Beijing Yuanbao Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Yuanbao Technology Co ltd
Priority to CN202311041050.XA
Publication of CN116775910A
Application granted
Publication of CN116775910B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/50 Monitoring users, programs or devices to maintain the integrity of platforms, e.g. of processors, firmware or operating systems
    • G06F21/57 Certifying or maintaining trusted computer platforms, e.g. secure boots or power-downs, version controls, system software checks, secure updates or assessing vulnerabilities
    • G06F21/577 Assessing vulnerabilities and evaluating computer system security
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Animal Behavior & Ethology (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Stored Programmes (AREA)

Abstract

The application discloses an automatic vulnerability reproduction knowledge base construction method and medium based on information collection. The method may include: establishing a vulnerability reproduction knowledge base, determining CVE numbers, parsing the CVE numbers, and constructing feature vectors; based on the feature vectors, collecting vulnerability intelligence, extracting vulnerability information from it, and reproducing the vulnerability environment; collecting vulnerability proving scheme intelligence based on the vulnerability information; analyzing, associating, and evaluating the vulnerability proving scheme intelligence to generate a vulnerability proving scheme report; generating a vulnerability exploitation script according to the report; and starting the vulnerability environment, executing the vulnerability exploitation script to simulate an attack, and storing the vulnerability environment and the script in the vulnerability reproduction knowledge base. Starting from CVE numbers prepared in advance, the application automatically collects intelligence, builds the vulnerability environment, generates the vulnerability exploitation script, and performs the simulated attack. This improves the efficiency of vulnerability reproduction and lets researchers carry out tasks such as vulnerability verification and exploitation quickly and conveniently.

Description

Automatic vulnerability reproduction knowledge base construction method and medium based on information collection
Technical Field
The application relates to the field of vulnerability reproduction, in particular to an automatic vulnerability reproduction knowledge base construction method and medium based on information collection.
Background
According to data published by vulnerability organizations and platforms, the number of recorded vulnerabilities has grown steadily each year in recent years, and their scope of impact keeps expanding. Studying known vulnerabilities and understanding how they arise and how they are exploited can effectively prevent most security problems.
At present, the attack used in a security incident is mainly reproduced by configuring a vulnerability environment and then reproducing the vulnerability with a vulnerability proof. This approach has three problems. First, configuring the vulnerability environment is complex: vendors release security patches for public vulnerabilities over time, and most vulnerabilities can be exploited successfully only in specific environments or under specific conditions, so the related software and hardware environment must be configured and specific system and software versions installed, which consumes manpower. Second, intelligence collection works poorly: the main vulnerability intelligence collection approaches today are manual search, web crawling, API calls, and community integration, but they suffer from low efficiency, narrow coverage, and poor timeliness, and struggle to meet application requirements. Third, the technical difficulty is high: the work places considerable demands on the skill of security practitioners, and for some vulnerabilities few details are public and no detailed exploitation guide exists, which increases the operational complexity for testers and requires a great deal of time and effort.
Existing methods for building a vulnerability reproduction environment can build the environment quickly and automatically from vulnerability information: the vulnerability meta-information needed to build a cloud-native application vulnerability is obtained, a vulnerability information data package is written from that meta-information, and the reproduction environment is built from the package without manual work. However, security practitioners still have to collect the necessary vulnerability meta-information and fill it in according to the structural standard of the vulnerability information file, and the software installation configuration file has to be written specially. Such methods therefore still require manual operation and do not noticeably improve work efficiency. In addition, they do not collect vulnerability proving scheme intelligence: they only build the vulnerability environment, do not generate vulnerability exploitation scripts, and can hardly simulate the attack.
Therefore, it is necessary to develop an automatic vulnerability reproduction knowledge base construction method and medium based on information collection.
The information disclosed in the background section of the application is only for enhancement of understanding of the general background of the application and should not be taken as an acknowledgement or any form of suggestion that this information forms the prior art already known to a person skilled in the art.
Disclosure of Invention
The application provides an automatic vulnerability reproduction knowledge base construction method and medium based on information collection, which can construct a feature vector from CVE numbers prepared in advance, automatically collect and analyze intelligence based on the feature vector to produce the vulnerability environment and the vulnerability proof, and finally automatically build the vulnerability environment and execute a vulnerability exploitation script to reproduce the vulnerability. This reduces the operational complexity for researchers and makes vulnerability reproduction more convenient.
In a first aspect, an embodiment of the present disclosure provides a method for constructing an automatic vulnerability reproduction knowledge base based on information collection, including:
step 1: establishing a vulnerability reproduction knowledge base, determining CVE numbers of vulnerability reproduction, analyzing the CVE numbers, and constructing feature vectors;
step 2: collecting vulnerability information based on the feature vector, extracting vulnerability information and reproducing a vulnerability environment;
step 3: collecting vulnerability proving scheme information based on the vulnerability information;
step 4: analyzing the vulnerability proving scheme information, carrying out information association through a knowledge graph, carrying out information evaluation through an information evaluation mechanism, and generating a vulnerability proving scheme report;
step 5: generating a vulnerability exploitation script according to the vulnerability proving scheme report;
step 6: starting the vulnerability environment, executing the vulnerability exploitation script to perform a simulated attack, and storing the vulnerability environment and the vulnerability exploitation script into the vulnerability reproduction knowledge base.
Preferably, the vulnerability proving scheme information is analyzed through NLP technology.
Preferably, analyzing the vulnerability proving scheme intelligence includes:
cleaning the vulnerability proving scheme information, and extracting keywords from the cleaned information through a TF-IDF algorithm;
analyzing semantic information of the information text through a dependency syntax analysis technology;
generating abstract information of the information text through an LSTM algorithm;
judging the type of the information according to the keywords, the semantic information and the abstract content, and realizing classification and induction of the information.
Preferably, the keywords include vulnerability names, affected systems/devices, and exploitation tools.
Preferably, the information association by the knowledge graph includes:
extracting key information from the different pieces of intelligence as nodes, which serve as the basic elements for constructing the knowledge graph;
according to the internal association between nodes, defining different types of edges to represent the relationship, linking the related nodes, and realizing association and semantic expression between nodes;
combining the nodes and the edges to construct a knowledge triplet with semantics and storing the knowledge triplet in a knowledge graph;
when new information is collected, extracting nodes, inquiring similar or related nodes in the knowledge graph, and judging the category and the related scheme of the new information;
if the new information is related to the knowledge graph but is incomplete, extracting new information content from the new information for supplementing;
if the nodes in the new information are not directly related to the knowledge graph, generating a new knowledge triplet, and adding the new knowledge triplet to the knowledge graph.
Preferably, the information evaluation by the information evaluation mechanism includes:
determining multi-angle information evaluation indexes;
determining the weight of each evaluation index according to the characteristics of different types of information;
judging the information performance of each evaluation index according to the evaluation method, and determining the score of each index;
calculating the total score of the intelligence from each index score and its corresponding weight, thereby realizing a comprehensive evaluation of intelligence quality.
Preferably, the multi-angle intelligence evaluation indexes comprise source authority, information completeness, consistency check, and intelligence timeliness.
Preferably, if the score of an index is too high or too low, the evaluation result of that index is given a focused review to determine whether it would significantly affect the overall evaluation of the intelligence, thereby avoiding misjudgment caused by an extreme score on a single index.
Preferably, after the vulnerability environment executes the exploit script to perform the simulated attack, the step 6 further includes:
judging whether the simulated attack result matches the real result; if so, directly storing the current vulnerability environment and vulnerability exploitation script; if not, raising an alarm and repeating steps 3-6 until the results match or the number of simulated attacks reaches the set limit, and then exiting the task.
In a second aspect, the embodiments of the present disclosure further provide a computer readable storage medium storing a computer program, where the computer program, when executed by a processor, implements the method for constructing an automatic vulnerability reproduction knowledge base based on information collection.
The method and apparatus of the present application have other features and advantages which will be apparent from or are set forth in detail in the accompanying drawings and the following detailed description, which are incorporated herein, and which together serve to explain certain principles of the present application.
Drawings
The foregoing and other objects, features and advantages of the application will be apparent from the following more particular descriptions of exemplary embodiments of the application as illustrated in the accompanying drawings wherein like reference numbers generally represent like parts throughout the exemplary embodiments of the application.
FIG. 1 illustrates a flowchart of the steps of an automatic vulnerability reproduction knowledge base construction method based on information collection, according to an embodiment of the application.
Fig. 2 shows a flow chart of steps for intelligence analysis using NLP techniques in accordance with one embodiment of the present application.
Fig. 3 shows a flow chart of the steps of establishing a knowledge-graph for intelligence association, according to an embodiment of the application.
FIG. 4 shows a flowchart of the steps of intelligence evaluation by an intelligence evaluation mechanism, according to one embodiment of the application.
Detailed Description
Preferred embodiments of the present application will be described in more detail below. While the preferred embodiments of the present application are described below, it should be understood that the present application may be embodied in various forms and should not be limited to the embodiments set forth herein.
In order to facilitate understanding of the solution and the effects of the embodiments of the present application, two specific application examples are given below. It will be understood by those of ordinary skill in the art that the examples are for ease of understanding only and that any particular details thereof are not intended to limit the present application in any way.
Example 1
FIG. 1 illustrates a flowchart of the steps of an automatic vulnerability reproduction knowledge base construction method based on information collection, according to an embodiment of the application.
As shown in fig. 1, the method for constructing the automated vulnerability reproduction knowledge base based on information collection includes:
step 1: establishing a vulnerability reproduction knowledge base, determining CVE numbers of vulnerability reproduction, analyzing the CVE numbers, and constructing feature vectors;
step 2: based on the feature vector, collecting vulnerability information, extracting vulnerability information and reproducing a vulnerability environment;
step 3: collecting vulnerability proving scheme information based on vulnerability information;
step 4: analyzing the information of the vulnerability proving scheme, carrying out information association through a knowledge graph, carrying out information evaluation through an information evaluation mechanism, and generating a vulnerability proving scheme report;
step 5: generating a vulnerability exploitation script according to the vulnerability proving scheme report;
step 6: starting the vulnerability environment, executing the vulnerability exploitation script to simulate attack, and storing the vulnerability environment and the vulnerability exploitation script into a vulnerability reproduction knowledge base.
In one example, vulnerability attestation solution intelligence is analyzed by NLP technology.
In one example, analyzing vulnerability attestation scheme intelligence includes:
cleaning vulnerability proving scheme information, and extracting keywords from the cleaned information through a TF-IDF algorithm;
analyzing semantic information of the information text through a dependency syntax analysis technology;
generating abstract information of the information text through an LSTM algorithm;
judging the type of the information according to the keywords, the semantic information and the abstract content, and realizing classification and induction of the information.
In one example, keywords include vulnerability names, affected systems/devices, and exploitation tools.
In one example, information association by knowledge-graph includes:
extracting key information from the different pieces of intelligence as nodes, which serve as the basic elements for constructing the knowledge graph;
according to the internal association between nodes, defining different types of edges to represent the relationship, linking the related nodes, and realizing association and semantic expression between nodes;
combining the nodes and the edges to construct a knowledge triplet with semantics and storing the knowledge triplet in a knowledge graph;
when new information is collected, extracting nodes, inquiring similar or related nodes in the knowledge graph, and judging the category and the related scheme of the new information;
if the new information is related to the knowledge graph but is incomplete, extracting new information content from the new information for supplementing;
if the nodes in the new information are not directly related to the knowledge graph, generating a new knowledge triplet, and adding the new knowledge triplet to the knowledge graph.
In one example, the intelligence evaluation by the intelligence evaluation mechanism includes:
determining multi-angle information evaluation indexes;
determining the weight of each evaluation index according to the characteristics of different types of information;
judging the information performance of each evaluation index according to the evaluation method, and determining the score of each index;
calculating the total score of the intelligence from each index score and its corresponding weight, thereby realizing a comprehensive evaluation of intelligence quality.
In one example, the multi-angle intelligence evaluation indexes include source authority, information completeness, consistency check, and intelligence timeliness.
In one example, if the score of an index is too high or too low, the evaluation result of that index is given a focused review to determine whether it would significantly affect the overall evaluation of the intelligence, thereby avoiding misjudgment caused by an extreme score on a single index.
In one example, after the vulnerability environment executes the exploit script to perform the simulated attack, step 6 further includes:
judging whether the simulated attack result matches the real result; if so, directly storing the current vulnerability environment and vulnerability exploitation script; if not, raising an alarm and repeating steps 3-6 until the results match or the number of simulated attacks reaches the set limit, and then exiting the task.
Specifically, step 1: a security researcher prepares the vulnerability reproduction knowledge base, whose initial content is the CVE numbers of the vulnerabilities to be reproduced. The CVE number is parsed to obtain the vulnerability name and vulnerability description for the current CVE number, the name and description are segmented into words, and a feature vector is constructed.
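As a rough illustration of step 1, the sketch below turns a vulnerability name and description into a bag-of-words feature vector. The patent does not specify the segmentation method or the vector form, so the regex tokenizer, the term-count representation, and the sample record (including the placeholder number `CVE-0000-0000`) are assumptions made purely for illustration.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_feature_vector(name: str, description: str) -> dict[str, int]:
    """Return a term-count (bag-of-words) vector over the name and description."""
    return dict(Counter(tokenize(name) + tokenize(description)))


# Hypothetical record, as it might be looked up for one CVE number.
record = {
    "cve_id": "CVE-0000-0000",
    "name": "Example Server remote code execution",
    "description": "A crafted request lets a remote attacker execute code on Example Server.",
}

vector = build_feature_vector(record["name"], record["description"])
print(sorted(vector.items()))
```

Chinese-language descriptions would need a dedicated word segmenter in place of the regex; the term counts could then be used as search terms or weights when crawling in step 2.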
Step 2: based on the feature vector, public vulnerability platforms, the vulnerability intelligence platforms of relevant Internet companies, and similar sources are crawled to collect vulnerability intelligence. Vulnerability information is extracted from the collected intelligence, and a Dockerfile configuration file is customized according to the extracted vulnerability information to reproduce the vulnerability environment quickly.
Step 3: collecting vulnerability proving scheme intelligence: based on the extracted vulnerability information, a crawler program monitors security research sources such as emergency response platforms, hacker forums, public vulnerability platforms, papers by security researchers, and the blogs of relevant Internet companies, and collects vulnerability proving scheme intelligence, expanding the range of intelligence sources as far as possible.
Step 4: extracting the vulnerability proving scheme: the collected intelligence is analyzed with NLP technology to understand its key points and to identify the keywords describing the impact and the exploitation process. The pieces of intelligence are then associated through the knowledge graph to find potential associations and supplement missing information. In addition, an intelligence evaluation mechanism is established to evaluate the accuracy of each piece of intelligence and select the high-quality ones. Finally, a vulnerability proving scheme report is generated from the high-quality intelligence.
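The Dockerfile customization in step 2 could be done by filling a template with the extracted vulnerability information. The sketch below is one possible shape of that idea, not the patent's implementation: the template layout, the field names (`base_image`, `install_command`, `port`, `start_command`), and all sample values are hypothetical.

```python
from string import Template

# Hypothetical Dockerfile template; a real one would depend on the affected software.
DOCKERFILE_TEMPLATE = Template(
    "FROM $base_image\n"
    "RUN $install_command\n"
    "EXPOSE $port\n"
    'CMD ["$start_command"]\n'
)


def render_dockerfile(info: dict) -> str:
    """Fill the template from extracted vulnerability information.

    Raises KeyError if a required field was not extracted.
    """
    return DOCKERFILE_TEMPLATE.substitute(
        base_image=info["base_image"],
        install_command=info["install_command"],
        port=info["port"],
        start_command=info["start_command"],
    )


# Hypothetical extraction result for one vulnerability.
vuln_info = {
    "base_image": "example/base:1.0",
    "install_command": "install-example-server --version 1.2.3",
    "port": 8080,
    "start_command": "example-server",
}

dockerfile = render_dockerfile(vuln_info)
print(dockerfile)
```

Using `substitute` rather than `safe_substitute` makes a missing field fail loudly, which seems preferable to silently writing an incomplete Dockerfile.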
Fig. 2 shows a flow chart of steps for intelligence analysis using NLP techniques in accordance with one embodiment of the present application.
As shown in fig. 2, the intelligence analysis using NLP technique includes:
1) Intelligence preprocessing. The collected intelligence text is cleaned, for example by unifying formats, correcting spelling, and filtering stop words, so that it has a tidy structure and a uniform format for subsequent processing;
2) Keyword extraction. Keywords such as vulnerability names, affected systems/devices, and exploitation tools are extracted from the intelligence text using the TF-IDF algorithm; they identify the main content and features of the intelligence;
3) Semantic parsing. The semantic structure and elements of the intelligence text are parsed using dependency syntax analysis, identifying key information in the text such as the vulnerability name, scope of impact, and exploitation process, and capturing the logical relations and meaning among them;
4) Intelligence summarization. A summary of the intelligence text is generated using an LSTM algorithm. The summary makes the main content and key points of the intelligence quick to grasp and also supports the downstream association and evaluation steps;
5) Intelligence classification. The type of the intelligence and the vulnerability proving scheme it describes are determined from the keywords, the semantic information, and the summary, so that the intelligence is classified and organized. This facilitates targeted processing and management of different types of intelligence.
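The preprocessing and keyword-extraction stages above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions: the stop-word list, the sample texts, and the particular TF-IDF variant (term frequency times `log(N / df)`, one of several common formulations) are all choices made here, not details from the patent. Dependency parsing and LSTM summarization need trained models and are left out.

```python
import math
import re
from collections import Counter

# Small illustrative stop-word list (an assumption, not from the patent).
STOP_WORDS = {"a", "an", "the", "in", "of", "to", "is", "and", "on", "for"}


def preprocess(text: str) -> list[str]:
    """Normalize case, tokenize, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tfidf_keywords(docs: list[str], top_k: int = 3) -> list[list[str]]:
    """Return the top_k highest-scoring TF-IDF terms for each document."""
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    results = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1
        scores = {
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Sort by descending score, breaking ties alphabetically.
        ranked = sorted(scores, key=lambda term: (-scores[term], term))
        results.append(ranked[:top_k])
    return results


# Hypothetical intelligence texts.
intel_texts = [
    "Buffer overflow in the ExampleFTP login handler allows remote code execution",
    "SQL injection in the ExampleCMS search form allows data disclosure",
    "ExampleFTP buffer overflow exploit for the login handler",
]

keywords = tfidf_keywords(intel_texts)
for text, kws in zip(intel_texts, keywords):
    print(kws, "<-", text)
```

Terms shared by several texts (here the product name and vulnerability type) score lower than terms unique to one text, which is the usual TF-IDF behavior; a production system would likely compute document frequencies over a much larger corpus.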
Fig. 3 shows a flow chart of the steps of establishing a knowledge-graph for intelligence association, according to an embodiment of the application.
As shown in fig. 3, the analysis results obtained with NLP technology are organized into structured knowledge and stored in a knowledge graph, so that intelligence content is expressed and managed in a standardized way. Establishing the knowledge graph for intelligence association includes:
1) Node definition: key information from the different pieces of intelligence is extracted as nodes, which serve as the basic elements of the knowledge graph and represent the main features of the intelligence;
2) Edge definition: according to the inherent associations among nodes, different types of edges are defined to represent relationships, such as 'is', 'exploited by', and 'affects', linking related nodes and expressing the associations and semantics among them;
3) Knowledge construction: nodes and edges are combined into a 'subject-predicate-object' structure to form semantic knowledge triples, which are stored in the knowledge graph. This gives the collected intelligence content structured, standardized management;
4) When new intelligence is collected, keywords and elements are first extracted from the text as nodes; similar or related nodes are then queried in the knowledge graph to determine which existing knowledge the new intelligence relates to or duplicates, and from that the category of the new intelligence and the scheme it concerns are inferred;
5) Knowledge supplementing: if the new intelligence relates to some knowledge in the knowledge graph but that knowledge item is incomplete, more detailed content is extracted from the new intelligence to supplement it;
6) If some key nodes in the new intelligence have no direct association with the knowledge in the knowledge graph, the content of the new intelligence is analyzed to understand the relationships and semantics between the nodes, and new knowledge triples are generated and added to the knowledge graph, thereby discovering and expanding knowledge.
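The triple-based graph and its query, supplement, and expansion behavior can be sketched with an in-memory set of triples. The class below is a simplified illustration: the `KnowledgeGraph` name, the `integrate` method, exact-match node lookup (the patent also mentions similar nodes), and all sample data are assumptions introduced here.

```python
class KnowledgeGraph:
    """A tiny in-memory store of (subject, predicate, object) triples."""

    def __init__(self):
        self.triples = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self.triples.add((subject, predicate, obj))

    def related(self, node: str) -> set:
        """Return every stored triple in which the node appears."""
        return {t for t in self.triples if node in (t[0], t[2])}

    def integrate(self, new_triples: list) -> dict:
        """Merge triples from new intelligence and report how each was handled."""
        outcome = {"duplicate": [], "supplemented": [], "expanded": []}
        for triple in new_triples:
            subject, _, obj = triple
            if triple in self.triples:
                outcome["duplicate"].append(triple)      # already known
            elif self.related(subject) or self.related(obj):
                outcome["supplemented"].append(triple)   # extends existing knowledge
            else:
                outcome["expanded"].append(triple)       # entirely new knowledge
            self.add(*triple)
        return outcome


# Hypothetical existing knowledge.
kg = KnowledgeGraph()
kg.add("CVE-0000-0000", "is", "buffer overflow")
kg.add("CVE-0000-0000", "affects", "ExampleFTP 1.0")

# Hypothetical triples extracted from newly collected intelligence.
result = kg.integrate([
    ("CVE-0000-0000", "exploited by", "example-poc.py"),
    ("CVE-0000-0000", "affects", "ExampleFTP 1.0"),
    ("CVE-0000-0001", "affects", "ExampleCMS 2.0"),
])
print(result)
```

A real deployment would more likely use a graph database and fuzzy or embedding-based node matching; the set of tuples is only meant to make the three outcomes concrete.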
FIG. 4 shows a flowchart of the steps of intelligence evaluation by an intelligence evaluation mechanism, according to one embodiment of the application.
As shown in fig. 4, the information evaluation by the information evaluation mechanism includes:
1) Index determination, evaluating from a plurality of angles intelligence, comprising:
(1) the authority of the source, the authority and the public belief of the information release source are evaluated, such as government institutions, well-known enterprises, personal researchers and the like, the information quality and the accuracy of the authority source are generally higher, and the authority source can be used as an important reference factor for judging the information;
(2) the information integrity is used for evaluating whether the information contained in the information is complete and detailed, and whether the information has various elements required by the report of the vulnerability proving scheme, such as vulnerability names, influence ranges, utilization conditions, technical details and the like, so that the quality of the information with complete information is generally higher;
(3) consistency test, comparing information from different sources, judging whether the same vulnerability proving scheme is described, judging whether the extracted key information is consistent, such as whether vulnerability names in different information, influencing system versions and the like are the same, and judging whether the utilization processes are similar. The reliability of the information with higher consistency is higher;
(4) the timeliness of the information, the time of evaluating the information output, the timeliness of the information which is higher than that of the information which is released more recently, the contained information can be more detailed and accurate, and on the contrary, the timeliness of the information which is released for a longer time is reduced, wherein the information can be already out of date, and the reliability of the information can be affected.
2) Weight determination. The weight of each evaluation index is determined according to the characteristics of the different types of intelligence.
3) Index scoring. The performance of the intelligence on each evaluation index is judged according to the evaluation method: a high score is given where the intelligence matches its intelligence type and a low score where it does not, and the score of each index is finally determined.
4) Total score calculation. Each index score is multiplied by its weight and the products are summed to obtain the total score of the intelligence, which represents the comprehensive evaluation of its quality. The higher the total score, the higher the quality and importance of the intelligence.
5) Key judgment. When the score of an index is too high or too low, the evaluation result of that index is examined specifically to judge whether it would significantly affect the overall evaluation of the intelligence, so that an extreme score on a single index does not cause a misjudgment.
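The weighted scoring and the check for extreme index scores described above can be sketched as follows. This is a minimal illustration rather than the implementation of the disclosure: the index names mirror the four evaluation indexes, while the weights, the 0-to-1 score scale, and the thresholds for an "extreme" score are assumed values chosen only for the example.

```python
from typing import Dict, List

# Hypothetical weights for one intelligence type; the disclosure gives no values.
WEIGHTS: Dict[str, float] = {
    "source_authority": 0.4,
    "information_integrity": 0.3,
    "consistency": 0.2,
    "timeliness": 0.1,
}


def total_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of the index scores (each score assumed to lie in [0, 1])."""
    return sum(scores[name] * weight for name, weight in weights.items())


def extreme_indexes(scores: Dict[str, float],
                    low: float = 0.1, high: float = 0.9) -> List[str]:
    """Names of indexes whose score is extreme and should get key judgment.

    The low/high thresholds are assumptions made for this sketch.
    """
    return [name for name, score in scores.items() if score <= low or score >= high]


example_scores = {
    "source_authority": 1.0,
    "information_integrity": 0.5,
    "consistency": 0.5,
    "timeliness": 0.0,
}
print(total_score(example_scores, WEIGHTS))
print(extreme_indexes(example_scores))
```

Keeping the weights in a plain mapping makes it straightforward to hold a separate weight table for each intelligence type, as item 2) requires.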
Step 5: Generate the vulnerability exploitation script. Based on the generated vulnerability proving scheme report, the vulnerability exploitation script is generated from a predefined template.
Step 6: Start the vulnerability environment, execute the vulnerability exploitation script in it, and perform a simulated attack. If the attack result matches the real result, the automated reproduction of the vulnerability with the current number has succeeded and the test ends. If the attack result does not match the real result, an alarm is raised and steps 3-6 are executed in a loop until the simulated attack result matches the real result or the number of simulated attacks reaches 10, at which point the task exits. A security researcher then handles the simulated attack feedback, and the vulnerability environment and the vulnerability exploitation script are recorded in the vulnerability reproduction knowledge base.
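The retry logic of Step 6 can be sketched as a bounded loop. The sketch below is illustrative only: `generate_script` stands in for Steps 3-5 and `run_attack` for the simulated attack, and both are hypothetical callables supplied by the caller, not functions defined by the disclosure. Only the limit of 10 attempts comes from the text.

```python
from typing import Callable, Tuple

MAX_ATTEMPTS = 10  # upper bound on simulated attacks stated in Step 6


def reproduce(generate_script: Callable[[int], str],
              run_attack: Callable[[str], str],
              expected_result: str,
              max_attempts: int = MAX_ATTEMPTS) -> Tuple[bool, int]:
    """Repeat script generation and simulated attack until the result matches.

    Returns (success, number_of_attempts_made).
    """
    for attempt in range(1, max_attempts + 1):
        script = generate_script(attempt)          # placeholder for Steps 3-5
        if run_attack(script) == expected_result:  # placeholder for Step 6
            return True, attempt
        print(f"alert: attempt {attempt} did not match the real result")
    return False, max_attempts


# Toy stand-ins: the third generated script is the one that succeeds.
outcome = reproduce(
    generate_script=lambda n: f"script-{n}",
    run_attack=lambda script: "ok" if script == "script-3" else "fail",
    expected_result="ok",
)
print(outcome)
```

Returning the attempt count alongside the success flag gives the security researcher who handles the feedback a simple record of how many rounds were needed.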
CVE: Common Vulnerabilities and Exposures (CVE) is an information security database that collects information security vulnerabilities and assigns them numbers for public reference. CVE assigns a unique number to each vulnerability in the format CVE-YYYY-NNNN, where CVE is a fixed prefix, YYYY is the Gregorian year, and NNNN is the sequence number.
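Analyzing a CVE number starts with splitting it into its parts. The following is a minimal sketch; the names `CveId` and `parse_cve` are illustrative and not part of the disclosure. The pattern accepts sequence numbers of four or more digits, since CVE sequence numbers can be longer than four digits.

```python
import re
from typing import NamedTuple


class CveId(NamedTuple):
    year: int
    sequence: int


# Fixed prefix, four-digit year, then a sequence number of at least four digits.
_CVE_PATTERN = re.compile(r"CVE-(\d{4})-(\d{4,})")


def parse_cve(text: str) -> CveId:
    """Split a CVE number into its year and sequence number."""
    match = _CVE_PATTERN.fullmatch(text.strip().upper())
    if match is None:
        raise ValueError(f"not a CVE number: {text!r}")
    return CveId(year=int(match.group(1)), sequence=int(match.group(2)))


print(parse_cve("CVE-2021-44228"))
```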
NLP: Natural Language Processing (NLP) is a branch of artificial intelligence and linguistics that studies how natural language is processed and used. It covers several aspects and steps, chiefly cognition, understanding, and generation. Natural language cognition and understanding lets a computer turn the input language into meaningful symbols and relations, which are then processed further according to the purpose. A natural language generation system converts computer data into natural language.
Vulnerability information: the CVE number is analyzed to obtain the vulnerability name and description; based on these, vulnerability information is collected from public vulnerability platforms, the vulnerability intelligence platforms of relevant Internet companies, and similar sources, and is then extracted. It includes the vulnerability source, affected versions, hazard level, solution, and so on, and is used to build the vulnerability environment and to collect vulnerability proving scheme intelligence.
Vulnerability proving scheme: a proof of concept (PoC) is a short, incomplete implementation of an idea. It can be used to verify that a vulnerability or a class of vulnerabilities actually exists and to demonstrate its principle; its purpose is to verify a concept or theory.
Vulnerability exploitation script: an exploit (Exp) is a program that takes advantage of a vulnerability. A vulnerability exploitation script is written from the collected vulnerability proving scheme in order to exploit the vulnerability. For example, when a system has an SQL injection vulnerability, a vulnerability exploitation script can be written to extract information such as the database version.
Dockerfile: Docker is used to develop, deliver, and run applications. It allows users to separate applications from the infrastructure into smaller containers, which speeds up software delivery. A Dockerfile is a text file used by Docker that contains all the commands needed to build an image; it defines the contents of a single container and its behavior at startup.
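Customizing a Dockerfile from extracted vulnerability information can be pictured as filling in a template. The sketch below is an assumption-laden illustration: the template layout, the field names (`base_image`, `install_command`, `port`, `start_command`), and the example values, including the package `example-service`, are hypothetical and not taken from the disclosure. Only the Dockerfile instructions `FROM`, `RUN`, `EXPOSE`, and `CMD` are standard.

```python
from string import Template

# Hypothetical template; real values would come from the extracted
# vulnerability information (affected product, version, service port).
DOCKERFILE_TEMPLATE = Template(
    "FROM $base_image\n"
    "RUN $install_command\n"
    "EXPOSE $port\n"
    'CMD ["$start_command"]\n'
)


def render_dockerfile(info: dict) -> str:
    """Fill the template; a missing field raises KeyError instead of being guessed."""
    return DOCKERFILE_TEMPLATE.substitute(info)


example_info = {
    "base_image": "ubuntu:20.04",
    "install_command": "apt-get update && apt-get install -y example-service=1.2.3",
    "port": "8080",
    "start_command": "example-service",
}
dockerfile_text = render_dockerfile(example_info)
print(dockerfile_text)
```

Using `substitute` rather than `safe_substitute` means an incomplete extraction fails loudly, which is preferable to building an environment that silently differs from the affected version.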
Simulated attack: after the vulnerability environment has been built and the vulnerability exploitation script has been generated, the script is executed in the environment, and its execution result is checked against the expected vulnerability response. If they are consistent, the simulated attack is considered successful.
TF-IDF: the TF-IDF algorithm (Term Frequency-Inverse Document Frequency) is a statistical method commonly used in information retrieval and text mining to evaluate how important a word is to one document in a document set or corpus. The importance of a word increases in proportion to the number of times it appears in the document, and decreases with the frequency with which it appears across the corpus.
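The following sketch computes TF-IDF weights for a small tokenized corpus. It uses one common unsmoothed definition (term frequency normalized by document length, inverse document frequency as a plain logarithm); libraries often use smoothed variants, and the disclosure does not state which variant it uses. The example documents are invented for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(documents: List[List[str]]) -> List[Dict[str, float]]:
    """Return one {term: weight} mapping per tokenized document.

    tf  = occurrences of the term in the document / document length
    idf = log(number of documents / number of documents containing the term)
    """
    n_docs = len(documents)
    # Count each term once per document to get its document frequency.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


corpus = [
    ["sql", "injection", "exploit"],
    ["buffer", "overflow", "exploit"],
]
corpus_weights = tf_idf(corpus)
print(corpus_weights[0])
```

A term that appears in every document, such as "exploit" here, receives a weight of zero, which is why TF-IDF favors words that distinguish one piece of intelligence from the others.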
Dependency syntax analysis: in natural language processing, a framework that describes language structure in terms of dependencies between words is called a dependency grammar. Syntactic analysis based on dependency grammar is one of the important techniques of natural language understanding. It parses a sentence into a dependency syntax tree that describes the dependency relations between its words, that is, the syntactic collocation relations between words, which are associated with semantics.
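A dependency syntax tree can be stored as a list of tokens in which each token records its governing word. The sketch below shows only the data structure, using a hand-annotated example sentence; it is not a parser, and the sentence, the label names, and the helper functions are illustrative assumptions.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    index: int      # 1-based position in the sentence
    text: str
    head: int       # index of the governing word, 0 for the root
    relation: str   # dependency label


# Hand-annotated parse of "Attackers exploit the vulnerability" (illustrative).
SENTENCE: List[Token] = [
    Token(1, "Attackers", 2, "nsubj"),
    Token(2, "exploit", 0, "root"),
    Token(3, "the", 4, "det"),
    Token(4, "vulnerability", 2, "obj"),
]


def root(tokens: List[Token]) -> Token:
    """The token that no other word governs."""
    return next(token for token in tokens if token.head == 0)


def dependents(tokens: List[Token], head_index: int) -> List[str]:
    """Words that depend directly on the token at head_index."""
    return [token.text for token in tokens if token.head == head_index]


print(root(SENTENCE).text)
print(dependents(SENTENCE, 2))
```

Reading off the subject and object of the root verb in this way is one simple route from a dependency tree to a subject-action-object relation.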
LSTM: Long Short-Term Memory (LSTM) is a recurrent neural network suited to processing long text sequences. It has a gating structure and a memory cell and can selectively forget some information and update other information, which makes it particularly suitable for understanding the semantics of long text sequences.
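The gating behavior can be made concrete with a single step of a standard LSTM cell. The sketch below uses scalar states and hand-set parameters purely for readability; a summarization model would use vectors and trained weights, and this code does not generate summaries. The function and parameter names are illustrative.

```python
import math
from typing import Dict, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x: float, h_prev: float, c_prev: float,
              params: Dict[str, Tuple[float, float, float]]) -> Tuple[float, float]:
    """One step of a scalar LSTM cell.

    params maps each gate name ("f", "i", "o", "g") to (w_x, w_h, bias).
    Returns the new hidden state h and the new cell state c.
    """
    def pre(gate: str) -> float:
        w_x, w_h, bias = params[gate]
        return w_x * x + w_h * h_prev + bias

    f = sigmoid(pre("f"))     # forget gate: how much old memory to keep
    i = sigmoid(pre("i"))     # input gate: how much new content to write
    o = sigmoid(pre("o"))     # output gate: how much memory to expose
    g = math.tanh(pre("g"))   # candidate memory content
    c = f * c_prev + i * g    # selectively forget and update the memory
    h = o * math.tanh(c)
    return h, c


# With all parameters zero, every gate is half open and the candidate is zero.
zero_params = {gate: (0.0, 0.0, 0.0) for gate in ("f", "i", "o", "g")}
h_new, c_new = lstm_step(x=1.0, h_prev=0.0, c_prev=2.0, params=zero_params)
print(h_new, c_new)
```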
Creating the automated vulnerability reproduction knowledge base lets researchers carry out work such as vulnerability verification and exploitation quickly and conveniently, saves the time and effort they would otherwise spend collecting intelligence and reproducing vulnerabilities, and allows them to concentrate on studying vulnerability techniques.
Example 2
An embodiment of the present disclosure provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the automated vulnerability reproduction knowledge base construction method based on intelligence collection described above.
A computer-readable storage medium according to an embodiment of the present disclosure has non-transitory computer-readable instructions stored thereon. When executed by a processor, the instructions perform all or part of the steps of the methods of the embodiments of the present disclosure described above.
The computer-readable storage medium described above includes, but is not limited to: optical storage media (e.g., CD-ROM and DVD), magneto-optical storage media (e.g., MO), magnetic storage media (e.g., magnetic tape or removable hard disk), media with built-in rewritable non-volatile memory (e.g., memory card), and media with built-in ROM (e.g., ROM cartridge).
It will be appreciated by persons skilled in the art that the above description of embodiments of the application has been given for the purpose of illustrating the benefits of embodiments of the application only and is not intended to limit embodiments of the application to any examples given.
The foregoing description of embodiments of the application has been presented for purposes of illustration and description, and is not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the various embodiments described.

Claims (9)

1. An automated vulnerability reproduction knowledge base construction method based on intelligence collection, characterized by comprising the following steps:
step 1: establishing a vulnerability reproduction knowledge base, determining the CVE number of the vulnerability to be reproduced, analyzing the CVE number, and constructing a feature vector;
step 2: collecting vulnerability information based on the feature vector, extracting the vulnerability information, and reproducing a vulnerability environment; customizing a Dockerfile configuration file according to the extracted vulnerability information so as to rapidly reproduce the vulnerability environment;
step 3: collecting vulnerability proving scheme intelligence based on the vulnerability information;
step 4: analyzing the vulnerability proving scheme intelligence, performing intelligence association through a knowledge graph, performing intelligence evaluation through an intelligence evaluation mechanism, and generating a vulnerability proving scheme report;
step 5: generating a vulnerability exploitation script according to the vulnerability proving scheme report;
step 6: starting the vulnerability environment, executing the vulnerability exploitation script to perform a simulated attack, and storing the vulnerability environment and the vulnerability exploitation script in the vulnerability reproduction knowledge base;
wherein, after the vulnerability exploitation script is executed in the vulnerability environment to perform the simulated attack, step 6 further comprises:
judging whether the simulated attack result matches the real result; if so, directly storing the current vulnerability environment and vulnerability exploitation script; if not, raising an alarm and executing steps 3-6 in a loop until the results match or the number of simulated attacks reaches a set number, and then exiting the task.
2. The automated vulnerability reproduction knowledge base construction method based on intelligence collection of claim 1, wherein the vulnerability proving scheme intelligence is analyzed by NLP techniques.
3. The automated vulnerability reproduction knowledge base construction method based on intelligence collection of claim 2, wherein analyzing the vulnerability proving scheme intelligence comprises:
cleaning the vulnerability proving scheme intelligence, and extracting keywords from the cleaned intelligence through the TF-IDF algorithm;
analyzing semantic information of the intelligence text through dependency syntax analysis;
generating summary information of the intelligence text through an LSTM algorithm;
judging the type of the intelligence according to the keywords, the semantic information, and the summary content, thereby classifying and organizing the intelligence.
4. The automated vulnerability reproduction knowledge base construction method based on intelligence collection of claim 3, wherein the keywords comprise vulnerability names, affected systems/devices, and exploitation tools.
5. The automated vulnerability reproduction knowledge base construction method based on intelligence collection of claim 4, wherein performing intelligence association through the knowledge graph comprises:
extracting key information from different pieces of intelligence as nodes, the nodes serving as the basic elements for constructing the knowledge graph;
defining different types of edges to represent relations according to the internal associations between nodes, and linking related nodes so as to realize association and semantic expression between nodes;
combining the nodes and the edges to construct knowledge triples with semantics, and storing the knowledge triples in the knowledge graph;
when new intelligence is collected, extracting its nodes, querying the knowledge graph for similar or related nodes, and judging the category of the new intelligence and the scheme it relates to;
if the new intelligence is related to the knowledge graph but the related content is incomplete, extracting new content from the new intelligence to supplement it;
if the nodes in the new intelligence are not directly related to the knowledge graph, generating new knowledge triples and adding them to the knowledge graph.
6. The automated vulnerability reproduction knowledge base construction method based on intelligence collection of claim 5, wherein performing intelligence evaluation through the intelligence evaluation mechanism comprises:
determining multi-angle intelligence evaluation indexes;
determining the weight of each evaluation index according to the characteristics of the different types of intelligence;
judging the performance of the intelligence on each evaluation index according to the evaluation method, and determining the score of each index;
calculating the total score of the intelligence from each index score and its corresponding weight, thereby realizing a comprehensive evaluation of intelligence quality.
7. The automated vulnerability reproduction knowledge base construction method based on intelligence collection of claim 6, wherein the multi-angle intelligence evaluation indexes comprise source authority, information integrity, consistency check, and intelligence timeliness.
8. The automated vulnerability reproduction knowledge base construction method based on intelligence collection of claim 7, wherein if the score of an index is too high or too low, key judgment is performed on the evaluation result of that index to judge whether it significantly affects the overall evaluation of the intelligence, so as to avoid misjudgment caused by an extreme score on a single index.
9. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the automated vulnerability reproduction knowledge base construction method based on intelligence collection according to any one of claims 1-8.
CN202311041050.XA 2023-08-18 2023-08-18 Automatic vulnerability reproduction knowledge base construction method and medium based on information collection Active CN116775910B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311041050.XA CN116775910B (en) 2023-08-18 2023-08-18 Automatic vulnerability reproduction knowledge base construction method and medium based on information collection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311041050.XA CN116775910B (en) 2023-08-18 2023-08-18 Automatic vulnerability reproduction knowledge base construction method and medium based on information collection

Publications (2)

Publication Number Publication Date
CN116775910A (en) 2023-09-19
CN116775910B (en) 2023-11-24

Family

ID=87993389

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311041050.XA Active CN116775910B (en) 2023-08-18 2023-08-18 Automatic vulnerability reproduction knowledge base construction method and medium based on information collection

Country Status (1)

Country Link
CN (1) CN116775910B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106295347A (en) * 2015-05-28 2017-01-04 国家计算机网络与信息安全管理中心 For building the method and device of validating vulnerability environment
CN110688456A (en) * 2019-09-25 2020-01-14 北京计算机技术及应用研究所 Vulnerability knowledge base construction method based on knowledge graph
CN110717049A (en) * 2019-08-29 2020-01-21 四川大学 Text data-oriented threat information knowledge graph construction method
CN111259406A (en) * 2020-01-14 2020-06-09 中国传媒大学 Automatic construction method and system for cloud native application vulnerability reproduction environment
CN115357722A (en) * 2022-07-06 2022-11-18 四维创智(北京)科技发展有限公司 Correlation method of vulnerability information
CN115827895A (en) * 2022-12-12 2023-03-21 绿盟科技集团股份有限公司 Vulnerability knowledge graph processing method, device, equipment and medium
CN115859304A (en) * 2022-12-19 2023-03-28 南京理工大学 Vulnerability discovery knowledge graph construction method fusing ATT and CK frameworks
CN116318818A (en) * 2022-12-30 2023-06-23 中国人民解放军战略支援部队信息工程大学 Network security intelligent decision automatic arrangement response method and system
CN116484025A (en) * 2023-06-15 2023-07-25 北京电子科技学院 Vulnerability knowledge graph construction method, vulnerability knowledge graph evaluation equipment and storage medium
CN116595542A (en) * 2023-07-12 2023-08-15 北京安数云信息技术有限公司 Vulnerability scanning method and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11171982B2 (en) * 2018-06-22 2021-11-09 International Business Machines Corporation Optimizing ingestion of structured security information into graph databases for security analytics

Also Published As

Publication number Publication date
CN116775910A (en) 2023-09-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant