CN114900346B - Network security testing method and system based on knowledge graph

Info

Publication number: CN114900346B
Application number: CN202210461327.3A
Authority: CN
Inventors: 谢凌云; 王馨雨; 杨紫柠; 潘乐炳; 王艺婷; 帅源
Original assignee: Shanghai Institute of Microwave Technology CETC 50 Research Institute
Current assignee: Shanghai Institute of Microwave Technology CETC 50 Research Institute
Priority date: 2022-04-28
Filing date: 2022-04-28
Publication date: 2023-09-19
Anticipated expiration: 2042-04-28
Also published as: CN114900346A

Abstract

The invention provides a network security testing method and system based on a knowledge graph, the method comprising the following steps: step S1: extracting knowledge triples from network security domain text; step S2: storing the extracted knowledge triples in a database in a preset form, and constructing a network security test knowledge graph; step S3: acquiring information by querying the network security test knowledge graph; step S4: loading a network security test scheme template, and generating a network security test scheme from the queried information. By combining a triple extraction model built on the Encoder structure with a conditional random field (CRF), the invention extracts the knowledge triples related to network security testing from network security text and constructs the network security test knowledge graph, so that network security testers can query test-related information from the knowledge graph, which reduces the demands on the testers' own knowledge reserve.

Description

Network security testing method and system based on knowledge graph

Technical Field

The invention relates to the field of network security, in particular to a network security testing method and system based on a knowledge graph.

Background

With social and technological development, the number of networked information devices keeps growing, and network security problems are increasing with it. Network attackers can exploit vulnerabilities in an information system and use various attack methods against it, seriously endangering the security of the information network. As networks grow in scale, attacks become more frequent and attack methods more varied, which poses a serious challenge to network security protection. Testing the network security of information systems and improving their ability to defend against network attacks is therefore receiving more and more attention.

At present, when performing a network security test, a tester usually writes a network security test scheme by consulting related materials, and then carries out the network security test on the system under test according to the test outline, the test rules and similar content. However, network security spans many fields and its data are fragmented and massive, so the tester spends a great deal of time and energy looking up network security test material. Writing a network security test scheme also places high demands on the tester's knowledge reserve: the tester must know the relevant network security knowledge as well as the relevant information about the system under test in order to write an accurate and reliable scheme. These problems reduce, to some extent, the efficiency with which network security tests are carried out.

Patent document CN110688456A (application number: CN201910909082.4) discloses a knowledge graph-based vulnerability knowledge base construction method and relates to the technical field of network security. That method fuses the knowledge extracted from a plurality of data sources, so that knowledge from different sources undergoes heterogeneous data integration, disambiguation, processing, reasoning verification and updating under the same framework specification; data, information, methods, experience and attack-and-defense knowledge are thereby fused to form a vulnerability knowledge base. By contrast, the present invention combines a triple extraction model based on the Encoder structure with a conditional random field (CRF) to extract the knowledge triples related to network security testing from network security text and to construct a network security test knowledge graph.

Disclosure of Invention

Aiming at the defects in the prior art, the invention aims to provide a network security testing method and system based on a knowledge graph.

The network security testing method based on the knowledge graph provided by the invention comprises the following steps:

step S1: extracting a knowledge triplet from the network security domain text;

step S2: storing the extracted knowledge triples in a database in a preset form, and constructing a network security test knowledge graph;

Step S3: acquiring information by inquiring based on a network security test knowledge graph;

step S4: loading a network security test scheme template, and generating a network security test scheme by using the queried information.

Preferably, in said step S1:

based on the encoder network structure of a Transformer, in combination with a conditional random field, extracting knowledge triples related to network security testing from network security domain text;

analyzing the network security text, determining the entity categories to be extracted, and labeling the entity-relation triples present in each piece of text in the form: subject entity, relation, object entity; taking the labeled text as the training and test data of the model;

applying one-hot encoding to the labeled text to convert the input text into vectors, performing matrix operations on the input text vectors through the encoder network structure to obtain context feature vectors of the network security text sequence, capturing local features that meet preset conditions in the context by means of a multi-head attention mechanism, and, based on the extracted feature vectors, predicting the triples in the input text by means of a conditional random field;

for the model that extracts the knowledge triples, adjusting model parameters according to training results and training the model multiple times so that the accuracy of knowledge extraction reaches preset requirements; the adjustable model parameters comprise the number of training rounds, batch size, learning rate, dropout rate and optimization function; and extracting the knowledge triples from unlabeled network security text by using the trained model.

Preferably, in said step S2:

storing the extracted knowledge triples into a Neo4j database, and constructing a network security test knowledge graph, wherein the storage form is as follows: nodes, attributes, attribute values; or the storage form is as follows: nodes, relations, nodes;

determining the storage form of the extracted network security knowledge triples in Neo4j according to the priori knowledge;

storing the processed knowledge triples into Neo4j respectively by using a Cypher language; the storage form is as follows: nodes, attributes, attribute values; alternatively, the storage form is: nodes, relationships, nodes.

Preferably, in said step S3:

based on the constructed knowledge graph, acquiring a network security entity and attribute information thereof through node inquiry, and acquiring information related to the entity node through path inquiry;

the information related to the network security test is queried through the Cypher language, which concretely comprises two information query modes:

a. node information query: inputting entity names, setting query conditions by using a WHERE command, matching nodes which are the same as the input entity names in a network security test knowledge graph by using a MATCH command, and returning entity information meeting the preset query conditions by using a RETURN command;

b. Node path query: and inputting entity names and path names, matching the node and node paths meeting preset conditions on the network security test knowledge graph through MATCH and WHERE commands, and returning the node attribute information and the association relation on the paths by utilizing RETURN commands.

Preferably, in said step S4:

loading a network security test scheme template by using a Python-docx library, and obtaining a mapping relation between the template and the test scheme; the template content comprises a tested object, a testing method and a testing tool;

the mapping relation between the template and the test scheme is expressed in the form of a dictionary, the template information is the key in the dictionary, and the knowledge inquired from the knowledge graph is the value corresponding to the key in the dictionary;

and converting the keys and the values in the dictionary into corresponding test outline and test rules, and generating a corresponding network security test scheme.

The invention provides a network security testing system based on a knowledge graph, which comprises the following components:

module M1: extracting a knowledge triplet from the network security domain text;

module M2: storing the extracted knowledge triples in a database in a preset form, and constructing a network security test knowledge graph;

module M3: acquiring information by inquiring based on a network security test knowledge graph;

Module M4: loading a network security test scheme template, and generating a network security test scheme by using the queried information.

Preferably, in said module M1:

Preferably, in said module M2:

Preferably, in said module M3:

Preferably, in said module M4:

Compared with the prior art, the invention has the following beneficial effects:

1. by combining a triple extraction model built on the Encoder structure with a conditional random field (CRF), the invention extracts the knowledge triples related to network security testing from network security text and constructs the network security test knowledge graph, so that network security testers can query test-related information from the knowledge graph, which reduces the demands on the testers' own knowledge reserve;

2. According to the invention, by utilizing the constructed network security test knowledge graph, a tester can automatically generate a network security test scheme by loading a network security test scheme template, so that the intelligent level of scheme design is improved;

3. the invention utilizes the generated network security test scheme, and the tester can rapidly and efficiently test the network security, thereby improving the execution efficiency of the network security test.

Drawings

Other features, objects and advantages of the present invention will become more apparent upon reading of the detailed description of non-limiting embodiments, given with reference to the accompanying drawings in which:

FIG. 1 is a flow chart of a network security test method;

FIG. 2 is a schematic diagram of a network security knowledge triplet extraction model;

FIG. 3 is a diagram of the internal architecture of a single Encoder;

FIG. 4 is a Self-attention structure diagram.

Detailed Description

The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the present invention, but are not intended to limit the invention in any way. It should be noted that variations and modifications could be made by those skilled in the art without departing from the inventive concept. These are all within the scope of the present invention.

Example 1:

according to the network security testing method based on the knowledge graph, as shown in fig. 1-4, the method comprises the following steps:

step S1: extracting a knowledge triplet from the network security domain text;

Specifically, in the step S1:

Specifically, in the step S2:

Specifically, in the step S3:

Specifically, in the step S4:

Example 2:

example 2 is a preferable example of example 1 to more specifically explain the present invention.

A person skilled in the art may understand the network security testing method based on a knowledge graph provided by the present invention as a specific implementation manner of the network security testing system based on a knowledge graph, that is, the network security testing system based on a knowledge graph may be implemented by executing the step flow of the network security testing method based on a knowledge graph.

Specifically, in the module M1:

Specifically, in the module M2:

Specifically, in the module M3:

Specifically, in the module M4:

Example 3:

example 3 is a preferable example of example 1 to more specifically explain the present invention.

Aiming at the defects in the prior art, the technical problems to be solved by the invention are as follows:

1) Constructing a network security test knowledge graph by utilizing massive and fragmented network security data;

2) And generating a network security test scheme by using the constructed network security test knowledge graph.

The invention aims to provide a network security testing method based on a knowledge graph, which is convenient for testing personnel to test the network security of equipment and systems. The method comprises the following steps:

step S100, extracting knowledge triples related to network security testing from network security domain text, based on the encoder network structure (Encoder) of a Transformer in combination with a Conditional Random Field (CRF);

step S200, storing the extracted knowledge triples in a Neo4j database in the form of < nodes, attributes, attribute values > and < nodes, relations and nodes > to construct a network security test knowledge graph;

Step S300, based on the constructed knowledge graph, acquiring network security entities and attribute information thereof through node inquiry, and acquiring information related to entity nodes through path inquiry;

step S400, loading a network security test scheme template, and automatically generating a corresponding network security test scheme by using the queried knowledge.

The extracting the network security knowledge triples in step S100 specifically includes:

in step S101, in the network security text, the entity-relation triples present in each text segment are labeled in the form <subject entity, relation, object entity>, for example <device name, exists, vulnerability name> and <tool name, attack tool, vulnerability name>, and the labeled text is used as the training and test data of the model.

Step S102, applying one-hot encoding to the labeled text to convert the input text into vectors, performing matrix operations on the input text vectors through the Encoder model to obtain context feature vectors of the network security text sequence, and capturing key local features in the context by means of the multi-head Self-attention mechanism. Finally, based on the extracted feature vectors, the <subject entity, relation, object entity> triples in the input text are predicted by the CRF model.
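The one-hot encoding stage of step S102 can be illustrated with a minimal sketch. It assumes character-level encoding and a vocabulary built from the input itself; the function names `build_vocab` and `one_hot_encode` and the sample sentence are hypothetical and do not come from the patent.

```python
def build_vocab(texts):
    """Map each distinct character to an integer index, in first-seen order."""
    vocab = {}
    for text in texts:
        for ch in text:
            vocab.setdefault(ch, len(vocab))
    return vocab


def one_hot_encode(text, vocab):
    """Return one row per character; each row has a single 1 at the character's index."""
    rows = []
    for ch in text:
        row = [0] * len(vocab)
        row[vocab[ch]] = 1  # raises KeyError for characters outside the vocabulary
        rows.append(row)
    return rows


# Hypothetical sample sentence, used only to build a small vocabulary.
vocab = build_vocab(["router has CVE"])
matrix = one_hot_encode("has", vocab)
```

Each row of `matrix` is the vector for one input character; in the described method such vectors would then be passed to the Encoder for the matrix operations.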

Step S103, for the model that extracts the knowledge triples in step S102, the model parameters need to be adjusted according to the training results and the model trained multiple times, so that the accuracy of knowledge extraction meets the usage requirement. The adjustable model parameters include the number of training rounds, batch size, learning rate, dropout rate, optimization function and the like. Finally, the trained model is used to extract the knowledge triples from unlabeled network security text.

After step S100, step S200 stores the extracted knowledge triples in a graph database Neo4j, and builds a network security test knowledge graph, where step S200 specifically includes:

step S201, determining, according to prior knowledge, the storage form of the extracted network security knowledge triples in Neo4j. For example, <vulnerability name, includes, vulnerability score> belongs to the category <node, attribute, attribute value>, and <tool name, attack tool, vulnerability name> belongs to the category <node, relationship, node>.
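As a rough illustration of step S201, the choice of storage form can be sketched as a lookup against prior knowledge. The set `ATTRIBUTE_RELATIONS` and the relation names in it are assumptions made for this example; the patent does not specify how the prior knowledge is encoded.

```python
# Hypothetical prior knowledge: relations whose object is a plain value
# (for example a score) rather than another entity.
ATTRIBUTE_RELATIONS = {"includes"}


def storage_form(triple, attribute_relations=ATTRIBUTE_RELATIONS):
    """Return the Neo4j storage form chosen for a (subject, relation, object) triple."""
    _subject, relation, _obj = triple
    if relation in attribute_relations:
        return ("node", "attribute", "attribute value")
    return ("node", "relationship", "node")


score_form = storage_form(("vulnerability name", "includes", "vulnerability score"))
tool_form = storage_form(("tool name", "attack tool", "vulnerability name"))
```

Keeping the rule in a data structure rather than in code means a tester could extend the prior knowledge without touching the storage logic.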

Step S202, after step S201, the processed knowledge triples are stored in Neo4j in the forms <node, attribute, attribute value> and <node, relationship, node> respectively, using Cypher commands such as CREATE and LOAD.

After step S200, step S300 queries information related to the network security test by a command such as MATCH, WHERE, RETURN in the Cypher language. The step S300 specifically includes two information query methods:

1. Node information query. An entity name is input, a query condition is set with the WHERE command, nodes whose names match the input entity name are matched in the network security test knowledge graph with the MATCH command, and the entity information satisfying the query condition is returned with the RETURN command.

2. Node path query. An entity name and a path name are input, the nodes and node paths satisfying the conditions are matched in the network security test knowledge graph with the MATCH and WHERE commands, and the attribute information of the nodes on the path and their association relations are returned with the RETURN command.
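The two query modes can be sketched as small Python helpers that build Cypher text plus a parameter dictionary. This is only an illustration: the node property `name`, the helper names, and the example entity and relationship names are assumptions, and the queries are not executed against a database here.

```python
import re

# Relationship types cannot be passed as Cypher parameters, so they are
# validated before being placed into the query text.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def node_query(entity_name):
    """Node information query: match nodes whose `name` equals the input."""
    return "MATCH (n) WHERE n.name = $name RETURN n", {"name": entity_name}


def path_query(entity_name, rel_type):
    """Node path query: follow one named relationship from the input node."""
    if not _IDENTIFIER.match(rel_type):
        raise ValueError(f"invalid relationship type: {rel_type!r}")
    query = (
        f"MATCH (n)-[r:{rel_type}]->(m) "
        "WHERE n.name = $name "
        "RETURN n, type(r) AS relation, m"
    )
    return query, {"name": entity_name}


# Hypothetical entity and relationship names.
q_node, p_node = node_query("RouterX")
q_path, p_path = path_query("ExampleScanner", "ATTACK_TOOL")
```

Passing the entity name as the `$name` parameter instead of concatenating it into the query string avoids quoting problems with user input.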

After step S300, step S400 automatically generates a corresponding network security test scheme by loading a network security test scheme template and using the queried knowledge. The step S400 specifically includes:

step S401, loading a network security test scheme template by using the Python-docx library, and obtaining the mapping relation between the template and the test scheme. The template content comprises framework information such as the tested object, the test method and the test tool;

step S402, representing the mapping relation between the template and the test scheme in the form of a dictionary, wherein the template information is keys in the dictionary, and the knowledge queried from the knowledge graph is values corresponding to the keys in the dictionary;

step S403, converting the keys and the values in the dictionary into the corresponding test outline and test rules, and generating the corresponding network security test scheme.
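Steps S402 and S403 can be sketched with a plain dictionary. The field names follow the template content listed in step S401; the queried values, the helper names and the plain-text rendering are assumptions for illustration, and the Word-document output via Python-docx is left out.

```python
# Template fields named in step S401.
TEMPLATE_FIELDS = ["tested object", "test method", "test tool"]


def fill_template(fields, queried):
    """Build the template-to-content dictionary; fields with no knowledge stay empty."""
    return {field: queried.get(field, "") for field in fields}


def render(mapping):
    """Turn each key/value pair into one line of the test scheme."""
    return "\n".join(f"{key}: {value}" for key, value in mapping.items())


# Hypothetical knowledge returned by the knowledge graph queries.
queried = {"tested object": "RouterX", "test tool": "ExampleScanner"}
mapping = fill_template(TEMPLATE_FIELDS, queried)
scheme_text = render(mapping)
```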

Example 4:

example 4 is a preferable example of example 1 to more specifically explain the present invention.

The technical scheme of the invention will be clearly and completely described below with reference to the accompanying drawings.

Referring to fig. 1, the network security testing method based on the knowledge graph provided by the invention comprises the following steps:

step S100, extracting, by a knowledge extraction algorithm, a knowledge triplet related to a network security test from a network security domain text, where step S100 includes:

in step S101, the network security text is first analyzed to determine the entity types to be extracted, such as equipment, tools and vulnerabilities; then, according to expert knowledge, the <subject entity, relation, object entity> triples in the text are labeled, for example <equipment name, exists, vulnerability name> and <tool name, attack tool, vulnerability name>;

in step S102, a joint entity-relation extraction model is used; the input of the model is text and the output is the triples (subject entity s, relation p, object entity o) in the text. s is predicted first, then s is fed back in to predict the o corresponding to s, and then s and o are fed in to predict the relation p between s and o. The extraction model for network security knowledge triples is shown in FIG. 2.

Network security data is first converted into Token, Segment and Position vectors for model input. Token is the input sequence of the text and represents the text content; Segment is sentence-segment information, with the first sentence represented by '1' and the second sentence by '0'; Position is position information representing the position index of each input character in the library. Each input of the model consists of Token + Segment + Position and, when passed to the Encoder, is converted into the Input Embedding and Position inputs of the Encoder. The structure of a single Encoder is shown in FIG. 3 and includes Self-attention, Add & Norm and Feed-Forward layers.
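A minimal sketch of how the three input sequences might be assembled for a one- or two-sentence input is shown below. It assumes character-level tokens and takes Position to be the index of each character within the input sequence, which is one possible reading of the description; `build_inputs` is a hypothetical helper.

```python
def build_inputs(first, second=""):
    """Return (tokens, segments, positions) for up to two sentences."""
    tokens = list(first) + list(second)
    # Per the description: first sentence is marked '1', second sentence '0'.
    segments = [1] * len(first) + [0] * len(second)
    # Assumption: position is the running index within the input sequence.
    positions = list(range(len(tokens)))
    return tokens, segments, positions


tokens, segments, positions = build_inputs("ab", "c")
```

In a trained model each of the three sequences would be looked up in its own embedding table and the results summed; that step is omitted here.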

The Self-attention structure is shown in FIG. 4. It reads the context information of each Input Embedding according to Position by means of three vectors of the same length, namely a query vector Q, a key vector K and a value vector V. The query vector Q, the key vector K and the value vector V are calculated as follows:

$$Q = XW^Q,\quad K = XW^K,\quad V = XW^V$$

where $X$ is the input matrix and $W^Q$, $W^K$ and $W^V$ are weight matrices obtained through model training. The output $Y$ of Self-attention is

$$Y = \mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$

where $\sqrt{d_k}$ is the penalty factor, with $d_k$ the dimension of the key vectors; its effect is to ensure that the product of Q and K does not become too large.
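The Self-attention computation can be sketched in plain Python. The sketch assumes the standard Transformer form, in which the penalty factor is the square root of the key dimension $d_k$ and a softmax is applied to each row of scores; the helper names and the tiny identity weight matrices are illustrative only.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Compute softmax(Q K^T / sqrt(d_k)) V for a single attention head."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)


identity = [[1.0, 0.0], [0.0, 1.0]]
y = self_attention(identity, identity, identity, identity)
```

With identity inputs and weights, each output row equals its attention weights, so the rows sum to 1 and each position attends most strongly to itself.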

Self-attention outputs the processed data to the Add & Norm layer. Add sums the input and the output of the Self-attention layer, and Norm normalizes the sum so that the layer outputs a list of word vectors with a mean of 0 and a variance of 1. The output word vector list is then processed by the Feed-Forward layer and a further Add & Norm layer to obtain new word vectors.
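The Add & Norm step can be sketched as a residual sum followed by per-vector normalization. This sketch leaves out the learnable gain and bias that layer normalization usually carries, and the small `eps` term is an assumption added for numerical stability.

```python
import math


def add_and_norm(x, sublayer_out, eps=1e-6):
    """Add the sublayer input and output, then normalize each word vector."""
    result = []
    for x_row, s_row in zip(x, sublayer_out):
        summed = [a + b for a, b in zip(x_row, s_row)]           # Add
        mean = sum(summed) / len(summed)
        var = sum((v - mean) ** 2 for v in summed) / len(summed)
        result.append([(v - mean) / math.sqrt(var + eps) for v in summed])  # Norm
    return result


normed = add_and_norm([[1.0, 2.0, 3.0, 4.0]], [[0.5, 0.0, -0.5, 1.0]])
```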

After matrix operations through a plurality of Encoders, the final text feature vector is obtained. Based on this text feature vector, the CRF classifier calculates the score of each possible knowledge triplet label sequence and outputs the sequence with the highest score as the extraction result for the network security text. The CRF calculates the score as follows:

$$L(y_1,\dots,y_m) = b(y_1) + \sum_{t=1}^{m} s_t(y_t) + \sum_{t=1}^{m-1} T(y_t, y_{t+1}) + e(y_m)$$

where $L(y_1,\dots,y_m)$ is the total score, $b(y_1)$ and $e(y_m)$ are the scores of the initial state and the end state respectively, $s_t(y_t)$ is the score when the entity label is $y_t$, and $T(y_t, y_{t+1})$ is the score of the transition from entity label $y_t$ to entity label $y_{t+1}$.
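Selecting the highest-scoring label sequence is commonly done with Viterbi decoding, sketched below. The sketch assumes the usual linear-chain form in which the total score is the sum of the start score, the per-position label scores, the transition scores and the end score; the patent does not name the decoding algorithm, and the example scores are made up.

```python
def sequence_score(tags, emissions, transitions, start, end):
    """Total score: b(y_1) + sum of s_t(y_t) + sum of T(y_t, y_t+1) + e(y_m)."""
    total = start[tags[0]] + end[tags[-1]]
    total += sum(emissions[t][tag] for t, tag in enumerate(tags))
    total += sum(transitions[a][b] for a, b in zip(tags, tags[1:]))
    return total


def viterbi(emissions, transitions, start, end):
    """Return (best label sequence, its total score) by dynamic programming."""
    n_tags = len(start)
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    back = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            pointers.append(best_i)
        score = new_score
        back.append(pointers)
    final = [score[j] + end[j] for j in range(n_tags)]
    last = max(range(n_tags), key=lambda j: final[j])
    path = [last]
    for pointers in reversed(back):  # follow back-pointers from the last position
        path.append(pointers[path[-1]])
    path.reverse()
    return path, final[last]


# Made-up scores for a 3-character input and 2 labels.
emissions = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
transitions = [[0.5, -1.0], [-1.0, 0.5]]
start, end = [0.0, 0.0], [0.0, 0.0]
best_path, best_score = viterbi(emissions, transitions, start, end)
```

Viterbi avoids enumerating every label sequence, whose number grows exponentially with the text length, while still returning the sequence with the maximum total score.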

By this method, the key technology of network security knowledge extraction can be realized, and knowledge triples are provided for the subsequent construction of the network security test knowledge graph.

Step S103, when the model described in step S102 is used to extract network security knowledge triples, parameters such as the number of training rounds, batch size, learning rate, dropout rate and optimization function are adjusted according to the accuracy, recall and F1 value of the experimental results, so that the final experimental results are optimal and the training error of the model converges quickly. Finally, the trained model is used to extract the knowledge triples from unlabeled network security text.
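The evaluation metrics used for tuning can be sketched as set comparisons between predicted and labeled triples. The sketch assumes that the "accuracy" of the description corresponds to precision over extracted triples and that a triple counts as correct only on an exact match; the example triples are hypothetical.

```python
def precision_recall_f1(predicted, gold):
    """Compare predicted triples with labeled triples by exact match."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical labeled and predicted triples.
gold = [("RouterX", "exists", "VULN-A"), ("ExampleScanner", "attack tool", "VULN-A")]
predicted = [("RouterX", "exists", "VULN-A"), ("RouterX", "exists", "VULN-B")]
precision, recall, f1 = precision_recall_f1(predicted, gold)
```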

step S201, determining, according to prior knowledge, the storage form of the extracted network security knowledge triples in Neo4j. For example, <vulnerability name, includes, vulnerability score> belongs to the category <node, attribute, attribute value>, and <tool name, attack tool, vulnerability name> belongs to the category <node, relationship, node>. For knowledge triples stored in the form <node, relationship, node>, each type of entity is a type of node in the Neo4j database, and the relation between the two entities is the connection between the nodes. For knowledge triples stored in the form <node, attribute, attribute value>, the subject entity is a type of node in the Neo4j database, the object entity is an attribute value held by that node, and the relation between the two is the relation between the node and the attribute value.

Step S202, after the node types, attribute types and relationship types have been determined in step S201, each central node is first created with the CREATE command and, based on that node, the knowledge triples of the <node, attribute, attribute value> type are stored in the Neo4j database in the form "node (attribute 1: attribute value 1, attribute 2: attribute value 2, ...)". Node association relations are then created, connecting associated nodes in the node-relationship-node pattern, which completes the storage of the knowledge triples of the <node, relationship, node> type.
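The two storage operations can be sketched as helpers that produce Cypher CREATE statements together with their parameters. The node labels, the `name` property and the example values are assumptions for illustration; the statements are only built here, not sent to a Neo4j instance.

```python
import re

# Labels and relationship types cannot be Cypher parameters, so they are
# validated before being placed into the statement text.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check(identifier):
    if not _IDENTIFIER.match(identifier):
        raise ValueError(f"invalid identifier: {identifier!r}")


def create_node(label, name, attributes):
    """<node, attribute, attribute value>: one node carrying its attributes."""
    _check(label)
    props = {"name": name, **attributes}
    return f"CREATE (n:{label} $props)", {"props": props}


def create_relationship(src_name, rel_type, dst_name):
    """<node, relationship, node>: connect two existing nodes."""
    _check(rel_type)
    statement = (
        "MATCH (a {name: $src}), (b {name: $dst}) "
        f"CREATE (a)-[:{rel_type}]->(b)"
    )
    return statement, {"src": src_name, "dst": dst_name}


# Hypothetical vulnerability node and tool relationship.
node_stmt, node_params = create_node("Vulnerability", "VULN-EXAMPLE", {"score": 7.5})
rel_stmt, rel_params = create_relationship("ExampleScanner", "ATTACK_TOOL", "VULN-EXAMPLE")
```

Creating all nodes before any relationships mirrors the order described in step S202, since the relationship statement matches nodes that must already exist.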

After step S200, the construction of the network security test knowledge graph is implemented. At this time, step S300 will query the constructed knowledge graph for information related to the network security test by commands such as MATCH, WHERE, RETURN in the Cypher language. The query method of step S300 includes:

1. Node information query. An entity name is input, a query condition is set with the WHERE command, nodes whose names match the input entity name are matched in the network security test knowledge graph with the MATCH command, and the entity information satisfying the query condition is returned with the RETURN command. When querying, either an exact query can be used to find nodes whose names are identical to the input entity name, or a fuzzy query can be used to find nodes in the network security test knowledge graph whose names contain the input entity name.

2. Node path query. An entity name and a path name are input; the node is located in the network security test knowledge graph by the input entity name, and the other nodes associated with it are then matched by the input path name. Node path queries include one-hop queries and multi-hop queries. A one-hop query finds the nodes directly associated with the input node; a multi-hop query finds the nodes indirectly associated with the input node through multiple paths. A node path query can return all node information and path information along the entire path.
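The fuzzy query and the multi-hop query can be sketched with the Cypher `CONTAINS` operator and a variable-length path pattern. As before, the `name` property, the helper names and the hop limit are assumptions, and the query text is only constructed, not executed.

```python
def fuzzy_node_query(fragment):
    """Fuzzy query: match nodes whose name contains the input text."""
    return "MATCH (n) WHERE n.name CONTAINS $fragment RETURN n", {"fragment": fragment}


def multi_hop_query(entity_name, max_hops):
    """Multi-hop query: return every path of 1..max_hops hops from the input node."""
    if not isinstance(max_hops, int) or max_hops < 1:
        raise ValueError("max_hops must be a positive integer")
    query = (
        f"MATCH p = (n)-[*1..{max_hops}]-(m) "
        "WHERE n.name = $name "
        "RETURN nodes(p) AS nodes, relationships(p) AS relations"
    )
    return query, {"name": entity_name}


fuzzy_q, fuzzy_p = fuzzy_node_query("Router")
hop_q, hop_p = multi_hop_query("RouterX", 3)
```

Setting `max_hops` to 1 gives the one-hop case; bounding the hop count keeps the query from walking the whole graph.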

The step S400 may generate a corresponding network security test scheme according to the network security test knowledge acquired in the step S300. The step S400 specifically includes:

step S401, loading the network security test scheme template by using the Python-docx library. Python-docx can parse the template to obtain information such as the primary, secondary and tertiary titles of the network test scheme template, whose specific content comprises the tested object, the test method and the test tool.

Step S402, after the template framework information of the network security test scheme has been obtained in step S401, related information can be queried in the network security test knowledge graph according to the test task and filled into the template framework. For a known tested object, a node information query with the name of the tested object as input returns information such as the vulnerabilities and weaknesses of the tested object. A node path query is then used to find which tools can carry out the corresponding network security tests against those vulnerabilities and weaknesses, and how those tools are used.

When corresponding test information is queried, a mapping form of a template title and a query result is established in the form of a dictionary, wherein the title in the template is a key in the dictionary, the queried corresponding information is a value corresponding to the title key, and the nested relation in the dictionary is the relation among a primary title, a secondary title and a tertiary title in the network test scheme template.

Step S403, after the dictionary mapping between the template titles and the test content has been obtained in step S402, the keys and their corresponding values in the dictionary are traversed in order and converted into the corresponding test outline and test rules, and the corresponding network security test scheme is generated by using Python-docx. The keys of the outermost layer are the level-1 titles of the network security test scheme, the keys of the second layer are the level-2 titles, and so on.
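The traversal of the nested dictionary can be sketched as a recursive generator that yields the title level along with each title. The dictionary contents and the name `iter_outline` are hypothetical; writing the result into a Word document is left out so that the sketch needs only the standard library.

```python
def iter_outline(mapping, level=1):
    """Yield (level, title, body) in document order; body is None for pure headings."""
    for title, content in mapping.items():
        if isinstance(content, dict):
            yield level, title, None
            yield from iter_outline(content, level + 1)  # nested keys are one level deeper
        else:
            yield level, title, content


# Hypothetical mapping: outermost keys are level-1 titles, nested keys level-2.
scheme = {
    "Tested object": {"Vulnerabilities": "VULN-EXAMPLE"},
    "Test tool": "ExampleScanner",
}
outline = list(iter_outline(scheme))
```

With python-docx, each yielded item could then be written using `Document.add_heading(title, level=level)` and, where a body is present, `Document.add_paragraph(body)`.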

By using the network security test scheme generated in step S400, a tester can carry out the network security test according to the test tool, test method, test target and other information in the scheme, without needing an extensive knowledge reserve.

Those skilled in the art will appreciate that the systems, apparatus, and their respective modules provided herein may be implemented entirely by logic programming of method steps such that the systems, apparatus, and their respective modules are implemented as logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, etc., in addition to the systems, apparatus, and their respective modules being implemented as pure computer readable program code. Therefore, the system, the apparatus, and the respective modules thereof provided by the present invention may be regarded as one hardware component, and the modules included therein for implementing various programs may also be regarded as structures within the hardware component; modules for implementing various functions may also be regarded as being either software programs for implementing the methods or structures within hardware components.

The foregoing describes specific embodiments of the present application. It is to be understood that the application is not limited to the particular embodiments described above, and that various changes or modifications may be made by those skilled in the art within the scope of the appended claims without affecting the spirit of the application. The embodiments of the application and the features of the embodiments may be combined with each other arbitrarily without conflict.

Claims

1. The network security testing method based on the knowledge graph is characterized by comprising the following steps of:

step S1: extracting a knowledge triplet from the network security domain text;

step S4: loading a network security test scheme template, and generating a network security test scheme by using the queried information;

in the step S1:

analyzing the network security text, determining the entity types to be extracted, and labeling the entity-relation triples present in each piece of text in the form: subject entity, relation, object entity; taking the labeled text as the training and test data of the model;

for the model that extracts the knowledge triples, adjusting model parameters according to training results and training the model multiple times so that the accuracy of knowledge extraction reaches preset requirements; the adjustable model parameters comprise the number of training rounds, batch size, learning rate, dropout rate and optimization function; extracting the knowledge triples from unlabeled network security text by using the trained model;

in the step S4:

Converting keys and values in the dictionary into corresponding test outline and test rules, and generating a corresponding network security test scheme;

after the template framework information of the network security test scheme is obtained, related information is queried in the network security test knowledge graph according to the test task and filled into the template framework; for a known tested object, the name of the tested object is input in a node information query to obtain the vulnerability and weakness information of the tested object; a node path query is used to find the tools capable of carrying out the corresponding network security tests against the vulnerabilities and weaknesses, and the methods of using those tools;

when corresponding test information is queried, a mapping form of a template title and a query result is established in the form of a dictionary, wherein the title in the template is a key in the dictionary, the queried corresponding information is a value corresponding to the title key, and the nesting relationship in the dictionary is the relationship among a primary title, a secondary title and a tertiary title in the network test scheme template;

after dictionary mapping of the template title and the test content is obtained, sequentially traversing the keys and the corresponding values in the dictionary, converting the keys and the values into corresponding test outline and test detail, generating a corresponding network security test scheme by using a Python-docx plug-in, wherein the key at the outermost layer is a 1-level title of the network security test scheme, and the key at the second layer is a 2-level title.

2. The network security testing method based on the knowledge-graph according to claim 1, wherein in the step S2:

3. The network security testing method based on the knowledge-graph according to claim 1, wherein in the step S3:

4. A knowledge-graph-based network security testing system, comprising:

module M4: loading a network security test scheme template, and generating a network security test scheme by using the queried information;

in the module M1:

in the module M4:

5. The knowledge-graph-based network security test system of claim 4, wherein in said module M2:

6. The knowledge-graph-based network security test system of claim 4, wherein in said module M3: