CN115860117B - MDTA knowledge extraction method and system based on attack and defense behaviors - Google Patents


Publication number
CN115860117B
Authority
CN
China
Legal status
Active
Application number
CN202310149931.7A
Other languages
Chinese (zh)
Other versions
CN115860117A (en)
Inventor
贾焰
方滨兴
顾钊铨
魏松漩
张欢
廖清
闫昊
杜磊
高翠芸
罗文坚
Current Assignee
Shenzhen Graduate School Harbin Institute of Technology
Original Assignee
Shenzhen Graduate School Harbin Institute of Technology
Application filed by Shenzhen Graduate School Harbin Institute of Technology
Priority to CN202310149931.7A
Publication of CN115860117A
Application granted
Publication of CN115860117B
Legal status: Active

Landscapes

  • Computer And Data Communications (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention discloses an MDATA knowledge extraction method and system based on attack and defense behaviors. Using deep-learning-based artificial intelligence and natural language processing, the method records attack and defense behavior data during network attack and defense exercises, jointly analyzes the data of the attacking and defending parties, removes all invalid attack steps, and extracts all valid attack steps as MDATA knowledge to build a network security knowledge base. In this way, the spatiotemporal characteristics of an attacker's attack process are extracted from comprehensive, rich attack and defense behavior data, which improves the effectiveness of knowledge extraction.

Description

MDATA knowledge extraction method and system based on attack and defense behaviors
Technical Field
The present disclosure relates to the field of knowledge extraction, and in particular to an MDATA knowledge extraction method and system based on attack and defense behaviors.
Background
MDATA is a knowledge representation model that can effectively represent the spatiotemporal attributes of network attacks, so storing network security knowledge in an MDATA knowledge base is a practical way to build a network security knowledge graph. To acquire network security knowledge and build such a knowledge base, conventional methods usually extract attack knowledge from network security text, which includes unstructured and semi-structured data such as security bulletins, technical blogs, and the CVE vulnerability database.
To extract attack knowledge from unstructured data such as security bulletins and technical blogs, existing methods use natural language processing to identify entities in the text (such as indicators of compromise (IOCs), attack types, assets, and vulnerabilities) and the relationships between them (such as exploitation and inclusion). For example, iACE crawls technical blogs from blog websites, uses a preprocessor to remove irrelevant content, and then uses a relevant-content picker and a relationship checker to extract IOCs and their context terms, which are converted into a standard IOC format; however, the knowledge it extracts is relatively simple and has no spatiotemporal attributes. TTPDrill defines an ontology based on ATT&CK and CAPEC, crawls technical articles from blog websites, removes irrelevant pages and content, generates candidate threat actions with NLP part-of-speech analysis, computes the similarity between each candidate and the ontology using BM25-weighted TF-IDF (an information retrieval technique), and maps a candidate to the ontology when the similarity exceeds a threshold; the knowledge it extracts has temporal attributes but no spatial attributes. ChainSmith crawls web articles and then classifies IOCs into specific attack phases using, in sequence, an expression detector, a parser, a semantic analyzer, named entity recognition, and an IOC classifier; its knowledge is likewise simple (IOCs only) and has no spatial attributes. Extractor represents an attacker's techniques as a provenance graph by normalizing a security report (synonym substitution, active/passive voice conversion, etc.), resolving it (subject completion, noun disambiguation, etc.), summarizing it (removing redundant sentences and words), and generating a graph (semantic role labeling and causal inference); the knowledge it extracts is rich and has temporal attributes, but it lacks the spatial attributes of attacks.
These methods show that, because natural language is diverse and network security text is dense with technical terms, extracting structured data from unstructured text is extremely difficult, and the extracted network security knowledge is relatively simple and lacks the spatial features of network attacks. Spatial features are an important characteristic of network attacks: they represent attacks well and therefore help in detecting them.
Extracting attack knowledge from semi-structured text such as the CVE vulnerability database and the ATT&CK matrix faces the same problem. The CVE vulnerability database focuses on knowledge such as the impact and remediation of each vulnerability, but it cannot show in detail the spatial characteristics an attacker exhibits while exploiting the vulnerability step by step to achieve an attack objective. The ATT&CK matrix focuses on classifying single-step attacks and the stages in which they are used, and it cannot represent the spatial characteristics of an attack.
Given these shortcomings, a method for extracting MDATA knowledge from attack and defense behavior data is desirable. Because such data is comprehensive and rich, it can express the spatiotemporal characteristics of an attacker's attack process, so rich MDATA knowledge containing spatiotemporal characteristics can be extracted from it, overcoming the shortcomings of existing methods.
Disclosure of Invention
The present application is made to solve the above technical problems. Embodiments of the present application provide an MDATA knowledge extraction method and system based on attack and defense behaviors, which use deep-learning-based artificial intelligence and natural language processing to record attack and defense behavior data during network attack and defense exercises, jointly analyze the data of the attacking and defending parties, remove all invalid attack steps, and extract all valid attack steps as MDATA knowledge to build a network security knowledge base. In this way, the spatiotemporal characteristics of an attacker's attack process are extracted from comprehensive, rich attack and defense behavior data, which improves the effectiveness of knowledge extraction.
According to one aspect of the present application, an MDATA knowledge extraction method based on attack and defense behaviors is provided, including: building a target network in a cyber range to simulate a network environment, and setting up an attacker and a defender in the target network; recording first attack data of the attacker and first defense data of the defender; jointly analyzing the first attack data and the first defense data to obtain a joint analysis result, which indicates whether the attacker's attack is effective; and extracting network security data knowledge based on the joint analysis result.
In the above method, jointly analyzing the first attack data and the first defense data to obtain a joint analysis result indicating whether the attacker's attack is effective includes: passing the first attack data through a first context encoder that includes an embedding layer to obtain an attack data semantic understanding feature vector; passing the first defense data through a second context encoder that includes an embedding layer to obtain a defense data semantic understanding feature vector; performing association encoding on the attack data semantic understanding feature vector and the defense data semantic understanding feature vector to obtain a joint representation matrix; performing class-center offset correction on the joint representation matrix, based on the two feature vectors, to obtain an optimized joint representation matrix; and passing the optimized joint representation matrix through a classifier to obtain a classification result indicating whether the attacker's attack is effective.
In the above method, passing the first attack data through the first context encoder that includes an embedding layer to obtain the attack data semantic understanding feature vector includes: passing the first attack data through the embedding layer to convert it into a sequence of attack embedding vectors, where the embedding layer encodes the first attack data using a learnable embedding matrix; inputting the sequence of attack embedding vectors into the first context encoder to obtain a plurality of attack semantic feature vectors; and concatenating the plurality of attack semantic feature vectors to obtain the attack data semantic understanding feature vector.
In the above method, inputting the sequence of attack embedding vectors into the first context encoder to obtain the plurality of attack semantic feature vectors includes: arranging the sequence of attack embedding vectors into an input vector; converting the input vector into a query vector and a key vector through learnable embedding matrices; computing the product of the query vector and the transpose of the key vector to obtain a self-attention association matrix; standardizing the self-attention association matrix to obtain a standardized self-attention association matrix; activating the standardized self-attention association matrix with a Softmax function to obtain a self-attention feature matrix; and multiplying the self-attention feature matrix by each attack embedding vector in the sequence, used as a value vector, to obtain the plurality of attack semantic feature vectors.
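The encoding steps above (embedding lookup, query/key projection, association matrix, standardization, Softmax, weighting the embeddings as values, then concatenation) can be sketched in NumPy as a single-head self-attention layer. This is a minimal illustration under stated assumptions, not the application's implementation: the attack data is assumed to be already tokenized into integer ids, the "standardization" step is assumed to be division by the square root of the key dimension, and the names `context_encode`, `emb`, `w_q` and `w_k` are hypothetical. The weights are random here; in a trained model they would be learned.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def context_encode(token_ids, emb, w_q, w_k):
    """Hypothetical single-head self-attention encoder following the listed steps."""
    # Embedding layer: look up one learnable embedding vector per token id.
    x = emb[token_ids]                      # (n, d) sequence of attack embedding vectors
    # Project the arranged input into query and key matrices.
    q = x @ w_q                             # (n, d_k)
    k = x @ w_k                             # (n, d_k)
    # Self-attention association matrix: queries times transposed keys.
    scores = q @ k.T                        # (n, n)
    # Standardization, assumed here to be scaling by sqrt(d_k).
    scores = scores / np.sqrt(q.shape[1])
    # Softmax turns each row into attention weights that sum to 1.
    attn = softmax(scores, axis=-1)         # (n, n) self-attention feature matrix
    # The embedding vectors themselves serve as the value vectors.
    feats = attn @ x                        # (n, d) attack semantic feature vectors
    # Concatenate the per-token feature vectors into one long vector.
    return feats.reshape(-1)

rng = np.random.default_rng(0)
vocab, d, d_k = 50, 8, 4                    # toy sizes, chosen for illustration
emb = rng.normal(size=(vocab, d))
w_q = rng.normal(size=(d, d_k))
w_k = rng.normal(size=(d, d_k))
v_a = context_encode(np.array([3, 17, 42]), emb, w_q, w_k)  # 3 tokens x 8 dims
```

With three tokens and 8-dimensional embeddings, `v_a` is a 24-element vector. Using the raw embeddings as values (no separate value projection) mirrors the wording of the step list rather than a standard Transformer layer.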
In the above method, association encoding of the attack data semantic understanding feature vector and the defense data semantic understanding feature vector into the joint representation matrix uses the following formula:
$$M = V_1^{\top} V_2$$
where $V_1^{\top}$ denotes the transpose of the attack data semantic understanding feature vector, $V_2$ denotes the defense data semantic understanding feature vector, $M$ denotes the joint representation matrix, and the product is matrix multiplication.
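A minimal NumPy sketch of the association encoding follows: the transpose of the attack feature vector is matrix-multiplied by the defense feature vector. It assumes the feature vectors are treated as row vectors, so that the transposed attack vector is a column and the matrix product is their outer product. This is an interpretation, since the text only says that the result is a matrix. The function name `associate` and the toy vectors are illustrative.

```python
import numpy as np

def associate(v1, v2):
    # Treat both feature vectors as row vectors of shape (1, n).
    v1_row = v1.reshape(1, -1)
    v2_row = v2.reshape(1, -1)
    # (n1, 1) @ (1, n2) -> (n1, n2): the transposed attack vector times the defense vector.
    return v1_row.T @ v2_row

v1 = np.array([1.0, 2.0])        # toy attack feature vector
v2 = np.array([3.0, 4.0, 5.0])   # toy defense feature vector
M = associate(v1, v2)
# M has shape (2, 3) and M[1, 2] == 2.0 * 5.0 == 10.0
```

Entry `M[i, j]` pairs the i-th attack feature with the j-th defense feature, so every attack/defense feature combination is represented in the joint matrix.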
In the above method, performing class-center offset correction on the joint representation matrix based on the attack data semantic understanding feature vector and the defense data semantic understanding feature vector to obtain the optimized joint representation matrix includes: performing topology-class-center optimization of class nodes on the two feature vectors to obtain an optimized feature matrix; and matrix-multiplying the optimized feature matrix by the joint representation matrix to obtain the optimized joint representation matrix.
In the above method, the topology-class-center optimization of class nodes on the attack data semantic understanding feature vector and the defense data semantic understanding feature vector uses the following formula to obtain the optimized feature matrix:
[Formula image not reproduced in this text version]
where $V_1$ denotes the attack data semantic understanding feature vector, $V_2$ denotes the defense data semantic understanding feature vector, $M_c$ denotes the optimized feature matrix, $\otimes$ and $\odot$ denote the Kronecker product and the Hadamard product of matrices (vectors), respectively, $D$ is the distance matrix between the feature vectors $V_1$ and $V_2$ (its element-wise definition is likewise given only in the formula image), $V_1$ and $V_2$ are column vectors, and $\exp(\cdot)$ denotes the element-wise exponential of a vector, that is, the natural exponential function evaluated at the feature value at each position of the vector.
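The formula image for the topology-class-center optimization is not available in this text, so the sketch below shows only one hypothetical arrangement of the operators the text names (a distance matrix, an element-wise exponential, a Kronecker product and a Hadamard product), followed by the matrix multiplication with the joint representation matrix. The per-position distance `|v1[i] - v2[j]|`, the `exp(-D)` weighting, and the assumption that both vectors have the same length (so that the final product is defined) are all assumptions, not the patented formula. The function name `class_center_correction` is hypothetical.

```python
import numpy as np

def class_center_correction(v1, v2, M):
    """Hypothetical arrangement of the named operators; not the patent's exact formula."""
    a = v1.reshape(-1, 1)    # V_1 as a column vector
    b = v2.reshape(-1, 1)    # V_2 as a column vector
    # Assumed distance matrix: D[i, j] = |v1[i] - v2[j]|.
    D = np.abs(a - b.T)
    # Kronecker product of a column with a row is their outer product.
    K = np.kron(a, b.T)
    # Element-wise exponential, combined with K by a Hadamard (element-wise) product.
    M_c = np.exp(-D) * K
    # Matrix-multiply the optimized feature matrix by the joint representation matrix.
    return M_c @ M

rng = np.random.default_rng(1)
v1 = rng.normal(size=4)      # toy attack feature vector
v2 = rng.normal(size=4)      # toy defense feature vector
M = np.outer(v1, v2)         # joint representation matrix
M_opt = class_center_correction(v1, v2, M)
```

The `exp(-D)` choice makes pairs of nearby feature values contribute more than distant ones; a different sign or distance would give a different weighting, which is exactly the detail the missing formula image would settle.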
In the above method, passing the optimized joint representation matrix through a classifier to obtain a classification result indicating whether the attacker's attack is effective includes: unfolding the optimized joint representation matrix into a classification feature vector by row vectors or by column vectors; performing fully connected encoding on the classification feature vector with a fully connected layer of the classifier to obtain an encoded classification feature vector; and inputting the encoded classification feature vector into a Softmax classification function of the classifier to obtain the classification result.
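The classifier head described above (unfolding by rows, one fully connected layer, Softmax) can be sketched as follows. The weight shapes, the random initialization, and the mapping of index 0 to "attack valid" (the first label) are assumptions for illustration; `classify` is a hypothetical name.

```python
import numpy as np

def classify(M_opt, W, b):
    # Unfold the optimized joint matrix row by row into a classification feature vector.
    x = M_opt.reshape(-1)
    # Fully connected encoding: one logit per class.
    logits = W @ x + b
    # Softmax (with max subtraction for stability) gives class probabilities.
    z = logits - logits.max()
    return np.exp(z) / np.exp(z).sum()

rng = np.random.default_rng(2)
M_opt = rng.normal(size=(4, 4))     # toy optimized joint representation matrix
W = rng.normal(size=(2, 16))        # 2 classes x 16 unfolded features
b = np.zeros(2)
probs = classify(M_opt, W, b)
# Index 0 is assumed to be the first label (attack valid), index 1 the second (attack invalid).
label = "attack valid" if probs[0] >= probs[1] else "attack invalid"
```

Unfolding by columns instead would only permute the input features, which a trained fully connected layer can absorb.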
According to another aspect of the present application, an MDATA knowledge extraction system based on attack and defense behaviors is provided, including: an environment construction module, configured to build a target network in a cyber range to simulate a network environment and to set up an attacker and a defender in the target network; a data recording module, configured to record first attack data of the attacker and first defense data of the defender; an attack-defense joint analysis module, configured to jointly analyze the first attack data and the first defense data to obtain a joint analysis result indicating whether the attacker's attack is effective; and a knowledge extraction module, configured to extract network security data knowledge based on the joint analysis result.
In the above system, the attack-defense joint analysis module includes: an attacker semantic understanding unit, configured to pass the first attack data through a first context encoder that includes an embedding layer to obtain an attack data semantic understanding feature vector; a defender semantic understanding unit, configured to pass the first defense data through a second context encoder that includes an embedding layer to obtain a defense data semantic understanding feature vector; an association encoding unit, configured to perform association encoding on the two feature vectors to obtain a joint representation matrix; a class-center offset correction unit, configured to perform class-center offset correction on the joint representation matrix based on the two feature vectors to obtain an optimized joint representation matrix; and a validity judging unit, configured to pass the optimized joint representation matrix through a classifier to obtain a classification result indicating whether the attacker's attack is valid.
In the above system, the attacker semantic understanding unit includes: an embedding subunit, configured to pass the first attack data through an embedding layer to convert it into a sequence of attack embedding vectors, where the embedding layer encodes the first attack data using a learnable embedding matrix; a context encoding subunit, configured to input the sequence of attack embedding vectors into the first context encoder to obtain a plurality of attack semantic feature vectors; and a concatenation subunit, configured to concatenate the plurality of attack semantic feature vectors to obtain the attack data semantic understanding feature vector.
In the above system, the context encoding subunit is further configured to: arrange the sequence of attack embedding vectors into an input vector; convert the input vector into a query vector and a key vector through learnable embedding matrices; compute the product of the query vector and the transpose of the key vector to obtain a self-attention association matrix; standardize the self-attention association matrix to obtain a standardized self-attention association matrix; activate the standardized self-attention association matrix with a Softmax function to obtain a self-attention feature matrix; and multiply the self-attention feature matrix by each attack embedding vector in the sequence, used as a value vector, to obtain the plurality of attack semantic feature vectors.
In the above system, the association encoding unit is further configured to perform association encoding on the attack data semantic understanding feature vector and the defense data semantic understanding feature vector with the following formula to obtain the joint representation matrix:
$$M = V_1^{\top} V_2$$
where $V_1^{\top}$ denotes the transpose of the attack data semantic understanding feature vector, $V_2$ denotes the defense data semantic understanding feature vector, $M$ denotes the joint representation matrix, and the product is matrix multiplication.
In the above system, the class-center offset correction unit includes: an optimization subunit, configured to perform topology-class-center optimization of class nodes on the attack data semantic understanding feature vector and the defense data semantic understanding feature vector to obtain an optimized feature matrix; and an application subunit, configured to matrix-multiply the optimized feature matrix by the joint representation matrix to obtain the optimized joint representation matrix.
In the above system, the optimization subunit is further configured to perform topology-class-center optimization of class nodes on the attack data semantic understanding feature vector and the defense data semantic understanding feature vector with the following formula to obtain the optimized feature matrix:
[Formula image not reproduced in this text version]
where $V_1$ denotes the attack data semantic understanding feature vector, $V_2$ denotes the defense data semantic understanding feature vector, $M_c$ denotes the optimized feature matrix, $\otimes$ and $\odot$ denote the Kronecker product and the Hadamard product of matrices (vectors), respectively, $D$ is the distance matrix between the feature vectors $V_1$ and $V_2$ (its element-wise definition is likewise given only in the formula image), $V_1$ and $V_2$ are column vectors, and $\exp(\cdot)$ denotes the element-wise exponential of a vector, that is, the natural exponential function evaluated at the feature value at each position of the vector.
In the above system, the validity judging unit is further configured to: unfold the optimized joint representation matrix into a classification feature vector by row vectors or by column vectors; perform fully connected encoding on the classification feature vector with a fully connected layer of the classifier to obtain an encoded classification feature vector; and input the encoded classification feature vector into a Softmax classification function of the classifier to obtain the classification result.
According to still another aspect of the present application, an electronic device is provided, including: a processor; and a memory storing computer program instructions that, when executed by the processor, cause the processor to perform the MDATA knowledge extraction method based on attack and defense behaviors described above.
According to yet another aspect of the present application, a computer-readable medium is provided, storing computer program instructions that, when executed by a processor, cause the processor to perform the MDATA knowledge extraction method based on attack and defense behaviors described above.
Compared with the prior art, the MDATA knowledge extraction method and system based on attack and defense behaviors use deep-learning-based artificial intelligence and natural language processing to record attack and defense behavior data during network attack and defense exercises, jointly analyze the data of both parties, remove all invalid attack steps, and extract all valid attack steps as MDATA knowledge to build a network security knowledge base. As a result, the spatiotemporal characteristics of the attacker's attack process are extracted from comprehensive, rich attack and defense behavior data, which improves the effectiveness of knowledge extraction.
Drawings
The foregoing and other objects, features and advantages of the present application will become more apparent from the following more detailed description of its embodiments, as illustrated in the accompanying drawings. The accompanying drawings provide a further understanding of the embodiments of the application, are incorporated in and constitute a part of this specification, and serve to explain the application; they do not limit it. In the drawings, like reference numerals generally refer to like parts or steps.
Fig. 1 is a flowchart of an MDATA knowledge extraction method based on attack and defense behaviors according to an embodiment of the present application.
Fig. 2 is a schematic diagram of the network topology of an attack and defense exercise according to an embodiment of the present application.
Fig. 3 is a flowchart of jointly analyzing the first attack data and the first defense data to obtain a joint analysis result in the MDATA knowledge extraction method based on attack and defense behaviors according to an embodiment of the present application.
Fig. 4 is a schematic diagram of jointly analyzing the first attack data and the first defense data to obtain a joint analysis result in the MDATA knowledge extraction method based on attack and defense behaviors according to an embodiment of the present application.
Fig. 5 is a block diagram of an MDATA knowledge extraction system based on attack and defense behaviors according to an embodiment of the present application.
Fig. 6 is a block diagram of the attack-defense joint analysis module in the MDATA knowledge extraction system based on attack and defense behaviors according to an embodiment of the present application.
Fig. 7 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Hereinafter, example embodiments according to the present application will be described in detail with reference to the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application and not all of the embodiments of the present application, and it should be understood that the present application is not limited by the example embodiments described herein.
Overview of the application: A cyber range is important infrastructure for network attack and defense exercises and can provide a high-fidelity exercise environment. In an exercise, a red team and a blue team take part: the red team attacks the targets, and the blue team defends against the attacks. During the exercise, the cyber range can monitor the actions of both teams and record various data, including the red team's attack actions, whether each attack succeeded, and whether the blue team's defense methods were effective. From these data, the successful attack and defense processes, the attack techniques, and the defense methods can be reconstructed, and these attack and defense methods are important knowledge in a network security knowledge base.
On this basis, an MDATA knowledge extraction method based on attack and defense behaviors in the network security field is provided, which extracts knowledge from attack and defense behavior data in a cyber range to build a network security knowledge base.
Before the exercise, a target network and a simulated network environment must first be built in the cyber range. Fig. 2 shows the network topology of one attack and defense exercise: the red and blue teams each control a number of hosts that form two systems, and the red team uses its hosts (attack machines) to attack the blue team's hosts (target machines) over the network. During this process, the red team's system can record each single-step attack it launches, together with the attack time, source IP address, destination IP address, and other data, but it cannot determine the actual effect of each single-step attack (that is, whether the attack succeeded). The blue team's system can record the actual impact of an attack (for example, scanned ports, stolen data, or a remote login by the attacker), but it cannot determine exactly which attack type the attacker used. By jointly analyzing the data recorded by both systems, it is possible to determine precisely which attack steps the attacker used to achieve the attack objective, whether each step was effective, and the source and destination IP addresses of each attack. The effective attack steps reflect the temporal attribute of the attack, the distribution of source and destination IP addresses reflects its spatial attribute, and knowledge with these spatiotemporal attributes can be used to build an MDATA knowledge base.
The invention provides an MDATA knowledge extraction method based on attack and defense behaviors in the network security field. Specifically, the method consists of the following steps:
Step 1: Conduct a network attack and defense exercise. First, a target network is built in a cyber range to simulate a network environment, and the hosts in the network are divided into two systems, used by red team and blue team personnel respectively. Then, the red and blue teams are organized to carry out attack and defense activities. The red team is given an attack objective (such as stealing data or damaging a system) and may use various attack methods (combinations of different single-step attacks) to achieve it. The attack knowledge to be extracted is the set of attack steps an attacker can use to achieve the attack objective; a combination of attack steps is called an attack strategy.
Step 2: Record attack and defense behavior data. The attack and defense behavior data produced during the exercise is recorded. In the red team's system, the single-step attacks used by red team personnel, including the attack type, the launch time, and the source and destination IP addresses, can be recorded in system logs or similar records. In the blue team's system, security events (such as port scans, remote user logins, permission changes, and data theft) and their times of occurrence can likewise be recorded in system logs.
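The records described in Step 2 could be represented with two small record types, as in the sketch below. The class and field names, the `host_ip` field on the blue side, and the example values are hypothetical; the application only lists the kinds of information each side can log.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class RedAttackStep:
    """One single-step attack as logged by the red team's system."""
    attack_type: str        # e.g. "port_scan"
    launched_at: datetime   # time the attack was initiated
    src_ip: str             # attacking host
    dst_ip: str             # targeted host

@dataclass(frozen=True)
class BlueSecurityEvent:
    """One security event as logged by the blue team's system."""
    event_type: str         # e.g. "port_scanned", "remote_login"
    occurred_at: datetime   # time the event was observed
    host_ip: str            # assumed field: the blue host where the event was observed

# Hypothetical log entries from one exercise.
step = RedAttackStep("port_scan", datetime(2023, 1, 1, 10, 0, 0), "10.0.0.1", "10.0.1.5")
event = BlueSecurityEvent("port_scanned", datetime(2023, 1, 1, 10, 0, 3), "10.0.1.5")
# Seconds between the red-side launch and the blue-side observation.
lag = (event.occurred_at - step.launched_at).total_seconds()
```

Neither record alone shows whether the scan step was effective: the red record has the attack type but no outcome, and the blue record has the outcome but no attack type. This is why the two sides' data are analyzed jointly in Step 3.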
Step 3: Jointly analyze the attack and defense data. The single-step attack data from the red team's system does not indicate whether each single-step attack succeeded. Simply ordering the single-step attacks by time and storing them in the MDATA knowledge base as attack strategies would make the attack knowledge inaccurate, because the data may include invalid attack steps, which would reduce the accuracy of attack detection based on the MDATA knowledge base. The data recorded in the blue team's system contains only the effects of attacks on the system and does not reveal the attacker's techniques. To accurately identify the effective attack steps an attacker used to achieve the attack objective, the attack and defense data of the red and blue teams are analyzed jointly.
Step 4: Extract knowledge. For an attacker who achieves the final attack goal, all invalid attack steps are removed and all valid attack steps are extracted as MDATA knowledge to construct a network security knowledge base. To represent the temporal attribute of the attack knowledge, these valid single-step attacks are ordered by their launch time $t$, forming an attack strategy. For the spatial attribute, specific IP addresses are not stored, so that the knowledge generalizes; instead, the relationships among the valid attack steps in terms of source and destination IP addresses are extracted. In this way, the extracted knowledge carries spatio-temporal attributes and better identifies the attack methods that can be used to achieve an attack goal, which improves multi-step attack detection in real networks.
In the technical solution of the present application, semantic understanding is first performed on the first attack data and the first defense data to capture, in a high-dimensional feature space, the attack behavior features represented by the first attack data and the defense behavior features represented by the first defense data. Joint analysis is then performed in that feature space on the high-dimensional semantic features of the first attack data and of the first defense data to obtain a joint representation matrix, and a classifier determines the class probability label to which the joint representation matrix belongs. The class probability labels are: the attacker's attack is valid (first label) and the attacker's attack is invalid (second label).
Specifically, after the first attack data and the first defense data are obtained, the first attack data are passed through a first context encoder including an embedding layer to obtain an attack data semantic understanding feature vector, and the first defense data are passed through a second context encoder including an embedding layer to obtain a defense data semantic understanding feature vector. Both context encoders are based on the Transformer model: they perform global contextual semantic understanding of the first attack data and the first defense data, respectively, to obtain the attack data semantic understanding feature vector and the defense data semantic understanding feature vector.
The attack data semantic understanding feature vector and the defense data semantic understanding feature vector are then association-encoded to obtain a joint representation matrix. In a specific example of the present application, the joint representation matrix is computed as the product of the transpose of the attack data semantic understanding feature vector and the defense data semantic understanding feature vector, i.e., from the position-wise responses between the two feature vectors. The joint representation matrix is then passed through a classifier to obtain a classification result indicating whether the attacker's attack is valid; that is, the classifier determines the class probability label to which the joint representation matrix belongs.
In the technical solution of the present application, when the attack data semantic understanding feature vector and the defense data semantic understanding feature vector are association-encoded into the joint representation matrix, they represent the contextual semantic features of the first attack data and of the first defense data, respectively. As a result, the classification probability representation of the attack data semantic understanding feature vector deviates from that of the defense data semantic understanding feature vector; that is, there is a class-center offset between the two vectors, which reduces the accuracy of the classification result obtained from the joint representation matrix formed by association-encoding them.
Therefore, topology-class-center optimization of class nodes is preferably performed on the attack data semantic understanding feature vector $V_1$ and the defense data semantic understanding feature vector $V_2$, expressed as:
[Formula: optimized feature matrix computed from $V_1$ and $V_2$ (image in the original, not reproduced)]
where $\otimes$ and $\odot$ denote the Kronecker product and the Hadamard product of matrices (vectors), respectively, $D$ is the distance matrix between the feature vectors $V_1$ and $V_2$, and $V_1$ and $V_2$ are column vectors.
Specifically, in the classification problem of the classifier, if the optimized class nodes of the attack data semantic understanding feature vector $V_1$ and the defense data semantic understanding feature vector $V_2$ are represented in tree form, the respective class nodes of $V_1$ and $V_2$ are distributed as subtrees under a root node. Using the graph topology that associates the nodes, the distribution of the optimized class nodes can therefore be expressed as subgraph structures centered on the respective nodes. $V_1$ and $V_2$ are thus expressed as subtree structures rooted at their respective class nodes, which realizes node-centered topology optimization of $V_1$ and $V_2$ and thereby eliminates the class-center offset between them.
Further, the optimized feature matrix $M_o$ is multiplied by the joint representation matrix, denoted $M$, which maps $M$ into an optimized feature space in which the class-center offset is eliminated and thus improves the accuracy of the classification result obtained from $M$.
It is worth mentioning that in other examples of the present application, other methods may be used for the joint analysis. For example, let $(a, ip_s, ip_d, t)$ denote a single-step attack recorded by the red party's system, where $a$ is the type of the single-step attack, $ip_s$ and $ip_d$ are the source and destination IP addresses of the attack, respectively, and $t$ is the time at which the single-step attack was launched. Let $t'$ denote the time at which a security event observed in the blue party's system occurred. For each single-step attack recorded in the red party's system, we look for security events that occur close in time on the blue-party host with the corresponding IP address. If a corresponding security event can be found within a delay $\Delta t$, i.e., there exists a security event whose occurrence time satisfies $t \le t' \le t + \Delta t$, the attacker's single-step attack is valid; otherwise, the blue party's defense succeeded and the single-step attack is invalid.
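The time-window matching rule above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the application's implementation: the record field names, the use of the attack's destination IP to select the blue-party host, and the closed window running from the attack's launch time to launch time plus the maximum delay are interpretations of the description, and `label_attacks` is a hypothetical helper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SingleStepAttack:
    """A red-party log record (type, source IP, destination IP, launch time); names are illustrative."""
    attack_type: str
    src_ip: str
    dst_ip: str
    t: float  # launch time, e.g. seconds since the exercise started


@dataclass(frozen=True)
class SecurityEvent:
    """A security event observed on a blue-party host at time t."""
    host_ip: str
    event_type: str
    t: float


def label_attacks(attacks, events, delta_t):
    """Hypothetical helper: pair each attack with a validity flag.

    An attack is marked valid if some security event occurs on its destination
    host at a time t_event with attack.t <= t_event <= attack.t + delta_t;
    otherwise the blue party is taken to have defended successfully.
    """
    # Group event times by host so each attack only checks its own target.
    times_by_host = {}
    for event in events:
        times_by_host.setdefault(event.host_ip, []).append(event.t)

    labelled = []
    for attack in attacks:
        times = times_by_host.get(attack.dst_ip, [])
        valid = any(attack.t <= t_event <= attack.t + delta_t for t_event in times)
        labelled.append((attack, valid))
    return labelled


if __name__ == "__main__":
    attacks = [
        SingleStepAttack("port_scan", "10.0.0.5", "10.0.1.10", 100.0),
        SingleStepAttack("ssh_bruteforce", "10.0.0.5", "10.0.1.10", 200.0),
    ]
    events = [SecurityEvent("10.0.1.10", "port_scan_detected", 103.0)]
    for attack, valid in label_attacks(attacks, events, delta_t=10.0):
        print(attack.attack_type, "valid" if valid else "invalid")
```

With a 10-second window, the port scan launched at 100 is matched by the event at 103 and reported as valid, while the brute-force attempt at 200 has no event in its window and is reported as invalid. Grouping events by host keeps each lookup restricted to the attacked machine, as the description requires.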
Based on this, the present application provides an MDATA knowledge extraction method based on attack and defense behaviors, comprising: constructing a target network in a network target range to simulate a network environment, and setting up an attacker and a defender in the target network; recording first attack data of the attacker and first defense data of the defender; jointly analyzing the first attack data and the first defense data to obtain a joint analysis result indicating whether the attacker's attack is valid; and extracting network security knowledge based on the joint analysis result.
Having described the basic principles of the present application, various non-limiting embodiments of the present application will now be described in detail with reference to the accompanying drawings.
Exemplary method: Fig. 1 is a flowchart of an MDATA knowledge extraction method based on attack and defense behaviors according to an embodiment of the present application. As shown in Fig. 1, the method includes: S110, constructing a target network in a network target range to simulate a network environment, and setting up an attacker and a defender in the target network; S120, recording first attack data of the attacker and first defense data of the defender; S130, jointly analyzing the first attack data and the first defense data to obtain a joint analysis result indicating whether the attacker's attack is valid; and S140, extracting network security knowledge based on the joint analysis result.
The network target range is an important infrastructure for network attack and defense exercises and can provide a high-fidelity exercise environment. In an exercise, the red and blue parties both take part: the red party attacks the target and the blue party defends against the attacks. During the exercise, the network target range can monitor the actions of both parties and record various data, including the red party's attack actions, whether each attack succeeded, and whether the blue party's defenses were effective. From these data, the course of successful attacks and defenses, the attack techniques, and the defense methods can be reconstructed; these attack and defense methods are important knowledge for a network security knowledge base.
On this basis, an MDATA knowledge extraction method based on attack and defense behaviors in the field of network security is provided, which extracts knowledge from attack and defense behavior data in a network target range to construct a network security knowledge base.
Before the attack and defense exercise, a target network and a simulated network environment must first be built in the network target range. Fig. 2 is a schematic diagram of the network topology of the attack and defense exercise according to an embodiment of the present application. As shown in Fig. 2, the red party and the blue party each control a number of hosts that form two systems, and the red party uses its hosts (also called attack machines) to attack the blue party's hosts (also called target machines) over the network. During this process, the red party's system can record the single-step attacks the red party launches, together with their times, source IP addresses, destination IP addresses and other data, but it cannot observe the actual effect of each single-step attack (i.e., whether it succeeded). The blue party's system can record the actual impact of the attacks (such as scanned ports, stolen data, or remote logins by the attacker), but it cannot determine exactly which attack type the attacker used. By jointly analyzing the data recorded by the two systems, it is possible to determine precisely which attack steps the attacker used to achieve the attack goal, whether each step was valid, and the source and destination IP addresses of each attack. The valid attack steps reflect the temporal attribute of the attack, and the distribution of source and destination IP addresses reflects its spatial attribute; knowledge with these spatio-temporal attributes can be used to construct an MDATA knowledge base.
The present invention provides an MDATA knowledge extraction method based on attack and defense behaviors in the field of network security.
In step S110, a target network is constructed in a network target range to simulate a network environment, and an attacker and a defender are set up in the target network. The target network is built in the network target range, the network environment is simulated, and the hosts in the network are divided into two systems, used by red party personnel (the attacker) and blue party personnel (the defender), respectively. The red and blue parties then carry out network attack and defense activities. The red party is given an attack goal (such as data theft or system damage) and may achieve it with various attack methods (by combining different single-step attacks). The attack knowledge to be extracted consists of the attack steps an attacker can take to achieve the attack goal; a combination of such attack steps is called an attack strategy.
In step S120, first attack data of the attacker and first defense data of the defender are recorded. Together, the first attack data and the first defense data constitute the attack and defense behavior data of the exercise. In the red party's system, information about each single-step attack used by red party personnel, such as the attack type, the time the attack was launched, and its source and destination IP addresses, can be recorded in system logs or by similar means. In the blue party's system, security events (such as port scans, remote user logins, permission changes and data theft) and the times at which they occur can likewise be recorded in system logs.
In step S130, the first attack data and the first defense data are jointly analyzed to obtain a joint analysis result indicating whether the attacker's attack is valid. The single-step attack data obtained from the red party's system cannot indicate whether a single-step attack succeeded. If the single-step attacks were simply ordered in time and stored in the MDATA knowledge base as attack strategies, the attack knowledge would be inaccurate, because the single-step attack data may include invalid attack steps, which degrades the accuracy of attack detection based on the MDATA knowledge base. Conversely, the data recorded in the blue party's system contain only the effects of the attacks on the system and do not reveal the attacker's techniques. To accurately obtain the valid attack steps an attacker used to achieve the attack goal, the attack and defense data of the red and blue parties are analyzed jointly.
In the technical solution of the present application, semantic understanding is first performed on the first attack data and the first defense data to capture, in a high-dimensional feature space, the attack behavior features represented by the first attack data and the defense behavior features represented by the first defense data. Joint analysis is then performed in that feature space on the high-dimensional semantic features of the first attack data and of the first defense data to obtain a joint representation matrix, and a classifier determines the class probability label to which the joint representation matrix belongs. The class probability labels are: the attacker's attack is valid (first label) and the attacker's attack is invalid (second label).
Fig. 3 is a flowchart of jointly analyzing the first attack data and the first defense data to obtain the joint analysis result in the MDATA knowledge extraction method based on attack and defense behaviors according to an embodiment of the present application. As shown in Fig. 3, the joint analysis includes the following steps: S210, passing the first attack data through a first context encoder including an embedding layer to obtain an attack data semantic understanding feature vector; S220, passing the first defense data through a second context encoder including an embedding layer to obtain a defense data semantic understanding feature vector; S230, association-encoding the attack data semantic understanding feature vector and the defense data semantic understanding feature vector to obtain a joint representation matrix; S240, performing class-center offset correction on the joint representation matrix based on the two feature vectors to obtain an optimized joint representation matrix; and S250, passing the optimized joint representation matrix through a classifier to obtain a classification result indicating whether the attacker's attack is valid.
Fig. 4 is a schematic diagram of the architecture for jointly analyzing the first attack data and the first defense data to obtain the joint analysis result in the MDATA knowledge extraction method based on attack and defense behaviors according to an embodiment of the present application. As shown in Fig. 4, the first attack data are passed through a first context encoder including an embedding layer to obtain an attack data semantic understanding feature vector, while the first defense data are passed through a second context encoder including an embedding layer to obtain a defense data semantic understanding feature vector. The two feature vectors are then association-encoded to obtain a joint representation matrix. Next, class-center offset correction is performed on the joint representation matrix based on the two feature vectors to obtain an optimized joint representation matrix. Finally, the optimized joint representation matrix is passed through a classifier to obtain a classification result indicating whether the attacker's attack is valid.
Specifically, in step S210, the first attack data are passed through a first context encoder including an embedding layer to obtain the attack data semantic understanding feature vector. In the technical solution of the present application, the first context encoder is a Transformer-based context encoder that performs global contextual semantic understanding of the first attack data to obtain the attack data semantic understanding feature vector.
Specifically, in the embodiment of the present application, the first attack data are first passed through the embedding layer, which uses a learnable embedding matrix to encode the first attack data into a plurality of attack embedding vectors, yielding a sequence of attack embedding vectors. The sequence of attack embedding vectors is then input into the first context encoder to obtain a plurality of attack semantic feature vectors, which are finally concatenated (cascaded) to obtain the attack data semantic understanding feature vector.
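A minimal NumPy sketch of this embed, encode, and cascade pipeline follows. It assumes the log records have already been tokenized into integer IDs (the source does not specify a tokenizer), and it takes the context encoder as a function so that any Transformer-style encoder can be plugged in; `encode_sequence` and its parameters are hypothetical names.

```python
import numpy as np


def encode_sequence(token_ids, embedding_matrix, context_encoder):
    """Hypothetical helper: embedding layer -> context encoder -> cascade.

    token_ids: 1-D integer array; how raw log records become token IDs is an
        assumption (the source does not specify a tokenizer).
    embedding_matrix: (vocab_size, d) learnable table; row i embeds token i.
    context_encoder: callable mapping an (n, d) array of embedding vectors to
        n semantic feature vectors (for example a Transformer encoder).
    Returns the concatenation of the n semantic feature vectors.
    """
    embeddings = embedding_matrix[token_ids]        # sequence of embedding vectors, (n, d)
    semantic_vectors = context_encoder(embeddings)  # one semantic vector per position
    return np.concatenate(list(semantic_vectors))   # cascade into a single feature vector


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    table = rng.normal(size=(100, 16))   # vocabulary of 100 tokens, 16-dim embeddings
    ids = np.array([3, 17, 42, 8])       # an illustrative tokenized attack record
    # Identity stand-in for the context encoder, just to show the shapes.
    vec = encode_sequence(ids, table, lambda x: x)
    print(vec.shape)
```

With the identity stand-in, four 16-dimensional embeddings are cascaded into one 64-dimensional vector; a real encoder would replace the lambda, and the same function would serve the second (defense) encoder with its own embedding table.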
More specifically, in an embodiment of the present application, inputting the sequence of attack embedding vectors into the first context encoder to obtain the plurality of attack semantic feature vectors includes: arranging the sequence of attack embedding vectors into an input vector; converting the input vector into a query vector and a key vector through respective learnable embedding matrices; computing the product of the query vector and the transpose of the key vector to obtain a self-attention association matrix; standardizing the self-attention association matrix to obtain a standardized self-attention association matrix; applying the Softmax activation function to the standardized self-attention association matrix to obtain a self-attention feature matrix; and multiplying the self-attention feature matrix by each attack embedding vector in the sequence, used as a value vector, to obtain the plurality of attack semantic feature vectors.
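The self-attention steps listed above can be sketched in NumPy as a single attention head. The source does not say how the association matrix is standardized; dividing by the square root of the key dimension, as in scaled dot-product attention, is an assumption here, as are the function names. Following the description, the embedding vectors themselves serve as the value vectors, with no separate value projection.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the maximum before exponentiating for numerical stability.
    z = np.exp(x - x.max(axis=axis, keepdims=True))
    return z / z.sum(axis=axis, keepdims=True)


def self_attention_encode(embeddings, w_q, w_k):
    """Hypothetical single-head encoder following the steps listed above.

    embeddings: (n, d) sequence of attack embedding vectors (the input).
    w_q, w_k: (d, d_k) learnable matrices producing queries and keys.
    Returns (n, d): one attack semantic feature vector per position.
    """
    q = embeddings @ w_q                   # query vectors, (n, d_k)
    k = embeddings @ w_k                   # key vectors, (n, d_k)
    scores = q @ k.T                       # self-attention association matrix, (n, n)
    scores = scores / np.sqrt(k.shape[1])  # standardization (assumed: scaled dot-product)
    attn = softmax(scores, axis=-1)        # self-attention feature matrix
    return attn @ embeddings               # embedding vectors act as the value vectors


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(4, 16))
    out = self_attention_encode(x, rng.normal(size=(16, 8)), rng.normal(size=(16, 8)))
    print(out.shape)
```

Each output row is a Softmax-weighted mixture of all input embeddings, which is what gives every position a view of the whole record's context. A production encoder would typically add multiple heads, a value projection, and feed-forward layers; none of these are specified in the source.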
Specifically, in step S220, the first defense data are passed through a second context encoder including an embedding layer to obtain the defense data semantic understanding feature vector. Likewise, the second context encoder is a Transformer-based context encoder that performs global contextual semantic understanding of the first defense data to obtain the defense data semantic understanding feature vector.
Specifically, in step S230, the attack data semantic understanding feature vector and the defense data semantic understanding feature vector are association-encoded to obtain a joint representation matrix. In a specific example of the present application, the joint representation matrix is computed as the product of the transpose of the attack data semantic understanding feature vector and the defense data semantic understanding feature vector, i.e., from the position-wise responses between the two feature vectors.
Specifically, in the embodiment of the present application, the attack data semantic understanding feature vector and the defense data semantic understanding feature vector are association-encoded according to the following formula to obtain the joint representation matrix:
$$M = V_1^{\top} \cdot V_2$$
where $V_1^{\top}$ denotes the transpose of the attack data semantic understanding feature vector, $V_2$ denotes the defense data semantic understanding feature vector, $M$ denotes the joint representation matrix, and $\cdot$ denotes matrix multiplication.
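The product of the transposed attack vector and the defense vector is a matrix only when both vectors are treated as row vectors (an assumption needed for the result to be a matrix); the result is then their outer product, whose entry (i, j) is the position-wise response between element i of the attack vector and element j of the defense vector. Under that assumption the association coding reduces to one NumPy call; `associate` is a hypothetical name.

```python
import numpy as np


def associate(v_attack, v_defense):
    """Hypothetical helper: joint representation matrix of two 1-D feature vectors.

    Entry (i, j) is v_attack[i] * v_defense[j], i.e. the outer product, which
    equals the transposed attack vector (a column) times the defense vector (a row).
    """
    return np.outer(v_attack, v_defense)


if __name__ == "__main__":
    print(associate(np.array([1.0, 2.0]), np.array([3.0, 4.0, 5.0])))
```

For the inputs shown, the result is the 2 x 3 matrix [[3, 4, 5], [6, 8, 10]], so the two vectors need not have the same length.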
Specifically, in step S240, class-center offset correction is performed on the joint representation matrix based on the attack data semantic understanding feature vector and the defense data semantic understanding feature vector to obtain an optimized joint representation matrix. In the embodiment of the present application, topology-class-center optimization of class nodes is first performed on the two feature vectors to obtain an optimized feature matrix, and the optimized feature matrix is then matrix-multiplied with the joint representation matrix to obtain the optimized joint representation matrix.
In the technical solution of the present application, when the attack data semantic understanding feature vector and the defense data semantic understanding feature vector are association-encoded into the joint representation matrix, they represent the contextual semantic features of the first attack data and of the first defense data, respectively. As a result, the classification probability representation of the attack data semantic understanding feature vector deviates from that of the defense data semantic understanding feature vector; that is, there is a class-center offset between the two vectors, which reduces the accuracy of the classification result obtained from the joint representation matrix formed by association-encoding them.
Therefore, topology-class-center optimization of class nodes is preferably performed on the attack data semantic understanding feature vector and the defense data semantic understanding feature vector, expressed as:
[Formula: optimized feature matrix $M_o$ computed from $V_1$ and $V_2$ (image in the original, not reproduced)]
where $V_1$ denotes the attack data semantic understanding feature vector, $V_2$ denotes the defense data semantic understanding feature vector, $M_o$ denotes the optimized feature matrix, $\otimes$ and $\odot$ denote the Kronecker product and the Hadamard product of matrices (vectors), respectively, $D$ is the distance matrix between the feature vectors $V_1$ and $V_2$, $V_1$ and $V_2$ are column vectors, and $\exp(\cdot)$ denotes the element-wise exponential of a vector, i.e., the natural exponential function evaluated at the feature value at each position of the vector.
Specifically, in the classification problem of the classifier, if the optimized class nodes of the attack data semantic understanding feature vector $V_1$ and the defense data semantic understanding feature vector $V_2$ are represented in tree form, the respective class nodes of $V_1$ and $V_2$ are distributed as subtrees under a root node. Using the graph topology that associates the nodes, the distribution of the optimized class nodes can therefore be expressed as subgraph structures centered on the respective nodes. $V_1$ and $V_2$ are thus expressed as subtree structures rooted at their respective class nodes, which realizes node-centered topology optimization of $V_1$ and $V_2$ and thereby eliminates the class-center offset between them.
Further, the optimized feature matrix $M_o$ is multiplied by the joint representation matrix, denoted $M$, which maps $M$ into an optimized feature space in which the class-center offset is eliminated and thus improves the accuracy of the classification result obtained from $M$.
It is worth mentioning that in other examples of the present application, other methods may be used for the joint analysis. For example, let $(a, ip_s, ip_d, t)$ denote a single-step attack recorded by the red party's system, where $a$ is the type of the single-step attack, $ip_s$ and $ip_d$ are the source and destination IP addresses of the attack, respectively, and $t$ is the time at which the single-step attack was launched. Let $t'$ denote the time at which a security event observed in the blue party's system occurred. For each single-step attack recorded in the red party's system, we look for security events that occur close in time on the blue-party host with the corresponding IP address. If a corresponding security event can be found within a delay $\Delta t$, i.e., there exists a security event whose occurrence time satisfies $t \le t' \le t + \Delta t$, the attacker's single-step attack is valid; otherwise, the blue party's defense succeeded and the single-step attack is invalid.
Specifically, in step S250, the optimized joint representation matrix is passed through a classifier to obtain a classification result indicating whether the attacker's attack is valid; that is, the classifier determines the class probability label to which the joint representation matrix belongs. This classification result is the joint analysis result.
Specifically, in the embodiment of the present application, passing the optimized joint representation matrix through the classifier to obtain the classification result includes: unfolding the optimized joint representation matrix into a classification feature vector by row vectors or by column vectors; performing fully connected encoding on the classification feature vector with a fully connected layer of the classifier to obtain an encoded classification feature vector; and inputting the encoded classification feature vector into the Softmax classification function of the classifier to obtain the classification result.
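The unfold, fully connected, and Softmax steps can be sketched as follows. The weights are placeholders for trained parameters, and the two outputs are assumed to be ordered as [attack valid (first label), attack invalid (second label)]; `classify` and its `by_rows` switch are hypothetical names.

```python
import numpy as np


def classify(m_opt, weight, bias, by_rows=True):
    """Hypothetical helper: unfold -> fully connected layer -> Softmax.

    m_opt: optimized joint representation matrix.
    weight: (2, m_opt.size) and bias: (2,) stand in for trained parameters.
    Returns [p(first label: attack valid), p(second label: attack invalid)].
    """
    # Unfold row by row (C order) or column by column (Fortran order).
    x = m_opt.reshape(-1, order="C" if by_rows else "F")
    logits = weight @ x + bias             # fully connected encoding
    z = np.exp(logits - logits.max())      # Softmax, shifted for stability
    return z / z.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    m = rng.normal(size=(8, 8))
    probs = classify(m, rng.normal(size=(2, 64)), np.zeros(2))
    print("attack valid" if probs[0] > probs[1] else "attack invalid", probs)
```

Row-wise and column-wise unfolding give different element orders, so the choice must match whatever ordering the classifier's weights were trained with.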
In step S140, network security knowledge is extracted based on the joint analysis result. For an attacker who achieves the final attack goal, all invalid attack steps are removed and all valid attack steps are extracted as MDATA knowledge to construct a network security knowledge base. To represent the temporal attribute of the attack knowledge, these valid single-step attacks are ordered by their launch time $t$, forming an attack strategy. For the spatial attribute, specific IP addresses are not stored, so that the knowledge generalizes; instead, the relationships among these valid attack steps in terms of source and destination IP addresses are extracted. In this way, the extracted knowledge has spatio-temporal attributes and can better identify the attack methods that may be used to achieve an attack goal, which improves multi-step attack detection in real networks.
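The extraction step can be sketched as below: invalid steps are dropped, valid ones are ordered by launch time, and concrete IP addresses are replaced by abstract host labels so that only the source/destination relationships survive. The tuple layout and the labelling scheme (`H0`, `H1`, ... in order of first appearance) are illustrative assumptions, since the source does not specify how the IP relationships are encoded; `extract_strategy` is a hypothetical helper.

```python
def extract_strategy(labelled_steps):
    """Hypothetical helper: build a generalized attack strategy.

    labelled_steps: iterable of (attack_type, src_ip, dst_ip, t, valid) tuples,
    e.g. the output of a joint-analysis step. Invalid steps are dropped, valid
    ones are ordered by launch time t, and concrete IPs are replaced by abstract
    host labels ("H0", "H1", ...) assigned in order of first appearance, so
    only the source/destination relationships between steps are kept.
    """
    valid_steps = sorted((s for s in labelled_steps if s[4]), key=lambda s: s[3])
    aliases = {}

    def alias(ip):
        # Reuse a host's label whenever the same IP reappears in a later step.
        if ip not in aliases:
            aliases[ip] = f"H{len(aliases)}"
        return aliases[ip]

    strategy = []
    for attack_type, src_ip, dst_ip, _t, _valid in valid_steps:
        strategy.append((attack_type, alias(src_ip), alias(dst_ip)))
    return strategy


if __name__ == "__main__":
    steps = [
        ("exploit", "10.0.0.5", "10.0.1.10", 30.0, True),
        ("port_scan", "10.0.0.5", "10.0.1.10", 10.0, True),
        ("ssh_bruteforce", "10.0.0.5", "10.0.1.11", 20.0, False),
        ("lateral_move", "10.0.1.10", "10.0.1.12", 40.0, True),
    ]
    print(extract_strategy(steps))
```

Here the invalid brute-force step is removed and the remaining steps print as [('port_scan', 'H0', 'H1'), ('exploit', 'H0', 'H1'), ('lateral_move', 'H1', 'H2')]: the label H1 records that the host compromised by the exploit is the source of the later lateral move, without storing any real address.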
In summary, the MDATA knowledge extraction method based on attack and defense behaviors according to the embodiments of the present application has been described. Using artificial intelligence techniques based on deep learning and natural language processing, it records attack and defense behavior data in network attack and defense exercises, jointly analyzes the data of both parties, removes all invalid attack steps, and extracts all valid attack steps as MDATA knowledge to construct a network security knowledge base. The spatio-temporal characteristics of the attacker's attack process are thus extracted from comprehensive, rich attack and defense behavior data, improving the effectiveness of knowledge extraction.
Exemplary System: Fig. 5 is a block diagram of an MDATA knowledge extraction system based on attack and defense actions according to an embodiment of the present application. As shown in Fig. 5, the MDATA knowledge extraction system 100 based on attack and defense actions according to an embodiment of the present application includes: an environment construction module 110, configured to construct a target network in a network target range to simulate a network environment, and to set up an attacker and a defender in the target network; a data recording module 120, configured to record first attack data of the attacker and first defending data of the defender; an attacker-defender joint analysis module 130, configured to jointly analyze the first attack data and the first defending data to obtain a joint analysis result, where the joint analysis result indicates whether the attacker's attack is effective; and a knowledge extraction module 140, configured to extract network security data knowledge based on the joint analysis result.
Fig. 6 is a block diagram of the attacker-defender joint analysis module in the MDATA knowledge extraction system based on attack and defense actions according to an embodiment of the present application. As shown in Fig. 6, the attacker-defender joint analysis module 130 includes:
- an attacker semantic understanding unit 131, configured to pass the first attack data through a first context encoder that includes an embedding layer to obtain an attack data semantic understanding feature vector;
- a defender semantic understanding unit 132, configured to pass the first defending data through a second context encoder that includes an embedding layer to obtain a defending data semantic understanding feature vector;
- an association encoding unit 133, configured to perform association encoding on the attack data semantic understanding feature vector and the defending data semantic understanding feature vector to obtain a joint representation matrix;
- a class center offset correction unit 134, configured to perform class center offset correction on the joint representation matrix, based on the attack data semantic understanding feature vector and the defending data semantic understanding feature vector, to obtain an optimized joint representation matrix; and
- a validity judging unit 135, configured to pass the optimized joint representation matrix through a classifier to obtain a classification result, where the classification result indicates whether the attacker's attack is valid.
In one example, in the above MDATA knowledge extraction system 100 based on attack and defense actions, the attacker semantic understanding unit 131 includes three subunits. An embedding subunit passes the first attack data through an embedding layer, which converts it into a plurality of attack embedding vectors and thus yields a sequence of attack embedding vectors. The embedding layer uses a learnable embedding matrix to perform embedding encoding on the first attack data. A context encoding subunit inputs the sequence of attack embedding vectors into the first context encoder to obtain a plurality of attack semantic feature vectors. A cascading subunit cascades (concatenates) the plurality of attack semantic feature vectors to obtain the attack data semantic understanding feature vector.
In one example, in the above MDATA knowledge extraction system 100 based on attack and defense actions, the context encoding subunit is further configured to:
1. arrange the sequence of attack embedding vectors into an input vector;
2. convert the input vector into a query vector and a key vector through learnable embedding matrices, respectively;
3. compute the product of the query vector and the transpose of the key vector to obtain a self-attention association matrix;
4. normalize the self-attention association matrix to obtain a normalized self-attention association matrix;
5. input the normalized self-attention association matrix into a Softmax activation function to obtain a self-attention feature matrix; and
6. multiply the self-attention feature matrix by each attack embedding vector in the sequence, used as a value vector, to obtain the plurality of attack semantic feature vectors.
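The embedding, self-attention, and cascading steps above can be sketched as a single-head encoder in NumPy. This is a minimal illustration under several assumptions. The attack data are assumed to be already tokenized into integer ids. The matrices `E`, `Wq`, and `Wk` are hypothetical random stand-ins for learned parameters. The "standardization" step is assumed to be scaled dot-product scaling by the square root of the key dimension. Following the text, the embeddings are used directly as value vectors, with no separate value projection.

```python
import numpy as np


def softmax_rows(a: np.ndarray) -> np.ndarray:
    """Row-wise Softmax with max subtraction for numerical stability."""
    e = np.exp(a - a.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)


def encode(token_ids, E: np.ndarray, Wq: np.ndarray, Wk: np.ndarray) -> np.ndarray:
    """Embedding layer + single-head self-attention + cascading (concatenation)."""
    X = E[np.asarray(token_ids)]           # embedding lookup: (n, d) attack embedding vectors
    Q, K = X @ Wq, X @ Wk                  # query and key projections
    scores = Q @ K.T                       # self-attention association matrix: (n, n)
    scores = scores / np.sqrt(K.shape[1])  # normalization (assumed: scaled dot product)
    A = softmax_rows(scores)               # self-attention feature matrix
    H = A @ X                              # embeddings serve directly as value vectors
    return H.reshape(-1)                   # concatenate the n attack semantic feature vectors


rng = np.random.default_rng(0)
vocab, d = 50, 8
E = rng.normal(size=(vocab, d))            # hypothetical learnable embedding matrix
Wq, Wk = rng.normal(size=(d, d)), rng.normal(size=(d, d))
v_a = encode([3, 17, 42, 8], E, Wq, Wk)
print(v_a.shape)  # (32,)
```

The defender side (second context encoder) would have the same structure with its own, separately learned parameters.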
In one example, in the above MDATA knowledge extraction system 100 based on attack and defense actions, the association encoding unit 133 is further configured to perform association encoding on the attack data semantic understanding feature vector and the defending data semantic understanding feature vector using the following formula to obtain the joint representation matrix:
$$M = V_a^{\top} \otimes V_d$$
where $V_a^{\top}$ represents the transpose of the attack data semantic understanding feature vector, $V_d$ represents the defending data semantic understanding feature vector, $M$ represents the joint representation matrix, and $\otimes$ represents matrix multiplication.
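For the association encoding to produce a matrix rather than a scalar, the product of the transposed attack vector and the defending vector must act as an outer product. This happens when both vectors are treated as row vectors, so that $V_a^{\top}$ is $n \times 1$ and $V_d$ is $1 \times m$. The sketch below makes that interpretation explicit. It is an assumption about vector orientation, not something the source states. The function name `associate` is hypothetical.

```python
import numpy as np


def associate(v_a, v_d) -> np.ndarray:
    """Joint representation matrix M = V_a^T x V_d, with both vectors
    treated as row vectors so that the matrix product is an outer product."""
    col = np.asarray(v_a, dtype=float).reshape(-1, 1)  # V_a^T: (n, 1)
    row = np.asarray(v_d, dtype=float).reshape(1, -1)  # V_d: (1, m)
    return col @ row                                   # M: (n, m)


M = associate([1, 2], [3, 4, 5])
print(M)  # entry (i, j) equals v_a[i] * v_d[j]
```

Each entry of $M$ pairs one attack feature with one defense feature. This pairing is what lets the downstream classifier see attacker/defender interactions directly.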
In one example, in the above MDATA knowledge extraction system 100 based on attack and defense actions, the class center offset correction unit 134 includes two subunits. An optimization subunit performs topology-class center optimization of class nodes on the attack data semantic understanding feature vector and the defending data semantic understanding feature vector to obtain an optimized feature matrix. An application subunit multiplies the optimized feature matrix by the joint representation matrix to obtain the optimized joint representation matrix.
In one example, in the above MDATA knowledge extraction system 100 based on attack and defense actions, the optimization subunit is further configured to perform topology-class center optimization of class nodes on the attack data semantic understanding feature vector and the defending data semantic understanding feature vector using the following formula to obtain the optimized feature matrix:
[formula image not recoverable from the extracted text]
where:
- $V_a$ represents the attack data semantic understanding feature vector;
- $V_d$ represents the defending data semantic understanding feature vector;
- $M_c$ represents the optimized feature matrix;
- $\otimes$ and $\odot$ represent the Kronecker product and the Hadamard product of matrices (vectors), respectively;
- $D$ is the distance matrix between the feature vectors $V_a$ and $V_d$, i.e., $d_{i,j} = d(v_{a,i}, v_{d,j})$, where $V_a$ and $V_d$ are column vectors;
- $\exp(\cdot)$ represents the element-wise exponential of a vector, i.e., the natural exponential function evaluated with the feature value at each position of the vector as the exponent.
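Because the optimization formula itself is not recoverable here, the sketch below does not reconstruct how the terms are combined. It only shows the building blocks the text defines, plus the application subunit's final step. The distance is assumed to be the absolute difference between scalar entries; the source says only "distance". `np.exp` gives the element-wise $\exp(\cdot)$. `np.kron` and the `*` operator would implement $\otimes$ and $\odot$. The optimized feature matrix `M_c` is a caller-supplied placeholder. The function names are hypothetical.

```python
import numpy as np


def distance_matrix(v_a, v_d) -> np.ndarray:
    """D with d_ij = |v_a[i] - v_d[j]| (absolute difference is an assumed distance)."""
    v_a = np.asarray(v_a, dtype=float)
    v_d = np.asarray(v_d, dtype=float)
    return np.abs(v_a[:, None] - v_d[None, :])


def apply_correction(M_c: np.ndarray, M: np.ndarray) -> np.ndarray:
    """Application subunit: optimized joint representation matrix = M_c @ M."""
    if M_c.shape[1] != M.shape[0]:
        raise ValueError(f"cannot multiply {M_c.shape} by {M.shape}")
    return M_c @ M


v_a = np.array([0.0, 1.0])
v_d = np.array([1.0, 3.0])
D = distance_matrix(v_a, v_d)  # [[1, 3], [0, 2]]
exp_D = np.exp(D)              # element-wise exponential, exp(.)
M = np.outer(v_a, v_d)         # joint representation matrix from association encoding
M_c = np.eye(2)                # placeholder: the real M_c comes from the patent's formula
print(apply_correction(M_c, M))
```

The shape check in `apply_correction` makes explicit that the optimized feature matrix must have as many columns as the joint representation matrix has rows.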
In one example, in the above MDATA knowledge extraction system 100 based on attack and defense actions, the validity judging unit 135 is further configured to: flatten the optimized joint representation matrix into a classification feature vector, either row by row or column by column; perform fully connected encoding on the classification feature vector using a fully connected layer of the classifier to obtain an encoded classification feature vector; and input the encoded classification feature vector into the Softmax classification function of the classifier to obtain the classification result.
Here, it will be understood by those skilled in the art that the specific functions and operations of the respective units and modules in the above-described attack-and-defense-based MDATA knowledge extraction system 100 have been described in detail in the above description of the attack-and-defense-based MDATA knowledge extraction method with reference to fig. 1 to 4, and thus, repetitive descriptions thereof will be omitted.
As described above, the MDATA knowledge extraction system 100 based on the attack and defense actions according to the embodiment of the present application may be implemented in various terminal devices, for example, a server or the like for MDATA knowledge extraction based on the attack and defense actions. In one example, the offensive and defensive behavior based MDATA knowledge extraction system 100 according to embodiments of the present application may be integrated into a terminal device as one software module and/or hardware module. For example, the attack-and-defense-based MDATA knowledge extraction system 100 may be a software module in the operating system of the terminal device, or may be an application developed for the terminal device; of course, the MDATA knowledge extraction system 100 based on the attack and defense actions can be one of many hardware modules of the terminal device.
Alternatively, in another example, the offensive and defensive MDATA knowledge extraction system 100 and the terminal device may be separate devices, and the offensive and defensive MDATA knowledge extraction system 100 may be connected to the terminal device through a wired and/or wireless network and transmit interaction information according to an agreed data format.
Exemplary electronic device: next, an electronic device according to an embodiment of the present application is described with reference to fig. 7. Fig. 7 is a block diagram of an electronic device according to an embodiment of the present application. As shown in fig. 7, the electronic device 10 includes one or more processors 11 and a memory 12.
The processor 11 may be a Central Processing Unit (CPU) or other form of processing unit having data processing and/or instruction execution capabilities, and may control other components in the electronic device 10 to perform desired functions.
Memory 12 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory (cache). The non-volatile memory may include, for example, read-only memory (ROM), a hard disk, and flash memory. One or more computer program instructions may be stored on the computer-readable storage medium. The processor 11 may execute these instructions to implement the functions of the MDATA knowledge extraction method based on attack and defense behaviors of the various embodiments of the present application described above, and/or other desired functions. Various contents, such as the first attack data and the first defending data, may also be stored in the computer-readable storage medium.
In one example, the electronic device 10 may further include: an input device 13 and an output device 14, which are interconnected by a bus system and/or other forms of connection mechanisms (not shown).
The input means 13 may comprise, for example, a keyboard, a mouse, etc.
The output device 14 may output various information including the classification result and the like to the outside. The output means 14 may include, for example, a display, speakers, a printer, and a communication network and remote output devices connected thereto, etc.
Of course, for simplicity, Fig. 7 shows only some of the components of the electronic device 10 that are relevant to the present application; components such as buses and input/output interfaces are omitted. In addition, the electronic device 10 may include any other suitable components depending on the particular application.
Exemplary computer program product and computer-readable storage medium: In addition to the methods and apparatus described above, embodiments of the present application may also be a computer program product comprising computer program instructions. When executed by a processor, these instructions cause the processor to perform the steps of the MDATA knowledge extraction method based on attack and defense behaviors according to the various embodiments of the present application, as described in the "exemplary methods" section of this specification.
The computer program product may contain program code for performing the operations of the embodiments of the present application, written in any combination of one or more programming languages. These include object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as the "C" programming language or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server.
Furthermore, embodiments of the present application may also be a computer-readable storage medium, having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform steps in the functionality of the attack-defense behavior-based MDATA knowledge extraction method according to the various embodiments of the present application described in the "exemplary methods" section of the present specification.
The computer-readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The basic principles of the present application have been described above in connection with specific embodiments. It should be noted, however, that the advantages, benefits, and effects mentioned in the present application are merely examples, not limitations, and should not be regarded as necessarily possessed by every embodiment of the present application. Furthermore, the specific details disclosed above are provided only for illustration and ease of understanding, not for limitation. They do not require that the present application be implemented using those specific details.
The block diagrams of the components, apparatuses, devices, and systems referred to in this application are only illustrative examples. They are not intended to require or imply that these elements must be connected, arranged, or configured in the manner shown. As those skilled in the art will appreciate, these components, apparatuses, devices, and systems may be connected, arranged, or configured in any manner. Words such as "including," "comprising," and "having" are open-ended words that mean "including but not limited to," and may be used interchangeably with that phrase. The words "or" and "and" as used herein refer to, and may be used interchangeably with, the word "and/or," unless the context clearly indicates otherwise. The phrase "such as" as used herein refers to, and may be used interchangeably with, the phrase "such as, but not limited to."
It should also be noted that in the apparatuses, devices, and methods of the present application, the components or steps may be decomposed and/or recombined. Such decomposition and/or recombination should be regarded as equivalent solutions of the present application.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present application. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the application. Thus, the present application is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit the embodiments of the application to the form disclosed herein. Although a number of example aspects and embodiments have been discussed above, a person of ordinary skill in the art will recognize certain variations, modifications, alterations, additions, and subcombinations thereof.

Claims (8)

1. An MDATA knowledge extraction method based on attack and defense behaviors is characterized by comprising the following steps:
constructing a target network in a network target range to simulate a network environment, and setting an attacker and a defender in the target network;
recording first attack data of the attacker and first defending data of the defender;
performing joint analysis on the first attack data and the first defending data to obtain a joint analysis result, wherein the joint analysis result is used for indicating whether the attack of the attacker is effective; and
based on the joint analysis result, extracting network security data knowledge;
wherein MDATA is a knowledge representation model capable of effectively representing the spatiotemporal attributes of network attacks;
wherein performing the joint analysis on the first attack data and the first defending data to obtain the joint analysis result, which indicates whether the attack of the attacker is effective, comprises:
passing the first attack data through a first context encoder comprising an embedding layer to obtain an attack data semantic understanding feature vector;
passing the first defending data through a second context encoder comprising an embedding layer to obtain a defending data semantic understanding feature vector;
performing association encoding on the attack data semantic understanding feature vector and the defending data semantic understanding feature vector to obtain a joint representation matrix;
performing class center offset correction on the joint representation matrix, based on the attack data semantic understanding feature vector and the defending data semantic understanding feature vector, to obtain an optimized joint representation matrix; and
passing the optimized joint representation matrix through a classifier to obtain a classification result, wherein the classification result represents whether the attack of the attacker is effective.
2. The MDATA knowledge extraction method based on attack and defense actions according to claim 1, wherein passing the first attack data through the first context encoder comprising the embedding layer to obtain the attack data semantic understanding feature vector comprises:
passing the first attack data through the embedding layer to convert the first attack data into a plurality of attack embedding vectors, thereby obtaining a sequence of attack embedding vectors, wherein the embedding layer uses a learnable embedding matrix to perform embedding encoding on the first attack data;
inputting the sequence of attack embedding vectors into the first context encoder to obtain a plurality of attack semantic feature vectors; and
cascading the plurality of attack semantic feature vectors to obtain the attack data semantic understanding feature vector.
3. The MDATA knowledge extraction method based on attack and defense actions according to claim 2, wherein inputting the sequence of attack embedding vectors into the first context encoder to obtain the plurality of attack semantic feature vectors comprises:
arranging the sequence of attack embedding vectors into an input vector;
converting the input vector into a query vector and a key vector through learnable embedding matrices, respectively;
calculating the product of the query vector and the transpose of the key vector to obtain a self-attention association matrix;
normalizing the self-attention association matrix to obtain a normalized self-attention association matrix;
inputting the normalized self-attention association matrix into a Softmax activation function to obtain a self-attention feature matrix; and
multiplying the self-attention feature matrix by each attack embedding vector in the sequence of attack embedding vectors, used as a value vector, to obtain the plurality of attack semantic feature vectors.
4. The MDATA knowledge extraction method based on attack and defense actions according to claim 3, wherein said performing association coding on the attack data semantic understanding feature vector and the defense data semantic understanding feature vector to obtain a joint representation matrix includes:
performing association encoding on the attack data semantic understanding feature vector and the defending data semantic understanding feature vector using the following formula to obtain the joint representation matrix;
wherein the formula is:
$$M = V_a^{\top} \otimes V_d$$
wherein $V_a^{\top}$ represents the transpose of the attack data semantic understanding feature vector, $V_d$ represents the defending data semantic understanding feature vector, $M$ represents the joint representation matrix, and $\otimes$ represents matrix multiplication.
5. The MDATA knowledge extraction method based on attack and defense actions according to claim 4, wherein performing class center offset correction on the joint representation matrix, based on the attack data semantic understanding feature vector and the defending data semantic understanding feature vector, to obtain the optimized joint representation matrix comprises:
performing topology-class center optimization of class nodes on the attack data semantic understanding feature vector and the defending data semantic understanding feature vector to obtain an optimized feature matrix; and
performing matrix multiplication on the optimized feature matrix and the joint representation matrix to obtain the optimized joint representation matrix.
6. The MDATA knowledge extraction method based on attack and defense actions according to claim 5, wherein the performing topology-class center optimization of class nodes on the attack data semantic understanding feature vector and the defense data semantic understanding feature vector to obtain an optimized feature matrix includes:
performing topology-class center optimization of class nodes on the attack data semantic understanding feature vector and the defending data semantic understanding feature vector using the following formula to obtain the optimized feature matrix;
wherein the formula is:
[formula image not recoverable from the extracted text]
wherein $V_a$ represents the attack data semantic understanding feature vector, $V_d$ represents the defending data semantic understanding feature vector, $M_c$ represents the optimized feature matrix, $\otimes$ and $\odot$ represent the Kronecker product and the Hadamard product of matrices (vectors), respectively, $D$ is the distance matrix between the feature vectors $V_a$ and $V_d$, i.e., $d_{i,j} = d(v_{a,i}, v_{d,j})$, $V_a$ and $V_d$ are column vectors, and $\exp(\cdot)$ represents the element-wise exponential of a vector, i.e., the natural exponential function evaluated with the feature value at each position of the vector as the exponent.
7. The MDATA knowledge extraction method based on attack and defense actions according to claim 6, wherein the step of passing the optimized joint representation matrix through a classifier to obtain a classification result, wherein the classification result is used for indicating whether the attack of the attacker is effective or not, includes:
flattening the optimized joint representation matrix into a classification feature vector by row vectors or by column vectors;
performing fully connected encoding on the classification feature vector using a fully connected layer of the classifier to obtain an encoded classification feature vector; and
inputting the encoded classification feature vector into a Softmax classification function of the classifier to obtain the classification result.
8. An MDATA knowledge extraction system based on attack and defense actions, comprising:
the environment construction module is used for constructing a target network in the network target range so as to simulate the network environment and setting an attacker and a defender in the target network;
the data recording module is used for recording first attack data of the attacker and first defending data of the defender;
the attacker-defender joint analysis module is used for performing joint analysis on the first attack data and the first defending data to obtain a joint analysis result, wherein the joint analysis result indicates whether the attack of the attacker is effective; and
the knowledge extraction module is used for extracting network security data knowledge based on the joint analysis result;
wherein MDATA is a knowledge representation model capable of effectively representing the spatiotemporal attributes of network attacks;
wherein the attacker-defender joint analysis module comprises:
the attacker semantic understanding unit, used for passing the first attack data through a first context encoder comprising an embedding layer to obtain an attack data semantic understanding feature vector;
the defender semantic understanding unit, used for passing the first defending data through a second context encoder comprising an embedding layer to obtain a defending data semantic understanding feature vector;
the association encoding unit, used for performing association encoding on the attack data semantic understanding feature vector and the defending data semantic understanding feature vector to obtain a joint representation matrix;
the class center offset correction unit, used for performing class center offset correction on the joint representation matrix, based on the attack data semantic understanding feature vector and the defending data semantic understanding feature vector, to obtain an optimized joint representation matrix; and
the validity judging unit, used for passing the optimized joint representation matrix through a classifier to obtain a classification result, wherein the classification result indicates whether the attack of the attacker is valid.
CN202310149931.7A 2023-02-22 2023-02-22 MDTA knowledge extraction method and system based on attack and defense behaviors Active CN115860117B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310149931.7A CN115860117B (en) 2023-02-22 2023-02-22 MDTA knowledge extraction method and system based on attack and defense behaviors

Publications (2)

Publication Number Publication Date
CN115860117A CN115860117A (en) 2023-03-28
CN115860117B true CN115860117B (en) 2023-05-09

Family

ID=85658624

Country Status (1)

Country Link
CN (1) CN115860117B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116204266A (en) * 2023-05-04 2023-06-02 深圳市联合信息技术有限公司 Remote assisted information creation operation and maintenance system and method thereof

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114172701A (en) * 2021-11-25 2022-03-11 北京天融信网络安全技术有限公司 Knowledge graph-based APT attack detection method and device
CN115296924A (en) * 2022-09-22 2022-11-04 中国电子科技集团公司第三十研究所 Network attack prediction method and device based on knowledge graph

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111741023B (en) * 2020-08-03 2020-11-17 中国人民解放军国防科技大学 Attack studying and judging method, system and medium for network attack and defense test platform
CN115080756B (en) * 2022-06-09 2023-05-23 广州大学 Attack and defense behavior and space-time information extraction method oriented to threat information map

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant