CN112468440A - Knowledge graph-based industrial control system attack clue discovery system - Google Patents


Info

Publication number
CN112468440A
Authority
CN
China
Prior art keywords
attack
industrial control
entity
knowledge
vulnerability
Prior art date
Legal status
Granted
Application number
CN202011168061.0A
Other languages
Chinese (zh)
Other versions
CN112468440B (en)
Inventor
赖英旭
周昆
刘静
Current Assignee
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date
Filing date
Publication date
Application filed by Beijing University of Technology
Priority to CN202011168061.0A
Publication of CN112468440A
Application granted
Publication of CN112468440B
Legal status: Active
Anticipated expiration

Classifications

    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G06F40/295 Named entity recognition
    • G06N3/048 Activation functions
    • G06N3/08 Learning methods
    • H04L63/1416 Event detection, e.g. attack signature detection
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • H04L63/1433 Vulnerability analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computational Linguistics (AREA)
  • Computer Hardware Design (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a knowledge graph-based attack clue discovery system for industrial control systems. Most industrial control systems were designed and developed years ago without adequate security considerations, so they inevitably contain many vulnerabilities that endanger system safety and are likely to be exploited by intruders. An industrial control intrusion detection system can only detect attacks; it cannot provide clues related to them, even though such clues are important for rapid recovery after an attack. To address this, the invention constructs a knowledge base of industrial control system vulnerability exploitation and provides attack-related clues from the perspective of vulnerability exploitation. For constructing the knowledge graph, it proposes a conditional random field-based named entity recognition method for attack information, an entity alignment framework based on rules and character-similarity calculation, and a knowledge inference algorithm based on type constraints and the potential correct probability of negative triples computed with a pre-trained model. Given the attack clues entered by the user, the invention displays the knowledge graph as a force-directed graph, which is more accurate and intuitive.

Description

Knowledge graph-based industrial control system attack clue discovery system
Technical Field
The invention belongs to the field of network security for industrial control systems, and in particular relates to a knowledge graph-based attack clue discovery system for industrial control systems.
Background
An industrial control system consists of various automatic control components and process control components for real-time data acquisition and monitoring, and is used in a very wide range of industrial fields. Most industrial control systems in use today were designed and developed years ago without adequate security considerations, so they inevitably contain many vulnerabilities that endanger system safety and that intruders may exploit to cause safety incidents. As the integration of informatization and industrialization advances, mature IT technology breaks the relative isolation of industrial control systems, the security problems and risks they face become increasingly prominent, and network security incidents in industrial control systems seriously threaten life and property in industrial production. At present, network attacks on industrial control systems are usually discovered with intrusion detection technology, but intrusion detection can only detect an attack and raise an alarm; it cannot provide related information such as the method used in the attack, the consequences of the attack, and handling recommendations. Such information supports the decisions of security personnel and plays an important role in reducing losses by enabling rapid recovery after an attack.
The above analysis shows that when an attacker exploits a vulnerability of an industrial control system, keeping the system secure requires not only detecting the attack but also, urgently, a way of providing clues related to it. To address this, the invention introduces knowledge graphs into industrial control system security and constructs a knowledge graph of industrial control system vulnerability exploitation for discovering attack clues. A knowledge graph is a semantic network of entities and relations, formally proposed by Google to move beyond keyword-based search: semantic retrieval over the knowledge graph interprets the user's input and gives the searcher more direct and systematic results. Accordingly, the industrial control system knowledge graph can be used to reason from the user's input and to display attack-related clues visually as a force-directed graph, helping security personnel make correct security decisions.
Disclosure of Invention
When a network attack on an industrial control system is detected, the intrusion detection system cannot provide information about the attack, yet to restore the system quickly, security personnel need information such as the attack method, the consequences of the attack, and handling recommendations. To solve this problem, the invention provides a knowledge graph-based attack clue discovery system for industrial control systems, which gives security personnel more intuitive and accurate attack clues by constructing a knowledge graph of industrial control system vulnerability exploitation. A knowledge graph can be divided into a schema layer and a data layer: the schema layer stores refined knowledge, and the data layer stores concrete data. Depending on which part is built first, a knowledge graph can be constructed top-down or bottom-up. The invention adopts the top-down approach: the schema layer is extracted from the scenario and data and then guides construction of the data layer. Based on the network security state and other related information in a multi-source heterogeneous network environment, security-related elements are classified according to the characteristics of the different data sources into three dimensions: a basic dimension, a threat dimension, and a vulnerability dimension. From these three dimensions and the specific industrial control scenario, the concept set for industrial control system attack clue discovery is obtained as C = {Vendor, Device, Vulnerability, Mean, Consequence}, where Vendor is the manufacturer, Device is an industrial control system device, Vulnerability is a device vulnerability, Mean is the attack method that exploits a vulnerability, and Consequence is the abnormal result caused by the attack.
The set of relations between concepts is R = {product, have, show, cause, use, kind-of, lead-to}, denoting, respectively, the production relation between a vendor and a device, the ownership relation between a device and a vulnerability, the manifestation relation between a device and an attack anomaly, the causal relation between a vulnerability and an attack anomaly, the exploitation relation between an attack method and a vulnerability, the hierarchical relation between entities, and the causal relation between one attack result and another. Device and vulnerability data come from the major vulnerability databases and are extracted with a web crawler, while the attack methods and attack results of each vulnerability come from unstructured text such as vendor bulletins and vulnerability descriptions.
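The schema layer can be written down as a small typed schema. The sketch below is a minimal illustration, assuming each relation has exactly one head concept and one tail concept read off the prose (for example, use links a Mean to a Vulnerability), and that kind-of links two entities of the same concept. It names the abnormal-result concept Consequence and the ownership and hierarchy relations have and kind-of; the dictionary and function names are hypothetical. The same check is what a type constraint on triples needs later in knowledge inference.

```python
# Concepts of the schema layer (concept set C).
CONCEPTS = frozenset({"Vendor", "Device", "Vulnerability", "Mean", "Consequence"})

# Relation -> (head concept, tail concept).
# Directions are an assumption read off the prose description of each relation.
RELATION_SIGNATURES = {
    "product": ("Vendor", "Device"),            # vendor produces device
    "have": ("Device", "Vulnerability"),        # device has vulnerability
    "show": ("Device", "Consequence"),          # device shows attack anomaly
    "cause": ("Vulnerability", "Consequence"),  # vulnerability causes anomaly
    "use": ("Mean", "Vulnerability"),           # attack method exploits vulnerability
    "lead-to": ("Consequence", "Consequence"),  # one result leads to another
}


def is_valid_triple(head_type: str, relation: str, tail_type: str) -> bool:
    """Return True if a triple's entity types agree with the schema layer."""
    if head_type not in CONCEPTS or tail_type not in CONCEPTS:
        return False
    if relation == "kind-of":
        # Hierarchical relation: assumed to link entities of the same concept.
        return head_type == tail_type
    return RELATION_SIGNATURES.get(relation) == (head_type, tail_type)
```

For example, `is_valid_triple("Device", "have", "Vulnerability")` holds, while `is_valid_triple("Vendor", "use", "Device")` is rejected.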
After the schema layer is determined, attack information is extracted from unstructured text. Because attack-method and abnormal-result entities vary greatly in length and contain many nested entities and aliases, the invention proposes, on top of a linear-chain conditional random field (CRF) model, a feature combination that captures both entity features and context: word features, part-of-speech features, entity boundary features, keyword features before and after the entity, and high-frequency entity-word features. This improves the completeness of entity recognition.
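A per-token feature function for such a CRF might look like the sketch below. It emits one feature dict per token, the input format used by CRF toolkits such as sklearn-crfsuite. The cue-word lists and the high-frequency word set are hypothetical placeholders (only "by" and "using" come from the description), and the entity boundary features are approximated by sentence position and word shape, which is an assumption about what the original boundary features contain.

```python
# Hypothetical cue lists: "by" and "using" come from the description,
# the remaining words are illustrative placeholders.
METHOD_CUES = {"by", "using", "via"}
CONSEQUENCE_CUES = {"cause", "causes", "allow", "allows", "lead", "leads"}
# Hypothetical high-frequency words mined from labeled entities.
HIGH_FREQ_ENTITY_WORDS = {"overflow", "injection", "denial", "service", "crafted"}


def token_features(tokens, pos_tags, i, window=3):
    """Build the CRF feature dict for tokens[i]."""
    word = tokens[i]
    feats = {
        "word.lower": word.lower(),        # word feature
        "pos": pos_tags[i],                # part-of-speech feature
        "word.istitle": word.istitle(),    # boundary/shape cues (assumed)
        "word.isdigit": word.isdigit(),
        "BOS": i == 0,
        "EOS": i == len(tokens) - 1,
        "hf_entity_word": word.lower() in HIGH_FREQ_ENTITY_WORDS,
    }
    # Neighbouring words and tags give the CRF local context.
    for offset in (-1, 1):
        j = i + offset
        if 0 <= j < len(tokens):
            feats[f"{offset:+d}:word.lower"] = tokens[j].lower()
            feats[f"{offset:+d}:pos"] = pos_tags[j]
    # Keyword features: is a cue word within `window` tokens before/after?
    before = {t.lower() for t in tokens[max(0, i - window):i]}
    after = {t.lower() for t in tokens[i + 1:i + 1 + window]}
    feats["after_method_cue"] = bool(before & METHOD_CUES)
    feats["before_consequence_cue"] = bool(after & CONSEQUENCE_CUES)
    return feats


def sentence_features(tokens, pos_tags):
    """One feature dict per token, as a CRF toolkit expects per sentence."""
    return [token_features(tokens, pos_tags, i) for i in range(len(tokens))]
```

With the third-party sklearn-crfsuite package, a list of such per-sentence lists, together with the matching tag sequences, can be passed to `sklearn_crfsuite.CRF().fit(X, y)`.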
Because the extracted attack information comes from multiple data sources, the same thing inevitably appears under several different names ("multiple words, one meaning"), so entity alignment is needed. With a limited corpus and many long, complex entities whose occurrence frequencies differ widely, plain string-similarity calculation cannot handle abbreviations and synonyms. Analyzing the causes of the "multiple words, one meaning" problem, English name variants fall roughly into the following five types:
1) The letters are identical in composition and order, and the names differ only in case and punctuation, such as "email" and "E-mail".
2) Synonym substitution makes the names differ, such as "temperature increment" and "temperature build-up".
3) Abbreviation makes the names differ. English abbreviation rules vary and can be divided into:
a. The initials of each word form the abbreviation.
b. A prefix of a word forms the abbreviation.
c. A prefix combined with a suffix of a word forms the abbreviation.
4) Misspellings make the names differ.
5) Other causes.
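The three abbreviation patterns (a to c) can be checked with simple rules. The sketch below is one possible reading, assuming whitespace tokenization after punctuation is stripped, and treating patterns b and c as "each abbreviated piece is a prefix of, or a prefix plus a suffix of, the corresponding word". The prefix-plus-suffix rule is deliberately permissive, and all function names are hypothetical; a production rule set would be richer.

```python
import re


def _words(s):
    """Lower-case, turn punctuation into spaces, split into words."""
    return re.sub(r"[^a-z0-9 ]", " ", s.lower()).split()


def is_initialism(abbr, full):
    """Rule a: initials of each word, e.g. 'DoS' vs 'denial of service'."""
    letters = "".join(_words(abbr))
    words = _words(full)
    return len(words) > 1 and letters == "".join(w[0] for w in words)


def _abbreviates(piece, word):
    """Rules b/c for one word: equal, a prefix, or a prefix + suffix of it."""
    if piece == word:
        return True
    if len(piece) < 2 or len(piece) >= len(word) or piece[0] != word[0]:
        return False
    if word.startswith(piece):                    # rule b: 'temp' <- 'temperature'
        return True
    return any(word.startswith(piece[:k]) and word.endswith(piece[k:])
               for k in range(1, len(piece)))     # rule c: 'dept' <- 'department'


def is_word_abbreviation(abbr, full):
    """Rules b and c applied word by word."""
    a_words, f_words = _words(abbr), _words(full)
    return (len(a_words) == len(f_words) and a_words != f_words
            and all(_abbreviates(p, w) for p, w in zip(a_words, f_words)))


def is_abbreviation_pair(x, y):
    """True if either name abbreviates the other under rules a-c."""
    return any(is_initialism(s, t) or is_word_abbreviation(s, t)
               for s, t in ((x, y), (y, x)))
```

For instance, "DoS" and "denial of service" match rule a, and "temp incr" and "temperature increment" match rule b.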
The invention proposes a method that combines rules with aggregated similarity. Rules are first used to decide whether two names are variants caused by abbreviation or synonym replacement; if not, similarity calculation decides. The Edit distance, Jaro-Winkler, ISUB, and Jaccard similarities are computed and aggregated with a sigmoid function into a final similarity, and two entities are treated as the same entity if this similarity exceeds a threshold. This improves entity alignment to some extent.
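The similarity-aggregation step can be sketched as follows. This is a minimal illustration, not the patent's exact method: the weights, bias, and threshold are hypothetical (the description gives no values), Jaccard is assumed to operate on character bigrams, and ISUB is left out to keep the sketch short; it would simply be a fourth input to the weighted sum before the sigmoid.

```python
import math


def edit_similarity(a: str, b: str) -> float:
    """1 - Levenshtein distance / length of the longer string."""
    if not a and not b:
        return 1.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return 1 - prev[-1] / max(len(a), len(b))


def jaro_winkler(a: str, b: str, p: float = 0.1) -> float:
    """Jaro-Winkler similarity with a prefix bonus of up to 4 characters."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_hit, b_hit = [False] * len(a), [False] * len(b)
    matches = 0
    for i, ca in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_hit[j] and b[j] == ca:
                a_hit[i] = b_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Matched characters that appear in a different order count as half-transpositions.
    half_transpositions, k = 0, 0
    for i, ca in enumerate(a):
        if a_hit[i]:
            while not b_hit[k]:
                k += 1
            half_transpositions += ca != b[k]
            k += 1
    m = matches
    jaro = (m / len(a) + m / len(b) + (m - half_transpositions / 2) / m) / 3
    prefix = 0
    for ca, cb in zip(a[:4], b[:4]):
        if ca != cb:
            break
        prefix += 1
    return jaro + prefix * p * (1 - jaro)


def jaccard_bigrams(a: str, b: str) -> float:
    """Jaccard similarity of the two strings' character-bigram sets."""
    sa = {a[i:i + 2] for i in range(len(a) - 1)}
    sb = {b[i:i + 2] for i in range(len(b) - 1)}
    if not sa or not sb:
        return 1.0 if a == b else 0.0
    return len(sa & sb) / len(sa | sb)


def aggregated_similarity(a, b, weights=(4.0, 4.0, 4.0), bias=-6.0):
    """Sigmoid of a weighted sum of the similarities (weights/bias hypothetical)."""
    a, b = a.lower(), b.lower()
    scores = (edit_similarity(a, b), jaro_winkler(a, b), jaccard_bigrams(a, b))
    z = sum(w * s for w, s in zip(weights, scores)) + bias
    return 1 / (1 + math.exp(-z))


def same_entity(a, b, threshold=0.8):
    """Two names denote the same entity if the aggregate exceeds the threshold."""
    return aggregated_similarity(a, b) > threshold
```

A sigmoid keeps the aggregate in (0, 1), so a single threshold stays meaningful however many similarity measures feed into it; in practice the weights would be fitted on labeled alignment pairs.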
After entity alignment, a basic knowledge graph has been formed, but relations are still missing and must be completed by reasoning. The translation model TransE is often used for knowledge reasoning, but the negative triples it generates by random replacement may include positive triples and low-quality negative triples, which hurts the model's reasoning. The invention therefore introduces the concept of the potential correct probability of a negative triple: negative triples generated by random replacement are scored by their potential correct probability, and negative triples with different scores receive different training weights, so that the training weights of the positive triples and low-quality negative triples hidden among the negatives are reduced and the model reasons better. To compute the potential correct probability, the invention uses a pre-trained TransE model; the formula is defined as:
[Formula image not reproduced: potential correct probability of a negative triple, computed from its pre-trained TransE score]
where $f(h', r, t')$ is the score of the negative triple under the pre-trained TransE model. The TransE model is then retrained with negative triples that carry this potential correct probability, and the retrained model serves as the final inference model.
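TransE scores a triple by how far the translated head $h + r$ lands from the tail $t$, so a true triple scores near 0. A minimal stdlib sketch of that score is given below; plain Python lists stand in for the learned embeddings, and the function name is illustrative.

```python
import math


def transe_score(h, r, t, norm=1):
    """f(h, r, t) = ||h + r - t|| under the L1 (norm=1) or L2 (norm=2) norm."""
    diff = [hi + ri - ti for hi, ri, ti in zip(h, r, t)]
    if norm == 1:
        return sum(abs(d) for d in diff)
    if norm == 2:
        return math.sqrt(sum(d * d for d in diff))
    raise ValueError("norm must be 1 or 2")
```

For example, `transe_score([1, 2], [1, 0], [2, 2])` is 0 because the translated head lands exactly on the tail, while `transe_score([0, 0], [0, 0], [3, 4], norm=2)` is 5.0.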
Once the industrial control system vulnerability exploitation knowledge graph is built, it is combined with the industrial control intrusion detection system. When the intrusion detection system detects an attack, information such as the name and vendor of the device under attack is used to perform knowledge reasoning in the knowledge graph and find attack-related clues, which are displayed with ECharts, the visualization framework originally developed by Baidu. Security personnel can thus see the vulnerability involved, the exploitation method, the attack result, protection recommendations, and other information at a glance, which is valuable for rapid system recovery.
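ECharts itself runs in the browser, so the server side only needs to turn clue triples into an ECharts option object. The sketch below builds such an option for a force-directed graph series (`type: "graph"`, `layout: "force"`, with `categories`, `data`, and `links`); the function name, the coloring of nodes by concept, and the repulsion value are assumptions for illustration.

```python
import json


def to_echarts_graph(triples, entity_types):
    """Build an ECharts force-directed graph option from (head, relation, tail) triples.

    entity_types maps each entity name to its concept (e.g. "Device");
    nodes are coloured by concept through ECharts categories.
    """
    concepts = sorted(set(entity_types.values()))
    nodes, seen = [], set()
    for h, _, t in triples:
        for name in (h, t):
            if name not in seen:
                seen.add(name)
                nodes.append({"name": name,
                              "category": concepts.index(entity_types[name])})
    # Each link shows its relation name as an edge label.
    links = [{"source": h, "target": t,
              "label": {"show": True, "formatter": r}} for h, r, t in triples]
    return {
        "tooltip": {},
        "legend": [{"data": concepts}],
        "series": [{
            "type": "graph",
            "layout": "force",
            "roam": True,
            "label": {"show": True},
            "force": {"repulsion": 200},
            "categories": [{"name": c} for c in concepts],
            "data": nodes,
            "links": links,
        }],
    }
```

The page would then pass `json.dumps(option)` to `chart.setOption(...)` on the ECharts instance.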
Drawings
FIG. 1 is a schematic diagram of the general construction of a knowledge graph according to the present invention.
FIG. 2 is a schematic diagram of named entity recognition in accordance with the present invention.
FIG. 3 is a schematic diagram of the alignment of the entities of the present invention.
FIG. 4 is a schematic diagram of the knowledge inference of the present invention.
FIG. 5 is an example of knowledge-graph attack cue discovery constructed based on the present invention.
Detailed Description
The present invention will be described in detail below with reference to specific embodiments shown in the drawings.
Fig. 1 shows the overall structure of the knowledge graph construction. As shown in Fig. 1, to obtain device, vulnerability, and vendor information, a web crawler fetches and parses the web pages of each vulnerability database, and the device, vulnerability, and vendor entities are merged and deduplicated to form the knowledge graph, with attributes such as vulnerability name, release date, threat level, CVE number, description, vulnerability patch, vulnerability type, and vulnerability references. For unstructured vulnerability bulletins, vulnerability descriptions, and similar text, attack methods and attack consequences are extracted by named entity recognition based on a linear-chain conditional random field. Entities extracted from multiple data sources exhibit "multiple words, one meaning"; the reasons why attack-method and attack-consequence entities show this problem are analyzed, and an entity alignment framework is proposed to align them.
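The merge-and-deduplicate step for crawled vulnerability records can be sketched as below. It assumes, as a hypothetical policy, that records sharing a CVE number describe the same vulnerability, that the first non-empty value of each field wins, and that records without a CVE number are kept apart; the record field names are illustrative, not taken from any database's actual schema.

```python
def merge_vulnerability_records(*sources):
    """Merge crawled vulnerability records from several databases.

    sources: (source_name, records) pairs; each record is a dict.
    Records with the same CVE number collapse into one entity; for each
    field the first non-empty value wins, and contributing sources are kept.
    """
    merged = {}
    for source_name, records in sources:
        for i, rec in enumerate(records):
            cve = (rec.get("cve") or "").strip().upper()
            # Records without a CVE number are kept apart (assumed policy).
            key = cve or f"{source_name}#{i}"
            entry = merged.setdefault(key, {"sources": []})
            if cve:
                entry["cve"] = cve            # normalised CVE number
            entry["sources"].append(source_name)
            for field, value in rec.items():
                if field != "cve" and value and not entry.get(field):
                    entry[field] = value
    return merged
```

Keying on the CVE number makes the merge order-independent for identity, while the "first non-empty value wins" rule lets a later source fill fields that an earlier one left blank.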
Relations between many entities are missing while the knowledge graph is being built, and knowledge reasoning aims to complete them. Because the translation model TransE produces positive triples and meaningless triples when it generates negative triples by random replacement, the invention proposes, on the basis of TransE, a knowledge inference algorithm based on the potential correct probability of negative triples computed with a pre-trained model, and combines it with an entity type constraint method to effectively improve knowledge inference.
Fig. 2 is a schematic diagram of a named entity recognition process based on a CRF model, as shown in fig. 2, including:
Attack information is extracted from unstructured text with a linear-chain conditional random field model. The text is first labeled with the BIOES scheme and then divided into a training set and a test set at a ratio of 7:3. The features used by the invention are word features, part-of-speech features, entity boundary features, keyword features before and after the entity, and high-frequency entity-word features. The keyword features capture high-frequency words that appear before and after attack-method and attack-consequence entities; for example, a noun phrase following "by" or "using" usually denotes an attack method. High-frequency entity words are words that occur frequently inside entities and act as triggers for the model. The training corpus is preprocessed and feature-extracted and used to train the CRF model iteratively; the test corpus is preprocessed in the same way and used to evaluate the model, with precision and recall as the evaluation criteria. The best-performing CRF model is selected for named entity recognition.
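The labeling, 7:3 split, and evaluation steps can be sketched as follows. This assumes exact-span, entity-level precision and recall (a common NER convention; the description does not say whether evaluation is token- or entity-level), and the entity labels MEAN and CON used in examples are hypothetical names for attack-method and attack-consequence entities.

```python
import random


def to_bioes(n_tokens, spans):
    """Convert (start, end_exclusive, label) entity spans to BIOES tags."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        if end - start == 1:
            tags[start] = f"S-{label}"                # single-token entity
        else:
            tags[start] = f"B-{label}"
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"
            tags[end - 1] = f"E-{label}"
    return tags


def from_bioes(tags):
    """Recover entity spans from a (possibly imperfect) BIOES sequence."""
    spans, open_span = set(), None                    # open_span = (start, label)
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        if prefix == "S":
            spans.add((i, i + 1, label))
            open_span = None
        elif prefix == "B":
            open_span = (i, label)
        elif prefix == "E" and open_span and open_span[1] == label:
            spans.add((open_span[0], i + 1, label))
            open_span = None
        elif prefix == "O":
            open_span = None
    return spans


def split_7_3(samples, seed=0):
    """Shuffle and split samples into 70% training and 30% test data."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    cut = len(items) * 7 // 10
    return items[:cut], items[cut:]


def precision_recall(gold_seqs, pred_seqs):
    """Entity-level precision and recall over exact span matches."""
    tp = n_pred = n_gold = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = from_bioes(gold), from_bioes(pred)
        tp += len(g & p)
        n_pred += len(p)
        n_gold += len(g)
    return (tp / n_pred if n_pred else 0.0, tp / n_gold if n_gold else 0.0)
```

A prediction that stops one token short of an entity's end yields no span at all under this scheme, which is why exact-span recall penalizes incomplete long entities, the very problem the context features aim at.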
FIG. 3 is a flow diagram of the entity alignment framework; as shown in FIG. 3, it includes:
step 31, construct rules for English abbreviations;
step 32, judge, according to the rules of step 31, whether either of the two input entities is an abbreviation of the other;
step 33, if the two entities are not variants caused by abbreviation, normalize them;
step 34, extract the word stems of the entities;
step 35, remove the stop words contained in the entities;
step 36, replace the words in the entities one by one with their WordNet synonyms and judge whether the two entities are variants caused by synonym replacement;
step 37, if they are not, compute the similarity of the two entities, namely the Edit distance, Jaro-Winkler, ISUB, and Jaccard similarities;
step 38, aggregate the four similarities into a comprehensive similarity with a Sigmoid function;
step 39, judge whether the comprehensive similarity is greater than a threshold; if so, the two are the same entity, otherwise they are two different entities.
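Of the four measures in step 37, ISUB is the least familiar. The sketch below follows the I-Sub string metric of Stoilos et al. (ISWC 2005): it repeatedly removes the longest common substring while that substring is longer than 2 characters, scores the shared fraction, subtracts a Hamacher-product dissimilarity of the unmatched remainders with p = 0.6, adds a Winkler-style prefix bonus, and rescales the result from [-1, 1] to [0, 1]. The normalization (lower-casing and dropping '.', '_' and spaces) and the rescaling are assumptions of this sketch.

```python
def _longest_common_substring(a, b):
    """Return (length, start_in_a, start_in_b) of the longest common substring."""
    best, ia, ib = 0, 0, 0
    for i in range(len(a)):
        for j in range(len(b)):
            k = 0
            while i + k < len(a) and j + k < len(b) and a[i + k] == b[j + k]:
                k += 1
            if k > best:
                best, ia, ib = k, i, j
    return best, ia, ib


def isub(s1, s2, p=0.6):
    """I-Sub similarity, rescaled to [0, 1]."""
    a = "".join(c for c in s1.lower() if c not in "._ ")
    b = "".join(c for c in s2.lower() if c not in "._ ")
    la, lb = len(a), len(b)
    if la == 0 or lb == 0:
        return 1.0 if la == lb else 0.0
    prefix = 0                                   # common prefix, at most 4 chars
    for x, y in zip(a[:4], b[:4]):
        if x != y:
            break
        prefix += 1
    common = 0
    while a and b:                               # strip longest common substrings
        best, ia, ib = _longest_common_substring(a, b)
        if best <= 2:                            # ignore matches of length <= 2
            break
        common += best
        a, b = a[:ia] + a[ia + best:], b[:ib] + b[ib + best:]
    commonality = 2 * common / (la + lb)
    u1, u2 = (la - common) / la, (lb - common) / lb   # unmatched fractions
    s, prod = u1 + u2, u1 * u2
    dissimilarity = prod / (p + (1 - p) * (s - prod)) if s != prod else 0.0
    raw = commonality - dissimilarity + prefix * 0.1 * (1 - commonality)
    return (raw + 1) / 2                         # map [-1, 1] to [0, 1]
```

Unlike edit distance, ISUB rewards long shared substrings and penalizes long unmatched remainders, which suits the long multi-word attack entities described above.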
FIG. 4 is a flow chart of knowledge inference of the present invention, as shown in FIG. 4, comprising:
During construction of the knowledge graph many relations are missing, so knowledge reasoning is needed to complete them. Translation models based on representation learning overcome the low computational efficiency and data sparsity of symbolic triple representations, and they are trained with a maximum-margin method, i.e., the optimization objective separates positive samples from negative samples. Most translation models generate negative samples at random, so the negative samples may contain positive samples as well as many low-quality negatives; for example, (Beijing, Lound, Banana) is a very poor negative sample. Building on the idea of the potential correct probability of negative triples, the invention proposes computing this probability with a pre-trained model and combines it with type constraints to reduce the model bias these samples cause. The smaller the difference between a negative triple and a positive triple, the larger the negative triple's potential correct probability and the smaller its objective-function score; hence the score of a negative triple under the pre-trained TransE model can serve as a measure of its potential correct probability.
The score function of the translation model is $f(h, r, t) = \|h + r - t\|_{L_1/L_2}$, where h, r, and t are the vector representations of the head entity, the relation, and the tail entity, and $L_1$ and $L_2$ denote the L1 norm and the L2 norm. A positive triple should score close to 0, and the larger a negative triple's score, the better. The potential correct probability of a negative triple based on the pre-trained model is defined as:
[Formula image not reproduced: potential correct probability of a negative triple under the pre-trained model]
where T is the set of negative samples generated by random replacement and $f(h', r, t')$ is the score of the negative triple under the pre-trained model. Incorporating the potential correct probability of negative samples into the calculation, the objective function is:
[Formula image not reproduced: training objective weighted by the potential correct probability of negative triples]
where S is the set of positive triples, S' is the set of negative triples, δ is the potential correct probability of a negative triple, λ is a hyper-parameter of the model, and (h, r, t) and (h', r, t') denote a positive triple and a negative triple, respectively.
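Because the probability and objective formulas survive only as images, the sketch below assumes concrete forms that match the prose (lower pre-trained score, higher δ, lower training weight): δ is a softmax of the negated pre-trained scores over the negative set T, and each hinge term of a TransE-style margin loss is multiplied by (1 - δ), with λ treated as the margin. Both forms and the function names are assumptions, not the patent's exact equations.

```python
import math


def potential_correct_probabilities(pretrained_neg_scores):
    """ASSUMED form: softmax of -f(h', r, t') over the negative set T.

    A lower pre-trained TransE score means the negative looks more like a
    true triple, so it receives a higher potential correct probability.
    """
    m = min(pretrained_neg_scores)               # shift for numerical stability
    exps = [math.exp(-(s - m)) for s in pretrained_neg_scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_margin_loss(pos_score, neg_scores, deltas, margin=1.0):
    """Margin loss for one positive triple and its negatives.

    neg_scores come from the model being trained, deltas from the pre-trained
    model; each hinge term is down-weighted by (1 - delta). `margin` plays the
    role of the hyper-parameter lambda (an assumption).
    """
    return sum((1.0 - d) * max(0.0, margin + pos_score - s)
               for d, s in zip(deltas, neg_scores))
```

A negative with a near-zero pre-trained score gets a δ close to 1 and therefore almost no training weight, which is the intended protection against unknown true triples hiding among the negatives.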
The whole process can be summarized as follows. First, negative triples are generated by randomly replacing entities in the triples of the knowledge graph and are used to pre-train a TransE model; the pre-trained model computes the potential correct probability of each negative triple, and meaningless negative triples mixed among them are removed by type constraints. A TransE model trained on data containing negative triples weighted by their potential correct probability is then used as the knowledge inference model.
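One way to apply the type constraint during negative sampling is to draw replacement entities only from the same concept as the entity being replaced, so that meaningless negatives such as a device standing in for a vulnerability are never produced. The sketch below assumes that reading; the function names and the 50/50 head/tail choice are illustrative.

```python
import random


def corrupt_triple(triple, entity_type, rng):
    """Replace the head or the tail with another entity of the same type.

    entity_type maps entity name -> concept. Drawing replacements only from
    the same concept is one way to realise the type constraint (assumed).
    """
    h, r, t = triple
    replace_head = rng.random() < 0.5
    old = h if replace_head else t
    candidates = sorted(e for e, c in entity_type.items()
                        if c == entity_type[old] and e != old)
    if not candidates:
        return None                      # no type-compatible replacement exists
    new = rng.choice(candidates)
    return (new, r, t) if replace_head else (h, r, new)


def negative_samples(triples, entity_type, k=2, seed=0):
    """Generate up to k type-constrained negatives per positive triple."""
    rng = random.Random(seed)
    negatives = []
    for triple in triples:
        for _ in range(k):
            neg = corrupt_triple(triple, entity_type, rng)
            if neg is not None:
                negatives.append(neg)
    return negatives
```

Negatives produced this way can still be true but unrecorded triples; the potential correct probability weighting is what handles that case.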
After named entity recognition, entity alignment, and knowledge reasoning, a relatively complete industrial control system vulnerability exploitation knowledge graph is obtained. An attack clue discovery system is then built on the knowledge graph; given the user's input, it displays the clues related to an attack as a force-directed graph. A specific example follows:
Phoenix Contact is a German provider of industrial automation, connectivity, and interface solutions whose products are used mainly in critical infrastructure sectors such as communications, critical manufacturing, and information technology; its FL SWITCH devices contain several exploitable vulnerabilities. When a denial-of-service attack is detected on such a device, the vendor is set to "Phoenix Contact" and the device to "FL SWITCH" to locate the specific device, and a depth-first traversal starting from it finds the vulnerabilities of the device, the abnormal results each vulnerability may cause, and the exploitation methods that produce those results. The entities and relations on these paths are displayed as a force-directed graph in FIG. 5, where hovering the mouse shows the detailed attributes of entities and relations; the entities are numbered for ease of reference. Path 1->5->12->4 shows that an attacker exploits the buffer overflow vulnerability CVE-2018-10728 of the FL SWITCH device by crafting the cookie of a GET request, causing a denial-of-service attack. Path 1->10->2->12->4 can also cause a buffer overflow on the device. Different abnormal results can arise at different stages of an attack; for example, the buffer overflow can also lead to remote code execution. Entities 8 and 14 are other vulnerabilities of the device. Path 1->6->7->8->9 shows that an attacker exploits vulnerability CVE-2018-10730 through command injection, obtains the privilege to execute system commands in the privilege escalation stage, and can update the device firmware in preparation for widening the attack.
Path 1->13->14->15 shows that the device vulnerability CVE-2018-10729 causes information leakage, allowing an attacker to read the configuration file of the device. Knowing the cause of the attack, security personnel can obtain mitigations and patches for the vulnerability from its "refer" and "patch" attributes and thereby stop the attack. The force-directed graph makes the data presentation more intuitive, presents clue information in real time when an attack occurs, and provides decision support for security personnel.
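The depth-first traversal used in this example can be sketched as follows. The toy triples in the usage are illustrative only and do not reproduce the exact graph or numbering of FIG. 5. The function name, the depth limit, recording only maximal paths, and following edges only in their stored direction are all assumptions; an undirected traversal would also reach attack methods, which point at vulnerabilities through use.

```python
from collections import defaultdict


def clue_paths(triples, start, max_depth=4):
    """Enumerate maximal simple paths from `start` by depth-first traversal.

    Each path alternates entity, relation, entity, ...; edges are followed
    in their stored (head -> tail) direction only.
    """
    graph = defaultdict(list)
    for h, r, t in triples:
        graph[h].append((r, t))
    paths = []

    def dfs(node, path, visited):
        extended = False
        if len(path) // 2 < max_depth:           # len(path) // 2 == edges so far
            for rel, nxt in graph[node]:
                if nxt not in visited:
                    extended = True
                    dfs(nxt, path + [rel, nxt], visited | {nxt})
        if not extended and len(path) > 1:
            paths.append(path)                   # record maximal paths only

    dfs(start, [start], {start})
    return paths
```

For example, starting from "FL SWITCH" in a toy graph with have, cause, and lead-to edges returns one path per vulnerability chain, ready to be highlighted in the force-directed view.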
It should be understood that although this specification is described in terms of embodiments, not every embodiment contains only one independent technical solution; this manner of description is for clarity only. Those skilled in the art should treat the specification as a whole, and the technical solutions in the embodiments may be combined as appropriate to form other embodiments that those skilled in the art can understand.
The detailed descriptions listed above are only specific illustrations of feasible embodiments of the invention and are not intended to limit its scope of protection; all equivalent embodiments or modifications that do not depart from the technical spirit of the invention shall fall within its scope.

Claims (6)

1. A knowledge graph-based attack clue discovery system for industrial control systems, characterized in that it constructs a knowledge graph of industrial control system vulnerability exploitation and thereby provides attack clues from the perspective of vulnerability exploitation, together with a method for constructing the knowledge graph. Constructing the knowledge graph of industrial control system vulnerability exploitation comprises the following steps:
1) Construct the knowledge graph schema layer: determine the entities to extract and the relations between them according to the application scenario, and use them to guide construction of the data layer.
2) Use a web crawler to acquire vulnerability and device data from the NVD, CNVD, and CNNVD vulnerability databases, obtain the related vulnerability descriptions, and retrieve vendor bulletins through their links.
3) The vulnerability descriptions and vendor bulletins obtained in step 2) are unstructured text; extract attack method and attack result information from them with a linear-chain conditional random field.
4) Information extracted from multiple data sources contains cases of "multiple words, one meaning"; align entities with an entity alignment framework based on rules and similarity calculation.
5) The constructed knowledge graph inevitably has missing entity relations; complete the relations between its entities with a knowledge inference algorithm based on the potential correct probability of negative samples computed with a pre-trained model.
2. The knowledge graph-based industrial control system attack clue discovery system according to claim 1, wherein: most industrial control systems currently in use were designed and developed years ago without adequate security considerations, so they inevitably contain many vulnerabilities that endanger system safety and that intruders may exploit to cause safety incidents. Current intrusion detection technology for industrial control systems cannot provide attack clue information; the invention therefore introduces knowledge graphs into attack clue discovery for industrial control systems, uses the ability of knowledge graph semantic retrieval to understand user input, and provides security personnel with intuitive and accurate attack clues.
3. The knowledge graph-based industrial control system attack clue discovery system according to claim 1, wherein the knowledge graph schema layer constructed in step 1) is as follows: drawing on concrete industrial control scenarios, the invention derives the industrial control system attack clue discovery concept set C from three different dimensions, where C = {Vendor, Device, Vulnerability, Mean, Consequence}. Vendor: the manufacturer; Device: industrial control system equipment; Vulnerability: a device vulnerability; Mean: an attack method that exploits a vulnerability; Consequence: the abnormal result caused by an attack. The set of relations between concepts is R = {product, have, show, cause, use, kind-of, lead-to}, denoting respectively the production relation between a vendor and a device, the ownership relation between a device and a vulnerability, the manifestation relation between a device and an attack anomaly, the causation relation between a vulnerability and an attack anomaly, the exploitation relation between an attack method and a vulnerability, the hierarchical relation between entities, and the causal relation between one attack result and another.
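The schema layer of claim 3 can be written down as a set of typed relation signatures, which also gives a simple type check for candidate triples. This is a minimal sketch assuming the concept and relation labels Consequence, have and kind-of, and assuming the head/tail directions read from the claim text (for example, that `use` points from an attack method to a vulnerability); the names `RELATION_SIGNATURES` and `is_valid_triple` are hypothetical.

```python
from typing import Dict, Tuple

# Concept set C from claim 3 (the abnormal-result concept is labelled "Consequence" here).
CONCEPTS = {"Vendor", "Device", "Vulnerability", "Mean", "Consequence"}

# Relation -> (head concept, tail concept). Directions are assumptions read from claim 3.
RELATION_SIGNATURES: Dict[str, Tuple[str, str]] = {
    "product": ("Vendor", "Device"),            # vendor produces device
    "have": ("Device", "Vulnerability"),        # device has vulnerability
    "show": ("Device", "Consequence"),          # device shows attack anomaly
    "cause": ("Vulnerability", "Consequence"),  # vulnerability causes anomaly
    "use": ("Mean", "Vulnerability"),           # attack method exploits vulnerability
    "lead-to": ("Consequence", "Consequence"),  # one attack result leads to another
}


def is_valid_triple(head_type: str, relation: str, tail_type: str) -> bool:
    """Check a typed triple against the schema layer."""
    if relation == "kind-of":
        # Hierarchical relation: assumed to link two entities of the same concept.
        return head_type == tail_type and head_type in CONCEPTS
    return RELATION_SIGNATURES.get(relation) == (head_type, tail_type)
```

For example, `is_valid_triple("Vendor", "product", "Device")` holds while the reversed triple does not; the same signatures can later serve as the type constraint used in claim 6.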
4. The knowledge graph-based industrial control system attack clue discovery system according to claim 1, wherein step 3) uses a conditional random field-based named entity recognition method for attack methods and attack results. Attack-method and attack-result entities vary greatly in length and contain many nested expressions and aliases; to ensure the completeness of the extracted entities, the invention introduces contextual features of the entities and determines the optimal feature combination:
1) Word features. Each word produced by segmenting the text serves as a feature; these features reflect the basic information of the text fairly completely.
2) Part-of-speech features. Each word is part-of-speech tagged during segmentation, using more than 20 part-of-speech tags, including verbs, nouns and prepositions.
3) Entity boundary features. The corpus is annotated with the BIEOS tagging scheme (Begin, Inside, End, Outside, Single).
4) Keyword features before and after the entity. Attack-method and attack-result entities generally appear immediately before or after certain keywords, which can therefore serve as features for entity recognition.
5) Entity high-frequency word features. Certain words occur with high probability in many attack-method and attack-result entities, and their presence can trigger recognition.
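The feature set above can be sketched as per-token feature dictionaries, with BIEOS tags as the labels the CRF learns to predict (treating the boundary feature as the label scheme is our assumption about how it is used). The keyword and high-frequency word lists, the window size of 2, the entity type names `MEAN`/`RESULT` and all function names are hypothetical; the patent does not publish them.

```python
from typing import Dict, List, Tuple, Union

Feature = Union[str, bool]

# Hypothetical word lists; the patent does not publish its keyword or high-frequency lists.
CONTEXT_KEYWORDS = {"via", "allows", "cause", "causes", "leading", "by"}
HIGH_FREQ_WORDS = {"overflow", "injection", "denial", "service", "crash", "execution"}


def bieos_tags(n_tokens: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Label tokens with BIEOS tags; each span is (start, end_exclusive, entity_type)."""
    tags = ["O"] * n_tokens
    for start, end, label in spans:
        if end - start == 1:
            tags[start] = f"S-{label}"          # single-token entity
        else:
            tags[start] = f"B-{label}"          # beginning
            for i in range(start + 1, end - 1):
                tags[i] = f"I-{label}"          # inside
            tags[end - 1] = f"E-{label}"        # end
    return tags


def token_features(tokens: List[str], pos_tags: List[str], i: int,
                   window: int = 2) -> Dict[str, Feature]:
    """Word, POS, high-frequency-word and context-keyword features for token i."""
    word = tokens[i].lower()
    feats: Dict[str, Feature] = {
        "word": word,
        "pos": pos_tags[i],
        "high_freq": word in HIGH_FREQ_WORDS,
    }
    for offset in range(-window, window + 1):
        j = i + offset
        if offset == 0 or not 0 <= j < len(tokens):
            continue
        ctx = tokens[j].lower()
        feats[f"word[{offset:+d}]"] = ctx
        feats[f"pos[{offset:+d}]"] = pos_tags[j]
        feats[f"kw[{offset:+d}]"] = ctx in CONTEXT_KEYWORDS
    return feats


tokens = ["Crafted", "packets", "cause", "a", "denial", "of", "service"]
pos = ["VBN", "NNS", "VBP", "DT", "NN", "IN", "NN"]
tags = bieos_tags(len(tokens), [(0, 2, "MEAN"), (4, 7, "RESULT")])
# tags == ["B-MEAN", "E-MEAN", "O", "O", "B-RESULT", "I-RESULT", "E-RESULT"]
features = [token_features(tokens, pos, i) for i in range(len(tokens))]
```

Each `features` list, paired with its `tags`, is in the list-of-dicts form that CRF toolkits such as sklearn-crfsuite accept for training.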
5. The knowledge graph-based industrial control system attack clue discovery system according to claim 1, wherein step 4) uses the following entity alignment framework. Based on an analysis of attack-method and attack-result entities, the framework performs targeted alignment of "many words, one meaning" cases caused by abbreviations, synonym substitution, spelling errors, symbols and the like. The process comprises the following steps:
step 31, constructing rules for English abbreviations;
step 32, determining, according to the rules of step 31, whether either of the two input entities is an abbreviation of the other;
step 33, if the two entities are not variants caused by abbreviation, normalizing the entities;
step 34, extracting the word stems of the entity;
step 35, removing stop words contained in the entity;
step 36, replacing the words in the entities one by one with synonyms from WordNet, and determining whether the two entities are variants caused by synonym substitution;
step 37, if the variation is not caused by synonym substitution, calculating the similarity of the two entities with four measures: edit distance, Jaro-Winkler, ISUB and Jaccard;
step 38, aggregating the four similarities into a composite similarity using a Sigmoid function;
step 39, determining whether the composite similarity exceeds a threshold: if it does, the two entities are the same entity; otherwise, they are two different entities.
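A compact sketch of the alignment flow, under stated simplifications: a hypothetical initials rule stands in for the abbreviation rules of steps 31 to 32, step 33 is reduced to lowercasing and symbol stripping, steps 34 to 36 (stemming, stop words, WordNet synonyms) and the ISUB measure are omitted, and the Sigmoid aggregation of step 38 is assumed to be a weighted sum passed through the logistic function. `WEIGHTS`, `BIAS`, `THRESHOLD` and every function name are hypothetical.

```python
import math
import re

# Hypothetical aggregation parameters (not given in the patent).
WEIGHTS = (3.0, 3.0, 3.0)   # edit distance, Jaro-Winkler, Jaccard
BIAS = -6.0
THRESHOLD = 0.8


def normalize(entity: str) -> str:
    """Simplified step 33: lowercase, turn symbols into spaces, collapse whitespace."""
    return " ".join(re.sub(r"[^0-9a-z]+", " ", entity.lower()).split())


def is_abbreviation(short: str, full: str) -> bool:
    """Hypothetical rule for steps 31-32: the short form equals the initials of the full form."""
    words = normalize(full).split()
    return len(words) > 1 and normalize(short).replace(" ", "") == "".join(w[0] for w in words)


def edit_distance_sim(a: str, b: str) -> float:
    """1 - Levenshtein distance / length of the longer string."""
    if not a and not b:
        return 1.0
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return 1.0 - prev[-1] / max(len(a), len(b))


def jaro_winkler(a: str, b: str, p: float = 0.1) -> float:
    """Jaro similarity plus the Winkler bonus for a common prefix of up to 4 characters."""
    if a == b:
        return 1.0
    if not a or not b:
        return 0.0
    window = max(max(len(a), len(b)) // 2 - 1, 0)
    a_hit, b_hit = [False] * len(a), [False] * len(b)
    matches = 0
    for i, ch in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not b_hit[j] and b[j] == ch:
                a_hit[i] = b_hit[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    a_chars = [a[i] for i in range(len(a)) if a_hit[i]]
    b_chars = [b[j] for j in range(len(b)) if b_hit[j]]
    transpositions = sum(x != y for x, y in zip(a_chars, b_chars)) / 2
    jaro = (matches / len(a) + matches / len(b) + (matches - transpositions) / matches) / 3
    prefix = 0
    for x, y in zip(a[:4], b[:4]):
        if x != y:
            break
        prefix += 1
    return jaro + prefix * p * (1 - jaro)


def jaccard_sim(a: str, b: str) -> float:
    """Jaccard similarity of the two token sets."""
    ta, tb = set(a.split()), set(b.split())
    return 1.0 if not ta and not tb else len(ta & tb) / len(ta | tb)


def composite_similarity(a: str, b: str) -> float:
    """Step 38 (assumed form): logistic function of a weighted sum of the similarities."""
    a, b = normalize(a), normalize(b)
    scores = (edit_distance_sim(a, b), jaro_winkler(a, b), jaccard_sim(a, b))
    z = sum(w * s for w, s in zip(WEIGHTS, scores)) + BIAS
    return 1.0 / (1.0 + math.exp(-z))


def same_entity(a: str, b: str) -> bool:
    """Steps 32 and 39: abbreviation check first, then threshold the composite similarity."""
    if is_abbreviation(a, b) or is_abbreviation(b, a):
        return True
    return composite_similarity(a, b) > THRESHOLD
```

With these placeholder values, `same_entity("DoS", "Denial of Service")` is resolved by the abbreviation rule, and `same_entity("Buffer-Overflow", "buffer overflow")` by normalization plus the similarity threshold; in practice the weights, bias and threshold would be tuned on labelled entity pairs.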
6. The knowledge graph-based industrial control system attack clue discovery system according to claim 1, wherein step 5) uses the following knowledge inference method. Translation models generate negative samples by random replacement, so the negative set may contain positive samples and meaningless samples. To overcome this defect, the invention uses a type-constrained knowledge inference algorithm based on a pre-trained model and the potential correct probability of negative triples, which improves the inference performance of the model. The specific process is as follows: first, negative triples are generated by randomly replacing entities in the triples of the knowledge graph and are used to pre-train a TransE model; the pre-trained model is used to calculate the potential correct probability of each negative triple, and meaningless negative triples mixed into the negative set are removed through type constraints. A TransE model is then trained on data containing the negative triples together with their potential correct probabilities, and this model is used as the knowledge inference model.
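The negative-sampling stage described above (random head or tail replacement, then removal of meaningless negatives through type constraints) can be sketched as follows. This is a minimal illustration assuming that "type constraint" means each relation has a fixed (head type, tail type) signature; the entity names, the `signatures` table and the function names are hypothetical.

```python
import random
from typing import Dict, List, Sequence, Tuple

Triple = Tuple[str, str, str]


def random_negatives(triples: Sequence[Triple], entities: Sequence[str],
                     rng: random.Random, k: int = 1) -> List[Triple]:
    """Corrupt each positive triple k times by replacing its head or tail at random."""
    negatives = []
    for h, r, t in triples:
        for _ in range(k):
            e = rng.choice(entities)
            negatives.append((e, r, t) if rng.random() < 0.5 else (h, r, e))
    return negatives


def type_filter(negatives: Sequence[Triple], entity_type: Dict[str, str],
                signatures: Dict[str, Tuple[str, str]]) -> List[Triple]:
    """Drop meaningless negatives whose entity types violate the relation's signature."""
    return [
        (h, r, t) for h, r, t in negatives
        if signatures.get(r) == (entity_type[h], entity_type[t])
    ]


# Hypothetical toy graph.
entity_type = {"VendorA": "Vendor", "PLC_A": "Device", "PLC_B": "Device", "VULN_1": "Vulnerability"}
signatures = {"have": ("Device", "Vulnerability")}
positives = [("PLC_A", "have", "VULN_1")]
negs = random_negatives(positives, list(entity_type), random.Random(0), k=10)
kept = type_filter(negs, entity_type, signatures)
```

Negatives that pass the type filter but happen to be true facts (including an unchanged copy of the positive) are deliberately not removed here; according to the claim, they are handled by the potential correct probability computed with the pre-trained model.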
The scoring function of the translation model is $f(h, r, t) = \lVert h + r - t \rVert_{L_1/L_2}$, where h, r and t are the vector representations of the head entity, relation and tail entity, respectively, and the subscript $L_1/L_2$ denotes the L1 or L2 norm. Using this scoring function, the potential correct probability of a negative triple is defined as:
[Formula image not reproduced in the text: definition of the potential correct probability of a negative triple]
After each negative triple has been assigned its potential correct probability, the TransE model is trained again with the potential correct probability of the negative samples incorporated into the computation; the objective function is:
[Formula image not reproduced in the text: objective function of the retrained TransE model]
where S is the set of positive triples, S' is the set of negative triples, δ is the potential correct probability of a negative triple, and λ is a hyperparameter of the model. Used as the knowledge inference model, the TransE model trained in this way shows a clear improvement over the conventional TransE model.
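The scoring function f(h, r, t) of claim 6 is the standard TransE score. The sketch below computes it together with the standard, unweighted margin ranking loss of the original TransE model. The patent's modified objective, which brings in δ and λ, is available only as an image and is not reconstructed here, so this shows the conventional baseline, not the invention's loss. The margin parameter name `gamma` and the function names are hypothetical.

```python
import math
from typing import Sequence


def transe_score(h: Sequence[float], r: Sequence[float], t: Sequence[float],
                 norm: int = 1) -> float:
    """f(h, r, t) = ||h + r - t|| under the L1 (norm=1) or L2 (norm=2) norm; lower is more plausible."""
    diff = [hi + ri - ti for hi, ri, ti in zip(h, r, t)]
    if norm == 1:
        return sum(abs(d) for d in diff)
    if norm == 2:
        return math.sqrt(sum(d * d for d in diff))
    raise ValueError("norm must be 1 or 2")


def margin_ranking_loss(pos_scores: Sequence[float], neg_scores: Sequence[float],
                        gamma: float = 1.0) -> float:
    """Standard (unweighted) TransE loss: sum of max(0, gamma + f(positive) - f(negative))."""
    return sum(max(0.0, gamma + p - n) for p, n in zip(pos_scores, neg_scores))


h, r = [1.0, 0.0], [0.0, 1.0]
pos = transe_score(h, r, [1.0, 1.0])          # 0.0: t equals h + r exactly
neg = transe_score(h, r, [0.0, 0.0])          # 2.0 under the L1 norm
loss = margin_ranking_loss([pos], [neg])      # max(0, 1 + 0 - 2) = 0.0
```

A negative triple already scored at least `gamma` worse than its positive contributes nothing to this baseline loss; per the claim, the invention additionally weights negatives by their potential correct probability δ.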

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011168061.0A CN112468440B (en) 2020-10-28 2020-10-28 Knowledge graph-based attack clue discovery system for industrial control system

Publications (2)

Publication Number Publication Date
CN112468440A true CN112468440A (en) 2021-03-09
CN112468440B CN112468440B (en) 2022-11-15

Family

ID=74834195

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011168061.0A Active CN112468440B (en) 2020-10-28 2020-10-28 Knowledge graph-based attack clue discovery system for industrial control system

Country Status (1)

Country Link
CN (1) CN112468440B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106951499A (en) * 2017-03-16 2017-07-14 中国人民解放军国防科学技术大学 A kind of knowledge mapping method for expressing based on translation model
CN109347801A (en) * 2018-09-17 2019-02-15 武汉大学 A kind of vulnerability exploit methods of risk assessment based on multi-source word insertion and knowledge mapping
US10218717B1 (en) * 2016-02-11 2019-02-26 Awake Security, Inc. System and method for detecting a malicious activity in a computing environment
CN110688456A (en) * 2019-09-25 2020-01-14 北京计算机技术及应用研究所 Vulnerability knowledge base construction method based on knowledge graph
CN111241840A (en) * 2020-01-21 2020-06-05 中科曙光(南京)计算技术有限公司 Named entity identification method based on knowledge graph

Non-Patent Citations (2)

Title
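ZHENG ZHU et al.: "Cyber Security Knowledge Graph Based Cyber Attack Attribution Framework for Space-ground Integration Information Network", 《2018 IEEE 18TH INTERNATIONAL CONFERENCE ON COMMUNICATION TECHNOLOGY (ICCT)》 *
陶耀东 et al.: "A knowledge graph-based research method for industrial internet security vulnerabilities" (一种基于知识图谱的工业互联网安全漏洞研究方法), 《信息技术与网络安全》 (Information Technology and Network Security) *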

Cited By (10)

Publication number Priority date Publication date Assignee Title
CN114116957A (en) * 2021-04-02 2022-03-01 集美大学 Electromagnetic information leakage intelligent analysis method based on knowledge graph
CN113127707A (en) * 2021-04-22 2021-07-16 中国美术学院 Product design influence analysis method
CN114004230A (en) * 2021-09-23 2022-02-01 杭萧钢构股份有限公司 Industrial control scheduling method and system for producing steel structure
CN113783896A (en) * 2021-11-10 2021-12-10 北京金睛云华科技有限公司 Network attack path tracking method and device
CN114553534A (en) * 2022-02-22 2022-05-27 国网河北省电力有限公司电力科学研究院 Power grid security vulnerability assessment method based on knowledge graph
CN114553534B (en) * 2022-02-22 2024-01-23 国网河北省电力有限公司电力科学研究院 Knowledge graph-based power grid security vulnerability assessment method
CN114443863A (en) * 2022-04-07 2022-05-06 北京网藤科技有限公司 Attack vector generation method and system based on machine learning in industrial control network
CN114443863B (en) * 2022-04-07 2022-07-26 北京网藤科技有限公司 Attack vector generation method and system based on machine learning in industrial control network
CN114844712A (en) * 2022-05-23 2022-08-02 苏州思萃工业互联网技术研究所有限公司 Safety detection system and method based on knowledge graph edge nodes
CN115859305A (en) * 2022-12-26 2023-03-28 国家工业信息安全发展研究中心 Knowledge graph-based industrial control security situation sensing method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant