Disclosure of Invention
The embodiment of the invention provides a prediction method, a prediction device, electronic equipment and a storage medium based on a knowledge graph, and aims to solve the problems that the existing network space target behavior prediction technology is not mature, and the established model and the used algorithm are not systematic.
Based on the above problem, the prediction method based on the knowledge graph provided by the embodiment of the invention includes:
constructing a knowledge graph, wherein the basic unit of the knowledge graph is a triple; the knowledge graph comprises a schema layer and a data layer, and the triples are time-sequence triples; and predicting the relation between a given entity and other entities in the network space based on the knowledge graph.
Further, the triple of the schema layer is (head entity, relationship | timing information, tail entity) or (entity, attribute value), wherein the attribute includes timing information and/or spatial information, and the timing information includes relationship establishment time, relationship validity period, and/or relationship elimination time.
Further, the data layer is established according to the established schema layer; the data layer performs entity extraction, attribute extraction, and relationship extraction on unstructured, semi-structured, and structured data sources, and integrates and disambiguates data from different sources.
Further, each item of information $X^{(m)}$ of a given entity X in the network space is collected based on the knowledge graph, wherein each item of information $X^{(m)}$ includes behavior information, traffic information, and device log information related to the entity X, denoted $X^{(m)} = \{X^{(m)}_{t_1}, X^{(m)}_{t_2}, \ldots, X^{(m)}_{t_n}\}$ ($m = 1, 2, \ldots, N$, where m indexes an item of information related to the entity); through link prediction, whether a relation Y exists between the entity and another entity Z is output, which can be represented as (X, whether relation Y exists, Z), and the prediction process is represented as $P = \{X, Y\} = \{X^{(1)}, X^{(2)}, \ldots, X^{(N)}, Y\}$, where P is the prediction result.
Further, collecting each item of information $X^{(m)}$ of a given entity in the network space based on the knowledge graph and outputting, through link prediction, whether a relation exists between the entity and other entities specifically comprises: vectorizing each collected item of information $X^{(m)}$ using a TransR model; feeding the $X^{(m)}$ vectorized by the TransR model into a time-recurrent neural network model for incremental learning; and performing sequence incremental combination on the output of the time-recurrent neural network model and then performing result classification, wherein if the result classification indicates that a relation exists with another entity, the two entities are considered related.
The embodiment of the invention provides a prediction device based on a knowledge graph, which comprises:
a knowledge graph construction module, configured to construct a knowledge graph, wherein the basic unit of the knowledge graph is a triple, the knowledge graph comprises a schema layer and a data layer, and the triples are time-sequence triples; and a prediction module, configured to predict the relation between a given entity and other entities in the network space based on the knowledge graph.
Further, the triple of the schema layer is (head entity, relationship | timing information, tail entity) or (entity, attribute value), wherein the attribute includes timing information and/or spatial information, and the timing information includes relationship establishment time, relationship validity period, and/or relationship elimination time.
Further, the data layer is established according to the established schema layer; the data layer performs entity extraction, attribute extraction, and relationship extraction on unstructured, semi-structured, and structured data sources, and integrates and disambiguates data from different sources.
Further, the prediction module further comprises: an information collection module, configured to collect each item of information $X^{(m)}$ of a given entity X in the network space based on the knowledge graph, wherein each item of information $X^{(m)}$ includes behavior information, traffic information, and device log information related to the entity X, denoted $X^{(m)} = \{X^{(m)}_{t_1}, X^{(m)}_{t_2}, \ldots, X^{(m)}_{t_n}\}$ ($m = 1, 2, \ldots, N$, where m indexes an item of information related to the entity); and a result output module, configured to output, through link prediction, whether a relation Y exists between the entity and another entity Z, which can be represented as (X, whether relation Y exists, Z), the prediction process being represented as $P = \{X, Y\} = \{X^{(1)}, X^{(2)}, \ldots, X^{(N)}, Y\}$, where P is the prediction result.
Further, the prediction module is specifically configured to: vectorize, using a TransR model, each item of information $X^{(m)}$ of a given entity in the network space collected based on the knowledge graph; feed the $X^{(m)}$ vectorized by the TransR model into a time-recurrent neural network model for incremental learning; and perform sequence incremental combination on the output of the time-recurrent neural network model and then perform result classification, wherein if the result classification indicates that a relation exists with another entity, the two entities are considered related.
The embodiment of the invention also discloses an electronic device based on prediction of the knowledge graph, which comprises: the device comprises a shell, a processor, a memory, a circuit board and a power circuit, wherein the circuit board is arranged in a space enclosed by the shell, and the processor and the memory are arranged on the circuit board; a power supply circuit for supplying power to each circuit or device of the electronic apparatus; the memory is used for storing executable program codes; the processor executes a program corresponding to the executable program code by reading the executable program code stored in the memory for performing any of the aforementioned knowledge-graph based prediction methods.
Embodiments of the present invention provide a computer readable storage medium having stored thereon one or more programs executable by one or more processors to implement any of the aforementioned knowledge-graph based prediction methods.
Compared with the prior art, the prediction method, the prediction device, the electronic equipment and the storage medium based on the knowledge graph provided by the embodiment of the invention achieve at least the following beneficial effects: they address the mining and prediction of behavior rules of network space targets, enable short-term prediction and medium- and long-term trend assessment of the network security situation, provide commanders of our army with timely and effective prediction information to assist decision-making, and reduce decision errors.
Detailed Description
The following describes specific embodiments of a prediction method, a prediction apparatus, an electronic device, and a storage medium based on a knowledge graph according to embodiments of the present invention with reference to the accompanying drawings.
The prediction method based on the knowledge graph provided by the embodiment of the invention, as shown in fig. 1, specifically comprises the following steps:
S101, constructing a knowledge graph, wherein the knowledge graph comprises a schema layer and a data layer;
In 2012, Google proposed the concept of the knowledge graph. A knowledge graph aims to describe the concepts, entities, and events of the objective world and the relationships among them. It is essentially a knowledge base of the kind known as a semantic network, that is, a knowledge base with a directed graph structure. In a knowledge graph, if a relationship exists between two nodes, the two nodes are connected by an edge; a node is called an Entity, and the edge between two nodes is called a Relationship. An example of a knowledge graph is shown in figure 2.
The basic unit of the knowledge graph is a triple, and the triples are time-sequence triples. The triple of the schema layer is (head entity, relationship | timing information, tail entity) or (entity, attribute value), wherein the attribute includes timing information and/or spatial information, and the timing information includes relationship establishment time, relationship validity period, and/or relationship elimination time. The data layer is established according to the established schema layer; the data layer performs entity extraction, attribute extraction, and relationship extraction on unstructured, semi-structured, and structured data sources, and integrates and disambiguates data from different sources.
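By way of non-limiting illustration, the integration and disambiguation of data from different sources described above can be sketched as follows. The alias table `ALIASES`, the relation names, and the sample triples are hypothetical and are not taken from the embodiment; a practical system would obtain the alias mapping from an entity-linking step.

```python
# Hypothetical alias table mapping surface forms found in different
# sources to one canonical entity name (assumption for illustration).
ALIASES = {
    "host-a": "Host A",
    "HOST_A": "Host A",
    "10.0.0.5": "Host A",
}


def canonical(name):
    """Return the canonical entity name for a surface form."""
    return ALIASES.get(name, name)


def integrate(*sources):
    """Merge triples from several sources, resolving aliases and
    dropping triples that become duplicates after resolution."""
    merged = []
    seen = set()
    for source in sources:
        for head, relation, tail in source:
            triple = (canonical(head), relation, canonical(tail))
            if triple not in seen:
                seen.add(triple)
                merged.append(triple)
    return merged


# Illustrative triples extracted from two different kinds of source.
structured = [("HOST_A", "connects_to", "Mail Server")]
semi_structured = [
    ("host-a", "connects_to", "Mail Server"),
    ("10.0.0.5", "runs", "Mail Client"),
]

graph = integrate(structured, semi_structured)
```

In this sketch the two `connects_to` triples collapse into one because both heads resolve to the same canonical entity.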
S102, predicting the relation between an entity and other entities in a network space based on the knowledge graph;
collecting each item of information $X^{(m)}$ of a given entity X in the network space based on the knowledge graph, wherein each item of information $X^{(m)}$ includes, but is not limited to, behavior information, traffic information, and device log information related to the entity X, denoted $X^{(m)} = \{X^{(m)}_{t_1}, X^{(m)}_{t_2}, \ldots, X^{(m)}_{t_n}\}$ ($m = 1, 2, \ldots, N$, where m indexes an item of information related to the entity); through link prediction, whether a relation Y exists between the entity and another entity Z is output, which can be represented as (X, whether relation Y exists, Z), and the prediction process is represented as $P = \{X, Y\} = \{X^{(1)}, X^{(2)}, \ldots, X^{(N)}, Y\}$, where P is the prediction result;
in more detail, each item of information $X^{(m)}$ of the entity in the network space collected based on the knowledge graph is vectorized using a TransR model; the $X^{(m)}$ vectorized by the TransR model is fed into a time-recurrent neural network model for incremental learning; and sequence incremental combination is performed on the output of the time-recurrent neural network model, followed by result classification, wherein if the result classification indicates that a relation exists with another entity, the two entities are considered related.
The embodiment of the invention fully considers elements such as the multidimensional attributes, communication modes, and behavior characteristics of network space targets, and exploits the knowledge graph's ability to objectively describe entities, concepts, and association relationships in the real world. By studying the construction of a knowledge graph oriented to network space target behavior, it mainly addresses the mining of behavior rules of network space targets and the prediction of their intent, thereby enabling short-term prediction and medium-term trend assessment of the network security situation, providing commanders of our army with timely and effective prediction information to assist decision-making, and reducing decision errors.
As shown in fig. 3, the other prediction method based on the knowledge graph provided in the embodiment of the present invention specifically includes the following steps:
S201, constructing a knowledge graph, wherein the knowledge graph comprises a schema layer and a data layer;
In general, there are two methods for knowledge graph construction: top-down and bottom-up. In the embodiment of the invention, the knowledge graph is constructed top-down, that is, ontology and schema information is extracted from high-quality data with the help of structured data sources such as encyclopedia websites and added to the knowledge base;
the knowledge graph comprises a schema layer and a data layer; the schema layer is established first, and the data layer is then built according to the schema layer. In the embodiment of the invention, an ontology base is used to manage the schema layer of the knowledge graph. The ontology is the concept template of a structured knowledge base, and a knowledge base built on an ontology base has a stronger hierarchical structure and less redundancy;
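As a non-limiting sketch of how a data layer can be built according to the schema layer, the following example checks each candidate data-layer triple against a small set of relation signatures. The entity types, relation names, and the contents of `SCHEMA` are hypothetical and serve only to illustrate the idea of an ontology acting as a concept template.

```python
# Hypothetical schema layer: each entity has a type, and the schema lists
# which (head type, relation, tail type) signatures are permitted.
ENTITY_TYPES = {
    "Host A": "Device",
    "Router R": "Device",
    "implements phishing attack": "Intent",
}

SCHEMA = {
    ("Device", "connects_to", "Device"),
    ("Device", "has_intent", "Intent"),
}


def conforms(triple):
    """Return True if a data-layer triple matches a schema-layer signature."""
    head, relation, tail = triple
    signature = (ENTITY_TYPES.get(head), relation, ENTITY_TYPES.get(tail))
    return signature in SCHEMA


def build_data_layer(candidates):
    """Keep only the candidate triples that the schema layer permits."""
    return [t for t in candidates if conforms(t)]


candidates = [
    ("Host A", "connects_to", "Router R"),
    ("Host A", "has_intent", "implements phishing attack"),
    ("implements phishing attack", "connects_to", "Host A"),  # violates schema
]
data_layer = build_data_layer(candidates)
```

Rejecting triples that do not match any signature is one way in which a schema layer can keep redundancy and inconsistency out of the data layer.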
the embodiment of the invention also considers the time-sequence characteristic that target behavior in the network space develops over time. Starting from practical data and drawing on the experience and knowledge of network security engineers, the inference problem for time-sequence information in the knowledge graph is studied using time-annotated triples. For example, an ordinary triple is extended into a time-sequence triple (H, R | τ, T), where H is the head entity, R is the relation, T is the tail entity, and τ provides additional timing information about when the fact holds. In practical applications, the behavior triple sequence of the target Host A may be $X^{(1)}$ = {(Host A, behavior | March 1, 2020 14:10, disguises an email), (Host A, behavior | March 1, 2020 16:44, directs the recipient to a tailored web page), (Host A, behavior | March 2, 2020 10:27, directs the recipient to enter an account password), …}.
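For illustration only, the time-sequence triple (H, R | τ, T) and the Host A behavior sequence given above can be represented as follows. The class name `TemporalTriple`, its field names, and the helper `within` are assumptions introduced for this sketch.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class TemporalTriple:
    """A time-sequence triple (H, R | tau, T)."""
    head: str
    relation: str
    time: datetime  # tau: when the fact holds
    tail: str


# The Host A behavior sequence from the description, deliberately unordered.
x1 = [
    TemporalTriple("Host A", "behavior", datetime(2020, 3, 2, 10, 27),
                   "directs the recipient to enter an account password"),
    TemporalTriple("Host A", "behavior", datetime(2020, 3, 1, 14, 10),
                   "disguises an email"),
    TemporalTriple("Host A", "behavior", datetime(2020, 3, 1, 16, 44),
                   "directs the recipient to a tailored web page"),
]

# Order the sequence by its timing information tau.
x1_sorted = sorted(x1, key=lambda t: t.time)


def within(sequence, start, end):
    """Return the triples whose time falls in the half-open window [start, end)."""
    return [t for t in sequence if start <= t.time < end]


first_day = within(x1_sorted, datetime(2020, 3, 1), datetime(2020, 3, 2))
```

Sorting by τ yields the ordered sequence on which the later sequence model operates, and `within` shows how a fact can be restricted to the period in which it holds.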
S202, collecting each item of information $X^{(m)}$ of a given entity in the network space based on the knowledge graph;
Whether an entity (such as a host, a router, a printer, or a server) in the network space has an association relationship with other entities cannot be judged from a behavior sequence alone; the judgment must also combine a traffic sequence, a device log sequence, and the like. Here, the other entities may include a certain intent (such as internal reconnaissance, password modification, privilege escalation, or denial of service) or a next-step behavior. For example, the sequences of time-sequence triples $X^{(m)}$ that characterize the target-related dynamic information, shown in the following table, are taken as input, where $X^{(m)}$ denotes the time-annotated triple sequence corresponding to a target-related characteristic such as behavior or traffic.
Then the dynamic time-sequence triple sequences of Host A may be represented as $X = \{X^{(1)}, X^{(2)}, X^{(3)}\}$. When a phishing attack is performed, the intent prediction process may be expressed as $P = \{X, Y\} = \{X^{(1)}, X^{(2)}, X^{(3)}, (\text{Host A}, \text{whether a relationship exists}, \text{implements phishing attack})\}$.
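As a non-limiting sketch, the grouping of collected records into one time-ordered sequence per item of information, and the pairing of those sequences with a candidate relation, might look as follows. The record layout, the information-type labels, and the sample observations are illustrative assumptions and do not reproduce the table referred to above.

```python
from collections import defaultdict
from datetime import datetime

# Each raw record: (entity, information type, timestamp, observation).
# All values are illustrative.
records = [
    ("Host A", "traffic", datetime(2020, 3, 1, 14, 5), "outbound mail connection"),
    ("Host A", "behavior", datetime(2020, 3, 1, 16, 44),
     "directs the recipient to a tailored web page"),
    ("Host A", "device_log", datetime(2020, 3, 1, 14, 0), "power on"),
    ("Host A", "behavior", datetime(2020, 3, 1, 14, 10), "disguises an email"),
    ("Host B", "behavior", datetime(2020, 3, 1, 9, 0), "prints a document"),
]


def build_sequences(all_records, entity):
    """Group one entity's records by information type and sort each group by time."""
    sequences = defaultdict(list)
    for ent, kind, timestamp, observation in all_records:
        if ent == entity:
            sequences[kind].append((ent, kind, timestamp, observation))
    for sequence in sequences.values():
        sequence.sort(key=lambda record: record[2])
    return dict(sequences)


# X collects the sequences X^(1), X^(2), X^(3) for Host A.
X = build_sequences(records, "Host A")

# Y is the candidate relation whose existence is to be predicted.
Y = ("Host A", "whether a relationship exists", "implements phishing attack")

# The input to the prediction process P = {X, Y}.
P_input = (X, Y)
```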
S203, through link prediction, outputting whether a relation Y exists between the entity and another entity Z, wherein Y may be represented as (X, whether relation Y exists, Z), and the prediction process is represented as $P = \{X, Y\} = \{X^{(1)}, X^{(2)}, \ldots, X^{(N)}, Y\}$, where P is the prediction result;
according to the table, Y can be expressed as (Host A, whether relation Y exists, implements phishing email attack). It can readily be inferred that Host A is powered on, establishes a network connection, disguises a mailbox to direct the recipient to a specific web page, directs the recipient to enter an account password, and is finally shut down once the behavior is complete. This closely matches the characteristics of a phishing email attack, so the prediction result P can be deduced: Host A has this intent. The device node "Host A" can then be connected to the intent node "implements phishing email attack" in the knowledge graph to complete the inference. The whole process is automated and requires no manual intervention.
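For illustration, connecting the device node to the intent node once the prediction indicates that the relation exists can be sketched with a simple adjacency structure. The class `KnowledgeGraph`, the relation name `has_intent`, and the function `apply_prediction` are hypothetical names used only in this example.

```python
from collections import defaultdict


class KnowledgeGraph:
    """A minimal directed graph storing (relation, tail) pairs per head node."""

    def __init__(self):
        self._edges = defaultdict(set)

    def add_edge(self, head, relation, tail):
        self._edges[head].add((relation, tail))

    def has_edge(self, head, relation, tail):
        return (relation, tail) in self._edges.get(head, set())


def apply_prediction(graph, head, relation, tail, related):
    """Add the predicted edge only when the classifier reports a relation."""
    if related:
        graph.add_edge(head, relation, tail)
    return related


kg = KnowledgeGraph()

# A positive prediction completes the missing link.
apply_prediction(kg, "Host A", "has_intent", "implements phishing email attack", True)

# A negative prediction leaves the graph unchanged.
apply_prediction(kg, "Host B", "has_intent", "implements phishing email attack", False)
```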
The embodiment of the invention takes into account that behavior-related information of a network space target may hold only within a specific time period, extends the ordinary knowledge graph into a time-sequence knowledge graph, and uses time-annotated triples to study inference over the time-sequence information of the network space target behavior knowledge graph, thereby making up for the difficulty that existing knowledge graph link prediction methods have in predicting time-sequence information in knowledge graphs of the network space domain.
Another prediction method based on a knowledge graph provided in the embodiment of the present invention, as shown in fig. 4, specifically includes the following steps:
S301, constructing a knowledge graph, wherein the knowledge graph comprises a schema layer and a data layer;
S302, collecting each item of information $X^{(m)}$ of a given entity in the network space based on the knowledge graph;
S303, based on a knowledge graph link prediction model with LSTM sequence incremental learning, vectorizing each item of information $X^{(m)}$ using a TransR model;
the knowledge graph link prediction model with LSTM sequence incremental learning comprises: a triple sequence input layer, an incremental computation layer, an LSTM sequence combination layer, and a result output layer, as shown in fig. 5;
each item of information $X^{(m)}$ is vectorized using the TransR model, and the resulting vectors serve as the triple sequence input layer of the model.
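By way of non-limiting illustration, the TransR projection and scoring that underlie such vectorization can be sketched in plain Python. In TransR, an entity vector is projected into the relation space by a relation-specific matrix $M_r$, and a triple is scored by $\lVert h M_r + r - t M_r \rVert_2^2$. The embedding values below are made up, and the function `triple_vector`, which concatenates the projected vectors into one input vector, is an assumption of this sketch; the embodiment does not specify how the input vector is assembled.

```python
def project(entity, m_r):
    """Project a k-dimensional entity vector into the d-dimensional
    relation space: e_r = e * M_r, with M_r given as k rows of d values."""
    k, d = len(m_r), len(m_r[0])
    return [sum(entity[i] * m_r[i][j] for i in range(k)) for j in range(d)]


def transr_score(h, r, t, m_r):
    """Squared L2 distance between (h_r + r) and t_r; lower means more plausible."""
    h_r = project(h, m_r)
    t_r = project(t, m_r)
    return sum((hv + rv - tv) ** 2 for hv, rv, tv in zip(h_r, r, t_r))


def triple_vector(h, r, t, m_r):
    """Assumed input encoding: concatenate h_r, r and t_r into one vector."""
    return project(h, m_r) + list(r) + project(t, m_r)


# Made-up embeddings: entities live in 3 dimensions, relations in 2.
h = [1.0, 0.0, 2.0]
t = [2.0, 1.0, 2.0]
r = [1.0, 1.0]
m_r = [
    [1.0, 0.0],
    [0.0, 1.0],
    [0.0, 0.0],
]

score = transr_score(h, r, t, m_r)
vector = triple_vector(h, r, t, m_r)
```

With these values the projected head plus the relation vector coincides with the projected tail, so the score is zero, which is the ideal case for a true triple.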
S304, feeding the $X^{(m)}$ vectorized by the TransR model into the time-recurrent neural network model for incremental learning;
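For orientation, the forward pass of a standard LSTM cell over a sequence of input vectors can be sketched as follows. This is a generic textbook LSTM step and not the specific network of the embodiment; it shows neither training nor the incremental learning procedure, and the parameter layout and the helper `constant_params` are assumptions of this sketch.

```python
import math


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def affine(weights, bias, vector):
    """Compute W v + b for a weight matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def lstm_step(x, h_prev, c_prev, params):
    """One forward step of a standard LSTM cell."""
    z = x + h_prev  # list concatenation [x; h_prev]
    i = [sigmoid(a) for a in affine(params["W_i"], params["b_i"], z)]    # input gate
    f = [sigmoid(a) for a in affine(params["W_f"], params["b_f"], z)]    # forget gate
    o = [sigmoid(a) for a in affine(params["W_o"], params["b_o"], z)]    # output gate
    g = [math.tanh(a) for a in affine(params["W_g"], params["b_g"], z)]  # candidate
    c = [fv * cp + iv * gv for fv, cp, iv, gv in zip(f, c_prev, i, g)]
    h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
    return h, c


def run_lstm(sequence, params, hidden_size):
    """Feed a sequence of vectors through the cell and collect every hidden state."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    outputs = []
    for x in sequence:
        h, c = lstm_step(x, h, c, params)
        outputs.append(h)
    return outputs


def constant_params(input_size, hidden_size, value=0.0):
    """Build placeholder parameters; a trained model would supply learned values."""
    params = {}
    for gate in "ifog":
        params["W_" + gate] = [[value] * (input_size + hidden_size)
                               for _ in range(hidden_size)]
        params["b_" + gate] = [value] * hidden_size
    return params


params = constant_params(input_size=2, hidden_size=1)
h, c = lstm_step([1.0, 2.0], [0.0], [2.0], params)
outputs = run_lstm([[1.0, 2.0], [0.5, 0.5], [0.0, 1.0]], params, hidden_size=1)
```

With all-zero placeholder parameters every gate evaluates to 0.5 and the candidate to 0, so the cell state simply halves at each step; learned parameters would of course behave differently.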
S305, performing sequence incremental combination on the output of the time-recurrent neural network model;
The outputs $V^{(m)}$ of the time-recurrent neural network model are combined by sequence incremental combination to obtain V. The principle is that if a later vector and the preceding vector both have values at the same position, the feature is reinforced by incremental superposition, and the reinforced feature is used as the next input to continue the operation. As shown in FIG. 5, each rectangular box represents one bit of a vector and different colors represent different values: white indicates no data, white with diagonal lines indicates data, gray with diagonal lines indicates data reinforced by superposing two white diagonal-line bits, and black indicates that the bit has undergone multiple incremental combinations.
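One possible reading of the sequence incremental combination described above is sketched below. It assumes that "no data" at a position is represented by `None` and that reinforcement by superposition means element-wise addition; both are interpretations made for this example rather than details stated in the embodiment.

```python
from functools import reduce


def combine(previous, current):
    """Combine two vectors position by position.

    Where only one vector has data, that value is kept; where both have
    data, the feature is reinforced by adding the two values.
    """
    combined = []
    for p, c in zip(previous, current):
        if p is None:
            combined.append(c)
        elif c is None:
            combined.append(p)
        else:
            combined.append(p + c)
    return combined


def combine_sequence(vectors):
    """Fold the vectors left to right, feeding each result into the next step."""
    return reduce(combine, vectors)


# Illustrative outputs V^(1), V^(2), V^(3); None marks a position with no data.
v1 = [None, 1.0, 2.0]
v2 = [0.5, None, 1.0]
v3 = [None, None, 3.0]

V = combine_sequence([v1, v2, v3])
```

In this example the last position carries data in all three vectors and is therefore reinforced twice, which corresponds to a bit that has undergone multiple incremental combinations.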
S306, performing result classification; if the result classification indicates that a relation exists with another entity, the two entities are considered related; otherwise, they are considered unrelated.
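The embodiment does not specify the classifier used for result classification. Purely as an assumed example, a logistic classifier over the combined vector V with a decision threshold could be written as follows; the weights, bias, and threshold are hypothetical values that a trained model would supply.

```python
import math


def classify(v, weights, bias, threshold=0.5):
    """Return (related, probability) for a combined feature vector v.

    A logistic classifier is an assumption of this sketch, not the
    classifier of the embodiment.
    """
    score = sum(w * x for w, x in zip(weights, v)) + bias
    probability = 1.0 / (1.0 + math.exp(-score))
    return probability >= threshold, probability


# Hypothetical combined vector and classifier parameters.
V = [1.0, 2.0]
related, p_related = classify(V, weights=[1.0, 1.0], bias=0.0)
unrelated, p_unrelated = classify(V, weights=[-1.0, -1.0], bias=0.0)
```

A result of `True` corresponds to the two entities being considered related, and `False` to their being considered unrelated.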
The embodiment of the invention provides a prediction method of a knowledge graph link prediction model based on LSTM sequence incremental learning, which further perfects the discovery and reasoning process of missing information in a network space target behavior knowledge graph.
An embodiment of the present invention further provides a prediction apparatus based on a knowledge graph, as shown in fig. 6, including:
knowledge graph construction module 401: configured to construct a knowledge graph, wherein the basic unit of the knowledge graph is a triple, the knowledge graph comprises a schema layer and a data layer, and the triples are time-sequence triples;
prediction module 402: configured to predict the relation between a given entity and other entities in the network space based on the knowledge graph.
Further, the triple of the schema layer is (head entity, relationship | timing information, tail entity) or (entity, attribute value), wherein the attribute includes timing information and/or spatial information, and the timing information includes relationship establishment time, relationship validity period, and/or relationship elimination time.
Further, the data layer is established according to the established schema layer; the data layer performs entity extraction, attribute extraction, and relationship extraction on unstructured, semi-structured, and structured data sources, and integrates and disambiguates data from different sources.
Further, the prediction module further comprises: an information collection module, configured to collect each item of information $X^{(m)}$ of a given entity X in the network space based on the knowledge graph, wherein each item of information $X^{(m)}$ includes behavior information, traffic information, and device log information related to the entity X, denoted $X^{(m)} = \{X^{(m)}_{t_1}, X^{(m)}_{t_2}, \ldots, X^{(m)}_{t_n}\}$ ($m = 1, 2, \ldots, N$, where m indexes an item of information related to the entity); and a result output module, configured to output, through link prediction, the relation Y between the entity and another entity Z, which may be represented as (X, Y, Z), the prediction process being represented as $P = \{X, Y\} = \{X^{(1)}, X^{(2)}, \ldots, X^{(N)}, Y\}$, where P is the prediction result.
Further, the prediction module is specifically configured to: vectorize, using a TransR model, each item of information $X^{(m)}$ of a given entity in the network space collected based on the knowledge graph; feed the $X^{(m)}$ vectorized by the TransR model into a time-recurrent neural network model for incremental learning; and perform sequence incremental combination on the output of the time-recurrent neural network model and then perform result classification, wherein if the result classification indicates that a relation exists with another entity, the two entities are considered related.
The embodiment of the invention fully considers elements such as the multidimensional attributes, communication modes, and behavior characteristics of network space targets, and exploits the knowledge graph's ability to objectively describe entities, concepts, and association relationships in the real world. By studying the construction of a knowledge graph oriented to network space target behavior, it mainly addresses the mining of behavior rules of network space targets and the prediction of their intent, thereby enabling short-term prediction and medium-term trend assessment of the network security situation, providing commanders of our army with timely and effective prediction information to assist decision-making, and reducing decision errors.
An embodiment of the present invention further provides an electronic device. Fig. 7 is a schematic structural diagram of an embodiment of the electronic device of the present invention, which can implement the flow of the embodiments shown in fig. 1 to 5 of the present invention. As shown in fig. 7, the electronic device may include: a housing 51, a processor 52, a memory 53, a circuit board 54, and a power circuit 55, wherein the circuit board 54 is arranged inside a space enclosed by the housing 51, and the processor 52 and the memory 53 are arranged on the circuit board 54; the power circuit 55 supplies power to each circuit or device of the electronic device; the memory 53 stores executable program code; and the processor 52 runs a program corresponding to the executable program code by reading the executable program code stored in the memory 53, so as to execute the knowledge-graph-based prediction method according to any one of the foregoing embodiments.
The specific execution process of the above steps by the processor 52 and the steps further executed by the processor 52 by running the executable program code may refer to the description of the embodiment shown in fig. 1 to 5 of the present invention, and are not described herein again.
The electronic device exists in a variety of forms, including but not limited to:
(1) A mobile communication device: such devices are characterized by mobile communication capabilities and are primarily aimed at providing voice and data communications. Such terminals include smart phones (e.g., the iPhone), multimedia phones, feature phones, and low-end phones, among others.
(2) An ultra-mobile personal computer device: such devices belong to the category of personal computers, have computing and processing functions, and generally also support mobile internet access. Such terminals include PDA, MID, and UMPC devices, such as the iPad.
(3) A portable entertainment device: such devices can display and play multimedia content. Devices of this type include audio and video players (e.g., the iPod), handheld game consoles, e-book readers, smart toys, and portable car navigation devices.
(4) A server: the device for providing computing service, the server comprises a processor, a hard disk, a memory, a system bus and the like, the server is similar to a general computer architecture, but the server needs to provide highly reliable service, so the requirements on processing capability, stability, reliability, safety, expandability, manageability and the like are high.
(5) And other electronic equipment with data interaction function.
Embodiments of the present invention also provide a computer-readable storage medium, wherein the computer-readable storage medium stores one or more programs, which are executable by one or more processors to implement the aforementioned prediction method.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and similar parts between the embodiments may be referred to each other, and each embodiment focuses on differences from other embodiments.
In particular, as for the apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
For convenience of description, the above devices are described separately in terms of functional division into various units/modules. However, the functionality of the units/modules may be implemented in one or more software and/or hardware when implementing the invention.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.