CN113676484B - Attack tracing method and device and electronic equipment

Info

Publication number: CN113676484B
Application number: CN202110993536.8A
Authority: CN
Inventors: 王星凯; 薛见新; 吴复迪; 刘文懋; 张润滋
Original assignee: Nsfocus Technologies Inc; Nsfocus Technologies Group Co Ltd
Current assignee: Nsfocus Technologies Inc; Nsfocus Technologies Group Co Ltd
Priority date: 2021-08-27
Filing date: 2021-08-27
Publication date: 2023-04-18
Anticipated expiration: 2041-08-27
Also published as: CN113676484A

Abstract

In the embodiments of the application, redundant log data are removed by constructing a baseline model, which alleviates the explosion of dependency relationships among terminal-side data and reduces the computational load. Representative fields are extracted from the network-side and terminal-side log data, and a time sequence heterogeneous graph is constructed by defining the dependency relationships among the log data. A knowledge graph is then constructed, and a complete attack tracing graph is built by defining the dependency relationships between nodes in the knowledge graph and nodes in the time sequence heterogeneous graph. With the attack tracing method, the attack tracing apparatus, and the electronic device provided by the embodiments of the application, a more complete and more accurate attack path can be obtained from the attack tracing graph, based on information about the attack source and the target object, in a scenario where the network side and the terminal side are associated.

Description

Attack tracing method and device and electronic equipment

Technical Field

The present application relates to the field of network security technologies, and in particular, to an attack tracing method, an attack tracing apparatus, and an electronic device.

Background

In recent years, as the attack surface of cyberspace has expanded, new-generation attack threats have occurred frequently. Dealing with these threats requires not only preventing security incidents in advance but also responding to them afterwards. Attack tracing is an important part of post-incident response: it restores, to a certain extent, the attack path and attack techniques of an attacker by analyzing the compromised assets and the intranet traffic. Through attack tracing, the attack source and the corresponding attack path can be determined, so that a defender can formulate a better protection and countermeasure scheme. Attack tracing is therefore an important link in building a network security defense system.

In an attack event, the attack behaviors of the attacker are causally related. Attack tracing associates the attack-related information on the basis of these causal relationships to construct an attack tracing graph, and then finds the attacker and the attack path from that graph. In general, the causal relationships are set based on the dependency relationships between historical alarm data and historical log data, and the corresponding attack tracing graph is constructed from them.

Generally, the attack tracing technology is built on the analysis process of the attack tracing graph. In the related art, the construction of the attack tracing graph is mainly divided into three cases:

1. Constructing an attack tracing graph on the host side.

2. Constructing an attack tracing graph that associates the system log with the application program log.

3. Constructing an attack tracing graph that associates the network side with the terminal side.

The techniques for constructing an attack tracing graph on the host side and for constructing an attack tracing graph that associates the system log with the application program log are relatively mature. The technique for constructing an attack tracing graph that associates the network side with the terminal side is not, and has the following technical defects:

1) Determining the attack path results in a large computational load.

Specifically, an attack tracing graph on the host side, and an attack tracing graph that associates the system log with the application program log, are both constructed on a single device. A complete attack process, however, usually spans multiple devices, so a complete attack path can only be traced by constructing an attack tracing graph that associates the network side with the terminal side.

In that case, the historical log data related to an attack flow across multiple devices is massive. The dependency relationships among the data may become too complex and the constructed attack tracing graph too large, which significantly increases the complexity of determining the attack path and imposes a large computational load.

2) The accuracy of the acquired attack path is not high.

In particular, when an attacker uses certain attack means, the acquired attack path may be incomplete or inaccurate.

For example, an attacker uploads a Webshell to a device, then obtains host privileges through an SQL Server weak password, and then creates a new user. Because there are many network connections from the middleware to the database, the relevant connection cannot be found accurately by manual judgment, and the acquired attack path is incomplete.

In this case, the causal relationships that are set may make the constructed attack tracing graph associating the network side with the terminal side inaccurate, so that the attack path obtained from the attack tracing graph is incomplete or of low accuracy.

Disclosure of Invention

The embodiment of the invention provides an attack tracing method, an attack tracing device and electronic equipment, which are used for improving the accuracy of an acquired target attack path and reducing the operation load generated when the target attack path is determined in an attack tracing scene associated with a network side and a terminal side.

In a first aspect, an embodiment of the present application provides an attack tracing method, where the method includes:

Obtaining first historical log data of a target object, and determining an attack target and an attack source based on the first historical log data.

Obtaining second historical log data of each device associated with the target object, extracting target features from the first historical log data and the second historical log data based on the data type of the second historical log data, and generating a corresponding feature time sequence heterogeneous graph based on the target features.

Generating a corresponding attack tracing graph based on the feature time sequence heterogeneous graph in combination with a preset knowledge graph.

Obtaining, based on the attack tracing graph, candidate attack paths that each include the attack target and the attack source.

Screening, from the candidate attack paths, a target attack path that meets a preset path condition.
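The step of obtaining candidate attack paths can be pictured as enumerating the paths that join the attack source to the attack target in the attack tracing graph. The following is a minimal sketch under stated assumptions: the text does not say how the paths are searched, so a depth-first enumeration of cycle-free paths is used here as a stand-in, and the node names and the function `candidate_paths` are hypothetical.

```python
from typing import Dict, List

Graph = Dict[str, List[str]]  # adjacency list: node -> successor nodes


def candidate_paths(graph: Graph, source: str, target: str) -> List[List[str]]:
    """Return every simple (cycle-free) path from source to target."""
    paths: List[List[str]] = []

    def walk(node: str, path: List[str]) -> None:
        if node == target:
            paths.append(path)
            return
        for nxt in graph.get(node, []):
            if nxt not in path:  # skip nodes already on this path
                walk(nxt, path + [nxt])

    walk(source, [source])
    return paths


# Hypothetical toy graph: the attacker IP reaches the victim host by two routes.
toy_graph: Graph = {
    "attacker_ip": ["webshell_upload", "exploit"],
    "webshell_upload": ["webshell_connection"],
    "exploit": ["victim_host"],
    "webshell_connection": ["victim_host"],
}
paths = candidate_paths(toy_graph, "attacker_ip", "victim_host")
```

Every returned path starts at the attack source and ends at the attack target, so each one is a candidate that the preset path condition can then accept or reject.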

In a second aspect, an embodiment of the present application further provides an attack tracing apparatus, including:

An alarm module, configured to obtain first historical log data of a target object and to determine an attack target and an attack source based on the first historical log data.

A first generation module, configured to obtain second historical log data of each device associated with the target object, to extract target features from the first historical log data and the second historical log data based on the data type of the second historical log data, and to generate a corresponding feature time sequence heterogeneous graph based on the target features.

A second generation module, configured to generate a corresponding attack tracing graph based on the feature time sequence heterogeneous graph in combination with a preset knowledge graph.

A search module, configured to obtain, based on the attack tracing graph, candidate attack paths that each include the attack target and the attack source, and to screen, from the candidate attack paths, a target attack path that meets a preset path condition.

Optionally, the attack tracing apparatus further includes a removing module, configured to remove, based on a preset baseline model, second historical log data whose related predetermined parameter is lower than a preset threshold from each second historical log data, respectively.

In an optional embodiment, if the first historical log data of the target object at least includes alarm data recorded by performing an illegal operation on the target object, when determining an attack target and an attack source based on the first historical log data, the alarm module is specifically configured to:

and determining an event described by the alarm data as an attack event when the alarm data is determined to meet a preset alarm condition based on the alarm data recorded by executing illegal operation on the target object in the first historical log data, and determining an attack target and an attack source of the attack event based on the alarm data.

In an optional embodiment, when obtaining the second historical log data of each device associated with the target object, extracting target features from the first historical log data and the second historical log data based on the data type of the second historical log data, and generating a corresponding feature time sequence heterogeneous graph based on the target features, the first generation module is specifically configured to:

and acquiring second historical log data of each device associated with the target object.

And determining the data type corresponding to the target characteristic based on the data type of the attack source and the log data types of the first historical log data and the second historical log data.

And extracting the target features containing preset feature fields from the first historical log data and the second historical log data of the corresponding data types respectively based on the data types corresponding to the target features, wherein the feature fields are set aiming at an attack tracing scene and are used for representing relevant fields of an attack path.

And respectively taking each target feature as a corresponding feature node, and respectively setting the association mode between every two feature nodes as a feature association mode.

And, based on the feature nodes, taking each corresponding feature association mode as a corresponding feature edge to generate the corresponding feature time sequence heterogeneous graph.

In an optional embodiment, when the corresponding attack tracing graph is generated based on the feature time sequence heterogeneous graph in combination with a preset knowledge graph, the second generation module is specifically configured to:

respectively set the association mode between each feature node and each rule node as a completion association mode, where the rule nodes correspond one-to-one to the preset knowledge base rules and each rule node is set based on its corresponding knowledge base rule.

And, based on the feature time sequence heterogeneous graph and the preset knowledge graph, take each completion association mode as a corresponding completion edge that connects the feature time sequence heterogeneous graph to the preset knowledge graph, and generate the corresponding attack tracing graph.
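The combination described above amounts to taking the feature edges and the rule edges together and then adding completion edges from feature nodes to rule nodes. The sketch below illustrates that structure only. How a feature node is matched to a rule node is not stated in this passage, so the `matches` predicate, the `behavior` attribute, and all node identifiers are hypothetical placeholders.

```python
from typing import Callable, Dict, List, Set, Tuple

Edge = Tuple[str, str, str]  # (from_node, to_node, edge_kind)


def build_tracing_graph(
    feature_nodes: Dict[str, dict],
    feature_edges: List[Edge],
    rule_nodes: Dict[str, str],
    rule_edges: List[Edge],
    matches: Callable[[dict, str], bool],
) -> Set[Edge]:
    """Union both graphs and add a completion edge for every matching pair."""
    graph: Set[Edge] = set(feature_edges) | set(rule_edges)
    for feature_id, attrs in feature_nodes.items():
        for rule_id, rule_type in rule_nodes.items():
            if matches(attrs, rule_type):
                graph.add((feature_id, rule_id, "completion"))
    return graph


# Hypothetical inputs; the attribute name "behavior" is an illustration only.
feature_nodes = {
    "f1": {"type": "file", "behavior": "Uploading Webshell"},
    "f2": {"type": "process", "behavior": "Webshell terminal connection"},
    "f3": {"type": "IP", "behavior": None},
}
feature_edges = [("f3", "f1", "feature"), ("f1", "f2", "feature")]
rule_nodes = {"r1": "Uploading Webshell", "r3": "Webshell terminal connection"}
rule_edges = [("r1", "r3", "rule")]

tracing_graph = build_tracing_graph(
    feature_nodes, feature_edges, rule_nodes, rule_edges,
    matches=lambda attrs, rule_type: attrs["behavior"] == rule_type,
)
```

Keeping the edge kind on every edge preserves the distinction between feature edges, rule edges, and completion edges, which the later path weighting relies on.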

In an optional embodiment, before the association mode between each feature node and each rule node is set as a completion association mode, the second generation module is further configured to:

obtain preset knowledge base rules, take each preset knowledge base rule as a corresponding rule node, set the association mode between every two rule nodes as a rule association mode, and, based on the rule nodes, take each corresponding rule association mode as a corresponding rule edge to generate a corresponding knowledge graph.

Or,

obtain a preset knowledge base, generate knowledge base rules from the preset knowledge base by using a preset knowledge graph algorithm, generate the association mode between every two knowledge base rules as a rule association mode, take each generated knowledge base rule as a corresponding rule node, and take each corresponding rule association mode as a corresponding rule edge to generate a corresponding knowledge graph.

In an optional embodiment, when obtaining, based on the attack tracing graph, each candidate attack path including the attack target and the attack source, and screening, based on a preset path condition, a target attack path that meets the preset path condition from the candidate attack paths, the search module is specifically configured to:

for each candidate attack path, performing the following operations:

obtaining each candidate node in one candidate attack path, and determining a candidate edge between every two candidate nodes, wherein each candidate node is any one feature node or any one rule node in the attack tracing graph; the association mode corresponding to each candidate edge is determined based on the node types of the two candidate nodes connected by the candidate edge.

Respectively extracting the association modes corresponding to the candidate edges, and respectively executing the following operations aiming at the candidate edges: and obtaining a candidate feature vector corresponding to one candidate edge based on an association mode corresponding to the candidate edge, wherein each candidate feature vector comprises a plurality of dimension elements, and each dimension element represents one attribute of the association mode.

Aiming at the obtained candidate feature vectors corresponding to the candidate edges respectively, the following operations are respectively executed: respectively obtaining preset dimension weights corresponding to all dimension elements contained in one candidate feature vector, and performing weighted summation based on the values of the dimension elements and the corresponding dimension weights to obtain edge weights of candidate edges corresponding to the candidate feature vector, wherein each dimension weight represents the occurrence probability of the corresponding dimension element.

And carrying out weighted summation based on the obtained association mode corresponding to each candidate edge and the corresponding edge weight to obtain the path weight corresponding to the candidate attack path.

And based on the obtained path weights corresponding to the candidate attack paths, taking the candidate attack path with the path weight reaching a path weight threshold as the target attack path.

In an optional embodiment, when obtaining the candidate feature vector corresponding to one candidate edge based on the association manner corresponding to the candidate edge, the search module is specifically configured to:

and respectively setting dimension elements corresponding to the attributes in the candidate feature vector corresponding to the candidate edge based on the attributes of the association mode corresponding to the candidate edge.
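The weighting scheme in this embodiment can be illustrated with a short sketch. Edge weights follow the text directly: a weighted sum of the dimension elements of a candidate feature vector with the preset dimension weights. The path weight is described only as a weighted summation over association modes and edge weights, so the per-mode coefficient `mode_coeff` used below is an assumption, as are all numeric values and function names.

```python
from typing import Dict, List, Sequence, Tuple

CandidateEdge = Tuple[str, Sequence[float]]  # (association mode, feature vector)


def edge_weight(vector: Sequence[float], dim_weights: Sequence[float]) -> float:
    """Weighted sum of the dimension elements of one candidate feature vector."""
    if len(vector) != len(dim_weights):
        raise ValueError("vector and dimension weights must have equal length")
    return sum(v * w for v, w in zip(vector, dim_weights))


def path_weight(
    edges: List[CandidateEdge],
    dim_weights: Sequence[float],
    mode_coeff: Dict[str, float],
) -> float:
    """Sum of edge weights, each scaled by an assumed per-mode coefficient."""
    return sum(mode_coeff[mode] * edge_weight(vec, dim_weights) for mode, vec in edges)


def select_paths(weights: Dict[str, float], threshold: float) -> List[str]:
    """Keep the candidate paths whose weight reaches the threshold."""
    return [name for name, weight in weights.items() if weight >= threshold]


# Hypothetical values chosen for illustration only.
dim_weights = [0.5, 0.25, 0.25]
mode_coeff = {"feature": 1.0, "rule": 1.0, "completion": 0.5}
candidates = {
    "path_a": [("feature", [1, 0, 1]), ("completion", [1, 1, 1])],
    "path_b": [("feature", [0, 0, 1])],
}
weights = {
    name: path_weight(edges, dim_weights, mode_coeff)
    for name, edges in candidates.items()
}
targets = select_paths(weights, threshold=1.0)
```

With these example numbers, `path_a` scores 0.75 + 0.5 = 1.25 and `path_b` scores 0.25, so only `path_a` reaches the threshold of 1.0.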

In a third aspect, an embodiment of the present application further provides an electronic device, including a memory and a processor, where the memory stores a computer program that is executable on the processor, and when the computer program is executed by the processor, the processor is enabled to implement the attack tracing method according to the first aspect.

The technical effect brought by any one implementation manner of the second aspect and the third aspect may refer to the technical effect brought by the corresponding implementation manner of the first aspect, and details are not described here again.

In the embodiments of the application, redundant log data are removed by constructing a baseline model, which alleviates the explosion of dependency relationships among terminal-side data and reduces the computational load. Representative fields are extracted from the network-side and terminal-side log data, and a time sequence heterogeneous graph is constructed by defining the dependency relationships among the log data. A knowledge graph is then constructed, and a complete attack tracing graph is built by defining the dependency relationships between nodes in the knowledge graph and nodes in the time sequence heterogeneous graph. In this way, a more complete and more accurate attack path can be obtained from the attack tracing graph, based on information about the attack source and the target object, in a scenario where the network side and the terminal side are associated.

Drawings

Fig. 1 is a system architecture diagram of an attack tracing method according to an embodiment of the present application;

fig. 2a, fig. 2b, and fig. 2c are diagrams illustrating a knowledge graph in a local privilege escalation scenario according to an embodiment of the present application;

fig. 3 is an attack tracing method provided in the embodiment of the present application;

fig. 4 is a timing sequence heterogeneous graph generating method according to an embodiment of the present application;

fig. 5a and fig. 5b are exemplary diagrams of a local privilege escalation attack behavior chain in a time sequence heterogeneous graph provided in an embodiment of the present application;

fig. 6 is a method for generating an attack tracing graph according to an embodiment of the present application;

fig. 7 is an exemplary diagram of an attack tracing graph in a local privilege escalation scenario according to an embodiment of the present application;

fig. 8 is a method for obtaining a path weight corresponding to a candidate attack path according to an embodiment of the present application;

fig. 9a, fig. 9b, and fig. 9c are exemplary diagrams of candidate attack paths provided by an embodiment of the present application;

fig. 10 is a schematic diagram of an attack tracing apparatus according to an embodiment of the present application;

fig. 11 is a schematic diagram of another attack tracing apparatus according to an embodiment of the present application;

fig. 12 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In order to improve the accuracy of the acquired attack path and to reduce the computational load of determining the attack path in an attack tracing scenario where the network side and the terminal side are associated, the embodiments of the application construct a baseline model and filter out the terminal-side logs that cannot represent attack behavior. This alleviates the explosion of dependency relationships among terminal-side data and reduces the computational load. Representative fields are extracted from the network-side and terminal-side log data, and a time sequence heterogeneous graph is constructed by defining the dependency relationships among the log data. A knowledge graph is constructed, the dependency relationships between nodes in the knowledge graph and nodes in the time sequence heterogeneous graph are defined, and a complete attack tracing graph is built. In this way, a more accurate attack path can be obtained from the attack tracing graph in a scenario where the network side and the terminal side are associated.

Referring to fig. 1, in the embodiment of the present application, an attacked target device 100 is associated with a network-side device 110 and a terminal-side device 120; there may be multiple network-side devices 110 and multiple terminal-side devices 120. Network connections are established between the attacked target device 100 and the network-side device 110 and the terminal-side device 120. First historical log data of the attacked target device 100 is stored on the attacked target device 100 itself, and second historical log data of the attacked target device 100 is stored on the associated network-side device 110 and terminal-side device 120. The attacked target device 100 can retrieve the second historical log data stored on each of the network-side device 110 and the terminal-side device 120 by means of local area network transmission or the like.

In the attack tracing method provided by the embodiment of the application, a preset knowledge graph is required to be utilized, and the knowledge graph can be constructed by, but not limited to, the following methods:

Method 1: obtaining preset knowledge base rules, taking each preset knowledge base rule as a corresponding rule node, setting the association mode between every two rule nodes as a rule association mode, and, based on the rule nodes, taking each corresponding rule association mode as a corresponding rule edge to generate a corresponding knowledge graph.

Optionally, a security expert combines personal expertise with existing public knowledge bases. Depending on the application scenario, the expert on the one hand stays compatible with existing standards and architectures and on the other hand selects an appropriate scope of knowledge, abstracts and ontologizes the data from a global perspective, and designs a corresponding knowledge base. Each knowledge base rule is then obtained from the knowledge base, where each rule defines the preconditions for an attack behavior to occur and the behaviors or effects that may follow once the attack behavior is executed.

For example, in the local privilege escalation scenario provided by the embodiment of the present application, the acquired knowledge base rules are as shown in table 1:

TABLE 1

Rule 1	Establishing a foothold on the victim host by writing a file
Rule 2	Executing commands directly on the victim host without writing a file
Rule 3	Connecting to the foothold on the victim host
Rule 4	Escalating user privileges
Rule 5	Creating new users
Rule 6	Uploading a scanner to the victim host
Rule 7	Moving laterally within the intranet

Referring to fig. 2a, each knowledge base rule is taken as a corresponding rule node, giving rule node 1, rule node 2, ..., rule node 7. The corresponding node types are shown in table 2:

TABLE 2

Rule node type	Rule node
Uploading Webshell	Rule node 1
Exploit	Rule node 2
Webshell terminal connection	Rule node 3
Privilege escalation	Rule node 4
Creating new users	Rule node 5
Uploading scanner	Rule node 6
Lateral movement	Rule node 7

Optionally, a rule association manner between each two rule nodes is set through expert definition.

For example, referring to fig. 2b, the direction indicated by each arrow represents the time sequence. In the embodiment of the present application, the rule association modes e1-e8 between pairs of different rule nodes are defined as shown in table 3:

TABLE 3

Rule association mode	Feature representation of the rule association mode
e1	Uploading Webshell - Webshell terminal connection
e2	Webshell terminal connection - Privilege escalation
e3	Webshell terminal connection - Creating new users
e4	Webshell terminal connection - Uploading scanner
e5	Webshell terminal connection - Lateral movement
e6	Privilege escalation - Creating new users
e7	Creating new users - Uploading scanner
e8	Uploading scanner - Lateral movement

In the embodiment of the present application, a knowledge graph for the local privilege escalation scenario is generated from the rule association modes given above, as shown in fig. 2c, where the direction indicated by each arrow represents the time sequence.
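The rule nodes of table 2 and the rule association modes e1-e8 of table 3 can be written down as a small directed graph. The sketch below is an illustration rather than the patented construction: it assumes that each edge runs from the first-named node to the second-named node of its table 3 entry (the earlier behavior pointing to the later one), and it paraphrases the node labels in standard security terminology. The helper `build_adjacency` is hypothetical.

```python
from typing import Dict, List, Tuple

# Rule nodes from table 2 (labels paraphrased).
RULE_NODES: Dict[int, str] = {
    1: "Uploading Webshell",
    2: "Exploit",
    3: "Webshell terminal connection",
    4: "Privilege escalation",
    5: "Creating new users",
    6: "Uploading scanner",
    7: "Lateral movement",
}

# Rule association modes from table 3 as (earlier node, later node).
RULE_EDGES: Dict[str, Tuple[int, int]] = {
    "e1": (1, 3),
    "e2": (3, 4),
    "e3": (3, 5),
    "e4": (3, 6),
    "e5": (3, 7),
    "e6": (4, 5),
    "e7": (5, 6),
    "e8": (6, 7),
}


def build_adjacency(
    nodes: Dict[int, str], edges: Dict[str, Tuple[int, int]]
) -> Dict[int, List[int]]:
    """Build an adjacency list with one entry per rule node."""
    adjacency: Dict[int, List[int]] = {node: [] for node in nodes}
    for src, dst in edges.values():
        adjacency[src].append(dst)
    return adjacency


knowledge_graph = build_adjacency(RULE_NODES, RULE_EDGES)
```

In this encoding rule node 3 (Webshell terminal connection) has four successors, while rule node 2 (Exploit) has no edges, because it does not appear in any of the association modes e1-e8.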

Method 2: obtaining a preset knowledge base, generating knowledge base rules from the preset knowledge base by using a preset knowledge graph algorithm, generating the association mode between every two knowledge base rules as a rule association mode, taking each generated knowledge base rule as a corresponding rule node, and taking each corresponding rule association mode as a corresponding rule edge to generate a corresponding knowledge graph.

Optionally, based on an existing public knowledge base or the preset knowledge base, different knowledge graph algorithms can be adopted according to the actual application scenario, so that knowledge is extracted from the knowledge base, the corresponding knowledge base rules are generated, and the corresponding knowledge graph is generated automatically at the same time.

Referring to fig. 3, an attack tracing method provided in the embodiment of the present application is shown:

Step 310: Obtaining first historical log data of a target object, and determining an attack target and an attack source based on the first historical log data.

Optionally, the first historical log data of the target object includes at least alarm data recorded when an illegal operation is performed on the target object. Based on this alarm data, when the alarm data is determined to meet a preset alarm condition, the event described by the alarm data is determined to be an attack event, and the attack target and the attack source of the attack event are determined based on the alarm data.

For example, based on IPS alarm data and depending on the specific scenario, when an IPS alarm has been confirmed by manual investigation as a successful attack, or is triggered by a rule that has high confidence in the given setting, the event described by the IPS alarm data is determined to be an attack event, and the attack source IP and the attack target of the attack event are determined from the IPS alarm data.
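The alarm condition in this example can be sketched as a simple filter over alarm records. This is a minimal illustration under stated assumptions: the record layout (`rule_id`, `src_ip`, `dst_ip`, `confirmed`), the rule identifiers, and the function `find_attack` are all hypothetical and do not correspond to any particular IPS product.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Alarm:
    rule_id: str             # identifier of the IPS rule that fired (hypothetical)
    src_ip: str              # source address recorded in the alarm
    dst_ip: str              # destination address recorded in the alarm
    confirmed: bool = False  # manually confirmed as a successful attack


def find_attack(
    alarms: List[Alarm], high_confidence_rules: Set[str]
) -> Optional[Tuple[str, str]]:
    """Return (attack source, attack target) for the first qualifying alarm."""
    for alarm in alarms:
        if alarm.confirmed or alarm.rule_id in high_confidence_rules:
            return alarm.src_ip, alarm.dst_ip
    return None


# Hypothetical alarm records using documentation-only IP ranges.
alarms = [
    Alarm("rule-port-scan", "192.0.2.7", "198.51.100.5"),
    Alarm("rule-webshell-upload", "192.0.2.10", "198.51.100.5"),
]
attack = find_attack(alarms, high_confidence_rules={"rule-webshell-upload"})
```

The first alarm is neither confirmed nor produced by a high-confidence rule, so it is skipped; the second one yields the attack source and the attack target.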

Step 320: Acquiring second historical log data of each device associated with the target object, extracting target features from the first historical log data and the second historical log data based on the data type of the second historical log data, and generating a corresponding feature time sequence heterogeneous graph based on the target features.

Specifically, referring to fig. 4, the method includes the following steps:

Step 3201: Acquiring second historical log data of each device associated with the target object.

Optionally, step 3202: Based on a preset baseline model, removing, from the second historical log data, the second historical log data whose related preset parameter is lower than a preset threshold.

For example, second historical log data associated with the target object is obtained from network device A, network device B, terminal device A, terminal device B, and so on up to terminal device Z. According to a preset baseline model, the second historical log data whose related preset parameter is lower than a preset threshold is identified among all the second historical log data on these devices. Specifically, the differences among similar hosts are considered horizontally, the differences between each host and its own historical baseline are considered vertically, and a baseline model is learned and trained for the specific scenario. The baseline model is then used to filter out the second historical log data whose related preset parameter is lower than the preset threshold. The filtered-out data is log data that, as judged by the baseline, cannot be manually associated with an attacker; even if it contains log data of non-attack behavior generated by the attacker, removing it does not affect the determination of the corresponding attack source.

Optionally, different baseline models are used for different types of second historical log data.

For example, from the second historical log data of the terminal-side devices A-Z, the log data of the process type and the log data of the Windows event type are obtained separately, and a different baseline model is learned and trained independently for each type.

The second historical log data of the devices associated with the target object includes both the historical log data of each network-side device and the historical log data of the associated terminal-side devices. Statistical analysis of the historical process log data of the terminal-side devices shows that the size of the set of process behaviors of a user on each terminal-side device tends to stabilize over time, which indicates that the set of user behaviors is limited. Filtering with a baseline model can therefore remove redundant second historical log data that cannot represent attack behavior. In general, each retained piece of second historical log data can be identified by a security expert or an alarm program as related to attack behavior or to an attacker, whereas each removed piece cannot. The removed data includes log data that records no attack behavior and log data that records non-attack behavior generated by the attacker, and removing it does not affect the tracing of the attack source. The extracted second historical log data can thus be screened so that only the historical log data related to attack behavior is retained for subsequent processing, which reduces the computational load of attack tracing.
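The baseline filtering idea can be illustrated with a toy score. The internals of the baseline model are not disclosed in this passage, so the following is a stand-in under stated assumptions: the "related preset parameter" is modeled as a deviation score that averages a vertical term (is the behavior new for this host?) and a horizontal term (how rare is the behavior among similar hosts?). All names, data, and the scoring formula are hypothetical.

```python
from typing import Dict, List, Set, Tuple

LogEntry = Tuple[str, str]  # (host, process behavior)


def deviation(
    entry: LogEntry,
    own_history: Dict[str, Set[str]],
    peer_share: Dict[str, float],
) -> float:
    """Score in [0, 1]; higher means further from the learned baseline."""
    host, behavior = entry
    # Vertical: 0 if the host has shown this behavior before, else 1.
    vertical = 0.0 if behavior in own_history.get(host, set()) else 1.0
    # Horizontal: 1 minus the share of similar hosts showing this behavior.
    horizontal = 1.0 - peer_share.get(behavior, 0.0)
    return (vertical + horizontal) / 2


def filter_logs(
    logs: List[LogEntry],
    own_history: Dict[str, Set[str]],
    peer_share: Dict[str, float],
    threshold: float,
) -> List[LogEntry]:
    """Remove entries whose deviation score is lower than the threshold."""
    return [e for e in logs if deviation(e, own_history, peer_share) >= threshold]


# Hypothetical baseline and logs.
own_history = {"terminal_a": {"explorer.exe -> chrome.exe"}}
peer_share = {"explorer.exe -> chrome.exe": 0.75}
logs = [
    ("terminal_a", "explorer.exe -> chrome.exe"),
    ("terminal_a", "sqlservr.exe -> cmd.exe"),
]
kept = filter_logs(logs, own_history, peer_share, threshold=0.5)
```

Here the routine browser launch scores 0.125 and is removed, while the unseen `sqlservr.exe -> cmd.exe` behavior scores 1.0 and is retained for the later graph construction.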

Step 3203: and determining the data type corresponding to the target characteristic based on the data type of the attack source and the log data types of the first historical log data and the second historical log data.

For example, in an SQL Server privilege escalation scenario, the log data types of the historical log data on each network-side device and terminal-side device associated with the target object are: running processes, process activities, file operations, and network connections. As shown in table 4 below, L1-L4 correspond to these four log data types respectively:

TABLE 4

#	Log data type
L1	Running a process
L2	Process activity
L3	File operation
L4	Network connection

Correspondingly, if the data type of the attack source is the IP type, the data types determined for the target features include: IP type, process type, file type, and user type.

Step 3204: and respectively extracting target characteristics containing preset characteristic fields from the first historical log data and the second historical log data of the corresponding data types based on the data types corresponding to the target characteristics, wherein the characteristic fields are set aiming at the attack tracing scene and are used for representing relevant fields of the attack path.

For example, in the embodiment of the present application, for a scenario in which the SQL Server gives rights, the set feature fields are as shown in table 5 below:

TABLE 5

Feature field	Log data type	Description
timestamp	L1-L4	Time stamp
dport	L4	Destination port
log_id	L1-L4	Log ID
log_name	L1-L4	Name of log
process_pid	L1-L4	Process ID
process_parent_pid	L1-L2	Parent process PID
protocol	L4	Transmission protocol
sample_file_path	L3	File path
process_path	L1-L4	Process path
process_user_name	L1-L2	Process username
remote_address	L4	Remote host IP
process_action	L2	Process action
sip	L1-L4	Source IP
dip	L1-L4	Destination IP

Target features containing these feature fields are then extracted from the respective log data.
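As a rough illustration of step 3204, the sketch below keeps, for a given log record, only the Table 5 fields that apply to its log data type. The field names and their L1-L4 applicability come from Table 5; the function name, the dictionary layout, and the sample record (including its `raw` field) are assumptions made for the example.

```python
ALL_TYPES = ("L1", "L2", "L3", "L4")

# Feature field -> log data types in which it is extracted (per Table 5).
FEATURE_FIELDS = {
    "timestamp": ALL_TYPES,
    "dport": ("L4",),
    "log_id": ALL_TYPES,
    "log_name": ALL_TYPES,
    "process_pid": ALL_TYPES,
    "process_parent_pid": ("L1", "L2"),
    "protocol": ("L4",),
    "sample_file_path": ("L3",),
    "process_path": ALL_TYPES,
    "process_user_name": ("L1", "L2"),
    "remote_address": ("L4",),
    "process_action": ("L2",),
    "sip": ALL_TYPES,
    "dip": ALL_TYPES,
}

def extract_target_feature(record, log_type):
    """Return only the preset feature fields that apply to this log type."""
    return {
        field: record[field]
        for field, types in FEATURE_FIELDS.items()
        if log_type in types and field in record
    }

# Hypothetical file-operation (L3) record with some extra, irrelevant fields.
record = {
    "timestamp": "2021-08-27T10:00:00", "log_id": 17, "log_name": "file_op",
    "process_pid": 4242, "process_path": "C:/inetpub/httpd.exe",
    "sample_file_path": "C:/www/shell.aspx", "sip": "10.0.0.5",
    "dip": "10.0.0.9", "dport": 1433, "raw": "unparsed payload",
}
feature = extract_target_feature(record, "L3")
```

For this L3 record the destination port is dropped, because Table 5 lists `dport` only for network connection logs (L4).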

Step 3205: take each target feature as a corresponding feature node, and set the association mode between every two feature nodes as a feature association mode.

For example, referring to fig. 5a, in the embodiment of the present application, for the SQL Server privilege escalation scenario, corresponding feature nodes are set based on the extracted target features. Suppose the set feature nodes include feature nodes related to the attack path (feature node 1, feature node 2, feature node 6, and feature node 7) and a plurality of other feature nodes unrelated to attack behavior (such as feature node a, feature node b, and feature node c). The node types of the feature nodes include IP, process, file, and user, as shown in Table 6 below:

TABLE 6

Feature node type
IP
process
file
user

Optionally, in this embodiment of the present application, a security expert defines the association mode between every two feature nodes in the scenario, that is, the feature association modes d1 to d14, as shown in Table 7 below:

TABLE 7

In the feature representation of the feature association mode d1, t denotes a time sequence feature threshold set for the SQL Server privilege escalation scenario.

Step 3206: based on the feature nodes, take each corresponding feature association mode as a corresponding feature edge to generate a corresponding feature time sequence heterogeneous graph.

Optionally, different time sequence feature thresholds t are set experimentally for different security scenarios, and a feature time sequence heterogeneous graph satisfying the set time sequence feature threshold is generated.

Based on feature node 1, feature node 2, and feature node 7, each corresponding feature association mode is taken as a corresponding feature edge, and a feature time sequence heterogeneous graph that satisfies the preset time sequence feature threshold is generated.
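The graph construction of steps 3205 and 3206 can be sketched as follows. Because the feature association modes d1 to d14 of Table 7 are not reproduced in this text, the sketch substitutes a single stand-in rule that is purely an assumption: two feature nodes are linked when they share a `process_pid` and their timestamps differ by at most the threshold t. The function name, the edge label, and the sample data are hypothetical.

```python
from itertools import combinations

def build_feature_graph(features, t):
    """Return (nodes, edges) for a typed, time-ordered feature graph.

    nodes maps node id -> node type (IP, process, file, user).
    edges holds (earlier_id, later_id, mode) triples.
    """
    nodes = {f["id"]: f["type"] for f in features}
    edges = []
    ordered = sorted(features, key=lambda f: f["timestamp"])
    # combinations() preserves input order, so a is never later than b.
    for a, b in combinations(ordered, 2):
        same_process = a["process_pid"] == b["process_pid"]
        close_in_time = b["timestamp"] - a["timestamp"] <= t
        if same_process and close_in_time:  # stand-in for modes d1..d14
            edges.append((a["id"], b["id"], "shared_pid_within_t"))
    return nodes, edges

# Illustrative features; timestamps are plain seconds.
features = [
    {"id": "n1", "type": "IP", "timestamp": 100, "process_pid": 4},
    {"id": "n2", "type": "process", "timestamp": 103, "process_pid": 4},
    {"id": "n3", "type": "file", "timestamp": 200, "process_pid": 4},
]
nodes, edges = build_feature_graph(features, t=10)
```

With t = 10, only n1 and n2 are linked; n3 shares the process ID but falls outside the time window, which shows how the threshold limits dependency explosion.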

It can be seen that the attacker acquires host permission through a SQL Server weak password and then creates a new user. Because there are many network connections from the middleware to the database, the corresponding connection cannot be found accurately by manual judgment, so the acquired attack path is incomplete: the Webshell node (chopper) cannot be associated with the new-user node, and the link between the two corresponding feature nodes is therefore broken.

Step 330: generate a corresponding attack tracing graph based on the feature time sequence heterogeneous graph in combination with a preset knowledge graph.

Specifically, referring to fig. 6, the method includes the following steps:

Step 3301: set the association mode between each feature node and each rule node as a completion association mode, where the rule nodes correspond one-to-one to the preset knowledge base rules, and each rule node is set based on its corresponding knowledge base rule.

For example, in the SQL Server privilege escalation scenario provided in the embodiment of the present application, the set completion association modes and their feature representations are shown in Table 8 below:

TABLE 8

Completion association mode	Feature representation of the completion association mode
f1	Node-upload Webshell
f2	Node-Exploit
f3	Webshell terminal connection-node
f4	Node-privilege escalation
f5	Creating a new user-node
f6	Upload scanner-node

For the completion association modes f1 to f6 given above, respective explanations are given in Table 9 below:

TABLE 9

Step 3302: based on the feature time sequence heterogeneous graph and the preset knowledge graph, take each completion association mode as a corresponding completion edge connecting the two graphs, and generate a corresponding attack tracing graph.

For example, in the SQL Server privilege escalation scenario, the network-side auditing system cannot capture network connections on the loopback address, and because there are many network connections from the middleware to the database, the corresponding connection cannot be found accurately by manual judgment. The obtained attack path is therefore incomplete: as shown in fig. 5b, the Webshell node (chopper) cannot be associated with the new-user node through knowledge of the operating system and the like, and the link between the two corresponding feature nodes is broken. In combination with the knowledge graph shown in fig. 2c, f4 and f5 in Table 8 are used to construct a complete attack path, so that the Webshell node (chopper) is associated with the new user through the knowledge graph, as shown in fig. 7, where the direction of each arrow represents the time sequence. Therefore, the attack tracing graph generated from the feature time sequence heterogeneous graph and the preset knowledge graph can complete attack paths that are broken in the prior art, and each attack path extracted from the attack tracing graph is more accurate.
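A minimal sketch of how completion edges bridge the broken link is given below. The node names and the use of f4 and f5 follow the example above; the edge-list representation, the function names, and the placeholder labels "d?" and "e?" (the source does not state which feature or rule association mode each of those edges uses) are assumptions.

```python
from collections import deque

def build_tracing_graph(feature_edges, rule_edges, completion_edges):
    """Merge the three edge sets into one map: node -> [(next_node, mode)]."""
    graph = {}
    for src, dst, mode in [*feature_edges, *rule_edges, *completion_edges]:
        graph.setdefault(src, []).append((dst, mode))
        graph.setdefault(dst, [])
    return graph

def reachable(graph, start, goal):
    """Breadth-first search that follows edge direction (time order)."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt, _mode in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

# "d?" and "e?" are placeholder labels, not modes named by the source.
feature_edges = [("Attacker", "httpd.exe", "d?"), ("httpd.exe", "chopper", "d?")]
rule_edges = [("privilege escalation", "create new user", "e?")]
completion_edges = [
    ("chopper", "privilege escalation", "f4"),   # Node-privilege escalation
    ("create new user", "Tester", "f5"),         # Creating a new user-node
]

broken = build_tracing_graph(feature_edges, [], [])
complete = build_tracing_graph(feature_edges, rule_edges, completion_edges)
```

In the `broken` graph the new user Tester cannot be reached from the attacker, while in the `complete` graph the path runs through the two rule nodes.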

Step 340: obtain, based on the attack tracing graph, each candidate attack path that includes the attack target and the attack source.

For example, in the SQL Server privilege escalation scenario, candidate attack paths are obtained that include the feature node corresponding to the attack source IP information and the feature node corresponding to the user.
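One simple way to obtain candidate attack paths is to enumerate all simple paths from the attack source node to the target node. The source does not specify a search algorithm, so the depth-first enumeration below is only an assumed approach; the adjacency map mirrors candidate attack paths 1 and 2 of the embodiment, and the function name is hypothetical.

```python
def candidate_paths(graph, source, target):
    """Enumerate all simple (cycle-free) paths from source to target."""
    paths = []
    stack = [(source, [source])]
    while stack:
        node, path = stack.pop()
        if node == target:
            paths.append(path)
            continue
        for nxt in graph.get(node, []):
            if nxt not in path:  # never revisit a node on the same path
                stack.append((nxt, path + [nxt]))
    return paths

# Directed adjacency map; edge direction follows the time order.
graph = {
    "Attacker": ["httpd.exe"],
    "httpd.exe": ["chopper", "a"],
    "a": ["b"],
    "b": ["chopper"],
    "chopper": ["privilege escalation"],
    "privilege escalation": ["create new user"],
    "create new user": ["Tester"],
}
paths = candidate_paths(graph, "Attacker", "Tester")
```

The enumeration yields one direct path and one detour through feature nodes a and b; ranking between them is left to the path weight of step 350.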

Step 350: screen, from the candidate attack paths, target attack paths that satisfy a preset path condition.

Specifically, step 3501 and step 3502 are performed.

Step 3501: for each candidate attack path, referring to fig. 8, the following operations are performed:

Step 35001: obtain each candidate node in the candidate attack path and determine the candidate edge between every two candidate nodes, where each candidate node is any feature node or any rule node in the attack tracing graph, and the association mode corresponding to each candidate edge is determined based on the node types of the two candidate nodes connected by that candidate edge.

Specifically, the association method corresponding to each candidate edge has three cases:

Case 1: if the two candidate nodes are both feature nodes, the association mode corresponding to the candidate edge between them is a feature association mode.

Case 2: if the two candidate nodes are both rule nodes, the association mode corresponding to the candidate edge between them is a rule association mode.

Case 3: if the two candidate nodes are a rule node and a feature node, the association mode corresponding to the candidate edge between them is a completion association mode.
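The three cases amount to a lookup on the pair of node kinds, which might be sketched as follows (the function name and the string labels are illustrative only):

```python
def association_kind(kind_a, kind_b):
    """Map the kinds of two connected candidate nodes to an association mode."""
    kinds = {kind_a, kind_b}
    if kinds == {"feature"}:
        return "feature association"      # Case 1
    if kinds == {"rule"}:
        return "rule association"         # Case 2
    if kinds == {"feature", "rule"}:
        return "completion association"   # Case 3
    raise ValueError(f"unknown node kinds: {kind_a!r}, {kind_b!r}")
```

Using a set makes the result independent of the order in which the two nodes are given.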

For example, in the SQL Server privilege escalation scenario, candidate attack paths are obtained that include the feature node corresponding to the attack source IP information, namely feature node Attacker, and the feature node corresponding to the user, namely feature node Tester. Suppose the following three candidate attack paths are obtained:

Referring to fig. 9a, where the direction of each arrow represents the chronological order, candidate attack path 1 is: feature node Attacker (feature node 1) - feature node httpd.exe (feature node 2) - feature node chopper (feature node 3) - rule node privilege escalation (rule node 4) - rule node create new user (rule node 5) - feature node Tester (feature node 7).

Referring to fig. 9b, where the direction of each arrow represents the chronological order, candidate attack path 2 is: feature node Attacker (feature node 1) - feature node httpd.exe (feature node 2) - feature node a - feature node b - feature node chopper (feature node 3) - rule node privilege escalation (rule node 4) - rule node create new user (rule node 5) - feature node Tester (feature node 7).

Referring to fig. 9c, where the direction of each arrow represents the chronological order, candidate attack path 3 is: feature node Attacker (feature node 1) - feature node httpd.exe (feature node 2) - feature node chopper (feature node 3) - feature node c - rule node upload Webshell (rule node 2) - rule node Webshell terminal connection (rule node 3) - rule node privilege escalation (rule node 4) - rule node create new user (rule node 5) - feature node Tester (feature node 7).

Based on the attack tracing graph shown in fig. 7, the association modes of the candidate edges between the candidate nodes of each candidate attack path are shown in fig. 9a, fig. 9b, and fig. 9c.

Step 35002: extract the association mode corresponding to each candidate edge, and perform the following operation for each candidate edge: obtain the candidate feature vector corresponding to the candidate edge based on its association mode, where each candidate feature vector contains a plurality of dimension elements and each dimension element represents one attribute of an association mode.

Specifically, based on each attribute of the association manner corresponding to one candidate edge, dimension elements corresponding to each attribute in the candidate feature vector corresponding to the candidate edge are respectively set.

Each feature node is denoted u1, u2, ..., and each rule node is denoted v1, v2, .... If both candidate nodes are feature nodes, e.g., u1 and u2, the association mode corresponding to the candidate edge u1-u2 is a feature association mode. If both candidate nodes are rule nodes, e.g., v1 and v2, the association mode corresponding to the candidate edge v1-v2 is a rule association mode. If the two candidate nodes are a rule node and a feature node, e.g., u1 and v1, the association mode corresponding to the candidate edge u1-v1 is a completion association mode.

In the SQL Server privilege escalation scenario, the feature association modes include d1 to d14 (Table 7), the rule association modes include e1 to e8 (Table 3), and the completion association modes include f1 to f6 (Table 8). The candidate feature vector corresponding to the candidate edge u2-v1 therefore has 28 dimension elements n1-n28 and can be represented as [n1, n2, ..., n28], where each dimension element characterizes an attribute of the respective association mode d1, ..., d14, e1, ..., e8, f1, ..., f6. Take the attribute "whether the association mode exists" as an example: if the candidate edge u2-v1 corresponds to the completion association mode f4 in this scenario, and the 28 dimension elements n1-n28 correspond in order to d1, ..., d14, e1, ..., e8, f1, ..., f6, then the dimension element n26 corresponding to f4 is set to 1 and all other dimension elements are set to 0, so that the candidate feature vector corresponding to the candidate edge u2-v1 can be represented as [0, ..., 0, 1, 0, 0].
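The 28-dimensional vector described above is a one-hot encoding over the ordered list d1..d14, e1..e8, f1..f6. A short sketch, with hypothetical names, confirms that f4 lands on dimension element n26:

```python
# Ordered association modes: d1..d14, e1..e8, f1..f6 (28 in total).
MODES = (
    [f"d{i}" for i in range(1, 15)]
    + [f"e{i}" for i in range(1, 9)]
    + [f"f{i}" for i in range(1, 7)]
)

def candidate_feature_vector(mode):
    """One-hot vector marking which association mode a candidate edge has."""
    vector = [0] * len(MODES)
    vector[MODES.index(mode)] = 1
    return vector

vector_u2_v1 = candidate_feature_vector("f4")
position = vector_u2_v1.index(1) + 1  # 1-based, to match n1..n28
```

Here `position` is 26, since 14 feature modes and 8 rule modes precede f1 and f4 is the fourth completion mode.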

Step 35003: for each obtained candidate feature vector, perform the following operations: obtain the preset dimension weight corresponding to each dimension element contained in the candidate feature vector, and perform a weighted summation of the dimension element values with their dimension weights to obtain the edge weight of the candidate edge corresponding to that candidate feature vector, where each dimension weight represents the occurrence probability of the corresponding dimension element.

Optionally, the dimension weights may be obtained by counting the occurrence probability of each dimension element in an actual scenario; for example, the occurrence probability of each dimension is represented by the reciprocal of the number of occurrences of that dimension element in a specific scenario. For example, if the dimension element n20 occurs 5 times in the SQL Server privilege escalation scenario, its dimension weight is 1/5; if the dimension element n2 occurs once in another scenario, its dimension weight is 1.
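The reciprocal-count rule in this example can be written down directly. The function name and the input dictionary are assumptions; skipping elements that never occur is also an assumption, since the source does not say how a zero count is handled.

```python
def dimension_weights(occurrences):
    """Weight of each dimension element = 1 / (number of occurrences).

    Elements with a zero count are skipped (an assumption; the source
    does not define a weight for elements that never occur).
    """
    return {name: 1 / count for name, count in occurrences.items() if count > 0}

# Counts taken from the example in the text: n20 occurs 5 times, n2 once.
weights = dimension_weights({"n20": 5, "n2": 1})
```

This gives a weight of 1/5 for n20 and 1 for n2, so association modes that occur rarely in a scenario contribute more to an edge weight.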

Step 35004: perform a weighted summation based on the association mode corresponding to each candidate edge and the corresponding edge weight to obtain the path weight corresponding to the candidate attack path.

For example, in the SQL Server privilege escalation scenario, the candidate feature vectors corresponding to the candidate edges are weighted and summed using the obtained dimension weights; that is, each dimension element is multiplied by its dimension weight, and the resulting products are summed over all dimensions of all edges to obtain the path weight corresponding to the candidate attack path.
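Steps 35003 and 35004 can be sketched as a dot product per edge followed by a sum over edges. The sketch follows the plain summation described in this example; three dimensions are used instead of 28 to keep it short, and all weights, vectors, and function names are illustrative assumptions.

```python
def edge_weight(vector, weights):
    """Weighted sum of one candidate feature vector (step 35003)."""
    return sum(value * weight for value, weight in zip(vector, weights))

def path_weight(edge_vectors, weights):
    """Sum of the edge weights along one candidate attack path (step 35004)."""
    return sum(edge_weight(vector, weights) for vector in edge_vectors)

# Toy example with three association modes instead of 28.
weights = [1.0, 0.5, 0.25]          # e.g. reciprocals of counts 1, 2, 4
edge_vectors = [
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
]
total = path_weight(edge_vectors, weights)
```

For these values the path weight is 1.0 + 0.5 + 0.25 + 0.5 = 2.25.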

Step 3502: based on the obtained path weights corresponding to the candidate attack paths, take each candidate attack path whose path weight reaches a path weight threshold as a target attack path.

For example, if the path weight of the candidate attack path shown in fig. 9a is 7, the path weight of the candidate attack path shown in fig. 9b is 8, the path weight of the candidate attack path shown in fig. 9c is 5, and the preset path weight threshold is 6, the candidate attack paths shown in fig. 9a and fig. 9b are provided to the security analyst as target attack paths.
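The screening in this example reduces to a threshold comparison, sketched below with the numbers given above. "Reaching" the threshold is read here as greater than or equal to it, which is an interpretation; the function name is hypothetical.

```python
def select_target_paths(path_weights, threshold):
    """Keep the candidate paths whose weight reaches the threshold (>=)."""
    return [name for name, weight in path_weights.items() if weight >= threshold]

path_weights = {"fig. 9a": 7, "fig. 9b": 8, "fig. 9c": 5}
targets = select_target_paths(path_weights, threshold=6)
```

With a threshold of 6, the paths of fig. 9a and fig. 9b are kept and the path of fig. 9c is discarded.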

Referring to fig. 10, an attack tracing apparatus provided in this embodiment of the present application includes an alarm module 1001, a first generation module 1002, a second generation module 1003, a search module 1004, and an optional removal module 1000, where:

The alarm module 1001 is configured to obtain first historical log data of a target object, and determine an attack target and an attack source based on the first historical log data.

The first generating module 1002 is configured to acquire second historical log data of each device associated with the target object, extract each target feature from the first historical log data and the second historical log data based on a data type of the second historical log data, and generate a corresponding feature time sequence heterogeneous graph based on each target feature.

The second generating module 1003 is configured to generate a corresponding attack tracing graph based on the feature time sequence heterogeneous graph in combination with a preset knowledge graph.

The searching module 1004 is configured to obtain, based on the attack tracing graph, each candidate attack path including the attack target and the attack source, and screen, based on a preset path condition, a target attack path that meets the preset path condition from the candidate attack paths.

As shown in fig. 11, optionally, the attack tracing apparatus further includes a removing module 1000, configured to remove, based on a preset baseline model, second historical log data with a relevant predetermined parameter lower than a preset threshold from each second historical log data, respectively.

In an optional embodiment, if the first historical log data of the target object at least includes alarm data recorded by performing an illegal operation on the target object, when determining an attack target and an attack source based on the first historical log data, the alarm module 1001 is specifically configured to:

In an optional embodiment, when acquiring second historical log data of each device associated with the target object, extracting each target feature from the first historical log data and the second historical log data based on a data type of the second historical log data, and generating a corresponding feature time sequence heterogeneous graph based on each target feature, the first generating module 1002 is specifically configured to:

second historical log data of each device associated with the target object is obtained.

And determining a data type corresponding to the target feature based on the data type of the attack source and the log data types of the first historical log data and the second historical log data.

In an optional embodiment, when generating a corresponding attack tracing graph based on the feature time sequence heterogeneous graph in combination with a preset knowledge graph, the second generating module 1003 is specifically configured to:

set the association mode between each feature node and each rule node as a completion association mode, where the rule nodes correspond one-to-one to the preset knowledge base rules, and each rule node is set based on its corresponding knowledge base rule.

In an optional embodiment, before the association mode between each feature node and each rule node is set as a completion association mode, the second generating module 1003 is further configured to:

or,

In an optional embodiment, when obtaining, based on the attack tracing graph, each candidate attack path including the attack target and the attack source, and screening, based on a preset path condition, a target attack path that meets the preset path condition from the candidate attack paths, the searching module 1004 is specifically configured to:

for each candidate attack path, performing the following operations:

Respectively extracting the association modes corresponding to the candidate edges, and respectively executing the following operations aiming at the candidate edges: and obtaining candidate feature vectors corresponding to one candidate edge based on an association mode corresponding to the candidate edge, wherein each candidate feature vector comprises a plurality of dimension elements, and each dimension element represents one attribute of the association mode.

And respectively executing the following operations for the obtained candidate characteristic vectors corresponding to the candidate edges: respectively obtaining preset dimension weights corresponding to all dimension elements contained in one candidate feature vector, and performing weighted summation based on the values of the dimension elements and the corresponding dimension weights to obtain edge weights of candidate edges corresponding to the candidate feature vector, wherein each dimension weight represents the occurrence probability of the corresponding dimension element.

And based on the obtained path weight corresponding to each candidate attack path, taking the candidate attack path with the path weight reaching the path weight threshold as the target attack path.

In an alternative embodiment, when obtaining the candidate feature vector corresponding to one candidate edge based on the association manner corresponding to the candidate edge, the searching module 1004 is specifically configured to:

Based on the same inventive concept as the above-mentioned application embodiment, the embodiment of the present application further provides an electronic device, which can be used for attack tracing. In one embodiment, the electronic device may be a server, a terminal device, or other electronic device. In this embodiment, the electronic device may be configured as shown in fig. 12, and include a memory 1201, a communication interface 1203, and one or more processors 1202.

A memory 1201 for storing computer programs executed by the processor 1202. The memory 1201 may mainly include a storage program area and a storage data area, where the storage program area may store an operating system, a program required for running an instant messaging function, and the like; the storage data area can store various instant messaging information, operation instruction sets and the like.

The memory 1201 may be a volatile memory, such as a random-access memory (RAM); the memory 1201 may also be a non-volatile memory such as, but not limited to, a read-only memory (ROM), a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD), or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory 1201 may be a combination of the above memories.

The processor 1202 may include one or more central processing units (CPUs), a digital processing unit, and the like. The processor 1202 is configured to implement the attack tracing method described above when calling a computer program stored in the memory 1201.

The communication interface 1203 is used for communication with a terminal device and other servers.

In the embodiment of the present application, a specific connection medium among the memory 1201, the communication interface 1203, and the processor 1202 is not limited. In the embodiment of the present application, the memory 1201 and the processor 1202 are connected through the bus 1204 in fig. 12, the bus 1204 is represented by a thick line in fig. 12, and the connection manner between other components is only schematically illustrated and is not limited. The bus 1204 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 12, but this is not intended to represent only one bus or type of bus.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and so forth) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims

1. An attack tracing method is characterized by comprising the following steps:

acquiring first historical log data of a target object, and determining an attack target and an attack source based on the first historical log data;

acquiring second historical log data of each device associated with the target object, extracting each target feature from the first historical log data and the second historical log data based on the data type of the second historical log data, taking each target feature as a corresponding feature node, setting the association mode between every two feature nodes as a feature association mode, taking each corresponding feature association mode as a corresponding feature edge based on each feature node, and generating a corresponding feature time sequence heterogeneous graph;

respectively setting the association mode between each feature node and each rule node as a completion association mode, and taking each completion association mode as a corresponding completion edge connecting the feature time sequence heterogeneous graph and a preset knowledge graph based on the feature time sequence heterogeneous graph and the preset knowledge graph to generate a corresponding attack tracing graph, wherein each rule node corresponds to each preset knowledge base rule one by one, and each rule node is set based on the corresponding knowledge base rule;

obtaining each candidate attack path comprising the attack target and the attack source based on the attack tracing graph;

2. The method according to claim 1, wherein the first historical log data of the target object at least includes alarm data recorded when an illegal operation is performed on the target object, and determining an attack target and an attack source based on the first historical log data comprises:

3. The method of claim 1 or 2, prior to obtaining the first historical log data of the target object, further comprising:

and based on a preset baseline model, respectively removing second historical log data with related preset parameters lower than a preset threshold value from each second historical log data.

4. The method of claim 3, wherein the knowledge-graph is obtained by:

acquiring preset knowledge base rules, taking each preset knowledge base rule as a corresponding rule node, setting the association mode between every two rule nodes as a rule association mode, and, based on the rule nodes, taking each corresponding rule association mode as a corresponding rule edge to generate a corresponding knowledge graph;

or,

the method comprises the steps of acquiring a preset knowledge base, generating each knowledge base rule based on the preset knowledge base by adopting a preset knowledge graph algorithm, generating a rule association mode according to an association mode between every two knowledge base rules, taking the generated knowledge base rules as corresponding rule nodes, and taking the corresponding rule association modes as corresponding rule edges to generate a corresponding knowledge graph.

5. The method of claim 4, wherein obtaining second historical log data for respective devices associated with the target object and extracting respective target features from the first historical log data and the second historical log data based on a data type of the second historical log data comprises:

acquiring second historical log data of each device associated with the target object;

determining a data type corresponding to a target feature based on the data type of the attack source and the log data types of the first historical log data and the second historical log data;

6. The method of claim 5, wherein screening the target attack path meeting a preset path condition from the candidate attack paths based on the preset path condition comprises:

for each candidate attack path, performing the following operations:

obtaining each candidate node in one candidate attack path, and determining a candidate edge between every two candidate nodes, wherein each candidate node is any one feature node or any one rule node in the attack tracing graph; the association mode corresponding to each candidate edge is determined based on the node types of two candidate nodes connected by the candidate edge;

respectively extracting the association modes corresponding to the candidate edges, and respectively executing the following operations aiming at the candidate edges: obtaining candidate feature vectors corresponding to one candidate edge based on an association mode corresponding to the candidate edge, wherein each candidate feature vector comprises a plurality of dimension elements, and each dimension element represents one attribute of the association mode;

aiming at the obtained candidate feature vectors corresponding to the candidate edges respectively, the following operations are respectively executed:

respectively obtaining preset dimension weights corresponding to all dimension elements contained in one candidate feature vector, and performing weighted summation based on the values of the dimension elements and the corresponding dimension weights to obtain edge weights of candidate edges corresponding to the candidate feature vector, wherein each dimension weight represents the occurrence probability of the corresponding dimension element;

carrying out weighted summation based on the obtained association mode corresponding to each candidate edge and the corresponding edge weight to obtain the path weight corresponding to the candidate attack path;

7. The method of claim 6, wherein obtaining the candidate feature vector corresponding to one candidate edge based on the association manner corresponding to the one candidate edge comprises:

8. An attack tracing apparatus, comprising:

the alarm module is used for acquiring first historical log data of a target object and determining an attack target and an attack source based on the first historical log data;

the first generation module is used for acquiring second historical log data of each device associated with the target object, extracting each target feature from the first historical log data and the second historical log data based on the data type of the second historical log data, taking each target feature as a corresponding feature node, setting the association mode between every two feature nodes as a feature association mode, taking each corresponding feature association mode as a corresponding feature edge based on each feature node, and generating a corresponding feature time sequence heterogeneous graph;

a second generation module, configured to set the association mode between each feature node and each rule node as a completion association mode, and, based on the feature time sequence heterogeneous graph and the preset knowledge graph, use each completion association mode as a corresponding completion edge connecting the feature time sequence heterogeneous graph and the preset knowledge graph to generate a corresponding attack tracing graph, where each rule node corresponds to each preset knowledge base rule one by one, and each rule node is set based on the corresponding knowledge base rule;

the searching module is used for obtaining each candidate attack path comprising the attack target and the attack source based on the attack tracing graph and screening a target attack path meeting preset path conditions from each candidate attack path based on preset path conditions;

optionally, the removing module is configured to remove, based on a preset baseline model, second historical log data of which related predetermined parameters are lower than a preset threshold from each second historical log data, respectively.

9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the attack tracing method according to any one of claims 1-7 when executing the computer program.