CN117040921A - Big data-based APT attack identification method and device and electronic equipment

Info

Publication number: CN117040921A
Application number: CN202311236743.4A
Authority: CN
Inventors: 黄晨静子; 陆洲
Original assignee: DBAPPSecurity Co Ltd
Current assignee: DBAPPSecurity Co Ltd
Priority date: 2023-09-22
Filing date: 2023-09-22
Publication date: 2023-11-10

Abstract

The invention provides a big data-based APT attack identification method, device and electronic equipment, and relates to the technical field of network security. The method comprises the following steps: collecting all traffic data and all log data in a network system to be protected; extracting a first target feature from target traffic data to obtain first feature data; extracting a second target feature from target log data to obtain second feature data; and processing the first feature data and the second feature data respectively by using a target intelligent analysis model to obtain an APT attack identification result of the target traffic data and of the target log data. Because the first feature data and the second feature data are analyzed by a target intelligent analysis model that has learned known APT attack features and normal behavior features, the accuracy of the APT attack identification result can be ensured, and the technical problem of missed APT attacks in existing APT attack identification methods is effectively solved.

Description

Big data-based APT attack identification method and device and electronic equipment

Technical Field

The invention relates to the technical field of network security, in particular to an APT attack identification method and device based on big data and electronic equipment.

Background

In the current network security environment, APT (Advanced Persistent Threat) attacks are a concealed and persistent threat. An attacker can gain access to a system by exploiting vulnerabilities, social engineering and similar means, and carry out malicious activities silently while remaining dormant in the system.

Some existing solutions based on network monitoring and intrusion detection systems can detect and defend against network attacks. However, these solutions can only defend against specific types of attacks. As network attacks continue to evolve, the challenges facing these solutions keep growing, and they cannot comprehensively cover the complex features of APT attacks. That is, existing APT attack identification methods suffer from the technical problem of missed APT attacks.

Disclosure of Invention

The invention aims to provide a big data-based APT attack identification method, device and electronic equipment, so as to solve the technical problem of missed APT attacks in existing APT attack identification methods.

In a first aspect, the present invention provides a big data-based APT attack identification method, including: collecting all traffic data and all log data in a network system to be protected; extracting a first target feature from target traffic data to obtain first feature data, and extracting a second target feature from target log data to obtain second feature data; wherein the target traffic data represents any one item of the all traffic data, and the target log data represents any one item of the all log data; the first target feature comprises: source IP address, destination IP address, port number, and transport protocol; the second target feature comprises: login data, process number, and operation instruction; processing the first feature data by using a target intelligent analysis model to obtain an APT attack identification result of the target traffic data, and processing the second feature data by using the target intelligent analysis model to obtain an APT attack identification result of the target log data; the target intelligent analysis model is a network model that has learned known APT attack features and normal behavior features; the APT attack identification result is one of the following: an APT attack exists, or no APT attack exists.

In an alternative embodiment, collecting all traffic data and all log data in the network system to be protected includes: collecting all traffic data and all log data through distributed data collection nodes in the network system to be protected.

In an alternative embodiment, extracting the first target feature from the target traffic data and extracting the second target feature from the target log data includes: processing all traffic data and all log data in parallel with a plurality of compute nodes in a distributed compute engine, so as to extract the first target feature from the target traffic data and the second target feature from the target log data.

In an alternative embodiment, the method further comprises: acquiring a training sample set; wherein the training sample set comprises: a plurality of sample traffic data and a plurality of sample log data, and each item of sample traffic data and each item of sample log data has a corresponding sample label, the sample label being one of: APT attack behavior and normal behavior; extracting the first target feature from target sample traffic data to obtain first sample feature data, and extracting the second target feature from target sample log data to obtain second sample feature data; wherein the target sample traffic data represents any one of the plurality of sample traffic data, and the target sample log data represents any one of the plurality of sample log data; and training an initial intelligent analysis model based on the sample feature data of all sample traffic data and all sample log data in the training sample set and the corresponding sample labels to obtain the target intelligent analysis model.

In an alternative embodiment, the method further comprises: receiving, through a secure data exchange interface, threat intelligence data sent by an external network system; acquiring threat data features from the threat intelligence data; storing the threat data features as extended samples in the training sample set to obtain an updated training sample set; and training the target intelligent analysis model by using the updated training sample set to obtain an updated target intelligent analysis model.

In an alternative embodiment, the method further comprises: when the APT attack identification result output by the target intelligent analysis model is that an APT attack exists, sharing the APT attack identification result and the corresponding model input data as threat intelligence with an external network system through a secure data exchange interface.

In an alternative embodiment, the method further comprises: compiling statistics on the input data and the output data of the target intelligent analysis model, and displaying the statistical results through a visualization tool.

In a second aspect, the present invention provides a big data-based APT attack identification device, including: an acquisition module, configured to collect all traffic data and all log data in a network system to be protected; a first extraction module, configured to extract a first target feature from target traffic data to obtain first feature data, and extract a second target feature from target log data to obtain second feature data; wherein the target traffic data represents any one item of the all traffic data, and the target log data represents any one item of the all log data; the first target feature comprises: source IP address, destination IP address, port number, and transport protocol; the second target feature comprises: login data, process number, and operation instruction; and a processing module, configured to process the first feature data by using a target intelligent analysis model to obtain an APT attack identification result of the target traffic data, and process the second feature data by using the target intelligent analysis model to obtain an APT attack identification result of the target log data; the target intelligent analysis model is a network model that has learned known APT attack features and normal behavior features; the APT attack identification result is one of the following: an APT attack exists, or no APT attack exists.

In a third aspect, the present invention provides an electronic device, including a memory, and a processor, where the memory stores a computer program that can be executed on the processor, and the processor implements the steps of the APT attack recognition method based on big data in any of the foregoing embodiments when the processor executes the computer program.

In a fourth aspect, the present invention provides a computer readable storage medium storing computer instructions that when executed by a processor implement the big data based APT attack recognition method of any of the preceding embodiments.

The invention provides an APT attack identification method based on big data, which uses a target intelligent analysis model which learns known APT attack characteristics and normal behavior characteristics to analyze first characteristic data of target flow data and second characteristic data of target log data so as to determine an APT attack identification result of the target flow data/the target log data. In view of the fact that the target intelligent analysis model learns a large number of APT attack characteristics, the target intelligent analysis model has higher accuracy in carrying out APT attack recognition, and therefore the technical problem of APT attack missing recognition existing in an existing APT attack recognition method is effectively solved.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are needed in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the present invention, and other drawings can be obtained according to the drawings without inventive effort for a person skilled in the art.

Fig. 1 is a flowchart of a big data-based APT attack identification method provided in an embodiment of the present invention;

Fig. 2 is a statistical bar chart of the number of threats of different types;

Fig. 3 is a schematic diagram of an analysis of attacked assets;

Fig. 4 is a diagram of a threat situation report presentation interface;

Fig. 5 is a functional block diagram of a big data-based APT attack identification device according to an embodiment of the present invention;

Fig. 6 is a schematic diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.

Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

Some embodiments of the present invention are described in detail below with reference to the accompanying drawings. The following embodiments and features of the embodiments may be combined with each other without conflict.

Example 1

Fig. 1 is a flowchart of a big data-based APT attack identification method provided in an embodiment of the present invention. As shown in Fig. 1, the method specifically includes the following steps:

Step S102, collecting all traffic data and all log data in the network system to be protected.

In order to accurately identify APT attacks occurring in the network system to be protected, the embodiment of the present invention collects all traffic data and all log data in the network system to be protected in real time, so that whether an APT attack exists in the system can be analyzed from both the traffic and the log perspective. Compared with methods that identify APT attacks from traffic data alone or from log data alone, the method of the embodiment of the invention collects data more comprehensively, so the rate of missed APT attacks can be effectively reduced.

Step S104, extracting a first target feature from the target flow data to obtain first feature data, and extracting a second target feature from the target log data to obtain second feature data.

Wherein the target flow data represents any one of all flow data, and the target log data represents any one of all log data; the first target feature comprises: source IP address, destination IP address, port number, and transport protocol; the second target feature comprises: login data, process number and operation instruction.

APT attacks have specific behavioral characteristics, so embodiments of the present invention identify these attacks by extracting features in the traffic data and log data. Optionally, the first target feature that the flow data needs to extract may include, in addition to the features listed above: the size of the network traffic.
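As an illustration of step S104, the following minimal Python sketch extracts the first target features from one traffic record and the second target features from one log record. The record layout (dictionary keys such as `src_ip`, `login`, `pid` and `command`), the class names and the function names are assumptions made for this example; the embodiment does not prescribe a data format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrafficFeatures:
    """First target features: source IP, destination IP, port, protocol."""
    src_ip: str
    dst_ip: str
    port: int
    protocol: str


@dataclass(frozen=True)
class LogFeatures:
    """Second target features: login data, process number, operation instruction."""
    login: str
    pid: int
    command: str


def extract_traffic_features(record: dict) -> TrafficFeatures:
    # Hypothetical record keys; extra fields in the record are ignored.
    return TrafficFeatures(
        src_ip=record["src_ip"],
        dst_ip=record["dst_ip"],
        port=int(record["port"]),
        protocol=str(record["protocol"]).upper(),
    )


def extract_log_features(record: dict) -> LogFeatures:
    # Hypothetical record keys; the command string is stripped of whitespace.
    return LogFeatures(
        login=record["login"],
        pid=int(record["pid"]),
        command=record["command"].strip(),
    )
```

Fields that are not target features (for example a payload or a hostname) are simply dropped, so the model only sees the features named in this step.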

Step S106, processing the first feature data by using the target intelligent analysis model to obtain an APT attack identification result of the target traffic data, and processing the second feature data by using the target intelligent analysis model to obtain an APT attack identification result of the target log data.

The target intelligent analysis model is a network model which learns the known APT attack characteristics and normal behavior characteristics; the APT attack recognition result includes one of the following: there is an APT attack and no APT attack.

The target intelligent analysis model in the embodiment of the invention is an analysis model constructed based on machine learning and deep learning algorithms, and can process the first feature data of the target traffic data and the second feature data of the target log data. Having learned the features of various known attack patterns and of normal behavior, the model analyzes newly input feature data and thereby determines the APT attack identification results of the target traffic data and the target log data. Once it is determined that an APT attack exists, the network system to be protected responds immediately and takes corresponding defensive measures.

The embodiment of the invention provides an APT attack identification method based on big data, which uses a target intelligent analysis model which learns known APT attack characteristics and normal behavior characteristics to analyze first characteristic data of target flow data and second characteristic data of target log data so as to determine an APT attack identification result of the target flow data/the target log data. In view of the fact that the target intelligent analysis model learns a large number of APT attack characteristics, the target intelligent analysis model has higher accuracy in carrying out APT attack recognition, and therefore the technical problem of APT attack missing recognition existing in an existing APT attack recognition method is effectively solved.

In an optional embodiment, collecting all traffic data and all log data in the network system to be protected in step S102 specifically includes: collecting all traffic data and all log data through distributed data collection nodes in the network system to be protected.

Extracting the first target feature from the target traffic data and the second target feature from the target log data in step S104 specifically includes: processing all traffic data and all log data in parallel with a plurality of compute nodes in a distributed compute engine, so as to extract the first target feature from the target traffic data and the second target feature from the target log data.

In the prior art, data is generally processed by a single server, which becomes a bottleneck when handling large-scale network data, so real-time analysis and mining cannot be performed quickly and efficiently. In order to improve security detection capability and response efficiency, the central server of the network system to be protected in the embodiment of the invention uses big data processing technology (that is, techniques and methods for processing huge and complex data sets), including distributed storage and distributed computing, to process and store the collected data efficiently.

Specifically, the embodiment of the invention adopts the distributed data acquisition nodes to realize the real-time collection of data such as network traffic, system logs and the like. The distributed data acquisition nodes are devices or nodes deployed in a network system to be protected, and mainly comprise a network traffic monitor, a log collector and the like. The nodes are distributed at different positions, and information of a plurality of data sources is obtained by monitoring network traffic and collecting system logs.

Network traffic data (i.e., the traffic data above) includes information about the various data packets transmitted in the network. System log data (i.e., the log data above) includes log information generated by the operating system, applications, and network devices, such as login activity, administrator operations, and system responses.

After the distributed data collection nodes collect the data, the data can be stored in a dispersed manner across a plurality of physical nodes through an open-source distributed file system (for example, HDFS), and the reliability and scalability of the data are ensured through data redundancy and the distributed file system.

Before the data stored in the distributed file system is transmitted to the subsequent module for feature extraction, an encryption algorithm (for example, the AES encryption algorithm) is used to protect the data, so that even if the data is obtained by an unauthorized person, its specific content cannot be understood or used, thereby ensuring the security of the data.

In the embodiment of the invention, a plurality of compute nodes of an open-source distributed compute engine (for example, Apache Spark) are used to process all traffic data and all log data in parallel, so that the first feature data is extracted from the target traffic data and the second feature data is extracted from the target log data.
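The parallel extraction described here can be sketched on a single machine. The example below deliberately swaps the distributed compute engine for Python's standard `concurrent.futures.ThreadPoolExecutor`, so the worker threads merely stand in for compute nodes; the key names and the `extract_all` function are assumptions made for this illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical field names for the first and second target features.
FIRST_TARGET_KEYS = ("src_ip", "dst_ip", "port", "protocol")
SECOND_TARGET_KEYS = ("login", "pid", "command")


def select(record: dict, keys: tuple) -> dict:
    """Keep only the target features of one record."""
    return {k: record[k] for k in keys}


def extract_all(traffic_records, log_records, workers=4):
    """Extract features from every record, spreading the work over a pool.

    Threads stand in for the compute nodes of a distributed engine;
    pool.map returns results in the same order as the inputs.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        first = list(pool.map(lambda r: select(r, FIRST_TARGET_KEYS), traffic_records))
        second = list(pool.map(lambda r: select(r, SECOND_TARGET_KEYS), log_records))
    return first, second
```

In a real deployment the same per-record function would be applied by the engine's own map operation over partitions of the stored data, rather than by local threads.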

In an alternative embodiment, the method of the present invention further comprises the steps of:

Step S201, acquiring a training sample set.

Wherein the training sample set comprises: a plurality of sample traffic data and a plurality of sample log data, each item of sample traffic data and each item of sample log data having a corresponding sample label, the sample label being one of: APT attack behavior, normal behavior.

Specifically, the training sample set contains both labeled sample traffic data and labeled sample log data, i.e., the APT attack identification result of each item of sample traffic data or sample log data is known. If the APT attack identification result of an item of sample traffic data or sample log data is that no APT attack exists, that item is considered to be data generated by normal behavior.

Step S202, extracting the first target feature from target sample traffic data to obtain first sample feature data, and extracting the second target feature from target sample log data to obtain second sample feature data.

Wherein the target sample traffic data represents any one of the plurality of sample traffic data, and the target sample log data represents any one of the plurality of sample log data.

The features extracted are the same as in step S104 described above, that is, the first target feature includes: source IP address, destination IP address, port number, and transport protocol; the second target feature includes: login data, process number, and operation instruction. The corresponding features are extracted from each item of sample traffic data and sample log data, thereby obtaining the sample feature data of each training sample.

Step S203, training the initial intelligent analysis model based on sample characteristic data of all sample flow data and all sample log data in the training sample set and corresponding sample labels to obtain a target intelligent analysis model.

According to the embodiment of the invention, the data set with the sample characteristics extracted is used, an initial intelligent analysis model is trained through machine learning and deep learning algorithms, and a target intelligent analysis model capable of predicting new data classification is established by learning the relevance between the characteristics and the labels in the sample data.

Optionally, the initial intelligent analysis model may be a support vector machine (SVM) or a decision tree. The SVM is a supervised learning algorithm and is widely applied to APT attack detection. It maps the data into a high-dimensional space and constructs an optimal hyperplane there to separate different categories, such as normal behavior and malicious behavior; a sample is classified according to which side of the hyperplane it falls on, and its distance from the hyperplane indicates how far it lies from the decision boundary. The decision tree is a popular supervised learning algorithm that can be used for feature selection and classification in APT attack detection. It builds a tree structure of nodes and branches by recursively splitting the data according to the attribute values of the features, and the resulting tree is used to classify and predict new data.
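The recursive splitting performed by a decision tree can be illustrated with a toy implementation. The sketch below is not the model trained in the embodiment: it assumes purely numeric features (here a hypothetical pair of port number and failed-login count), uses Gini impurity as the splitting criterion, and the labels "apt" and "normal" are placeholders for the two sample labels.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (weighted_gini, feature_index, threshold) of the best split, or None."""
    best = None
    for j in range(len(rows[0])):
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            if not left or not right:
                continue  # a split must send samples to both branches
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if best is None or score < best[0]:
                best = (score, j, t)
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively split the samples; leaves hold the majority label."""
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority
    split = best_split(rows, labels)
    if split is None:
        return majority
    _, j, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[j] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[j] > t]
    return {
        "feature": j,
        "threshold": t,
        "left": build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        "right": build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    }


def predict(tree, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(tree, dict):
        branch = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[branch]
    return tree
```

Categorical features such as IP addresses or transport protocols would first need to be encoded as numbers before a tree of this kind could split on them.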

After model training is completed, the model can be evaluated and verified by using a test data set, the aim being to ensure that the model can accurately distinguish APT attack behavior from normal behavior. The performance of the model is evaluated by calculating metrics such as accuracy, recall, precision, and F1 score.

Step S301, receiving, through a secure data exchange interface, threat intelligence data sent by an external network system.
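The evaluation metrics named above follow directly from the counts of true and false positives and negatives on the test set. A minimal sketch, assuming string labels in which "apt" marks APT attack behavior (the label values and the function name are assumptions made for this example):

```python
def classification_metrics(y_true, y_pred, positive="apt"):
    """Compute accuracy, precision, recall and F1 for a binary classifier."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
```

Recall is the metric most closely tied to the missed-attack problem this method targets, since every false negative is an APT attack that was not identified.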

Conventional security monitoring systems often operate in isolation and lack a mechanism for collaboration and information sharing, which makes it difficult to form a unified countermeasure strategy. A secure data exchange interface is a communication interface or specification for sharing and exchanging security data. It defines a well-specified data format and structure, ensuring that data is transmitted and interpreted accurately and consistently between different systems or organizations. Common secure data exchange interfaces include STIX (Structured Threat Information eXpression), TAXII (Trusted Automated eXchange of Indicator Information), MISP (Malware Information Sharing Platform), and the like. These interfaces and specifications provide standardized data formats and mechanisms that allow different organizations to conveniently share and exchange security data, such as malicious IP addresses, malicious URLs, and malicious code. Through the secure data exchange interface, an organization can acquire real-time threat intelligence and rapidly perform risk assessment and response.

The network system to be protected in the embodiment of the invention applies this collaboration mechanism and threat intelligence sharing strategy, and can receive in real time, through the secure data exchange interface, threat intelligence data, security event information and the like shared by an external network system in a standardized data format.

Step S302, acquiring threat data features from the threat intelligence data.

Step S303, storing the threat data features as extended samples in the training sample set to obtain an updated training sample set.

Step S304, training the target intelligent analysis model by using the updated training sample set to obtain an updated target intelligent analysis model.

After threat intelligence data shared by an external network system is received, in order to improve the ability of the network system to be protected to cope with APT attacks, threat data features (i.e., feature data of APT attack behavior) need to be obtained from the threat intelligence data and stored as extended samples in the training sample set of the model, so as to enrich the sample data. The sample label of these threat data features is: APT attack behavior. The model is then trained with the enriched training sample set and the model parameters are updated, so as to improve the accuracy of the model.
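Steps S302 and S303 can be sketched as follows. The sample layout (a dictionary with `features` and `label`), the label value "apt" and the skipping of exact duplicates are assumptions made for this illustration; the embodiment only states that the threat data features are stored as extended samples labeled as APT attack behavior.

```python
def extend_training_set(training_set, threat_features, label="apt"):
    """Return a new training set with threat data features added as labeled samples.

    training_set: list of {"features": dict, "label": str}
    threat_features: list of feature dicts taken from threat intelligence data
    """
    def key(features, sample_label):
        # Hashable identity of a sample, used to skip exact duplicates.
        return (tuple(sorted(features.items())), sample_label)

    seen = {key(s["features"], s["label"]) for s in training_set}
    updated = list(training_set)  # the original set is left untouched
    for features in threat_features:
        k = key(features, label)
        if k not in seen:
            updated.append({"features": dict(features), "label": label})
            seen.add(k)
    return updated
```

The updated set returned here is what step S304 would pass to the training routine to refresh the model parameters.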

In an alternative embodiment, the method of the present invention further comprises the following: when the APT attack identification result output by the target intelligent analysis model is that an APT attack exists, sharing the APT attack identification result and the corresponding model input data as threat intelligence with an external network system through the secure data exchange interface.

For example, when the APT detection tool of the network system to be protected detects an APT attack, it analyzes the characteristics of the corresponding attack tool, such as IP information, and immediately synchronizes this information to the firewall of the external network system, so that firewalls from other vendors can quickly defend against the specified IP. Sharing, exchanging and analyzing security data between different organizations can improve the ability to handle APT attacks.
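To make the sharing step concrete, the sketch below builds a threat intelligence record for a flagged source IP. The field names are modelled on a STIX 2.1 Indicator object, but the result is not validated against the specification, and the function name and the `name` text are assumptions made for this example; transport to the external system (for example over TAXII) is out of scope.

```python
import json
import uuid
from datetime import datetime, timezone


def build_indicator(ip: str, detected_at: datetime) -> dict:
    """Build an indicator record for a source IP flagged by the model.

    detected_at should be a timezone-aware datetime; it is converted to UTC.
    """
    ts = detected_at.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3] + "Z"
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": ts,
        "modified": ts,
        "name": "Source IP flagged by APT identification model",
        "pattern": f"[ipv4-addr:value = '{ip}']",
        "pattern_type": "stix",
        "valid_from": ts,
    }


def to_json(indicator: dict) -> str:
    """Serialize the record for transmission over the exchange interface."""
    return json.dumps(indicator, sort_keys=True)
```

A receiving firewall could then read the `pattern` field to obtain the IP address to block.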

Existing security monitoring tools usually display data in tables or simple graphs, which makes it difficult to present threat intelligence and attack trends intuitively, so the visualization effect is poor. To solve this problem, in an alternative embodiment, the method of the present invention further comprises the following: compiling statistics on the input data and the output data of the target intelligent analysis model, and displaying the statistical results through a visualization tool.

Visualization tools are tools that use a graphical interface to present data in a visual form for helping a user understand and analyze the data. Visualization tools can provide charts, network topologies, threat intelligence, attack trends, statistical analysis, reporting, alarms, and event analysis.

Specifically, the charts include: time series charts, pie charts, bar charts, and heat maps. A time series chart shows the trend of threat activity over time and helps an analyst find the time distribution of abnormal behaviors or attack events, for example a time series chart of the number of alarms or of the number of attack events. Pie charts and bar charts show the proportions of different types of threat sources, attack methods, affected areas and similar information; Fig. 2 is a statistical bar chart of the number of threats of different types. A heat map uses color intensity to show the level of activity in a specific period or region, helping to discover abnormal behavior or high-risk regions. The network topology shows the hosts, servers and devices in the network and the connections between them, helping analysts identify potential attack paths and targets.

Threat intelligence may be displayed in the form of a threat intelligence report, which includes detailed information about new vulnerabilities, malicious code, attack organizations, attack methods and the like, helping analysts understand the current threat environment. A threat intelligence sharing platform is used to provide real-time threat intelligence updates, helping analysts obtain the latest threat intelligence in time and respond to the corresponding threats.

Attack trends may be displayed as an attack activity chart and a threat profile. The attack activity chart shows attack activity trends across different time periods or geographic locations, helping analysts understand the main attack types and attack targets. The threat profile analyzes and summarizes a specific attacker, attack organization or attack tool, and shows its attack techniques, targets and attack methods.

The embodiment of the invention can count the attack sources and analyze the attacked assets, wherein the attack source statistics refers to the statistics of attack activities in different source areas, so as to help an analyst to know the geographic distribution and attack trend of the attacker. The analysis of the attacked assets refers to counting the type, quantity and frequency of important assets under attack, helping analysts to determine the most risky asset and take corresponding safeguards. FIG. 3 is a schematic diagram of an analysis of an attacked asset.
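The statistics behind these displays are simple grouped counts. A minimal sketch, assuming each model output is recorded as a dictionary with hypothetical `source_region`, `asset` and `result` fields, where the result value "apt" means that an APT attack exists:

```python
from collections import Counter


def summarize(results):
    """Count identified attacks per source region and per attacked asset."""
    attacks = [r for r in results if r["result"] == "apt"]
    return {
        "by_source_region": Counter(r["source_region"] for r in attacks),
        "by_asset": Counter(r["asset"] for r in attacks),
    }
```

The resulting counters can be handed to any charting tool, for example to draw a bar chart of attacks per asset.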

Report generation refers to generating charts, statistics and trend analysis reports according to safety data so as to comprehensively know safety conditions and threat conditions. FIG. 4 is a diagram of a threat situation report presentation interface.

Alarm and event analysis refers to monitoring system logs and security events in real time and, by setting rules and policies, issuing alarms in a timely manner and performing event analysis, which facilitates quick response to and handling of potential threats. For example, if more than X alarms or Y events occur, or a large amount of traffic suddenly appears, for a certain IP within a preset period of time, an alarm is issued by means such as a real-time message notification or an email notification.
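The alarm-count part of such a rule can be sketched as a per-IP sliding window. The class name, the parameters and the use of numeric timestamps in seconds are assumptions made for this example; the sketch also assumes that observations for a given IP arrive in non-decreasing time order, and it leaves the actual notification (message or email) to the caller.

```python
from collections import defaultdict, deque


class AlertRule:
    """Fire when one IP produces more than max_alarms within window_seconds."""

    def __init__(self, max_alarms: int, window_seconds: float):
        self.max_alarms = max_alarms
        self.window = window_seconds
        self._times = defaultdict(deque)  # ip -> timestamps inside the window

    def observe(self, ip: str, timestamp: float) -> bool:
        """Record one alarm for ip; return True if the threshold is exceeded."""
        q = self._times[ip]
        q.append(timestamp)
        # Drop alarms that have fallen out of the window.
        while q and timestamp - q[0] > self.window:
            q.popleft()
        return len(q) > self.max_alarms
```

With `AlertRule(max_alarms=2, window_seconds=60)`, the third alarm from the same IP within one minute returns `True`, at which point the caller would send the notification.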

The comprehensive application of the functions and the tools can improve the recognition, early warning and response capability of security analysts to APT attacks, and help to protect the security and the integrity of an information system. Through the support of the visualization and analysis tools, the security team can better understand threat information, analyze attack trends, generate reports and alarms, and perform effective statistical analysis and event analysis.

In summary, the embodiment of the invention provides a big data-based APT attack identification method, which uses big data technology to collect, store and analyze information such as network security logs, traffic data and threat intelligence, thereby realizing comprehensive perception and analysis of network threats. By establishing a collaboration mechanism, information is shared with external network systems, so that security operators can obtain the latest threat intelligence in time and the ability to cope with complex threats is enhanced. In order to better present big data analysis results and threat intelligence, a set of intuitive and easy-to-use visualization tools is developed, providing security operators with global situation awareness and attack analysis support. The tool plays an increasingly important role in the field of network security, and provides stronger security guarantees for enterprises.

Example two

The embodiment of the invention also provides an APT attack recognition device based on big data, which is mainly used for executing the APT attack recognition method based on big data provided by the first embodiment, and the device provided by the embodiment of the invention is specifically introduced below.

Fig. 5 is a functional block diagram of an APT attack recognition device based on big data according to an embodiment of the present invention. As shown in fig. 5, the device mainly includes a collection module 10, a first extraction module 20 and a processing module 30, wherein:

The collection module 10 is configured to collect all flow data and all log data in the network system to be protected.

A first extraction module 20, configured to extract a first target feature from the target flow data to obtain first feature data, and extract a second target feature from the target log data to obtain second feature data; wherein the target flow data represents any one of all flow data, and the target log data represents any one of all log data; the first target feature comprises: source IP address, destination IP address, port number, and transport protocol; the second target feature comprises: login data, process number and operation instruction.

The processing module 30 is configured to process the first feature data by using the target intelligent analysis model to obtain an APT attack recognition result of the target flow data, and process the second feature data by using the target intelligent analysis model to obtain an APT attack recognition result of the target log data; the target intelligent analysis model is a network model which learns the known APT attack characteristics and normal behavior characteristics; the APT attack recognition result includes one of the following: there is an APT attack and no APT attack.

The embodiment of the invention provides an APT attack identification device based on big data, which uses a target intelligent analysis model which learns known APT attack characteristics and normal behavior characteristics to analyze first characteristic data of target flow data and second characteristic data of target log data so as to determine an APT attack identification result of the target flow data/the target log data. In view of the fact that the target intelligent analysis model learns a large number of APT attack characteristics, the target intelligent analysis model has higher accuracy in carrying out APT attack recognition, and therefore the technical problem of APT attack missing recognition existing in an existing APT attack recognition method is effectively solved.
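How the extraction and processing modules fit together can be sketched as follows. This is a minimal illustration, not the device itself: the record keys (`src_ip`, `dst_ip`, `port`, `protocol`, `login`, `pid`, `command`) are assumed names for the target features listed above, and `toy_model` is a hard-coded placeholder standing in for the trained target intelligent analysis model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]
Model = Callable[[Record], bool]  # True means "there is an APT attack"

# Assumed key names for the first and second target features.
FLOW_FEATURES = ("src_ip", "dst_ip", "port", "protocol")
LOG_FEATURES = ("login", "pid", "command")


def extract(record: Record, names: Tuple[str, ...]) -> Record:
    """Extraction module: keep only the target features of one record."""
    return {name: record.get(name) for name in names}


@dataclass
class AptRecognitionDevice:
    model: Model

    def process(self, flows: List[Record], logs: List[Record]) -> List[str]:
        """Processing module: one result per flow record, then per log record."""
        features = [extract(f, FLOW_FEATURES) for f in flows]
        features += [extract(g, LOG_FEATURES) for g in logs]
        return ["APT attack" if self.model(x) else "no APT attack" for x in features]


def toy_model(x: Record) -> bool:
    """Placeholder only: flags one hard-coded port and one hypothetical command."""
    return x.get("port") == 4444 or x.get("command") == "dump_credentials"


device = AptRecognitionDevice(model=toy_model)
verdicts = device.process(
    flows=[{"src_ip": "10.0.0.7", "dst_ip": "203.0.113.9", "port": 4444,
            "protocol": "TCP", "bytes": 1200}],
    logs=[{"login": "alice", "pid": 812, "command": "ls"}],
)
```

Each record is judged independently, which matches the description that the target flow data and target log data each receive their own recognition result.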

Optionally, the collection module 10 is specifically configured to:

all flow data and all log data are collected through distributed data collection nodes in the network system to be protected.

Optionally, the first extraction module 20 is specifically configured to:

all flow data and all log data are processed in parallel with a plurality of compute nodes in the distributed compute engine to extract the first target feature from the target flow data and the second target feature from the target log data.
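The partition-and-extract pattern described here can be sketched on a single machine. In this illustration a thread pool stands in for the compute nodes of the distributed compute engine; the function names, the feature key names and the partitioning scheme are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Sequence

# Assumed key names for the first target feature.
FLOW_FEATURES = ("src_ip", "dst_ip", "port", "protocol")


def extract_partition(partition: Sequence[Dict[str, object]]) -> List[Dict[str, object]]:
    """Work done by one 'compute node': extract features from its partition."""
    return [{k: rec.get(k) for k in FLOW_FEATURES} for rec in partition]


def parallel_extract(records: List[Dict[str, object]], workers: int) -> List[Dict[str, object]]:
    """Split records into partitions and extract features from them concurrently."""
    size = max(1, -(-len(records) // workers))  # ceiling division
    partitions = [records[i:i + size] for i in range(0, len(records), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves partition order, so output order matches input order.
        chunks = list(pool.map(extract_partition, partitions))
    return [features for chunk in chunks for features in chunk]


flows = [{"src_ip": f"10.0.0.{i}", "dst_ip": "203.0.113.9", "port": 443,
          "protocol": "TCP", "payload_len": 100 + i} for i in range(10)]
features = parallel_extract(flows, workers=3)
```

Log data would be handled the same way with the second target feature's keys. A real deployment would replace the thread pool with the engine's own partitioned map operation.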

Optionally, the apparatus further comprises:

the first acquisition module is used for acquiring a training sample set; wherein the training sample set comprises: a plurality of sample flow data and a plurality of sample log data, each sample flow data and each sample log data having a corresponding sample tag, the sample tag comprising one of: APT attack behavior, normal behavior.

The second extraction module is used for extracting the first target feature from target sample flow data to obtain first sample feature data, and extracting the second target feature from target sample log data to obtain second sample feature data; wherein the target sample flow data represents any one of the plurality of sample flow data, and the target sample log data represents any one of the plurality of sample log data.

The first training module is used for training the initial intelligent analysis model based on the sample feature data of all sample flow data and all sample log data in the training sample set and the corresponding sample labels, to obtain the target intelligent analysis model.
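Supervised training on labelled sample features can be sketched as below. This section does not specify the model architecture, so the example swaps in a toy categorical naive Bayes classifier purely as a stand-in; the class name, the feature keys and the five training samples are invented for the illustration.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Features = Dict[str, str]


class CategoricalNaiveBayes:
    """Toy stand-in for the intelligent analysis model (not the patent's model)."""

    def fit(self, samples: List[Tuple[Features, str]]) -> "CategoricalNaiveBayes":
        self.label_counts = Counter(label for _, label in samples)
        # (label, feature name) -> Counter of observed values
        self.value_counts = defaultdict(Counter)
        self.vocab = defaultdict(set)
        for features, label in samples:
            for name, value in features.items():
                self.value_counts[(label, name)][value] += 1
                self.vocab[name].add(value)
        return self

    def predict(self, features: Features) -> str:
        total = sum(self.label_counts.values())
        scores = {}
        for label, count in self.label_counts.items():
            score = math.log(count / total)  # log prior of the label
            for name, value in features.items():
                seen = self.value_counts[(label, name)]
                # Laplace smoothing; the extra +1 reserves mass for unseen values.
                score += math.log((seen[value] + 1) / (count + len(self.vocab[name]) + 1))
            scores[label] = score
        return max(scores, key=scores.get)


training_set = [
    ({"port": "443", "protocol": "TCP"}, "normal behavior"),
    ({"port": "80", "protocol": "TCP"}, "normal behavior"),
    ({"port": "443", "protocol": "TCP"}, "normal behavior"),
    ({"port": "4444", "protocol": "TCP"}, "APT attack behavior"),
    ({"port": "4444", "protocol": "UDP"}, "APT attack behavior"),
]
model = CategoricalNaiveBayes().fit(training_set)
prediction = model.predict({"port": "4444", "protocol": "TCP"})
```

Sample log data would enter the same training set with its own feature keys (login data, process number, operation instruction), each paired with its sample tag.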

Optionally, the apparatus further comprises:

and the receiving module is used for receiving threat information data sent by the external network system through the secure data exchange interface.

And the second acquisition module is used for acquiring threat data characteristics in the threat intelligence data.

And the storage module is used for storing the threat data characteristics as an expansion sample into a training sample set to obtain an updated training sample set.

And the second training module is used for training the target intelligent analysis model by using the updated training sample set to obtain an updated target intelligent analysis model.
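The receive, extract, store and retrain sequence above can be sketched as follows. Everything concrete here is an assumption: the JSON message layout with an `indicators` list, the function names, and the memorising `train` function that stands in for the real model. The secure data exchange interface itself is out of scope, and the sketch retrains from scratch although the text would also allow continued training of the existing model.

```python
import json
from typing import Callable, Dict, List, Tuple

Sample = Tuple[Dict[str, str], str]
Model = Callable[[Dict[str, str]], str]


def train(samples: List[Sample]) -> Model:
    """Toy trainer: memorise the feature combinations labelled as attacks."""
    known = frozenset(
        tuple(sorted(f.items())) for f, label in samples if label == "APT attack behavior")
    return lambda f: "APT attack" if tuple(sorted(f.items())) in known else "no APT attack"


def threat_features(message: str) -> List[Dict[str, str]]:
    """Pull threat data features out of one received message (JSON layout assumed)."""
    return [dict(indicator) for indicator in json.loads(message)["indicators"]]


def update_model(samples: List[Sample], message: str) -> Tuple[List[Sample], Model]:
    """Store the features as expansion samples and retrain on the updated set."""
    expanded = samples + [(f, "APT attack behavior") for f in threat_features(message)]
    return expanded, train(expanded)


samples: List[Sample] = [({"dst_ip": "198.51.100.4", "port": "443"}, "normal behavior")]
model = train(samples)
probe = {"dst_ip": "203.0.113.9", "port": "4444"}
before = model(probe)

message = json.dumps({"indicators": [{"dst_ip": "203.0.113.9", "port": "4444"}]})
samples, model = update_model(samples, message)
after = model(probe)
```

The probe is classified as benign before the update and as an attack afterwards, which is the effect the expansion samples are meant to have.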

Optionally, the apparatus further comprises:

and the sharing module is used for sharing the APT attack recognition result and corresponding model input data to an external network system through a secure data exchange interface as threat information under the condition that the APT attack recognition result output by the target intelligent analysis model is that the APT attack exists.
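The conditional sharing step can be sketched as below. The payload layout, the `send` callback and the use of an HMAC-SHA256 tag are assumptions chosen for the illustration; the text only requires that sharing go through a secure data exchange interface and does not say how that interface is secured.

```python
import hashlib
import hmac
import json
from typing import Callable, Dict, List, Optional, Tuple


def share_if_attack(result: str, model_input: Dict[str, object], key: bytes,
                    send: Callable[[bytes, str], None]) -> Optional[bytes]:
    """Send the result and its model input as threat intelligence, only for attacks."""
    if result != "APT attack":
        return None
    body = json.dumps({"result": result, "input": model_input},
                      sort_keys=True).encode("utf-8")
    # HMAC-SHA256 tag so the receiver can check integrity and origin (assumed scheme).
    tag = hmac.new(key, body, hashlib.sha256).hexdigest()
    send(body, tag)
    return body


outbox: List[Tuple[bytes, str]] = []
key = b"demo-shared-secret"  # hypothetical pre-shared key


def deliver(body: bytes, tag: str) -> None:
    """Stand-in for the transport of the data exchange interface."""
    outbox.append((body, tag))


share_if_attack("no APT attack", {"src_ip": "10.0.0.8", "port": 443}, key, deliver)
share_if_attack("APT attack", {"src_ip": "10.0.0.7", "port": 4444}, key, deliver)
```

Only the second call produces a message, because the first result reports no attack.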

Optionally, the apparatus further comprises:

and the statistics and display module is used for collecting statistics on the input data and the output data of the target intelligent analysis model and displaying the statistical result through the visualization tool.
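The statistics that feed the visualization tool can be sketched as simple counting over pairs of model input and model output. The function name, the `src_ip` key and the choice of which figures to report are assumptions for the example; the rendering itself is left to the visualization tool.

```python
from collections import Counter
from typing import Dict, List, Tuple


def summarise(records: List[Tuple[Dict[str, str], str]]) -> Dict[str, object]:
    """Count model outputs and rank source IPs among inputs flagged as attacks."""
    by_result = Counter(result for _, result in records)
    attack_sources = Counter(
        x["src_ip"] for x, result in records
        if result == "APT attack" and "src_ip" in x)
    return {
        "total": len(records),
        "by_result": dict(by_result),
        "top_attack_sources": attack_sources.most_common(3),
    }


records = [
    ({"src_ip": "10.0.0.7"}, "APT attack"),
    ({"src_ip": "10.0.0.7"}, "APT attack"),
    ({"src_ip": "10.0.0.8"}, "no APT attack"),
    ({"src_ip": "10.0.0.9"}, "APT attack"),
]
summary = summarise(records)
```

The resulting dictionary can be passed directly to a charting component, for example as the data behind a bar chart of attack sources.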

Example three

Referring to fig. 6, an embodiment of the present invention provides an electronic device including: a processor 60, a memory 61, a bus 62 and a communication interface 63, the processor 60, the communication interface 63 and the memory 61 being connected by the bus 62; the processor 60 is arranged to execute executable modules, such as computer programs, stored in the memory 61.

The memory 61 may include a high-speed random access memory (RAM, Random Access Memory), and may further include a non-volatile memory, such as at least one magnetic disk memory. The communication connection between the system network element and at least one other network element is achieved via at least one communication interface 63 (which may be wired or wireless), and may use the Internet, a wide area network, a local area network, a metropolitan area network, etc.

The bus 62 may be an ISA bus, a PCI bus, an EISA bus, or the like. Buses may be classified as address buses, data buses, control buses, etc. For ease of illustration, only one bi-directional arrow is shown in FIG. 6, but this does not mean that there is only one bus or only one type of bus.

The memory 61 is configured to store a program, and the processor 60 executes the program after receiving an execution instruction. The method disclosed in any of the foregoing embodiments of the present invention may be applied to the processor 60 or implemented by the processor 60.

The processor 60 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuitry in hardware or by instructions in the form of software in the processor 60. The processor 60 may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU for short), a network processor (Network Processor, NP for short), etc.; it may also be a digital signal processor (Digital Signal Processor, DSP for short), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC for short), a field-programmable gate array (Field-Programmable Gate Array, FPGA for short) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be embodied directly as being executed by a hardware decoding processor, or as being executed by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium well known in the art, such as a random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory 61, and the processor 60 reads the information in the memory 61 and, in combination with its hardware, performs the steps of the method described above.

The embodiment of the invention provides a computer program product for the big data-based APT attack identification method, device and electronic equipment, which comprises a computer-readable storage medium storing non-volatile program code executable by a processor, wherein the program code includes instructions for executing the method described in the foregoing method embodiment; for the specific implementation, reference may be made to the method embodiment, which is not repeated here.

In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on this understanding, the technical solution of the present invention, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.

It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.

In the description of the present invention, it should be noted that directions or positional relationships indicated by terms such as "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., are based on the directions or positional relationships shown in the drawings, or on those in which the product of the invention is conventionally placed in use. They are used merely for convenience of describing the present invention and simplifying the description, and do not indicate or imply that the apparatus or element referred to must have a specific direction or be constructed and operated in a specific direction, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," "third," and the like are used merely to distinguish between descriptions and should not be construed as indicating or implying relative importance.

Furthermore, the terms "horizontal," "vertical," "overhang," and the like do not require that the component be absolutely horizontal or overhanging; it may be slightly inclined. For example, "horizontal" merely means that the direction is closer to horizontal than "vertical" is; it does not mean that the structure must be perfectly horizontal, and the structure may be slightly inclined.

In the description of the present invention, it should also be noted that, unless explicitly specified and limited otherwise, the terms "disposed," "mounted," "coupled," and "connected" are to be construed broadly. For example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; it may be direct or indirect through an intermediate medium; and it may be internal communication between two elements. The specific meaning of the above terms in the present invention will be understood in specific cases by those of ordinary skill in the art.

Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims

1. The APT attack identification method based on big data is characterized by comprising the following steps:

collecting all flow data and all log data in a network system to be protected;

extracting first target features from target flow data to obtain first feature data, and extracting second target features from target log data to obtain second feature data; wherein the target flow data represents any one of the all flow data, and the target log data represents any one of the all log data; the first target feature comprises: source IP address, destination IP address, port number, and transport protocol; the second target feature comprises: login data, a process number and an operation instruction;

processing the first characteristic data by using a target intelligent analysis model to obtain an APT attack identification result of the target flow data, and processing the second characteristic data by using the target intelligent analysis model to obtain an APT attack identification result of the target log data; the target intelligent analysis model is a network model which learns known APT attack characteristics and normal behavior characteristics; the APT attack identification result comprises one of the following: there is an APT attack and no APT attack.

2. The big data based APT attack recognition method according to claim 1, wherein collecting all flow data and all log data in the network system to be protected comprises:

and collecting all flow data and all log data through the distributed data collection nodes in the network system to be protected.

3. The big data based APT attack recognition method of claim 1, wherein extracting a first target feature from the target flow data and extracting a second target feature from the target log data comprises:

processing the all flow data and the all log data in parallel with a plurality of compute nodes in a distributed compute engine to extract the first target feature from the target flow data and the second target feature from the target log data.

4. The big data based APT attack identification method of claim 1, further comprising:

acquiring a training sample set; wherein the training sample set comprises: a plurality of sample flow data and a plurality of sample log data, and each of the sample flow data and each of the sample log data has a corresponding sample tag, the sample tag comprising one of: APT attack behavior and normal behavior;

extracting the first target feature from target sample flow data to obtain first sample feature data, and extracting the second target feature from target sample log data to obtain second sample feature data; wherein the target sample flow data represents any one of the plurality of sample flow data, and the target sample log data represents any one of the plurality of sample log data;

and training the initial intelligent analysis model based on sample characteristic data of all sample flow data and all sample log data in the training sample set and corresponding sample labels to obtain the target intelligent analysis model.

5. The big data based APT attack identification method of claim 4, further comprising:

threat information data sent by an external network system is received through a secure data exchange interface;

acquiring threat data characteristics in the threat information data;

storing the threat data characteristics as an expansion sample into the training sample set to obtain an updated training sample set;

and training the target intelligent analysis model by using the updated training sample set to obtain an updated target intelligent analysis model.

6. The big data based APT attack identification method of claim 1, further comprising:

and under the condition that the APT attack identification result output by the target intelligent analysis model is that the APT attack exists, sharing the APT attack identification result and corresponding model input data as threat information to an external network system through a secure data exchange interface.

7. The big data based APT attack identification method of claim 1, further comprising:

and counting the input data and the output data of the target intelligent analysis model, and displaying the counting result through a visualization tool.

8. An APT attack recognition device based on big data, comprising:

the collection module is used for collecting all flow data and all log data in the network system to be protected;

the first extraction module is used for extracting first target features from the target flow data to obtain first feature data, and extracting second target features from the target log data to obtain second feature data; wherein the target flow data represents any one of the all flow data, and the target log data represents any one of the all log data; the first target feature comprises: source IP address, destination IP address, port number, and transport protocol; the second target feature comprises: login data, a process number and an operation instruction;

the processing module is used for processing the first characteristic data by utilizing a target intelligent analysis model to obtain an APT attack identification result of the target flow data, and processing the second characteristic data by utilizing the target intelligent analysis model to obtain an APT attack identification result of the target log data; the target intelligent analysis model is a network model which learns known APT attack characteristics and normal behavior characteristics; the APT attack identification result comprises one of the following: there is an APT attack and no APT attack.

9. An electronic device comprising a memory, a processor, the memory having stored thereon a computer program executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the big data based APT attack identification method of any of claims 1 to 7.

10. A computer readable storage medium storing computer instructions which, when executed by a processor, implement the big data based APT attack identification method of any of claims 1 to 7.