CN117040921A - Big data-based APT attack identification method and device and electronic equipment - Google Patents

Big data-based APT attack identification method and device and electronic equipment

Info

Publication number
CN117040921A
Authority
CN
China
Prior art keywords
data
target
apt attack
sample
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311236743.4A
Other languages
Chinese (zh)
Inventor
黄晨静子
陆洲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
DBAPPSecurity Co Ltd
Original Assignee
DBAPPSecurity Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by DBAPPSecurity Co Ltd
Priority to CN202311236743.4A
Publication of CN117040921A
Legal status: Pending

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/14: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425: Traffic logging, e.g. anomaly detection
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/243: Classification techniques relating to the number of classes
    • G06F18/24323: Tree-organised classifiers
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/16: Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks using machine learning or artificial intelligence
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/14: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408: Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416: Event detection, e.g. attack signature detection
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00: Network architectures or network communication protocols for network security
    • H04L63/30: Network architectures or network communication protocols for network security for supporting lawful interception, monitoring or retaining of communications or communication related information
    • H04L63/302: Network architectures or network communication protocols for network security for supporting lawful interception, monitoring or retaining of communications or communication related information gathering intelligence information for situation awareness or reconnaissance
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00: Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/40: Network security protocols
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00: Reducing energy consumption in communication networks
    • Y02D30/50: Reducing energy consumption in communication networks in wire-line communication networks, e.g. low power modes or reduced link rate

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Computer Hardware Design (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Technology Law (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The invention provides a big data-based APT attack identification method, a device, and electronic equipment, and relates to the technical field of network security. The method comprises the following steps: collecting all traffic data and all log data in a network system to be protected; extracting a first target feature from target traffic data to obtain first feature data; extracting a second target feature from target log data to obtain second feature data; and processing the first feature data and the second feature data separately with a target intelligent analysis model to obtain an APT attack identification result for the target traffic data and for the target log data, respectively. Because the first feature data and the second feature data are analyzed by a target intelligent analysis model that has learned known APT attack features and normal behavior features, the accuracy of the APT attack identification result can be ensured, which effectively solves the technical problem of missed APT attack detection in existing APT attack identification methods.

Description

Big data-based APT attack identification method and device and electronic equipment
Technical Field
The invention relates to the technical field of network security, and in particular to a big data-based APT attack identification method, device, and electronic equipment.
Background
In the current network security environment, APT (Advanced Persistent Threat) attacks are a stealthy and persistent threat. An attacker can gain access to a system by exploiting vulnerabilities, social engineering, and similar means, and then carry out malicious activities silently while lying dormant.
Some existing solutions based on network monitoring and intrusion detection systems detect and defend against network attacks. However, these solutions can only defend against specific types of attacks. As network attacks continue to evolve, the challenges they face keep growing, and they cannot comprehensively cover the complex features of APT attacks. That is, existing APT attack identification methods suffer from the technical problem of missed APT attack detection.
Disclosure of Invention
The invention aims to provide a big data-based APT attack identification method, device, and electronic equipment, so as to solve the technical problem of missed APT attack detection in existing APT attack identification methods.
In a first aspect, the present invention provides a big data-based APT attack identification method, including: collecting all traffic data and all log data in a network system to be protected; extracting a first target feature from target traffic data to obtain first feature data, and extracting a second target feature from target log data to obtain second feature data; wherein the target traffic data represents any one item of the all traffic data, and the target log data represents any one item of the all log data; the first target feature comprises: source IP address, destination IP address, port number, and transport protocol; the second target feature comprises: login data, process number, and operation instruction; processing the first feature data with a target intelligent analysis model to obtain an APT attack identification result for the target traffic data, and processing the second feature data with the target intelligent analysis model to obtain an APT attack identification result for the target log data; wherein the target intelligent analysis model is a network model that has learned known APT attack features and normal behavior features; and the APT attack identification result is one of the following: an APT attack exists, or no APT attack exists.
In an alternative embodiment, collecting all traffic data and all log data in the network system to be protected includes: collecting all traffic data and all log data through distributed data collection nodes in the network system to be protected.
In an alternative embodiment, extracting the first target feature from the target traffic data and extracting the second target feature from the target log data includes: processing the all traffic data and all log data in parallel with a plurality of compute nodes in a distributed compute engine to extract a first target feature from the target traffic data and a second target feature from the target log data.
In an alternative embodiment, the method further comprises: acquiring a training sample set; wherein the training sample set comprises a plurality of sample traffic data and a plurality of sample log data, and each sample traffic data and each sample log data has a corresponding sample label, the sample label being one of: APT attack behavior and normal behavior; extracting the first target feature from target sample traffic data to obtain first sample feature data, and extracting the second target feature from target sample log data to obtain second sample feature data; wherein the target sample traffic data represents any one of the plurality of sample traffic data, and the target sample log data represents any one of the plurality of sample log data; and training an initial intelligent analysis model based on the sample feature data of all sample traffic data and all sample log data in the training sample set and the corresponding sample labels, to obtain the target intelligent analysis model.
In an alternative embodiment, the method further comprises: receiving threat intelligence data sent by an external network system through a secure data exchange interface; acquiring threat data features from the threat intelligence data; storing the threat data features as extended samples in the training sample set to obtain an updated training sample set; and training the target intelligent analysis model with the updated training sample set to obtain an updated target intelligent analysis model.
In an alternative embodiment, the method further comprises: when the APT attack identification result output by the target intelligent analysis model is that an APT attack exists, sharing the APT attack identification result and the corresponding model input data as threat intelligence with an external network system through a secure data exchange interface.
In an alternative embodiment, the method further comprises: collecting statistics on the input data and output data of the target intelligent analysis model, and displaying the statistical results through a visualization tool.
In a second aspect, the present invention provides a big data-based APT attack identification device, including: an acquisition module, configured to collect all traffic data and all log data in a network system to be protected; a first extraction module, configured to extract a first target feature from target traffic data to obtain first feature data, and to extract a second target feature from target log data to obtain second feature data; wherein the target traffic data represents any one item of the all traffic data, and the target log data represents any one item of the all log data; the first target feature comprises: source IP address, destination IP address, port number, and transport protocol; the second target feature comprises: login data, process number, and operation instruction; and a processing module, configured to process the first feature data with a target intelligent analysis model to obtain an APT attack identification result for the target traffic data, and to process the second feature data with the target intelligent analysis model to obtain an APT attack identification result for the target log data; wherein the target intelligent analysis model is a network model that has learned known APT attack features and normal behavior features; and the APT attack identification result is one of the following: an APT attack exists, or no APT attack exists.
In a third aspect, the present invention provides an electronic device, including a memory, and a processor, where the memory stores a computer program that can be executed on the processor, and the processor implements the steps of the APT attack recognition method based on big data in any of the foregoing embodiments when the processor executes the computer program.
In a fourth aspect, the present invention provides a computer readable storage medium storing computer instructions that when executed by a processor implement the big data based APT attack recognition method of any of the preceding embodiments.
The invention provides a big data-based APT attack identification method that uses a target intelligent analysis model, which has learned known APT attack features and normal behavior features, to analyze the first feature data of the target traffic data and the second feature data of the target log data, and thereby determine the APT attack identification result for the target traffic data and for the target log data. Because the target intelligent analysis model has learned a large number of APT attack features, it identifies APT attacks with higher accuracy, which effectively solves the technical problem of missed APT attack detection in existing APT attack identification methods.
Drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; a person skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a flowchart of a big data-based APT attack identification method provided in an embodiment of the present invention;
Fig. 2 is a bar chart of the number of threats of each type;
Fig. 3 is a schematic diagram of an analysis of an attacked asset;
Fig. 4 is a diagram of a threat situation report presentation interface;
Fig. 5 is a functional block diagram of a big data-based APT attack identification device according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, embodiments of the present invention. The components of the embodiments generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Some embodiments of the present invention are described in detail below with reference to the accompanying drawings. The following embodiments and features of the embodiments may be combined with each other without conflict.
Example 1
Fig. 1 is a flowchart of a big data-based APT attack identification method provided in an embodiment of the present invention. As shown in Fig. 1, the method specifically includes the following steps:
Step S102, collecting all traffic data and all log data in the network system to be protected.
To accurately identify APT attacks occurring in the network system to be protected, the embodiment of the present invention collects all traffic data and all log data in the system in real time, so that the presence of an APT attack can be analyzed from both the traffic and the log perspective. Compared with methods that identify APT attacks from traffic data alone or from log data alone, the data collected in this embodiment is more comprehensive, which can effectively reduce the APT attack miss rate.
Step S104, extracting a first target feature from the target traffic data to obtain first feature data, and extracting a second target feature from the target log data to obtain second feature data.
Wherein the target traffic data represents any one item of all the traffic data, and the target log data represents any one item of all the log data; the first target feature comprises: source IP address, destination IP address, port number, and transport protocol; the second target feature comprises: login data, process number, and operation instruction.
APT attacks have specific behavioral characteristics, so embodiments of the present invention identify these attacks by extracting features from the traffic data and the log data. Optionally, in addition to the features listed above, the first target feature extracted from the traffic data may also include the size of the network traffic.
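As an illustration only, the feature extraction of step S104 might be sketched as follows. The record layouts and key names (`src_ip`, `dst_port`, `proto`, `user`, `pid`, `cmd`) and the choice of the destination port as the port number are assumptions made for this sketch; the patent does not specify a record format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TrafficFeatures:
    """First feature data: the fields named in step S104, plus optional size."""
    src_ip: str
    dst_ip: str
    port: int
    protocol: str
    size_bytes: Optional[int] = None  # optional traffic-size feature


@dataclass(frozen=True)
class LogFeatures:
    """Second feature data: login data, process number, operation instruction."""
    login_user: str
    process_id: int
    instruction: str


def extract_traffic_features(record: dict) -> TrafficFeatures:
    # Hypothetical record keys; real capture formats will differ.
    return TrafficFeatures(
        src_ip=record["src_ip"],
        dst_ip=record["dst_ip"],
        port=int(record["dst_port"]),
        protocol=record["proto"].upper(),
        size_bytes=record.get("bytes"),
    )


def extract_log_features(record: dict) -> LogFeatures:
    # Hypothetical record keys; real log schemas will differ.
    return LogFeatures(
        login_user=record["user"],
        process_id=int(record["pid"]),
        instruction=record["cmd"].strip(),
    )


flow = {"src_ip": "10.0.0.5", "dst_ip": "203.0.113.9",
        "dst_port": "443", "proto": "tcp", "bytes": 1520}
log = {"user": "admin", "pid": "4242", "cmd": " whoami "}
print(extract_traffic_features(flow))
print(extract_log_features(log))
```

Keeping the two feature types as separate records mirrors the text, where traffic features and log features are extracted and analyzed independently.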
Step S106, processing the first feature data with the target intelligent analysis model to obtain an APT attack identification result for the target traffic data, and processing the second feature data with the target intelligent analysis model to obtain an APT attack identification result for the target log data.
The target intelligent analysis model is a network model that has learned known APT attack features and normal behavior features; the APT attack identification result is one of the following: an APT attack exists, or no APT attack exists.
The target intelligent analysis model in the embodiment of the invention is an analysis model built on machine learning and deep learning algorithms, and can process both the first feature data of the target traffic data and the second feature data of the target log data. Having learned the features of various known attack patterns and of normal behavior, the model analyzes newly input feature data and determines the APT attack identification results for the target traffic data and the target log data. Once an APT attack is determined to exist, the network system to be protected responds immediately and takes corresponding defensive measures.
The embodiment of the invention provides a big data-based APT attack identification method that uses a target intelligent analysis model, which has learned known APT attack features and normal behavior features, to analyze the first feature data of the target traffic data and the second feature data of the target log data, and thereby determine the APT attack identification result for the target traffic data and for the target log data. Because the target intelligent analysis model has learned a large number of APT attack features, it identifies APT attacks with higher accuracy, which effectively solves the technical problem of missed APT attack detection in existing APT attack identification methods.
In an optional embodiment, collecting all traffic data and all log data in the network system to be protected in step S102 specifically includes: collecting all traffic data and all log data through distributed data collection nodes in the network system to be protected.
In step S104, extracting the first target feature from the target traffic data and extracting the second target feature from the target log data specifically includes: processing all traffic data and all log data in parallel with a plurality of compute nodes in a distributed compute engine, so as to extract the first target feature from the target traffic data and the second target feature from the target log data.
In the prior art, data is mostly processed on a single server, which becomes a bottleneck when processing large-scale network data, so real-time analysis and mining cannot be performed quickly and efficiently. To improve security detection capability and response efficiency, in the embodiment of the invention the central server of the network system to be protected uses big data processing technology (technologies and methods for processing huge and complex data sets), including distributed storage and distributed computing, to process and store the collected data efficiently.
Specifically, the embodiment of the invention uses distributed data collection nodes to collect data such as network traffic and system logs in real time. The distributed data collection nodes are devices or nodes deployed in the network system to be protected, and mainly include network traffic monitors, log collectors, and the like. These nodes are distributed at different locations and obtain information from multiple data sources by monitoring network traffic and collecting system logs.
Network traffic data (the traffic data above) includes information about the various data packets transmitted in the network; system log data (the log data above) includes log information generated by the operating system, applications, and network devices, such as login activity, administrator operations, and system responses.
After the distributed data collection nodes collect the data, it can be stored across multiple physical nodes by an open-source distributed file system (HDFS); data redundancy and the distributed file system together ensure the reliability and scalability of the data.
Before the data stored in the distributed file system is transmitted to the subsequent feature extraction module, it is protected with an encryption algorithm (for example, the AES encryption algorithm). Even if the data is obtained by an unauthorized party, its content cannot be read or used, which ensures the security of the data.
The embodiment of the invention uses a plurality of compute nodes of an open-source distributed compute engine (for example, Apache Spark) to process all traffic data and all log data in parallel, so as to extract the first feature data from the target traffic data and the second feature data from the target log data.
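The parallel extraction pattern can be illustrated on a single machine. The sketch below is not Spark code: it swaps in a thread pool from Python's standard library as a stand-in for the compute nodes of a distributed engine, and the record keys are hypothetical. It only shows the shape of the computation, in which one extractor function is mapped independently over every record.

```python
from concurrent.futures import ThreadPoolExecutor


def extract_flow_features(record: dict) -> tuple:
    # Hypothetical record keys; mirrors the first target feature.
    return (record["src_ip"], record["dst_ip"], record["port"], record["proto"])


def parallel_extract(records: list, workers: int = 4) -> list:
    """Apply the extractor to every record using a pool of workers.

    Executor.map preserves input order, so results line up with records.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(extract_flow_features, records))


records = [
    {"src_ip": f"10.0.0.{i}", "dst_ip": "203.0.113.9", "port": 443, "proto": "TCP"}
    for i in range(1, 6)
]
features = parallel_extract(records)
print(len(features), features[0])
```

Because each record is processed independently, the same extractor could in principle be handed to a distributed engine's map operation over partitioned data; that deployment is outside this sketch.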
In an alternative embodiment, the method of the present invention further comprises the steps of:
Step S201, acquiring a training sample set.
Wherein the training sample set comprises a plurality of sample traffic data and a plurality of sample log data, each sample traffic data and each sample log data having a corresponding sample label, the sample label being one of: APT attack behavior, normal behavior.
Specifically, the training sample set contains both labeled sample traffic data and labeled sample log data; that is, the APT attack identification result of each item of sample traffic data or sample log data is known. If the APT attack identification result of an item of sample traffic data or sample log data is that no APT attack exists, that item is considered to have been generated by normal behavior.
Step S202, extracting the first target feature from target sample traffic data to obtain first sample feature data, and extracting the second target feature from target sample log data to obtain second sample feature data.
Wherein the target sample traffic data represents any one of the plurality of sample traffic data, and the target sample log data represents any one of the plurality of sample log data.
The features to extract are the same as in step S104 above; that is, the first target feature includes: source IP address, destination IP address, port number, and transport protocol; the second target feature includes: login data, process number, and operation instruction. The corresponding features are extracted from each item of sample traffic data and sample log data, thereby obtaining the sample feature data of each training sample.
Step S203, training the initial intelligent analysis model based on the sample feature data of all sample traffic data and all sample log data in the training sample set and the corresponding sample labels, to obtain the target intelligent analysis model.
The embodiment of the invention uses the data set from which sample features have been extracted to train an initial intelligent analysis model with machine learning and deep learning algorithms. By learning the associations between features and labels in the sample data, a target intelligent analysis model capable of classifying new data is established.
Optionally, the initial intelligent analysis model may be a support vector machine (SVM) or a decision tree. The SVM is a supervised learning algorithm widely used in APT attack detection. It maps the data into a high-dimensional space and constructs an optimal hyperplane there to separate different categories, such as normal behavior and malicious behavior; a sample is typically classified according to which side of the hyperplane it falls on, that is, the sign of its signed distance to the hyperplane. The decision tree is a popular supervised learning algorithm that can be used for feature selection and classification in APT attack detection. It builds a tree structure of nodes and branches by recursively partitioning the data according to feature values, and uses that tree to classify and predict new data.
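To make the recursive partitioning concrete, here is a minimal decision tree sketch in plain Python. It assumes numeric features and uses Gini impurity as the split criterion (a CART-style choice that the patent does not specify); the toy samples and the two features shown are invented for illustration.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (feature index, threshold) minimizing weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best


def build_tree(rows, labels, depth=0, max_depth=3):
    """Recursively split until no split improves purity or max_depth is hit."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority label
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (
        f, t,
        build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
        build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth),
    )


def predict(tree, row):
    """Walk from the root to a leaf; internal nodes are 4-tuples."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree


# Toy, made-up samples: (port number, bytes sent) -> label.
X = [(443, 1200), (443, 900), (80, 1500), (4444, 90000), (4444, 75000), (8081, 82000)]
y = ["normal", "normal", "normal", "apt", "apt", "apt"]
tree = build_tree(X, y)
print(predict(tree, (443, 1000)), predict(tree, (4444, 80000)))
```

Categorical features such as IP addresses or protocol names would need an encoding step before a threshold-based split like this one could be applied.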
After model training is completed, the model can be evaluated and validated on a test data set to ensure that it accurately distinguishes APT attack behavior from normal behavior. Model performance is evaluated by computing metrics such as accuracy, recall, precision, and F1 score.
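The evaluation metrics named above follow directly from the counts of true and false positives and negatives. A minimal sketch, treating APT attack behavior as the positive class (the label strings and test data are hypothetical):

```python
def binary_metrics(y_true, y_pred, positive="apt"):
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


y_true = ["apt", "apt", "apt", "normal", "normal", "normal", "normal", "normal"]
y_pred = ["apt", "apt", "normal", "apt", "normal", "normal", "normal", "normal"]
print(binary_metrics(y_true, y_pred))
```

Recall is the metric most closely tied to the missed-detection problem the patent targets, since each false negative is an APT attack that went unidentified.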
In an alternative embodiment, the method of the present invention further comprises the steps of:
Step S301, receiving threat intelligence data sent by an external network system through a secure data exchange interface.
Conventional security monitoring systems often operate in isolation and lack a mechanism for collaboration and information sharing, which makes it difficult to form a unified countermeasure strategy. A secure data exchange interface is a communication interface or specification for sharing and exchanging security data. It defines a well-specified data format and structure, ensuring that data is transmitted and interpreted accurately and consistently between different systems or organizations. Common secure data exchange interfaces include STIX (Structured Threat Information eXpression), TAXII (Trusted Automated eXchange of Indicator Information), MISP (Malware Information Sharing Platform), and the like. These interfaces and specifications provide standardized data formats and mechanisms that allow different organizations to conveniently share and exchange security data, such as malicious IP addresses, malicious URLs, and malicious code. Through the secure data exchange interface, an organization can acquire real-time threat intelligence and rapidly perform risk assessment and response.
The network system to be protected in the embodiment of the invention applies this collaboration mechanism and threat intelligence sharing strategy: through the secure data exchange interface, it can receive in real time the threat intelligence data, security event information, and the like that external network systems share in a standardized data format.
Step S302, acquiring threat data features from the threat intelligence data.
Step S303, storing the threat data features as extended samples in the training sample set to obtain an updated training sample set.
Step S304, training the target intelligent analysis model with the updated training sample set to obtain an updated target intelligent analysis model.
After threat intelligence data shared by an external network system is received, threat data features (that is, feature data of APT attack behavior) are obtained from it and stored as extended samples in the model's training sample set, enriching the sample data and improving the ability of the network system to be protected to cope with APT attacks. The sample label of these threat data features is APT attack behavior. The model is then trained on the enriched training sample set and its parameters are updated, so as to improve the accuracy of the model.
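Steps S302 and S303 can be sketched as a small helper that appends externally shared threat features to the training set with the APT label. The sample representation (a feature tuple paired with a label string) and the duplicate check are assumptions of this sketch, not details from the patent.

```python
def extend_training_set(training_set, threat_features):
    """Return a new sample list with threat features appended as APT samples.

    Samples are (feature_tuple, label) pairs. Duplicates are skipped so that
    repeatedly shared intelligence does not skew the training set.
    """
    seen = {features for features, _ in training_set}
    updated = list(training_set)
    for features in threat_features:
        if features not in seen:
            updated.append((features, "apt"))
            seen.add(features)
    return updated


training_set = [(("10.0.0.5", "203.0.113.9", 443, "TCP"), "normal")]
shared = [
    ("10.0.0.8", "198.51.100.7", 4444, "TCP"),
    ("10.0.0.8", "198.51.100.7", 4444, "TCP"),  # same indicator shared twice
]
updated = extend_training_set(training_set, shared)
print(len(updated), updated[-1])
```

The updated list would then be passed to whatever training routine produced the original model, which corresponds to step S304.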
In an alternative embodiment, the method of the present invention further comprises: in the case that the APT attack recognition result output by the target intelligent analysis model is that an APT attack exists, sharing the APT attack recognition result and the corresponding model input data as threat information to an external network system through the secure data exchange interface.
For example, when the APT detection tool of the network system to be protected detects an APT attack, it analyzes the characteristics of the attack tool involved, such as IP information, and immediately synchronizes this information to the firewall of the external network system, so that firewalls from other vendors can quickly defend against the specified IP. Sharing, exchanging and analyzing security data between different organizations can improve the ability to handle APT attacks.
Existing security monitoring tools usually display data in tables or simple graphs, so threat information and attack trends are difficult to present intuitively and the visualization effect is poor. To solve this problem, in an alternative embodiment, the method of the present invention further comprises: counting the input data and the output data of the target intelligent analysis model, and displaying the statistical result through a visualization tool.
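As a rough illustration of the sharing step, the sketch below packs a recognition result and its model input into a JSON-serializable message. The field names are loosely modelled on a STIX 2.1 Indicator object but are not validated against that specification; the result string `APT_ATTACK`, the `src_ip` key, and the `x_model_input` property are assumptions made for this example, and the transport over the secure data exchange interface is left out.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Optional

APT_ATTACK = "APT attack exists"  # hypothetical result string


def build_threat_message(recognition_result: str, model_input: dict) -> Optional[dict]:
    """Return a shareable threat message, or None when no APT attack was identified."""
    if recognition_result != APT_ATTACK:
        return None
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid4()}",
        "created": now,
        "modified": now,
        "valid_from": now,
        "pattern_type": "stix",
        # Pattern matching the source IP of the attack (assumed key "src_ip").
        "pattern": f"[ipv4-addr:value = '{model_input['src_ip']}']",
        "description": "APT attack identified by the target intelligent analysis model",
        "x_model_input": model_input,  # hypothetical custom property
    }


if __name__ == "__main__":
    message = build_threat_message(APT_ATTACK, {"src_ip": "203.0.113.5", "dst_ip": "10.0.0.8"})
    print(json.dumps(message, indent=2))
```

A receiving firewall could then read the `pattern` field and block the specified IP, as in the example above.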
Visualization tools are tools that use a graphical interface to present data in a visual form for helping a user understand and analyze the data. Visualization tools can provide charts, network topologies, threat intelligence, attack trends, statistical analysis, reporting, alarms, and event analysis.
Specifically, the charts include: time series charts, pie charts, bar charts, and heat maps. A time series chart shows the trend of threat activity over time and helps an analyst find the time distribution of abnormal behaviors or attack events, for example a time series chart of the number of alarms or of the number of attack events. Pie charts and bar charts show the proportions of different types of threat sources, attack modes, affected areas and similar information; FIG. 2 is a statistical bar chart of the number of different types of threats. A heat map shows the intensity of activity in a specific period or region through the depth of color, helping to discover abnormal behavior or high-risk regions. The network topology shows the hosts, servers and devices in the network and the connections between them, helping analysts identify potential attack paths and targets.
Threat information can be displayed in the form of a threat information report, which contains detailed information about new threat vulnerabilities, malicious code, attack organizations, attack modes and the like, and helps an analyst understand the current threat environment. A threat information sharing platform provides real-time threat information updates, helping the analyst obtain the latest threat information in time and respond to the corresponding threats.
Attack trends can be displayed as an attack activity diagram and a threat profile. The attack activity diagram shows attack activity trends for different time periods or geographic locations, helping analysts understand the main attack types and attack targets. The threat profile analyzes and organizes information on a specific attacker, attack organization or attack tool, and shows its attack techniques, targets and attack modes.
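The data behind a time series chart of alarm counts can be prepared as in the following sketch. The hourly bucket size and the function name `alarms_per_hour` are illustrative assumptions; the embodiment does not prescribe a granularity or a plotting library.

```python
from collections import Counter
from datetime import datetime
from typing import Iterable, List, Tuple


def alarms_per_hour(timestamps: Iterable[datetime]) -> List[Tuple[datetime, int]]:
    """Count alarms per hour and return (hour, count) pairs in chronological order."""
    buckets = Counter(
        ts.replace(minute=0, second=0, microsecond=0)  # truncate to the hour
        for ts in timestamps
    )
    return sorted(buckets.items())
```

The resulting pairs can be handed to any charting tool as the x and y values of the time series.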
The embodiment of the invention can count the attack sources and analyze the attacked assets, wherein the attack source statistics refers to the statistics of attack activities in different source areas, so as to help an analyst to know the geographic distribution and attack trend of the attacker. The analysis of the attacked assets refers to counting the type, quantity and frequency of important assets under attack, helping analysts to determine the most risky asset and take corresponding safeguards. FIG. 3 is a schematic diagram of an analysis of an attacked asset.
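Attack source statistics and attacked-asset analysis both reduce to counting events by a key. The sketch below assumes each attack event is a dictionary with hypothetical `source_region` and `asset` fields; the real event schema is not given in the embodiment.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def summarize_attacks(events: Iterable[Dict[str, str]]) -> Dict[str, List[Tuple[str, int]]]:
    """Count attack events per source region and per attacked asset, most frequent first."""
    events = list(events)  # allow generators to be consumed twice
    by_source = Counter(e["source_region"] for e in events)
    by_asset = Counter(e["asset"] for e in events)
    return {
        "by_source": by_source.most_common(),
        "by_asset": by_asset.most_common(),
    }
```

The first entry of `by_asset` is then the most frequently attacked asset, which is one simple proxy for the asset most at risk.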
Report generation refers to generating charts, statistics and trend analysis reports according to safety data so as to comprehensively know safety conditions and threat conditions. FIG. 4 is a diagram of a threat situation report presentation interface.
Alarm and event analysis refers to monitoring system logs and security events in real time and, by setting rules and policies, issuing alarms and performing event analysis in a timely manner, which facilitates quick response to and handling of potential threats. For example, if more than X alarms or Y events occur for a certain IP within a preset period of time, or a large amount of traffic suddenly appears for that IP, an alarm is raised by means such as a real-time message notification or a mail notification.
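The rule "more than X alarms for a certain IP within a preset period" can be sketched with a sliding window. The class name `ThresholdAlarm` and its parameters are hypothetical, timestamps are assumed to be in seconds and to arrive in non-decreasing order, and the notification itself (message or mail) is left to the caller.

```python
from collections import defaultdict, deque


class ThresholdAlarm:
    """Raise an alarm when one IP produces more than `max_alarms` within the window."""

    def __init__(self, max_alarms: int, window_seconds: float):
        self.max_alarms = max_alarms          # the threshold X
        self.window_seconds = window_seconds  # the preset period of time
        self._times = defaultdict(deque)      # per-IP timestamps inside the window

    def record(self, ip: str, timestamp: float) -> bool:
        """Record one alarm for `ip`; return True if the threshold is exceeded."""
        times = self._times[ip]
        times.append(timestamp)
        # Drop timestamps that have fallen out of the sliding window.
        while times and timestamp - times[0] > self.window_seconds:
            times.popleft()
        return len(times) > self.max_alarms
```

For instance, with `ThresholdAlarm(max_alarms=2, window_seconds=60)`, the third alarm from the same IP within 60 seconds makes `record` return `True`, at which point the caller would send the notification.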
The comprehensive application of the functions and the tools can improve the recognition, early warning and response capability of security analysts to APT attacks, and help to protect the security and the integrity of an information system. Through the support of the visualization and analysis tools, the security team can better understand threat information, analyze attack trends, generate reports and alarms, and perform effective statistical analysis and event analysis.
In summary, the embodiment of the invention provides an APT attack identification method based on big data, which uses big data technology to acquire, store and analyze information such as network security logs, traffic data and threat information, thereby achieving comprehensive perception and analysis of network threats. By establishing a cooperative mechanism, information is shared with external network systems, so that security operators can acquire the latest threat information in time and the capability of coping with complex threats is enhanced. To better present big data analysis results and threat information, a set of intuitive and easy-to-use visualization tools is developed, providing security operators with global situation awareness and attack analysis support. These tools play an increasingly important role in the field of network security and provide stronger security guarantees for enterprises.
Example two
The embodiment of the invention also provides an APT attack recognition device based on big data, which is mainly used for executing the APT attack recognition method based on big data provided by the first embodiment, and the device provided by the embodiment of the invention is specifically introduced below.
Fig. 5 is a functional block diagram of an APT attack recognition device based on big data according to an embodiment of the present invention, and as shown in fig. 5, the device mainly includes: the device comprises an acquisition module 10, a first extraction module 20 and a processing module 30, wherein:
the acquisition module 10 is configured to collect all traffic data and all log data in the network system to be protected.
A first extraction module 20, configured to extract a first target feature from the target flow data to obtain first feature data, and extract a second target feature from the target log data to obtain second feature data; wherein the target flow data represents any one of all flow data, and the target log data represents any one of all log data; the first target feature comprises: source IP address, destination IP address, port number, and transport protocol; the second target feature comprises: login data, process number and operation instruction.
The processing module 30 is configured to process the first feature data by using the target intelligent analysis model to obtain an APT attack recognition result of the target flow data, and process the second feature data by using the target intelligent analysis model to obtain an APT attack recognition result of the target log data; the target intelligent analysis model is a network model which learns the known APT attack characteristics and normal behavior characteristics; the APT attack recognition result includes one of the following: there is an APT attack and no APT attack.
The embodiment of the invention provides an APT attack identification device based on big data, which uses a target intelligent analysis model which learns known APT attack characteristics and normal behavior characteristics to analyze first characteristic data of target flow data and second characteristic data of target log data so as to determine an APT attack identification result of the target flow data/the target log data. In view of the fact that the target intelligent analysis model learns a large number of APT attack characteristics, the target intelligent analysis model has higher accuracy in carrying out APT attack recognition, and therefore the technical problem of APT attack missing recognition existing in an existing APT attack recognition method is effectively solved.
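The cooperation of the extraction module and the processing module can be sketched as follows. The record keys (`src_ip`, `process_id`, and so on) are assumed names for the first and second target features, and `model` is a hypothetical callable standing in for the target intelligent analysis model, whose architecture is not specified here.

```python
from typing import Callable, Dict, Iterable, List, Tuple

# Assumed key names for the first target features (flow data) ...
FIRST_TARGET_FEATURES = ("src_ip", "dst_ip", "port", "protocol")
# ... and for the second target features (log data).
SECOND_TARGET_FEATURES = ("login_data", "process_id", "operation")


def extract(record: Dict[str, object], names: Tuple[str, ...]) -> Dict[str, object]:
    """Extraction module: keep only the target features of one record."""
    return {name: record.get(name) for name in names}


def identify(flows: Iterable[dict], logs: Iterable[dict],
             model: Callable[[dict], str]) -> List[Tuple[str, str]]:
    """Processing module: run the model on every flow and log feature record."""
    results = []
    for flow in flows:
        results.append(("flow", model(extract(flow, FIRST_TARGET_FEATURES))))
    for log in logs:
        results.append(("log", model(extract(log, SECOND_TARGET_FEATURES))))
    return results
```

Any callable that maps a feature dictionary to one of the two recognition results can be passed as `model`, which keeps the sketch independent of the learning framework actually used.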
Optionally, the acquisition module 10 is specifically configured to:
collect all flow data and all log data through distributed data collection nodes in the network system to be protected.
Optionally, the first extraction module 20 is specifically configured to:
all traffic data and all log data are processed in parallel with a plurality of compute nodes in the distributed compute engine to extract a first target feature from the target traffic data and a second target feature from the target log data.
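Parallel feature extraction can be illustrated on a single machine with a thread pool. This is only a stand-in: the worker threads play the role of the compute nodes, whereas a real distributed compute engine (none is named in this embodiment) would spread the work across machines. The feature key names are assumptions.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

FLOW_FEATURES = ("src_ip", "dst_ip", "port", "protocol")  # assumed key names
LOG_FEATURES = ("login_data", "process_id", "operation")  # assumed key names


def extract_features(item: Tuple[str, Dict[str, object]]) -> Tuple[str, Dict[str, object]]:
    """Extract the target features of one ("flow" | "log", record) item."""
    kind, record = item
    names = FLOW_FEATURES if kind == "flow" else LOG_FEATURES
    return kind, {name: record.get(name) for name in names}


def extract_in_parallel(flows: List[dict], logs: List[dict],
                        workers: int = 4) -> List[Tuple[str, Dict[str, object]]]:
    """Process all flow and log records concurrently, preserving input order."""
    items = [("flow", r) for r in flows] + [("log", r) for r in logs]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(extract_features, items))
```

Because each record is processed independently, the same `extract_features` function could be mapped over partitions of the data by whatever engine is actually deployed.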
Optionally, the apparatus further comprises:
the first acquisition module is used for acquiring a training sample set; wherein the training sample set comprises: a plurality of sample flow data and a plurality of sample log data, each sample flow data and each sample log data having a corresponding sample tag, the sample tag comprising one of: APT attack behavior, normal behavior.
The second extraction module is used for extracting first target features from target sample flow data to obtain first sample feature data, and extracting second target features from target sample log data to obtain second sample feature data; wherein the target sample flow data represents any one of the plurality of sample flow data, and the target sample log data represents any one of the plurality of sample log data.
The first training module is used for training the initial intelligent analysis model based on sample characteristic data of all sample flow data and all sample log data in the training sample set and corresponding sample labels to obtain a target intelligent analysis model.
Optionally, the apparatus further comprises:
and the receiving module is used for receiving threat information data sent by the external network system through the secure data exchange interface.
And the second acquisition module is used for acquiring threat data characteristics in the threat intelligence data.
And the storage module is used for storing the threat data characteristics as an expansion sample into a training sample set to obtain an updated training sample set.
And the second training module is used for training the target intelligent analysis model by using the updated training sample set to obtain an updated target intelligent analysis model.
Optionally, the apparatus further comprises:
and the sharing module is used for sharing the APT attack recognition result and corresponding model input data to an external network system through a secure data exchange interface as threat information under the condition that the APT attack recognition result output by the target intelligent analysis model is that the APT attack exists.
Optionally, the apparatus further comprises:
And the statistics and display unit is used for carrying out statistics on the input data and the output data of the target intelligent analysis model and displaying the statistical result through the visualization tool.
Example three
Referring to fig. 6, an embodiment of the present invention provides an electronic device including: a processor 60, a memory 61, a bus 62 and a communication interface 63, the processor 60, the communication interface 63 and the memory 61 being connected by the bus 62; the processor 60 is arranged to execute executable modules, such as computer programs, stored in the memory 61.
The memory 61 may include a high-speed random access memory (Random Access Memory, RAM), and may further include a non-volatile memory, such as at least one magnetic disk memory. The communication connection between the system network element and at least one other network element is achieved via at least one communication interface 63 (which may be wired or wireless), and may use the internet, a wide area network, a local area network, a metropolitan area network, etc.
Bus 62 may be an ISA bus, a PCI bus, an EISA bus, or the like. The buses may be classified as address buses, data buses, control buses, etc. For ease of illustration, only one bi-directional arrow is shown in FIG. 6, but this does not mean that there is only one bus or one type of bus.
The memory 61 is configured to store a program, and the processor 60 executes the program after receiving an execution instruction. The method performed by the apparatus defined by the flow disclosed in any of the foregoing embodiments of the present invention may be applied to the processor 60 or implemented by the processor 60.
The processor 60 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be completed by integrated logic circuits of hardware in the processor 60 or by instructions in the form of software. The processor 60 may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), etc.; it may also be a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. It may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be embodied directly as being executed by a hardware decoding processor, or executed by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium well known in the art, such as a random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers. The storage medium is located in the memory 61, and the processor 60 reads the information in the memory 61 and, in combination with its hardware, performs the steps of the method described above.
The embodiment of the invention further provides a computer program product for the big data-based APT attack recognition method, apparatus and electronic device, which comprises a computer readable storage medium storing non-volatile program code executable by a processor. The program code includes instructions for executing the method described in the foregoing method embodiment; for the specific implementation, refer to the method embodiment, which is not repeated here.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer readable storage medium executable by a processor. Based on this understanding, the technical solution of the present invention, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures.
In the description of the present invention, it should be noted that, directions or positional relationships indicated by terms such as "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., are directions or positional relationships based on those shown in the drawings, or are directions or positional relationships conventionally put in use of the inventive product, are merely for convenience of describing the present invention and simplifying the description, and are not indicative or implying that the apparatus or element to be referred to must have a specific direction, be constructed and operated in a specific direction, and thus should not be construed as limiting the present invention. Furthermore, the terms "first," "second," "third," and the like are used merely to distinguish between descriptions and should not be construed as indicating or implying relative importance.
Furthermore, the terms "horizontal," "vertical," "overhang," and the like do not denote a requirement that the component be absolutely horizontal or overhanging; the component may be slightly inclined. For example, "horizontal" merely means that the direction is closer to horizontal than to vertical; it does not mean that the structure must be perfectly horizontal, and the structure may be slightly inclined.
In the description of the present invention, it should also be noted that, unless explicitly specified and limited otherwise, the terms "disposed," "mounted," "coupled," and "connected" are to be construed broadly. For example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; it may be direct or indirect through an intermediate medium, or it may be internal communication between two elements. The specific meaning of the above terms in the present invention will be understood in specific cases by those of ordinary skill in the art.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (10)

1. The APT attack identification method based on big data is characterized by comprising the following steps:
Collecting all flow data and all log data in a network system to be protected;
extracting first target features from target flow data to obtain first feature data, and extracting second target features from target log data to obtain second feature data; wherein the target flow data represents any one of the all flow data, and the target log data represents any one of the all log data; the first target feature comprises: source IP address, destination IP address, port number, and transport protocol; the second target feature comprises: login data, process number and operation instruction;
processing the first characteristic data by using a target intelligent analysis model to obtain an APT attack identification result of the target flow data, and processing the second characteristic data by using a target intelligent analysis model to obtain an APT attack identification result of the target log data; the target intelligent analysis model is a network model which learns known APT attack characteristics and normal behavior characteristics; the APT attack identification result comprises one of the following: there is an APT attack and no APT attack.
2. The big data based APT attack recognition method according to claim 1, wherein collecting all traffic data and all log data in the network system to be protected comprises:
and collecting all flow data and all log data through the distributed data collection nodes in the network system to be protected.
3. The big data based APT attack recognition method of claim 1, wherein extracting a first target feature from the target traffic data and extracting a second target feature from the target log data comprises:
processing the all traffic data and all log data in parallel with a plurality of compute nodes in a distributed compute engine to extract a first target feature from the target traffic data and a second target feature from the target log data.
4. The big data based APT attack identification method of claim 1, further comprising:
acquiring a training sample set; wherein the training sample set comprises: a plurality of sample traffic data and a plurality of sample log data, and each of the sample traffic data and each of the sample log data has a corresponding sample tag, the sample tag comprising one of: APT attack behavior and normal behavior;
Extracting the first target feature from target sample flow data to obtain first sample feature data, and extracting the second target feature from target sample log data to obtain second sample feature data; wherein the target sample flow data represents any one of the plurality of sample flow data, and the target sample log data represents any one of the plurality of sample log data;
and training the initial intelligent analysis model based on sample characteristic data of all sample flow data and all sample log data in the training sample set and corresponding sample labels to obtain the target intelligent analysis model.
5. The big data based APT attack identification method of claim 4, further comprising:
threat information data sent by an external network system is received through a secure data exchange interface;
acquiring threat data characteristics in the threat information data;
storing the threat data characteristics as an expansion sample into the training sample set to obtain an updated training sample set;
and training the target intelligent analysis model by using the updated training sample set to obtain an updated target intelligent analysis model.
6. The big data based APT attack identification method of claim 1, further comprising:
and under the condition that the APT attack identification result output by the target intelligent analysis model is that the APT attack exists, sharing the APT attack identification result and corresponding model input data as threat information to an external network system through a secure data exchange interface.
7. The big data based APT attack identification method of claim 1, further comprising:
and counting the input data and the output data of the target intelligent analysis model, and displaying the counting result through a visualization tool.
8. An APT attack recognition device based on big data, comprising:
the acquisition module is used for acquiring all flow data and all log data in the network system to be protected;
the first extraction module is used for extracting first target features from the target flow data to obtain first feature data, and extracting second target features from the target log data to obtain second feature data; wherein the target flow data represents any one of the all flow data, and the target log data represents any one of the all log data; the first target feature comprises: source IP address, destination IP address, port number, and transport protocol; the second target feature comprises: login data, process number and operation instruction;
The processing module is used for processing the first characteristic data by utilizing a target intelligent analysis model to obtain an APT attack identification result of the target flow data, and processing the second characteristic data by utilizing the target intelligent analysis model to obtain an APT attack identification result of the target log data; the target intelligent analysis model is a network model which learns known APT attack characteristics and normal behavior characteristics; the APT attack identification result comprises one of the following: there is an APT attack and no APT attack.
9. An electronic device comprising a memory, a processor, the memory having stored thereon a computer program executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the big data based APT attack identification method of any of claims 1 to 7.
10. A computer readable storage medium storing computer instructions which, when executed by a processor, implement the big data based APT attack identification method of any of claims 1 to 7.
CN202311236743.4A 2023-09-22 2023-09-22 Big data-based APT attack identification method and device and electronic equipment Pending CN117040921A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311236743.4A CN117040921A (en) 2023-09-22 2023-09-22 Big data-based APT attack identification method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311236743.4A CN117040921A (en) 2023-09-22 2023-09-22 Big data-based APT attack identification method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN117040921A true CN117040921A (en) 2023-11-10

Family

ID=88632027

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311236743.4A Pending CN117040921A (en) 2023-09-22 2023-09-22 Big data-based APT attack identification method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN117040921A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination