CN115378670B

CN115378670B - APT attack identification method and device, electronic equipment and medium

Info

Publication number: CN115378670B
Application number: CN202210948265.9A
Authority: CN
Inventors: 蔡晶晶; 陈俊; 韩顺闯; 韩伟召
Original assignee: Yongxin Zhicheng Technology Group Co ltd
Current assignee: Yongxin Zhicheng Technology Group Co ltd
Priority date: 2022-08-08
Filing date: 2022-08-08
Publication date: 2024-03-12
Anticipated expiration: 2042-08-08
Also published as: CN115378670A

Abstract

The invention relates to an APT attack identification method, an APT attack identification device, electronic equipment and a medium. The method comprises: acquiring network attack related information for an object to be detected; extracting APT attack features from the network attack related information; according to the APT attack features, performing multi-dimensional threat point detection through feature detection, behavior detection and machine learning respectively, to obtain a first, a second and a third APT attack recognition result of the object to be detected; and determining a target APT attack recognition result of the object to be detected according to the first, second and third APT attack recognition results. By applying feature detection, behavior detection and machine learning to each detection point according to that point's characteristics, the method can effectively improve detection efficiency and reduce false alarms.

Description

APT attack identification method and device, electronic equipment and medium

Technical Field

The invention relates to the technical field of network security, in particular to an APT attack identification method, an APT attack identification device, electronic equipment and a medium.

Background

Cyber warfare is becoming a reality: the cyber battlefield is globalizing, cyber confrontation is becoming routine and network attacks are intensifying, so maintaining cyberspace security has become a major matter of national security and social stability. Cyberspace, centered on the Internet, has become a strategic domain for national security, economic development and social stability. Because the growing diversity and complexity of network attacks mean that virtualized cyber warfare can deal a devastating blow to any organization, countries are competing to build their own cyber forces, and hacking has escalated into cyberspace confrontation between states.

Potential network security risks are prominent, security incidents at key organizations occur frequently, and network security threats and risks are increasingly serious. At present, network security problems occur frequently in China. Technical means for establishing an inventory of network assets are lacking, and no information-based management of that inventory has been established. There is also a lack of effective technical means for monitoring and discovering network security events and illegal or criminal network activity in supervised areas, so security precautions cannot be taken in time and the work remains reactive, limited to "fire fighting" emergency response. Moreover, because network security is highly specialized, traditional protection facilities are not up to professional threat analysis and emergency response, which makes it very difficult to prevent network security risks, protect critical information infrastructure and trace the source of network security incidents during investigation.

Advanced persistent threat attacks, also called APT attacks, are persistent and complex network attacks aimed at a specific target, and defending against them has long been a difficult problem in the industry. In the prior art, detection and protection are usually based on a single security technology, which cannot comprehensively capture the characteristics of APT attacks, so APT attacks cannot be identified accurately.

Disclosure of Invention

The invention aims to solve at least one of the above technical problems by providing an APT attack identification method, an APT attack identification device, electronic equipment and a medium.

In a first aspect, the present invention solves the above technical problems by providing the following technical solutions: an APT attack recognition method, the method comprising:

acquiring network attack related information for an object to be detected;

extracting APT attack features from the network attack related information, wherein the APT attack features comprise alarm information, network behavior information, operating system information, account information, network monitoring features and protocol analysis information;

according to the APT attack features, performing multi-dimensional threat point detection through feature detection, behavior detection and machine learning respectively, to obtain a first APT attack recognition result, a second APT attack recognition result and a third APT attack recognition result of the object to be detected;

and determining a target APT attack recognition result of the object to be detected according to the first, second and third APT attack recognition results.

The beneficial effects of the invention are as follows: the scheme identifies APT attacks through three security technologies, namely feature detection, behavior detection and machine learning, which is more accurate than prior-art identification based on a single security technology.

On the basis of the technical scheme, the invention can be improved as follows.

Further, the network attack related information comprises firewall logs, IDS logs, WAF logs, network audit logs, botnet/Trojan/worm logs, server logs, 4A audit logs, traffic logs and EDR information;

extracting the APT attack features from the network attack related information includes:

extracting alarm information from the firewall logs, IDS logs, WAF logs and botnet/Trojan/worm logs;

extracting network behavior information from the network audit logs;

extracting operating system information from the server logs;

extracting account information from the 4A audit logs;

extracting network monitoring features and protocol analysis information from the traffic logs;

and extracting website-related information from the EDR information, wherein the APT attack features comprise the alarm information, network behavior information, operating system information, account information, network monitoring features, protocol analysis information and website-related information.

The beneficial effect of this further scheme is that different kinds of network attack related information carry different types of data, so different APT attack features can be extracted more accurately from each type of data.

Further, performing multi-dimensional threat point detection through feature detection, behavior detection and machine learning respectively according to the APT attack features, to obtain the first, second and third APT attack recognition results of the object to be detected, includes:

performing multi-dimensional threat point detection according to the APT attack features to obtain multi-dimensional threat features, wherein the multi-dimensional threat features comprise access threat features, intrusion threat features, communication threat features, lateral penetration threat features, data security threat features, trace cleaning threat features and user behavior threat features;

according to the multi-dimensional threat features, determining the first, second and third APT attack recognition results of the object to be detected through feature detection, behavior detection and machine learning respectively.

The beneficial effect of this further scheme is that multi-dimensional threat features are derived from the APT attack features and evaluated by three methods (feature detection, behavior detection and machine learning), so that APT attacks are identified from several different threat points.

Further, the network monitoring features include a first domain name corresponding to the object to be detected; the communication threat features include an abnormal domain name detection result, which is determined as follows:

performing word segmentation on the first domain name through a pre-established natural language processing model to obtain a plurality of characters corresponding to the first domain name, wherein the natural language processing model is built on a multi-layer perceptron algorithm;

and determining the abnormal domain name detection result according to each character and preset abnormal domain names, wherein the abnormal domain name detection result indicates either that the first domain name is an abnormal domain name or that it is a normal domain name.
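A minimal sketch of the abnormal domain name check follows. It assumes that the MLP-based segmentation model (not specified in this text) can be stood in for by overlapping character bigrams, and that "comparing each character with a preset abnormal domain name" means a Jaccard similarity threshold. `PRESET_ABNORMAL`, the 0.5 threshold and all function names are hypothetical.

```python
from typing import Iterable, Set

# Hypothetical preset list of known abnormal (for example DGA-generated) domain names.
PRESET_ABNORMAL = ("xjw9qkz2lp.com", "qz8vt3nb7r.net")


def second_level_label(domain: str) -> str:
    """Label left of the TLD, e.g. 'xjw9qkz2lp' for 'mail.xjw9qkz2lp.com' (naive: ignores co.uk-style suffixes)."""
    labels = domain.lower().rstrip(".").split(".")
    return labels[-2] if len(labels) >= 2 else labels[0]


def segment(label: str) -> Set[str]:
    """Stand-in for the MLP-based segmentation model: overlapping character bigrams."""
    return {label[i:i + 2] for i in range(len(label) - 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of bigrams the two labels have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def detect_abnormal_domain(domain: str,
                           preset: Iterable[str] = PRESET_ABNORMAL,
                           threshold: float = 0.5) -> str:
    """Return 'abnormal' if the domain's pieces closely match any preset abnormal domain, else 'normal'."""
    pieces = segment(second_level_label(domain))
    for bad in preset:
        if jaccard(pieces, segment(second_level_label(bad))) >= threshold:
            return "abnormal"
    return "normal"
```

For example, `mail.xjw9qkz2lq.com` shares 8 of 10 distinct bigrams with the first preset entry and is labelled abnormal, while `example.com` shares none and is labelled normal. A trained classifier would replace the similarity test in a real deployment.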

The beneficial effect of this further scheme is that, when determining the abnormal domain name detection result, a natural language processing model trained by machine learning is used to obtain the characters of the first domain name, and the result is determined from each of those characters, which makes the detection result more accurate.

Further, the target APT attack recognition result is that there is an APT attack or no APT attack, and the method further includes:

if the target APT attack recognition result is that the APT attack exists, determining an alarm result of the object to be detected according to the target APT attack recognition result, wherein the alarm result comprises at least one of a source IP, a destination IP, an attack name, an attack sample name, an alarm time, a first risk level, a behavior parameter, a response mode and an alarm type.
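The alarm result above can be sketched as a simple record type. This is an illustration only: the field names, the integer risk scale and the helper `build_alarm` are hypothetical, and every field is optional because the text requires only "at least one" of them.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, Optional


@dataclass
class AlarmResult:
    """Alarm fields named in the text; each is optional since only 'at least one' is required."""
    source_ip: Optional[str] = None
    destination_ip: Optional[str] = None
    attack_name: Optional[str] = None
    attack_sample_name: Optional[str] = None
    alarm_time: Optional[datetime] = None
    first_risk_level: Optional[int] = None  # integer scale is an assumption
    behavior_parameters: Optional[Dict[str, Any]] = None
    response_mode: Optional[str] = None
    alarm_type: Optional[str] = None


def build_alarm(apt_attack_exists: bool, **fields: Any) -> Optional[AlarmResult]:
    """Return an alarm only when the target recognition result says an APT attack exists."""
    if not apt_attack_exists:
        return None
    alarm = AlarmResult(**fields)
    if all(value is None for value in vars(alarm).values()):
        raise ValueError("an alarm result needs at least one field")
    return alarm
```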

The further scheme has the advantages that the alarm result of the object to be detected can be determined according to the APT attack identification result, and different alarm information can be provided for the user so as to meet different analysis requirements.

Further, the method comprises the following steps:

performing association analysis on the multi-dimensional threat features according to the source IP or the destination IP;

when the source IP or the destination IP hits threat features in multiple dimensions, outputting a second risk level, wherein the second risk level is higher than the first risk level.
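One way to read the association analysis is sketched below: group threat hits by IP and escalate any IP whose hits span several threat dimensions. The numeric levels, the `min_dimensions=2` cut-off and the function name are assumptions; the text only states that the second level is higher than the first.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

FIRST_RISK_LEVEL = 1   # hypothetical numeric levels
SECOND_RISK_LEVEL = 2  # must be higher than the first level


def correlate_by_ip(hits: Iterable[Tuple[str, str]],
                    min_dimensions: int = 2) -> Dict[str, int]:
    """hits: (ip, threat_dimension) pairs, where ip is a source or destination IP.

    An IP whose hits span at least `min_dimensions` distinct threat dimensions
    is raised to the second risk level; otherwise it keeps the first.
    """
    dimensions_by_ip: Dict[str, Set[str]] = defaultdict(set)
    for ip, dimension in hits:
        dimensions_by_ip[ip].add(dimension)
    return {
        ip: SECOND_RISK_LEVEL if len(dims) >= min_dimensions else FIRST_RISK_LEVEL
        for ip, dims in dimensions_by_ip.items()
    }
```

Counting distinct dimensions rather than raw hits keeps a noisy but single-stage source (for example repeated port scans) at the first level.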

The beneficial effect of this further scheme is that the risk level can be further confirmed by correlating the source IP or destination IP with the multi-dimensional threat features, so that technicians can respond according to the risk level.

Further, before extracting the APT attack feature from the network attack related information, the method further includes:

preprocessing the network attack related information to obtain preprocessed network attack related information, wherein the preprocessing comprises at least one of data cleaning, data format unified processing and data supplementing processing.

The beneficial effect of this further scheme is that preprocessing the network attack related information before extracting APT attack features makes the subsequent APT attack recognition result more accurate, because it removes the influence of data irrelevant to APT attacks, inconsistent data formats and missing data.

In a second aspect, the present invention further provides an APT attack recognition device for solving the above technical problem, where the device includes:

the data acquisition module is used for acquiring network attack related information for the object to be detected;

the APT attack characteristic extraction module is used for extracting APT attack characteristics from the network attack related information, wherein the APT attack characteristics comprise alarm information, network behavior information, operating system information, account information, network monitoring characteristics and protocol analysis information;

the recognition module is used for respectively carrying out multidimensional threat point detection through feature detection, behavior detection and machine learning according to the APT attack features to obtain a first APT attack recognition result, a second APT attack recognition result and a third APT attack recognition result of the object to be detected;

the identification result determining module is used for determining a target APT attack identification result of the object to be detected according to the first APT attack identification result, the second APT attack identification result and the third APT attack identification result.

In a third aspect, the present invention further provides an electronic device for solving the above technical problem, where the electronic device includes a memory, a processor, and a computer program stored on the memory and capable of running on the processor, and when the processor executes the computer program, the processor implements the APT attack recognition method of the present application.

In a fourth aspect, the present invention further provides a computer readable storage medium, where a computer program is stored, where the computer program is executed by a processor to implement the APT attack recognition method of the present application.

Additional aspects and advantages of the application will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings that are required to be used in the description of the embodiments of the present invention will be briefly described below.

Fig. 1 is a schematic flow chart of an APT attack recognition method according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of a storage system according to an embodiment of the present invention;

Fig. 3 is a schematic modeling diagram of a natural language model according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of a model training process of a natural language model according to an embodiment of the present invention;

Fig. 5 is a flowchart of another APT attack recognition method according to an embodiment of the present invention;

Fig. 6 is a schematic structural diagram of an APT attack recognition device according to an embodiment of the present invention;

Fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.

Detailed Description

The principles and features of the present invention are described below with examples given for the purpose of illustration only and are not intended to limit the scope of the invention.

The technical solution of the present invention, and how it solves the above technical problems, is described in detail below with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present invention are described below with reference to the accompanying drawings.

The scheme provided by the embodiments of the present invention can be applied to any scenario requiring APT attack identification, and can be executed by any electronic device. For example, it may be executed by a user terminal device, which may be any terminal device on which an application can be installed and used to perform APT attack identification, including at least one of the following: smart phone, tablet computer, notebook computer, desktop computer, smart speaker, smart watch, smart TV and smart in-vehicle device.

An embodiment of the present invention provides a possible implementation. Fig. 1 shows a flowchart of an APT attack recognition method, which may be executed by any electronic device, for example a terminal device, or by a terminal device and a server together. For convenience of description, the method is described below with a server as the execution body; as shown in the flowchart of Fig. 1, it may include the following steps:

step S110, obtaining network attack related information aiming at an object to be detected;

step S120, extracting APT attack features from the network attack related information, wherein the APT attack features comprise alarm information, network behavior information, operating system information, account information, network monitoring features and protocol analysis information;

step S130, respectively carrying out multidimensional threat point detection through feature detection, behavior detection and machine learning according to the APT attack features to obtain a first APT attack recognition result, a second APT attack recognition result and a third APT attack recognition result of an object to be detected;

step S140, determining a target APT attack recognition result of the object to be detected according to the first APT attack recognition result, the second APT attack recognition result and the third APT attack recognition result.
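Steps S110 to S140 can be sketched as a small pipeline with pluggable stages. This is an illustrative skeleton only: `identify_apt` and its parameters are hypothetical, and the way S140 combines the three results is left to the caller (a weighted average is one option described later in this text).

```python
from typing import Any, Callable, Dict, Sequence

Features = Dict[str, Any]


def identify_apt(
    target: str,
    acquire: Callable[[str], Dict[str, Any]],
    extract: Callable[[Dict[str, Any]], Features],
    detectors: Sequence[Callable[[Features], bool]],
    combine: Callable[[Sequence[bool]], bool],
) -> bool:
    """Run steps S110 to S140 for one object to be detected."""
    info = acquire(target)                                # S110: network attack related information
    features = extract(info)                              # S120: APT attack features
    results = [detect(features) for detect in detectors]  # S130: feature, behavior and ML results
    return combine(results)                               # S140: target recognition result
```

For instance, passing three detectors (feature detection, behavior detection, machine learning) and a majority-vote `combine` yields a single APT / no-APT decision per object.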

In this method, APT attacks are identified through three security technologies (feature detection, behavior detection and machine learning), which is more accurate than prior-art identification based on a single security technology. In addition, the extracted APT attack features are used for multi-dimensional threat point detection, so APT attacks can be identified more comprehensively across multiple dimensions and the recognition result is more accurate.

The scheme of the present invention is further described below with reference to the following specific embodiment, in which the APT attack recognition method may include the following steps:

Step S110, obtaining network attack related information for an object to be detected.

The object to be detected is an object requiring network security analysis, for example an application program or a website. The network attack related information is network data related to the object to be detected, including the object's own network data and network data exchanged between other objects and the object to be detected.

After the network attack related information is acquired, the method further comprises: preprocessing the network attack related information to obtain preprocessed network attack related information, wherein the preprocessing comprises at least one of data cleaning, data format unification and data supplementing.

Data cleaning means removing or filtering data irrelevant to APT attacks from the network attack related information; data format unification means bringing each item of data in the network attack related information into a uniform format; and data supplementing means filling in missing data. Because some data in the network attack related information may be incomplete, supplementing it enriches the network attack related information.

The specific implementation process of the data cleaning is as follows:

Data cleaning and filtering addresses problems such as inconsistent data formats, data entry errors and incomplete data, and supports data conversion and processing. Common data conversion components include field mapping, data filtering, data cleaning, data replacement, data calculation, data validation, data merging and data splitting; the appropriate components can be selected flexibly according to actual requirements during processing;

Security event data (network attack related information) cleaning and filtering functions include, but are not limited to:

1. filtering the duplicate data;

2. filtering the noise data;

3. filtering incomplete or unreasonable data, for example: a time field out of range, a missing critical attribute value, an abnormal critical attribute value, etc.
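The three cleaning filters above can be sketched in a few lines. This is a sketch under stated assumptions: events are flat dictionaries with hashable values, and `CRITICAL_FIELDS` and `NOISE_TYPES` are hypothetical placeholders for whatever a real deployment treats as critical attributes and noise.

```python
from datetime import datetime
from typing import Any, Dict, Iterable, List

Event = Dict[str, Any]
CRITICAL_FIELDS = ("timestamp", "src_ip", "event_type")  # hypothetical critical attributes
NOISE_TYPES = frozenset({"heartbeat", "keepalive"})        # hypothetical noise event types


def clean_events(events: Iterable[Event], start: datetime, end: datetime) -> List[Event]:
    """Drop duplicate, noise and incomplete or unreasonable security events."""
    seen = set()
    kept: List[Event] = []
    for event in events:
        key = tuple(sorted(event.items()))  # assumes hashable field values
        if key in seen:                      # 1. duplicate data
            continue
        seen.add(key)
        if event.get("event_type") in NOISE_TYPES:  # 2. noise data
            continue
        if any(event.get(f) in (None, "") for f in CRITICAL_FIELDS):  # 3. missing critical value
            continue
        if not start <= event["timestamp"] <= end:  # 3. time field out of range
            continue
        kept.append(event)
    return kept
```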

Through data cleaning and filtering, duplicate data, noise data, incomplete or unreasonable data and other data irrelevant to APT attacks can be filtered out of the network attack related information.

The specific implementation process of the unified processing of the data format is as follows:

Heterogeneous raw data (network attack related information in different data formats) is formatted uniformly to meet the data format definitions of the storage layer. The original log should be retained for data that is normalized.

The principles of data normalization described above include, but are not limited to:

1. on the basis of ensuring basic expansion capability, according to standard library rules of each type of data, the standardization of relevant fields is realized;

2. for commonly used fields, consistency of field contents is guaranteed, inconsistency of descriptions of similar problems by different events is eliminated, and portability of rules depending on the fields is met.

3. Data that is not normalized should retain its original log, which can later be used to define a normalization rule for that specific data.

The requirements for the standardization of the data include, but are not limited to:

1. supporting formatting of the original content by means of regular expressions, string splitting and the like;

2. supporting special field mapping, such as type conversion and unifying the time field format, to eliminate inconsistencies in how different events describe similar problems;

3. supporting retention of data in unknown formats for subsequent custom development.
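The normalization requirements above (regular-expression parsing, field mapping, time-format unification, retention of unknown formats and of the original log) might look like this in practice. The log line format, `LINE_RE`, `FIELD_MAP` and `TIME_FORMATS` are hypothetical examples, not the format of any particular device.

```python
import re
from datetime import datetime
from typing import Any, Dict

# Hypothetical raw line, e.g. "2022/08/08 10:00:00 src=10.0.0.5 dst=10.0.0.9 action=deny"
LINE_RE = re.compile(
    r"(?P<time>\S+ \S+) src=(?P<src>\S+) dst=(?P<dst>\S+) action=(?P<action>\S+)"
)
FIELD_MAP = {"src": "src_ip", "dst": "dst_ip", "action": "action"}  # vendor -> standard names
TIME_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y/%m/%d %H:%M:%S")


def unify_time(text: str) -> str:
    """Convert any known time format to ISO 8601; leave unknown formats untouched."""
    for fmt in TIME_FORMATS:
        try:
            return datetime.strptime(text, fmt).isoformat()
        except ValueError:
            continue
    return text


def normalize(raw: str) -> Dict[str, Any]:
    """Map a raw log line onto standard fields, always keeping the original log."""
    record: Dict[str, Any] = {"raw_log": raw}
    match = LINE_RE.search(raw)
    if match is None:
        record["normalized"] = False  # unknown format: retained for later custom rules
        return record
    for vendor_field, std_field in FIELD_MAP.items():
        record[std_field] = match.group(vendor_field)
    record["time"] = unify_time(match.group("time"))
    record["normalized"] = True
    return record
```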

The data supplementing and filling process can also be called as data enrichment process, and the specific implementation process is as follows:

The collected data in the network attack related information may be interrelated; filling in the related data forms complete records and enriches the data, which facilitates later statistical analysis.

The objects of the data enrichment include, but are not limited to:

1. User information: the supplemented fields include, but are not limited to, user name, organizational unit of the user, user role and contact information.

2. Asset information: the supplemented fields include, but are not limited to, asset name, asset IP, business system to which the asset belongs, asset standard system, person responsible for the asset and asset status.

3. Threat intelligence: the supplemented fields include, but are not limited to, threat intelligence name, threat intelligence number, threat level and threat intelligence solution.
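Data enrichment can be sketched as lookups that copy user, asset and threat intelligence fields onto each event. The in-memory tables, their keys (user name, destination IP, source IP) and the field names are hypothetical; real deployments would query the corresponding stores.

```python
from typing import Any, Dict

Record = Dict[str, Any]

# Hypothetical lookup tables standing in for the user, asset and intelligence stores.
USERS = {"alice": {"org_unit": "Finance", "role": "analyst", "contact": "ext-1001"}}
ASSETS = {"10.0.0.9": {"asset_name": "mail-server", "business_system": "email",
                       "asset_owner": "bob", "asset_status": "online"}}
THREAT_INTEL = {"203.0.113.7": {"intel_name": "known C2 server", "intel_id": "TI-0001",
                                "threat_level": "high", "solution": "block at egress"}}


def enrich(event: Record) -> Record:
    """Return a copy of the event with user, asset and threat intelligence fields filled in."""
    enriched = dict(event)
    enriched.update(USERS.get(event.get("user"), {}))
    enriched.update(ASSETS.get(event.get("dst_ip"), {}))
    enriched.update(THREAT_INTEL.get(event.get("src_ip"), {}))
    return enriched
```

Returning a copy keeps the original event unchanged, which fits the requirement elsewhere in this text that original data be retained.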

In the scheme of the application, labels can be added to each piece of data in the relevant information of the network attack, and each piece of data carrying the labels is stored in a database.

In the scheme of the application, according to the needs of data analysis, storage of the network attack related information is divided into four types: relational data storage, distributed file storage (for tracking and tracing), distributed full-text retrieval (as a backup) and a distributed message bus, defined as follows:

1. Relational data storage: stores structured data with small volume and infrequent changes, such as basic data (e.g. asset data and user data), scenario analysis results and business data (e.g. vulnerability scan results and compliance results from the security evaluation and detection platform);

2. Distributed search storage: stores data for which full-text retrieval must be provided externally;

3. Distributed file storage: stores the collected raw data and the normalized data after ETL; it can be extended to support distributed file systems, NoSQL distributed databases and distributed relational databases;

4. Distributed message bus: provides a distributed message processing mechanism with high-throughput, high-concurrency publishing and subscribing, used for real-time data processing.

According to data structure type, the network security data store supports three kinds of data:

1. unstructured data: text files, pictures, audio, video and so on, in all formats;

2. structured data: data that can be represented by a two-dimensional relational table, with a defined schema;

3. semi-structured data: data between unstructured and structured data, such as HTML documents.

Based on these storage modes, each item of the network attack related information may be stored differently. Referring to the storage system shown in Fig. 2, the network attack related information (such as the traffic data, log data and behavior data in Fig. 2) and other network security data (such as the intelligence data, asset data and other data in Fig. 2) may be stored in different forms (the unstructured, semi-structured and structured data in Fig. 2), and the different forms may correspond to different databases, including but not limited to Hive, HBase, HDFS, ES (Elasticsearch), NoSQL and MySQL databases.

HDFS is used for unstructured storage, Elasticsearch for index storage and Hive as the data warehouse. HDFS provides the underlying distributed file system and directly serves Hive, whose data is actually stored in HDFS. Hive provides structured data storage and supports SQL for basic operations such as data query and analysis, so all structured data can be stored in Hive. Elasticsearch provides search queries over text data, mainly log data and system data, and can directly store data that needs to be searched manually.

Appropriate storage is selected according to the inflow rate and retention period of the traffic logs. Covering all network traffic, and allowing for later expansion of the platform's monitoring scope, the formatted data restored from all network traffic is stored in Hive; meanwhile, to allow fast searching of alarm logs, the results of streaming and offline computation can be sent to the Elasticsearch component for storage.
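The storage choices described above can be summarized as a small routing rule. This is a sketch only: the mapping for semi-structured data is an assumption (the text does not name a store for it), and `route` is a hypothetical helper that returns store names rather than writing anything.

```python
from typing import List

# Data structure type -> primary store, per the HDFS/Hive roles described above.
STORE_BY_STRUCTURE = {
    "structured": "Hive",       # Hive tables; the files themselves live in HDFS
    "semi-structured": "HDFS",  # assumption: the text does not name a store for this type
    "unstructured": "HDFS",
}


def route(structure: str, needs_search: bool = False) -> List[str]:
    """Return the stores a record should be written to."""
    if structure not in STORE_BY_STRUCTURE:
        raise ValueError(f"unknown data structure type: {structure}")
    targets = [STORE_BY_STRUCTURE[structure]]
    if needs_search:  # e.g. alarm logs and streaming/offline results that must be searchable
        targets.append("Elasticsearch")
    return targets
```

Formatted traffic data would then go to `route("structured")`, while alarm results that analysts search manually would use `route("structured", needs_search=True)`.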

Step S120, extracting APT attack features from the network attack related information, wherein the APT attack features comprise alarm information, network behavior information, operating system information, account information, network monitoring features and protocol analysis information.

The network attack related information comprises firewall logs, IDS logs, WAF logs, network audit logs, botnet/Trojan/worm logs, server logs, 4A audit logs, traffic logs and EDR information; extracting the APT attack features from the network attack related information includes:

extracting alarm information from the firewall logs, IDS logs, WAF logs and botnet/Trojan/worm logs, i.e. the network attack information recorded in these logs;

extracting network behavior information from the network audit logs;

extracting operating system information, such as Windows event logs and Linux log information, from the server logs;

extracting account information from the 4A audit logs, wherein the account information includes but is not limited to primary account change information, secondary account change information, authorization information and operation log information;

extracting network monitoring features and protocol analysis information from the traffic logs, wherein the network monitoring features include but are not limited to source IP (source address), source port, destination IP (destination address), destination port and connection time, and the protocol analysis information includes but is not limited to HTTP, DNS, mail, RDP, SMB, FTP, SSH, NTLM and FILE;

and extracting website-related information from the EDR information, wherein the APT attack features comprise the alarm information, network behavior information, operating system information, account information, network monitoring features, protocol analysis information and website-related information, and the website-related information includes feature information such as website protection information, login protection information, abnormal file information, performance monitoring information and system protection information.
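The source-to-feature mapping above can be sketched as a lookup table plus a grouping pass. The source type keys, the field names in `MONITOR_FIELDS` and the single `protocol` field are hypothetical; only the pairing of log sources with feature categories follows the text.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Tuple

Record = Dict[str, Any]

# Source log type -> APT attack feature categories, following the list in the text.
SOURCE_TO_FEATURES = {
    "firewall": ("alarm_information",),
    "ids": ("alarm_information",),
    "waf": ("alarm_information",),
    "botnet_trojan_worm": ("alarm_information",),
    "network_audit": ("network_behavior_information",),
    "server": ("operating_system_information",),
    "4a_audit": ("account_information",),
    "traffic": ("network_monitoring_features", "protocol_analysis_information"),
    "edr": ("website_related_information",),
}
MONITOR_FIELDS = ("src_ip", "src_port", "dst_ip", "dst_port", "conn_time")  # hypothetical names


def extract_features(logs: Iterable[Tuple[str, Record]]) -> Dict[str, List[Record]]:
    """Group (source_type, record) pairs under the APT attack feature categories."""
    features: Dict[str, List[Record]] = defaultdict(list)
    for source, record in logs:
        for category in SOURCE_TO_FEATURES.get(source, ()):
            if category == "network_monitoring_features":
                view = {k: record.get(k) for k in MONITOR_FIELDS}
            elif category == "protocol_analysis_information":
                view = {"protocol": record.get("protocol")}
            else:
                view = record
            features[category].append(view)
    return dict(features)
```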

APT attacks are persistent, complex network attacks with an explicit target. The network monitoring features are features obtained by monitoring the network and can be extracted from the logs collected that way, including but not limited to traffic logs and botnet/Trojan/worm logs.

Threat points in different dimensions can be detected by different algorithms within feature detection, behavior detection and machine learning; that is, each of the three methods can separately produce threat points in multiple dimensions. For example, if threat points in 7 dimensions need to be detected, all 7 can be detected by feature detection, all 7 by behavior detection and all 7 by machine learning.
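The "every method covers every dimension" arrangement can be sketched as a 3 by 7 result matrix. The dimension keys follow the seven threat feature types listed earlier; the detector dictionary, and the rule that a method's result is positive when any dimension is hit, are assumptions for illustration.

```python
from typing import Any, Callable, Dict, List, Tuple

DIMENSIONS = ("access", "intrusion", "communication", "lateral_penetration",
              "data_security", "trace_cleaning", "user_behavior")
METHODS = ("feature_detection", "behavior_detection", "machine_learning")

Detector = Callable[[Any], bool]


def detection_matrix(detectors: Dict[Tuple[str, str], Detector],
                     features: Any) -> Dict[str, Dict[str, bool]]:
    """Run every (method, dimension) detector; a missing pair counts as 'not hit'."""
    matrix: Dict[str, Dict[str, bool]] = {}
    for method in METHODS:
        row: Dict[str, bool] = {}
        for dimension in DIMENSIONS:
            detector = detectors.get((method, dimension))
            row[dimension] = bool(detector(features)) if detector else False
        matrix[method] = row
    return matrix


def method_results(matrix: Dict[str, Dict[str, bool]]) -> List[bool]:
    """Assumed rule: one result per method, True if any of its dimensions was hit."""
    return [any(matrix[m].values()) for m in METHODS]
```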

Among the above APT attack features, one APT attack feature may contribute to threat points in different dimensions; conversely, a threat point in one dimension may be represented by several different APT attack features.

The first APT attack recognition result corresponds to feature detection, the second APT attack recognition result corresponds to behavior detection, and the third APT attack recognition result corresponds to machine learning.

Optionally, determining the target APT attack recognition result of the object to be detected according to the first, second and third APT attack recognition results includes: applying weighted averaging or similar processing to the three results, using a preset first weight for the first APT attack recognition result, a preset second weight for the second and a preset third weight for the third, to obtain the target APT attack recognition result. The first, second and third weights may be determined based on the importance of each algorithm.
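The weighted-average option can be written out directly. This sketch assumes each recognition result is a score in [0, 1] (1 meaning an APT attack) and that a 0.5 threshold turns the fused score into the target result; the weights in the usage test are made-up values, not ones given in the text.

```python
from typing import Sequence, Tuple


def fuse_results(results: Sequence[float], weights: Sequence[float],
                 threshold: float = 0.5) -> Tuple[float, bool]:
    """Weighted average of the first, second and third recognition results.

    results: scores in [0, 1] from feature detection, behavior detection and
    machine learning, in that order; weights: the preset first, second and third weights.
    """
    if len(results) != len(weights) or sum(weights) <= 0:
        raise ValueError("need one weight per result and a positive weight sum")
    score = sum(w * r for w, r in zip(weights, results)) / sum(weights)
    return score, score >= threshold
```

Dividing by the weight sum means the weights need not add up to 1, so they can be set directly from each algorithm's assessed importance.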

Optionally, performing multidimensional threat point detection according to the APT attack features through feature detection, behavior detection and machine learning respectively, to obtain the first APT attack recognition result, the second APT attack recognition result and the third APT attack recognition result of the object to be detected, includes:

The threat features of each dimension correspond to a different stage of the attack chain:

Access threat features correspond to the reconnaissance (probing) stage of the attack chain and can be determined from the APT attack features associated with the detection points for weak password brute forcing of network devices, IP scanning and port scanning.

Intrusion threat features correspond to the intrusion stage of the attack chain and can be determined from the APT attack features associated with the detection points for mailbox brute force cracking, mailbox phishing, abnormal mailbox login and remote brute force cracking of hosts; mailbox-related information can be obtained from the account information.

Communication threat features correspond to the command-and-control stage of the attack chain and can be determined from the APT attack features associated with the covert channel communication and DGA domain name detection points.

Lateral movement threat features correspond to the lateral movement (lateral penetration) stage of the attack chain and can be determined from the APT attack features associated with the detection points for vulnerability exploitation, system command privilege escalation, host password hash extraction, host plaintext password extraction, domain controller security, malicious remote operation, malicious PowerShell use, host system auto-start security, host file permission modification and reverse (rebound) proxy communication security.

Data security threat features correspond to the data leakage stage of the attack chain and can be determined from the APT attack features associated with the abnormal service interconnection, abnormal communication protocol and plaintext sensitive transmission detection points.

Trace cleaning threat features correspond to the trace cleaning stage of the attack chain and can be determined from the APT attack features associated with the trace cleaning detection point.

User behavior threat features correspond to the user and entity behavior analysis stage of the attack chain and can be determined from the APT attack features associated with the detection points for externally exposed operation and maintenance ports, security bypass, host account security and host operation security.

The detection points in the foregoing description refer to the different events involved in an attack chain, such as IP scanning, port scanning, system command privilege escalation and mailbox phishing.

As an example, Table 1 lists the multidimensional threat points and the detection points corresponding to them, where a detection point is an attack event or attack behavior at which attack traffic is captured during the attack.

TABLE 1

Here, weak password brute forcing of network devices refers to attacks on devices or services protected by weak passwords; IP scanning refers to exhaustively probing a specific IP range to discover live hosts in the network, providing basic information for further probing; port scanning refers to exhaustively scanning specific IP ranges and port ranges to discover open ports in the network, likewise providing basic information for further probing; mailbox brute force cracking refers to cracking a mailbox password by trying candidate passwords one by one until the real password is found; mailbox phishing refers to using a disguised email to trick the recipient into replying with information such as an account number and password to a designated address, or to lure the recipient to a special web page, usually disguised as a genuine site such as a bank or financial management page, so that the victim believes it is genuine and enters a credit card or bank card number, account name, password and the like, which are then stolen.
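The scanning and brute-force detection points defined above are often approximated with simple per-source counters. The sketch below assumes a hypothetical flow record carrying a `login_failed` flag, and the three thresholds are hypothetical; a production detector would also restrict the counts to a time window, which is omitted here for brevity.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Set, Tuple


@dataclass(frozen=True)
class FlowRecord:
    src_ip: str
    dst_ip: str
    dst_port: int
    login_failed: bool = False  # hypothetical flag for a failed authentication event


def detect_probing(records: Iterable[FlowRecord], ip_threshold: int = 50,
                   port_threshold: int = 100, fail_threshold: int = 20) -> Dict[str, List[str]]:
    """Flag sources that look like IP scanning, port scanning or brute forcing."""
    dst_ips: Dict[str, Set[str]] = defaultdict(set)            # src -> distinct hosts contacted
    ports: Dict[Tuple[str, str], Set[int]] = defaultdict(set)  # (src, dst) -> distinct ports
    failures: Dict[Tuple[str, str], int] = defaultdict(int)    # (src, dst) -> failed logins
    for r in records:
        dst_ips[r.src_ip].add(r.dst_ip)
        ports[(r.src_ip, r.dst_ip)].add(r.dst_port)
        if r.login_failed:
            failures[(r.src_ip, r.dst_ip)] += 1

    findings: Dict[str, List[str]] = defaultdict(list)
    for src, hosts in dst_ips.items():
        if len(hosts) >= ip_threshold:          # many hosts probed: IP scan
            findings[src].append("ip_scan")
    for (src, _dst), seen in ports.items():
        if len(seen) >= port_threshold and "port_scan" not in findings[src]:
            findings[src].append("port_scan")   # many ports on one host: port scan
    for (src, _dst), count in failures.items():
        if count >= fail_threshold and "brute_force" not in findings[src]:
            findings[src].append("brute_force")  # repeated failed logins: brute force
    # Drop any source that ended up with no finding.
    return {src: kinds for src, kinds in findings.items() if kinds}
```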

Abnormal mailbox login refers to logins from an uncommon network location after the mailbox account has been stolen; remote brute force cracking of a host refers to an attacker using their own username and password dictionaries to try logging in to the remote host one combination at a time, where a sufficiently large dictionary makes a successful guess likely, and a successful attempt prepares the attacker for the next stage of penetration; covert channel communication refers to a channel that was not designed for transmitting information but allows processes to transfer information in a way that violates the system security policy; DGA domain names refer to domain names generated by a random algorithm; vulnerability exploitation refers to exploiting a vulnerability to elevate system usage rights; system command privilege escalation refers to using a vulnerability to raise the privileges for using the system; host password hash extraction refers to obtaining the historical password hashes stored on a host; host plaintext password extraction refers to obtaining plaintext passwords stored on the host;

domain controller security refers to the following: in a domain environment, the domain controller's database holds the account numbers, passwords and other information of all users in the domain, so an attacker who obtains high privileges on the domain controller can control the domain, and because trust relationships exist between domains, the attacker can then launch cross-domain attacks and compromise the entire intranet, with serious consequences; malicious remote operation refers to a program receiving commands from a remote control end and performing the corresponding operations without the user's knowledge or authorization; malicious PowerShell use refers to an attacker using the PowerShell tool already present on the target host to create malicious code that attacks that host; host system auto-start security refers to damage to the auto-start services of the target host system; host file permission modification refers to modifying the permissions of files on the host; reverse (rebound) proxy communication security refers to the attacker further scanning and penetrating the intranet while a proxy server forwards data between the external network and the intranet; abnormal service interconnection refers to anomalies in the interconnection between associated services;

abnormal communication protocol refers to an abnormal network communication protocol; plaintext sensitive transmission refers to transmitting sensitive or important data in plaintext when a program communicates; trace cleaning refers to an attacker hiding by clearing the traces produced during the intrusion so as to avoid discovery of the intrusion; externally exposed operation and maintenance ports refers to opening the connection ports of operation and maintenance endpoints to the outside; security bypass refers to the act of bypassing security protections; host account security refers to a series of security measures applied to the accounts of a host; host operation security refers to a service for improving the overall security of the host: through host management, risk prevention, intrusion detection, advanced defense, security operations and web page tamper-proofing functions, it comprehensively identifies and manages the information assets in the host, monitors host risks in real time, blocks illegal intrusions and helps enterprises build a server security system.

Optionally, the target APT attack identification result indicates either that an APT attack exists or that no APT attack exists, and the method further includes:

Here, the attack sample name refers to the name of the Trojan or worm used by the attacker; the alarm time refers to the time at which the alarm is raised when the attack occurs; the first risk level refers to the severity of the attack event and can be high, medium or low; the behavior parameters refer to the attack actions carried out by the attack; the response mode refers to the response action taken when the attack occurs; and the alarm type refers to the threat type within the APT attack process.
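A minimal sketch of the alarm result as a data structure follows, assuming the alarm is built only when the target APT attack recognition result is positive and must carry at least one of the listed fields. The field names, the `RiskLevel` enum and the `build_alarm` helper are hypothetical, and the ISO-8601 time format is an assumption.

```python
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Any, Dict, Optional


class RiskLevel(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass
class AlarmResult:
    source_ip: Optional[str] = None
    destination_ip: Optional[str] = None
    attack_name: Optional[str] = None
    attack_sample_name: Optional[str] = None   # e.g. the Trojan or worm name
    alarm_time: Optional[str] = None           # ISO-8601 text; the format is an assumption
    first_risk_level: Optional[RiskLevel] = None
    behavior_parameter: Optional[str] = None
    response_mode: Optional[str] = None
    alarm_type: Optional[str] = None

    def populated_fields(self) -> Dict[str, Any]:
        """Return only the fields that were actually filled in."""
        return {k: v for k, v in asdict(self).items() if v is not None}


def build_alarm(apt_detected: bool, **fields: Any) -> Optional[AlarmResult]:
    """Return an alarm only for a positive APT result; require at least one field."""
    if not apt_detected:
        return None
    alarm = AlarmResult(**fields)
    if not alarm.populated_fields():
        raise ValueError("an alarm result must contain at least one field")
    return alarm
```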

Optionally, the method further comprises:

If the source IP or the destination IP hits threat features in multiple dimensions, the risk is higher than the first risk level indicates, so the higher second risk level is used to alert the user and allow faster, better handling. Hitting threat features in multiple dimensions means, for example, that the threat features associated with the same source or destination IP include both a weak password threat feature and a host file permission modification threat feature.
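The association analysis can be sketched as grouping threat hits by IP address and escalating any IP that appears in two or more dimensions. The tuple layout of a hit, the two-dimension cut-off and the integer risk scale (second level = first level + 1) are assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# A hit is (source_ip, destination_ip, threat_dimension); this layout is hypothetical.
Hit = Tuple[str, str, str]


def correlate_by_ip(hits: Iterable[Hit], min_dimensions: int = 2) -> Dict[str, Set[str]]:
    """Return the IPs whose threat hits span at least `min_dimensions` dimensions."""
    dims_by_ip: Dict[str, Set[str]] = defaultdict(set)
    for src, dst, dimension in hits:
        dims_by_ip[src].add(dimension)
        dims_by_ip[dst].add(dimension)
    return {ip: dims for ip, dims in dims_by_ip.items() if len(dims) >= min_dimensions}


def risk_level_for(ip: str, first_level: int, correlated: Dict[str, Set[str]]) -> int:
    """Second risk level (first level + 1) when the IP hits multiple dimensions."""
    return first_level + 1 if ip in correlated else first_level
```

In the example from the text, a weak password hit (access dimension) and a host file permission modification hit (lateral movement dimension) on the same host would put that host in the correlated set and raise its risk level.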

Optionally, the multi-dimensional threat features are determined by feature detection. Taking the traffic log in the network attack related information as an example, content indicating maliciousness and content whose field size exceeds a set size can be detected in the HTTP, DNS, mail, RDP, SMB, FTP, SSH, NTLM and FILE protocol fields of the traffic log, and the detected content is used as threat features. Such malicious content and oversized fields are concrete manifestations of the threat points in the seven dimensions.
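Feature detection over protocol fields can be sketched as signature matching plus a field-length check. The record layout (a flat dict of field name to string), the three example signatures and the 1024-character limit are all hypothetical; the source does not specify its rule set or the value of the "set size".

```python
import re
from typing import Dict, List

# Hypothetical example signatures; a real deployment would load a maintained rule set.
SIGNATURES = [
    re.compile(r"union\s+select", re.IGNORECASE),              # SQL injection probe
    re.compile(r"\.\./\.\./"),                                  # path traversal
    re.compile(r"powershell(\.exe)?\s+-enc", re.IGNORECASE),    # encoded PowerShell command
]
MAX_FIELD_LEN = 1024  # hypothetical "set size" for a single protocol field


def inspect_fields(record: Dict[str, str], max_len: int = MAX_FIELD_LEN) -> List[str]:
    """Return findings such as 'uri:signature:<pattern>' or 'user_agent:oversized'."""
    findings: List[str] = []
    for name, value in record.items():
        if len(value) > max_len:
            findings.append(f"{name}:oversized")
        for sig in SIGNATURES:
            if sig.search(value):
                findings.append(f"{name}:signature:{sig.pattern}")
    return findings
```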

Optionally, the multi-dimensional threat features are determined by behavior detection. For example, a statistical method counts how frequently, and how many times in total, the source and destination IPs connect directly to each other, and analyzing these counts yields information reflecting threat features; an interconnection method analyzes the characteristics of the interconnection relationships using the source IP and destination IP as dimensions, discovering attack behaviors such as reverse (rebound) connections, unconventional abnormal ports and covert malicious communication, which are used as threat features. The direct-interconnection frequency and total connection count, as well as reverse connections, unconventional abnormal ports, covert malicious communication and similar behaviors, are likewise concrete manifestations of the threat points in the seven dimensions.
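Both behavior-detection ideas above can be sketched in one pass over connection tuples: count connections per source/destination pair, and flag connections on ports outside a conventional set. The port set, the frequency threshold and the rule that an internal host connecting outward on an unusual port is a "possible reverse connection" are hypothetical heuristics; internal versus external is decided with the standard-library `ipaddress` module.

```python
import ipaddress
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

# Hypothetical list of "conventional" service ports; anything else counts as unusual.
COMMON_PORTS = {22, 25, 53, 80, 110, 139, 143, 443, 445, 3389}

Conn = Tuple[str, str, int]  # (src_ip, dst_ip, dst_port)


def is_internal(ip: str) -> bool:
    return ipaddress.ip_address(ip).is_private


def interconnection_stats(conns: Iterable[Conn], freq_threshold: int = 100) -> Dict[str, object]:
    pair_counts: Counter = Counter()
    unusual: Set[Conn] = set()
    for src, dst, port in conns:
        pair_counts[(src, dst)] += 1            # statistical method: per-pair frequency
        if port not in COMMON_PORTS:
            unusual.add((src, dst, port))       # interconnection method: unusual ports
    # An internal host initiating a connection to an external host on an unusual port
    # is treated here as a possible reverse (rebound) connection.
    reverse = {c for c in unusual if is_internal(c[0]) and not is_internal(c[1])}
    return {
        "total_connections": sum(pair_counts.values()),
        "frequent_pairs": {p: n for p, n in pair_counts.items() if n >= freq_threshold},
        "unusual_ports": unusual,
        "possible_reverse": reverse,
    }
```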

Optionally, the network monitoring features include a first domain name corresponding to the object to be detected; the communication threat features include an abnormal domain name detection result, which is determined as follows:

The scheme of the invention uses the abnormal (DGA) domain name detection result as a detailed example. Attackers often use a domain name to connect a malicious program to a C&C server and thereby control the victim machine. Such domain names are usually encoded in the malicious program, which gives the attacker great flexibility, since the domain name and the IP address behind it can easily be changed. A fixed, hard-coded domain name, however, is seldom used because it is highly vulnerable to blacklist detection.

A DGA (domain generation algorithm) addresses this: the attacker uses it to generate pseudo-random strings as domain names, which effectively evades blacklist detection. Pseudo-random means that the string sequence appears random but can be regenerated and reproduced exactly, because its structure is determined in advance. Such algorithms are commonly used in malware and remote control software.
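To make "pseudo-random but reproducible" concrete, the toy generator below derives domain labels from a SHA-256 hash of a seed, the date and a counter. It is a hypothetical illustration of the general DGA idea, not the algorithm of any real malware family: the same seed and date always yield the same domains, while a new date yields a fresh set, which is why static blacklists struggle.

```python
import hashlib
from datetime import date
from typing import List

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def toy_dga(seed: str, day: date, count: int = 5, length: int = 12,
            tld: str = ".com") -> List[str]:
    """Deterministically generate `count` pseudo-random domain names for `day`."""
    if not 1 <= length <= 32:  # one SHA-256 digest supplies at most 32 bytes
        raise ValueError("length must be between 1 and 32")
    domains = []
    for i in range(count):
        digest = hashlib.sha256(f"{seed}|{day.isoformat()}|{i}".encode()).digest()
        # Map each digest byte to a letter; identical inputs give identical labels.
        label = "".join(ALPHABET[b % len(ALPHABET)] for b in digest[:length])
        domains.append(label + tld)
    return domains
```

A defender who recovers the seed and algorithm can pre-compute the same list and block it in advance, which is the main reason detection falls back on character-level statistics such as the N-Gram model that follows.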

In the scheme of the application, training the natural language processing model comprises training set extraction, feature engineering and model training, as follows:

1) Training set extraction

Use the top 1,000,000 website domain names in the Alexa global ranking as white samples;

use the first 100,000 domain names from OSINT (open-source intelligence) data as black samples;

use the white samples and black samples together as training samples (sample data are shown in FIG. 4).

2) Feature extraction

For each domain name in the training samples, the domain name is treated as a character string, and N-Gram modeling (a natural language processing technique) is used to split it into N-character segments. Taking 2-Gram as an example, the modeling flow for www.google.com is shown in FIG. 3. With the "www." prefix and the dots removed, 2-Gram processing of the domain name yields the segments 'go', 'oo', 'og', 'gl', 'le', 'ec', 'co', 'om'; vectorizing these segments gives the vocabulary 'go', 'oo', 'og', 'gl', 'le', 'ec', 'co', 'om'.

3) Model training

A multi-layer perceptron algorithm is used for training, with an N-Gram model for feature extraction. The complete flow is shown in FIG. 4, and the specific process is as follows:

1) Extract 2-grams from every sample in the data set (the training samples).

2) The training samples are randomly divided into training and testing sets.

3) Train on the training set with the multi-layer perceptron algorithm to obtain the model, i.e., the natural language processing model.

4) Predictions are made on the test set using a natural language processing model.

5) Verify the prediction performance of the multi-layer perceptron algorithm.
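The 2-Gram feature extraction described in step 2) above can be sketched as follows. The normalization (lower-casing, dropping a leading "www." and removing dots) is an assumption inferred from the google.com example, and the count-based vectorization is one common choice rather than the patent's confirmed encoding.

```python
from typing import Dict, Iterable, List


def normalize(domain: str) -> str:
    """Lower-case, drop a leading 'www.' and remove dots (assumed normalization)."""
    d = domain.lower().strip()
    if d.startswith("www."):
        d = d[4:]
    return d.replace(".", "")


def ngrams(domain: str, n: int = 2) -> List[str]:
    """Split a normalized domain into overlapping n-character segments."""
    s = normalize(domain)
    return [s[i:i + n] for i in range(len(s) - n + 1)]


def build_vocabulary(domains: Iterable[str], n: int = 2) -> Dict[str, int]:
    """Assign each distinct n-gram an index in order of first appearance."""
    vocab: Dict[str, int] = {}
    for d in domains:
        for g in ngrams(d, n):
            if g not in vocab:
                vocab[g] = len(vocab)
    return vocab


def vectorize(domain: str, vocab: Dict[str, int], n: int = 2) -> List[int]:
    """Count vector over the vocabulary; n-grams outside the vocabulary are ignored."""
    vec = [0] * len(vocab)
    for g in ngrams(domain, n):
        idx = vocab.get(g)
        if idx is not None:
            vec[idx] += 1
    return vec
```

With this normalization, `ngrams("www.google.com")` reproduces the eight segments listed in the text.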

The model output can be reviewed manually, and when the alarm volume is high or the false alarm rate is high, the detection parameters of the model need to be corrected.

It should be noted that models for the other threat features can be trained in the same manner, and the threat features corresponding to different APT attack features can be determined by the trained models; the training process is identical, and only the training samples change.
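The training flow in steps 1) to 5) can be sketched end to end. The source gives neither the network architecture nor the hyperparameters, so the sketch uses a hypothetical one-hidden-layer perceptron written in plain Python so that it runs without third-party packages; in practice a library implementation such as scikit-learn's `MLPClassifier` would more likely be used. Layer size, learning rate, epoch count, the 25% test split and the label convention (1 for black samples, 0 for white samples) are all assumptions.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def bigrams(domain: str) -> List[str]:
    s = domain.lower().replace(".", "")
    return [s[i:i + 2] for i in range(len(s) - 1)]


def featurize(domains: Sequence[str], vocab: Dict[str, int]) -> List[List[float]]:
    """Binary bag-of-bigrams vectors; bigrams missing from the vocabulary are ignored."""
    rows = []
    for d in domains:
        row = [0.0] * len(vocab)
        for g in bigrams(d):
            if g in vocab:
                row[vocab[g]] = 1.0
        rows.append(row)
    return rows


class TinyMLP:
    """One tanh hidden layer and a sigmoid output, trained by SGD on log loss."""

    def __init__(self, n_in: int, n_hidden: int = 8, lr: float = 0.1, seed: int = 0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0
        self.lr = lr

    def _forward(self, x: List[float]) -> Tuple[List[float], float]:
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        z = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2
        z = max(-30.0, min(30.0, z))  # keep math.exp in a safe range
        return h, 1.0 / (1.0 + math.exp(-z))

    def predict_proba(self, x: List[float]) -> float:
        return self._forward(x)[1]

    def fit(self, X: List[List[float]], y: List[int], epochs: int = 200) -> "TinyMLP":
        for _ in range(epochs):
            for x, t in zip(X, y):
                h, p = self._forward(x)
                dz = p - t  # gradient of log loss w.r.t. the output logit
                dh = [dz * w * (1.0 - hj * hj) for w, hj in zip(self.w2, h)]
                for j, hj in enumerate(h):
                    self.w2[j] -= self.lr * dz * hj
                self.b2 -= self.lr * dz
                for j, row in enumerate(self.w1):
                    for i, xi in enumerate(x):
                        row[i] -= self.lr * dh[j] * xi
                    self.b1[j] -= self.lr * dh[j]
        return self


def log_loss(model: TinyMLP, X: List[List[float]], y: List[int]) -> float:
    eps = 1e-12
    return -sum(t * math.log(model.predict_proba(x) + eps)
                + (1 - t) * math.log(1.0 - model.predict_proba(x) + eps)
                for x, t in zip(X, y)) / len(X)


def run_pipeline(samples: List[Tuple[str, int]], test_ratio: float = 0.25,
                 seed: int = 0) -> Tuple[TinyMLP, float]:
    """Extract 2-grams, split randomly, train, predict on the test set, report accuracy."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * (1 - test_ratio))
    train, test = items[:cut], items[cut:]
    vocab: Dict[str, int] = {}
    for d, _ in train:
        for g in bigrams(d):
            vocab.setdefault(g, len(vocab))
    model = TinyMLP(len(vocab), seed=seed).fit(featurize([d for d, _ in train], vocab),
                                                [t for _, t in train])
    X_test = featurize([d for d, _ in test], vocab)
    hits = sum((model.predict_proba(x) >= 0.5) == bool(t) for x, (_, t) in zip(X_test, test))
    return model, hits / len(test) if test else float("nan")
```

The vocabulary is built from the training split only, so test-set bigrams that never appeared in training are ignored rather than leaking into the features.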

Optionally, the method further comprises:

Visually displaying the target APT attack identification result and the alarm result.

For a better description and understanding of the principles of the method provided by the present invention, the following description of the present invention is provided in connection with an alternative embodiment. It should be noted that, the specific implementation manner of each step in this specific embodiment should not be construed as limiting the solution of the present invention, and other implementation manners that can be considered by those skilled in the art based on the principle of the solution provided by the present invention should also be considered as being within the protection scope of the present invention.

Referring to fig. 5, a flow chart of an APT attack recognition method includes the following steps:

Step 1, data collection: acquire the network attack related information for the object to be detected, as described above;

Step 2, feature extraction: extract APT attack features from the network attack related information, where the APT attack features include alarm information, network behavior information, operating system information, account information, network monitoring features and protocol analysis information;

Step 3, data analysis: according to the APT attack features, perform multidimensional threat point detection through feature detection, behavior detection and machine learning respectively, to obtain the first, second and third APT attack recognition results of the object to be detected; determine the target APT attack recognition result from these three results; and, if the target APT attack recognition result is that an APT attack exists, determine the alarm result of the object to be detected, where the alarm result includes at least one of a source IP, a destination IP, an attack name, an attack sample name, an alarm time, a first risk level, a behavior parameter, a response mode and an alarm type.

Step 4, data association: corresponding to the association analysis described above, perform association analysis on the multi-dimensional threat features according to the source IP or the destination IP; when the source IP or the destination IP hits the multi-dimensional threat features, output a second risk level that is higher than the first risk level.

Step 5, result output: output the alarm result.

Step 6, review and model update: manually review the alarm result and update the natural language processing model.

According to the scheme of the invention, feature detection, behavior detection and machine learning are each applied to the detection points they suit, according to the characteristics of each detection point, which can effectively improve detection efficiency and reduce false alarms.

Based on the same principle as the method shown in fig. 1, the embodiment of the present invention further provides an APT attack recognition device 20, as shown in fig. 6, the APT attack recognition device 20 may include a data acquisition module 210, an APT attack feature extraction module 220, a recognition module 230, and a recognition result determination module 240, where:

a data acquisition module 210, configured to acquire network attack related information for an object to be detected;

the APT attack feature extraction module 220 is configured to extract APT attack features from the network attack related information, where the APT attack features include alarm information, network behavior information, operating system information, account information, network monitoring features, and protocol analysis information;

The identifying module 230 is configured to perform multidimensional threat point detection according to APT attack features through feature detection, behavior detection and machine learning, to obtain a first APT attack identification result, a second APT attack identification result and a third APT attack identification result of an object to be detected;

the recognition result determining module 240 is configured to determine a target APT attack recognition result of the object to be detected according to the first APT attack recognition result, the second APT attack recognition result, and the third APT attack recognition result.

Optionally, the network attack related information includes a firewall log, an IDS log, a WAF log, a network audit log, a botnet/Trojan/worm log, a server log, a 4A audit log, a traffic log and EDR information;

the APT attack feature extraction module 220 is specifically configured to, when extracting APT attack features from the network attack related information:

extracting network behavior information from the network audit log;

extracting operating system information from the server log;

extracting account information from the 4A audit log;

Optionally, when the identifying module 230 performs multi-dimensional threat point detection according to APT attack features through feature detection, behavior detection and machine learning, to obtain a first APT attack identification result, a second APT attack identification result and a third APT attack identification result of an object to be detected, the identifying module is specifically configured to:

Optionally, the network monitoring feature includes a first domain name corresponding to the object to be detected; the communication threat characteristic comprises an abnormal domain name detection result which is determined by the following steps:

Optionally, the target APT attack identification result is that there is an APT attack or no APT attack, and the apparatus further includes:

the alarm module is used for determining an alarm result of the object to be detected according to the target APT attack identification result when the target APT attack identification result is that the APT attack exists, wherein the alarm result comprises at least one of a source IP, a destination IP, an attack name, an attack sample name, an alarm time, a first risk level, a behavior parameter, a response mode and an alarm type.

Optionally, the apparatus further comprises:

the association module is used for carrying out association analysis on the threat characteristics of the multiple dimensions according to the source IP or the target IP; when the source IP or the target IP hits the multi-dimensional threat feature, outputting a second risk level, wherein the second risk level is higher than the first risk level.

Optionally, before extracting the APT attack feature from the network attack related information, the apparatus further includes:

the preprocessing module is used for preprocessing the network attack related information to obtain preprocessed network attack related information, and the preprocessing comprises at least one of data cleaning, data format unified processing and data supplementing processing.
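The three kinds of preprocessing can be sketched as a single pass over raw log records. The alias table, the required fields and the default-filling rule are hypothetical; the source names the operations (data cleaning, data format unification, data supplementation) but not their exact rules.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical aliases used by different log sources, mapped onto one unified schema.
FIELD_ALIASES = {
    "src": "src_ip", "sip": "src_ip", "source_ip": "src_ip",
    "dst": "dst_ip", "dip": "dst_ip", "destination_ip": "dst_ip",
    "ts": "timestamp", "time": "timestamp",
}
REQUIRED_FIELDS = ("src_ip", "dst_ip")


def preprocess(records: Iterable[Dict[str, object]], source: str,
               defaults: Optional[Dict[str, object]] = None) -> List[Dict[str, object]]:
    cleaned: List[Dict[str, object]] = []
    for raw in records:
        # Format unification: lower-case keys, map aliases, trim string values.
        rec: Dict[str, object] = {}
        for key, value in raw.items():
            name = FIELD_ALIASES.get(key.lower(), key.lower())
            rec[name] = value.strip() if isinstance(value, str) else value
        # Data cleaning: drop records that lack a required field.
        if any(not rec.get(f) for f in REQUIRED_FIELDS):
            continue
        # Data supplementation: fill absent fields from defaults and tag the log source.
        for name, value in (defaults or {}).items():
            rec.setdefault(name, value)
        rec["log_source"] = source
        cleaned.append(rec)
    return cleaned
```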

The APT attack recognition device according to the embodiment of the present invention may execute the APT attack recognition method according to the embodiment of the present invention, and its implementation principle is similar, and actions executed by each module and unit in the APT attack recognition device according to each embodiment of the present invention correspond to steps in the APT attack recognition method according to each embodiment of the present invention, and detailed functional descriptions of each module of the APT attack recognition device may be referred to the descriptions in the corresponding APT attack recognition method shown in the foregoing, which are not repeated herein.

Wherein, the APT attack recognition device may be a computer program (including program code) running in a computer device, for example, the APT attack recognition device is an application software; the device can be used for executing corresponding steps in the method provided by the embodiment of the invention.

In some embodiments, the APT attack recognition device provided by the embodiments of the present invention may be implemented by combining software and hardware. As an example, the APT attack recognition device provided by the embodiments of the present invention may be a processor in the form of a hardware decoding processor programmed to perform the APT attack recognition method provided by the embodiments of the present invention; for example, the processor in the form of a hardware decoding processor may employ one or more Application Specific Integrated Circuits (ASIC), DSPs, Programmable Logic Devices (PLD), Complex Programmable Logic Devices (CPLD), Field-Programmable Gate Arrays (FPGA) or other electronic components.

In other embodiments, the APT attack recognition device provided by the embodiments of the present invention may be implemented in a software manner, and fig. 6 shows an APT attack recognition device stored in a memory, which may be software in the form of a program, a plug-in, etc., and includes a series of modules including a data acquisition module 210, an APT attack feature extraction module 220, a recognition module 230, and a recognition result determination module 240, for implementing the APT attack recognition method provided by the embodiments of the present invention.

The modules involved in the embodiments of the present invention may be implemented in software or in hardware. In some cases, the name of a module does not constitute a limitation on the module itself.

Based on the same principles as the methods shown in the embodiments of the present invention, there is also provided in the embodiments of the present invention an electronic device, which may include, but is not limited to: a processor and a memory; a memory for storing a computer program; a processor for executing the method according to any of the embodiments of the invention by invoking a computer program.

In an alternative embodiment, an electronic device is provided. As shown in FIG. 7, the electronic device 4000 includes a processor 4001 and a memory 4003, where the processor 4001 is coupled to the memory 4003, for example via a bus 4002. Optionally, the electronic device 4000 may further include a transceiver 4004, which may be used for data interaction between the electronic device and other electronic devices, such as transmitting and/or receiving data. It should be noted that, in practical applications, the number of transceivers 4004 is not limited to one, and the structure of the electronic device 4000 does not constitute a limitation on the embodiments of the present invention.

The processor 4001 may be a CPU (Central Processing Unit), a general purpose processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or perform the various exemplary logic blocks, modules and circuits described in connection with this disclosure. The processor 4001 may also be a combination that implements computing functions, for example a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.

Bus 4002 may include a path for transferring information between the aforementioned components. Bus 4002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 4002 can be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 7, but this does not mean that there is only one bus or one type of bus.

Memory 4003 may be, but is not limited to, a ROM (Read Only Memory) or other type of static storage device that can store static information and instructions, a RAM (Random Access Memory) or other type of dynamic storage device that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical disc storage (including compact discs, laser discs, optical discs, digital versatile discs, Blu-ray discs, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.

The memory 4003 is used for storing application program codes (computer programs) for executing the present invention and is controlled to be executed by the processor 4001. The processor 4001 is configured to execute application program codes stored in the memory 4003 to realize what is shown in the foregoing method embodiment.

The electronic device shown in fig. 7 is only an example, and should not impose any limitation on the functions and application scope of the embodiment of the present invention.

Embodiments of the present invention provide a computer-readable storage medium having a computer program stored thereon, which when run on a computer, causes the computer to perform the corresponding method embodiments described above.

According to another aspect of the present invention, there is also provided a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the methods provided in the implementation of the various embodiments described above.

Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including object oriented programming languages such as Java, Smalltalk and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).

It should be appreciated that the flow charts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The computer readable storage medium according to embodiments of the present invention may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

The computer-readable storage medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the methods shown in the above-described embodiments.

The above description is only illustrative of the preferred embodiments of the present invention and of the principles of the technology employed. It will be appreciated by persons skilled in the art that the scope of the disclosure referred to in the present invention is not limited to technical solutions formed by the specific combinations of the technical features described above, but also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the spirit of the disclosure, for example, technical solutions formed by replacing the above features with technical features having similar functions disclosed in (but not limited to) the present invention.

Claims

1. An APT attack recognition method, comprising:

acquiring network attack related information for an object to be detected;

extracting APT attack characteristics from the network attack related information, wherein the APT attack characteristics include alarm information, network behavior information, operating system information, account information, network monitoring characteristics, protocol analysis information and website related information;

according to the APT attack characteristics, performing multi-dimensional threat point detection through characteristic detection, behavior detection and machine learning, respectively, to obtain multi-dimensional threat characteristics, wherein the multi-dimensional threat characteristics include access threat characteristics, intrusion threat characteristics, communication threat characteristics, lateral movement threat characteristics, data security threat characteristics, trace clearing threat characteristics and user behavior threat characteristics;

according to the multi-dimensional threat characteristics, determining a first APT attack identification result, a second APT attack identification result and a third APT attack identification result of the object to be detected through characteristic detection, behavior detection and machine learning, respectively;

determining a target APT attack recognition result of the object to be detected according to the first APT attack recognition result, the second APT attack recognition result and the third APT attack recognition result;

the method further comprising:

storing the data in the network attack related information in different modes according to the type of each item of data, wherein the different modes include unstructured data storage, semi-structured data storage and structured data storage.
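Claim 1 does not state how the first, second and third APT attack identification results are combined into the target result. The sketch below assumes a simple majority vote across the characteristic detection, behavior detection and machine learning paths; the `DetectionResult` type, the `fuse_results` function and the `min_votes` parameter are hypothetical illustrations, not the patented fusion rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DetectionResult:
    """Outcome of one detection path (hypothetical structure)."""
    path: str            # "feature", "behavior" or "ml"
    apt_detected: bool


def fuse_results(first: DetectionResult,
                 second: DetectionResult,
                 third: DetectionResult,
                 min_votes: int = 2) -> bool:
    """Return True ("APT attack exists") when at least `min_votes` of the
    three detection paths flag the object to be detected.

    Majority voting is an assumed fusion rule; the claim does not state
    how the three results are combined.
    """
    votes = sum(r.apt_detected for r in (first, second, third))
    return votes >= min_votes


feature = DetectionResult("feature", True)
behavior = DetectionResult("behavior", True)
ml = DetectionResult("ml", False)
print(fuse_results(feature, behavior, ml))  # True: two of three paths agree
```

Lowering `min_votes` to 1 would favor recall (any single path raises the attack), while the default of 2 trades some recall for fewer false alarms, which is the goal the abstract attributes to combining the three methods.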

2. The method according to claim 1, wherein the network attack related information includes firewall logs, IDS logs, WAF logs, network audit logs, botnet/Trojan/worm logs, server logs, 4A audit logs, traffic logs and EDR information;

the extracting of the APT attack characteristics from the network attack related information comprises:

extracting the alarm information from the firewall log, the IDS log, the WAF log and the botnet/Trojan/worm log;

extracting the network behavior information from the network audit log;

extracting the operating system information from the server log;

extracting the account information from the 4A audit log;

extracting the network monitoring characteristics and the protocol analysis information from the traffic log;

and extracting the website related information from the EDR information.
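The source-to-characteristic assignment in claim 2 amounts to a routing table. The following sketch groups raw records by the APT attack characteristic their log source feeds; the source keys (for example `"botnet_trojan_worm"` and `"4a_audit"`) and the characteristic names are hypothetical identifiers chosen for illustration, and the actual parsing of each log format is out of scope.

```python
from collections import defaultdict

# Log source -> APT attack characteristic(s) it feeds, as listed in claim 2.
# The key and value strings are hypothetical identifiers.
SOURCE_TO_FEATURES = {
    "firewall": ("alarm_information",),
    "ids": ("alarm_information",),
    "waf": ("alarm_information",),
    "botnet_trojan_worm": ("alarm_information",),
    "network_audit": ("network_behavior_information",),
    "server": ("operating_system_information",),
    "4a_audit": ("account_information",),
    "traffic": ("network_monitoring_features", "protocol_analysis_information"),
    "edr": ("website_related_information",),
}


def extract_features(records):
    """Group (source, payload) records by the characteristic their source feeds.

    Records from sources not in the table are ignored.
    """
    features = defaultdict(list)
    for source, payload in records:
        for feature in SOURCE_TO_FEATURES.get(source, ()):
            features[feature].append(payload)
    return dict(features)


records = [("firewall", "deny tcp 10.0.0.5:445"), ("traffic", "dns query x.example")]
print(extract_features(records))
```

Note that a single traffic record feeds two characteristics (network monitoring and protocol analysis), which is why each table value is a tuple rather than a single name.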

3. The method according to claim 1, wherein the network monitoring characteristics include a first domain name corresponding to the object to be detected, and the communication threat characteristics include an abnormal domain name detection result, the abnormal domain name detection result being determined by:

performing word segmentation on the first domain name through a pre-established natural language processing model to obtain a plurality of characters corresponding to the first domain name, wherein the natural language processing model is based on a multi-layer perceptron algorithm;

determining the abnormal domain name detection result of the object to be detected according to each character and a preset abnormal domain name, wherein the abnormal domain name detection result indicates either that the first domain name is an abnormal domain name or that the first domain name is a normal domain name.
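The two steps of claim 3 (segment the domain, then compare the segments with preset abnormal domain names) can be sketched as follows. This is an assumption-laden stand-in: a rule-based regular-expression splitter replaces the multi-layer-perceptron segmentation model named in the claim, and the comparison uses Jaccard similarity of token sets with a hypothetical threshold of 0.6. The function names are illustrative only.

```python
import re


def segment_domain(domain: str) -> list:
    """Split a domain into lowercase letter runs and digit runs.

    A rule-based stand-in for the MLP-based segmentation model in the claim.
    """
    return re.findall(r"[a-z]+|[0-9]+", domain.lower())


def is_abnormal_domain(domain: str, abnormal_domains, threshold: float = 0.6) -> bool:
    """Flag `domain` as abnormal if its token set is similar enough
    (Jaccard similarity >= threshold, an assumed rule) to any preset
    abnormal domain name."""
    tokens = set(segment_domain(domain))
    for bad in abnormal_domains:
        bad_tokens = set(segment_domain(bad))
        union = tokens | bad_tokens
        if union and len(tokens & bad_tokens) / len(union) >= threshold:
            return True
    return False


print(segment_domain("Login-Bank123.example.com"))
print(is_abnormal_domain("secure-login.evil.com", ["login-secure.evil.com"]))
```

Because the token set also contains the top-level domain, common suffixes such as `com` slightly inflate the similarity; a fuller implementation would likely strip them or weight tokens, which is where a learned model such as the one described in the claim would replace the fixed rule.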

4. The method according to any one of claims 1 to 3, wherein the target APT attack recognition result is either that an APT attack exists or that no APT attack exists, the method further comprising:

if the target APT attack identification result is that an APT attack exists, determining an alarm result of the object to be detected according to the target APT attack identification result, wherein the alarm result includes at least one of a source IP, a destination IP, an attack name, an attack sample name, an alarm time, a first risk level, a behavior parameter, a response mode and an alarm type.
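The alarm result of claim 4 is essentially a record with optional fields, produced only when an APT attack is found. The sketch below models it as a Python dataclass; the class name `AlarmResult`, the field names, the integer risk scale and the `build_alarm` helper are hypothetical choices, and every field is optional because the claim requires only "at least one of" them.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class AlarmResult:
    """Alarm fields listed in claim 4 (names are hypothetical)."""
    source_ip: Optional[str] = None
    destination_ip: Optional[str] = None
    attack_name: Optional[str] = None
    attack_sample_name: Optional[str] = None
    alarm_time: Optional[datetime] = None
    risk_level: Optional[int] = None      # first risk level; integer scale assumed
    behavior_parameters: dict = field(default_factory=dict)
    response_mode: Optional[str] = None
    alarm_type: Optional[str] = None


def build_alarm(apt_detected: bool, **fields) -> Optional[AlarmResult]:
    """Return an alarm only when the target result says an APT attack exists."""
    if not apt_detected:
        return None
    return AlarmResult(**fields)


alarm = build_alarm(True, source_ip="10.0.0.5", attack_name="C2 beacon", risk_level=2)
print(alarm)
```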

5. The method according to claim 4, wherein the method further comprises:

performing association analysis on the multi-dimensional threat features according to the source IP or the destination IP;

when the threat features corresponding to the source IP or the destination IP include both a weak password threat feature and a host file permission modification threat feature, outputting a second risk level, wherein the second risk level is higher than the first risk level.
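The association analysis of claim 5 can be sketched as grouping threat features by IP and escalating when the weak-password and host-file-permission-modification features co-occur for the same IP. The feature name strings, the `correlate_by_ip` function and the choice of "first level + 1" as the second risk level are assumptions for illustration; the claim only requires that the second level be higher than the first.

```python
from collections import defaultdict

# Hypothetical feature identifiers.
WEAK_PASSWORD = "weak_password"
FILE_PERMISSION_CHANGE = "host_file_permission_modification"


def correlate_by_ip(threat_events, first_risk_level: int) -> dict:
    """Group (ip, feature) pairs by IP (source or destination) and return a
    risk level per IP, escalating to an assumed second level of
    first_risk_level + 1 when both trigger features are present."""
    by_ip = defaultdict(set)
    for ip, feature in threat_events:
        by_ip[ip].add(feature)
    levels = {}
    for ip, features in by_ip.items():
        if {WEAK_PASSWORD, FILE_PERMISSION_CHANGE} <= features:
            levels[ip] = first_risk_level + 1  # second risk level
        else:
            levels[ip] = first_risk_level
    return levels


events = [
    ("10.0.0.5", WEAK_PASSWORD),
    ("10.0.0.5", FILE_PERMISSION_CHANGE),
    ("10.0.0.9", WEAK_PASSWORD),
]
print(correlate_by_ip(events, first_risk_level=2))
```

The rationale is that a weak password followed by a permission change on the same host suggests the credential was actually exploited, so the combination warrants a higher level than either feature alone.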

6. The method according to any one of claims 1 to 3, wherein before the extracting of the APT attack characteristics from the network attack related information, the method further comprises:

7. An APT attack recognition device, comprising:

an acquisition module, used for acquiring network attack related information for an object to be detected;

an APT attack characteristic extraction module, used for extracting APT attack characteristics from the network attack related information, wherein the APT attack characteristics include alarm information, network behavior information, operating system information, account information, network monitoring characteristics, protocol analysis information and website related information;

an identification module, used for performing multi-dimensional threat point detection through feature detection, behavior detection and machine learning, respectively, according to the APT attack features to obtain multi-dimensional threat features, and for determining a first APT attack identification result, a second APT attack identification result and a third APT attack identification result of the object to be detected through feature detection, behavior detection and machine learning, respectively, according to the multi-dimensional threat features, wherein the multi-dimensional threat features include access threat features, intrusion threat features, communication threat features, lateral movement threat features, data security threat features, trace clearing threat features and user behavior threat features;


the identification result determining module is used for determining a target APT attack identification result of the object to be detected according to the first APT attack identification result, the second APT attack identification result and the third APT attack identification result;

the apparatus further comprising:

a storage module, used for storing the data in the network attack related information in different modes according to the type of each item of data, wherein the different modes include unstructured data storage, semi-structured data storage and structured data storage.
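The claims name three storage modes but not the rule that assigns data to them. One plausible rule, offered here only as an assumption, is: a flat record of scalar fields goes to structured storage, nested records or JSON documents go to semi-structured storage, and free text or binary payloads go to unstructured storage. The `StorageMode` enum and `classify_storage` function are hypothetical names.

```python
import json
from enum import Enum


class StorageMode(Enum):
    STRUCTURED = "structured"
    SEMI_STRUCTURED = "semi-structured"
    UNSTRUCTURED = "unstructured"


SCALARS = (str, int, float, bool, type(None))


def classify_storage(item) -> StorageMode:
    """Assign a data item to a storage mode under an assumed rule."""
    if isinstance(item, dict):
        # Flat dict of scalars maps cleanly onto a table row.
        if all(isinstance(v, SCALARS) for v in item.values()):
            return StorageMode.STRUCTURED
        return StorageMode.SEMI_STRUCTURED
    if isinstance(item, str):
        try:
            parsed = json.loads(item)
        except ValueError:  # json.JSONDecodeError subclasses ValueError
            return StorageMode.UNSTRUCTURED
        # Only JSON objects/arrays count as semi-structured documents.
        if isinstance(parsed, (dict, list)):
            return StorageMode.SEMI_STRUCTURED
        return StorageMode.UNSTRUCTURED
    return StorageMode.UNSTRUCTURED  # bytes, packet captures, etc.


print(classify_storage({"src_ip": "1.1.1.1", "port": 443}))
print(classify_storage("Jan 01 sshd: Failed password for root"))
```

In practice each mode would map to a different backend (for example a relational table, a document store and an object store), but the claims do not name specific backends.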

8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any one of claims 1-6 when the computer program is executed.

9. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a computer program which, when executed by a processor, implements the method of any of claims 1-6.