CN107623655B - System for real-time detection attack based on artificial intelligence and MapReduce - Google Patents


Info

Publication number
CN107623655B
Authority
CN
China
Prior art keywords
real
attacks
time
log
attfreq
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610546632.7A
Other languages
Chinese (zh)
Other versions
CN107623655A (en)
Inventor
李木金
凌飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Liancheng Technology Development Co ltd
Original Assignee
Nanjing Liancheng Technology Development Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2016-07-13
Filing date
2016-07-13
Publication date
2020-10-27
Application filed by Nanjing Liancheng Technology Development Co ltd
Priority to CN201610546632.7A
Publication of CN107623655A
Application granted
Publication of CN107623655B
Legal status: Active (current)
Anticipated expiration


Abstract

The invention discloses a system for detecting security attacks in real time based on artificial intelligence and MapReduce. The system comprises a preprocessing stage, a MAP stage and a Reduce stage, each containing its own software modules. With this system, the construction level of enterprise security operation and maintenance service platforms can be improved and construction costs can be reduced.

Description

System for real-time detection attack based on artificial intelligence and MapReduce
Technical Field
The invention relates to the technical fields of artificial intelligence, big data and information security applications, and in particular to a system for detecting security attacks in real time.
Background
The abbreviations used in the invention are as follows:
RF: Random Forest
CLF: Common Log Format
JSON: JavaScript Object Notation
SOC: Security Operation Center
IDS: Intrusion Detection System
SNMP: Simple Network Management Protocol
HDFS: Hadoop Distributed File System
Safe production underpins the orderly conduct of all work and is also a veto criterion in the evaluation of leaders and officials at all levels. The network and information security operation and maintenance system is an important component of every enterprise's safe-production work. Ensuring that networks and information systems operate efficiently and stably is the foundation of all market activities and of the normal operation of an enterprise.
At present, enterprise IT systems host a variety of business systems and security devices, which effectively raise labor productivity and reduce operating costs; the IT system has become an indispensable support for efficient enterprise operation and an integral part of production. On the one hand, if a security event or fault in any business system is not promptly discovered, handled and recovered from, all services carried on that system are directly disrupted and the enterprise's normal operating order is affected; for user-facing systems this directly causes user complaints, lower satisfaction and damage to the enterprise's image, which makes the security of the enterprise network particularly important. On the other hand, network attack techniques are becoming ever more advanced and widespread, so enterprise networks are constantly exposed to attack, frequently suffer intrusions and damage of varying severity, and see their normal operation seriously disrupted. These growing threats force enterprises to strengthen the protection of their network systems: to pursue multi-level, three-dimensional security defense systems, to build security operation and maintenance service centers, to track system events and detect security attacks in real time, and to take timely control actions that eliminate or reduce the losses caused by attacks and protect the normal operation of enterprise business systems.
However, as enterprise IT systems keep growing, and in particular as the variety and number of devices, databases, middleware, operating systems, Web servers and other components covered by security operation and maintenance services increase dramatically, log storage, log analysis and problem tracking become more and more difficult. This massive growth in log volume forces security operation and maintenance service providers to adopt a Hadoop/Spark big data architecture for log storage, log processing and log analysis, for real-time tracking of system events, and for real-time detection of security attacks.
Existing security management and analysis tools are not adequate for the security operation and maintenance services that enterprises need today, so a completely new approach to real-time analysis and management of massive log data is urgently needed. A log file is typically a flat file containing at least a timestamp field, an event identifier field and an event description field. This growth in log volume also reflects volume, one of the three defining characteristics of big data.
Therefore, an important problem in the design of information security operation and maintenance management is how to use information technology to improve enterprises' operating efficiency and optimize their information systems, so that professional, cost-effective information security operation and maintenance services can be provided to all kinds of enterprises.
Disclosure of Invention
After analyzing the defects and shortcomings of various enterprise information security operation and maintenance management platforms, the invention provides a system for detecting security attacks in real time based on artificial intelligence and MapReduce.
The core idea of the invention is to construct a system for real-time detection of security attacks. Built on a Hadoop/Spark big data platform, the system uses artificial intelligence techniques to track events and detect security attacks in real time from logs.
Further, the system comprises a preprocessing stage, a MAP stage and a Reduce stage.
The preprocessing stage comprises a log real-time acquisition module and a log real-time analysis module.
The MAP stage comprises a real-time event tracking module and a real-time attack detection module.
The Reduce stage comprises a real-time statistical attack module.
Preferably, the log real-time acquisition module and the log real-time analysis module convert the original logs into JSON format through preprocessing implemented in Python.
Preferably, the real-time event tracking module and the real-time attack detection module implement an artificial intelligence algorithm that tracks system events, their dependencies and scenarios in real time, learns the normal behavior of the system in real time, and detects security attacks in real time.
Preferably, the real-time statistical attack module keeps real-time statistics on each attack and the number of times or frequency with which it occurs.
With this system, the construction level of enterprise security operation and maintenance service platforms can be improved and construction costs can be reduced.
Drawings
FIG. 1 is a schematic diagram of the conversion of original log format to JSON according to the present invention;
FIG. 2 illustrates the main stages of the artificial intelligence based analysis technique of the present invention;
FIG. 3 illustrates the main stages of the big data based architecture according to the present invention;
FIG. 4 is a schematic diagram of a security operation and maintenance management platform system according to the present invention;
Detailed Description
The invention is described in further detail below with reference to the figures and examples:
The system provided by this patent starts from unstructured log files. Once the unstructured log data have been retrieved, log storage and log processing can be carried out. Extracting data from logs has long been a laborious technical task, because log data arrive in many heterogeneous formats. To extract log data properly, this patent uses the Python programming language for its flexibility, its efficiency and the relative ease with which it handles analysis tasks. In the Python program, a class library is used so that the parser can be built directly in Python code.
In this patent, the result of the log preprocessing stage is a JSON (JavaScript Object Notation) file containing variables that correspond to the log fields, as shown in FIG. 1. JSON is a lightweight data-interchange format that is easy for computers to parse and use. Compared with other structured data-interchange formats such as XML, JSON performs noticeably better, with parsing speeds stated to be up to one hundred times faster. The analysis is based on the RF (Random Forest) method, an artificial intelligence technique used here to discover and detect attack-related events in the log.
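A minimal Python sketch of the raw-log-to-JSON conversion follows. It assumes an Apache-style Common Log Format (CLF) input line and uses only the standard `re` and `json` modules; `CLF_PATTERN`, `clf_to_json` and the output field names are illustrative assumptions, not the patent's actual parser or class library.

```python
import json
import re

# Hypothetical pattern for one Common Log Format (CLF) line:
# host ident authuser [timestamp] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def clf_to_json(line):
    """Convert one CLF log line into a JSON string, or return None if it does not match."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None  # unparseable lines are skipped rather than guessed at
    record = match.groupdict()
    record["status"] = int(record["status"])
    # CLF writes "-" when no bytes were sent; map it to 0
    record["bytes"] = 0 if record["bytes"] == "-" else int(record["bytes"])
    return json.dumps(record)


if __name__ == "__main__":
    sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
    print(clf_to_json(sample))
```

Returning `None` for non-matching lines leaves room to route other heterogeneous formats to their own parsers.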
Fig. 2 shows the three main stages of the artificial intelligence technique. For clarity, the binary-based data structures and algorithm are described in terms of the MapReduce big data architecture.
To analyze how often the security attacks detected in the log occur, the patent provides an artificial intelligence technique based on big data. The method processes the JSON data and creates two data structures: one (attName) stores the name of each security attack, and the other (attFreq) stores the number/frequency of occurrences of each detected attack and of each combination of attacks.
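The two data structures can be sketched in Python as a list of names and a counter array. The sketch assumes attFreq has 2^n slots (one per combination of n attacks, so every index from 0 to 2^n - 1 fits) and that attName positions are 1-based, matching the A/B/C example in the preprocessing stage; `create_structures` and `combination_index` are hypothetical names.

```python
def create_structures(attack_names):
    """Build attName (attack names in order) and a zeroed attFreq with 2**n slots."""
    att_name = list(attack_names)
    att_freq = [0] * (2 ** len(att_name))  # one slot per combination of attacks
    return att_name, att_freq


def combination_index(detected, att_name):
    """Sum 2**(i - 1) over the 1-based positions i of the detected attacks."""
    return sum(2 ** (i - 1) for i, name in enumerate(att_name, start=1) if name in detected)


att_name, att_freq = create_structures(["A", "B", "C"])
loc = combination_index({"A", "C"}, att_name)  # binary 101 -> 5
att_freq[loc] += 1
print(len(att_freq), loc, att_freq[5])  # 8 5 1
```

Each attack owns one bit of the index, so every combination maps to a distinct slot without a separate lookup table.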
Fig. 3 shows three phases of the big data architecture: preprocessing stage, MAP stage and Reduce stage:
1. Preprocessing stage (first stage): at this stage, the two data structures attName and attFreq are created. The size of the attFreq array depends on the number n of attacks that have been detected. For example, if n = 5, the size of the attFreq array is:
$$2^n = 2^5 = 32$$
corresponding to all combinations of the 5 possible attacks.
Assume that attacks A, B and C are stored in the array attName at positions 1, 2 and 3, respectively. If both attack A and attack C are found in the log, the index of this combination in the array attFreq is 5, which is determined by binary conversion: taking A as the lowest bit and C as the highest, the combination of A and C is 101 in binary, which is decimal 5. The index of attFreq is therefore:
$$2^{1-1} + 2^{3-1} = 2^0 + 2^2 = 5$$
2. MAP stage (second stage): in this stage, the artificial intelligence algorithm starts executing by scanning the input JSON variables. Security attacks are detected in real time by comparing the JSON variables with a series of special regular expressions (e.g., the rules in this patent's logcorrelator.conf), which serve as features for identifying different attack patterns.
For each attack detected in the log, the corresponding ID is found in attName and used to determine the corresponding attFreq index, named 'Loc', by the following formula, where i is the position of the attack in attName and D is the set of positions of the attacks detected in the log record:
$$\mathrm{Loc} = \sum_{i \in D} 2^{\,i-1}$$
The following algorithm describes the overall process of the MAP stage, where i is the position (1, 2, 3, ...) of the current attack in the array attName. The output of the MAP stage is a key-value pair: the attFreq index and a count (this key-value pair is the input to the Reduce stage):
Begin
    loc ← 0
    For each i in attName
        If attack i is detected in the log record
            loc ← loc + 2^(i-1)
        End if
    End for
    Output [loc, 1]
End
3. Reduce stage (third stage): at this stage, the Hadoop/Spark worker nodes redistribute the data according to the keys output by the MAP stage. The Reduce method then adds up, in parallel, the values output by each MAP. After the Reduce method has executed, the array attFreq holds the result; the frequencies can then be sorted so that the indices in the array are ordered from high to low.
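The MAP pseudocode and the Reduce summation can be emulated in a single Python process for illustration. The attack names and regular expressions in `ATT_NAME` and `SIGNATURES` are hypothetical stand-ins for the rules in logcorrelator.conf, each log record is assumed to be one JSON object, and `map_record` and `reduce_pairs` are hypothetical names. On a real cluster the framework groups the (loc, 1) pairs by key across worker nodes before summing (in PySpark this corresponds roughly to `rdd.reduceByKey(operator.add)`); the `Counter` here only imitates that step.

```python
import json
import re
from collections import Counter

# Hypothetical signatures standing in for logcorrelator.conf rules;
# list order gives each attack its 1-based position i in attName.
ATT_NAME = ["sql_injection", "path_traversal", "xss"]
SIGNATURES = {
    "sql_injection": re.compile(r"(?i)union\s+select"),
    "path_traversal": re.compile(r"\.\./"),
    "xss": re.compile(r"(?i)<script"),
}


def map_record(json_line):
    """MAP: scan one JSON log record and emit the key-value pair (loc, 1)."""
    record = json.loads(json_line)
    text = " ".join(str(value) for value in record.values())
    loc = 0  # stays 0 when no attack is detected
    for i, name in enumerate(ATT_NAME, start=1):
        if SIGNATURES[name].search(text):
            loc += 2 ** (i - 1)
    return loc, 1


def reduce_pairs(pairs):
    """REDUCE: add up the counts for each attFreq index, then rank by frequency."""
    att_freq = Counter()
    for loc, count in pairs:
        att_freq[loc] += count
    return att_freq.most_common()  # [(index, frequency), ...], highest first


records = [
    '{"request": "GET /../../etc/passwd"}',
    '{"request": "GET /?q=<script>x</script> UNION SELECT 1"}',
    '{"request": "GET /../etc/shadow"}',
]
print(reduce_pairs(map_record(r) for r in records))  # [(2, 2), (5, 1)]
```

Index 2 is path traversal alone (position 2), and index 5 is SQL injection together with XSS (positions 1 and 3).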
Fig. 4 is a framework of the secure operation and maintenance management platform according to the present invention:
1. Preprocessing stage
This part of the program is written in Python. The massive logs are collected in real time from various security devices, network devices, databases, operating systems, middleware and so on. To preprocess these heterogeneous logs, a rule-based (regular-expression-based) approach is used, which can eliminate redundant or useless log data. Each rule has a specific format containing several fields: the type field indicates the type of the rule; the pattern field identifies the input event; the ptype field indicates the type of the pattern field; the desc field is a description of the rule; and the action field specifies the alarm method (e.g., SMS, alert box, email) used when the event occurs.
After preprocessing, the logs are converted to JSON format.
2. MAP phase
Tracking events in real time makes it possible to discover relationships between different events, and deriving higher-level knowledge from log information is common practice. Because a very large number of events occur on the network, the system must decide which of these thousands of events to skip in order to avoid unnecessary processing.
3. Reduce phase
The detected attacks are counted in real time.
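The rule fields listed for the preprocessing stage can be modeled as a Python data class. This is a sketch under stated assumptions: only ptype "RegExp" is supported, a line matched by no rule is treated as redundant and dropped, the first matching rule wins, and the `Rule` class, the `preprocess` function and the sample rule are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    """A preprocessing rule with the fields named in the text (values are hypothetical)."""
    type: str     # rule type, e.g. "Single"
    ptype: str    # how pattern is interpreted; only "RegExp" is handled here
    pattern: str  # identifies the input event
    desc: str     # description of the rule
    action: str   # alarm method, e.g. "email"


def preprocess(lines, rules):
    """Tag lines matched by a rule; lines matching no rule are dropped as redundant."""
    kept = []
    for line in lines:
        for rule in rules:
            if rule.ptype == "RegExp" and re.search(rule.pattern, line):
                kept.append({"event": line, "desc": rule.desc, "action": rule.action})
                break  # in this sketch the first matching rule wins
    return kept


rules = [Rule("Single", "RegExp", r"Failed password for \S+", "SSH login failure", "email")]
log = ["sshd[1]: Failed password for root from 10.0.0.1", "cron[2]: job finished"]
print(preprocess(log, rules))
```

Each returned dictionary can be serialized with `json.dumps` to produce the JSON records consumed by the MAP stage.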
The system provided by this patent is mainly implemented by three files, namely main.py, logcorrelator.conf and logWatcher.conf, which are briefly introduced below:
1. main.py
Execution of main.py starts from main(). First, main() reads the configuration file logcorrelator.conf and loads the rules into memory. After the configuration file has been read, it searches for events that match the rules. When a rule matching an event is found, the action for that event (e.g., the alarm method) is looked up.
2. logcorrelator.conf
The function initFromConf() is called in main() to initialize the system by reading the configuration file.
The initFromConf() procedure is as follows:
import configparser

matchers = {}      # rule name -> "match" pattern
actions = {}       # rule name -> "action" (alarm method)
failed_limit = 0   # threshold taken from the "windows" option of Rule4

def initFromConf():
    global failed_limit
    config = configparser.ConfigParser()
    config.read("logcorrelator.conf")
    for section in config.sections():          # each section is one rule
        for option in config.options(section):
            if option == "match":
                matchers[section] = config.get(section, option)
            if option == "windows" and section == "Rule4":
                failed_limit = int(config.get(section, option))
                print("failed: " + str(failed_limit))
            if option == "action":
                actions[section] = config.get(section, option)
The working process of a rule is outlined below:
The rule shown in the following box is of type Single and checks log entries for the string of an accepted password. The continue field specifies where processing continues after the pattern has matched: once an event has matched this rule, the configuration file is immediately searched for the next rule (the rule referred to in the continue field). When an event matches the rule, the corresponding action is executed immediately (this rule has no action), so the rule serves to find successful password logins over SSH connections.
[Figure: rule of type Single that matches accepted SSH passwords]
3. logWatcher.conf
This is an auxiliary file for real-time analysis. Once main() has read the current log file, it hands control to this file, which then polls new log files in real time and applies the same rules and actions to these new logs. In this way, the three files main.py, logcorrelator.conf and logWatcher.conf are interrelated and run simultaneously to accomplish the task of detecting security attacks.
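The polling loop and the rule chaining via the continue field can be approximated in Python. This is a sketch under several assumptions: new lines are found by remembering a byte offset into a UTF-8 log file, rotation is detected only by the file shrinking, and the continue values "TakeNext" (keep matching later rules) and "DontCont" (stop at this rule) are borrowed from SEC-style rule files because the patent does not name them. `ChainRule`, `evaluate`, `read_new_lines` and `watch` are hypothetical names.

```python
import os
import re
import time
from dataclasses import dataclass


@dataclass
class ChainRule:
    """Hypothetical rule: name, regex pattern, optional action and continue mode."""
    name: str
    pattern: str
    action: str = ""                 # empty: the rule has no action
    continue_mode: str = "DontCont"  # "TakeNext": keep matching later rules


def evaluate(event, rules):
    """Return names of matched rules; stop at a match unless it says TakeNext."""
    fired = []
    for rule in rules:
        if re.search(rule.pattern, event):
            fired.append(rule.name)
            if rule.action:
                print(f"{rule.name}: {rule.action}")
            if rule.continue_mode != "TakeNext":
                break
    return fired


def read_new_lines(path, offset):
    """Return (complete lines appended since byte offset, new byte offset)."""
    if os.path.getsize(path) < offset:  # file shrank: assume rotation/truncation
        offset = 0
    with open(path, "rb") as handle:
        handle.seek(offset)
        data = handle.read()
    # Hand back only complete lines; a trailing partial line is re-read next poll
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end


def watch(path, rules, interval=1.0):
    """Poll path from its current end and apply the rules to every new line."""
    offset = os.path.getsize(path)  # content already handled by main() is skipped
    while True:
        lines, offset = read_new_lines(path, offset)
        for line in lines:
            evaluate(line, rules)
        time.sleep(interval)
```

For example, with a first rule matching "Accepted password" set to TakeNext and a second rule matching "for root from", an accepted root login fires both rules. `watch("/var/log/auth.log", rules)` would run until interrupted; the path is only an example.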
The above describes only a preferred embodiment of the present invention and is not intended to limit its scope; all equivalent changes and modifications made according to the present invention are covered by the scope of the present invention.

Claims (1)

1. A system for detecting security attacks in real time based on artificial intelligence and MapReduce, characterized in that the binary-based data structures and algorithm use the MapReduce big data architecture, and three files, namely main.py, logcorrelator.conf and logWatcher.conf, are interrelated and run simultaneously to accomplish the task of detecting security attacks;
the system also comprises a preprocessing stage, a MAP stage and a Reduce stage;
in the preprocessing stage, two data structures attName and attFreq are created, wherein attName is used to store the names of the various security attacks and attFreq is used to store the number/frequency of each detected attack and of each combination of attacks; the size of the attFreq array is $2^n$, where n is the number of attacks that have been detected, corresponding to the possible combinations of attacks; assuming that attacks A, B and C are stored in the array attName at positions 1, 2 and 3, respectively, if both A and C attacks are found in the log, the index of this combination in the array attFreq is 5, which is determined by binary conversion: A and C are 101 in binary, which is decimal 5, so the index of attFreq is determined by $2^0 + 2^2 = 5$; the preprocessing stage comprises a log real-time acquisition module and a log real-time analysis module;
in the MAP stage, execution of the artificial intelligence algorithm begins by scanning the input JSON variables, which contain variables corresponding to the log fields; the JSON variables are compared with a series of regular expressions, which are a series of features used to identify different attack patterns; for each attack detected in the log, the corresponding ID is found in attName and used to determine the corresponding attFreq index, named 'Loc', by the formula $\mathrm{Loc} = \sum 2^{\,i-1}$, where i is the attack index in attName and the sum runs over the attacks detected in the log record; the MAP stage comprises a real-time event tracking module and a real-time attack detection module;
in the Reduce stage, an addition operation is performed in parallel on the data output by each MAP, the array attFreq holds the result after the Reduce method has executed, the frequencies can be sorted so that the indices in the array are ordered from high to low, and the Reduce stage comprises a real-time statistical attack module;
the log real-time acquisition module and the log real-time analysis module are used to convert the original logs into JSON format through preprocessing implemented in Python;
the real-time event tracking module and the real-time attack detection module implement an artificial intelligence algorithm to track system events, their dependencies and scenarios in real time, learn the normal behavior of the system in real time and detect security attacks in real time;
the real-time statistics attack module is used for carrying out real-time statistics on various attacks and the occurrence times of the attacks;
the attacks and the times of occurrence thereof are stored in binary data structures attName and attFreq, respectively.
CN201610546632.7A 2016-07-13 2016-07-13 System for real-time detection attack based on artificial intelligence and MapReduce Active CN107623655B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610546632.7A CN107623655B (en) 2016-07-13 2016-07-13 System for real-time detection attack based on artificial intelligence and MapReduce


Publications (2)

Publication Number Publication Date
CN107623655A (en) 2018-01-23
CN107623655B (en) 2020-10-27

Family

ID=61086562

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610546632.7A Active CN107623655B (en) 2016-07-13 2016-07-13 System for real-time detection attack based on artificial intelligence and MapReduce

Country Status (1)

Country Link
CN (1) CN107623655B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104935601A (en) * 2015-06-19 2015-09-23 北京奇虎科技有限公司 Cloud-based method, device and system for analyzing website log safety
CN105677615A (en) * 2016-01-04 2016-06-15 北京邮电大学 Distributed machine learning method based on weka interface

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9112895B1 (en) * 2012-06-25 2015-08-18 Emc Corporation Anomaly detection system for enterprise network security
CN104378371A (en) * 2014-11-14 2015-02-25 浙江工业大学 Network intrusion detection method for parallel AP cluster based on MapReduce
CN104794399A (en) * 2015-04-23 2015-07-22 北京北信源软件股份有限公司 Terminal protection system and method based on massive program behavior data
CN107579944B (en) * 2016-07-05 2020-08-11 南京联成科技发展股份有限公司 Artificial intelligence and MapReduce-based security attack prediction method


Also Published As

Publication number Publication date
CN107623655A (en) 2018-01-23

Similar Documents

Publication Publication Date Title
US11928144B2 (en) Clustering of log messages
US10958672B2 (en) Cognitive offense analysis using contextual data and knowledge graphs
US11089040B2 (en) Cognitive analysis of security data with signal flow-based graph exploration
Tang et al. Nodemerge: Template based efficient data reduction for big-data causality analysis
EP3205072B1 (en) Differential dependency tracking for attack forensics
US10686830B2 (en) Corroborating threat assertions by consolidating security and threat intelligence with kinetics data
US11080307B1 (en) Detection of outliers in text records
US20120158768A1 (en) Decomposing and merging regular expressions
US10673733B2 (en) System for debugging a network environment
CN107579944B (en) Artificial intelligence and MapReduce-based security attack prediction method
US11681606B2 (en) Automatic configuration of logging infrastructure for software deployments using source code
Chen et al. Log analytics for dependable enterprise telephony
CN111598711A (en) Target user account identification method, computer equipment and storage medium
CN105630797A (en) Data processing method and system
Gardner et al. Pattern discovery and specification techniques for alarm correlation
WO2016093839A1 (en) Structuring of semi-structured log messages
CN113783876A (en) Network security situation perception method based on graph neural network and related equipment
CN113282606A (en) Data processing method, data processing device, storage medium and computing equipment
CN107623655B (en) System for real-time detection attack based on artificial intelligence and MapReduce
JP6594977B2 (en) Method, system, computer program, and computer-readable storage medium for monitoring requests for code sets
Meng et al. A generic framework for application configuration discovery with pluggable knowledge
El Hadj et al. Validation and correction of large security policies: A clustering and access log based approach
Liu et al. Anomaly Detection of Command Shell Sessions based on DistilBERT: Unsupervised and Supervised Approaches
CN107819601A (en) A kind of safe O&M service architecture quickly and efficiently based on Spark
Naukudkar et al. Enhancing performance of security log analysis using correlation-prediction technique

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant