CN107623655B - System for real-time detection attack based on artificial intelligence and MapReduce - Google Patents
- Publication number
- CN107623655B
- Authority
- CN
- China
- Prior art keywords
- real
- attacks
- time
- log
- attfreq
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
The invention discloses a system for detecting security attacks in real time based on artificial intelligence and MapReduce. The system comprises a preprocessing stage, a MAP stage and a Reduce stage, together with the software modules contained in each stage. The system raises the level at which enterprise security operation and maintenance service platforms can be built and reduces their construction cost.
Description
Technical Field
The invention relates to the application of artificial intelligence, big data and information security technology, and in particular to a system for detecting security attacks in real time.
Background
The English abbreviations used in this invention are as follows:
RF: Random Forest
CLF: Common Log Format
JSON: JavaScript Object Notation
SOC: Security Operation Center
IDS: Intrusion Detection System
SNMP: Simple Network Management Protocol
HDFS: Hadoop Distributed File System
Safe production underpins the orderly progress of all work and is also one of the criteria used to assess leaders and cadres at all levels. The network and information security operation and maintenance system is an important part of every enterprise's safe-production work. Ensuring that networks and information systems run efficiently and stably is the foundation of all market activities and of an enterprise's normal operation.
At present, enterprise IT systems host many different business systems and security devices. These effectively raise labor productivity and reduce operating costs, and the IT system has become an indispensable support for efficient enterprise operation and an integral part of production. On the one hand, if a security event or fault in any business system is not promptly discovered, handled and recovered from, every service running on that system is directly affected and the enterprise's normal operations are disrupted; for customer-facing systems this leads directly to user complaints, lower satisfaction and damage to the enterprise's image, which makes securing the enterprise network especially important. On the other hand, network attack techniques are becoming ever more advanced and widespread, so enterprise networks are constantly exposed to attack, frequently suffer intrusions and damage of varying severity, and see their normal operation seriously disrupted. These growing threats force enterprises to strengthen network security: to pursue multi-level, three-dimensional defense systems, to build security operation and maintenance service centers, to track system events and detect security attacks in real time, and to take timely countermeasures that eliminate or reduce losses and keep enterprise business systems running normally.
However, as enterprise IT systems keep growing, and in particular as the variety and number of devices, databases, middleware, operating systems, Web servers and other components covered by security operation and maintenance services increase dramatically, log storage, log analysis and problem tracking become ever more difficult. This massive growth in log volume forces security operation and maintenance service providers to adopt a Hadoop/Spark big data architecture for log storage, processing and analysis, for real-time tracking of system events, and for real-time detection of security attacks.
Existing security management and analysis tools are no longer adequate for the security operation and maintenance services that enterprises now need, so a completely new approach to real-time analysis and management of massive log data is urgently needed. A log file is typically a flat file containing at least a timestamp field, an event identifier field and an event description field. Growth in log volume is itself one of the three defining characteristics of big data.
Therefore, how to use information technology to improve enterprises' operational efficiency, and how to optimize enterprise information systems so that they can provide professional, cost-effective information security operation and maintenance services to all kinds of enterprises, is an important problem in the design of information security operation and maintenance management.
Disclosure of Invention
After analyzing the defects and shortcomings of existing enterprise information security operation and maintenance management platforms, the invention provides a system for detecting security attacks in real time based on artificial intelligence and MapReduce.
The core idea of the invention is to construct a system for real-time detection of security attacks. Built on a Hadoop/Spark big data architecture, the system uses artificial intelligence techniques to track events and detect security attacks in real time from logs.
Further, the system comprises a preprocessing stage, a MAP stage and a Reduce stage.
The preprocessing stage comprises a log real-time acquisition module and a log real-time analysis module.
The MAP stage comprises a real-time event tracking module and a real-time attack detection module.
The Reduce stage comprises a real-time attack statistics module.
Preferably, the log real-time acquisition module and the log real-time analysis module preprocess the original logs with Python and convert them into JSON format.
Preferably, the real-time event tracking module and the real-time attack detection module implement an artificial intelligence algorithm that tracks system events, their dependencies and scenarios in real time, learns the normal behavior of the system in real time, and detects security attacks in real time.
Preferably, the real-time attack statistics module compiles real-time statistics on each attack and its number of occurrences.
With this system, the level at which enterprise security operation and maintenance service platforms are built can be raised and their construction cost reduced.
Drawings
FIG. 1 is a schematic diagram of the conversion of original log format to JSON according to the present invention;
FIG. 2 illustrates the main stages of the artificial intelligence based analysis technique of the present invention;
FIG. 3 illustrates the main stages of the big data based architecture according to the present invention;
FIG. 4 is a schematic diagram of a security operation and maintenance management platform system according to the present invention;
Detailed Description
The invention is described in further detail below with reference to the figures and examples:
The system provided by this patent starts from the specification of unstructured log files. Once unstructured log data has been retrieved, it can be stored and processed further. Extracting data from logs is a laborious technical task because log data arrives in many heterogeneous formats. To extract log data properly, this patent uses the Python programming language for its flexibility, its efficiency and the relative ease with which it handles analysis tasks. In the Python program, a class library is used so that the parser can be built directly in Python code.
In this patent, the result of the log preprocessing stage is a JSON (JavaScript Object Notation) file containing variables that correspond to the log fields, as shown in FIG. 1. JSON is a lightweight data-interchange format that is easy for computers to parse and use. Compared with other structured data-interchange formats such as XML, JSON performs markedly better, with parsing up to one hundred times faster. The analysis is based on the RF (Random Forest) method, an artificial intelligence technique used here to discover and detect attack-related events in the log.
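The conversion from a raw log line to a JSON record can be illustrated with a minimal sketch. It assumes the input is in the Common Log Format (CLF) listed among the abbreviations; the regular expression, the field names and the function `clf_to_json` are illustrative choices, not the patent's parser, and real deployments would need one pattern per log source.

```python
import json
import re

# Common Log Format: host ident authuser [timestamp] "request" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<authuser>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)


def clf_to_json(line):
    """Convert one CLF line into a JSON string; return None if the line does not parse."""
    match = CLF_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    record["status"] = int(record["status"])
    # CLF writes "-" when no bytes were sent.
    record["bytes"] = 0 if record["bytes"] == "-" else int(record["bytes"])
    return json.dumps(record)
```

For example, `clf_to_json('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326')` yields a JSON object whose keys mirror the CLF fields, ready for the MAP stage.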
FIG. 2 shows the three main stages of the artificial intelligence technique. For clarity, the binary-based data structure and algorithm are described in terms of the MapReduce big data architecture.
To analyze how often the security attacks detected in the log occur, the patent provides a big-data-based artificial intelligence technique. It processes the JSON data and creates two data structures: attName, which stores the name of each security attack, and attFreq, which stores the number of occurrences (frequency) of each detected attack and of each combination of attacks.
Fig. 3 shows three phases of the big data architecture: preprocessing stage, MAP stage and Reduce stage:
1. Preprocessing stage (first stage): at this stage, the two data structures attName and attFreq are created. The size of the attFreq array depends on the number n of attacks that have been detected. For example, if n = 5, the attFreq array has $2^{n} = 2^{5} = 32$ entries, one for each possible combination of the 5 attacks.
Assume that attacks A, B and C are stored in attName at positions 1, 2 and 3 (zero-based indices i = 0, 1 and 2). If both A and C are found in the log, the index of this combination in attFreq is 5, obtained by binary translation: the combination {A, C} corresponds to the bits 101, which is binary for decimal 5. The attFreq index is therefore $2^{0} + 2^{2} = 1 + 4 = 5$.
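The index arithmetic above can be checked with a short sketch. It assumes zero-based indices into attName (so A, B and C map to bits 0, 1 and 2) and an attFreq array of 2**n slots, slot 0 standing for "no attack detected"; the function names are hypothetical.

```python
def attfreq_size(n):
    """Number of attFreq slots for n attacks: one per subset of attacks (2**n)."""
    return 2 ** n


def combination_index(att_name, detected):
    """Loc = sum of 2**i over the zero-based attName indices i of the detected attacks."""
    loc = 0
    for i, name in enumerate(att_name):
        if name in detected:
            loc += 2 ** i
    return loc


def decode_index(att_name, loc):
    """Inverse mapping: list the attacks whose bit is set in loc."""
    return [name for i, name in enumerate(att_name) if loc & (1 << i)]
```

Because each attack owns one bit, every combination maps to a distinct index and the mapping can be reversed with `decode_index`, which is what lets attFreq count attack combinations without storing the combinations themselves.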
2. MAP phase (second phase): at this stage, the artificial intelligence algorithm is executed by scanning the input JSON variables. Security attacks are detected in real time by comparing the JSON variables with a series of specific regular expressions (e.g., the rules in this patent's logcorrelator.conf), which serve as features for identifying different attack patterns.
For each attack detected in the log, its ID i (the attack's index in attName) is used to determine the corresponding attFreq index, named Loc: $\text{Loc} = \sum_{i \in D} 2^{i}$, where $D$ is the set of attName indices of the attacks detected in the log record.
The following algorithm describes the overall MAP phase, where i is the index of the current attack in attName. The output of the MAP phase is a key-value pair consisting of the attFreq index and a count of 1; these key-value pairs are the input to the Reduce stage:
Begin
    loc ← 0
    For each i in attName
        If attack i is detected in the log record
            loc ← loc + 2^i
        End if
    End for
    Output [loc, 1]
End
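The MAP pseudocode above can be sketched in Python as follows. The attack names and regular-expression signatures are hypothetical placeholders standing in for the rules that the patent loads from logcorrelator.conf; the sketch concatenates all JSON field values and tests each signature against them, which is one simple reading of "comparing JSON variables to regular expressions".

```python
import json
import re

# Hypothetical attack list and signatures; in the patent they come from logcorrelator.conf.
ATT_NAME = ["sql_injection", "xss", "ssh_brute_force"]
SIGNATURES = {
    "sql_injection": re.compile(r"(?i)union\s+select|' or '1'='1"),
    "xss": re.compile(r"(?i)<script\b"),
    "ssh_brute_force": re.compile(r"Failed password for"),
}


def map_record(json_line):
    """MAP step: emit (loc, 1), where bit i of loc is set if attack ATT_NAME[i] matches."""
    record = json.loads(json_line)
    text = " ".join(str(value) for value in record.values())
    loc = 0
    for i, name in enumerate(ATT_NAME):
        if SIGNATURES[name].search(text):
            loc += 2 ** i
    return (loc, 1)
```

Like the pseudocode, the function emits a pair even when nothing matches (loc = 0); a deployment could filter those pairs out before the shuffle to save bandwidth.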
3. Reduce phase (third stage): at this stage, the Hadoop/Spark worker nodes redistribute the data by key based on the output of the MAP phase. The Reduce method then adds up, in parallel, the values output by each MAP task. After execution, the array attFreq holds the result of the Reduce method; its frequencies can be sorted so that the indices are ordered from highest to lowest frequency.
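A single-process sketch of the Reduce step is shown below. In Hadoop or Spark the framework groups the (loc, 1) pairs by key across worker nodes (in Spark, `reduceByKey` with addition plays this role); here a `Counter` stands in for that grouping. The helper names are hypothetical.

```python
from collections import Counter


def reduce_pairs(pairs):
    """REDUCE step: sum the counts emitted for each attFreq index (loc)."""
    totals = Counter()
    for loc, count in pairs:
        totals[loc] += count
    return totals


def to_attfreq(totals, n):
    """Materialize the dense attFreq array of size 2**n from the reduced totals."""
    att_freq = [0] * (2 ** n)
    for loc, count in totals.items():
        att_freq[loc] = count
    return att_freq


def rank(att_freq):
    """Return (index, frequency) pairs with non-zero frequency, highest frequency first."""
    nonzero = [(i, f) for i, f in enumerate(att_freq) if f]
    return sorted(nonzero, key=lambda pair: pair[1], reverse=True)
```

For instance, the MAP outputs `[(5, 1), (2, 1), (5, 1), (0, 1)]` reduce to a count of 2 for index 5 (attacks at bits 0 and 2 together), which `rank` places first.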
FIG. 4 shows the framework of the security operation and maintenance management platform according to the present invention:
1. Preprocessing stage
This part of the program is written in Python. The massive logs are collected in real time from different security devices, network devices, databases, operating systems, middleware, etc. To preprocess these heterogeneous logs, a rule-based (regular-expression-based) approach is used, which eliminates redundant or useless log data. Each rule has a specific format containing several fields: the type field indicates the type of the rule; the pattern field identifies the input event; the ptype field indicates the type of the pattern field; the desc field is a description of the rule; and the action field specifies how an alarm is raised (e.g., short message, alarm box, e-mail) each time the event occurs.
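The rule fields just listed can be modeled with a small data class, and rule-based filtering then amounts to keeping only the lines that some rule matches. This is a minimal sketch under assumptions: only the `RegExp` pattern type is handled, the JSON keys `event`, `desc` and `action` are illustrative, and lines that match no rule are treated as the redundant data to be discarded.

```python
import json
import re
from dataclasses import dataclass


@dataclass
class Rule:
    type: str     # rule type, e.g. "single"
    ptype: str    # type of the pattern field; only "RegExp" is handled here
    pattern: str  # pattern that identifies an input event
    desc: str     # description of the rule
    action: str   # alarm method, e.g. "sms", "alarm_box", "email"


def preprocess(lines, rules):
    """Keep lines matched by some rule and emit them as JSON; unmatched lines are dropped."""
    out = []
    for line in lines:
        for rule in rules:
            if rule.ptype == "RegExp" and re.search(rule.pattern, line):
                out.append(json.dumps(
                    {"event": line.strip(), "desc": rule.desc, "action": rule.action}))
                break  # first matching rule wins in this sketch
    return out
```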
After preprocessing, the log is changed to JSON format.
2. MAP phase
Tracking events in real time makes it possible to discover relationships between different events, and doing so is a common way to derive higher-level knowledge from log information. Because a very large number of events occur on the network, the system must decide which of these thousands of events to skip in order to avoid unnecessary processing.
3. Reduce phase
Detected attacks are counted in real time.
The system provided by this patent is implemented mainly with three files: main.py, logcorrelator.conf and logWatcher.conf. They are briefly introduced below:
1. main.py
Execution of main.py starts from main(). First, main() reads the configuration file logcorrelator.conf and loads the rules into memory. It then searches for events that match the rules. When an event is found to match a rule, the action defined for that rule (e.g., the alarm method) is looked up.
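The match-then-act loop of main() can be sketched as follows. The `matchers` and `actions` dictionaries mirror those filled by initFromConf(), but their contents here, the handler table `HANDLERS` and the function `process` are hypothetical; a real system would send an SMS or e-mail instead of formatting a string.

```python
import re

# Hypothetical results of initFromConf(): rule name -> regex, rule name -> action name.
matchers = {"Rule1": r"Accepted password for (\S+)", "Rule4": r"Failed password for (\S+)"}
actions = {"Rule4": "alarm_box"}

# Hypothetical alarm handlers keyed by action name.
HANDLERS = {
    "alarm_box": lambda rule, line: f"[ALARM] {rule}: {line}",
    "email": lambda rule, line: f"[EMAIL] {rule}: {line}",
}


def process(lines):
    """Match every line against every rule and run the rule's action, if it has one."""
    alerts = []
    for line in lines:
        for rule, pattern in matchers.items():
            if re.search(pattern, line):
                action = actions.get(rule)
                if action in HANDLERS:
                    alerts.append(HANDLERS[action](rule, line))
    return alerts
```

Rules without an action (Rule1 here) still match but raise no alarm, which corresponds to rules used only to track events.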
2. logcorrelator.conf
The function initFromConf() is called from main() to initialize the system by reading the configuration file.
The initFromConf() procedure is as follows:
import configparser

matchers = {}      # rule section -> regular expression to match
actions = {}       # rule section -> action to take when the rule matches
failed_limit = 0   # threshold read from the "windows" option of Rule4

def initFromConf():
    global failed_limit
    config = configparser.ConfigParser()
    config.read("logcorrelator.conf")
    for section in config.sections():
        for option in config.options(section):
            if option == "match":
                matchers[section] = config.get(section, option)
            elif option == "windows" and section == "Rule4":
                failed_limit = int(config.get(section, option))
                print("failed: " + str(failed_limit))
            elif option == "action":
                actions[section] = config.get(section, option)
The working process of a rule is outlined as follows:
The rule shown in the following box is of type single; it checks for the "accepted password" string. The continue field specifies where processing continues after the pattern matches: once an event has matched a rule, the configuration file is immediately searched for the next rule, namely the one named in the continue field. When an event matches a rule, the corresponding action is executed immediately (this particular rule has no action). The rule thus finds successful password logins over SSH connections.
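Since the original rule box is not reproduced, the sketch below uses a hypothetical rule file in the same spirit to show how a continue field can chain rules. The section names, option values and the interpretation of continue (jump to the named rule after a match, otherwise stop; fall through to the next rule on no match) are assumptions, and only regular-expression patterns are evaluated.

```python
import configparser
import re

# Hypothetical rule file; names, patterns and the continue target are illustrative only.
RULES = r"""
[Rule1]
type = single
ptype = RegExp
pattern = Accepted password for (\S+) from (\S+)
desc = Successful SSH password login
continue = Rule2

[Rule2]
type = single
ptype = RegExp
pattern = Failed password for (\S+) from (\S+)
desc = Failed SSH password login
action = alarm_box
"""


def load_rules(text):
    """Parse INI-style rules into {section: {option: value}}, preserving file order."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


def evaluate(rules, line):
    """Return (rule, desc, action) for each rule hit, following continue after a match."""
    names = list(rules)
    hits, seen, i = [], set(), 0
    while i < len(names) and names[i] not in seen:
        name = names[i]
        seen.add(name)  # guards against continue cycles
        rule = rules[name]
        if re.search(rule["pattern"], line):
            hits.append((name, rule["desc"], rule.get("action")))
            target = rule.get("continue")
            if target not in rules:
                break  # no (valid) continue target: stop after this match
            i = names.index(target)
        else:
            i += 1  # no match: try the next rule in file order
    return hits
```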
3. logWatcher.conf
This is an auxiliary file for real-time analysis. Once main() has read the current log file, control passes to logWatcher, which polls for new log entries in real time and applies the same rules and actions to them. In this way, main.py, logcorrelator.conf and logWatcher.conf are interrelated and run together to accomplish the task of detecting security attacks.
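The polling behavior attributed to logWatcher can be sketched as a "tail -f" loop over a single log file. This is an assumed design, not the patent's code: it remembers a byte offset, reads only complete new lines on each poll, restarts from the beginning if the file shrinks (for example after rotation), and applies regex rules shaped like the `matchers` dictionary built by initFromConf().

```python
import os
import re
import time


def read_new_lines(path, offset):
    """Return the complete lines appended to `path` after byte `offset`, and the new offset."""
    if os.path.getsize(path) < offset:
        offset = 0  # file was truncated or rotated: start again from the beginning
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Consume only whole lines; a partial trailing line waits for the next poll.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end


def watch(path, matchers, poll_seconds=1.0, max_polls=None):
    """Poll `path` and yield (rule, line) for every rule whose regex matches a new line."""
    offset = os.path.getsize(path)  # older lines are assumed to have been handled by main()
    polls = 0
    while max_polls is None or polls < max_polls:
        lines, offset = read_new_lines(path, offset)
        for line in lines:
            for rule, pattern in matchers.items():
                if re.search(pattern, line):
                    yield rule, line
        polls += 1
        time.sleep(poll_seconds)
```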
The above description is only a preferred embodiment of the present invention and is not intended to limit its scope; all equivalent changes and modifications made according to the present invention fall within its scope.
Claims (1)
1. A system for detecting security attacks in real time based on artificial intelligence and MapReduce, characterized in that a binary-based data structure and algorithm use the MapReduce big data architecture, and three files, namely main.py, logcorrelator.conf and logWatcher.conf, are correlated with one another and run simultaneously to accomplish the task of detecting security attacks;
the system also comprises a preprocessing stage, a MAP stage and a Reduce stage;
in the preprocessing stage, two data structures attName and attFreq are created, wherein attName stores the names of the security attacks and attFreq stores the number of occurrences (frequency) of each detected attack and of each combination of attacks; the size of the attFreq array depends on the number n of attacks that have been detected and is $2^{n}$, corresponding to the $2^{n}$ possible combinations of attacks; assuming that attacks A, B and C are stored in attName at positions 1, 2 and 3, respectively, if both A and C are found in the log, the index of this combination in attFreq is 5, determined by binary translation: A and C correspond to binary 101, which is decimal 5, so the attFreq index is $2^{0} + 2^{2} = 5$; the preprocessing stage comprises a log real-time acquisition module and a log real-time analysis module;
in the MAP stage, an artificial intelligence algorithm is executed by scanning the input JSON variables, which contain variables corresponding to the log fields, and comparing them with a series of regular expressions that serve as features for identifying different attack patterns; for each attack detected in the log, its ID in attName is used to determine the corresponding attFreq index, named Loc, by the formula $\text{Loc} = \sum_{i \in D} 2^{i}$, where i is the index of an attack in attName and $D$ is the set of indices of the detected attacks; the MAP stage comprises a real-time event tracking module and a real-time attack detection module;
in the Reduce stage, the data output by each MAP task are added together in parallel, and the array attFreq holds the result of the Reduce method; the frequencies can be sorted so that the indices in the array are ordered from highest to lowest frequency; the Reduce stage comprises a real-time attack statistics module;
the log real-time acquisition module and the log real-time analysis module convert the original log into JSON format through preprocessing in the Python language;
the real-time event tracking module and the real-time attack detection module implement an artificial intelligence algorithm to track system events, their dependencies and scenarios in real time, learn the normal behavior of the system in real time, and detect security attacks in real time;
the real-time attack statistics module compiles real-time statistics on each attack and its number of occurrences;
the attacks and their numbers of occurrences are stored in the binary-based data structures attName and attFreq, respectively.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610546632.7A CN107623655B (en) | 2016-07-13 | 2016-07-13 | System for real-time detection attack based on artificial intelligence and MapReduce |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107623655A | 2018-01-23 |
CN107623655B | 2020-10-27 |
Family ID: 61086562
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610546632.7A Active CN107623655B (en) | 2016-07-13 | 2016-07-13 | System for real-time detection attack based on artificial intelligence and MapReduce |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107623655B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104935601A (en) * | 2015-06-19 | 2015-09-23 | 北京奇虎科技有限公司 | Cloud-based method, device and system for analyzing website log safety |
CN105677615A (en) * | 2016-01-04 | 2016-06-15 | 北京邮电大学 | Distributed machine learning method based on weka interface |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9112895B1 (en) * | 2012-06-25 | 2015-08-18 | Emc Corporation | Anomaly detection system for enterprise network security |
CN104378371A (en) * | 2014-11-14 | 2015-02-25 | 浙江工业大学 | Network intrusion detection method for parallel AP cluster based on MapReduce |
CN104794399A (en) * | 2015-04-23 | 2015-07-22 | 北京北信源软件股份有限公司 | Terminal protection system and method based on massive program behavior data |
CN107579944B (en) * | 2016-07-05 | 2020-08-11 | 南京联成科技发展股份有限公司 | Artificial intelligence and MapReduce-based security attack prediction method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | GR01 | Patent grant | |