CN104363222A - Hadoop-based network security event analysis method - Google Patents

Publication number
CN104363222A
Authority
CN
China
Prior art keywords
hadoop
data
hdfs
mapreduce
file
Prior art date
Legal status
Pending
Application number
CN201410630224.0A
Other languages
Chinese (zh)
Inventor
黄敏
Current Assignee
Inspur Electronic Information Industry Co Ltd
Original Assignee
Inspur Electronic Information Industry Co Ltd
Priority date
Filing date
Publication date
Application filed by Inspur Electronic Information Industry Co Ltd
Priority to CN201410630224.0A
Publication of CN104363222A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1416 Event detection, e.g. attack signature detection
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1433 Vulnerability analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a Hadoop-based network security event analysis method. It draws on Hadoop's efficiency, fault tolerance, scalability, reliability, and open-source availability in massive data processing, and on the fault tolerance and scalability of HDFS, so that users can deploy Hadoop on ordinary low-cost hardware to form a distributed system. MapReduce provides a framework for developing parallel applications and carries out distributed computation and parallel task processing on the cluster; HDFS supplies file operations, storage, and related support during MapReduce task processing. A data acquisition system collects network security event information from each network security device and generates data files, which are stored in HDFS through an API or a command; HDFS distributes the data across multiple nodes built from ordinary hardware. MapReduce then analyzes the event information and outputs the analysis results for display. Building on HDFS, MapReduce distributes, tracks, and executes tasks and collects the results; the two components interact to complete the main work of a Hadoop distributed cluster.

Description

A Hadoop-based network security event analysis method
Technical field
The present invention relates to the field of network security, and in particular to a Hadoop-based network security event analysis method.
Background
A network security management platform provides visual monitoring and configuration of the overall network security state, reduces the complexity of manual analysis and management, saves network security staff effort, and supplies a technical basis for rapid emergency response. As informatization expands, however, the scope of network security protection grows wider and security management grows more complex. The more application systems are in use, the larger the volume of security data and the higher the demands on the platform's data processing; the data to be processed may reach the TB or even PB scale. If processing efficiency cannot keep up with the growing data volume, the applicability, availability, and reliability of the network security management platform suffer, and the cost of manual maintenance may rise sharply. With the arrival of the big data era, analyzing and processing massive numbers of events will be one of the most severe challenges facing network security management platforms.
Hadoop is an open-source distributed computing platform from Apache. Because of its efficiency, fault tolerance, scalability, and reliability in massive data processing, and because it is open source, it has been widely adopted in industry and research: Yahoo uses Hadoop to support research on its advertising system and web search, Facebook uses it to support data analysis and machine learning, Baidu uses it for search log analysis and web data mining, and Taobao's Hadoop system stores and processes data related to e-commerce transactions.
The Hadoop Distributed File System (HDFS) is a distributed file system. HDFS is highly fault-tolerant and is designed to be deployed on low-cost hardware. It provides high-throughput access to application data and suits applications with large data sets. HDFS relaxes some POSIX requirements to enable streaming access to file system data. HDFS was originally built as infrastructure for the open-source Apache Nutch project; it is part of the Hadoop project, which itself began as part of Lucene.
MapReduce is a software architecture proposed by Google for parallel computation on large data sets (larger than 1 TB). The concepts "Map" and "Reduce", and their main ideas, are borrowed from functional programming, along with features borrowed from vector programming languages. [1] In current implementations, the user specifies a Map function that maps a set of key-value pairs to a new set of key-value pairs, and a concurrent Reduce function that ensures that all mapped key-value pairs sharing the same key are grouped together.
Summary of the invention
The technical problem to be solved by the present invention is as follows: the invention applies Hadoop to the processing of massive numbers of events and proposes a Hadoop-based network security event analysis method, intended to offer an approach to improving how efficiently network security management platforms process large data volumes in the future.
The technical solution adopted by the present invention is:
A Hadoop-based network security event analysis method that draws on Hadoop's efficiency, fault tolerance, scalability, reliability, and open-source availability in massive data processing, and on the fault tolerance and scalability of HDFS, so that users can deploy Hadoop on ordinary low-cost hardware to form a distributed system. MapReduce provides a framework for developing parallel applications and carries out distributed computation and parallel task processing on the cluster; HDFS supplies file operations, storage, and related support during MapReduce task processing. A data acquisition system collects network security event information from each network security device and generates data files, which are stored in HDFS through an API or a command; HDFS distributes the data across multiple nodes built from ordinary hardware. MapReduce is then used to analyze the event information and output the analysis results for display. Building on HDFS, MapReduce distributes, tracks, and executes tasks and collects the results; the two components interact to complete the main work of the Hadoop distributed cluster.
The method uses a distributed-storage HDFS cluster consisting of one NameNode and several DataNodes, as shown in Fig. 1. The NameNode acts as the master server; it manages the file system namespace and client access to files. The DataNodes manage the data stored in the cluster. HDFS lets users store data as files. Internally, a file is divided into several data blocks (Block) that are stored on a set of DataNodes. The NameNode executes file system namespace operations such as opening, closing, and renaming files or directories, and it also determines the mapping of data blocks to specific DataNodes. The DataNodes serve read and write requests from file system clients and create, delete, and replicate data blocks under the unified scheduling of the NameNode.
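The division of labor between the NameNode and the DataNodes can be illustrated with a toy in-memory model. This is only a sketch and not the HDFS implementation: the class name ToyNameNode, the 4-byte block size, the replica count of 2, and the round-robin placement are all assumptions chosen to keep the example small.

```python
from dataclasses import dataclass, field

BLOCK_SIZE = 4    # bytes; deliberately tiny (assumption, real blocks are far larger)
REPLICATION = 2   # hypothetical replica count for this sketch


@dataclass
class ToyNameNode:
    """Holds metadata only: the namespace and the block-to-DataNode mapping."""
    datanodes: list
    namespace: dict = field(default_factory=dict)   # path -> [block_id, ...]
    block_map: dict = field(default_factory=dict)   # block_id -> [datanode, ...]

    def create(self, path, data, storage):
        """Split data into blocks and record where each replica is placed."""
        block_ids = []
        for i in range(0, len(data), BLOCK_SIZE):
            index = i // BLOCK_SIZE
            block_id = f"{path}#blk{index}"
            # Round-robin placement (assumption); real policies are more involved.
            start = index % len(self.datanodes)
            targets = [self.datanodes[(start + r) % len(self.datanodes)]
                       for r in range(REPLICATION)]
            for node in targets:
                storage[node][block_id] = data[i:i + BLOCK_SIZE]
            self.block_map[block_id] = targets
            block_ids.append(block_id)
        self.namespace[path] = block_ids

    def read(self, path, storage):
        """Look up block locations, then fetch each block from its first replica."""
        return b"".join(storage[self.block_map[b][0]][b]
                        for b in self.namespace[path])


datanodes = ["dn1", "dn2"]
storage = {dn: {} for dn in datanodes}   # stands in for each DataNode's local disk
nn = ToyNameNode(datanodes)
nn.create("/events/fw.log", b"DENY tcp 22", storage)
restored = nn.read("/events/fw.log", storage)
```

The NameNode object never stores file contents; it only answers the question of which DataNode holds which block, which mirrors the separation of metadata and data described above.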
The MapReduce data processing flow takes a set of input key/value pairs and produces a set of output key/value pairs, using the two functions of the MapReduce library, Map and Reduce. A MapReduce job splits the input data set into independent chunks that Map tasks process in parallel; the framework first sorts the Map output and then passes the result to the Reduce tasks. The input and output of the job are both stored in the file system, and each Map task and each Reduce task can run on a separate compute node at the same time.
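The split, parallel Map, sort, and Reduce sequence can be mimicked on a single machine. The following is a minimal sketch, assuming a thread pool stands in for the cluster's compute nodes and that each event record starts with its event type; the function names and the sample records are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import groupby
from operator import itemgetter


def split_input(records, n_splits):
    """Cut the input data set into independent chunks, one per Map task."""
    return [records[i::n_splits] for i in range(n_splits)]


def map_task(chunk):
    """Map: emit one (event type, 1) pair per record in the chunk."""
    return [(record.split()[0], 1) for record in chunk]


def reduce_task(key, values):
    """Reduce: collapse all values that share a key into one output pair."""
    return key, sum(values)


def run_job(records, n_splits=2):
    chunks = split_input(records, n_splits)
    with ThreadPoolExecutor(max_workers=n_splits) as pool:
        map_outputs = list(pool.map(map_task, chunks))  # Map tasks run in parallel
    # Sort the combined Map output by key, then hand each key group to Reduce.
    pairs = sorted((pair for out in map_outputs for pair in out),
                   key=itemgetter(0))
    return [reduce_task(key, [value for _, value in group])
            for key, group in groupby(pairs, key=itemgetter(0))]


events = ["scan 10.0.0.5", "login_fail 10.0.0.7", "scan 10.0.0.9", "scan 10.0.0.5"]
result = run_job(events)
```

Sorting before grouping matters here: itertools.groupby only merges adjacent items, so the sort plays the same role as the sort of Map output that precedes the Reduce tasks.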
The Hadoop cluster uses a master/slave (Master/Slave) architecture. In the Hadoop framework, the NameNode and JobTracker belong to the master and the DataNodes and TaskTrackers belong to the slaves; there is only one master and there are multiple slaves.
The method proceeds as follows:
1) The data acquisition system collects network security event information from each network security device and generates data files, which are stored in HDFS through an API or a command; HDFS distributes the data across multiple nodes built from ordinary hardware;
2) MapReduce is used to analyze the event information and output the analysis results for display. The input to MapReduce is the network security event information stored in HDFS (multiple formats such as text, binary, and database are supported). To analyze the event information with MapReduce, the user must define custom Mapper and Reducer functions;
3) Hadoop splits the input file into individual (key1, value1) pairs according to the configured InputDataFormat and passes the set of (key1, value1) pairs to the map function as input. From each input (key1, value1), the map function produces intermediate data (key2, value2), which is exchanged between nodes;
4) After map processing completes, Hadoop groups (sorts) the generated intermediate data (key2, value2) by key2 to form <key2, list(value2)> and passes it to the reduce function, which produces the program's final output <key3, value3>;
5) reduce writes its output to the result file; the output data format is used to configure the format of the output file.
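Steps 3) to 5) can be traced through the key/value types they name. The sketch below is an illustration only, assuming a hypothetical comma-separated event line of the form timestamp,source IP,severity; the functions text_input_format, mapper, shuffle, reducer, and text_output_format are invented for this example and are not Hadoop APIs.

```python
import io
from collections import defaultdict


def text_input_format(text):
    """Step 3: yield (key1, value1) = (character offset of the line, line text)."""
    offset = 0
    for line in text.splitlines(keepends=True):
        yield offset, line.rstrip("\n")
        offset += len(line)


def mapper(key1, value1):
    """Step 3: emit (key2, value2) = (source IP, severity) for one event line."""
    _timestamp, src_ip, severity = value1.split(",")
    yield src_ip, int(severity)


def shuffle(pairs):
    """Step 4: group by key2 into <key2, list(value2)>, in sorted key order."""
    groups = defaultdict(list)
    for key2, value2 in pairs:
        groups[key2].append(value2)
    return sorted(groups.items())


def reducer(key2, values2):
    """Step 4: emit (key3, value3) = (source IP, (event count, highest severity))."""
    yield key2, (len(values2), max(values2))


def text_output_format(results, out):
    """Step 5: write one tab-separated line per (key3, value3) pair."""
    for key3, (count, worst) in results:
        out.write(f"{key3}\t{count}\t{worst}\n")


raw = "t1,10.0.0.5,3\nt2,10.0.0.7,1\nt3,10.0.0.5,5\n"
intermediate = [kv for k1, v1 in text_input_format(raw) for kv in mapper(k1, v1)]
results = [kv for k2, vs in shuffle(intermediate) for kv in reducer(k2, vs)]
sink = io.StringIO()   # stands in for the result file
text_output_format(results, sink)
output = sink.getvalue()
```

Keeping the input format, mapper, reducer, and output format as separate functions reflects the way the steps assign each concern to a different configurable piece; only the mapper and reducer carry the analysis logic.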
Beneficial effects of the present invention: the invention applies Hadoop to the processing of massive numbers of events. The Hadoop cluster architecture based on HDFS and MapReduce matches the application model of a network security management platform, and the proposed Hadoop-based network security event analysis method greatly improves the platform's efficiency in processing large data volumes. It addresses the challenge that the big data era poses for massive event analysis on network security management platforms; provides visual monitoring and configuration of the overall network security state; reduces the complexity of manual analysis and management; saves network security staff effort; supplies a technical basis for rapid emergency response; broadens the applicability of the network security management platform; and offers considerable technical value for improving the processing of large data volumes on such platforms in the future.
Combining Hadoop's efficiency, fault tolerance, scalability, reliability, and open-source availability in massive data processing with a network security management platform makes it possible to cope with growing data volumes and increasingly complex security management, and to process network security data at the TB or even PB scale. This provides visual monitoring and configuration of the overall network security state, reduces the complexity of manual analysis and management, saves network security staff effort, supplies a technical basis for rapid emergency response, and improves the applicability, availability, and reliability of the network security management platform. HDFS uses a "write once, read many" access model for files: once a file has been created, written, and closed, it does not need to be modified. This simplifies data consistency and makes high-throughput data access possible, and MapReduce is well suited to this model. The data in a network security management platform comes mainly from the logs of network security devices and systems; to keep the data valid, it may not be modified once generated, and it is used mainly for retrieval, querying, and statistical analysis. The Hadoop cluster architecture based on HDFS and MapReduce therefore matches the application model of a network security management platform.
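The "write once, read many" model can be pictured with a small stand-in class. This is a toy sketch rather than HDFS behavior in detail: the class WriteOnceFile and its methods are hypothetical, and the sample log line is invented.

```python
class WriteOnceFile:
    """Toy file that accepts writes until it is closed and is immutable afterwards."""

    def __init__(self):
        self._records = []
        self._closed = False

    def write(self, record):
        if self._closed:
            raise PermissionError("closed files are immutable in this sketch")
        self._records.append(record)

    def close(self):
        self._closed = True

    def read(self):
        # Return a copy so that callers cannot alter the stored records.
        return list(self._records)


log = WriteOnceFile()
log.write("2014-11-11 DENY 10.0.0.5")
log.close()

first_read = log.read()
second_read = log.read()   # repeated reads see identical content

try:
    log.write("tampered entry")
    rejected = False
except PermissionError:
    rejected = True
```

Because nothing changes after close, every reader sees the same records without any coordination, which is the consistency simplification the paragraph above attributes to this access model and the reason it suits security logs that must not be altered.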
Brief description of the drawings
Fig. 1 is a schematic diagram of the HDFS structure of the present invention.
Detailed description of the embodiments
The present invention is further described below with reference to the accompanying drawing and a specific embodiment:
Three machines are used, each running Ubuntu 11.04. One (192.168.1.1) serves as the NameNode of the distributed file system HDFS and the JobTracker node of MapReduce; the other two (192.168.1.2 and 192.168.1.3) serve as HDFS DataNodes and MapReduce TaskTracker nodes. In the experimental environment, the NameNode starts and stops the processes on the DataNodes over SSH.
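The three-machine layout can be written down as data and checked against the master/slave rule of one master and several slaves. The addresses and roles come from the embodiment; the dictionary shape and the helper names hosts_with and validate are assumptions made for this sketch.

```python
# Role assignment taken from the embodiment; the data structure is an assumption.
CLUSTER = {
    "192.168.1.1": {"NameNode", "JobTracker"},
    "192.168.1.2": {"DataNode", "TaskTracker"},
    "192.168.1.3": {"DataNode", "TaskTracker"},
}
MASTER_ROLES = {"NameNode", "JobTracker"}
SLAVE_ROLES = {"DataNode", "TaskTracker"}


def hosts_with(role, cluster=CLUSTER):
    """Return the sorted addresses of all hosts that carry the given role."""
    return sorted(host for host, roles in cluster.items() if role in roles)


def validate(cluster):
    """Check the layout: exactly one master host, at least one separate slave host."""
    masters = {host for host, roles in cluster.items() if roles & MASTER_ROLES}
    slaves = {host for host, roles in cluster.items() if roles & SLAVE_ROLES}
    return len(masters) == 1 and len(slaves) >= 1 and not masters & slaves


masters = hosts_with("NameNode")
workers = hosts_with("DataNode")
layout_ok = validate(CLUSTER)
```

A check of this kind is cheap to run before starting any processes over SSH, since a layout with two hosts claiming the NameNode role would fail validation immediately.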
Combining the distributed storage of HDFS with the parallel distributed computing model of MapReduce, a Hadoop-based security event analysis prototype platform is built. The method proceeds as follows:
1) The data acquisition system collects network security event information from each network security device and generates data files, which are stored in HDFS through an API or a command; HDFS distributes the data across multiple nodes built from ordinary hardware;
2) MapReduce is used to analyze the event information and output the analysis results for display. The input to MapReduce is the network security event information stored in HDFS (multiple formats such as text, binary, and database are supported). To analyze the event information with MapReduce, the user must define custom Mapper and Reducer functions;
3) Hadoop splits the input file into individual (key1, value1) pairs according to the configured InputDataFormat and passes the set of (key1, value1) pairs to the map function as input. From each input (key1, value1), the map function produces intermediate data (key2, value2), which is exchanged between nodes;
4) After map processing completes, Hadoop groups (sorts) the generated intermediate data (key2, value2) by key2 to form <key2, list(value2)> and passes it to the reduce function, which produces the program's final output <key3, value3>;
5) reduce writes its output to the result file; the output data format is used to configure the format of the output file.
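Step 1) of the embodiment, in which collected events become a data file that is then stored in HDFS by command, might look like the following on the collecting machine. This is a sketch under assumptions: the CSV layout, the device names, and the target directory /security/events/ are hypothetical. The command is only assembled, not executed, and relies on the standard `hadoop fs -put <localsrc> <dst>` shell form.

```python
import csv
import os
import tempfile


def write_event_file(events, directory):
    """Write collected events to a local CSV data file and return its path."""
    path = os.path.join(directory, "events.csv")
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows(events)
    return path


def hdfs_put_argv(local_path, hdfs_dir):
    """Build, without running it, the argv for `hadoop fs -put <localsrc> <dst>`."""
    return ["hadoop", "fs", "-put", local_path, hdfs_dir]


# Hypothetical events gathered from two security devices.
events = [("firewall-1", "DENY", "10.0.0.5"), ("ids-1", "ALERT", "10.0.0.7")]

with tempfile.TemporaryDirectory() as workdir:
    local_file = write_event_file(events, workdir)
    with open(local_file, newline="") as handle:
        written = list(csv.reader(handle))
    argv = hdfs_put_argv(local_file, "/security/events/")
```

On a machine with a configured Hadoop client, the argv list could be passed to subprocess.run to perform the upload; it is left unexecuted here so the sketch runs without a cluster.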

Claims (5)

1. A Hadoop-based network security event analysis method, characterized in that: the method draws on Hadoop's efficiency, fault tolerance, scalability, reliability, and open-source availability in massive data processing, and on the fault tolerance and scalability of HDFS, so that users can deploy Hadoop on ordinary low-cost hardware to form a distributed system; MapReduce provides a framework for developing parallel applications and carries out distributed computation and parallel task processing on the cluster; HDFS supplies file operations, storage, and related support during MapReduce task processing; a data acquisition system collects network security event information from each network security device and generates data files, which are stored in HDFS through an API or a command; HDFS distributes the data across multiple nodes built from ordinary hardware; MapReduce is then used to analyze the event information and output the analysis results for display; building on HDFS, MapReduce distributes, tracks, and executes tasks and collects the results; and the two components interact to complete the main work of the Hadoop distributed cluster.
2. The Hadoop-based network security event analysis method according to claim 1, characterized in that: the method uses a distributed-storage HDFS cluster consisting of one NameNode and several DataNodes, wherein the NameNode acts as the master server and manages the file system namespace and client access to files; the DataNodes manage the data stored in the cluster; HDFS lets users store data as files; internally, a file is divided into several data blocks that are stored on a set of DataNodes; the NameNode executes file system namespace operations and also determines the mapping of data blocks to specific DataNodes; and the DataNodes serve read and write requests from file system clients and create, delete, and replicate data blocks under the unified scheduling of the NameNode.
3. The Hadoop-based network security event analysis method according to claim 1 or 2, characterized in that: the MapReduce data processing flow takes a set of input key/value pairs and produces a set of output key/value pairs, using the two functions of the MapReduce library, Map and Reduce; a MapReduce job splits the input data set into independent chunks that Map tasks process in parallel; the Map output is first sorted and the result is then passed to the Reduce tasks; the input and output of the job are both stored in the file system; and each Map task and each Reduce task can run on a separate compute node at the same time.
4. The Hadoop-based network security event analysis method according to claim 3, characterized in that: the Hadoop cluster uses a master/slave architecture; in the Hadoop framework, the NameNode and JobTracker belong to the master and the DataNodes and TaskTrackers belong to the slaves; there is only one master and there are multiple slaves.
5. The Hadoop-based network security event analysis method according to claim 4, characterized in that the method proceeds as follows:
1) The data acquisition system collects network security event information from each network security device and generates data files, which are stored in HDFS through an API or a command; HDFS distributes the data across multiple nodes built from ordinary hardware;
2) MapReduce is used to analyze the event information and output the analysis results for display; the input to MapReduce is the network security event information stored in HDFS; to analyze the event information with MapReduce, the user must define custom Mapper and Reducer functions;
3) Hadoop splits the input file into individual (key1, value1) pairs according to the configured InputDataFormat and passes the set of (key1, value1) pairs to the map function as input; from each input (key1, value1), the map function produces intermediate data (key2, value2), which is exchanged between nodes;
4) After map processing completes, Hadoop groups the generated intermediate data (key2, value2) by key2 to form <key2, list(value2)> and passes it to the reduce function, which produces the program's final output <key3, value3>;
5) reduce writes its output to the result file; the output data format is used to configure the format of the output file.
CN201410630224.0A 2014-11-11 2014-11-11 Hadoop-based network security event analysis method Pending CN104363222A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410630224.0A CN104363222A (en) 2014-11-11 2014-11-11 Hadoop-based network security event analysis method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410630224.0A CN104363222A (en) 2014-11-11 2014-11-11 Hadoop-based network security event analysis method

Publications (1)

Publication Number Publication Date
CN104363222A true CN104363222A (en) 2015-02-18

Family

ID=52530448

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410630224.0A Pending CN104363222A (en) 2014-11-11 2014-11-11 Hadoop-based network security event analysis method

Country Status (1)

Country Link
CN (1) CN104363222A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120131275A1 (en) * 2010-11-18 2012-05-24 Promise Technology, Inc Network-attached storage system
US20140196115A1 (en) * 2013-01-07 2014-07-10 Zettaset, Inc. Monitoring of Authorization-Exceeding Activity in Distributed Networks

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
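Wang Hongyan (王红艳): "A network security event analysis method based on the Hadoop architecture" (一种基于Hadoop架构的网络安全事件分析方法), Netinfo Security (《信息网络安全》) *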

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105391742A (en) * 2015-12-18 2016-03-09 桂林电子科技大学 Hadoop-based distributed intrusion detection system
CN105391742B (en) * 2015-12-18 2019-05-21 桂林电子科技大学 A kind of Distributed Intrusion Detection System based on Hadoop
CN105787009A (en) * 2016-02-23 2016-07-20 浪潮软件集团有限公司 Hadoop-based mass data mining method
CN107579944B (en) * 2016-07-05 2020-08-11 南京联成科技发展股份有限公司 Artificial intelligence and MapReduce-based security attack prediction method
CN107579944A (en) * 2016-07-05 2018-01-12 南京联成科技发展股份有限公司 Based on artificial intelligence and MapReduce security attack Forecasting Methodologies
CN106354876A (en) * 2016-09-22 2017-01-25 珠海格力电器股份有限公司 Data processing system and method
CN106502795A (en) * 2016-11-03 2017-03-15 郑州云海信息技术有限公司 The method and system of scientific algorithm application deployment are realized on distributed type assemblies
CN106330982A (en) * 2016-11-22 2017-01-11 湖南优图信息技术有限公司 Network security monitoring platform and method
CN107181612A (en) * 2017-05-08 2017-09-19 深圳市众泰兄弟科技发展有限公司 A kind of visual network method for safety monitoring based on big data
CN107562926A (en) * 2017-09-14 2018-01-09 丙申南京网络技术有限公司 For more hadoop distributed file systems of big data analysis
CN107562926B (en) * 2017-09-14 2023-09-26 丙申南京网络技术有限公司 Multi-hadoop distributed file system for big data analysis
CN108052679A (en) * 2018-01-04 2018-05-18 焦点科技股份有限公司 A kind of Log Analysis System based on HADOOP
CN109669987A (en) * 2018-12-13 2019-04-23 国网河北省电力有限公司石家庄供电分公司 A kind of big data storage optimization method
CN112148804A (en) * 2019-06-28 2020-12-29 京东数字科技控股有限公司 Data preprocessing method, device and storage medium thereof
CN112491624A (en) * 2020-11-30 2021-03-12 江苏极鼎网络科技有限公司 Distributed data distribution method
CN112491624B (en) * 2020-11-30 2023-05-23 江苏极鼎网络科技有限公司 Distributed data distribution method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150218