A security attack alarm location system for a Tachyon-based Spark big data platform
Technical field
The present invention relates to the technical fields of information security, the Spark big data platform, Flume log collection, the Kafka data exchange platform, HDFS, and the Tachyon distributed in-memory file system, and more particularly to a security attack alarm location system.
Background technology
The English abbreviations used in the present invention are as follows:
SOC: Security Operation Center
IDS: Intrusion Detection System
DDoS: Distributed Denial of Service (attack)
MIS: Management Information System
DMZ: Demilitarized Zone (isolation zone)
JMS: Java Message Service
APP: Application
SNMP: Simple Network Management Protocol
HDFS: Hadoop Distributed File System
ODBC: Open Database Connectivity
WMI: Windows Management Instrumentation
Safe production has always been the prerequisite for carrying out all work in an orderly manner, and it is also a veto criterion in the evaluation of leaders at every level. The network and information security operation and maintenance system is an important part of every enterprise's safe-operation work. Keeping the network running efficiently and stably is the foundation of all of an enterprise's business activities and normal operation.
The construction and improvement of enterprise information systems has effectively raised labor productivity and reduced operating costs. However, if a security incident, a fault, or a performance bottleneck occurs in any business system and cannot be discovered, handled, and recovered from in time, it directly affects all the services running on that system, disrupts the normal operating order of the enterprise, and prevents its business from proceeding normally. Security assurance for government and enterprise IT infrastructure is therefore especially important.
As the level of informatization in government and enterprises keeps rising, business systems become ever more closely connected and exchange data ever more frequently. Systems are linked by complex network or logical connections and exchange large volumes of data, so a single fault can escalate into an enterprise-wide network fault. If any one business system has a vulnerability, is infected by a virus, or comes under attack, the problem quickly spreads to other business systems and networks and may even paralyze the entire enterprise network.
Although the information security technology systems of some enterprises have begun to take shape, their information security operation and maintenance management systems still need to be improved, and their management capability needs to be strengthened. They lack in-depth mining of potential security hazards and security analysis based on a big data platform, and there are few tools for locating and analyzing security attack alarms. Because a macroscopic view of security system construction is missing, security management has blind spots and responsibilities are not effectively carried out.
At present, enterprise information security operation and maintenance management platforms have the following problems:
1. Security products and network devices are of many kinds and widely distributed, and unified data analysis and management is lacking;
2. The knowledge bases of security products and network devices are not unified, and a unified solution is lacking;
3. Security responsibilities are unclear, and specific responsibilities are not fully assigned;
4. The evaluation of information security operation and maintenance management is not thorough and lacks some necessary key indicators;
5. Events from different security devices, and even multiple events from the same device, lack intelligent correlation and aggregation analysis. The resulting volume of alarm information is huge, which makes it hard to analyze potential security hazards, find problems, and prevent incidents before they occur;
6. Information security events are not reported in time, faults are not diagnosed in time, and handling is inefficient and ineffective;
7. Information security events are not correlated with asset vulnerabilities, so many events receive no further handling or analysis;
8. Security problems on terminals cannot be conveniently audited and checked;
9. There is no good early-warning and handling process for emergencies;
10. There are few tools for locating and analyzing security attack alarms.
To varying degrees, these problems exist in the business systems and networks that enterprises have already built, and they are an obstacle to the stable improvement of enterprise security operation and maintenance management.
Therefore, an important problem that the design of information security operation and maintenance management must solve is how to use information technology to improve the effectiveness of enterprise security operation and maintenance management, eliminate the security operation hazards in each enterprise system, and design a security attack alarm location system based on a big data platform that optimizes information security management and maintenance and provides professional, efficient information security operation and maintenance services to all kinds of enterprises.
Summary of the invention
Having analyzed the above defects and deficiencies of enterprise information security operation and maintenance management, the present invention proposes a security attack alarm location system for a Tachyon-based Spark big data platform.
The technical task of the present invention is achieved as follows: a security attack alarm location system for a Tachyon-based Spark big data platform comprises an acquisition module, a security attack alarm locating module, and a view module.
The acquisition module collects the logs of various devices, preprocesses them, and transmits them in real time to the locating module (Spark Streaming). Preprocessing includes log filtering, merging, and format normalization, so that all logs follow a unified specification.
The security attack alarm locating module analyzes the collected logs in real time to obtain alarm information.
The view module queries and displays the information in the MySQL database, providing query and analysis of alarm information and log information.
Through the acquisition module, the system collects log information in the enterprise information system and pushes it in real time to the security attack alarm locating module. The locating module analyzes the logs in real time, produces alarm information, and sends it to the front-end page of the view module, which supports attack tracing, evidence collection, and querying.
Preferably, the acquisition module integrates Flume with Kafka (an open-source distributed messaging system developed by LinkedIn) to collect logs, consolidate them, and transmit them in real time. It can collect syslog logs, logs from monitored files, logs from TCP/UDP ports, and so on, and it interfaces well with Spark Streaming, so that preprocessed log information is transmitted to the locating module in real time.
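The preprocessing described above (filtering, merging, and format normalization) can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the source does not define the unified log specification, so the record fields, the RFC 3164-style syslog pattern, the fixed `year` argument, and the rule "merge consecutive identical messages from the same host" are all hypothetical. In the described system this work is done by Flume and Kafka rather than by standalone Python.

```python
import re
from datetime import datetime

# Hypothetical pattern for RFC 3164-style syslog lines, e.g.
# "<34>Oct 11 22:14:15 fw01 login failed for admin"
SYSLOG_RE = re.compile(
    r"^<(?P<pri>\d+)>(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) (?P<host>\S+) (?P<msg>.*)$"
)

def normalize(raw, year=2015):
    """Parse one syslog line into a unified dict, or return None if it does not match."""
    m = SYSLOG_RE.match(raw.strip())
    if m is None:
        return None  # filtered out: not a recognizable log line
    # Classic syslog timestamps carry no year, so one is supplied (assumption).
    ts = datetime.strptime(f"{year} {m.group('ts')}", "%Y %b %d %H:%M:%S")
    pri = int(m.group("pri"))
    return {
        "time": ts.isoformat(),
        "host": m.group("host"),
        "facility": pri // 8,   # syslog priority = facility * 8 + severity
        "severity": pri % 8,
        "message": m.group("msg"),
    }

def preprocess(lines):
    """Filter unparseable lines, normalize the rest, and merge consecutive repeats."""
    out = []
    for line in lines:
        rec = normalize(line)
        if rec is None:
            continue
        if out and out[-1]["host"] == rec["host"] and out[-1]["message"] == rec["message"]:
            out[-1]["count"] += 1  # merge a repeat of the previous record
            continue
        rec["count"] = 1
        out.append(rec)
    return out
```

Each normalized record keeps the time of the first occurrence and a `count` of merged repeats, so downstream analysis sees one record per burst of identical messages.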
After receiving log information, the security attack alarm locating module analyzes the logs in real time to obtain alarm information and transmits it to the view module. Meanwhile, the normalized log information is stored in HDFS, and the alarm information is stored in the MySQL database.
The security attack alarm locating module comprises an offline association submodule, an online association submodule, an alarm generation submodule, and an attack type discovery submodule.
The offline association submodule uses the historical log information stored in HDFS to build an alarm correlation analysis model and updates the knowledge base online.
The online association submodule uses the knowledge base to perform online association analysis.
The attack type discovery submodule performs cluster analysis on the alarm information to find its features and attack models.
The knowledge base comprises at least the following:
1. Part of the content of a single log is used as alarm information. For example, login, startup, and shutdown events in Windows logs can serve as alarm information, and ElasticSearch can be used for keyword search;
2. The frequency with which a particular event occurs per unit time is used, and the event is reported as alarm information. For example, three user password errors within 1 minute in Windows logs can be treated as a brute-force attack;
3. Association analysis across the logs of multiple devices is used, and the analysis result is reported as alarm information. For example, multiple logs with the same destination IP address but different source IP addresses can be treated as a DDoS attack.
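The frequency-based knowledge entry (item 2 above) can be sketched as a sliding-window counter. The Python sketch below is an illustration, not the patented implementation: the class name `FrequencyRule`, the use of integer timestamps in seconds, and the choice to reset the window after an alarm are assumptions. The defaults of 3 events in 60 seconds follow the example in the text, and events are assumed to arrive in time order.

```python
from collections import defaultdict, deque

class FrequencyRule:
    """Fire when `threshold` matching events for one key fall within `window` seconds."""

    def __init__(self, threshold=3, window=60):
        self.threshold = threshold
        self.window = window
        self.events = defaultdict(deque)  # key (e.g. user name) -> recent timestamps

    def observe(self, key, timestamp):
        """Record one event; return True if the rule fires for this key."""
        q = self.events[key]
        q.append(timestamp)
        # Drop events that have fallen out of the sliding window.
        while q and timestamp - q[0] >= self.window:
            q.popleft()
        if len(q) >= self.threshold:
            q.clear()  # reset so that one burst produces a single alarm
            return True
        return False
```

For example, feeding the rule one call per password-error log, keyed by user name, yields at most one brute-force alarm per burst of failures; other thresholds can be passed to the constructor.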
The security attack alarm locating module stores the alarm information in the MySQL database and also stores the logs related to that alarm information in the same database. By analyzing the log information stored in the MySQL database, information such as the attack source and the attack path can be further obtained.
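How alarms and their related logs might be stored together and then queried for attack source and path can be sketched as follows. This is a sketch under explicit assumptions: Python's built-in `sqlite3` stands in for MySQL so that the example is self-contained, and the two-table schema (`alerts`, `alert_logs`) and all column names are hypothetical, since the source does not specify a schema.

```python
import sqlite3

def open_store():
    """Create an in-memory store with a hypothetical two-table schema (SQLite stands in for MySQL)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE alerts (id INTEGER PRIMARY KEY, kind TEXT, target_ip TEXT, created TEXT);
        CREATE TABLE alert_logs (alert_id INTEGER REFERENCES alerts(id),
                                 log_time TEXT, device TEXT, src_ip TEXT, message TEXT);
    """)
    return db

def save_alert(db, kind, target_ip, created, logs):
    """Insert one alarm together with the logs that triggered it; return the alarm id."""
    cur = db.execute(
        "INSERT INTO alerts (kind, target_ip, created) VALUES (?, ?, ?)",
        (kind, target_ip, created),
    )
    alert_id = cur.lastrowid
    db.executemany(
        "INSERT INTO alert_logs VALUES (?, ?, ?, ?, ?)",
        [(alert_id, l["time"], l["device"], l["src_ip"], l["message"]) for l in logs],
    )
    db.commit()
    return alert_id

def attack_sources(db, alert_id):
    """Return (src_ip, hit_count) pairs for one alarm, most active source first."""
    return db.execute(
        "SELECT src_ip, COUNT(*) AS hits FROM alert_logs WHERE alert_id = ? "
        "GROUP BY src_ip ORDER BY hits DESC, src_ip",
        (alert_id,),
    ).fetchall()

def attack_path(db, alert_id, src_ip):
    """Return the devices that logged the given source for this alarm, in time order."""
    rows = db.execute(
        "SELECT device FROM alert_logs WHERE alert_id = ? AND src_ip = ? ORDER BY log_time",
        (alert_id, src_ip),
    ).fetchall()
    return [r[0] for r in rows]
```

Keeping the triggering logs next to the alarm is what makes tracing possible: the source query ranks originating addresses, and the path query lists the devices a given source touched in order.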
Compared with the prior art, the security attack alarm location system for a Tachyon-based Spark big data platform of the present invention has the following outstanding beneficial effects:
1. The distributed architecture of the big data platform is easy to scale out and in. The system can resize itself as the enterprise network changes in scale, so that resources are used effectively, and it overcomes the difficulty the prior art has in processing massive logs;
2. The location function improves alarm accuracy and eliminates false alarms, and it provides detailed alarm analysis that makes the work of operations staff easier;
3. Big data techniques are used for data mining and machine learning, so the massive historical log information collected can be used effectively; offline association analysis combined with the existing knowledge base can expand the knowledge base automatically;
4. A big data system is typically made up of subsystems, and data must flow continuously among them with high performance and low latency. Traditional enterprise messaging systems are not well suited to large-scale data processing. Kafka emerged to serve both online applications (messages) and offline applications (data files, logs). Furthermore, Flume NG can be integrated with Kafka, and the many source and sink components built into Flume NG can be used to collect logs from various devices and to transmit them through Kafka to the Spark big data platform, the Tachyon in-memory distributed file system, HDFS, the MySQL database, or the view front end for display and querying;
5. Intermediate results are stored in Tachyon, which avoids writing data to disk and allows in-memory data to be shared. Bypassing HDFS also reduces the resulting disk and network I/O. Moreover, because all cached data is stored in Tachyon, a JVM crash caused by a failed Spark task does not lose data.
Brief description of the drawings
Fig. 1 is the architecture diagram of the security attack alarm location system for a Tachyon-based Spark big data platform of the present invention;
Fig. 2 is the flow chart of the security attack alarm location system for a Tachyon-based Spark big data platform of the present invention.
Embodiment
The present invention is described in further detail below with reference to the accompanying drawings and examples:
The security attack alarm location system for a Tachyon-based Spark big data platform of the present invention integrates Flume with the Kafka distributed data exchange system to collect the logs of the various devices in the enterprise environment, preprocesses them in real time, and transmits them to the locating module. The locating module analyzes the logs in real time against the knowledge base, pushes the analysis results to the front end, and provides tracing and evidence collection for alarm information. The architecture, shown in Fig. 1, consists of: (1) the acquisition module, built on the Kafka distributed data exchange system; (2) the security attack alarm locating module, built on Spark Streaming; (3) the view module, which provides query and analysis of alarm information and of log information.
The acquisition module is the prerequisite for security attack alarm location and the bottom layer of the whole system. It can be implemented by integrating Flume with Kafka. This module is mainly responsible for collecting the logs of the various operating systems, routers, switches, and security devices in the enterprise network, preprocessing them in real time, and transmitting them to the locating module, that is, to the Spark Streaming system. The module forms a highly available, highly reliable, distributed system for collecting, aggregating, and transmitting massive logs. Apache Kafka is a distributed publish-subscribe messaging system. It was originally developed by LinkedIn and later became part of the Apache project. Log collection is implemented by integrating Apache Flume with Kafka. Using Kafka lowers processing latency and makes it easier to support multiple data sources and distributed data processing. Compared with log-centric systems such as Flume, Kafka provides equally efficient performance, stronger durability guarantees thanks to replication, and lower end-to-end latency. The stream of log or event messages collected by Flume passes through Kafka and, after preprocessing, is delivered in real time to the downstream Spark Streaming stream computing framework for processing; the results are transmitted in real time through Kafka to the view front end for display.
Furthermore, Flume can be integrated with Kafka, and the many source and sink components built into Flume can be used to collect logs from various devices and to transmit them through Kafka to the Spark big data platform, the Tachyon in-memory distributed file system, HDFS, the MySQL database, or the view front end for display and querying.
Flume can be integrated with Kafka through the following configuration. On the Flume agent acting as producer, the source is, for example, syslog and the sink is Kafka; on the Flume agent acting as consumer, the source is Kafka and the sink is logger. In addition, the spark-streaming-kafka_2.10 and spark-streaming-flume_2.10 dependencies must be added.
Kafka is integrated with Spark. The currently released Spark version (for example, Spark 1.3) supports two methods: a receiver-based method and a direct method (without a receiver). As a distributed message queue, Kafka offers excellent throughput together with high reliability and scalability. Kafka is used as the log transport middleware: it receives the logs sent by the various devices in the enterprise information system and, in response to requests from Spark Streaming, sends the logs to the Spark Streaming cluster in order. The Spark Streaming cluster connects to the Kafka cluster, obtains the logs one by one from it, and processes them. Spark Streaming can obtain data from the Kafka cluster in real time and keep it in its available memory. To facilitate front-end display and page requests, the processing results are written to the MySQL database.
Compared with traditional processing architectures, integrating Kafka with Spark Streaming has the following advantages: (1) the efficiency and low latency of the Spark framework ensure that Spark Streaming operations run in real time or near real time; (2) the rich APIs and high flexibility of the Spark framework allow fairly complex algorithms to be written concisely; (3) the highly consistent programming model makes Spark Streaming easy to learn and allows business logic to be reused across real-time and batch processing.
Tachyon is a distributed in-memory file system compatible with Spark. While relieving Spark's memory pressure, it also gives Spark the ability to read and write large volumes of data in memory quickly. Tachyon separates the storage function from Spark, so that Spark can concentrate on computation itself; this finer division of labor aims at higher execution efficiency and real-time performance. Tachyon also supports multiple underlying storage systems such as HDFS.
The security attack alarm locating module receives the log information sent by the acquisition module. On the one hand, it stores the normalized logs in HDFS; on the other hand, it analyzes the logs in real time against the knowledge base, obtains alarm information, and immediately transmits it to the front end. The MySQL database stores the alarm information and the logs of the current week. The knowledge base contains, for example: (1) part of the content of a single log used as alarm information; for example, login, startup, and shutdown events in Windows logs can serve as alarm information. (2) The frequency with which a particular event occurs per unit time, with the event reported as alarm information; for example, five user password errors within 3 minutes in Windows logs can be treated as a brute-force attack. (3) Association analysis across the logs of multiple devices, with the analysis result reported as alarm information; for example, multiple logs with the same destination IP address but different source IP addresses can be treated as a DDoS attack. The alarm information produced is pushed to the front end in real time so that alarms are raised promptly.
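The cross-device rule in (3) can be sketched in a few lines of Python. The sketch rests on assumptions: the field names `dst_ip` and `src_ip` and the threshold `min_sources` are hypothetical, because the source only says that many logs share a destination address while their source addresses differ. In the described system this analysis would run on Spark Streaming over logs from several devices.

```python
from collections import defaultdict

def detect_ddos(logs, min_sources=3):
    """Return {dst_ip: sorted source IPs} for destinations hit by at least `min_sources` distinct sources."""
    sources = defaultdict(set)
    for log in logs:
        # Logs may come from different devices; only the address pair matters here.
        sources[log["dst_ip"]].add(log["src_ip"])
    return {dst: sorted(srcs) for dst, srcs in sources.items() if len(srcs) >= min_sources}
```

Counting distinct sources rather than raw log lines keeps a single noisy host from being mistaken for a distributed attack.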
Preferably, the offline association submodule relies on the support of the Tachyon-based Spark big data platform for data mining, machine learning, and graph computation. It analyzes the historical logs stored in HDFS offline and combines the results with the knowledge base to produce new knowledge entries that the knowledge base does not yet contain, thereby discovering unknown attacks.
Preferably, the online association submodule analyzes logs in real time against the knowledge base.
Preferably, the alarm generation submodule obtains alarm information from the association submodules and immediately transmits it to the front end so that alarms are raised promptly.
Preferably, the attack type discovery submodule uses association analysis to find the features of the alarms and identify the attack type.
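The idea behind the offline association submodule, proposing knowledge entries that the knowledge base does not yet contain, can be illustrated with a deliberately simple stand-in. The source does not name a mining algorithm, so the sketch below swaps in plain co-occurrence counting: event types that appear together in at least `min_support` historical sessions are proposed as candidate entries. The notion of a "session" (for example, all events from one source address), the pair representation of a knowledge entry, and the threshold are all assumptions.

```python
from collections import Counter
from itertools import combinations

def mine_new_entries(sessions, knowledge_base, min_support=2):
    """Return event-type pairs seen together in at least `min_support` sessions
    and not already present in the knowledge base."""
    counts = Counter()
    for events in sessions:
        # Count each unordered pair once per session, in a canonical (sorted) order.
        for pair in combinations(sorted(set(events)), 2):
            counts[pair] += 1
    return sorted(
        pair for pair, n in counts.items()
        if n >= min_support and pair not in knowledge_base
    )
```

The returned pairs are only candidates; in the described system they would be merged into the knowledge base so that the online association submodule can use them.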
As shown in Fig. 2, the flow of the security attack alarm location system for a Tachyon-based Spark big data platform of the present invention is as follows:
(1) Real-time log collection: Flume integrated with Kafka collects the log information of all kinds of devices and normalizes the log format. The devices include various operating systems, routers, switches, and security devices. The collected logs are sent both to HDFS for storage and to the Tachyon-based Spark big data platform for attack alarm location;
(2) Offline association: relying on the support of the Tachyon-based Spark big data platform for data mining, machine learning, and graph computation, the historical logs stored in HDFS are analyzed offline and the results are combined with the knowledge base to produce new knowledge entries that the knowledge base does not yet contain, discovering unknown attack models and expanding the knowledge base;
(3) Online association: the Tachyon-based Spark big data platform analyzes the logs from Kafka in real time against the knowledge base; this generally produces many alarms;
(4) Alarm comparison: to make the alarms produced more useful for location, the Spark big data platform compares these alarms against the knowledge base according to a given algorithm, stores the comparison results in the knowledge base, and updates the knowledge base;
(5) Alarm level: the results of alarm comparison determine the severity level of each alarm; the result is stored in the knowledge base, which is updated. In general, the higher the alarm level, the greater the damage;
(6) Alarm aggregation: alarms are aggregated and classified according to a given algorithm, generating multiple clusters;
(7) Attack type discovery: the clusters generated by alarm aggregation yield multiple alarm graph structures (or model graphs) that are simpler than the original alarms; an attack model graph is a representation of an alarm graph. The knowledge base is updated;
(8) View display: the view module receives in real time the alarm information sent by the locating module, raises alarms, and provides functions such as log query.
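Steps (5) to (7) leave the algorithms unspecified, so the following Python sketch only shows one plausible shape for them. Every choice in it is an assumption: alarms are clustered per target with a time gap of `gap` seconds, the severity level grows with cluster size on a hypothetical scale of 1 to 3, and the "attack model graph" is reduced to the list of transitions between consecutive distinct alarm kinds in a cluster.

```python
def aggregate(alerts, gap=300):
    """Group alarms on the same target into clusters; a pause longer than
    `gap` seconds starts a new cluster for that target."""
    clusters = []
    open_cluster = {}  # target -> the cluster currently being extended for it
    for a in sorted(alerts, key=lambda a: a["time"]):
        current = open_cluster.get(a["target"])
        if current is None or a["time"] - current[-1]["time"] > gap:
            current = []
            clusters.append(current)
            open_cluster[a["target"]] = current
        current.append(a)
    return clusters

def level(cluster):
    """Hypothetical severity scale: more alarms in a cluster means a higher level (1 to 3)."""
    return min(3, 1 + len(cluster) // 2)

def attack_graph(cluster):
    """Return edges between consecutive distinct alarm kinds in a cluster,
    a simple stand-in for an attack model graph."""
    kinds = [a["kind"] for a in cluster]
    return [(x, y) for x, y in zip(kinds, kinds[1:]) if x != y]
```

With this shape, a cluster such as port scan followed by repeated brute-force alarms on one host collapses to a single edge from port scan to brute force, which is far easier to read than the raw alarm list.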
The foregoing is only a preferred embodiment of the present invention and is not intended to limit its scope of implementation; all equivalent changes and modifications made in accordance with the present invention are covered by the claims of the present invention.