CN113347170B - Intelligent analysis platform design method based on big data framework - Google Patents


Info

Publication number
CN113347170B
Authority
CN
China
Prior art keywords
data
analysis
engine
subsystem
intelligent
Legal status
Active
Application number
CN202110585911.5A
Other languages
Chinese (zh)
Other versions
CN113347170A (en)
Inventor
唐延辉
冯政鑫
闫子淇
于丰齐
Current Assignee
Beijing Institute of Computer Technology and Applications
Original Assignee
Beijing Institute of Computer Technology and Applications
Application filed by Beijing Institute of Computer Technology and Applications
Priority to CN202110585911.5A
Publication of CN113347170A
Application granted
Publication of CN113347170B
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/20 Network architectures or network communication protocols for network security for managing network security; network security policies in general
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention relates to a design method for an intelligent analysis platform based on a big data framework, in the technical field of network security. Using a big data processing and analysis framework together with AI-empowered security analysis tools, the method collects security big data, mines security events, perceives security threats, and provides early warning and prevention in advance, thereby achieving intelligent analysis of the network security situation and raising the level of intelligent security protection. Technically, the platform is built on a big data framework and aggregates security situation data such as network traffic, user behavior, network boundaries, business systems, and host endpoints. It applies artificial intelligence techniques such as machine learning, deep learning, natural language processing, and knowledge graphs to perform fusion analysis and deep mining of the situation data, achieving real-time perception of security threat events and visual presentation of the network security situation, and improving the continuous security assurance capability of cyberspace.

Description

Intelligent analysis platform design method based on big data framework
Technical Field
The invention relates to the technical field of network security, and in particular to a design method for an intelligent analysis platform based on a big data framework.
Background
With the emergence and development of new technologies such as big data, artificial intelligence, the Internet of Things, cloud computing, and the mobile Internet, networks face a more complicated information security situation than ever before: new security problems keep surfacing, and new security incidents occur endlessly. These include not only various types of intrusion and attack from outside, but also information security risks arising from internal operational violations, improper management, and the like.
In the last two years, attacks on network information systems have become increasingly intelligent, automated, and weaponized. Traditional security defense systems have serious risks and shortcomings and cannot cope with increasingly complex and stealthy advanced threats: (1) single-point, fragmented protection measures leave information siloed and scattered, with no coordination; (2) signature-based threat detection cannot handle advanced attacks, and relying on a single technique makes it easy for attackers to recognize and evade; (3) a passive defense system provides no early-warning mechanism, reflecting gaps and neglect at the system design level; (4) security noise is high, so valuable information is easily buried, while staffing is insufficient and management is difficult.
To meet these needs, an intelligent security brain is urgently needed to sense internal and external security problems and network threats. Just as a person senses the environment through the organs of the whole body and makes an overall judgment with the brain, the security brain can detect subtle differences in cyberspace through AI-based behavior analysis services, including engines for malicious code analysis, abnormal behavior analysis, abnormal traffic analysis, encrypted traffic analysis, and comprehensive correlation analysis, and finally gives decision recommendations from the aggregated analysis to assist security administrators with duty monitoring, operation and maintenance, and similar work.
Disclosure of Invention
(I) Technical problem to be solved
The technical problem to be solved by the invention is: how to achieve fusion analysis and deep mining of situation data, real-time perception of security threat events, and visual presentation of the network security situation, so as to improve the continuous security assurance capability of cyberspace.
(II) technical scheme
To solve the above technical problem, the invention provides a design method for an intelligent analysis platform based on a big data framework, in which the platform is designed to comprise a data access subsystem, a big data service subsystem, an intelligent analysis framework subsystem, and an intelligent analysis engine subsystem;
the data access subsystem supports access to structured, unstructured, and semi-structured data;
the big data service subsystem, through the big data computing model and framework it adopts, provides real-time computation, querying, and indexing, and can perform continuous big data stream computation, including task assignment, task scheduling, task acquisition, task execution, and task submission;
for the different analysis engines in the intelligent analysis engine subsystem, the intelligent analysis framework subsystem provides a basic engine runtime environment and implements analysis engine installation, engine configuration management, and engine state analysis; it supports engine integration and extension, allocates engine resources automatically, and monitors engine resource usage and running state in real time. Its aim is to build a standard engine integration framework through which different intelligent analysis engines can be conveniently integrated, extended, and managed according to the standard, and which, while supporting the runtime environments of different engines and automatic resource allocation, provides basic capabilities for engine control, engine configuration modification, and engine state control;
the intelligent analysis engine subsystem runs on the runtime environment provided by the intelligent analysis framework subsystem, accesses the multi-source data processed by the big data service subsystem through a standard interface provided by the intelligent analysis framework subsystem, and provides AI-based behavior analysis services.
Preferably, the data access subsystem implements data access state monitoring, scheduling, and security auditing through data access management technology.
Preferably, structured data is transferred using Sqoop, which serves as the data acquisition tool for moving data between the data sources and the structured data stores in the RDBMS; for unstructured and semi-structured data, Flume is used to support the collection of log data and event data.
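The split between Sqoop (structured) and Flume (semi-structured and unstructured) amounts to a dispatch on data shape. The sketch below is a toy illustration of that dispatch only; it does not call Sqoop or Flume, and the collector labels, the rule "a dict is a structured row", and the rule "a JSON object or array is semi-structured" are all assumptions made for the example.

```python
import json

# Hypothetical collector labels standing in for the tools named in the text.
SQOOP = "sqoop"   # relational / structured sources
FLUME = "flume"   # log and event streams (semi-structured or unstructured)


def classify_record(raw):
    """Label a raw input as structured, semi-structured or unstructured."""
    if isinstance(raw, dict):
        return "structured"            # already a row of named columns
    try:
        parsed = json.loads(raw)
    except (TypeError, ValueError):    # json.JSONDecodeError subclasses ValueError
        return "unstructured"          # free text such as a syslog line
    # self-describing objects/arrays count as semi-structured; bare scalars do not
    return "semi-structured" if isinstance(parsed, (dict, list)) else "unstructured"


def pick_collector(raw):
    """Route structured rows to the Sqoop-style path, everything else to Flume."""
    return SQOOP if classify_record(raw) == "structured" else FLUME
```

In a real deployment the choice is usually made per source (a database table versus a log directory) rather than per record; the per-record form is used here only to keep the example self-contained.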
Preferably, the big data service subsystem uses MapReduce to classify data and to sort, clean, transform, and reduce the data from different acquisition devices, completing data preprocessing. After preprocessing, data is stored mainly in HDFS. An HDFS cluster consists of one NameNode and a number of DataNodes: the NameNode acts as the central server and manages the file system namespace and file addressing paths, while the DataNodes are the actual storage nodes, on which data is stored in blocks. Multiple NameNodes are kept as hot standbys through ZooKeeper, and when the active NameNode fails, a new one is elected. YARN is used as the scheduling basis, and both the raw input data and the computation results are stored on HDFS. The data nodes dynamically record each data unit, the total data volume is updated in real time, and the load index of each data unit is calculated; the load indexes of the nodes are then collected and sent to a management server, after which the nodes redistribute resources. The data is sharded, data routes are established, and an Elasticsearch search engine is built to facilitate data search.
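The load-index loop described above (record each data unit, recompute the total, compute per-node load indexes, then redistribute) can be sketched as follows. The patent does not define the load index; this sketch assumes load index = node usage divided by mean usage, and a greedy policy that moves the smallest block from the most-loaded node to the least-loaded one. `Node`, `load_indexes`, and `rebalance` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    blocks: dict = field(default_factory=dict)   # block id -> size in MB

    @property
    def used(self):
        return sum(self.blocks.values())


def load_indexes(nodes):
    """Assumed load index: node usage / mean usage (1.0 means perfectly balanced)."""
    total = sum(n.used for n in nodes)           # total data volume, recomputed on every report
    mean = total / len(nodes)
    return {n.name: (n.used / mean if mean else 0.0) for n in nodes}


def rebalance(nodes, threshold=1.2):
    """Move the smallest block off the most-loaded node onto the least-loaded node."""
    moves = []
    while True:
        idx = load_indexes(nodes)
        hot = max(nodes, key=lambda n: idx[n.name])
        cold = min(nodes, key=lambda n: idx[n.name])
        if idx[hot.name] <= threshold or not hot.blocks:
            return moves
        block, size = min(hot.blocks.items(), key=lambda kv: kv[1])
        # stop if the move would merely turn the cold node into the new hot spot
        if cold.used + size >= hot.used:
            return moves
        del hot.blocks[block]
        cold.blocks[block] = size
        moves.append((block, hot.name, cold.name))
```

Each accepted move strictly reduces the sum of squared node loads, so the loop always terminates; the threshold trades balance against the cost of moving blocks.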
Preferably, five analysis engines are initially built into the intelligent analysis engine subsystem: a malicious code analysis engine, an abnormal behavior analysis engine, an abnormal traffic analysis engine, an encrypted traffic analysis engine, and a comprehensive correlation analysis engine. Each engine analyzes the data logs obtained from front-end probes from a different technical dimension, so as to detect and identify various threat behaviors. The system adopts a loosely coupled, modular design, so more analysis engines can be added. Based on AI technology, the intelligent analysis engine subsystem provides intelligent analysis of various network threats and can discover known threats, variants of known threats, and unknown threats.
Preferably, the malicious code analysis engine analyzes static and dynamic malicious code using machine learning algorithms such as decision trees and random forests, and classifies malware families; the abnormal behavior analysis engine analyzes log data and threat intelligence data and classifies attack behavior data such as email threats, web page trojan implantation (drive-by) threats, microblog attack threats, port scanning threats, slow brute-force virus threats, and communication behavior threats; the abnormal traffic analysis engine analyzes traffic metadata to detect compromised hosts, command and control (C&C), DGA domain name attacks, lateral movement attacks, data leakage, baseline deviations, protocol anomalies, and covert tunnels; the encrypted traffic analysis engine analyzes malicious code encrypted channels, encrypted applications, and SSL channels; and the comprehensive correlation analysis engine collects and summarizes the results of the malicious code analysis engine, the abnormal traffic analysis engine, and the encrypted traffic analysis engine, performs comprehensive correlation analysis, and returns the results to the business system.
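As one concrete signal for the DGA domain detection mentioned above, algorithmically generated names tend to have high character entropy. The sketch below is a simple entropy heuristic, a deliberately swapped-in stand-in that is not the CNN or SVM models the description names. The thresholds (length 12, 3.8 bits per character) and the "label left of the TLD" shortcut are assumptions, and real traffic would produce false positives and misses.

```python
import math
from collections import Counter


def shannon_entropy(s):
    """Bits per character of the string's character distribution."""
    if not s:
        return 0.0
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def looks_like_dga(domain, entropy_threshold=3.8, min_length=12):
    """Flag long, high-entropy second-level labels (illustrative thresholds)."""
    labels = domain.lower().rstrip(".").split(".")
    # crude: take the label left of the TLD; real systems use a public-suffix list
    name = labels[-2] if len(labels) >= 2 else labels[0]
    return len(name) >= min_length and shannon_entropy(name) >= entropy_threshold
```

A learned model would use this kind of feature (entropy, length, digit ratio, n-gram statistics) as one input among many rather than as a hard rule.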
Preferably, each type of analysis engine performs adversarial intelligent analysis of the techniques used in attacks according to the ATT&CK model, and the intelligent analysis engine subsystem integrates intelligent-algorithm acceleration hardware to speed up its operation.
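Analyzing detections "according to the ATT&CK model" implies tagging each engine's output with a MITRE ATT&CK technique. The sketch below shows one way to do that with a lookup table. The detection labels are hypothetical, and the label-to-technique pairing is an illustration chosen for this example, not a mapping given in the patent; the technique IDs themselves are from the ATT&CK Enterprise matrix.

```python
from collections import Counter

# Hypothetical engine labels -> (ATT&CK technique ID, technique name).
ATTACK_MAP = {
    "port_scan": ("T1046", "Network Service Discovery"),
    "brute_force": ("T1110", "Brute Force"),
    "phishing_mail": ("T1566", "Phishing"),
    "drive_by": ("T1189", "Drive-by Compromise"),
    "dga_domain": ("T1568.002", "Dynamic Resolution: Domain Generation Algorithms"),
    "c2_beacon": ("T1071", "Application Layer Protocol"),
    "lateral_movement": ("T1021", "Remote Services"),
    "data_exfiltration": ("T1048", "Exfiltration Over Alternative Protocol"),
    "covert_tunnel": ("T1572", "Protocol Tunneling"),
    "encrypted_c2": ("T1573", "Encrypted Channel"),
}


def tag_alert(alert):
    """Return a copy of the alert annotated with its ATT&CK technique ID, or None."""
    technique = ATTACK_MAP.get(alert["label"])
    tagged = dict(alert)                     # do not mutate the engine's output
    tagged["attack_technique"] = technique[0] if technique else None
    return tagged


def coverage(alerts):
    """Count how often each ATT&CK technique was observed in a batch of alerts."""
    tagged = (tag_alert(a) for a in alerts)
    return Counter(t["attack_technique"] for t in tagged if t["attack_technique"])
```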
Preferably, the intelligent analysis engine subsystem provides intelligent analysis for various network threats based on machine learning technology.
The invention also provides an intelligent analysis platform based on a big data framework, designed by the above method.
The invention also provides a working method of the intelligent analysis platform.
(III) advantageous effects
Using a big data processing and analysis framework together with AI-empowered security analysis tools, the design method provided by the invention collects security big data, mines security events, perceives security threats, and provides early warning and prevention in advance, thereby achieving intelligent analysis of the network security situation and raising the level of intelligent security protection. Technically, the platform is built on a big data framework and aggregates security situation data such as network traffic, user behavior, network boundaries, business systems, and host endpoints. It applies artificial intelligence techniques such as machine learning, deep learning, natural language processing, and knowledge graphs to perform fusion analysis and deep mining of the situation data, achieving real-time perception of security threat events and visual presentation of the network security situation, and improving the continuous security assurance capability of cyberspace.
Drawings
FIG. 1 is a logical relationship diagram of an intelligent analysis platform of the present invention;
FIG. 2 is a diagram of an intelligent analysis platform service architecture of the present invention;
FIG. 3 is an architecture diagram of an intelligent analysis framework subsystem in accordance with the present invention.
Detailed Description
To make the objects, contents, and advantages of the present invention clearer, the invention is described in detail below with reference to the accompanying drawings and examples.
The invention integrates a big data processing framework and an artificial intelligence framework into one whole, builds a processing flow that accommodates multidimensional data, multi-level tiered data aggregation, and hierarchical computation, performs intelligent threat detection and the analysis and screening of suspicious events based on machine learning and other modeling, and thereby forms a comprehensive analysis and processing platform that, using the platform's computing power, supports correlation analysis, cluster analysis, statistical analysis, and the like.
The intelligent analysis platform based on a big data framework designed by the invention comprises the following four parts. Data access: interacts with external systems through a message bus and aggregates structured and unstructured data, including log data, traffic metadata, asset data, threat intelligence, security situation data, and operation and maintenance situation data. Big data service: provides load balancing, data routing, big data acquisition, big data preprocessing, big data storage, and other services to the intelligent analysis engine, data access, and the intelligent analysis framework, and provides comprehensive situation data services to the situation presentation system. Intelligent analysis framework: provides analysis engine installation, engine configuration management, engine state analysis, and the basic engine runtime environment; supports engine integration and extension; allocates engine resources automatically; and monitors engine resource usage and running state in real time. Intelligent analysis engine: provides AI-based behavior analysis services, comprising malicious code analysis, abnormal behavior analysis, abnormal traffic analysis, encrypted traffic analysis, and a comprehensive correlation analysis engine.
FIG. 1 shows the logical relationship diagram of the intelligent analysis platform based on a big data framework designed by the invention; it comprises the following parts:
(1) Data access subsystem
The data access subsystem supports access to structured and unstructured data, including network logs, security logs, terminal logs, and business logs. Network logs include traffic sessions, application behavior, file transfers, login accounts, and the like; security logs include log information generated by entities such as network devices, hosts, databases, middleware, virtual devices, application systems, and gateway devices; terminal logs include file behavior, process behavior, email behavior, registry access behavior, and the like; business logs include business logins, business queries, transaction records, application information, and the like. The subsystem interacts with external systems through a distributed message bus, and implements data access state monitoring, scheduling, and security auditing through data access management technology.
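The security audit at the access layer is described elsewhere as a first-layer screen that filters out dirty and malicious data. The patent does not say what rules it applies, so the schema (`timestamp`, `source`, `payload`) and the two rules below (required fields present; no obvious script or SQL-injection fragments in the payload) are hypothetical. The sketch keeps a reason for every rejection so the audit trail survives.

```python
import re

REQUIRED = ("timestamp", "source", "payload")      # hypothetical minimal schema
SUSPICIOUS = re.compile(r"(<script\b|union\s+select|;\s*drop\s+table)", re.IGNORECASE)


def screen(records):
    """Split records into (accepted, [(record, reason), ...]) for auditing."""
    accepted, rejected = [], []
    for rec in records:
        missing = [f for f in REQUIRED if rec.get(f) in (None, "")]
        if missing:
            rejected.append((rec, "dirty: missing " + ",".join(missing)))
        elif SUSPICIOUS.search(str(rec["payload"])):
            rejected.append((rec, "malicious: suspicious payload"))
        else:
            accepted.append(rec)
    return accepted, rejected
```

Because a "malicious" payload is itself evidence, a deployment would more likely quarantine such records for the analysis engines than discard them.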
(2) Big data service subsystem
The first step of the big data service subsystem is to collect data from the data access subsystem and preprocess it, providing a unified, high-quality data set for subsequent processing. It performs preliminary analysis and organization of the data, including feature extraction, relationship analysis, compliance analysis, and model analysis; provides load balancing, data routing, and similar functions for big data applications; and implements big data storage through a distributed message bus and a data service bus. The subsystem covers both the storage of raw data and the storage of structured and unstructured data, and provides persistence services through the distributed message bus and the data service bus to support subsequent, deeper data analysis. To increase data throughput and reduce storage costs, big data is typically stored on a distributed architecture. The big data computing model and framework adopted are distributed and highly fault tolerant, and provide real-time computation, querying, and indexing. They can perform continuous big data stream computation, with functions including task assignment, task scheduling, task acquisition, task execution, and task submission. The computing framework is the core of the whole system: data access, data organization, data storage, data analysis, data services, and other functions are coordinated through the distributed message bus and the data service bus.
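The five task stages named above (assignment, scheduling, acquisition, execution, submission) can be made concrete with a small in-process worker pool. This is a minimal sketch assuming a thread-safe queue and a fixed batch of stream chunks; a real stream engine would run continuously and across machines. `run_stream_tasks` and its parameters are hypothetical.

```python
import queue
import threading


def run_stream_tasks(chunks, process, workers=4):
    """Assign -> schedule -> acquire -> execute -> submit, for a batch of chunks."""
    tasks = queue.Queue()
    results = {}
    lock = threading.Lock()

    # assignment + scheduling: every chunk becomes a numbered task on the queue
    for task_id, chunk in enumerate(chunks):
        tasks.put((task_id, chunk))

    def worker():
        while True:
            try:
                task_id, chunk = tasks.get_nowait()   # acquisition
            except queue.Empty:
                return                                # nothing left to do
            out = process(chunk)                      # execution
            with lock:
                results[task_id] = out                # submission

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # return results in the original chunk order
    return [results[i] for i in range(len(chunks))]
```

All tasks are enqueued before the workers start, so an empty queue reliably means the batch is finished.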
(3) Intelligent analysis framework subsystem
For the different analysis engines in the intelligent analysis engine subsystem, the intelligent analysis framework subsystem provides a basic engine runtime environment and implements analysis engine installation, engine configuration management, and engine state analysis. It also supports engine integration and extension, allocates engine resources automatically, and monitors engine resource usage and running state in real time. The subsystem aims to build a standard engine integration framework through which different intelligent analysis engines can be conveniently integrated, extended, and managed according to the standard; it supports the runtime environments and automatic resource allocation of different engines and provides basic capabilities such as engine control, engine configuration modification, and engine state control.
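"Automatically allocates engine resources" leaves the policy open. One simple policy, assumed here for illustration, is to split a fixed pool of worker slots across engines in proportion to each engine's queued work, with a floor of one slot per engine and the largest-remainder method for rounding. The engine names and backlog figures are hypothetical.

```python
def allocate_slots(backlog, total_slots):
    """Split total_slots across engines in proportion to queued work (largest remainder).

    Every engine keeps at least one slot so it can continue to drain its queue.
    """
    engines = sorted(backlog)
    if total_slots < len(engines):
        raise ValueError("need at least one slot per engine")
    spare = total_slots - len(engines)          # slots left after the 1-slot floor
    total_work = sum(backlog.values())
    if total_work == 0:
        shares = {e: spare / len(engines) for e in engines}
    else:
        shares = {e: spare * backlog[e] / total_work for e in engines}
    alloc = {e: 1 + int(shares[e]) for e in engines}
    leftover = total_slots - sum(alloc.values())
    # hand the remaining slots to the engines with the largest fractional shares
    by_fraction = sorted(engines, key=lambda e: shares[e] - int(shares[e]), reverse=True)
    for e in by_fraction[:leftover]:
        alloc[e] += 1
    return alloc
```

For example, with backlogs of 30, 60, and 10 queued items and 10 slots, the engines receive 3, 5, and 2 slots respectively.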
(4) Intelligent analysis engine subsystem
The intelligent analysis engine subsystem runs on the runtime environment provided by the intelligent analysis framework subsystem, accesses the multi-source data processed by the big data service subsystem through the standard interface provided by the intelligent analysis framework subsystem, and provides AI-based behavior analysis services. Initially, five analysis engines are built into the subsystem: a malicious code analysis engine, an abnormal behavior analysis engine, an abnormal traffic analysis engine, an encrypted traffic analysis engine, and a comprehensive correlation analysis engine. Each engine analyzes the data logs obtained from front-end probes from a different technical dimension, so that various threat behaviors are detected and identified. The system adopts a loosely coupled, modular design and can be extended with more analysis engines. Based on AI technologies such as machine learning, the subsystem provides intelligent analysis of various network threats and can effectively discover known threats, variants of known threats, and unknown threats. Within each type of analysis engine, the techniques used in attacks can be subjected to adversarial intelligent analysis according to the ATT&CK model, and the subsystem can integrate intelligent-algorithm acceleration hardware to speed up its operation.
The big-data-based intelligent analysis platform designed by the invention is an integrated software and hardware system for intelligent data analysis that can run on domestically produced hardware. Using intelligent analysis engines based on machine learning, deep learning, and similar techniques, it performs intelligent mining and analysis of diverse data and produces highly accurate security event alarms and situation data with complete coverage; its logical architecture is shown in FIG. 2. The design strictly follows the overall design goals and design principles and is organized hierarchically by technical architecture, functional architecture, and interface design. The whole platform consists of several integrated software-hardware server devices and is divided into a hardware platform, data access, big data service, the intelligent analysis framework, and the intelligent analysis engine.
The intelligent analysis platform based on a big data framework uses Hadoop as its core technology. Structured and unstructured data is collected, preprocessed, stored, and computed on; the analysis engines are managed with a big data management framework; analysis engines with different functions are designed with artificial intelligence algorithms according to business needs; and the analysis results are presented. The specific design of each subsystem is as follows:
The data access subsystem is the lowest layer of the system. It connects directly to the data sources, collects their data, and supplies it to the intelligent analysis engine subsystem; the accessed data is divided into structured and unstructured data. Structured data is transferred using Sqoop, which serves as the data collection tool for moving data between the data sources and the structured data stores in the RDBMS. For unstructured and semi-structured data, Flume provides better support for collecting log and event data, and Kafka is used as a publish-subscribe messaging system that works with Flume as a data source for data processing. The main functions of the data access subsystem are data scheduling, data access state monitoring, and data security auditing. A DAG workflow scheduling system serves multiple business jobs whose interdependencies form complex flows; in this subsystem, data acquisition task scheduling usually relies on appropriate workflow scheduling to assist the acquisition work. Data access state is monitored with the open-source Zabbix monitoring system, so that users can understand the state of the data and adjust and refine the access policy. For the security audit of accessed data, the subsystem uses DAS technology to perform first-layer screening and filtering of the accessed data, preliminarily filtering out malicious data and dirty data.
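A DAG workflow scheduler runs each acquisition task only after its prerequisites finish, and can run independent tasks in parallel. The sketch below uses Python's standard `graphlib.TopologicalSorter` (Python 3.9+) to group a workflow into parallel "waves". The task names and dependencies are hypothetical, and the patent does not name the scheduler it uses.

```python
from graphlib import TopologicalSorter

# Hypothetical acquisition workflow: each task maps to the set of tasks it depends on.
WORKFLOW = {
    "pull_rdbms_assets": set(),
    "pull_flume_logs": set(),
    "audit_screen": {"pull_rdbms_assets", "pull_flume_logs"},
    "publish_to_kafka": {"audit_screen"},
    "update_monitoring": {"publish_to_kafka"},
}


def execution_waves(dag):
    """Group tasks into waves; tasks in the same wave do not depend on each other."""
    ts = TopologicalSorter(dag)
    ts.prepare()                       # raises graphlib.CycleError on circular dependencies
    waves = []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # every task whose prerequisites are done
        waves.append(ready)
        ts.done(*ready)                 # a real scheduler marks done only on completion
    return waves
```

For the workflow above, the two pull tasks form the first wave, followed by the audit, Kafka publishing, and monitoring steps in turn.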
High-quality data is essential for obtaining effective results from artificial intelligence algorithms, yet most collected data is incomplete, structurally inconsistent, and noisy, and cannot be used directly for data analysis and mining. The big data service subsystem serves as the bridge between data acquisition and data computation in the overall framework. After the data is collected, MapReduce is used to classify it, and the data from different collection devices is sorted, cleaned, transformed, and reduced, completing data preprocessing. After preprocessing, data is stored mainly in HDFS. An HDFS cluster consists of one NameNode and a number of DataNodes: the NameNode is the central server responsible for managing the file system namespace and file addressing paths, while the DataNodes are the actual storage nodes, on which data is stored in blocks. Multiple NameNodes are kept as hot standbys through ZooKeeper, and high availability is achieved by electing a new NameNode when the active one fails. HBase is a data storage technology with a Master/Slave architecture; each time, a client locates the required data through ZooKeeper and then queries the storage nodes directly. The platform uses YARN as its scheduling basis, and both raw input data and computation results are stored on HDFS. Likewise, HBase-based load balancing makes effective use of computing resources for data interaction and sensible scheduling and distribution: the data nodes dynamically record each data unit, the total data volume is updated in real time, and the load index of each data unit is calculated; the load indexes of the nodes are then collected and sent to a management server, and the nodes redistribute resources accordingly, balancing the load on computing resources.
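The preprocessing steps above (classify, clean, transform, reduce) follow the map, shuffle, and reduce pattern. The single-process sketch below shows that pattern without Hadoop. It assumes a hypothetical `device|type|message` line format: the map step drops malformed lines and normalizes fields, the shuffle groups records by log type, and the reduce step collapses duplicates reported by several collectors.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Parse 'device|type|message'; drop malformed lines and normalize case."""
    parts = [p.strip() for p in line.split("|")]
    if len(parts) != 3 or not all(parts):
        return []                                      # cleaning: discard incomplete records
    device, rtype, msg = parts
    return [(rtype.lower(), (device.lower(), msg))]    # classification key, transformed value


def shuffle(pairs):
    """Group mapped values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduction: collapse duplicate records reported by several collectors."""
    return key, sorted(set(values))


def mapreduce(lines):
    pairs = chain.from_iterable(map(map_phase, lines))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

On Hadoop the same three functions would run as distributed mapper and reducer tasks, with the framework performing the shuffle.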
To read and compute on the stored data more efficiently, the data is sharded and data routes are established, and an Elasticsearch search engine is then built to facilitate data search.
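The description does not say how records are sharded or routed. One common choice, assumed here, is a consistent-hash ring with virtual nodes: a record key always maps to the same shard, and adding a shard moves only about 1/N of the keys, all of them onto the new shard. Elasticsearch itself instead routes each document by hashing a routing value (by default the document `_id`) onto a fixed number of primary shards. `ShardRouter` and the shard names are hypothetical.

```python
import bisect
import hashlib


def _h(key):
    """Stable 128-bit hash of a string (MD5 used for distribution, not security)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class ShardRouter:
    """Consistent-hash ring that maps record keys to shards."""

    def __init__(self, shards, vnodes=64):
        # each shard is placed on the ring many times to even out the key distribution
        self._ring = sorted((_h(f"{s}#{i}"), s) for s in shards for i in range(vnodes))
        self._points = [p for p, _ in self._ring]

    def route(self, key):
        # first ring point clockwise from the key's hash, wrapping around at the end
        i = bisect.bisect(self._points, _h(key)) % len(self._ring)
        return self._ring[i][1]
```

For example, `ShardRouter(["s1", "s2", "s3"]).route("host-42")` returns one of the three shard names, and always the same one for that key.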
For the different analysis engines in the intelligent analysis engine subsystem, the intelligent analysis framework subsystem manages the basic engine runtime environment and implements analysis engine installation, engine configuration management, and engine state analysis. It also supports engine integration and extension, allocates engine resources automatically, and monitors engine resource usage and running state in real time. Basic runtime environments such as TensorFlow and DeepMind Lab are integrated and managed uniformly, and the main development languages are Python and C++. Algorithms are customized and packaged as modules on top of open-source algorithm frameworks according to business requirements, and every algorithm engine uses a uniform data interface, which provides greater flexibility. The intelligent analysis framework also monitors the running state and resource usage of each engine.
The big data intelligent analysis engine subsystem is deployed on the big data intelligent analysis framework subsystem and performs intelligent analysis using the basic development environment provided by that subsystem. It is divided into five engines: a malicious code analysis engine, an abnormal behavior analysis engine, an abnormal traffic analysis engine, an encrypted traffic analysis engine, and a comprehensive correlation analysis engine. The malicious code analysis engine analyzes static and dynamic malicious code using machine learning algorithms such as decision trees and random forests, and classifies malware families. The abnormal behavior analysis engine analyzes log data and threat intelligence data and classifies attack behavior data such as email threats, web page trojan implantation (drive-by) threats, microblog attack threats, port scanning threats, slow brute-force virus threats, and communication behavior threats; its classification algorithms are mainly machine learning algorithms such as SVM and simulated annealing. The abnormal traffic analysis engine analyzes traffic metadata to detect compromised hosts, C&C (command and control), DGA domain name attacks, lateral movement attacks, data leakage, baseline deviations, protocol anomalies, and covert tunnels; its analysis algorithms are mainly CNN (convolutional neural network), SVM (support vector machine), and the like. The encrypted traffic analysis engine analyzes malicious code encrypted channels, encrypted applications, and SSL channels, mainly using decision trees and random forests.
The comprehensive correlation analysis engine collects and aggregates the results of the malicious code analysis engine, the abnormal traffic analysis engine and the encrypted traffic analysis engine, performs comprehensive correlation analysis on them, and returns the results to the business system.
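The comprehensive correlation step can be illustrated with a simple rule: group the findings of the individual engines by host and report hosts flagged by several independent engines. This is a hypothetical sketch; the text does not specify the correlation logic, so the finding format (`host` and `engine` keys), the `min_engines` threshold and the ranking order are assumptions.

```python
from collections import defaultdict


def correlate(findings: list[dict], min_engines: int = 2) -> list[dict]:
    """Hypothetical correlation rule: report hosts flagged by at least
    `min_engines` distinct engines, with all supporting findings attached."""
    engines_by_host: dict[str, set[str]] = defaultdict(set)
    evidence: dict[str, list[dict]] = defaultdict(list)
    for f in findings:
        engines_by_host[f["host"]].add(f["engine"])
        evidence[f["host"]].append(f)
    incidents = [
        {"host": host, "engines": sorted(engines), "evidence": evidence[host]}
        for host, engines in engines_by_host.items()
        if len(engines) >= min_engines
    ]
    # Hosts confirmed by more independent engines are listed first.
    incidents.sort(key=lambda i: (-len(i["engines"]), i["host"]))
    return incidents
```

Requiring agreement between engines that look at different data (code samples, traffic metadata, encrypted sessions) is one simple way to reduce false positives from any single engine.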
The intelligent analysis platform works as follows. An external data acquisition module collects data from different sources, and the data access subsystem manages the collected structured and unstructured data, providing security auditing, state monitoring and data access scheduling. The big data service subsystem preprocesses the collected data and stores it in a distributed database for convenient querying and retrieval; it also provides data routing and load balancing services that help share and optimize the data. The intelligent analysis framework subsystem is the main module proposed by the invention: it builds a bridge between data storage and data computation and decouples functions such as data management and resource management within the intelligent analysis engine subsystem. It provides the basic operating environment, including various machine learning frameworks and compilation environments, and contains an integration and extension module, a configuration management module, an engine state analysis module and a computing resource management module, which provide visual services for the extension, configuration management, running state analysis and resource allocation of the intelligent analysis engine subsystem. The intelligent analysis engine subsystem serves as the intelligent brain of the system: it computes on the preprocessed data through the malicious code, abnormal behavior, abnormal traffic and encrypted traffic analysis engines, collects their analysis results, performs comprehensive correlation analysis, and finally delivers the results to the business services.
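The data routing and load balancing service of the big data service subsystem could, for example, place records on distributed database nodes with consistent hashing. The text does not name a routing scheme; consistent hashing, the class name `ConsistentHashRouter`, the MD5-based ring and the 100 virtual nodes per storage node are all assumptions made for this sketch.

```python
import bisect
import hashlib


class ConsistentHashRouter:
    """Hypothetical router: map record keys to storage nodes on a hash ring.

    Each node is placed on the ring `replicas` times (virtual nodes) so keys
    spread roughly evenly across nodes.
    """

    def __init__(self, nodes: list[str], replicas: int = 100) -> None:
        self.replicas = replicas
        self._ring: list[tuple[int, str]] = []  # sorted (hash, node) pairs
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value: str) -> int:
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node: str) -> None:
        for i in range(self.replicas):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def remove_node(self, node: str) -> None:
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def route(self, key: str) -> str:
        if not self._ring:
            raise LookupError("no storage nodes available")
        # First virtual node clockwise from the key's hash, wrapping at the end.
        idx = bisect.bisect(self._ring, (self._hash(key), ""))
        return self._ring[idx % len(self._ring)][1]
```

The reason to prefer consistent hashing over a plain `hash(key) % n` scheme is that removing a node only remaps the keys that lived on it, so data on the remaining nodes stays in place.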
The invention provides an end-to-end design method for massive data processing. Based on a big data architecture, it aggregates structured and unstructured multi-source data and, for the data aggregation, computation, analysis and display required by an intelligent network threat discovery service, implements the complete processing chain of multi-source data collection, suspected threat screening, threat data summarization, detection data analysis and analysis result display.
The invention provides an analysis capability design method oriented toward sustainable expansion. Based on analysis technologies such as machine learning and big data modeling, it provides a complete application platform framework for threat detection and analysis. The platform is built on various component libraries and components and follows a systematic, layered and iterative design process. It incorporates the specific business application characteristics of customers, achieves centralized and unified processing of multi-source heterogeneous data, and provides open application interfaces that meet requirements for openness, portability, compatibility and extensibility, so that it can easily interconnect with similar application systems from other vendors through software and hardware platforms and can be extended in the future.
The invention provides an intelligent analysis capability design method oriented toward threat models. As knowledge of attack and defense techniques spreads, threat detection and analysis increasingly track the techniques that attackers actually use, and these detection techniques are becoming more comprehensive and systematic. Therefore, the detection capability points of the intelligent analysis engines are designed mainly with the Kill Chain model and the ATT&CK model in mind, so that they cover a wide range of threat attack methods, detection capability can be accumulated systematically and continuously, and threat detection points achieve full-chain, global coverage.
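One way to check the full-chain, global coverage goal is to map each detection capability point to ATT&CK tactics and list the tactics that remain uncovered. The tactic list below is the 14 tactics of the MITRE ATT&CK Enterprise matrix; the mapping from the detection points named in this document to tactics, and the `coverage_report` helper, are hypothetical illustrations rather than the invention's own mapping.

```python
# The 14 tactics of the MITRE ATT&CK Enterprise matrix, in matrix order.
ATTACK_TACTICS = (
    "Reconnaissance", "Resource Development", "Initial Access", "Execution",
    "Persistence", "Privilege Escalation", "Defense Evasion", "Credential Access",
    "Discovery", "Lateral Movement", "Collection", "Command and Control",
    "Exfiltration", "Impact",
)

# Hypothetical mapping from detection points named in this document to tactics.
DETECTION_POINTS: dict[str, set[str]] = {
    "email threat": {"Initial Access"},
    "web page trojan-implant (drive-by)": {"Initial Access"},
    "port scanning": {"Reconnaissance", "Discovery"},
    "slow brute-force": {"Credential Access"},
    "C&C communication": {"Command and Control"},
    "DGA domain": {"Command and Control"},
    "covert tunnel": {"Command and Control"},
    "lateral movement": {"Lateral Movement"},
    "data leakage": {"Exfiltration"},
}


def coverage_report(points: dict[str, set[str]]) -> tuple[float, list[str]]:
    """Return (fraction of tactics covered, uncovered tactics in matrix order)."""
    covered = set().union(*points.values())
    unknown = covered - set(ATTACK_TACTICS)
    if unknown:
        raise ValueError(f"not ATT&CK tactics: {sorted(unknown)}")
    gaps = [t for t in ATTACK_TACTICS if t not in covered]
    return len(covered) / len(ATTACK_TACTICS), gaps
```

With this illustrative mapping, 7 of the 14 tactics are covered (a ratio of 0.5), and the returned gaps (Resource Development, Execution, Persistence, Privilege Escalation, Defense Evasion, Collection, Impact) show where new detection points would extend coverage.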
The above description is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several modifications and variations without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as falling within the protection scope of the present invention.

Claims (6)

1. A design method for an intelligent analysis platform based on a big data framework, characterized in that the intelligent analysis platform designed by the method comprises a data access subsystem, a big data service subsystem, an intelligent analysis framework subsystem and an intelligent analysis engine subsystem;
the data access subsystem supports the access of structured data, unstructured data and semi-structured data;
the big data service subsystem comprises the big data computing model and framework that it adopts, provides real-time computation, querying and indexing, and can perform continuous computation on big data streams, including task assignment, task scheduling, task acquisition, task execution and task submission;
for the different analysis engines in the intelligent analysis engine subsystem, the intelligent analysis framework subsystem provides the basic engine operating environment and implements analysis engine installation, engine configuration management and engine state analysis; it supports engine integration and extension, allocates engine resources automatically, and monitors engine resource state and running state in real time; its aim is to build a standard engine integration framework on which different intelligent analysis engines can be conveniently integrated, extended and managed according to the standard, and which, while supporting different basic engine operating environments and automatic resource allocation, provides the basic capabilities of engine management and control, engine configuration modification and engine state management and control;
the intelligent analysis engine subsystem runs on the operating environment provided by the intelligent analysis framework subsystem, accesses the multi-source data processed by the big data service subsystem through the standard interface provided by the intelligent analysis framework subsystem, and provides behavior analysis services based on artificial intelligence;
the intelligent analysis engine subsystem contains five analysis engines, namely a malicious code analysis engine, an abnormal behavior analysis engine, an abnormal traffic analysis engine, an encrypted traffic analysis engine and a comprehensive correlation analysis engine; in the initial stage, each analysis engine analyzes the data logs obtained from front-end probes along a different technical dimension in order to detect and identify various threat behaviors; the system uses a loosely coupled, modular design so that more analysis engines can be added; and, based on AI technology, the intelligent analysis engine subsystem provides intelligent analysis of various network threats and can discover known threats, variants of known threats and unknown threats.
2. The method of claim 1, wherein the data access subsystem implements data access status monitoring, scheduling, and security auditing through data access management techniques.
3. The method of claim 1, wherein structured data is transferred using Sqoop, which serves as the data collection tool for transferring data between the source data and structured data stores in the RDBMS, and wherein Flume is used to collect log and event data for unstructured and semi-structured data.
4. The method of claim 1, wherein the malicious code analysis engine analyzes static and dynamic malicious code using machine learning algorithms such as decision trees and random forests and classifies it into malware families; the abnormal behavior analysis engine analyzes log data and threat intelligence data and classifies attack behavior such as email threats, web page trojan-implant threats, microblog attack threats, port scanning threats, slow brute-force virus threats and communication behavior threats; the abnormal traffic analysis engine analyzes traffic metadata to detect compromised hosts, command and control, DGA domain attacks, lateral movement attacks, data leakage, baseline deviations, protocol anomalies and covert tunnels; the encrypted traffic analysis engine analyzes malicious code encrypted channels, encrypted applications and SSL channels; and the comprehensive correlation analysis engine collects and aggregates the results of the malicious code analysis engine, the abnormal traffic analysis engine and the encrypted traffic analysis engine, performs comprehensive correlation analysis, and returns the results to the business system.
5. The method of claim 1, wherein each type of analysis engine performs targeted intelligent analysis on the technical points of attack methods according to the ATT&CK model, and the intelligent analysis engine subsystem integrates intelligent algorithm hardware to accelerate its operation.
6. The method of claim 1, wherein the intelligent analysis engine subsystem provides intelligent analysis of various types of cyber threats based on machine learning techniques.
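Claims 2 and 3 can be illustrated together with a sketch of data access management that selects a collection channel by data type (Sqoop for structured data, Flume for semi-structured and unstructured data), records an audit trail and keeps per-channel status counters. It is a hypothetical sketch: `DataSource`, `DataAccessManager` and the audit record fields are assumptions, and the code only decides which tool would handle a source; it does not invoke Sqoop or Flume.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DataSource:
    name: str
    kind: str  # "structured", "semi-structured" or "unstructured"


@dataclass
class DataAccessManager:
    """Hypothetical access manager: picks a collection channel per source,
    keeps an audit trail (security auditing) and status counters (monitoring)."""

    audit_log: list[dict] = field(default_factory=list)
    status: Counter = field(default_factory=Counter)

    # Un-annotated class attribute, so dataclasses does not treat it as a field.
    CHANNELS = {
        "structured": "sqoop",       # bulk transfer from relational databases
        "semi-structured": "flume",  # log and event collection
        "unstructured": "flume",
    }

    def schedule(self, source: DataSource, user: str) -> str:
        channel = self.CHANNELS.get(source.kind)
        accepted = channel is not None
        # Every request is audited, including rejected ones.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "source": source.name,
            "channel": channel,
            "accepted": accepted,
        })
        if not accepted:
            self.status["rejected"] += 1
            raise ValueError(f"unsupported data kind: {source.kind}")
        self.status[channel] += 1
        return channel
```

Keeping the audit entry even for rejected requests means the audit trail records every access attempt, not only successful ones.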

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110585911.5A CN113347170B (en) 2021-05-27 2021-05-27 Intelligent analysis platform design method based on big data framework

Publications (2)

Publication Number Publication Date
CN113347170A CN113347170A (en) 2021-09-03
CN113347170B true CN113347170B (en) 2023-04-18

Family

ID=77471875

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110585911.5A Active CN113347170B (en) 2021-05-27 2021-05-27 Intelligent analysis platform design method based on big data framework

Country Status (1)

Country Link
CN (1) CN113347170B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114791893B (en) * 2021-12-15 2023-05-09 许磊 Serialization system for random data access
CN114416891B (en) * 2022-03-28 2022-07-15 支付宝(杭州)信息技术有限公司 Method, system, apparatus and medium for data processing in a knowledge graph
CN114896305A (en) * 2022-05-24 2022-08-12 内蒙古自治区公安厅 Smart internet security platform based on big data technology
CN115174154A (en) * 2022-06-13 2022-10-11 盈适慧众(上海)信息咨询合伙企业(有限合伙) Advanced threat event processing method and device, terminal equipment and storage medium
CN115118525B (en) * 2022-08-23 2022-12-13 天津天元海科技开发有限公司 Internet of things safety protection system and protection method thereof

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108769048A (en) * 2018-06-08 2018-11-06 武汉思普崚技术有限公司 A kind of secure visualization and Situation Awareness plateform system

Also Published As

Publication number Publication date
CN113347170A (en) 2021-09-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant