CN113347170A - Intelligent analysis platform design method based on big data framework - Google Patents

Intelligent analysis platform design method based on big data framework

Info

Publication number
CN113347170A
Authority
CN
China
Prior art keywords
data
analysis
engine
subsystem
intelligent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110585911.5A
Other languages
Chinese (zh)
Other versions
CN113347170B (en)
Inventor
唐延辉
冯政鑫
闫子淇
于丰齐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Computer Technology and Applications
Original Assignee
Beijing Institute of Computer Technology and Applications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Computer Technology and Applications
Priority to CN202110585911.5A
Publication of CN113347170A
Application granted
Publication of CN113347170B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1408 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
    • H04L63/1425 Traffic logging, e.g. anomaly detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/20 Network architectures or network communication protocols for network security for managing network security; network security policies in general
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention relates to a method for designing an intelligent analysis platform based on a big data framework, in the technical field of network security. Building on a big data processing and analysis framework and combining it with AI-enabled security analysis tools, the method collects security big data, mines security events, perceives security threats, and provides early warning and prevention, thereby achieving intelligent analysis of the network security situation and raising the level of intelligent security protection. Technically, the platform adopts a big data framework to aggregate security situation data such as network traffic, user behavior, network boundaries, service systems, and host endpoints. It applies artificial intelligence techniques such as machine learning, deep learning, natural language processing, and knowledge graphs to perform fusion analysis and deep mining of the situation data, achieve real-time perception of security threat events and visual presentation of the network security situation, and improve the ability to continuously assure security in cyberspace.

Description

Intelligent analysis platform design method based on big data framework
Technical Field
The invention relates to the technical field of network security, in particular to a design method of an intelligent analysis platform based on a big data framework.
Background
With the emergence and development of new technologies such as big data, artificial intelligence, the Internet of Things, cloud computing, and the mobile internet, networks face a more complicated information security situation than before: new security problems keep surfacing and new security incidents emerge one after another. These include not only endless intrusions and attacks from outside, but also information security risks caused by operational violations and improper management inside.
In the last two years, network information systems have faced attacks that are increasingly intelligent, automated, and weaponized. Traditional security defense systems have serious risks and shortcomings and cannot handle increasingly complex and stealthy advanced threats: first, protection measures are single-point and fragmented, so information is split and scattered and coordination is lacking; second, signature-based threat detection cannot cope with advanced attacks, and the techniques are narrow and weak at discrimination; third, the defense system is passive, provides no early-warning mechanism, and lacks systematic design; fourth, security noise is heavy, valuable information is easily drowned out, staff are insufficient, and management is difficult.
To meet these needs, an intelligent "security brain" urgently needs to be built to perceive internal and external security problems and network threats. Just as a person senses the environment through the organs of the body and forms an overall judgment in the brain, the security brain detects subtle differences in cyberspace through AI-based behavior analysis services, including engines for malicious code analysis, abnormal behavior analysis, abnormal traffic analysis, encrypted traffic analysis, and comprehensive correlation analysis. It finally gives decision suggestions based on the aggregated analysis to assist security managers in duty, operation, and maintenance work.
Disclosure of Invention
(I) Technical problem to be solved
The technical problem to be solved by the invention is: how to achieve fusion analysis and deep mining of situation data, achieve real-time perception of security threat events and visual presentation of the network security situation, and improve the ability to continuously assure security in cyberspace.
(II) Technical solution
To solve the above technical problem, the invention provides a method for designing an intelligent analysis platform based on a big data framework, in which the platform is designed to comprise a data access subsystem, a big data service subsystem, an intelligent analysis framework subsystem, and an intelligent analysis engine subsystem;
the data access subsystem supports the access of structured, unstructured, and semi-structured data;
the big data service subsystem collects data from the data access subsystem and preprocesses it to provide a uniform data set for subsequent processes; performs preliminary analysis and organization of the data, including feature extraction, relationship analysis, compliance analysis, and model analysis; provides load balancing and data routing for big data applications; implements big data storage through a distributed message bus and a data service bus, covering the storage of raw data as well as structured and unstructured data; provides a persistence service through the distributed message bus and the data service bus to support subsequent, deeper data analysis; and adopts a big data computing model and framework that provides real-time computation, query, and indexing and can perform continuous big data stream computation, including task assignment, task scheduling, task acquisition, task execution, and task submission;
for the different analysis engines in the intelligent analysis engine subsystem, the intelligent analysis framework subsystem provides a basic engine operating environment; implements analysis engine installation, engine configuration management, and engine state analysis; supports engine integration and extension; automatically allocates engine resources; and monitors engine resource state and running state in real time. Its goal is to build a standard engine integration framework on which different intelligent analysis engines can be conveniently integrated, extended, and managed according to the standard, supporting the basic operating environments and automatic resource allocation of different engines while providing the basic capabilities of engine management and control, engine configuration modification, and engine state management and control;
the intelligent analysis engine subsystem runs in the operating environment provided by the intelligent analysis framework subsystem, accesses the multi-source data processed by the big data service subsystem through the standard interface provided by the intelligent analysis framework subsystem, and provides AI-based behavior analysis services.
Preferably, the data access subsystem realizes data access state monitoring, scheduling and security audit through a data access management technology.
Preferably, structured data is transferred using Sqoop, which serves as the data collection tool for moving data between the source data and structured data stores in an RDBMS; for unstructured and semi-structured data, Flume is used to support the collection of log data and event data.
Preferably, the big data service subsystem uses MapReduce to classify the data and to sort, clean, convert, and normalize data from different collection devices, completing data preprocessing. After preprocessing, data is stored mainly in HDFS. An HDFS cluster consists of a NameNode and a number of DataNodes: the NameNode acts as the central server and manages the file system namespace and file addressing, while the DataNodes are the actual storage nodes, on which data is stored in the form of blocks. Multiple NameNodes are kept as hot standbys through ZooKeeper, and a new active NameNode is elected after the active NameNode fails. YARN is used as the scheduling basis, and the raw data for computation and the computation results are stored on HDFS. Each data unit is dynamically recorded by the data nodes, the total data volume is updated in real time, and the load index of each data unit is computed at the same time; the load indexes of the nodes are then aggregated and sent to a management server, after which resources are redistributed among the nodes. The data is sharded and data routes are established, and an Elasticsearch search engine is set up to facilitate data search.
Preferably, five analysis engines are initially built into the intelligent analysis engine subsystem: a malicious code analysis engine, an abnormal behavior analysis engine, an abnormal traffic analysis engine, an encrypted traffic analysis engine, and a comprehensive correlation analysis engine. Each engine analyzes the data logs obtained from front-end probes from a different technical dimension in order to detect and identify various threat behaviors. The system adopts a loosely coupled, modular design, so more analysis engines can be added. Based on AI technology, the intelligent analysis engine subsystem provides intelligent analysis of various network threats and can discover known threats, variants of known threats, and unknown threats.
Preferably, the malicious code analysis engine uses machine learning algorithms such as decision trees and random forests to analyze malicious code statically and dynamically and to classify malware families; the abnormal behavior analysis engine analyzes log data and threat intelligence data and classifies attack behavior data such as mail threats, web page Trojan implantation threats, microblog attack threats, port scanning threats, slow brute-force threats, and communication behavior threats; the abnormal traffic analysis engine analyzes traffic metadata to detect compromised hosts, command and control, DGA domain name attacks, lateral movement attacks, data leakage, baseline deviations, protocol anomalies, and covert tunnels; the encrypted traffic analysis engine analyzes encrypted channels used by malicious code, encrypted applications, and SSL channels; and the comprehensive correlation analysis engine collects and summarizes the results of the malicious code analysis engine, the abnormal traffic analysis engine, and the encrypted traffic analysis engine, performs comprehensive correlation analysis, and returns the results to the service system.
Preferably, each type of analysis engine performs adversarial intelligent analysis of attack techniques according to the ATT&CK model, and the intelligent analysis engine subsystem integrates intelligent algorithm hardware to accelerate its operation.
Preferably, the intelligent analysis engine subsystem provides intelligent analysis for various network threats based on machine learning technology.
The invention also provides an intelligent analysis platform based on the big data framework, which is designed by the method.
The invention also provides a working method of the intelligent analysis platform.
(III) Advantageous effects
The design method provided by the invention builds on a big data processing and analysis framework and combines it with AI-enabled security analysis tools to collect security big data, mine security events, perceive security threats, and provide early warning and prevention, thereby achieving intelligent analysis of the network security situation and raising the level of intelligent security protection. Technically, the platform provided by the invention adopts a big data framework to aggregate security situation data such as network traffic, user behavior, network boundaries, service systems, and host endpoints. It applies artificial intelligence techniques such as machine learning, deep learning, natural language processing, and knowledge graphs to perform fusion analysis and deep mining of the situation data, achieve real-time perception of security threat events and visual presentation of the network security situation, and improve the ability to continuously assure security in cyberspace.
Drawings
FIG. 1 is a logical relationship diagram of an intelligent analysis platform of the present invention;
FIG. 2 is a diagram of an intelligent analysis platform service architecture of the present invention;
FIG. 3 is a diagram of the intelligent analysis framework subsystem architecture of the present invention.
Detailed Description
In order to make the objects, contents, and advantages of the present invention clearer, the following detailed description of the embodiments of the present invention will be made in conjunction with the accompanying drawings and examples.
The invention integrates a big data processing framework and an artificial intelligence framework into one. It constructs a processing flow that accommodates multidimensional data, multi-tier data aggregation, and hierarchical computation; completes intelligent threat detection and the analysis and screening of suspicious events based on machine learning and other modeling; and then uses the platform's computing power for comprehensive analysis such as correlation analysis, cluster analysis, and statistical analysis.
The invention designs an intelligent analysis platform based on a big data framework that comprises the following four parts.
Data access: interacts with the outside through a message bus and aggregates structured and unstructured data such as log data, traffic metadata, asset data, threat intelligence, security situation data, and operation and maintenance situation data.
Big data service: provides services such as load balancing, data routing, big data collection, big data preprocessing, and big data storage for the intelligent analysis engine, data access, and the intelligent analysis framework, and provides comprehensive situation data services for the situation presentation system.
Intelligent analysis framework: provides analysis engine installation, engine configuration management, engine state analysis, and the basic engine operating environment; supports engine integration and extension; automatically allocates engine resources; and monitors engine resource state and running state in real time.
Intelligent analysis engine: provides AI-based behavior analysis services, comprising the malicious code analysis, abnormal behavior analysis, abnormal traffic analysis, encrypted traffic analysis, and comprehensive correlation analysis engines.
The logical relationships within the intelligent analysis platform based on a big data framework are shown in FIG. 1 and comprise the following parts:
(1) Data access subsystem
The data access subsystem supports the access of structured and unstructured data, including network logs, security logs, terminal logs, and service logs. Network logs include traffic sessions, application behavior, file transfers, login accounts, and the like; security logs include log information generated by entities such as network devices, hosts, databases, middleware, virtual devices, application systems, and gateway devices; terminal logs include file behavior, process behavior, mail behavior, registry access behavior, and the like; service logs include service logins, service queries, transaction records, application information, and the like. The subsystem interacts with the outside through a distributed message bus, and it implements data access state monitoring, scheduling, and security auditing through data access management technology.
(2) Big data service subsystem
The big data service subsystem first collects data from the data access subsystem and preprocesses it to provide a uniform, high-quality data set for subsequent processes. It performs preliminary analysis and organization of the data, including feature extraction, relationship analysis, compliance analysis, and model analysis; provides functions such as load balancing and data routing for big data applications; and implements big data storage through a distributed message bus and a data service bus. The subsystem stores raw data as well as structured and unstructured data, and provides a persistence service through the distributed message bus and the data service bus to support subsequent, deeper data analysis. To increase data throughput and reduce storage costs, a distributed architecture is typically used to store big data. The big data computing model and framework adopted are distributed and highly fault-tolerant, and provide real-time computation, query, and indexing. The framework can perform continuous big data stream computation, including task assignment, task scheduling, task acquisition, task execution, and task submission. The computing framework is the core of the whole system and coordinates data access, data organization, data storage, data analysis, data services, and other functions through the distributed message bus and the data service bus.
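The five task functions listed above can be read as stages in a task's life cycle. The following is a minimal sketch under that assumption; the state names, the strictly linear ordering, and the StreamTask class are illustrative and are not specified by the patent.

```python
from enum import Enum


class TaskState(Enum):
    ASSIGNED = "assigned"
    SCHEDULED = "scheduled"
    ACQUIRED = "acquired"
    EXECUTING = "executing"
    SUBMITTED = "submitted"


# Allowed forward transitions, in the order the description lists the functions
# (assumption: each task moves through the stages strictly in sequence).
NEXT_STATE = {
    TaskState.ASSIGNED: TaskState.SCHEDULED,
    TaskState.SCHEDULED: TaskState.ACQUIRED,
    TaskState.ACQUIRED: TaskState.EXECUTING,
    TaskState.EXECUTING: TaskState.SUBMITTED,
}


class StreamTask:
    """Hypothetical unit of stream computation tracked by the computing framework."""

    def __init__(self, task_id):
        self.task_id = task_id
        self.state = TaskState.ASSIGNED
        self.history = [self.state]

    def advance(self):
        """Move the task to its next stage; a submitted task cannot advance."""
        if self.state not in NEXT_STATE:
            raise ValueError(f"task {self.task_id} is already submitted")
        self.state = NEXT_STATE[self.state]
        self.history.append(self.state)
        return self.state


task = StreamTask("flow-window-001")
while task.state is not TaskState.SUBMITTED:
    task.advance()
print([state.value for state in task.history])
```

A real framework would add retries and failure states; the sketch only makes the ordering of the five functions explicit.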
(3) Intelligent analysis framework subsystem
For the different analysis engines in the intelligent analysis engine subsystem, the intelligent analysis framework subsystem provides a basic engine operating environment and implements analysis engine installation, engine configuration management, and engine state analysis. It also supports engine integration and extension, automatically allocates engine resources, and monitors engine resource state and running state in real time. The subsystem's goal is to build a standard engine integration framework on which different intelligent analysis engines can be conveniently integrated, extended, and managed according to the standard, supporting the basic operating environments and automatic resource allocation of different engines while providing basic capabilities such as engine management and control, engine configuration modification, and engine state management and control.
(4) Intelligent analysis engine subsystem
The intelligent analysis engine subsystem runs in the operating environment provided by the intelligent analysis framework subsystem, accesses the multi-source data processed by the big data service subsystem through the standard interface provided by the intelligent analysis framework subsystem, and provides AI-based behavior analysis services. Five analysis engines are initially built into the subsystem: a malicious code analysis engine, an abnormal behavior analysis engine, an abnormal traffic analysis engine, an encrypted traffic analysis engine, and a comprehensive correlation analysis engine. Each engine analyzes the data logs obtained from front-end probes from a different technical dimension, so that various threat behaviors are detected and identified. The system adopts a loosely coupled, modular design, so more analysis engines can be added. Based on AI technologies such as machine learning, the subsystem provides intelligent analysis of various network threats and can effectively discover known threats, variants of known threats, and unknown threats. Each type of analysis engine can perform adversarial intelligent analysis of attack techniques according to the ATT&CK model, and the subsystem can integrate intelligent algorithm hardware to accelerate its operation.
The big-data-based intelligent analysis platform designed by the invention is an integrated software and hardware appliance for intelligent data analysis that can run on domestically produced hardware. Through intelligent analysis engines based on machine learning, deep learning, and the like, it can mine and analyze many kinds of data, producing high-accuracy security event alerts and situation data with complete coverage; the specific logical architecture is shown in FIG. 2. The design strictly follows the overall design goals and design principles and is layered by technical architecture, functional architecture, and interface design. The whole platform consists of several integrated software and hardware server devices and is divided into a hardware platform, data access, big data service, an intelligent analysis framework, and intelligent analysis engines.
The intelligent analysis platform based on a big data framework uses Hadoop as its technology stack. It collects structured and unstructured data for preprocessing, storage, and computation, uses a big data management framework, and designs analysis engines with different functions using artificial intelligence algorithms according to business needs, so that the analysis results can be presented. The specific design method of each subsystem is as follows:
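One way to picture analysis "according to the ATT&CK model" is to tag each engine finding with a technique identifier. The sketch below is only illustrative: the patent does not list technique IDs, so the category names and the mapping are assumptions chosen from the public MITRE ATT&CK Enterprise matrix, and tag_with_attack is a hypothetical helper.

```python
# Illustrative mapping only: the category keys are hypothetical labels for the
# threat types named in the description; the IDs come from MITRE ATT&CK.
ATTACK_TECHNIQUES = {
    "mail_threat": ("T1566", "Phishing"),
    "web_trojan": ("T1189", "Drive-by Compromise"),
    "port_scan": ("T1046", "Network Service Discovery"),
    "brute_force": ("T1110", "Brute Force"),
    "dga_domain": ("T1568.002", "Dynamic Resolution: Domain Generation Algorithms"),
    "covert_tunnel": ("T1572", "Protocol Tunneling"),
}


def tag_with_attack(alert):
    """Return a copy of the alert annotated with an ATT&CK technique, if one is mapped."""
    tagged = dict(alert)
    technique = ATTACK_TECHNIQUES.get(alert.get("category"))
    if technique is None:
        tagged["attack_id"], tagged["attack_name"] = None, None
    else:
        tagged["attack_id"], tagged["attack_name"] = technique
    return tagged


alert = {"host": "10.0.0.5", "category": "port_scan"}
print(tag_with_attack(alert))
```

Unmapped categories are kept with empty technique fields rather than dropped, so a later engine can still correlate them.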
The data access subsystem is the bottom layer of the system. It connects directly to data sources, collects from them, and supplies data to the intelligent analysis engine subsystem; the accessed data is divided into structured and unstructured data. Structured data is transferred using Sqoop, which serves as the data collection tool for moving data between the source data and structured data stores in an RDBMS. For unstructured and semi-structured data, Flume gives better support for collecting log data and event data, and Kafka is used as the message subscription system, working with Flume as the data source for data processing. The subsystem's main functions are data scheduling, data access state monitoring, and data security auditing. A DAG workflow scheduling system mainly serves scenarios with many jobs and complex dependencies between them; in this subsystem, the scheduling of data collection tasks often relies on suitable workflow scheduling to support the collection work. Data access state monitoring uses the open-source Zabbix monitoring system, so that users can see the state of the data and then adjust and refine the access strategy. For security auditing of accessed data, the subsystem uses DAS technology to perform a first layer of screening and filtering, including preliminary filtering of malicious data and dirty data.
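The role of DAG workflow scheduling for collection tasks can be sketched with the Python standard library. This is a minimal illustration, not the scheduler the platform uses; the task names and their dependencies are hypothetical.

```python
from graphlib import TopologicalSorter

# Each key depends on the tasks in its value set. The task names are
# hypothetical stand-ins for Sqoop imports, Flume collection, screening,
# and publication to the message bus.
dependencies = {
    "import_rdbms_tables": set(),
    "collect_logs": set(),
    "screen_dirty_data": {"import_rdbms_tables", "collect_logs"},
    "publish_to_bus": {"screen_dirty_data"},
}


def run_order(deps):
    """Return one execution order in which every task follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)
```

The two collection tasks have no dependency on each other, so a scheduler is free to run them in parallel; only their position before the screening step is fixed.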
High-quality data is essential for obtaining effective results from artificial intelligence algorithms, yet most collected data is incomplete, structurally inconsistent, and noisy "dirty" data that cannot be used directly for analysis and mining. The big data service subsystem acts as the bridge between data collection and data computation in the overall framework. After collection, the data is classified using MapReduce, and data from different collection devices is sorted, cleaned, converted, and normalized, which completes preprocessing. After preprocessing, data is stored mainly in HDFS. An HDFS cluster consists of a NameNode and a number of DataNodes: the NameNode acts as the central server and manages the file system namespace and file addressing, while the DataNodes are the actual storage nodes, on which data is stored in the form of blocks. Multiple NameNodes are kept as hot standbys through ZooKeeper, and a new active NameNode is elected after the active NameNode fails, which provides high availability. HBase is a data storage technology with a Master/Slave architecture; a client first locates the required data through ZooKeeper and then communicates directly with the server holding that data to query it. The platform uses YARN as its scheduling basis, and the raw data for computation and the computation results are stored on HDFS. Likewise, load balancing based on HBase storage makes effective use of computing resources for data interaction and for reasonable scheduling and distribution: each data unit is dynamically recorded by the data nodes, the total data volume is updated in real time, and the load index of each data unit is computed at the same time; the load indexes of the nodes are then aggregated and sent to a management server, after which resources are redistributed among the nodes, so that the load on computing resources is balanced.
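The load-balancing loop described above (nodes report load, a management server aggregates the load indexes, resources are redistributed) can be sketched as follows. The patent gives no formula, so the load index used here (used blocks divided by capacity), the tolerance, and the names NodeReport and plan_rebalance are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class NodeReport:
    """Hypothetical report a data node sends to the management server."""
    node: str
    used_blocks: int
    capacity_blocks: int


def load_index(used, capacity):
    # Assumed definition: fraction of the node's capacity currently in use.
    return used / capacity


def plan_rebalance(reports, tolerance=0.10, max_rounds=100):
    """Plan (source, target, blocks) moves until load indexes differ by <= tolerance."""
    state = {r.node: [r.used_blocks, r.capacity_blocks] for r in reports}
    moves = []
    for _ in range(max_rounds):
        index = {n: load_index(u, c) for n, (u, c) in state.items()}
        hot = max(index, key=index.get)
        cold = min(index, key=index.get)
        if index[hot] - index[cold] <= tolerance:
            break
        (hot_used, hot_cap), (cold_used, cold_cap) = state[hot], state[cold]
        # Number of blocks that would equalize the two nodes' load indexes.
        blocks = (hot_used * cold_cap - cold_used * hot_cap) // (hot_cap + cold_cap)
        if blocks == 0:
            break
        state[hot][0] -= blocks
        state[cold][0] += blocks
        moves.append((hot, cold, blocks))
    return moves


reports = [
    NodeReport("node-a", 90, 100),
    NodeReport("node-b", 10, 100),
    NodeReport("node-c", 50, 100),
]
print(plan_rebalance(reports))
```

The planner only produces a move list; actually relocating blocks or regions would be the job of the storage layer.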
To read and compute on the stored data more efficiently, the data is sharded and data routes are established, and an Elasticsearch search engine is then set up to facilitate data search.
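Sharding with a data route can be illustrated with hash-based routing, which is one common choice; the patent does not say which scheme it uses. The shard count, node names, and helper functions below are hypothetical.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a record key to a shard number with a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def build_route_table(shard_count, nodes):
    """Assign shards to storage nodes round-robin (assumed placement policy)."""
    return {shard: nodes[shard % len(nodes)] for shard in range(shard_count)}


def route(key, table):
    """Look up which node holds the shard for this key."""
    return table[shard_for(key, len(table))]


table = build_route_table(6, ["node-a", "node-b", "node-c"])
print(route("10.0.0.5", table))
```

Because the hash is stable, every reader and writer that shares the route table resolves the same key to the same node without consulting a central index.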
For the different analysis engines in the intelligent analysis engine subsystem, the intelligent analysis framework subsystem manages the engines' basic operating environment and implements analysis engine installation, engine configuration management, and engine state analysis. It also supports engine integration and extension, automatically allocates engine resources, and monitors engine resource state and running state in real time. Basic operating environments such as TensorFlow and DeepMind Lab are integrated and managed in a unified way, and the main development languages are Python and C++. Algorithms are packaged in a customized, modular way on top of open-source algorithm frameworks according to each algorithm's business requirements, and every algorithm engine uses a uniform data interface, which provides greater flexibility. The intelligent analysis framework also monitors the running state and resource usage of the current engines.
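The combination of a uniform data interface, engine installation, configuration management, and state monitoring can be sketched as a small registry. This is a simplified illustration under assumed names (AnalysisEngine, EngineFramework, LargeTransferEngine); it does not allocate real resources, and the toy engine stands in for an actual analysis algorithm.

```python
from abc import ABC, abstractmethod


class AnalysisEngine(ABC):
    """Uniform interface assumed for every engine: dict records in, dict findings out."""

    name = "unnamed"

    def __init__(self):
        self.config = {}

    @abstractmethod
    def analyze(self, records):
        raise NotImplementedError


class LargeTransferEngine(AnalysisEngine):
    """Toy engine: flags records whose byte count exceeds a configured threshold."""

    name = "large_transfer"

    def analyze(self, records):
        limit = self.config.get("max_bytes", 1_000_000)
        return [{"engine": self.name, "host": r["host"]}
                for r in records if r["bytes"] > limit]


class EngineFramework:
    """Hypothetical framework: installs engines, edits their config, tracks state."""

    def __init__(self):
        self._engines = {}
        self._status = {}

    def install(self, engine, **config):
        engine.config.update(config)
        self._engines[engine.name] = engine
        self._status[engine.name] = {"state": "idle", "records_seen": 0}

    def configure(self, name, **changes):
        self._engines[name].config.update(changes)

    def run(self, name, records):
        status = self._status[name]
        status["state"] = "running"
        try:
            findings = self._engines[name].analyze(records)
        except Exception:
            status["state"] = "failed"
            raise
        status["state"] = "idle"
        status["records_seen"] += len(records)
        return findings

    def status(self, name):
        return dict(self._status[name])


framework = EngineFramework()
framework.install(LargeTransferEngine(), max_bytes=500)
records = [{"host": "10.0.0.5", "bytes": 900}, {"host": "10.0.0.6", "bytes": 100}]
findings = framework.run("large_transfer", records)
print(findings, framework.status("large_transfer"))
```

Because the framework only depends on the analyze method, a new engine can be added by subclassing AnalysisEngine and calling install, which mirrors the loosely coupled, extensible design the description aims for.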
The big data intelligent analysis engine subsystem is deployed on the big data intelligent analysis framework subsystem and performs intelligent analysis using the basic development environment that the framework subsystem provides. The intelligent analysis engine subsystem is divided into five engines: a malicious code analysis engine, an abnormal behavior analysis engine, an abnormal flow analysis engine, an encrypted flow analysis engine and a comprehensive correlation analysis engine. The malicious code analysis engine performs static and dynamic analysis of malicious code using machine learning algorithms such as decision trees and random forests, and classifies malware families. The abnormal behavior analysis engine analyzes log data and threat intelligence data and classifies attack behavior data such as mail threats, web page trojan (drive-by download) threats, microblog attack threats, port scanning threats, virus slow brute-force threats and communication behavior threats; the classification algorithms are mainly machine learning algorithms such as SVM (support vector machine) and simulated annealing. The abnormal flow analysis engine analyzes traffic metadata to detect compromised hosts, C&C (command and control), DGA domain name attacks, lateral movement attacks, data leakage, baseline anomalies, protocol anomalies and covert tunnels; the analysis algorithms are mainly CNN (convolutional neural network) and SVM (support vector machine). The encrypted flow analysis engine analyzes malicious code encrypted channels, encrypted applications and SSL channels; its main algorithms are decision trees and random forests.
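As a rough illustration of how tree-based classification produces a label, the sketch below combines three hand-written decision stumps by majority vote. It mimics only the voting step of a random forest: nothing is trained, and the feature names (`entropy`, `import_count`, `section_count`), thresholds and labels are invented for the example and do not come from the description.

```python
from collections import Counter


def stump(feature, threshold, above, below):
    """One-level decision tree: compare a single feature with a threshold."""
    def predict(sample):
        return above if sample[feature] > threshold else below
    return predict


def majority_vote(trees, sample):
    """Return the label predicted by most trees, as a forest does at inference."""
    votes = Counter(tree(sample) for tree in trees)
    return votes.most_common(1)[0][0]


# Hypothetical static features of an executable file.
TREES = [
    stump("entropy", 7.0, "malicious", "benign"),
    stump("import_count", 10, "benign", "malicious"),
    stump("section_count", 8, "malicious", "benign"),
]

if __name__ == "__main__":
    sample = {"entropy": 7.5, "import_count": 3, "section_count": 9}
    print(majority_vote(TREES, sample))
```

In a real engine the trees would be learned from labeled samples, for example with a machine learning library, rather than written by hand.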
The comprehensive correlation analysis engine collects and summarizes the results of the malicious code analysis engine, the abnormal flow analysis engine and the encrypted flow analysis engine, performs comprehensive correlation analysis on them, and returns the results to the service system.
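One simple way to picture the correlation step is to group the findings of the individual engines by host and report hosts flagged by more than one engine. This is only a sketch under assumptions: the finding format (`engine`, `host`) and the `min_engines` threshold are hypothetical, and the description does not state which correlation rule is used.

```python
from collections import defaultdict


def correlate(findings, min_engines=2):
    """Group engine findings by host and keep hosts flagged by several engines."""
    engines_by_host = defaultdict(set)
    for finding in findings:
        engines_by_host[finding["host"]].add(finding["engine"])
    correlated = [
        {"host": host, "engines": sorted(engines)}
        for host, engines in engines_by_host.items()
        if len(engines) >= min_engines
    ]
    return sorted(correlated, key=lambda item: item["host"])


if __name__ == "__main__":
    findings = [
        {"engine": "malicious_code", "host": "10.0.0.5"},
        {"engine": "abnormal_flow", "host": "10.0.0.5"},
        {"engine": "encrypted_flow", "host": "10.0.0.9"},
    ]
    print(correlate(findings))
```

Requiring agreement between engines trades recall for fewer false alarms; lowering `min_engines` to 1 would simply merge all findings per host.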
The specific working process of the intelligent analysis platform is as follows. An external data acquisition module collects data from different data sources, and the data access subsystem manages the collected structured and unstructured data; the data access subsystem provides security audit, state monitoring and data access scheduling functions. The big data service subsystem preprocesses the collected data and stores it in the distributed database for convenient query and retrieval; it also provides data routing and load balancing services, which help share and optimize the data. The intelligent analysis framework subsystem is the main module proposed by the invention; it aims to build a bridge between data storage and data computation and to decouple functions such as data management and resource management from the intelligent analysis engine subsystem. It provides basic runtime environments, including various machine learning frameworks and compilation environments, and comprises an integration and extension module, a configuration management module, an engine state analysis module and a computing resource management module, which provide visual services for the extension, configuration management, running state analysis and resource allocation of the intelligent analysis engine subsystem. The intelligent analysis engine subsystem acts as the intelligent brain of the system: it computes over the preprocessed data, provides a malicious code analysis engine, an abnormal behavior analysis engine, an abnormal flow analysis engine and an encrypted flow analysis engine, collects the analysis results of these engines, performs comprehensive correlation analysis, and finally provides the results to the business services.
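The working process above is essentially a pipeline: access, preprocess, analyze, then hand results on. The sketch below strings these stages together in a few lines. The record fields (`src`, `bytes`), the cleaning rules and the sample engine are assumptions made for illustration; storage and the framework's resource management are deliberately left out.

```python
def preprocess(raw_records):
    """Cleaning and conversion: drop incomplete records, normalize field values."""
    cleaned = []
    for rec in raw_records:
        if rec.get("src") is None or rec.get("bytes") is None:
            continue  # incomplete record, cannot be analyzed
        cleaned.append({
            "src": str(rec["src"]).strip().lower(),
            "bytes": int(rec["bytes"]),
        })
    return cleaned


def run_platform(raw_records, engines):
    """Run every engine (a callable taking the cleaned records) on the same data."""
    data = preprocess(raw_records)
    return {name: engine(data) for name, engine in engines.items()}


def large_transfer_engine(records, limit=1000):
    # Hypothetical engine: report sources that sent more than `limit` bytes.
    return [rec["src"] for rec in records if rec["bytes"] > limit]


if __name__ == "__main__":
    raw = [{"src": " HostA ", "bytes": "2048"}, {"src": None, "bytes": 5}]
    print(run_platform(raw, {"large_transfer": large_transfer_engine}))
```

Every engine receives the same preprocessed records, which mirrors the role of the uniform data set that the big data service subsystem provides to the engines.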
The invention provides an overall design method for mass data processing. Based on a big data architecture, it aggregates structured and unstructured multi-source data and, for the data aggregation, computation, analysis and display required by the intelligent network threat discovery service, implements a complete end-to-end process of multi-source data collection, suspected threat screening, threat data summarization, detection data analysis and analysis result display.
The invention provides a design method for sustainably extensible analysis capability. Based on analysis technologies such as machine learning and big data modeling, it provides a complete threat detection and analysis application platform framework. The platform is built on various component libraries and components, follows a systematic, layered and iterative design process, and incorporates the specific business application characteristics of customers to process multi-source heterogeneous data in a centralized and unified way. It provides open application interfaces that meet requirements such as openness, portability, compatibility and extensibility, so that it can be conveniently interconnected, through software and hardware platforms, with similar application systems from other vendors, which facilitates future extension of the system.
The invention provides a threat-model-oriented design method for intelligent analysis capability. As knowledge of attack and defense techniques spreads, threat detection and analysis tracks the techniques used by attackers ever more closely and becomes more comprehensive and systematic. Therefore, the detection capability points of the intelligent analysis engines are designed mainly with reference to the Kill Chain model and the ATT&CK model, so that they cover the various threat attack methods, allow detection capability to accumulate systematically and continuously, and achieve full-chain, global coverage of threat detection points.
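Coverage of a threat model by detection capability points can be checked mechanically. In the sketch below, the tactic names are a subset of the ATT&CK Enterprise tactics, while the assignment of tactics to engines in `ENGINE_COVERAGE` is purely an illustrative assumption and is not taken from the description.

```python
# Subset of ATT&CK Enterprise tactic names, in kill-chain-like order.
TACTICS = [
    "Initial Access",
    "Execution",
    "Discovery",
    "Lateral Movement",
    "Command and Control",
    "Exfiltration",
]

# Hypothetical mapping from analysis engines to the tactics they can detect.
ENGINE_COVERAGE = {
    "malicious_code": {"Execution"},
    "abnormal_behavior": {"Initial Access", "Discovery"},
    "abnormal_flow": {"Command and Control", "Lateral Movement", "Exfiltration"},
    "encrypted_flow": {"Command and Control"},
}


def coverage_gaps(tactics, engine_coverage):
    """Return the tactics that no installed engine claims to detect."""
    covered = set().union(*engine_coverage.values())
    return [tactic for tactic in tactics if tactic not in covered]


if __name__ == "__main__":
    print(coverage_gaps(TACTICS, ENGINE_COVERAGE))
```

Running the same check whenever an engine is added or removed gives a simple way to track whether detection capability keeps accumulating across the whole chain.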
The above description is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make several modifications and variations without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A method for designing an intelligent analysis platform based on a big data framework, characterized in that the method designs the intelligent analysis platform to comprise a data access subsystem, a big data service subsystem, an intelligent analysis framework subsystem and an intelligent analysis engine subsystem;
the data access subsystem supports the access of structured data, unstructured data and semi-structured data;
the big data service subsystem collects data from the data access subsystem and preprocesses it to provide a uniform data set for subsequent processes; performs preliminary analysis and organization of the data, including feature extraction, relationship analysis, compliance analysis and model analysis; provides load balancing and data routing functions for big data applications; implements big data storage through a distributed message bus and a data service bus, including storage of raw data and of structured and unstructured data, and provides persistence services through the distributed message bus and the data service bus to support subsequent deeper data analysis; and adopts a big data computing model and framework to provide real-time computation, query and indexing, and is capable of continuous big data stream computation, including the functions of task assignment, task scheduling, task acquisition, task execution and task submission;
for the different analysis engines in the intelligent analysis engine subsystem, the intelligent analysis framework subsystem provides the engines' basic runtime environment, implements analysis engine installation, engine configuration management and engine state analysis, supports engine integration and extension, automatically allocates engine resources, and monitors engine resource state and running state in real time; the intelligent analysis framework subsystem aims to build a standard engine integration framework on which different intelligent analysis engines can be conveniently integrated, extended and managed according to the standard, and, while supporting different engine runtime environments and automatic resource allocation, provides the basic capabilities of engine management and control, engine configuration modification and engine state management and control;
the intelligent analysis engine subsystem runs in the basic runtime environment provided by the intelligent analysis framework subsystem, accesses the multi-source data processed by the big data service subsystem through the standard interface provided by the intelligent analysis framework subsystem, and provides behavior analysis services based on artificial intelligence.
2. The method of claim 1, wherein the data access subsystem implements data access status monitoring, scheduling, and security auditing through data access management techniques.
3. The method of claim 1, wherein structured data is transferred using Sqoop, which serves as a data collection tool for transferring data between the source data and the structured data stores in the RDBMS, and wherein, for unstructured and semi-structured data, Flume is used to support the collection of log and event data.
4. The method of claim 1, wherein the big data service subsystem classifies data using MapReduce and sorts, cleans, converts and reduces the data from different collection devices to complete data preprocessing; after preprocessing, data is stored mainly in HDFS, an HDFS cluster consisting of a NameNode and a number of DataNodes, the NameNode acting as the central server and managing the file system namespace and addressing paths; the DataNodes are the actual storage nodes, on which data is stored in the form of blocks; multiple NameNodes are kept as hot standbys through ZooKeeper, and a new NameNode is elected after the active NameNode fails; YARN is used as the scheduling basis, and the raw data used in computation and the computation results are stored on HDFS; each data unit is dynamically recorded by the data nodes, the total data volume is updated in real time and the load index of each data unit is calculated at the same time, the load indexes of the nodes are then aggregated and sent to a management server, and resources are redistributed among the nodes; the data is sharded and data routing is established, and an Elasticsearch search engine is set up to facilitate data search.
5. The method of claim 1, wherein five analysis engines are initially built into the intelligent analysis engine subsystem, namely a malicious code analysis engine, an abnormal behavior analysis engine, an abnormal flow analysis engine, an encrypted flow analysis engine and a comprehensive correlation analysis engine; each analysis engine analyzes the data logs collected from front-end probes from a different technical dimension so as to detect and identify various threat behaviors; the system adopts a loosely coupled, modular design and can be extended with more analysis engines; and the intelligent analysis engine subsystem provides intelligent analysis of various network threats based on AI technology and can discover known threats, variants of known threats and unknown threats.
6. The method of claim 5, wherein the malicious code analysis engine performs static and dynamic analysis of malicious code using machine learning algorithms such as decision trees and random forests, and classifies malware families; the abnormal behavior analysis engine analyzes log data and threat intelligence data and classifies attack behavior data such as mail threats, web page trojan (drive-by download) threats, microblog attack threats, port scanning threats, virus slow brute-force threats and communication behavior threats; the abnormal flow analysis engine analyzes traffic metadata to detect compromised hosts, command and control, DGA domain name attacks, lateral movement attacks, data leakage, baseline anomalies, protocol anomalies and covert tunnels; the encrypted flow analysis engine analyzes malicious code encrypted channels, encrypted applications and SSL channels; and the comprehensive correlation analysis engine collects and summarizes the results of the malicious code analysis engine, the abnormal flow analysis engine and the encrypted flow analysis engine, performs comprehensive correlation analysis on them, and returns the results to the service system.
7. The method of claim 5, wherein each type of analysis engine performs reactive intelligent analysis of the technical points of attack methods according to the ATT&CK model, and the intelligent analysis engine subsystem integrates intelligent algorithm hardware to accelerate the operation of the intelligent analysis engine subsystem.
8. The method of claim 1, wherein the intelligent analysis engine subsystem provides intelligent analysis of various types of cyber threats based on machine learning techniques.
9. An intelligent analysis platform based on big data framework, which is designed by the method of any one of claims 1 to 8.
10. A method of operating an intelligent analysis platform as claimed in claim 9.
CN202110585911.5A 2021-05-27 2021-05-27 Intelligent analysis platform design method based on big data framework Active CN113347170B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110585911.5A CN113347170B (en) 2021-05-27 2021-05-27 Intelligent analysis platform design method based on big data framework

Publications (2)

Publication Number Publication Date
CN113347170A (en) 2021-09-03
CN113347170B (en) 2023-04-18

Family

ID=77471875

Country Status (1)

Country Link
CN (1) CN113347170B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant