CN111858251A - Big data computing technology-based data security audit method and system

Info

Publication number: CN111858251A
Application number: CN202010713842.7A
Authority: CN
Inventors: 刘迎风; 冯桂安; 梁满; 冯骏; 何怡; 傅行晓; 周亚美
Original assignee: Shanghai Big Data Center
Current assignee: Shanghai Big Data Center
Priority date: 2020-07-22
Filing date: 2020-07-22
Publication date: 2020-10-30
Anticipated expiration: 2040-07-22
Also published as: CN111858251B

Abstract

The invention discloses a data security audit method and system based on big data computing technology, belonging to the field of big data security. The method comprises: collecting log data of a server and sending the log data to a stream processing platform; receiving one or more log data, parsing them, and sending the parsed log data to at least one data destination; classifying the parsed log data as real-time data or non-real-time data, sending the real-time data to the stream processing platform for storage, and sending the non-real-time data to a data center for storage; analyzing and processing the log data of each class to obtain an analysis result; and generating and outputting corresponding alarm information according to the analysis result. The beneficial effects of the invention are that log collection and storage are realized based on Flume, task scheduling and task monitoring are introduced, and the log collection sources and output destinations of Flume are enriched; and that data security audit, alarm monitoring management and processing, and security risk identification are realized based on Flink.

Description

Big data computing technology-based data security audit method and system

Technical Field

The invention relates to the field of big data security, in particular to a data security auditing method and system based on big data computing technology.

Background

In recent years, data security audit systems have become increasingly important. A data audit system is mainly used to monitor and record the various operation behaviors on a data server: it parses network data, intelligently analyzes operations on the data server in real time, and records them in an audit database for later query, analysis, and filtering, thereby monitoring and auditing user operations on the target system. In particular, when public data resources of various industries are integrated and utilized, a data security audit system is urgently needed to safeguard the secure application, shared exchange, and opening of data.

In existing data circulation and use, audit safeguards are lacking. Security events are investigated by large numbers of staff who manually filter and search massive log libraries, so audit efficiency is low and the results are strongly affected by human factors. Audits are therefore untimely and insufficiently thorough, the requirements of data security audit cannot be met, and security risks persist while data circulates. In addition, traditional big data computing methods are constrained by disk read/write performance and network performance, and are inefficient at querying, computing on, and storing real-time data. A data security audit method and system based on big data computing technology is therefore urgently needed to meet practical requirements.

Disclosure of Invention

In order to solve the technical problems, the invention provides a data security audit method and a data security audit system based on a big data computing technology.

The technical problem addressed by the invention is solved by the following technical solution:

the invention provides a data security auditing method based on big data computing technology, which comprises the following steps:

step S1, collecting the log data of the server, and sending the collected log data to a stream processing platform;

step S2, receiving one or more log data in the stream processing platform, parsing the log data, and outputting and sending the parsed log data to at least one data destination;

step S3, classifying the parsed log data, and determining whether the log data is real-time data or non-real-time data:

if the data is real-time data, sending the data to the stream processing platform for storage;

if the data is non-real-time data, sending the data to a data center for storage;

step S4, according to the classification in the step S3, analyzing the log data respectively to obtain an analysis result, and outputting the analysis result;

and step S5, generating and outputting corresponding alarm information according to the analysis result.

Preferably, in the step S1, during the log data collection process, the collection status and the collection amount of the stream processing platform and the log data are continuously managed and monitored.

Preferably, the real-time data is analyzed and processed online, and the non-real-time data is analyzed and processed offline;

the online analysis step comprises:

step A1, classifying the real-time data and storing the real-time data in a cluster of the stream processing platform, wherein the cluster comprises a global event and at least one internal event;

step A2, performing real-time correlation analysis on the global event and at least one internal event;

step A3, determining whether the event is an internal event:

if yes, go to step A4;

if not, generating the internal event and storing the internal event in one of the internal events of the cluster;

step A4, when it is judged that debugging and monitoring are needed, outputting a first analysis result;

the offline analyzing step comprises:

step B1, storing offline rules in advance, and issuing the offline rules to the stream processing platform;

step B2, receiving the offline rule, and calling the log data of the data center according to the offline rule;

step B3, performing batch analysis on the non-real-time log data, outputting a second analysis result and issuing the second analysis result to the stream processing platform;

and step B4, receiving the second analysis result and sending the second analysis result to the document database.

Preferably, in step S2, at least one parsing node parses the log data, and the parsing step is as follows:

step 21: initializing the log data;

step 22: extracting effective log information from the log data;

step 23: and processing the log information to obtain the log data of at least one data type, and respectively sending the log data to at least one data target place.

Preferably, in step S1, the log collection system is controlled to collect the log data by performing a functional configuration with the log collection system, where the functional configuration includes collection frequency, collection time period, and on and off of a task.

The invention also provides a data security audit system based on big data computing technology, which applies the above data security audit method based on big data computing technology, and comprises:

the task scheduling module is connected with the log acquisition system and used for acquiring log data of the server and sending the acquired log data to the stream processing platform;

the analysis module is connected with the stream processing platform and used for receiving one or more log data in the stream processing platform, analyzing the log data, outputting the analyzed log data and sending the analyzed log data to at least one data target place;

the audit analysis module is connected with the analysis module and used for classifying the analyzed log data and judging whether the log data is real-time data or non-real-time data:

the audit analysis module analyzes and processes the log data to obtain an analysis result and outputs the analysis result;

and the alarm module is connected with the audit analysis module and used for generating and outputting corresponding alarm information according to the analysis result.

Preferably, the data security audit system further includes a monitoring module, which is respectively connected to the log collection system and the stream processing platform, and is configured to continuously manage and monitor collection conditions and collection amounts of the stream processing platform and the log collection system during the log data collection process.

Preferably, the audit analysis module comprises:

the online analysis engine is connected with the stream processing platform and is used for performing real-time correlation analysis on the global events and the plurality of internal events of the stream processing platform and outputting a first analysis result;

and the offline analysis engine is connected with the data center and used for calling the log data in the data center according to the issued offline rule, carrying out batch analysis on the non-real-time log data and outputting a second analysis result.

Preferably, the alarm module comprises:

the first alarm unit is connected with the online analysis engine and used for generating and outputting corresponding first alarm information according to the first analysis result;

and the second alarm unit is connected with the offline analysis engine and used for generating and outputting corresponding second alarm information according to the second analysis result.

Preferably, the parsing module includes a plurality of parsing nodes, and each parsing node is correspondingly provided with a parser for initializing the log data, extracting effective log information from the log data, obtaining the log data of at least one data type according to the log information, and sending the log data to at least one data destination respectively.

The invention has the beneficial effects that:

Log collection and storage are realized based on the open source log collection framework Flume. The log collection system is functionally iterated, the concepts of task scheduling and task monitoring are introduced, and the log collection sources and output destinations of the log collection system are enriched. Functional modules for data security audit, alarm monitoring management and processing, and security risk identification access are developed through modeling based on the open source stream processing engine Flink, realizing the construction of a data security audit system. Security audit is carried out throughout the whole data life cycle of collection, transmission, storage, processing, exchange, and destruction, providing relatively comprehensive security management services for a big data resource platform and ensuring the normal circulation and use of data. Meanwhile, the system can continuously inspect, discover, and give early warning of various abnormal and illegal behaviors in the service support system, discover operation events involving confidential data in time, accurately and quickly locate the operator of such an event, and preserve relevant evidence that can be used for accountability.

Drawings

FIG. 1 is a flow chart of a data security auditing method based on big data computing technology in the present invention;

FIG. 2 is a flow chart of log data parsing in the present invention;

FIG. 3 is a flow chart of an on-line analysis in the present invention;

FIG. 4 is a flow chart of an off-line analysis in the present invention;

FIG. 5 is a block diagram of a task scheduling and monitoring architecture according to the present invention;

FIG. 6 is a schematic diagram of the operation of the stream processing engine (Flink) according to the present invention;

FIG. 7 is a block diagram of the flow of an online policy in the present invention;

FIG. 8 is a block diagram of the flow of an offline policy in the present invention;

FIG. 9 is a block diagram of a data security audit system according to the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict.

The invention is further described with reference to the following drawings and specific examples, which are not intended to be limiting.

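The invention provides a data security audit method based on big data computing technology, which belongs to the field of big data security. As shown in FIGS. 1 and 5, the data security audit method comprises the following steps:

step S1, collecting log data of the server, and sending the collected log data to a stream processing platform;

step S2, receiving one or more log data in the stream processing platform, parsing the log data, and outputting and sending the parsed log data to at least one data destination;

step S3, classifying the parsed log data, and determining whether the log data is real-time data or non-real-time data:

if the data is real-time data, sending the data to the stream processing platform for storage;

if the data is non-real-time data, sending the data to a data center for storage;

step S4, according to the classification in step S3, analyzing and processing the log data of each class to obtain an analysis result, and outputting the analysis result;

and step S5, generating and outputting corresponding alarm information according to the analysis result.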

Specifically, function configuration is performed between the web end and the log collection system 1 so as to control the log collection system 1 to gather log data in batches; the log data collected by the log collection system 1 is then retrieved and sent to the stream processing platform for publication.

Further, in this embodiment, the log collection system 1 provided by the invention is a distributed, highly reliable, and highly available collection system. The log collection system 1 is based on the open source framework Flume, and can collect, aggregate, and move large amounts of log data from different data sources to a data center (Hadoop Distributed File System, HDFS) for storage.

The open source stream processing platform may be Apache Kafka, which is written in Scala and Java. Kafka is a high-throughput distributed publish-subscribe messaging system that can handle all the activity stream data generated by users of a website. Kafka unifies online and offline message processing through the parallel loading mechanism of HDFS, and can also provide real-time messages through a cluster.

Specifically, one log data stream in the stream processing platform, or several subscribed to in batch, is read in subscription mode, and the log data read is parsed. The parsing methods include flattening of multi-level JSON, regular-expression parsing of irregular text, and database table field mapping. The parsed log data is converted and output, and is sent to at least one of several different data destinations, which include the open source stream processing platform (Kafka), the distributed file system (HDFS), the Lucene-based search server (Elasticsearch), HttpFS, the open source database (HBase), files, relational databases, and the like.

Specifically, in the log auditing process, log data is classified and stored according to whether it is real-time data or non-real-time data: real-time log data is sent to the stream processing platform for storage, and non-real-time log data is sent to the data center for storage.

The log data is then audited and analyzed by retrieving the real-time data from the stream processing platform or the non-real-time data from the data center, respectively. The analysis result obtained is output, and corresponding alarm information and/or monitoring and debugging information is generated and output according to the analysis result.
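The classification and routing described above can be illustrated with a minimal sketch. It is not the patented implementation: the `Router` class, the in-memory lists standing in for the stream processing platform and the data center, and the boolean `realtime` field are all assumptions made for illustration, since the source does not state how a record is recognized as real-time.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    """Hypothetical router that stores each parsed log record by class."""
    # Stand-in for the stream processing platform (e.g. a Kafka topic).
    stream_store: list = field(default_factory=list)
    # Stand-in for the data center (e.g. HDFS).
    data_center: list = field(default_factory=list)

    def route(self, record: dict) -> str:
        # Assumption: a boolean "realtime" field marks real-time records;
        # records without the field are treated as non-real-time.
        if record.get("realtime", False):
            self.stream_store.append(record)
            return "stream"
        self.data_center.append(record)
        return "data_center"


router = Router()
targets = [
    router.route(record)
    for record in (
        {"id": 1, "realtime": True},
        {"id": 2, "realtime": False},
        {"id": 3},
    )
]
```

In a deployment the two lists would be replaced by a producer for the stream processing platform and a writer for the data center; the branching logic is the part that corresponds to step S3.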

Log auditing and security identification alarming adopt a computing engine based on the open source component Apache Flink. Flink is a stream processing engine implemented in Java. Flink is powerful: it can process both stream data and batch data, and thus covers the functions of the general-purpose computing engine Spark and of Spark Streaming. Unlike Spark, however, Flink essentially has only the concept of a stream, and treats a batch as a special case of a stream.

In a further preferred embodiment, as shown in FIG. 6, Flink mainly comprises three components: JobClient, JobManager, and TaskManager.

The user submits a Flink program to the JobClient. The JobClient sends the program to the JobManager, and the JobManager, on receiving the job, returns feedback to the JobClient. The JobManager then plans the execution of the received job: it first allocates the resources the job requires, which are mainly execution slots on the TaskManagers, and after resource allocation it submits the individual tasks to the corresponding TaskManagers. On receiving a task, a TaskManager spawns a thread to execute it. State changes, such as the start or completion of a computation, are sent back to the JobManager as task status reports. Once the job has finished executing, the JobManager returns the result to the JobClient.

The invention realizes log collection and storage based on Flume, and realizes data security audit, alarm monitoring management and processing, and security risk identification access based on Flink engine modeling. Security audit is carried out throughout the whole data life cycle of collection, transmission, storage, processing, exchange, and destruction, which ensures the normal circulation and use of data, realizes the construction of a data security audit system, and provides relatively comprehensive security management services for a big data resource platform. Meanwhile, the system can discover operation events involving confidential data in time and accurately locate the operator of an event; it inspects, discovers, and gives early warning of various abnormal and illegal behaviors in the business support system, and provides relevant evidence that can be used for accountability.

As a preferred embodiment, in the data security audit method, the log collection system 1 and the stream processing platform (Kafka) are continuously managed and monitored while log data is being collected. The collection status and collection amount of the log data are monitored in real time, so that a user can know the collection status and collection amount of the log data at any moment.
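As an illustration of monitoring collection status and collection amount, the following sketch keeps a running record count per source and flags sources that have gone silent. The `CollectionMonitor` class, the source names, and the silence threshold are hypothetical; the source does not specify which metrics are tracked or how a stalled collection is detected.

```python
from collections import Counter


class CollectionMonitor:
    """Hypothetical monitor of per-source collection amount and freshness."""

    def __init__(self):
        self.amounts = Counter()   # total records collected per source
        self.last_seen = {}        # time of the latest report per source

    def record(self, source: str, n_records: int, now: float) -> None:
        """Register that `source` delivered `n_records` at time `now` (seconds)."""
        self.amounts[source] += n_records
        self.last_seen[source] = now

    def stalled(self, now: float, max_silence: float) -> list:
        """Return sources that have not reported for more than `max_silence` seconds."""
        return sorted(
            source
            for source, seen in self.last_seen.items()
            if now - seen > max_silence
        )


monitor = CollectionMonitor()
monitor.record("web-01", 120, now=0.0)
monitor.record("web-02", 80, now=0.0)
monitor.record("web-01", 40, now=50.0)
```

Calling `monitor.stalled(now, max_silence)` periodically gives the collection status, while `monitor.amounts` gives the collection amount per source.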

As a preferred embodiment, the data security auditing method performs online analysis processing on real-time data and performs offline analysis processing on non-real-time data;

as shown in fig. 7, the online analyzing step includes:

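step A1, classifying the real-time data and storing the real-time data in a cluster of the stream processing platform, wherein the cluster comprises a global event and at least one internal event;

step A2, performing real-time correlation analysis on the global event and the at least one internal event;

step A3, determining whether the event is an internal event:

if yes, go to step A4;

if not, generating an internal event and storing the internal event among the internal events of the cluster;

step A4, when it is judged that debugging and monitoring are needed, outputting a first analysis result;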

as shown in fig. 8, the offline analysis step includes:

A plurality of offline rules are stored in advance and issued through the stream processing platform. The offline rules are received, and log data is retrieved from the data center according to the offline rules. Batch analysis is performed on the non-real-time original log data using parameters from the configuration list DB and the baseline DB, and a second analysis result is output and published to Kafka, from which it is subscribed to and sent to a document database (ES). Offline analysis can analyze past logs in batches and generate different alarm information according to different parameter configurations.
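The offline flow above can be sketched as a batch job that applies one rule to historical logs and compares per-user counts against baseline parameters. This is only a sketch under stated assumptions: the rule fields (`name`, `action`, `default_limit`), the shape of the baseline (a per-user limit), and the log fields (`user`, `action`) are hypothetical, since the source does not describe the contents of the offline rules, the configuration list DB, or the baseline DB.

```python
from collections import Counter


def run_offline_rule(logs, rule, baseline):
    """Apply one offline rule to a batch of historical log records.

    logs:     list of dicts with hypothetical "user" and "action" fields
    rule:     dict with hypothetical "name", "action", "default_limit" fields
    baseline: per-user limits, standing in for parameters from a baseline DB
    """
    matched = [r for r in logs if r.get("action") == rule["action"]]
    per_user = Counter(r["user"] for r in matched)

    results = []
    for user, count in sorted(per_user.items()):
        # Fall back to the rule's default when no baseline exists for the user.
        limit = baseline.get(user, rule["default_limit"])
        if count > limit:
            results.append(
                {"rule": rule["name"], "user": user, "count": count, "limit": limit}
            )
    return results


logs = (
    [{"user": "alice", "action": "export"}] * 3
    + [{"user": "bob", "action": "export"}, {"user": "bob", "action": "login"}]
)
rule = {"name": "bulk_export", "action": "export", "default_limit": 2}
baseline = {"bob": 0}
second_result = run_offline_rule(logs, rule, baseline)
```

Changing `baseline` or `default_limit` changes which users are flagged, which mirrors the statement that different parameter configurations yield different alarm information.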

As a preferred embodiment, the data security auditing method is as shown in fig. 2, where in step S2, at least one parsing node parses log data respectively, and the parsing steps are as follows:

step 21: carrying out initialization processing on log data;

step 22: extracting effective log information from log data;

step 23: and processing the log information to obtain log data of at least one data type, and respectively sending the log data to at least one data target place.

Specifically, in this embodiment, the original log data is formatted and effective log information is extracted from the text, which reduces the difficulty of parsing. The extracted log information is parsed by flattening multi-level JSON, by regular-expression parsing of irregular text, or by database table field mapping. The parsed log is then dynamically enriched; the enriched content includes the region and country derived from the IP address.

As a preferred embodiment, in the data security audit method, the collection frequency, the collection time period, and the starting and stopping of tasks are configured with the log collection system 1. These configured parameters control how the log collection system 1 collects log data from the web server: the start and stop times, time period, and start frequency of a task control when a log collection task is started and stopped, and the collection frequency, collection time period, and collection amount control the collection itself.
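Two of the parsing modes named above, together with IP-based enrichment, can be sketched as follows. The log line layout, the field names, and the lookup table `GEO_TABLE` are assumptions made for illustration (the table uses a documentation-only address range and placeholder labels); the source does not give the actual log formats or the IP database used.

```python
import ipaddress
import json
import re


def flatten(obj, prefix=""):
    """Flatten nested dicts and lists into one level with dotted keys."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = ((str(i), v) for i, v in enumerate(obj))
    else:
        return {prefix: obj}
    flat = {}
    for key, value in items:
        path = f"{prefix}.{key}" if prefix else str(key)
        flat.update(flatten(value, path))
    return flat


# Hypothetical layout of an irregular text log line.
LINE_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"\[(?P<level>[A-Z]+)\] "
    r"user=(?P<user>\S+) ip=(?P<ip>\S+) action=(?P<action>\S+)"
)


def parse_text(line):
    """Return the named fields of a matching line, or None if it does not match."""
    match = LINE_RE.match(line)
    return match.groupdict() if match else None


# Hypothetical lookup table; 192.0.2.0/24 is reserved for documentation.
GEO_TABLE = [
    (ipaddress.ip_network("192.0.2.0/24"),
     {"country": "Exampleland", "region": "Example Region"}),
]


def enrich(record):
    """Add country and region when the record's IP falls in a known network."""
    try:
        addr = ipaddress.ip_address(record.get("ip", ""))
    except ValueError:
        return record
    for network, geo in GEO_TABLE:
        if addr in network:
            return {**record, **geo}
    return record


flat = flatten(json.loads('{"user": {"name": "alice", "roles": ["admin", "ops"]}, "ok": true}'))
parsed = parse_text("2020-07-22 10:15:02 [WARN] user=alice ip=192.0.2.7 action=export")
enriched = enrich(parsed)
```

Flattening keeps list positions as numeric path segments (for example `user.roles.0`), which is one of several reasonable conventions.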
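A minimal sketch of such a task configuration is given below, assuming a task is described by an on/off switch, a collection interval, and a daily time window. The `CollectionTask` class and its field names are hypothetical, and the sketch assumes the window does not cross midnight; the source does not specify the configuration format exchanged with the log collection system.

```python
from dataclasses import dataclass
from datetime import datetime, time, timedelta
from typing import Optional


@dataclass
class CollectionTask:
    """Hypothetical configuration of one log collection task."""
    enabled: bool                  # task switched on or off
    interval: timedelta            # collection frequency
    window_start: time             # start of the daily collection period
    window_end: time               # end of the daily collection period
    last_run: Optional[datetime] = None

    def is_due(self, now: datetime) -> bool:
        """True when the task is on, inside its window, and the interval has elapsed."""
        if not self.enabled:
            return False
        if not (self.window_start <= now.time() <= self.window_end):
            return False
        return self.last_run is None or now - self.last_run >= self.interval

    def mark_run(self, now: datetime) -> None:
        """Record that a collection run started at `now`."""
        self.last_run = now


task = CollectionTask(
    enabled=True,
    interval=timedelta(minutes=5),
    window_start=time(8, 0),
    window_end=time(18, 0),
)
```

A scheduler would call `is_due` on each tick, trigger the collection when it returns true, and then call `mark_run`.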

The invention also provides a data security audit system based on big data computing technology, which applies the above data security audit method based on big data computing technology and, as shown in FIG. 9, comprises:

the task scheduling module 2 is connected with the log acquisition system 1 and used for acquiring log data of the server and sending the acquired log data to the stream processing platform;

the analysis module 3 is connected with the stream processing platform and used for receiving one or more log data in the stream processing platform, analyzing the log data, outputting the analyzed log data and sending the analyzed log data to at least one data target place;

the audit analysis module 5 is connected with the analysis module 3 and used for classifying the analyzed log data and judging whether the log data is real-time data or non-real-time data:

the audit analysis module 5 analyzes and processes the log data to obtain an analysis result and outputs the analysis result;

and the alarm module 4 is connected with the audit analysis module 5 and used for generating and outputting corresponding alarm information according to the analysis result.

Specifically, in this embodiment, the data security audit system includes a task scheduling module 2, an analysis module 3, an audit analysis module 5, and an alarm module 4;

the task scheduling module 2 is used for controlling the log acquisition system 1, which is based on the Flume framework, to acquire log data of the server by configuring the acquisition frequency and acquisition time period with the log acquisition system 1 and by starting and stopping tasks, and for sending the acquired log data to the stream processing platform; in this way Flume is functionally iterated and the concept of task scheduling is introduced;

and the analysis module 3 is used for subscribing to one or more log data in the stream processing platform, parsing the log data, and outputting the parsed log data to multiple destinations, thereby enriching the log acquisition sources and output destinations of Flume.

And the audit analysis module 5 is used for storing the analyzed log data and judging whether the log data is real-time data or non-real-time data:

the audit analysis module 5 calls the stored log data to perform audit analysis and then outputs an analysis result;

and the alarm module 4 is used for generating and outputting corresponding alarm information according to the analysis result.

As a preferred embodiment, the data security audit system further includes a monitoring module 6, which is respectively connected to the log collection system 1 and the stream processing platform, and is configured to continuously manage and monitor collection conditions and collection amounts of the stream processing platform and the log collection system during a log data collection process.

In a preferred embodiment, in the data security audit system, the audit analysis module 5 comprises:

the online analysis engine 51 is connected with the stream processing platform, is based on a Flink framework, and is used for performing real-time correlation analysis on the global events and the plurality of internal events of the stream processing platform to output a first analysis result;

and the offline analysis engine 52 is connected with the data center, is based on a Flink framework, and is used for calling original log data in the data center according to the issued offline rule, performing batch analysis on the non-real-time original log data, and outputting a second analysis result.

Specifically, the online analysis engine 51 and the offline analysis engine 52 are based on the Flink framework, which includes predefined window assigners such as tumbling windows, sliding windows, session windows, and global windows. The real-time online analysis engine can create windows and perform windowed analysis on real-time log stream data, so as to generate alarm signals and monitoring and debugging information in real time and send them to operators, who can then handle them promptly and reduce losses.
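The difference between tumbling and sliding windows can be shown with a small sketch in plain Python. It does not use the Flink API; it only reproduces the window assignment logic on integer timestamps, and the function names, the sample timestamps, and the alarm threshold are assumptions made for illustration.

```python
from collections import Counter


def tumbling_counts(timestamps, size):
    """Count events per fixed, non-overlapping window [start, start + size)."""
    counts = Counter((ts // size) * size for ts in timestamps)
    return dict(sorted(counts.items()))


def sliding_counts(timestamps, size, slide):
    """Count events per overlapping window [start, start + size).

    Window starts are multiples of `slide`, so each event belongs to
    size / slide windows when size is a multiple of slide.
    """
    counts = Counter()
    for ts in timestamps:
        start = (ts // slide) * slide   # latest window start not after ts
        while start > ts - size:        # every window that still covers ts
            counts[start] += 1
            start -= slide
    return dict(sorted(counts.items()))


events = [1, 2, 7, 12]                  # hypothetical event times in seconds
tumbling = tumbling_counts(events, size=10)
sliding = sliding_counts(events, size=10, slide=5)

# Hypothetical rule: raise an alarm for any tumbling window with 3 or more events.
alarm_windows = [start for start, n in tumbling.items() if n >= 3]
```

With sliding windows the same event is counted in several overlapping windows, which makes bursts that straddle a tumbling-window boundary easier to detect.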

In a preferred embodiment, the data security audit system, wherein the alarm module 4 comprises:

the first alarm unit 41 is connected to the online analysis engine 51, and configured to generate and output corresponding first alarm information according to the first analysis result;

and the second alarm unit 42 is connected to the offline analysis engine 52, and is configured to generate and output corresponding second alarm information according to the second analysis result.

As a preferred embodiment, the data security audit system further includes an audit report module, which is respectively connected to the audit analysis module 5 and the alarm module 4, and is configured to generate a corresponding audit report according to the analysis result and the alarm information.

As a preferred embodiment, in the data security audit system, the parsing module 3 includes a plurality of parsing nodes, each parsing node is correspondingly provided with a Parser host as a Parser, and the Parser host is configured to initialize log data, extract effective log information from the log data, obtain log data of at least one data type according to the log information, and respectively send the log data to at least one data destination.

Specifically, the log information is processed to obtain data of different data types, and the data is sent to a data destination, wherein the data destinations include various output sinks such as Elasticsearch, HBase/HDFS, Druid, and CSV.
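Sending parsed data to several destinations according to its data type can be sketched as a small dispatcher. The `Dispatcher` class, the `type` field, and the in-memory lists standing in for a search index and an archive are hypothetical; real sinks would be clients for the systems listed above.

```python
from typing import Callable, Dict, List

Sink = Callable[[dict], None]


class Dispatcher:
    """Hypothetical fan-out of parsed records to sinks registered per data type."""

    def __init__(self) -> None:
        self._sinks: Dict[str, List[Sink]] = {}

    def register(self, data_type: str, sink: Sink) -> None:
        """Attach a sink to a data type; one type may have several sinks."""
        self._sinks.setdefault(data_type, []).append(sink)

    def dispatch(self, record: dict) -> int:
        """Send the record to every sink of its type; return how many received it."""
        sinks = self._sinks.get(record.get("type"), [])
        for sink in sinks:
            sink(record)
        return len(sinks)


search_index: list = []   # stand-in for a search server sink
archive: list = []        # stand-in for a file or HDFS sink

dispatcher = Dispatcher()
dispatcher.register("login", search_index.append)
dispatcher.register("login", archive.append)
dispatcher.register("export", archive.append)

delivered = [
    dispatcher.dispatch({"type": "login", "user": "alice"}),
    dispatcher.dispatch({"type": "export", "user": "bob"}),
    dispatcher.dispatch({"type": "unknown"}),
]
```

Records of an unregistered type are silently dropped in this sketch; a production system would more likely route them to a default or error destination.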

While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention.

Claims

1. A data security audit method based on big data computing technology is characterized by comprising the following steps:

2. The big data computing technology-based data security audit method according to claim 1, wherein in the step S1, during the log data collection process, the collection status and the collection amount of the stream processing platform and the log data are continuously managed and monitored.

3. The big data computing technology-based data security audit method according to claim 1, wherein the real-time data is analyzed and processed online, and the non-real-time data is analyzed and processed offline;

the online analysis step comprises:

step A3, determining whether the event is an internal event:

if yes, go to step A4;

the offline analyzing step comprises:

4. The big data computing technology-based data security audit method according to claim 1, wherein in step S2, at least one parsing node parses the log data respectively, and the parsing steps are as follows:

step 21: initializing the log data;

step 22: extracting effective log information from the log data;

5. The big data computing technology-based data security audit method according to claim 1, wherein in step S1, the log collection system is controlled to collect the log data by performing a functional configuration with the log collection system, where the functional configuration includes collection frequency, collection time period and task on and off.

6. A big data computing technology-based data security auditing system, which is applied to the big data computing technology-based data security auditing method of any one of claims 1 to 5, and comprises the following steps:

7. The big data computing technology-based data security audit system according to claim 6, further comprising a monitoring module, respectively connected to the log collection system and the stream processing platform, for continuously managing and monitoring the collection status and collection amount of the stream processing platform and the log collection system during the log data collection process.

8. The big data computing technology-based data security audit system according to claim 6, wherein the audit analysis module includes:

9. The big data computing technology-based data security audit system according to claim 8, wherein the alarm module comprises:

10. The big data computing technology-based data security audit system according to claim 6, wherein the parsing module includes a plurality of parsing nodes, and each parsing node is correspondingly provided with a parser for initializing the log data, extracting effective log information from the log data, obtaining the log data of at least one data type according to the log information, and sending the log data to at least one data destination respectively.