CN113094154A - Big data processing method and system based on Aliyun - Google Patents

Publication number
CN113094154A
Authority
CN
China
Prior art keywords
data
cloud
alarm
Aliyun
database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110355586.3A
Other languages
Chinese (zh)
Inventor
徐胜国
郭靓
曾锃
金倩倩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nari Technology Co Ltd
Nari Information and Communication Technology Co
Original Assignee
Nari Technology Co Ltd
Nari Information and Communication Technology Co
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nari Technology Co Ltd and Nari Information and Communication Technology Co
Priority to CN202110355586.3A
Publication of CN113094154A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/465 Distributed object oriented systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/466 Transaction processing
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/14 Session management
    • H04L67/143 Termination or inactivation of sessions, e.g. event-controlled end of session

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a big data processing method based on Aliyun (Alibaba Cloud), which comprises: preprocessing collected data through Aliyun; multitasking the preprocessed data through Aliyun; storing alarms generated during the multitasking of the data into a database; and storing the multitasked data through Aliyun. The Aliyun-based computing and storage platform makes full use of the data processing capacity of Aliyun (the EB-level big data storage and analysis capacity of MaxCompute). Through Blink, Esper runs on Aliyun in a distributed, multi-task mode, which solves the low efficiency of single-process Esper and provides stronger data analysis capacity, and the various components of the Aliyun platform improve data processing efficiency.

Description

Big data processing method and system based on Aliyun
Technical Field
The invention belongs to the technical field of information processing, and particularly relates to a big data processing method based on Aliyun.
Background
With the development of internet technology, big data is applied ever more widely. When big data is processed with a traditional architecture, many virtual machine resources are needed, and the analysis tool Esper runs as a single process on a single server, so data processing is slow and a single point of failure is likely.
Disclosure of Invention
The invention aims to provide a big data processing method based on Aliyun, which can improve the efficiency of big data processing.
In order to achieve this purpose, the invention provides the following technical scheme:
In a first aspect, a big data processing method based on Aliyun is provided, which includes:
preprocessing the collected data through Aliyun;
multitasking the preprocessed data through Aliyun;
storing alarms generated during the multitasking of the data into a database;
and storing the multitasked data through Aliyun.
With reference to the first aspect, further, the preprocessing of the collected data through Aliyun specifically includes:
sending the collected data to a Kafka message queue of Aliyun;
and taking the data out of the Kafka message queue through the Aliyun Blink computing engine, identifying, de-duplicating, de-noising, normalizing and enhancing the data, and sending the processed data to a data analysis queue and a big data storage queue respectively.
With reference to the first aspect, further, the multitasking of the preprocessed data through Aliyun specifically includes:
converting the data in the data analysis queue into corresponding data objects according to a data model, converting the data objects into Esper events, and sending the Esper events to an Esper rule engine for analysis and processing, wherein the Esper rule engine runs in a multitask mode on the Blink computing engine of Aliyun.
With reference to the first aspect, further, an alarm event is generated for each Esper event that hits a preloaded alarm rule, and the alarm event is sent to the alarm warehousing queue of Kafka.
With reference to the first aspect, further, storing the alarms generated during the multitasking of the data into a database specifically includes:
judging, through the whiteteliststream tool in the Aliyun Blink computing engine, whether the alarm event is in a whitelist; if so, the alarm event is not stored; otherwise, a warehousing operation is performed on the alarm event.
With reference to the first aspect, further, during the warehousing operation, if the database contains no alarm with the same source IP, the same destination IP and the same alarm type, the alarm is inserted into the database; otherwise, the count of the existing alarm is incremented by 1 and the updated count is written to the database.
With reference to the first aspect, further, the storing of the multitasked data through Aliyun includes:
reading data from the big data storage queue, and deserializing the data into corresponding data objects according to the data types;
and storing the data objects through the MaxCompute service of Aliyun.
With reference to the first aspect, further, historical data stored in MaxCompute that is more than 6 months old is automatically deleted.
In a second aspect, a big data processing system based on Aliyun is provided, including:
a preprocessing module, configured to preprocess the collected data through Aliyun;
a multitask processing module, configured to multitask the preprocessed data through Aliyun;
an alarm storage module, configured to store alarms generated during the multitasking of the data into a database;
a data storage module, configured to store the multitasked data through Aliyun.
The beneficial technical effects are as follows: compared with the prior art, the Aliyun-based computing and storage platform provided by the invention makes full use of the data processing capacity of Aliyun (the EB-level big data storage and analysis capacity of MaxCompute). Through Blink, Esper runs on Aliyun in a distributed, multi-task mode, which solves the low efficiency of single-process Esper and provides stronger data analysis capacity, and the various components of the Aliyun platform improve data processing efficiency.
Drawings
FIG. 1 is a system architecture diagram of the present invention;
FIG. 2 is a data flow diagram of the present invention;
FIG. 3 is a flow diagram of a big data preprocessing module according to the present invention;
FIG. 4 is a flow chart of big data analysis in the present invention;
FIG. 5 is a flow chart of the alarm warehousing process of the present invention;
FIG. 6 is a flow chart of big data storage according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
As shown in FIGS. 1 to 6, the present invention provides a big data processing method based on Aliyun, comprising the following steps:
Step one: preprocessing the collected data through Aliyun
This step specifically comprises the following:
1) Data parsing: the collected data is sent to a Kafka message queue of Aliyun. The data to be preprocessed is consumed from the Kafka queue and deserialized into the corresponding objects according to its type; for most data, the resulting objects are an alarm object (eventWarnng), an indicator object (indicator Single) and a heartbeat object (DataQualityHeartBase).
2) Data identification: the data from step 1) is identified by looking up a pre-cached classification table by the port value of the data, selecting the data identification code needed for the problem at hand, and determining the data type.
3) Data denoising: from the data identified in step 2), records whose format or number of attributes does not meet the requirements are deleted directly, which improves data quality.
4) Data deduplication: the result of step 3) is deduplicated. If records that differ only in time and are identical in all other attributes appear several times within a specified time interval, only the last of the repeated records is kept.
5) Data normalization: the log data deduplicated in step 4) is normalized, converting logs expressed in many different ways into one uniform description. Analysts no longer need to be familiar with the different log formats of different manufacturers, which greatly improves the efficiency of analysis and audit work. The normalized fields provided by the system include log receiving time, log generation time, log duration, user name, source address, source MAC address, source port, operation, destination address, destination MAC address, destination port, log event name, abstract, level, original type, network protocol, network application protocol, device address, device name, device type and the like. Technical personnel also manually classify and analyze each kind of log according to best practice and the relevant technical standards and add a new log type field, which enriches the information content of the logs and makes otherwise dry log information easier to understand.
6) Data forwarding: the data normalized in step 5) is sent to ES, to the analysis queue of Kafka, and to the big data storage queue of Kafka. ES extracts terms from the full text of the original log and indexes them, so that both the formatted fields and the full text are indexed; with full-text indexing, the system gives analysts a flexible and convenient analysis tool and becomes much easier to use. The analysis queue of Kafka is the input of the data analysis module and is used for subsequent real-time correlation analysis of the big data. The big data storage queue of Kafka is the input of the big data storage module and is used for subsequent offline analysis tasks.
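The denoising, deduplication, normalization and forwarding steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the required field names, the mapping to unified field names, and the use of plain lists in place of the Kafka analysis and storage queues are all hypothetical, and the ES branch is left out. Deduplication here compares each record with the most recently kept record of the same key, which is one possible reading of "within a specified time interval".

```python
# Hypothetical set of attributes a well-formed raw record must carry.
REQUIRED_FIELDS = ("time", "src_ip", "dst_ip", "event")

# Hypothetical mapping from raw field names to unified (normalized) names.
UNIFIED_NAMES = {
    "time": "log_generation_time",
    "src_ip": "source_address",
    "dst_ip": "destination_address",
    "event": "log_event_name",
}


def denoise(records):
    """Drop anything that is not a dict with exactly the required attributes."""
    wanted = set(REQUIRED_FIELDS)
    return [r for r in records if isinstance(r, dict) and set(r) == wanted]


def deduplicate(records, interval):
    """Among records differing only in time, keep the last one per burst.

    A record belongs to the current burst if it arrives at most `interval`
    seconds after the record currently kept for the same key.
    """
    kept = {}  # key (all attributes except time) -> list of kept records
    for rec in sorted(records, key=lambda r: r["time"]):
        key = tuple(rec[f] for f in REQUIRED_FIELDS if f != "time")
        bucket = kept.setdefault(key, [])
        if bucket and rec["time"] - bucket[-1]["time"] <= interval:
            bucket[-1] = rec  # later record replaces the earlier duplicate
        else:
            bucket.append(rec)
    merged = [r for bucket in kept.values() for r in bucket]
    return sorted(merged, key=lambda r: r["time"])


def normalize(record):
    """Rename raw fields to the unified field names."""
    return {UNIFIED_NAMES[name]: value for name, value in record.items()}


def preprocess(records, interval, analysis_queue, storage_queue):
    """Run denoise -> deduplicate -> normalize and forward to both queues."""
    for rec in deduplicate(denoise(records), interval):
        unified = normalize(rec)
        analysis_queue.append(unified)  # stand-in for the Kafka analysis queue
        storage_queue.append(unified)   # stand-in for the Kafka storage queue
```

In a real deployment the two lists would be replaced by producers writing to the corresponding Kafka topics, and the interval would be chosen to match how often the data sources repeat themselves.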
Step two: multitasking (big data analysis) of the preprocessed data through Aliyun
Step 1) Data parsing: the data to be analyzed is consumed from the Kafka queue and deserialized into the corresponding objects according to its type.
Step 2) Converting the objects into Esper events and sending them to Esper: metadata describing the data source to be monitored is registered for that data source; the data processing program then defines the alarm rules of the data source according to the attributes of the metadata and translates the definitions into Esper SQL-like statements; the data processing program then monitors the data source; and when the data source triggers an alarm rule, the data processing program sends alarm information to the Kafka alarm warehousing queue.
Step 3) Splitting the objects to be stored: the data generated in step 2) is split into alarm data, indicator data, heartbeat data and other information.
Step 4) Thread pool processing: the different types of data are put into the blocking queues of different thread pools; when a monitoring thread finds that a queue is not empty, it takes the latest data and hands it to a worker thread for processing, and the result is then written to the relational database.
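The rule handling of step 2) can be illustrated with a small stand-in written in plain Python. It is not Esper and does not call any Esper API: the `Rule` class, its fields, the SQL-like text it renders, and the list used in place of the Kafka alarm warehousing queue are all assumptions made for this sketch. It only shows the idea of defining a rule from a data source's attributes, rendering it as an SQL-like statement, and emitting an alarm when an event hits the rule.

```python
import operator
from dataclasses import dataclass

# Comparison operators supported by this sketch (an assumption).
OPERATORS = {">": operator.gt, "<": operator.lt}


@dataclass(frozen=True)
class Rule:
    """Hypothetical alarm rule defined on one attribute of a data source."""
    source: str       # name of the monitored data source (event type)
    attribute: str    # attribute taken from the registered metadata
    op: str           # one of the keys of OPERATORS
    threshold: float

    def to_statement(self):
        """Render the rule as an SQL-like statement (illustrative text only)."""
        return (f"select * from {self.source} "
                f"where {self.attribute} {self.op} {self.threshold}")

    def hits(self, event):
        """Return True if the event comes from the source and breaks the rule."""
        if event.get("source") != self.source or self.attribute not in event:
            return False
        return OPERATORS[self.op](event[self.attribute], self.threshold)


def evaluate(events, rules, alarm_queue):
    """Append one alarm per (event, rule) hit to the alarm queue stand-in."""
    for event in events:
        for rule in rules:
            if rule.hits(event):
                alarm_queue.append({"rule": rule.to_statement(), "event": event})
```

For example, `Rule("CpuMetric", "usage", ">", 90)` would flag any `CpuMetric` event whose `usage` exceeds 90; in the described system this matching is done by the Esper rule engine running as multiple Blink tasks rather than by a Python loop.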
Step three: storing the alarms generated during the multitasking of the data into a database, comprising the following steps:
1) Reading data from the Kafka alarm warehousing queue: the alarm data generated by Esper in step two is read and deserialized into an alarm object.
2) Judging whether the alarm is in the whitelist: the whitelist table of the database is loaded into memory in advance and queried by the attack source IP, the attack destination IP and the alarm level of the alarm object. If the alarm is in the whitelist, the program returns to step 1) and continues with the next alarm in the queue; if it is not in the whitelist, the warehousing operation is performed. If the database contains no alarm with the same source IP, the same destination IP and the same type, the alarm is inserted directly; otherwise, the alarm count of the existing record is incremented by 1 and the record is updated in the database.
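The whitelist check and the insert-or-increment logic of step three can be sketched with Python's built-in `sqlite3` module. The table layout, the column names, and the representation of the whitelist as an in-memory set of (source IP, destination IP, level) triples are assumptions for illustration; the source does not specify the database schema or how the three whitelist fields are combined.

```python
import sqlite3


def open_alarm_db():
    """Create an in-memory database with a hypothetical alarm table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE alarm ("
        " src_ip TEXT, dst_ip TEXT, alarm_type TEXT, hit_count INTEGER,"
        " PRIMARY KEY (src_ip, dst_ip, alarm_type))"
    )
    return conn


def store_alarm(conn, whitelist, alarm):
    """Store one alarm; return 'skipped', 'inserted' or 'updated'."""
    # Whitelisted alarms are not stored at all.
    if (alarm["src_ip"], alarm["dst_ip"], alarm["level"]) in whitelist:
        return "skipped"

    key = (alarm["src_ip"], alarm["dst_ip"], alarm["alarm_type"])
    row = conn.execute(
        "SELECT hit_count FROM alarm"
        " WHERE src_ip = ? AND dst_ip = ? AND alarm_type = ?",
        key,
    ).fetchone()

    if row is None:
        # First alarm with this source, destination and type: insert it.
        conn.execute("INSERT INTO alarm VALUES (?, ?, ?, 1)", key)
        outcome = "inserted"
    else:
        # Same alarm seen before: only increment its counter.
        conn.execute(
            "UPDATE alarm SET hit_count = hit_count + 1"
            " WHERE src_ip = ? AND dst_ip = ? AND alarm_type = ?",
            key,
        )
        outcome = "updated"
    conn.commit()
    return outcome
```

Keeping one row per (source IP, destination IP, alarm type) with a counter keeps the alarm table small when the same attack repeats many times.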
Step four: storing the multitasked data through Aliyun, comprising the following steps:
1) Data parsing: the data for big data storage is consumed from the Kafka queue and deserialized into the corresponding objects according to its type.
2) Storing the data in MaxCompute of Aliyun: MaxCompute (formerly ODPS) is a fast, fully managed EB-level data warehouse solution.
With the continuous enrichment of data collection means and the accumulation of large amounts of industry data, data volumes have grown to a massive scale (hundreds of TB, PB and EB) that traditional software cannot handle. MaxCompute is dedicated to the storage and computation of batch structured data and provides massive data warehouse solutions and analysis modeling services. A MaxCompute data table is created with the odps_cmd client tool according to preset table creation rules. MaxCompute stores data in blocks, each 64 MB in size. To avoid producing a large number of small files, this step uses timed batch insertion: a batch operation is performed when the number of records reaches 3 thousand or one day has elapsed, and the data is written, according to datamode, into the pre-created table through the Tunnel data transfer method of MaxCompute, for use by offline analysis programs.
3) Periodically deleting MaxCompute historical data: to prevent the data from growing without bound, this step sets up a scheduled task that deletes MaxCompute historical data older than 6 months.
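The timed batch insertion and the six-month cleanup can be sketched as follows. This is a hedged illustration only: `BatchBuffer`, its `flush` callback (which stands in for an upload through MaxCompute Tunnel and calls no real MaxCompute API), the default thresholds, and the assumption that data is organized in date-based partitions are all hypothetical. The cutoff is rounded to the first day of the month, which is one simple way to define "6 months ago".

```python
import time
from datetime import date


class BatchBuffer:
    """Collect records and flush them by count or by age, whichever comes first."""

    def __init__(self, flush, max_records=3000, max_age_seconds=86400,
                 clock=time.monotonic):
        self._flush = flush            # callback that uploads one batch
        self._max_records = max_records
        self._max_age = max_age_seconds
        self._clock = clock            # injectable clock, eases testing
        self._items = []
        self._started = clock()

    def add(self, record):
        if not self._items:
            self._started = self._clock()  # age is measured from the first record
        self._items.append(record)
        self.maybe_flush()

    def maybe_flush(self):
        """Flush if the batch is full or old enough; call periodically as well."""
        full = len(self._items) >= self._max_records
        expired = self._clock() - self._started >= self._max_age
        if self._items and (full or expired):
            batch, self._items = self._items, []
            self._flush(batch)


def months_before(day, months):
    """First day of the month that lies `months` calendar months before `day`."""
    index = day.year * 12 + (day.month - 1) - months
    return date(index // 12, index % 12 + 1, 1)


def expired_partitions(partition_dates, today, keep_months=6):
    """Return the partition dates that fall before the retention cutoff."""
    cutoff = months_before(today, keep_months)
    return [p for p in partition_dates if p < cutoff]
```

A scheduled task could call `maybe_flush()` on a timer so that a slow trickle of records is still written once a day, and call `expired_partitions()` to decide which partitions to drop.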
Example 2
A big data processing system based on Aliyun is provided, which comprises:
a preprocessing module, configured to preprocess the collected data through Aliyun;
a multitask processing module, configured to multitask the preprocessed data through Aliyun;
an alarm storage module, configured to store alarms generated during the multitasking of the data into a database;
a data storage module, configured to store the multitasked data through Aliyun.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.

Claims (9)

1. A big data processing method based on Aliyun, characterized by comprising the following steps:
preprocessing the collected data through Aliyun;
multitasking the preprocessed data through Aliyun;
storing alarms generated during the multitasking of the data into a database;
and storing the multitasked data through Aliyun.
2. The big data processing method based on Aliyun according to claim 1, wherein the preprocessing of the collected data through Aliyun specifically comprises:
sending the collected data to a Kafka message queue of Aliyun;
and taking the data out of the Kafka message queue through the Aliyun Blink computing engine, identifying, de-duplicating, de-noising, normalizing and enhancing the data, and sending the processed data to a data analysis queue and a big data storage queue respectively.
3. The big data processing method based on Aliyun as claimed in claim 2, wherein the multitasking of the preprocessed data through Aliyun specifically comprises:
converting the data in the data analysis queue into corresponding data objects according to a data model, converting the data objects into Esper events, and sending the Esper events to an Esper rule engine for analysis and processing, wherein the Esper rule engine runs in a multitask mode on the Blink computing engine of Aliyun.
4. The big data processing method based on Aliyun as claimed in claim 3, wherein an alarm event is generated for each Esper event that hits a preloaded alarm rule, and the alarm event is sent to the alarm warehousing queue of Kafka.
5. The big data processing method based on Aliyun as claimed in claim 4, wherein storing the alarms generated during the multitasking of the data into a database specifically comprises:
judging, through the whiteteliststream tool in the Aliyun Blink computing engine, whether the alarm event is in a whitelist; if so, the alarm event is not stored; otherwise, a warehousing operation is performed on the alarm event.
6. The big data processing method based on Aliyun as claimed in claim 4, wherein during the warehousing operation, if the database contains no alarm with the same source IP, the same destination IP and the same alarm type, the alarm is inserted into the database; otherwise, the recorded count of the existing alarm is incremented by 1 and the updated count is written to the database.
7. The big data processing method based on Aliyun according to claim 1, wherein the storing of the multitasked data through Aliyun comprises:
reading data from the big data storage queue, and deserializing the data into corresponding data objects according to the data types;
and storing the data objects through the MaxCompute service of Aliyun.
8. The big data processing method based on Aliyun as claimed in claim 1, wherein historical data stored in MaxCompute that is more than 6 months old is automatically deleted.
9. A big data processing system based on Aliyun, comprising:
a preprocessing module, configured to preprocess the collected data through Aliyun;
a multitask processing module, configured to multitask the preprocessed data through Aliyun;
an alarm storage module, configured to store alarms generated during the multitasking of the data into a database;
a data storage module, configured to store the multitasked data through Aliyun.
CN202110355586.3A 2021-04-01 2021-04-01 Big data processing method and system based on Aliyun Withdrawn CN113094154A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110355586.3A CN113094154A (en) 2021-04-01 2021-04-01 Big data processing method and system based on Aliyun

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110355586.3A CN113094154A (en) 2021-04-01 2021-04-01 Big data processing method and system based on Aliyun

Publications (1)

Publication Number Publication Date
CN113094154A true CN113094154A (en) 2021-07-09

Family

ID=76672489

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110355586.3A Withdrawn CN113094154A (en) 2021-04-01 2021-04-01 Big data processing method and system based on Aliyun

Country Status (1)

Country Link
CN (1) CN113094154A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113986656A (en) * 2021-10-14 2022-01-28 南京南瑞信息通信科技有限公司 Power grid data safety monitoring system based on data center
CN116821246A (en) * 2023-07-12 2023-09-29 深度(山东)数字科技集团有限公司 Data synchronization method based on big data engine calculation

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113986656A (en) * 2021-10-14 2022-01-28 南京南瑞信息通信科技有限公司 Power grid data safety monitoring system based on data center
CN113986656B (en) * 2021-10-14 2023-12-19 南京南瑞信息通信科技有限公司 Power grid data safety monitoring system based on data center platform
CN116821246A (en) * 2023-07-12 2023-09-29 深度(山东)数字科技集团有限公司 Data synchronization method based on big data engine calculation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210709