CN108563665A - Data processing system and method based on big data technology

Info

Publication number: CN108563665A
Application number: CN201810009615.9A
Authority: CN
Inventors: 何立鹏
Original assignee: Chengdu Xing Zheng E-Government Operations Services Ltd
Current assignee: Chengdu Xing Zheng E-Government Operations Services Ltd
Priority date: 2018-01-05
Filing date: 2018-01-05
Publication date: 2018-09-21

Abstract

The present invention relates to the technical field of big data, and in particular to a data processing system and method based on big data technology. The system comprises: a data acquisition unit, for collecting original data information; a data processing unit, for performing data cleaning, data merging, and data conversion on the collected data information; and a data synchronization unit, for synchronizing the collected and processed data information. The invention has the advantages of a high degree of visualization and high reliability.

Description

Data processing system and method based on big data technology

Technical field

The present invention relates to the technical field of big data, and in particular to a data processing system and method based on big data technology.

Background art

With the rapid development of information technology, the volume of data that is collected, stored, processed, and analyzed keeps growing. The processing of multi-source heterogeneous big data is attracting more and more attention. Big data has five characteristics. Among them, massiveness mainly refers to the huge scale of the data and its continually increasing growth rate; distribution mainly reflects the fact that such a huge volume of data cannot be stored, computed, and analyzed on a single machine; and heterogeneity mainly reflects the diversity of data types and data sources. The traditional centralized processing approach oriented toward structured data can hardly solve the problems that big data brings, and in view of these three characteristics, integration and cleaning for big data become particularly important. Big data also contains uncertain data, and at present the causes of uncertain data are diverse: they mainly include inaccurate raw data, the use of coarse-grained data sets, missing data fields, and data integration.

The key to handling big data problems is to apply distributed storage and distributed computing to the data. Currently, the internationally popular model for big data computing is the Key-Value model, and its most representative implementations are Hadoop, Spark, and Hive, whose emergence provides a reliable solution to big data problems.

Summary of the invention

In view of this, the object of the present invention is to provide a data processing system and method based on big data technology that has the advantages of a high degree of visualization and high reliability.

The technical solution adopted by the present invention is as follows：

A data processing system based on big data technology, the system comprising:

A data acquisition unit, for collecting original data information;

A data processing unit, for performing data cleaning, data merging, and data conversion on the collected data information;

A data synchronization unit, for synchronizing the collected and processed data information.

Further, the data acquisition unit includes:

An original database unit, for storing the original data information;

A data acquisition unit, for acquiring the original data information from the original database unit;

A storage database unit, for storing the acquired data information.

Further, the data processing unit includes:

A data cleaning unit, for cleaning the data information in the storage database unit;

A data merging unit, for merging the cleaned data information and generating a merged table;

A data conversion unit, for converting the cleaned data information and generating a conversion table.

Further, the data synchronization unit includes:

A synchronization unit, for performing data synchronization;

A table synchronization unit, for performing table synchronization;

A text file download unit, for downloading text files;

A file format export unit, for exporting files in a specified format.

A data processing method based on big data technology, the method comprising:

Step 1: Configure a data source node, selecting the data source table and fields to be processed;

Step 2: Configure a cleaning node, which cleans the data of the data source table;

Step 3: Configure a merge node, which merges the data of multiple data source nodes;

Step 4: Configure an aggregation node, which aggregates the data of the preceding node;

Step 5: Configure a conversion node, which converts and processes the data of the upstream node;

Step 6: Configure an output node, which sets where the processed data is output;

Step 7: The data flow task management module splits the incoming data according to the configured capacity size and processes the split data streams according to the configured processing flow;

Step 8: The message management module handles message passing between the nodes of the configured flow, ensuring that the flow executes smoothly;

Step 9: Workflow management manages and executes the data processing flow.
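The node-based flow of Steps 1 to 9 can be illustrated with a minimal Python sketch. It is not the implementation disclosed here: the names `split_by_capacity`, `Node`, and `run_flow`, the representation of a record as a dictionary, and the measurement of capacity in rows are all assumptions made for illustration. The sketch covers only the splitting of Step 7 and the ordered execution of configured nodes.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator

Row = dict  # one record keyed by field name (illustrative representation)


def split_by_capacity(rows: Iterable[Row], capacity: int) -> Iterator[list[Row]]:
    """Step 7: cut the incoming data into chunks of at most `capacity` rows."""
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    chunk: list[Row] = []
    for row in rows:
        chunk.append(row)
        if len(chunk) == capacity:
            yield chunk
            chunk = []
    if chunk:  # emit the final, possibly smaller, chunk
        yield chunk


@dataclass
class Node:
    """A configured flow node: a name plus a function applied to one chunk."""
    name: str
    apply: Callable[[list[Row]], list[Row]]


def run_flow(rows: Iterable[Row], nodes: list[Node], capacity: int) -> list[Row]:
    """Run every chunk through the configured nodes in order."""
    output: list[Row] = []
    for chunk in split_by_capacity(rows, capacity):
        for node in nodes:
            chunk = node.apply(chunk)
        output.extend(chunk)
    return output


# Hypothetical cleaning and conversion nodes (Steps 2 and 5).
clean = Node("clean", lambda rows: [r for r in rows if r.get("name")])
convert = Node("convert", lambda rows: [{**r, "name": r["name"].upper()} for r in rows])

source = [{"id": 1, "name": "ann"}, {"id": 2, "name": ""}, {"id": 3, "name": "bob"}]
result = run_flow(source, [clean, convert], capacity=2)
```

Row-wise nodes such as cleaning and conversion can be applied to each chunk independently. An aggregation node (Step 4) would need to combine partial results across chunks, which this sketch deliberately leaves out.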

Compared with the prior art, the beneficial effects of the present invention are as follows. It provides a platform for configuring data processing flows visually and quickly. It supports a rich variety of data formats, can quickly desensitize and declassify data, and can execute a data processing flow on a schedule or immediately. The execution information of each flow node can be viewed intuitively, and for data nodes that fail, the error data can be inspected and data quality analysis is provided. Because big data processing technology is used to combine data extraction, conversion, and synchronization, the reliability of data conversion can be greatly improved, and desensitized, declassified private data can be obtained quickly.

Description of the drawings

Fig. 1 is the structural schematic diagram of the data processing system based on big data technology of the present invention.

Fig. 2 is the processing flow schematic diagram of the data processing system based on big data technology of the present invention.

Detailed description of the embodiments

In order to enable those skilled in the art to better understand the technical solution of the present invention, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.

Embodiment 1：

As shown in Fig. 1 and Fig. 2, a data processing system based on big data technology, the system comprising:

A data acquisition unit, for collecting original data information;

Further, the data acquisition unit includes:

An original database unit, for storing the original data information;

A storage database unit, for storing the acquired data information.

Further, the data processing unit includes:

Further, the data synchronization unit includes:

A synchronization unit, for performing data synchronization;

A table synchronization unit, for performing table synchronization;

A text file download unit, for downloading text files;

A file format export unit, for exporting files in a specified format.

During data acquisition, the system generates different synchronization statements according to the different data sources configured, extracts the data into the Hive database with the sqoop tool, and saves the field information of the data source table in the application library.
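As an illustration of generating a different statement per configured data source, the following Python sketch builds a `sqoop import` command line for a given source type. The `DataSource` class, the `JDBC_TEMPLATES` mapping, the host names, and the table names are hypothetical; the sketch only assembles the command and does not run it, and it does not show how the field information is stored in the application library.

```python
from dataclasses import dataclass

# Hypothetical mapping from a configured source type to a JDBC URL template.
JDBC_TEMPLATES = {
    "mysql": "jdbc:mysql://{host}:{port}/{database}",
    "postgresql": "jdbc:postgresql://{host}:{port}/{database}",
    "oracle": "jdbc:oracle:thin:@{host}:{port}:{database}",
}


@dataclass
class DataSource:
    kind: str           # key into JDBC_TEMPLATES
    host: str
    port: int
    database: str
    username: str
    password_file: str  # file read by Sqoop, keeps the password off the command line
    table: str


def build_sqoop_import(src: DataSource, hive_table: str, mappers: int = 1) -> list[str]:
    """Assemble a `sqoop import` command that loads one source table into Hive."""
    if src.kind not in JDBC_TEMPLATES:
        raise ValueError(f"unsupported data source type: {src.kind}")
    url = JDBC_TEMPLATES[src.kind].format(
        host=src.host, port=src.port, database=src.database
    )
    return [
        "sqoop", "import",
        "--connect", url,
        "--username", src.username,
        "--password-file", src.password_file,
        "--table", src.table,
        "--hive-import",
        "--hive-table", hive_table,
        "--num-mappers", str(mappers),
    ]


# Example with made-up connection details.
source = DataSource("mysql", "db.example.com", 3306, "gov", "etl", "/user/etl/.pw", "citizen")
import_cmd = build_sqoop_import(source, hive_table="citizen_raw")
```

The returned list can be passed to a process launcher such as `subprocess.run(import_cmd, check=True)` on a host where Sqoop is installed.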

During data processing, a data cleaning HiveSql statement is generated from the data cleaning information configured by the user, and the system connects to the Hive database via JDBC and executes the cleaning HiveSql to clean the table data. From the configured merge information, a HiveSql statement that joins and merges the tables is generated; the system connects to the Hive database and executes it to produce a new merged data table, and the name of the generated merged table is saved in the application library. The merged table is converted into a txt-format file, and a task is generated by invoking the Spark data conversion package through an ssh command; the converted content is saved to a txt file, a new conversion table is created, and the converted content of the txt file is imported into the conversion table through a HiveSql import command.
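The generation of cleaning and merging HiveSql from user configuration might look like the following Python sketch. The specific cleaning rules (dropping rows with null fields, trimming whitespace, removing duplicate rows) and the function names are assumptions chosen for illustration; the text above does not specify which rules the configuration supports. The sketch only produces the statement text; executing it over a JDBC connection is not shown.

```python
import re

_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _ident(name: str) -> str:
    """Quote a table or column name, rejecting anything that is not a plain identifier."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return f"`{name}`"


def build_clean_sql(table: str, target: str, columns: list[str],
                    not_null: list[str], trim: list[str]) -> str:
    """Cleaning statement: trim selected columns, drop null rows, remove duplicates."""
    items = [
        f"trim({_ident(c)}) AS {_ident(c)}" if c in trim else _ident(c)
        for c in columns
    ]
    where = " AND ".join(f"{_ident(c)} IS NOT NULL" for c in not_null) or "TRUE"
    return (
        f"INSERT OVERWRITE TABLE {_ident(target)} "
        f"SELECT DISTINCT {', '.join(items)} FROM {_ident(table)} WHERE {where}"
    )


def build_merge_sql(merged: str, left: str, right: str, key: str,
                    left_cols: list[str], right_cols: list[str]) -> str:
    """Merge statement: join two cleaned tables on a key into a new merged table."""
    cols = [f"l.{_ident(c)}" for c in left_cols] + [f"r.{_ident(c)}" for c in right_cols]
    return (
        f"CREATE TABLE {_ident(merged)} AS "
        f"SELECT {', '.join(cols)} FROM {_ident(left)} l "
        f"JOIN {_ident(right)} r ON l.{_ident(key)} = r.{_ident(key)}"
    )


# Example with hypothetical table and column names.
clean_sql = build_clean_sql("person_raw", "person_clean", ["id", "name"],
                            not_null=["id"], trim=["name"])
merge_sql = build_merge_sql("person_merged", "person_clean", "address_clean", "id",
                            left_cols=["id", "name"], right_cols=["city"])
```

Validating identifiers before interpolating them keeps user-supplied table and field names from altering the structure of the generated statement.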

During data synchronization, the data is synchronized according to the synchronization mode that the user has configured for the data type. If synchronization to a relational database is configured, a data export statement is generated from the synchronization database type and address configured by the user, and the export statement is executed by the sqoop tool to export the data into the corresponding table of the configured database. When the table does not exist in the synchronization target database, the table structure is first generated from the recorded field information, and the data is then exported into the new table. If export as a text file is configured, the corresponding HiveSql export command is generated from the configuration information and the corresponding text file is produced; conversion to text files and Excel files is supported.
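The two synchronization paths can be sketched in Python as follows. All function names, the field types, and the connection details are hypothetical. The sketch plans the steps without executing them: for a relational target it emits table-creation DDL from the recorded field information only when the table is missing, followed by a `sqoop export` command; for text output it emits a Hive export statement. The `\001` field delimiter assumes the Hive table uses Hive's default text delimiter.

```python
import re

_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def _ident(name: str) -> str:
    """Accept only plain identifiers for table and column names."""
    if not _IDENT.fullmatch(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def build_create_table(table: str, fields: list[tuple[str, str]]) -> str:
    """Rebuild the target table structure from recorded (name, type) field information."""
    cols = ", ".join(f"{_ident(name)} {sql_type}" for name, sql_type in fields)
    return f"CREATE TABLE {_ident(table)} ({cols})"


def build_sqoop_export(jdbc_url: str, username: str, password_file: str,
                       table: str, export_dir: str) -> list[str]:
    """Assemble a `sqoop export` command that pushes Hive files into a relational table."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--table", table,
        "--export-dir", export_dir,
        "--input-fields-terminated-by", "\\001",  # assumed Hive default delimiter
    ]


def plan_relational_sync(table: str, fields: list[tuple[str, str]],
                         existing_tables: set[str], export_cmd: list[str]) -> list[tuple]:
    """Create the target table first if it is missing, then export into it."""
    steps: list[tuple] = []
    if table not in existing_tables:
        steps.append(("sql", build_create_table(table, fields)))
    steps.append(("shell", export_cmd))
    return steps


def build_text_export_sql(table: str, out_dir: str, delimiter: str = ",") -> str:
    """Hive statement that writes a table to delimited text files in a local directory."""
    return (
        f"INSERT OVERWRITE LOCAL DIRECTORY '{out_dir}' "
        f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}' "
        f"SELECT * FROM {_ident(table)}"
    )


# Example with made-up names and paths.
fields = [("id", "BIGINT"), ("name", "VARCHAR(64)")]
export_cmd = build_sqoop_export("jdbc:mysql://db.example.com:3306/gov", "etl",
                                "/user/etl/.pw", "citizen",
                                "/user/hive/warehouse/citizen_merged")
steps_new = plan_relational_sync("citizen", fields, set(), export_cmd)
steps_existing = plan_relational_sync("citizen", fields, {"citizen"}, export_cmd)
text_sql = build_text_export_sql("citizen_merged", "/tmp/export/citizen")
```

Returning a plan rather than executing it keeps the decision (create the table or not) separate from the side effects, which makes the behavior easy to check before any data is moved.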

Embodiment 2: A data processing method based on big data technology, the method comprising:

Step 2: Configure a cleaning node, which cleans the data of the data source table;

The above are only preferred embodiments of the present invention. It should be noted that the above preferred embodiments shall not be regarded as limiting the present invention, and the protection scope of the present invention shall be subject to the scope defined by the claims. For those of ordinary skill in the art, several improvements and refinements can be made without departing from the spirit and scope of the present invention, and these improvements and refinements shall also be regarded as falling within the protection scope of the present invention.

Claims

1. A data processing system based on big data technology, characterized in that the system comprises:

A data acquisition unit, for collecting original data information;

2. The data processing system based on big data technology according to claim 1, characterized in that the data acquisition unit includes:

An original database unit, for storing the original data information;

A storage database unit, for storing the acquired data information.

3. The data processing system based on big data technology according to claim 2, characterized in that the data processing unit includes:

4. The data processing system based on big data technology according to claim 3, characterized in that the data synchronization unit includes:

A synchronization unit, for performing data synchronization;

A table synchronization unit, for performing table synchronization;

A text file download unit, for downloading text files;

A file format export unit, for exporting files in a specified format.

5. A data processing method based on big data technology, characterized in that the method comprises:

Step 2: Configure a cleaning node, which cleans the data of the data source table;

Step 7: The data flow task management module splits the incoming data according to the configured capacity size and processes the split data streams according to the configured processing flow;