CN108563665A - A data processing system and method based on big data technology


Info

Publication number
CN108563665A
Authority
CN
China
Legal status (as listed by Google; not a legal conclusion)
Pending
Application number
CN201810009615.9A
Other languages
Chinese (zh)
Inventor
何立鹏
Current Assignee (as listed by Google; may be inaccurate)
Chengdu Xing Zheng E-Government Operations Services Ltd
Original Assignee
Chengdu Xing Zheng E-Government Operations Services Ltd
Priority date (as listed by Google; not a legal conclusion)
2018-01-05
Filing date
2018-01-05
Publication date
2018-09-21

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/602Providing cryptographic facilities or services

Abstract

The present invention relates to a data processing system and method based on big data technology, in the field of big data technology. The system comprises: a data acquisition unit, configured to collect raw data; a data processing unit, configured to perform data cleansing, data merging and data conversion on the collected data; and a data synchronization unit, configured to synchronize the data after acquisition and processing. The invention has the advantages of a high degree of visualization and high reliability.

Description

A data processing system and method based on big data technology
Technical field
The present invention relates to the field of big data technology, and in particular to a data processing system and method based on big data technology.
Background
With the rapid development of information technology, the volume of data that is collected, stored, processed and analyzed keeps growing, and the processing of heterogeneous big data from multiple sources is receiving more and more attention. Big data has five characteristics, three of which are discussed here. Volume refers mainly to the huge scale of the data and its continuously increasing growth rate; distribution is reflected mainly in the fact that such a huge volume of data cannot be stored, computed and analyzed on a single machine; and heterogeneity is reflected mainly in the diversity of data types and data sources. Traditional centralized processing designed for structured data can hardly solve the problems that big data brings, so, in view of these three characteristics, integration and cleansing for big data become particularly important. Big data also contains uncertain data, which at present arises from increasingly diverse causes, mainly inaccurate raw data, the use of coarse-grained data sets, missing data fields, and data integration.
The key to processing big data is to store and compute data in a distributed way. The currently popular model for big data computing is the Key-Value model, most representatively embodied by Hadoop, Spark and Hive, whose emergence provides a reliable solution to big data problems.
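As a rough illustration of the Key-Value model referred to above, the following Python sketch runs the map, shuffle and reduce phases of a MapReduce-style job inside a single process. It is only a teaching aid: Hadoop and Spark distribute these phases across a cluster, and the `map_reduce` helper, the mapper and reducer functions and the sample records are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple

def map_reduce(records: Iterable[str],
               mapper: Callable[[str], Iterable[Tuple[str, int]]],
               reducer: Callable[[str, List[int]], int]) -> Dict[str, int]:
    """Run map -> shuffle -> reduce in one process (illustration only)."""
    # Map phase: every input record emits zero or more (key, value) pairs.
    pairs = (kv for record in records for kv in mapper(record))
    # Shuffle phase: group all values that share the same key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Reduce phase: collapse each key's list of values into one result.
    return {key: reducer(key, values) for key, values in groups.items()}

# Hypothetical job: count how many records came from each data source.
def source_mapper(line: str) -> Iterator[Tuple[str, int]]:
    source = line.split(",")[0]
    yield source, 1

def sum_reducer(_key: str, values: List[int]) -> int:
    return sum(values)

counts = map_reduce(["crm,1", "erp,2", "crm,3"], source_mapper, sum_reducer)
print(counts)  # {'crm': 2, 'erp': 1}
```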
Summary of the invention
In view of this, the object of the present invention is to provide a data processing system and method based on big data technology that offers a high degree of visualization and high reliability.
The technical solution adopted by the present invention is as follows:
A data processing system based on big data technology, the system comprising:
Data acquisition unit, configured to collect raw data;
Data processing unit, configured to perform data cleansing, data merging and data conversion on the collected data;
Data synchronization unit, configured to synchronize the data after acquisition and processing.
Further, the data acquisition unit comprises:
Raw database unit, configured to store the raw data;
Data collection unit, configured to collect the raw data from the raw database unit;
Storage database unit, configured to store the collected data.
Further, the data processing unit comprises:
Data cleansing unit, configured to cleanse the data in the storage database unit;
Data merging unit, configured to merge the cleansed data and generate a merged table;
Data conversion unit, configured to convert the cleansed data and generate a conversion table.
Further, the data synchronization unit comprises:
Synchronization unit, configured to perform data synchronization;
Table synchronization unit, configured to perform table synchronization;
Text file download unit, configured to download text files;
File format export unit, configured to export files in a specified format.
A data processing method based on big data technology, the method comprising:
Step 1: Configure a data source node and select the data source table and fields to be processed;
Step 2: Configure a cleansing node to cleanse the data of the data source table;
Step 3: Configure a merge node to merge the data of multiple data source nodes;
Step 4: Configure an aggregation node to aggregate the data of the preceding node;
Step 5: Configure a conversion node to convert and process the data of the upstream node;
Step 6: Configure an output node to specify where the processed data is output;
Step 7: A data flow task management module splits the incoming data according to the configured capacity and processes the resulting data streams according to the configured processing flow;
Step 8: A message management module handles message passing between the nodes of the configured flow, ensuring that the flow executes smoothly;
Step 9: Workflow management manages and executes the data processing flow.
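Steps 1 to 6 configure a chain of processing nodes that Step 9 then executes. The Python sketch below is one possible in-memory reading of that chain, assuming records are dictionaries and each node is a function that receives the outputs of its upstream nodes. The `run_workflow` helper, node names and sample tables are hypothetical, and a real deployment would execute the nodes as HiveQL or Spark jobs rather than Python functions.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
Node = Callable[..., List[Record]]

def run_workflow(nodes: Dict[str, Node], upstream: Dict[str, List[str]],
                 order: List[str]) -> Dict[str, List[Record]]:
    """Run nodes in a topological order, passing each node the outputs
    of its upstream nodes (a stand-in for Step 9, workflow management)."""
    results: Dict[str, List[Record]] = {}
    for name in order:
        inputs = [results[parent] for parent in upstream.get(name, [])]
        results[name] = nodes[name](*inputs)
    return results

# Hypothetical sample tables standing in for two configured data sources.
ORDERS = [{"cust": "c1", "amount": "10"}, {"cust": "c2", "amount": None},
          {"cust": "c1", "amount": "5"}]
CUSTOMERS = [{"cust": "c1", "region": "east"}]
sink: List[Record] = []  # where the output node "writes"

def source_orders() -> List[Record]:               # Step 1: data source node
    return [dict(r) for r in ORDERS]

def source_customers() -> List[Record]:            # Step 1: second data source node
    return [dict(r) for r in CUSTOMERS]

def cleanse(rows: List[Record]) -> List[Record]:    # Step 2: drop rows missing "amount"
    return [r for r in rows if r.get("amount") is not None]

def merge(left: List[Record], right: List[Record]) -> List[Record]:  # Step 3: inner join on "cust"
    by_cust = {r["cust"]: r for r in right}
    return [{**row, **by_cust[row["cust"]]} for row in left if row["cust"] in by_cust]

def aggregate(rows: List[Record]) -> List[Record]:  # Step 4: total amount per region
    totals: Dict[object, int] = {}
    for r in rows:
        totals[r["region"]] = totals.get(r["region"], 0) + int(str(r["amount"]))
    return [{"region": k, "total": v} for k, v in totals.items()]

def convert(rows: List[Record]) -> List[Record]:    # Step 5: normalize region codes
    return [{**r, "region": str(r["region"]).upper()} for r in rows]

def output(rows: List[Record]) -> List[Record]:     # Step 6: deliver to the configured target
    sink.extend(rows)
    return rows

results = run_workflow(
    nodes={"orders": source_orders, "customers": source_customers,
           "clean": cleanse, "merge": merge, "agg": aggregate,
           "convert": convert, "out": output},
    upstream={"clean": ["orders"], "merge": ["clean", "customers"],
              "agg": ["merge"], "convert": ["agg"], "out": ["convert"]},
    order=["orders", "customers", "clean", "merge", "agg", "convert", "out"],
)
print(results["out"])  # [{'region': 'EAST', 'total': 15}]
```

Keeping the node graph (`upstream`) separate from the node functions mirrors the idea of configuring each node independently and letting workflow management wire them together.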
Compared with the prior art, the beneficial effects of the present invention are as follows. It provides a visual platform for quickly configuring data processing flows and supports a rich variety of data formats. It can quickly desensitize and encrypt or decrypt data, and it can execute data processing flows on a schedule or immediately. The execution information of each flow node can be viewed intuitively, and error information for failed data nodes as well as data quality analysis are provided. By combining big data processing technology for data extraction, conversion and synchronization, the invention greatly improves the reliability of data conversion and allows privacy-desensitized and encrypted or decrypted data to be obtained quickly.
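Desensitization can be as simple as masking part of a sensitive field. The sketch below assumes two hypothetical masking rules (keep the first 3 and last 4 characters of a mobile number, and the first 6 and last 4 of an 18-character ID number) and uses a keyed HMAC-SHA256 hash to pseudonymize names. The rules, field names and the choice of HMAC are assumptions rather than the patent's method, and the encryption and decryption the patent also mentions are not shown.

```python
import hashlib
import hmac
from typing import Callable, Dict

def mask_middle(value: str, keep_head: int, keep_tail: int, mask_char: str = "*") -> str:
    """Hide everything except the first keep_head and last keep_tail characters."""
    if len(value) <= keep_head + keep_tail:
        return mask_char * len(value)  # too short: hide the whole value
    hidden = len(value) - keep_head - keep_tail
    return value[:keep_head] + mask_char * hidden + value[len(value) - keep_tail:]

def pseudonymize(value: str, key: bytes) -> str:
    """Keyed hash: equal inputs give equal tokens (so joins still work),
    but the token cannot be directly reversed into the original value."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

# Hypothetical per-field masking rules.
MASK_RULES: Dict[str, Callable[[str], str]] = {
    "phone": lambda v: mask_middle(v, 3, 4),      # 11-digit mobile number
    "id_number": lambda v: mask_middle(v, 6, 4),  # 18-character ID number
}

def desensitize(record: Dict[str, str], key: bytes) -> Dict[str, str]:
    out = {}
    for field, value in record.items():
        if field in MASK_RULES:
            out[field] = MASK_RULES[field](value)
        elif field == "name":
            out[field] = pseudonymize(value, key)
        else:
            out[field] = value  # non-sensitive fields pass through unchanged
    return out

print(mask_middle("13812345678", 3, 4))  # 138****5678
```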
Brief description of the drawings
Fig. 1 is a schematic structural diagram of the data processing system based on big data technology according to the present invention.
Fig. 2 is a schematic diagram of the processing flow of the data processing system based on big data technology according to the present invention.
Detailed description of the embodiments
To enable those skilled in the art to better understand the technical solution of the present invention, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Embodiment 1:
As shown in Fig. 1 and Fig. 2, a data processing system based on big data technology, the system comprising:
Data acquisition unit, configured to collect raw data;
Data processing unit, configured to perform data cleansing, data merging and data conversion on the collected data;
Data synchronization unit, configured to synchronize the data after acquisition and processing.
Further, the data acquisition unit comprises:
Raw database unit, configured to store the raw data;
Data collection unit, configured to collect the raw data from the raw database unit;
Storage database unit, configured to store the collected data.
Further, the data processing unit comprises:
Data cleansing unit, configured to cleanse the data in the storage database unit;
Data merging unit, configured to merge the cleansed data and generate a merged table;
Data conversion unit, configured to convert the cleansed data and generate a conversion table.
Further, the data synchronization unit comprises:
Synchronization unit, configured to perform data synchronization;
Table synchronization unit, configured to perform table synchronization;
Text file download unit, configured to download text files;
File format export unit, configured to export files in a specified format.
During data acquisition, the system generates different synchronization statements for the different configured data sources, uses the Sqoop tool to extract the data into the Hive database, and saves the field information of each data source table in the application database.
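The acquisition step can be sketched as generating one `sqoop import` command per configured data source and recording the selected fields in the application database. In this Python sketch the configuration keys, table names and the use of SQLite as the application database are hypothetical; the Sqoop options used (`--connect`, `--username`, `--password-file`, `--table`, `--columns`, `--hive-import`, `--hive-table`, `--num-mappers`) are standard `sqoop import` arguments.

```python
import shlex
import sqlite3
from typing import Dict, List

def build_sqoop_import(source: Dict[str, str], columns: List[str],
                       hive_table: str, mappers: int = 1) -> List[str]:
    """Build the argv for a `sqoop import` that copies one source table into Hive."""
    argv = [
        "sqoop", "import",
        "--connect", source["jdbc_url"],             # JDBC URL of the source database
        "--username", source["username"],
        "--password-file", source["password_file"],  # keeps the password off the command line
        "--table", source["table"],
        "--hive-import",                             # load the result into Hive
        "--hive-table", hive_table,
        "--num-mappers", str(mappers),
    ]
    if columns:
        argv += ["--columns", ",".join(columns)]     # only the fields selected in configuration
    return argv

def save_field_info(app_db: sqlite3.Connection, table: str, columns: List[str]) -> None:
    """Record the source table's field list in the application database."""
    app_db.execute("CREATE TABLE IF NOT EXISTS field_info "
                   "(source_table TEXT, position INTEGER, column_name TEXT)")
    app_db.executemany("INSERT INTO field_info VALUES (?, ?, ?)",
                       [(table, i, c) for i, c in enumerate(columns)])
    app_db.commit()

# Hypothetical data source configuration.
SOURCE = {"jdbc_url": "jdbc:mysql://db.example:3306/gov", "username": "etl",
          "password_file": "/user/etl/.db_password", "table": "residents"}
FIELDS = ["id", "name", "phone"]

cmd = build_sqoop_import(SOURCE, FIELDS, "ods.residents")
print(shlex.join(cmd))
app_db = sqlite3.connect(":memory:")
save_field_info(app_db, SOURCE["table"], FIELDS)
```

The command is only printed here; on a host with Sqoop installed it could be executed with `subprocess.run(cmd, check=True)`.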
During data processing, a data-cleansing HiveQL statement is generated from the cleansing information configured by the user, and the table data is cleansed by connecting to the Hive database over JDBC and executing that statement. From the configured merge information, a HiveQL statement that joins the associated tables is generated and executed over a Hive connection to produce a new merged data table, and the generated table name is saved in the application database. The merged table is then converted into a TXT-format file: a Spark data-conversion job is launched by invoking the Spark conversion package over SSH, the converted content is saved to a TXT file, a new conversion table is created, and the converted TXT content is imported into the conversion table with a HiveQL import command.
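The cleansing, merging and import statements can be generated from configuration as plain HiveQL strings. The Python sketch below assumes a simple cleansing configuration (columns to trim and columns that must not be NULL), a two-table join on a single key, and a converted file that already sits on HDFS; all function, table and path names are hypothetical. Because the statements are built by string formatting, identifiers are first validated against a conservative pattern.

```python
import re
from typing import Iterable, List

# Plain or database-qualified identifiers such as "id", "a.id", "dw.merged_1".
IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?$")

def _check(*names: str) -> None:
    """Reject identifiers that could smuggle SQL into generated statements."""
    for n in names:
        if not IDENT.match(n):
            raise ValueError(f"invalid identifier: {n!r}")

def cleansing_sql(source: str, target: str, columns: List[str],
                  trim: Iterable[str] = (), not_null: Iterable[str] = ()) -> str:
    """Copy `source` into `target`, trimming some columns and dropping NULL rows."""
    trim_cols, required = set(trim), list(not_null)
    _check(source, target, *columns, *required)
    select = ", ".join(f"TRIM({c}) AS {c}" if c in trim_cols else c for c in columns)
    sql = f"INSERT OVERWRITE TABLE {target} SELECT {select} FROM {source}"
    if required:
        sql += " WHERE " + " AND ".join(f"{c} IS NOT NULL" for c in required)
    return sql

def merge_sql(merged: str, left: str, right: str, key: str, columns: List[str]) -> str:
    """CREATE TABLE ... AS SELECT joining two tables on one key (aliases a and b)."""
    _check(merged, left, right, key, *columns)
    return (f"CREATE TABLE {merged} AS SELECT {', '.join(columns)} "
            f"FROM {left} a JOIN {right} b ON a.{key} = b.{key}")

def load_sql(hdfs_path: str, table: str) -> str:
    """Import a converted text file (already on HDFS) into the conversion table."""
    _check(table)
    if "'" in hdfs_path:
        raise ValueError("path must not contain quotes")
    return f"LOAD DATA INPATH '{hdfs_path}' OVERWRITE INTO TABLE {table}"

print(cleansing_sql("ods.residents", "dw.residents_clean", ["id", "name"],
                    trim=["name"], not_null=["id"]))
```

The strings would then be sent to HiveServer2 over JDBC, as the description states, or through any other Hive client. Note that `LOAD DATA INPATH` moves the source file into the table's location rather than copying it.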
During data synchronization, data is synchronized according to the synchronization mode configured by the user. If synchronization to a relational database is configured, a data export statement is generated from the target database type and address configured by the user, and the Sqoop tool executes that statement to export the data into the corresponding configured table; if the table does not yet exist in the target database, its table structure is first generated from the recorded field information and the data is then exported into the new table. If export via text file is configured, the corresponding HiveQL export command is generated from the configuration to produce the corresponding text file; conversion to text files and Excel files is supported.
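The relational synchronization branch can be sketched as a small plan: create the target table from the recorded field information if it does not exist yet, then run `sqoop export`. The sketch also shows the HiveQL `INSERT OVERWRITE DIRECTORY ... ROW FORMAT DELIMITED` form for the text-file branch. The type mapping, configuration keys and paths are hypothetical, the Excel conversion is not covered, and the `\001` delimiter assumes the Hive table uses Hive's default text format.

```python
from typing import Dict, List, Set, Tuple, Union

# Hypothetical mapping from recorded Hive field types to MySQL column types.
TYPE_MAP = {"string": "VARCHAR(255)", "int": "INT", "bigint": "BIGINT",
            "double": "DOUBLE", "timestamp": "DATETIME"}

Step = Tuple[str, Union[str, List[str]]]

def create_table_ddl(table: str, fields: List[Tuple[str, str]]) -> str:
    """Rebuild a target table from the field information recorded at acquisition time."""
    cols = ", ".join(f"{name} {TYPE_MAP.get(ftype.lower(), 'TEXT')}" for name, ftype in fields)
    return f"CREATE TABLE {table} ({cols})"

def plan_relational_sync(target: Dict[str, str], existing: Set[str],
                         fields: List[Tuple[str, str]], export_dir: str) -> List[Step]:
    steps: List[Step] = []
    if target["table"] not in existing:            # target table missing: create it first
        steps.append(("ddl", create_table_ddl(target["table"], fields)))
    steps.append(("sqoop", [
        "sqoop", "export",
        "--connect", target["jdbc_url"],
        "--username", target["username"],
        "--password-file", target["password_file"],
        "--table", target["table"],
        "--export-dir", export_dir,                  # HDFS directory holding the Hive table data
        "--input-fields-terminated-by", "\\001",     # Hive's default field delimiter (Ctrl-A)
    ]))
    return steps

def text_export_sql(table: str, out_dir: str, delimiter: str = ",") -> str:
    """HiveQL that writes a table out as delimited text files (Hive 0.11 or later)."""
    return (f"INSERT OVERWRITE DIRECTORY '{out_dir}' "
            f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}' "
            f"SELECT * FROM {table}")

TARGET = {"jdbc_url": "jdbc:mysql://report.example:3306/report", "username": "sync",
          "password_file": "/user/sync/.db_password", "table": "residents_copy"}
plan = plan_relational_sync(TARGET, existing=set(),
                            fields=[("id", "int"), ("name", "string")],
                            export_dir="/user/hive/warehouse/dw.db/merged_1")
for kind, payload in plan:
    print(kind, payload)
```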
Embodiment 2: A data processing method based on big data technology, the method comprising:
Step 1: Configure a data source node and select the data source table and fields to be processed;
Step 2: Configure a cleansing node to cleanse the data of the data source table;
Step 3: Configure a merge node to merge the data of multiple data source nodes;
Step 4: Configure an aggregation node to aggregate the data of the preceding node;
Step 5: Configure a conversion node to convert and process the data of the upstream node;
Step 6: Configure an output node to specify where the processed data is output;
Step 7: A data flow task management module splits the incoming data according to the configured capacity and processes the resulting data streams according to the configured processing flow;
Step 8: A message management module handles message passing between the nodes of the configured flow, ensuring that the flow executes smoothly;
Step 9: Workflow management manages and executes the data processing flow.
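Steps 7 and 8 can be illustrated with a chunking generator and a chain of worker threads that pass messages through queues. This Python sketch is an assumption about how such modules might behave, not the patent's design: `split_by_capacity`, `start_node`, `run_chain` and the `STOP` sentinel are hypothetical names, capacity is measured in records, and error handling is omitted, so an exception inside a stage would leave the chain waiting.

```python
import queue
import threading
from typing import Callable, Iterable, Iterator, List

STOP = object()  # sentinel message telling a node that its upstream has finished

def split_by_capacity(records: Iterable, capacity: int) -> Iterator[List]:
    """Step 7: cut the incoming stream into chunks of at most `capacity` records."""
    if capacity < 1:
        raise ValueError("capacity must be >= 1")
    chunk: List = []
    for r in records:
        chunk.append(r)
        if len(chunk) == capacity:
            yield chunk
            chunk = []
    if chunk:
        yield chunk

def start_node(fn: Callable[[List], List], inbox: queue.Queue,
               outbox: queue.Queue) -> threading.Thread:
    """Step 8: a node thread that receives chunk messages, processes them and
    forwards the results downstream, passing the STOP sentinel along."""
    def loop() -> None:
        while True:
            msg = inbox.get()
            if msg is STOP:
                outbox.put(STOP)
                return
            outbox.put(fn(msg))
    t = threading.Thread(target=loop, daemon=True)
    t.start()
    return t

def run_chain(records: Iterable, capacity: int, stages: List[Callable[[List], List]]) -> List:
    """Wire the stages together with queues, feed the chunks in and collect the output."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [start_node(fn, queues[i], queues[i + 1]) for i, fn in enumerate(stages)]
    for chunk in split_by_capacity(records, capacity):
        queues[0].put(chunk)
    queues[0].put(STOP)
    results: List = []
    while True:
        msg = queues[-1].get()
        if msg is STOP:
            break
        results.extend(msg)
    for t in threads:
        t.join()
    return results

out = run_chain(range(10), 4, [lambda c: [x for x in c if x % 2 == 0],
                               lambda c: [x * 10 for x in c]])
print(out)  # [0, 20, 40, 60, 80]
```

Because each stage runs in a single thread and reads from a FIFO queue, chunks leave the chain in the order they entered it.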
The above are only preferred embodiments of the present invention. It should be noted that these preferred embodiments should not be regarded as limiting the present invention, and the protection scope of the present invention shall be subject to the scope defined by the claims. Those of ordinary skill in the art may make several improvements and modifications without departing from the spirit and scope of the present invention, and such improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (5)

1. A data processing system based on big data technology, characterized in that the system comprises:
Data acquisition unit, configured to collect raw data;
Data processing unit, configured to perform data cleansing, data merging and data conversion on the collected data;
Data synchronization unit, configured to synchronize the data after acquisition and processing.
2. The data processing system based on big data technology according to claim 1, characterized in that the data acquisition unit comprises:
Raw database unit, configured to store the raw data;
Data collection unit, configured to collect the raw data from the raw database unit;
Storage database unit, configured to store the collected data.
3. The data processing system based on big data technology according to claim 2, characterized in that the data processing unit comprises:
Data cleansing unit, configured to cleanse the data in the storage database unit;
Data merging unit, configured to merge the cleansed data and generate a merged table;
Data conversion unit, configured to convert the cleansed data and generate a conversion table.
4. The data processing system based on big data technology according to claim 3, characterized in that the data synchronization unit comprises:
Synchronization unit, configured to perform data synchronization;
Table synchronization unit, configured to perform table synchronization;
Text file download unit, configured to download text files;
File format export unit, configured to export files in a specified format.
5. A data processing method based on big data technology, characterized in that the method comprises:
Step 1: Configure a data source node and select the data source table and fields to be processed;
Step 2: Configure a cleansing node to cleanse the data of the data source table;
Step 3: Configure a merge node to merge the data of multiple data source nodes;
Step 4: Configure an aggregation node to aggregate the data of the preceding node;
Step 5: Configure a conversion node to convert and process the data of the upstream node;
Step 6: Configure an output node to specify where the processed data is output;
Step 7: A data flow task management module splits the incoming data according to the configured capacity and processes the resulting data streams according to the configured processing flow;
Step 8: A message management module handles message passing between the nodes of the configured flow, ensuring that the flow executes smoothly;
Step 9: Workflow management manages and executes the data processing flow.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810009615.9A CN108563665A 2018-01-05 2018-01-05 A data processing system and method based on big data technology


Publications (1)

Publication Number Publication Date
CN108563665A 2018-09-21

Family

ID=63530668




Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105893375A (en) * 2014-12-04 2016-08-24 北京航天长峰科技工业集团有限公司 Safety production data following management based on big data
CN106227862A (en) * 2016-07-29 2016-12-14 浪潮软件集团有限公司 E-commerce data integration method based on distribution


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020215532A1 (en) * 2019-04-26 2020-10-29 厦门市美亚柏科信息股份有限公司 System and method for data synchronization between heterogeneous databases, and storage medium
CN110618988A (en) * 2019-09-20 2019-12-27 中国银行股份有限公司 Data processing method and device based on big data platform
CN110618988B (en) * 2019-09-20 2022-09-23 中国银行股份有限公司 Data processing method and device based on big data platform


Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180921