CN108563665A - A data processing system and method based on big data technology - Google Patents
- Publication number
- CN108563665A
- Authority
- CN
- China
- Prior art keywords
- data
- unit
- information
- node
- carrying
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/602—Providing cryptographic facilities or services
Abstract
The present invention relates to a data processing system and method based on big data technology, in the field of big data technology. The system comprises: a data acquisition unit, for acquiring original data information; a data processing unit, for performing data cleaning, data merging, and data conversion on the acquired data information; and a data synchronization unit, for synchronizing the acquired and processed data information. The system offers a high degree of visualization and high reliability.
Description
Technical field
The present invention relates to the field of big data technology, and in particular to a data processing system and method based on big data technology.
Background art
With the rapid development of information technology, the volume of data that is collected, stored, processed, and analyzed keeps growing. The processing of multi-source heterogeneous big data is attracting more and more attention. Big data has five characteristics. Among them, massiveness refers mainly to the enormous scale of the data and its ever-increasing growth rate; distribution means mainly that the huge data volume cannot be stored, computed, and analyzed on a single machine; and heterogeneity refers mainly to the diversity of data types and data sources. The traditional centralized processing approach designed for structured data can hardly solve the problems that big data brings, and in view of these three characteristics, integration and cleaning of big data have become particularly important. Big data also contains uncertain data, whose causes are currently diverse: mainly inaccurate raw data, the use of coarse-grained data sets, missing data fields, and data integration.
The key to handling big data problems is to apply distributed storage and distributed computing to the data. At present, the internationally popular models for big data computing are Key-Value models, the most representative of which are Hadoop, Spark, and Hive; their emergence provides a reliable solution to big data problems.
Summary of the invention
In view of this, the object of the present invention is to provide a data processing system and method based on big data technology that offers a high degree of visualization and high reliability.
The technical solution adopted by the present invention is as follows:
A data processing system based on big data technology, the system comprising:
a data acquisition unit, for acquiring original data information;
a data processing unit, for performing data cleaning, data merging, and data conversion on the acquired data information;
a data synchronization unit, for synchronizing the acquired and processed data information.
Further, the data acquisition unit comprises:
an original database unit, for storing the original data information;
a data acquisition unit, for acquiring the original data information from the original database unit;
a storage database unit, for storing the acquired data information.
Further, the data processing unit comprises:
a data cleaning unit, for cleaning the data information in the storage database unit;
a data merging unit, for merging the cleaned data information to generate a merged table;
a data conversion unit, for converting the cleaned data information to generate a conversion table.
Further, the data synchronization unit comprises:
a synchronization unit, for data synchronization;
a table synchronization unit, for table synchronization;
a text file download unit, for downloading text files;
a file format export unit, for exporting files in a specified format.
A data processing method based on big data technology, the method comprising:
Step 1: configure a data source node and select the data source tables and fields to be processed;
Step 2: configure a cleaning node to clean the data of the data source table;
Step 3: configure a merge node to merge the data of multiple data source nodes;
Step 4: configure an aggregation node to aggregate the data of the previous node;
Step 5: configure a conversion node to convert and process the data of the upstream node;
Step 6: configure an output node to set where the processed data is output;
Step 7: the data flow task management module splits the incoming data according to the configured capacity size and processes the split data streams according to the configured processing flow;
Step 8: the message management module handles message passing between the nodes of the configured flow to ensure that the flow executes smoothly;
Step 9: workflow management, for managing and executing the data processing flow.
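Steps 1 to 6 describe a chain of configurable nodes. The following is a minimal in-memory sketch of such a chain, offered only as an illustration: the function names, the row-as-dictionary representation, and the sample data are all assumptions and do not come from the patent, which runs these stages on Hive and Spark rather than on Python lists.

```python
from collections import defaultdict

# Hypothetical in-memory stand-ins for the configured nodes; rows are plain dicts.

def source_node(table, fields):
    """Step 1: keep only the selected fields of each row in the source table."""
    return [{f: row.get(f) for f in fields} for row in table]

def cleaning_node(rows, required):
    """Step 2: drop rows with a missing required field and trim string values."""
    cleaned = []
    for row in rows:
        if any(row.get(f) in (None, "") for f in required):
            continue
        cleaned.append({k: v.strip() if isinstance(v, str) else v
                        for k, v in row.items()})
    return cleaned

def merge_node(left, right, key):
    """Step 3: inner-join the rows of two data sources on a shared key."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]

def aggregation_node(rows, group_by, value):
    """Step 4: sum one numeric field per group."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[group_by]] += row[value]
    return [{group_by: g, value: t} for g, t in totals.items()]

def conversion_node(rows, field, func):
    """Step 5: apply a conversion function to one field of every row."""
    return [{**row, field: func(row[field])} for row in rows]

def output_node(rows, sink):
    """Step 6: append the processed rows to the configured destination."""
    sink.extend(rows)
    return sink

def run_flow():
    """Wire the nodes together on made-up sample data."""
    users = [{"id": 1, "city": " Beijing "}, {"id": 2, "city": ""},
             {"id": 3, "city": "Xiamen"}]
    orders = [{"id": 1, "amount": 10.0}, {"id": 1, "amount": 5.0},
              {"id": 3, "amount": 7.5}]
    cleaned = cleaning_node(source_node(users, ["id", "city"]), required=["city"])
    merged = merge_node(cleaned, source_node(orders, ["id", "amount"]), key="id")
    totals = aggregation_node(merged, group_by="city", value="amount")
    converted = conversion_node(totals, "city", str.upper)
    return output_node(converted, [])
```

Each node takes the output of the node before it, which mirrors the "previous node" and "upstream node" wording of Steps 4 and 5.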
Compared with the prior art, the beneficial effects of the present invention are as follows: it provides a platform for configuring data processing flows visually and quickly. It supports a rich variety of data formats, can quickly desensitize and declassify data, can execute data processing flows on a schedule or immediately, lets users view the execution information of flow nodes intuitively, and provides inspection of the error information of failed data nodes together with data quality analysis. By using big data processing technology to combine data extraction, conversion, and synchronization, it can greatly improve the reliability of data conversion and quickly produce desensitized and declassified private data.
Description of the drawings
Fig. 1 is the structural schematic diagram of the data processing system based on big data technology of the present invention.
Fig. 2 is the processing flow schematic diagram of the data processing system based on big data technology of the present invention.
Detailed description of the embodiments
To help those skilled in the art better understand the technical solution of the present invention, the invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Embodiment 1:
As shown in Fig. 1 and Fig. 2, a data processing system based on big data technology, the system comprising:
a data acquisition unit, for acquiring original data information;
a data processing unit, for performing data cleaning, data merging, and data conversion on the acquired data information;
a data synchronization unit, for synchronizing the acquired and processed data information.
Further, the data acquisition unit comprises:
an original database unit, for storing the original data information;
a data acquisition unit, for acquiring the original data information from the original database unit;
a storage database unit, for storing the acquired data information.
Further, the data processing unit comprises:
a data cleaning unit, for cleaning the data information in the storage database unit;
a data merging unit, for merging the cleaned data information to generate a merged table;
a data conversion unit, for converting the cleaned data information to generate a conversion table.
Further, the data synchronization unit comprises:
a synchronization unit, for data synchronization;
a table synchronization unit, for table synchronization;
a text file download unit, for downloading text files;
a file format export unit, for exporting files in a specified format.
During data acquisition, the system generates different synchronization statements according to the configured data sources, extracts the data into the Hive database with the sqoop tool, and saves the field information of the data source table in the application database.
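As a rough illustration of generating a different statement per data source, the sketch below builds the argument list for a `sqoop import` run into Hive and records the field information. The function names, the configuration keys, the dictionary used as the application database, and the choice of MySQL and PostgreSQL as source types are assumptions made for the example; the patent does not specify them. The Sqoop options used (`--connect`, `--username`, `--password-file`, `--table`, `--columns`, `--hive-import`, `--hive-table`) are standard Sqoop 1 import arguments.

```python
# JDBC URL prefixes per configured source type (assumed set, for illustration).
JDBC_PREFIX = {
    "mysql": "jdbc:mysql://",
    "postgresql": "jdbc:postgresql://",
}

def build_sqoop_import(source, hive_table, fields):
    """Return the argument list for one `sqoop import` run into Hive."""
    if source["type"] not in JDBC_PREFIX:
        raise ValueError(f"unsupported data source type: {source['type']}")
    url = (f"{JDBC_PREFIX[source['type']]}{source['host']}:"
           f"{source['port']}/{source['database']}")
    return [
        "sqoop", "import",
        "--connect", url,
        "--username", source["user"],
        "--password-file", source["password_file"],
        "--table", source["table"],
        "--columns", ",".join(fields),
        "--hive-import",
        "--hive-table", hive_table,
    ]

def save_field_info(application_db, hive_table, fields):
    """Record the source table's fields; a dict stands in for the application database."""
    application_db[hive_table] = list(fields)
    return application_db

# Hypothetical configuration for one data source.
example_source = {
    "type": "mysql", "host": "db.example.com", "port": 3306,
    "database": "shop", "table": "orders",
    "user": "etl", "password_file": "/user/etl/.pw",
}
example_args = build_sqoop_import(example_source, "orders_raw", ["id", "amount"])
```

The list could be handed to `subprocess.run` on a host where Sqoop is installed; keeping it as a list rather than a single string avoids shell quoting problems.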
During data processing, a data cleaning HiveSql statement is generated from the data cleaning information configured by the user, and the system connects to the Hive database over JDBC and executes the cleaning HiveSql to clean the table data. From the configured merge information, a table-join merge HiveSql statement is generated; the system connects to the Hive database and executes it to generate a new merged data table, and saves the name of the generated merged table in the application database. The merged table is converted into a txt-format file, a task is created by invoking the Spark data conversion package through an ssh command, the converted content is saved to a txt file, a new conversion table is generated, and the converted content of the txt file is loaded into the conversion table with a HiveSql import command.
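The statement generation described above might look like the following sketch. It only builds HiveQL strings; submitting them over JDBC and invoking the Spark conversion package are left out. The function names, table names, and the specific cleaning rules (trimming string columns and filtering out NULLs) are assumptions chosen for illustration, since the patent does not say which cleaning rules are configurable.

```python
def build_cleaning_sql(source_table, cleaned_table, columns, not_null):
    """Cleaning statement: trim the selected (string) columns, drop NULL rows."""
    select_list = ", ".join(f"trim({c}) AS {c}" for c in columns)
    sql = (f"INSERT OVERWRITE TABLE {cleaned_table} "
           f"SELECT {select_list} FROM {source_table}")
    if not_null:
        sql += " WHERE " + " AND ".join(f"{c} IS NOT NULL" for c in not_null)
    return sql

def build_merge_sql(merged_table, left, right, key, columns):
    """Merge statement: create a new table from a join of two cleaned tables.

    `columns` are already qualified with the aliases l and r.
    """
    select_list = ", ".join(columns)
    return (f"CREATE TABLE {merged_table} AS SELECT {select_list} "
            f"FROM {left} l JOIN {right} r ON l.{key} = r.{key}")

def build_load_sql(txt_path, conversion_table):
    """Import statement: load the converted txt file into the conversion table."""
    return f"LOAD DATA INPATH '{txt_path}' INTO TABLE {conversion_table}"

cleaning_sql = build_cleaning_sql("orders_raw", "orders_clean", ["city"], ["city"])
merge_sql = build_merge_sql(
    "orders_merged", "orders_clean", "users_clean", "user_id",
    ["l.user_id", "l.amount", "r.city"],
)
load_sql = build_load_sql("/tmp/converted/orders.txt", "orders_converted")
```

Identifiers are interpolated directly here for brevity; a real generator would need to validate table and column names taken from user configuration before building statements from them.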
During data synchronization, the data is synchronized according to the synchronization mode that the user configured for the data type. If synchronization to a relational database is configured, a data export statement is generated from the type and address of the target database configured by the user, and the export statement is executed with the sqoop tool to export the data into the corresponding table; if that table does not exist in the target database, the table structure is first generated from the recorded field information and the data is then exported into the new table. If export as a text file is configured, the corresponding HiveSql export command is generated from the configuration information and the corresponding text file is produced; conversion to text files and Excel files is supported.
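The branching between the two synchronization modes can be sketched as a small planner that returns the statements to run. Everything named here (`plan_sync`, the configuration keys, the comma delimiter, the `(kind, statement)` step tuples) is a hypothetical choice for the example. The Sqoop export options and the Hive `INSERT OVERWRITE DIRECTORY` form are standard, but the delimiter must match how the exported files are actually stored, which the patent does not state.

```python
def build_create_table(table, field_info):
    """DDL for the target table, built from recorded (name, type) field info."""
    columns = ", ".join(f"{name} {sql_type}" for name, sql_type in field_info)
    return f"CREATE TABLE IF NOT EXISTS {table} ({columns})"

def build_sqoop_export(target, export_dir, delimiter=","):
    """Argument list for `sqoop export` from an HDFS directory to a table."""
    return [
        "sqoop", "export",
        "--connect", target["jdbc_url"],
        "--username", target["user"],
        "--password-file", target["password_file"],
        "--table", target["table"],
        "--export-dir", export_dir,
        "--input-fields-terminated-by", delimiter,
    ]

def build_text_export(hive_table, output_dir, delimiter=","):
    """HiveQL that writes a table out as delimited text files."""
    return (f"INSERT OVERWRITE DIRECTORY '{output_dir}' "
            f"ROW FORMAT DELIMITED FIELDS TERMINATED BY '{delimiter}' "
            f"SELECT * FROM {hive_table}")

def plan_sync(config, existing_tables, field_info):
    """Return the ordered (kind, statement) steps for the configured mode."""
    if config["mode"] == "relational":
        target = config["target"]
        steps = []
        # Create the table first when the target database does not have it yet.
        if target["table"] not in existing_tables:
            steps.append(("ddl", build_create_table(target["table"], field_info)))
        steps.append(("sqoop", build_sqoop_export(target, config["export_dir"])))
        return steps
    if config["mode"] == "text":
        return [("hivesql",
                 build_text_export(config["hive_table"], config["output_dir"]))]
    raise ValueError(f"unknown synchronization mode: {config['mode']}")

# Hypothetical configuration for a relational target.
relational_config = {
    "mode": "relational",
    "export_dir": "/warehouse/orders_converted",
    "target": {
        "jdbc_url": "jdbc:mysql://db.example.com:3306/report",
        "user": "etl", "password_file": "/user/etl/.pw", "table": "orders",
    },
}
fields = [("id", "INT"), ("amount", "DOUBLE")]
steps_new_table = plan_sync(relational_config, existing_tables=set(), field_info=fields)
steps_existing = plan_sync(relational_config, existing_tables={"orders"}, field_info=fields)
```

Conversion of the text output to Excel is not shown; it would be a separate post-processing step.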
Embodiment 2: A data processing method based on big data technology, the method comprising:
Step 1: configure a data source node and select the data source tables and fields to be processed;
Step 2: configure a cleaning node to clean the data of the data source table;
Step 3: configure a merge node to merge the data of multiple data source nodes;
Step 4: configure an aggregation node to aggregate the data of the previous node;
Step 5: configure a conversion node to convert and process the data of the upstream node;
Step 6: configure an output node to set where the processed data is output;
Step 7: the data flow task management module splits the incoming data according to the configured capacity size and processes the split data streams according to the configured processing flow;
Step 8: the message management module handles message passing between the nodes of the configured flow to ensure that the flow executes smoothly;
Step 9: workflow management, for managing and executing the data processing flow.
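Steps 7 and 8 can be pictured with the following single-process sketch, which splits incoming records by a configured byte capacity and reports each chunk's outcome through a message queue. It is an assumption-laden illustration: the patent does not describe the splitting rule or the message format, and `split_by_capacity`, `run_chunks`, and the `(chunk_index, status)` messages are hypothetical names and shapes.

```python
import queue

def split_by_capacity(records, max_bytes):
    """Step 7: group records into chunks whose UTF-8 size stays within max_bytes.

    A single record larger than max_bytes is emitted as its own chunk.
    """
    chunk, size = [], 0
    for record in records:
        record_size = len(record.encode("utf-8"))
        if chunk and size + record_size > max_bytes:
            yield chunk
            chunk, size = [], 0
        chunk.append(record)
        size += record_size
    if chunk:
        yield chunk

def run_chunks(chunks, process):
    """Step 8: process each chunk and post a status message for it."""
    messages = queue.Queue()
    results = []
    for index, chunk in enumerate(chunks):
        try:
            results.append(process(chunk))
            messages.put((index, "ok"))
        except Exception as exc:  # report the failure instead of stopping the flow
            messages.put((index, f"failed: {exc}"))
    # Drain the queue so the caller can inspect every node message in order.
    log = []
    while not messages.empty():
        log.append(messages.get())
    return results, log

chunks = list(split_by_capacity(["aa", "bb", "cc", "d"], max_bytes=4))
results, log = run_chunks(chunks, len)
```

In a distributed deployment the queue would be replaced by whatever messaging layer the platform uses; the sketch only shows the shape of the hand-off between splitting, processing, and status reporting.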
The above is only a preferred embodiment of the present invention. It should be noted that the above preferred embodiment shall not be regarded as limiting the present invention; the scope of protection of the present invention shall be as defined by the claims. Those of ordinary skill in the art may make several improvements and modifications without departing from the spirit and scope of the present invention, and these improvements and modifications shall also be regarded as falling within the scope of protection of the present invention.
Claims (5)
1. A data processing system based on big data technology, characterized in that the system comprises:
a data acquisition unit, for acquiring original data information;
a data processing unit, for performing data cleaning, data merging, and data conversion on the acquired data information;
a data synchronization unit, for synchronizing the acquired and processed data information.
2. The data processing system based on big data technology according to claim 1, characterized in that the data acquisition unit comprises:
an original database unit, for storing the original data information;
a data acquisition unit, for acquiring the original data information from the original database unit;
a storage database unit, for storing the acquired data information.
3. The data processing system based on big data technology according to claim 2, characterized in that the data processing unit comprises:
a data cleaning unit, for cleaning the data information in the storage database unit;
a data merging unit, for merging the cleaned data information to generate a merged table;
a data conversion unit, for converting the cleaned data information to generate a conversion table.
4. The data processing system based on big data technology according to claim 3, characterized in that the data synchronization unit comprises:
a synchronization unit, for data synchronization;
a table synchronization unit, for table synchronization;
a text file download unit, for downloading text files;
a file format export unit, for exporting files in a specified format.
5. A data processing method based on big data technology, characterized in that the method comprises:
Step 1: configure a data source node and select the data source tables and fields to be processed;
Step 2: configure a cleaning node to clean the data of the data source table;
Step 3: configure a merge node to merge the data of multiple data source nodes;
Step 4: configure an aggregation node to aggregate the data of the previous node;
Step 5: configure a conversion node to convert and process the data of the upstream node;
Step 6: configure an output node to set where the processed data is output;
Step 7: the data flow task management module splits the incoming data according to the configured capacity size and processes the split data streams according to the configured processing flow;
Step 8: the message management module handles message passing between the nodes of the configured flow to ensure that the flow executes smoothly;
Step 9: workflow management, for managing and executing the data processing flow.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810009615.9A CN108563665A (en) | 2018-01-05 | 2018-01-05 | A data processing system and method based on big data technology |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810009615.9A CN108563665A (en) | 2018-01-05 | 2018-01-05 | A data processing system and method based on big data technology |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108563665A (en) | 2018-09-21 |
Family
ID=63530668
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810009615.9A Pending CN108563665A (en) | 2018-01-05 | 2018-01-05 | A data processing system and method based on big data technology |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108563665A (en) |
- 2018-01-05: CN application CN201810009615.9A, published as CN108563665A, status Pending
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105893375A (en) * | 2014-12-04 | 2016-08-24 | 北京航天长峰科技工业集团有限公司 | Safety production data following management based on big data |
CN106227862A (en) * | 2016-07-29 | 2016-12-14 | 浪潮软件集团有限公司 | E-commerce data integration method based on distribution |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020215532A1 (en) * | 2019-04-26 | 2020-10-29 | 厦门市美亚柏科信息股份有限公司 | System and method for data synchronization between heterogeneous databases, and storage medium |
CN110618988A (en) * | 2019-09-20 | 2019-12-27 | 中国银行股份有限公司 | Data processing method and device based on big data platform |
CN110618988B (en) * | 2019-09-20 | 2022-09-23 | 中国银行股份有限公司 | Data processing method and device based on big data platform |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111400326B (en) | Smart city data management system and method thereof | |
CN104123288B (en) | A kind of data query method and device | |
CN102426609B (en) | Index generation method and index generation device based on MapReduce programming architecture | |
CN104239417B (en) | Dynamic adjusting method and device after a kind of distributed data base data fragmentation | |
CN104111996A (en) | Health insurance outpatient clinic big data extraction system and method based on hadoop platform | |
CN108280023B (en) | Task execution method and device and server | |
WO2015062181A1 (en) | Method for achieving automatic synchronization of multisource heterogeneous data resources | |
CN106682213A (en) | Internet-of-things task customizing method and system based on Hadoop platform | |
CN110134663B (en) | Organization structure data processing method and device and electronic equipment | |
CN108536808A (en) | A kind of data capture method and device based on Spark Computational frames | |
Bala et al. | P-ETL: Parallel-ETL based on the MapReduce paradigm | |
CN104899284A (en) | Method and device for driving scheduling system based on metadata | |
CN103034553B (en) | Intelligent verification algorithm, method and device for report designer | |
CN108563665A (en) | A kind of data processing system and method based on big data technology | |
CN105279138B (en) | A kind of information research report automatic creation system | |
CN115146000A (en) | Database data synchronization method and device, electronic equipment and storage medium | |
CN107704620A (en) | A kind of method, apparatus of file administration, equipment and storage medium | |
CN108073582B (en) | Computing framework selection method and device | |
CN111488325A (en) | Meteorological big data aggregation method based on Hadoop architecture | |
CN106571940A (en) | Method and device of fusing network management data and resource data | |
CN107656995A (en) | Towards the data management system of big data | |
CN103810197A (en) | Hadoop-based data processing method and system | |
CN111930862B (en) | SQL interactive analysis method and system based on big data platform | |
CN105426407A (en) | Web data acquisition method based on content analysis | |
CN116225455B (en) | Method for rapidly generating statistical analysis report |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20180921 |