CN106202346A - Data loading and cleaning engine, scheduling and storage system - Google Patents

Data loading and cleaning engine, scheduling and storage system

Info

Publication number
CN106202346A
Authority
CN
China
Prior art keywords
data
module
etl
connects
dispatch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610524292.8A
Other languages
Chinese (zh)
Other versions
CN106202346B (en)
Inventor
孙永剑
郑书礼
裘鑫芳
董磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Information Network Co Ltd
Original Assignee
Zhejiang Sci Tech University ZSTU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Sci Tech University ZSTU
Priority to CN201610524292.8A
Publication of CN106202346A
Application granted
Publication of CN106202346B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25: Integrating or interfacing systems involving database management systems
    • G06F16/254: Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data loading and cleaning engine, scheduling and storage system comprising a data source, a data warehouse and a user presentation module. The data warehouse is connected to an ETL management module, which comprises an ETL scheduling module, an ETL monitoring module, a data quality module and an ETL task module. The data warehouse comprises an interface file area, a detail data staging area (SSA), a detail data store (SOR), a data mart, a data summary module, a feedback module and a metadata store (MDR). The invention is practical, convenient for data management, flexible and easy to popularize; it processes data efficiently with high throughput, and it can accommodate additional data sources and support additional analysis requirements.

Description

Data loading and cleaning engine, scheduling and storage system
Technical field
The invention belongs to the field of computer technology, and in particular relates to a data loading and cleaning engine, scheduling and storage system.
Background
With the rapid development of big data technology and the advance of informatization, the volume of data accumulated by human society already exceeds the total of the preceding 5000 years, and the amount of data collected, stored, processed and disseminated grows by the day. When an enterprise shares its data, more people can make fuller use of existing data resources, which reduces duplicated work such as data collection and acquisition and the associated costs. In practice, however, the data provided by different users may come from different channels, so its content, format and quality vary widely; sometimes a format cannot be converted at all, or information is lost after conversion. These problems seriously hinder the flow and sharing of data across departments and software systems. Effective integrated management of massive data has therefore become an inevitable choice for strengthening the competitiveness of commercial banks.
In recent years, with the development of big data processing technologies such as Hadoop and Spark, data has attracted wide attention and become a strategic resource as important as water and oil. Most data today is stored in traditional SQL databases, which differ considerably from the NoSQL databases used by big data technology. Because data is also highly diverse, it must be imported into the big data platform's own storage system before the platform can process it, and the import generally requires ETL processing to extract, clean and load each type of data. Traditional ETL systems mainly run on a single machine; distributed ETL processing exists, but it mainly targets multi-task scenarios. The functionality of these traditional ETL systems is mature, but for large data volumes their processing speed falls short of demand and their functionality fits poorly, so traditional ETL processing is overwhelmed.
Summary of the invention
The object of the invention is to solve the above technical problems in the prior art by providing a data loading and cleaning engine, scheduling and storage system that is practical, convenient for data management, flexible and easy to popularize, that processes data efficiently with high throughput, and that can accommodate additional data sources and support additional analysis requirements.
To solve the above technical problems, the invention adopts the following technical solution:
A data loading and cleaning engine, scheduling and storage system, characterized by comprising a data source, a data warehouse and a user presentation module. The data warehouse is connected to an ETL management module comprising an ETL scheduling module, an ETL monitoring module, a data quality module and an ETL task module: the ETL scheduling module controls the running of all ETL tasks, the ETL monitoring module tracks and monitors the running of ETL tasks, the data quality module tracks the data quality of the data warehouse, and the ETL task module carries out the concrete ETL processing. The data warehouse comprises an interface file area, a detail data staging area (SSA), a detail data store (SOR), a data mart, a data summary module, a feedback module and a metadata store (MDR). The detail data SOR is connected to the data summary module, and the data summary module is connected to the feedback module. The interface file area stores and processes interface files and is connected to a permission setting module; the area is organized in a specific directory structure, and the permission setting module sets the access rights of different users to each directory according to that directory's purpose. The ETL management module interacts and cooperates around the metadata: it extracts data from the data source, then transforms and cleans it and loads it into the data warehouse according to the defined data warehouse model, which meets the update requirements of data integration well and realizes the collection and distribution of data among the business lines.
The detail data staging area SSA is connected to a verification module. The verification module is connected to a lookup module, which is connected to the detail data SOR, and to a processing module, which is also connected to the detail data SOR. The detail data SOR is connected to an exchange partition module. The metadata store MDR holds information about the processes and data in the data warehouse and is connected to a metadata management module. The data mart is connected to a multidimensional cube module, which stores multidimensional data. The data warehouse and the data mart are stored in one TDH data group, in which different data are distinguished by home region: the data mart is stored in the 3D vision region and is used for analyzing multidimensional data, and the multidimensional cube module is stored in the integrated region. The exchange partition module uses two partitioning schemes, "partition ignoring" and "divide and conquer". Together they reduce the impact of data import operations on users' real-time data access; the operating mode resembles using a hot-swappable hard disk and is convenient to use. In terms of performance, because the system stores massive data, "partition ignoring" effectively improves query performance. The manageability and availability of the data, for example for data deletion and data backup, also improve: "divide and conquer" allows finer and more efficient management, confines a fault produced by a task to a single partition, and effectively shortens recovery time. Because every tool and system generates its own metadata, the metadata management module stores this metadata in the metadata store MDR as centrally as possible. The MDR is a shared place where users can access metadata centrally; the metadata is still actually maintained in the systems and tools that generate it. The user presentation module is connected to a query module, which presents business content according to user needs. The system is practical, convenient for data management, flexible and easy to popularize; it processes data efficiently with high throughput, and it can accommodate additional data sources and support additional analysis requirements.
Further, the ETL scheduling module is connected to a time setting module, with which each task can be set to run at a given time, so that each task runs automatically at the specified moment. Execution cycles vary widely: some tasks define a time interval and others a fixed time. The time setting module builds a scheduling linked list in the system; each node of the list holds the "schedule information of the task" and the "next execution time", and the list is always kept sorted by "next execution time" in ascending order, which improves scheduling efficiency for large numbers of tasks.
Further, the ETL monitoring module is connected to a fault handling module, which is connected to the ETL scheduling module. When a task fails or runs into an error, the fault handling module can reassign the task so that the system keeps running.
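The reassignment behaviour described for the fault handling module can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function `run_with_reassignment`, the modelling of execution nodes as Python callables, and the `max_attempts` limit are all hypothetical names introduced here.

```python
def run_with_reassignment(task, workers, max_attempts=3):
    """Run `task` on the first worker that succeeds.

    `workers` is a list of callables (hypothetical stand-ins for ETL
    execution nodes). A failure on one worker hands the task to the
    next one, so a single fault does not stop the run.
    """
    errors = []
    for attempt, worker in enumerate(workers[:max_attempts], start=1):
        try:
            return worker(task), errors
        except Exception as exc:  # any task error triggers reassignment
            errors.append(f"attempt {attempt}: {exc}")
    raise RuntimeError(f"task {task!r} failed on all workers: {errors}")


def broken_worker(task):
    raise IOError("connection to source lost")


def healthy_worker(task):
    return f"{task} done"


result, log = run_with_reassignment("load_customers", [broken_worker, healthy_worker])
print(result)  # load_customers done
print(log)     # ['attempt 1: connection to source lost']
```

Returning the error log alongside the result lets a monitoring component record which attempts failed even when the task eventually succeeds.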
Further, the ETL task module is connected to a graphical module, which converts the running status of tasks into visual charts that are clear at a glance.
Further, the data processing tool of the interface file area is mainly Kettle. Under a Unix system the interface file area is organized in a specific directory structure, and the permission setting module sets the access rights of different users to each directory according to its purpose, so that the directories are mutually independent and clearly divided.
Further, the detail data SOR is a set of table structures developed on the basis of BDW that conform to third normal form (3NF). It stores the detail-level data of the data warehouse, organized by subject area through the exchange partition module. As the enterprise data model, the detail data SOR is the core of the whole data warehouse data model; it is flexible enough to accommodate additional data sources and support additional analysis requirements, which widens the system's scope of application.
Further, the detail data SOR is connected to a BDW upgrade and update module, which supports further upgrades and updates of BDW.
Further, the ETL management module uses Microsoft's DTS component. The data source connections of the ETL process are defined through the standard interfaces OLE DB or ODBC; the extraction, cleaning and transformation methods are defined with the rules built into DTS or with T-SQL scripts; and all ETL jobs of the data warehouse are designed and completed with the DTS tool of Microsoft SQL Server.
Further, the data mart has a star or snowflake structure. A data mart is a subset of the data warehouse and can be called a "small data warehouse"; its applications supplement those of the data warehouse. The data mart holds analysis-oriented multidimensional data and stores precomputed data for specific users, thereby meeting their particular needs. It is independent, fast and convenient to access, and unaffected by updates under way in the system.
By adopting the above technical solution, the invention has the following beneficial effects:
The invention realizes automatic, fast and reliable data acquisition, transmission, transformation and loading. ETL processing is fast and can handle large data volumes, ETL tasks become easier to execute, and multiple tasks can run at once, mutually independent and without interfering with one another. The cost of ETL data processing falls, its performance rises, and the manageability and availability of data improve. As the enterprise data model, the detail data SOR is the core of the whole data warehouse data model; it is flexible enough to accommodate additional data sources and support additional analysis requirements, which greatly widens the system's scope of application. The invention is practical, convenient for data management, flexible and easy to popularize; it processes data efficiently with high throughput, and it can accommodate additional data sources and support additional analysis requirements.
Brief description of the drawings
The invention is further described below with reference to the accompanying drawings:
Fig. 1 is a schematic flow diagram of the data loading and cleaning engine, scheduling and storage system of the invention;
Fig. 2 is a schematic flow diagram of the data warehouse in the invention.
Detailed description
As shown in Fig. 1 and Fig. 2, the data loading and cleaning engine, scheduling and storage system of the invention comprises a data source, a data warehouse and a user presentation module. The data warehouse is connected to an ETL management module comprising an ETL scheduling module, an ETL monitoring module, a data quality module and an ETL task module. The ETL scheduling module controls the running of all ETL tasks and is connected to a time setting module, with which each task can be set to run at a given time so that it runs automatically at the specified moment. Execution cycles vary widely: some tasks define a time interval (for example, run once every 3 minutes), while others define a fixed time (for example, start at 21:00 every Friday evening); fixed times can in turn be specified per year, month, week, day and so on. The time setting module builds a scheduling linked list in the system; each node of the list holds the "schedule information of the task" and the "next execution time", and the list is always kept sorted by "next execution time" in ascending order, which improves scheduling efficiency for large numbers of tasks. The ETL monitoring module tracks and monitors the running of ETL tasks and is connected to a fault handling module, which is connected to the ETL scheduling module; when a task fails or runs into an error, the fault handling module can reassign the task so that the system keeps running. The data quality module tracks the data quality of the data warehouse. The ETL task module carries out the concrete ETL processing and is connected to a graphical module, which converts the running status of tasks into visual charts that are clear at a glance.
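The scheduling list described above, in which each node carries the task's schedule information and its next execution time and the list stays sorted by that time, can be sketched as follows. This is an illustration under stated assumptions rather than the patent's implementation: `ScheduleNode`, `ScheduleList` and the task names are hypothetical, only interval-based tasks are modelled, and a sorted Python list stands in for the linked list.

```python
import bisect
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass(order=True)
class ScheduleNode:
    next_run: datetime                        # "next execution time" (sort key)
    task: str = field(compare=False)          # "schedule information of the task"
    interval: timedelta = field(compare=False)


class ScheduleList:
    """Task list kept sorted by next_run, earliest first."""

    def __init__(self):
        self._nodes = []

    def add(self, node):
        bisect.insort(self._nodes, node)      # insert while preserving order

    def pending(self):
        return [(n.task, n.next_run) for n in self._nodes]

    def pop_due(self, now):
        """Return the tasks due at `now` and re-queue each for its next run."""
        due = []
        while self._nodes and self._nodes[0].next_run <= now:
            due.append(self._nodes.pop(0))    # only the head needs checking
        for node in due:
            self.add(ScheduleNode(node.next_run + node.interval, node.task, node.interval))
        return [n.task for n in due]


start = datetime(2016, 7, 1, 21, 0)           # a Friday, 21:00
schedule = ScheduleList()
schedule.add(ScheduleNode(start + timedelta(minutes=3), "refresh_ssa", timedelta(minutes=3)))
schedule.add(ScheduleNode(start, "weekly_summary", timedelta(weeks=1)))
print(schedule.pop_due(start))                # ['weekly_summary']
print([t for t, _ in schedule.pending()])     # ['refresh_ssa', 'weekly_summary']
```

Because the list is ordered, the scheduler only inspects the first node to decide whether anything is due, which is the efficiency gain the text attributes to sorting by next execution time.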
The ETL management module uses Microsoft's DTS component. The data source connections of the ETL process are defined through the standard interfaces OLE DB or ODBC; the extraction, cleaning and transformation methods are defined with the rules built into DTS or with T-SQL scripts; and all ETL jobs of the data warehouse are designed and completed with the DTS tool of Microsoft SQL Server. After a DTS package has been designed with the DTS component, it can be executed once or set to run on an automatic schedule, so that the package executes without manual intervention. For the convenience of the system administrator, the execution and scheduling of the background DTS packages are exposed through a B/S (browser/server) user interface implemented with ASP. The administrator therefore need not manage and maintain the data warehouse's ETL on the server itself, but can carry out management and maintenance from any other location, which simplifies management and improves efficiency. The ETL management module interacts and cooperates around the metadata: it extracts data from the data source, then transforms and cleans it and loads it into the data warehouse according to the defined data warehouse model, which meets the update requirements of data integration well and realizes the collection and distribution of data among the business lines.
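The extract, clean/transform and load sequence can be illustrated with a small self-contained sketch. It does not use DTS, OLE DB or T-SQL; Python's standard `sqlite3` module stands in for both the data source and the warehouse, and the table names `src_orders` and `dw_orders`, the column names, and the cleaning rules are assumptions made purely for illustration.

```python
import sqlite3


def run_etl(source, warehouse):
    """Extract rows from the source, clean and transform them, load them."""
    # Extract
    rows = source.execute("SELECT id, name, amount FROM src_orders").fetchall()
    # Clean and transform: drop incomplete rows, normalize text and numbers
    cleaned = [
        (row_id, name.strip().upper(), round(float(amount), 2))
        for row_id, name, amount in rows
        if name is not None and amount is not None
    ]
    # Load into the warehouse table
    warehouse.executemany(
        "INSERT INTO dw_orders (id, customer, amount) VALUES (?, ?, ?)", cleaned
    )
    warehouse.commit()
    return len(cleaned)


source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE src_orders (id INTEGER, name TEXT, amount TEXT)")
source.executemany(
    "INSERT INTO src_orders VALUES (?, ?, ?)",
    [(1, " alice ", "12.50"), (2, None, "3.00"), (3, "bob", "7.25")],
)

warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE dw_orders (id INTEGER, customer TEXT, amount REAL)")

loaded = run_etl(source, warehouse)
print(loaded)  # 2
```

The row with a missing name is rejected during cleaning, so only two of the three source rows reach the warehouse table.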
The data warehouse comprises an interface file area, a detail data staging area (SSA), a detail data store (SOR), a data mart, a data summary module, a feedback module and a metadata store (MDR). The detail data SOR is connected to the data summary module, and the data summary module is connected to the feedback module. The interface file area stores and processes interface files and is connected to a permission setting module. Under a Unix system the interface file area is organized in a specific directory structure, and the permission setting module sets the access rights of different users to each directory according to its purpose. The data processing tool of the interface file area is mainly Kettle. The directories are mutually independent, do not interfere with one another and are clearly divided, which ensures that access is effective. The detail data staging area SSA is connected to a verification module; the verification module is connected to a lookup module, which is connected to the detail data SOR, and to a processing module, which is also connected to the detail data SOR. The SSA holds data temporarily: supported interface files are loaded into the database there. Using the lookup module, the verification module compares the existing data in the detail data SOR with the newly loaded data; once verification passes, the processing module integrates the newly loaded data into the detail data SOR.
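The flow from staging area to system of record (look up the existing record, compare it with the newly loaded one, then integrate) might look like the sketch below. The patent does not specify the verification rules, so the rule used here, insert when the key is new and overwrite when the content differs, is an assumption, as are the function name `integrate` and the dictionary-based data layout.

```python
def integrate(ssa_rows, sor):
    """Merge newly staged rows into the system of record.

    `ssa_rows` is a list of dicts with an "id" key; `sor` maps id -> row.
    Returns counts of inserted, updated and unchanged rows.
    """
    counts = {"inserted": 0, "updated": 0, "unchanged": 0}
    for row in ssa_rows:
        existing = sor.get(row["id"])          # lookup step
        if existing is None:                   # verification: new record
            sor[row["id"]] = dict(row)
            counts["inserted"] += 1
        elif existing != row:                  # verification: changed record
            sor[row["id"]] = dict(row)
            counts["updated"] += 1
        else:                                  # identical: nothing to integrate
            counts["unchanged"] += 1
    return counts


sor = {1: {"id": 1, "city": "Hangzhou"}, 2: {"id": 2, "city": "Ningbo"}}
staged = [
    {"id": 1, "city": "Hangzhou"},   # same as the SOR copy
    {"id": 2, "city": "Wenzhou"},    # differs from the SOR copy
    {"id": 3, "city": "Shaoxing"},   # not yet in the SOR
]
summary = integrate(staged, sor)
print(summary)  # {'inserted': 1, 'updated': 1, 'unchanged': 1}
```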
The detail data SOR is a set of table structures developed on the basis of BDW that conform to third normal form (3NF). It stores the detail-level data of the data warehouse. The detail data SOR is connected to an exchange partition module, through which its data is organized by subject area. The exchange partition module uses two partitioning mechanisms, "partition ignoring" and "divide and conquer". Together they reduce the impact of data import operations on users' real-time data access; the operating mode resembles using a hot-swappable hard disk and is convenient to use. In terms of performance, because the system stores massive data, "partition ignoring" effectively improves query performance. The manageability and availability of the data, for example for data deletion and data backup, also improve: "divide and conquer" allows finer and more efficient management, confines a fault produced by a task to a single partition, and effectively shortens recovery time. As the enterprise data model, the detail data SOR is the core of the whole data warehouse data model; it is flexible enough to accommodate additional data sources and support additional analysis requirements, which widens the system's scope of application. The detail data SOR is connected to a BDW upgrade and update module, which supports further upgrades and updates of BDW.
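The two partition mechanisms can be illustrated with an in-memory sketch. It assumes that "partition ignoring" means a query reads only the partitions it names and that the hot-swap behaviour corresponds to replacing one fully prepared partition in a single step; the class `PartitionedTable`, the month-based partition keys and the sample rows are all hypothetical.

```python
class PartitionedTable:
    """Rows grouped by a partition key such as a month string."""

    def __init__(self):
        self.partitions = {}

    def exchange(self, key, new_rows):
        """Swap in a fully prepared partition in one step; return the old rows.

        Other partitions are untouched, so a failed load affects only `key`.
        """
        old = self.partitions.get(key, [])
        self.partitions[key] = list(new_rows)
        return old

    def query(self, keys, predicate):
        """Scan only the named partitions; report how many rows were read."""
        scanned = 0
        result = []
        for key in keys:
            rows = self.partitions.get(key, [])
            scanned += len(rows)
            result.extend(r for r in rows if predicate(r))
        return result, scanned


table = PartitionedTable()
table.exchange("2016-05", [{"amount": 5}, {"amount": 40}, {"amount": 12}])
table.exchange("2016-06", [{"amount": 8}, {"amount": 25}])

# Only the June partition is read; the three May rows are never scanned.
hits, scanned = table.query(["2016-06"], lambda r: r["amount"] > 10)
print(hits, scanned)  # [{'amount': 25}] 2

# Reloading June replaces that partition alone and returns the previous rows.
previous = table.exchange("2016-06", [{"amount": 99}])
print(len(previous))  # 2
```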
The metadata store MDR holds information about the processes and data in the data warehouse; the information about data includes logs, the data dictionary, configuration information and so on. The MDR is connected to a metadata management module. Because every tool and system generates its own metadata, the metadata management module stores this metadata in the MDR as centrally as possible. The MDR is a shared place where users can access metadata centrally; the metadata is still actually maintained in the systems and tools that generate it. The data mart is connected to a multidimensional cube module. The data warehouse and the data mart are stored in one TDH data group, in which different data are distinguished by home region: the data mart is stored in the 3D vision region and is used for analyzing multidimensional data, and the multidimensional cube module is stored in the integrated region and is used for storing multidimensional data. The data mart has a star or snowflake structure. A data mart is a subset of the data warehouse and can be called a "small data warehouse"; its applications supplement those of the data warehouse. The data mart holds analysis-oriented multidimensional data and stores precomputed data for specific users, thereby meeting their particular needs. It is independent, fast and convenient to access, and unaffected by updates under way in the system. The data summary module is designed to be denormalized and is used to update multidimensional data; the feedback module is based on data mining results. The user presentation module is connected to a query module, which presents the corresponding business content according to the needs set by the user, including the time a business item was handled, its completion time and its detailed parameters. A specific user can thus quickly find the details of the business items he or she needs.
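The idea of a data mart that stores precomputed, analysis-oriented multidimensional data can be sketched as a small cube built from fact rows. The function `build_cube`, the dimension names `region` and `month`, and the sample figures are hypothetical; the patent does not describe how its multidimensional cube module computes or stores aggregates.

```python
from collections import defaultdict
from itertools import combinations


def build_cube(facts, dims, measure):
    """Precompute the sum of `measure` for every combination of dimensions.

    Keys are tuples of (dimension, value) pairs; the empty tuple holds the
    grand total.
    """
    cube = defaultdict(float)
    for row in facts:
        for size in range(len(dims) + 1):
            for subset in combinations(dims, size):
                key = tuple((d, row[d]) for d in subset)
                cube[key] += row[measure]
    return dict(cube)


facts = [
    {"region": "East", "month": "2016-06", "sales": 100.0},
    {"region": "East", "month": "2016-07", "sales": 50.0},
    {"region": "West", "month": "2016-06", "sales": 30.0},
]
cube = build_cube(facts, ["region", "month"], "sales")

print(cube[()])                                            # 180.0
print(cube[(("region", "East"),)])                         # 150.0
print(cube[(("region", "East"), ("month", "2016-06"))])    # 100.0
```

Answering a query then becomes a dictionary lookup instead of a scan over the detail data, which reflects the fast access the text attributes to the data mart.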
The invention realizes automatic, fast and reliable data acquisition, transmission, transformation and loading. ETL processing is fast and can handle large data volumes, ETL tasks become easier to execute, and multiple tasks can run at once, mutually independent and without interfering with one another. The cost of ETL data processing falls, its performance rises, and the manageability and availability of data improve. As the enterprise data model, the detail data SOR is the core of the whole data warehouse data model; it is flexible enough to accommodate additional data sources and support additional analysis requirements, which greatly widens the system's scope of application. The invention is practical, convenient for data management, flexible and easy to popularize; it processes data efficiently with high throughput, and it can accommodate additional data sources and support additional analysis requirements.
The above is only a specific embodiment of the invention, but the technical features of the invention are not limited to it. Any simple change, equivalent replacement or modification made on the basis of the invention to solve essentially the same technical problem and achieve essentially the same technical effect falls within the scope of protection of the invention.

Claims (9)

1. A data loading and cleaning engine, scheduling and storage system, characterized by comprising a data source, a data warehouse and a user presentation module, wherein the data warehouse is connected to an ETL management module; the ETL management module comprises an ETL scheduling module, an ETL monitoring module, a data quality module and an ETL task module; the ETL scheduling module is used to control the running of all ETL tasks, the ETL monitoring module is used to track and monitor the running of ETL tasks, the data quality module is used to track the data quality of the data warehouse, and the ETL task module is used to carry out the concrete ETL processing;
the data warehouse comprises an interface file area, a detail data staging area SSA, a detail data SOR, a data mart, a data summary module, a feedback module and a metadata store MDR; the detail data SOR is connected to the data summary module, and the data summary module is connected to the feedback module; the interface file area is used to store and process interface files and is connected to a permission setting module; the permission setting module is used to organize the area in a specific directory structure and to set the access rights of different users to each directory according to its purpose;
the detail data staging area SSA is connected to a verification module; the verification module is connected to a lookup module, and the lookup module is connected to the detail data SOR; the verification module is connected to a processing module, and the processing module is connected to the detail data SOR; the detail data SOR is connected to an exchange partition module; the metadata store MDR is used to hold information about the processes and data in the data warehouse and is connected to a metadata management module; the data mart is connected to a multidimensional cube module, and the multidimensional cube module is used to store multidimensional data;
the user presentation module is connected to a query module, and the query module is used to present business content according to user needs.
2. The data loading and cleaning engine, scheduling and storage system according to claim 1, characterized in that the ETL scheduling module is connected to a time setting module.
3. The data loading and cleaning engine, scheduling and storage system according to claim 1, characterized in that the ETL monitoring module is connected to a fault handling module, and the fault handling module is connected to the ETL scheduling module.
4. The data loading and cleaning engine, scheduling and storage system according to claim 1, characterized in that the ETL task module is connected to a graphical module.
5. The data loading and cleaning engine, scheduling and storage system according to claim 1, characterized in that the data processing tool of the interface file area is mainly Kettle.
6. The data loading and cleaning engine, scheduling and storage system according to claim 1, characterized in that the detail data SOR is a set of table structures developed on the basis of BDW that conform to third normal form (3NF).
7. The data loading and cleaning engine, scheduling and storage system according to claim 6, characterized in that the detail data SOR is connected to a BDW upgrade and update module.
8. The data loading and cleaning engine, scheduling and storage system according to claim 1, characterized in that the ETL management module uses Microsoft's DTS component.
9. The data loading and cleaning engine, scheduling and storage system according to claim 1, characterized in that the data mart has a star or snowflake structure.
CN201610524292.8A 2016-06-29 2016-06-29 A kind of data load cleaning engine, scheduling and storage system Active CN106202346B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610524292.8A CN106202346B (en) 2016-06-29 2016-06-29 A kind of data load cleaning engine, scheduling and storage system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610524292.8A CN106202346B (en) 2016-06-29 2016-06-29 A kind of data load cleaning engine, scheduling and storage system

Publications (2)

Publication Number Publication Date
CN106202346A (en) 2016-12-07
CN106202346B (en) 2019-11-01

Family

ID=57465396

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610524292.8A Active CN106202346B (en) 2016-06-29 2016-06-29 A kind of data load cleaning engine, scheduling and storage system

Country Status (1)

Country Link
CN (1) CN106202346B (en)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107679160A (en) * 2017-09-28 2018-02-09 深圳市华傲数据技术有限公司 Data processing method and device based on chart database
CN107688592A (en) * 2017-04-06 2018-02-13 平安科技(深圳)有限公司 The method and terminal of data cleansing
CN107832451A (en) * 2017-11-23 2018-03-23 安徽科创智慧知识产权服务有限公司 A kind of big data cleaning way of simplification
CN107895032A (en) * 2017-11-23 2018-04-10 安徽科创智慧知识产权服务有限公司 Carry out the network data acquisition method that data are tentatively cleaned
CN107992552A (en) * 2017-11-28 2018-05-04 南京莱斯信息技术股份有限公司 A kind of data interchange platform and method for interchanging data
CN108196912A (en) * 2018-01-03 2018-06-22 新疆熙菱信息技术股份有限公司 One kind is based on hot-plug component formula data integrating method
CN108280084A (en) * 2017-01-06 2018-07-13 上海前隆信息科技有限公司 A kind of construction method of data warehouse, system and server
CN109033291A (en) * 2018-07-13 2018-12-18 深圳市小牛在线互联网信息咨询有限公司 A kind of job scheduling method, device, computer equipment and storage medium
CN109269557A (en) * 2018-09-19 2019-01-25 中国南方电网有限责任公司超高压输电公司广州局 A kind of change of current station equipment operating parameter and running environment intelligent monitor system and method
CN109669975A (en) * 2018-11-09 2019-04-23 成都数之联科技有限公司 A kind of industry big data processing system and method
CN109918437A (en) * 2019-03-08 2019-06-21 北京中油瑞飞信息技术有限责任公司 Distributed data processing method, apparatus and data assets management system
CN112667615A (en) * 2020-12-25 2021-04-16 广东电网有限责任公司电力科学研究院 Data cleaning system and method
CN112667472A (en) * 2020-12-28 2021-04-16 武汉达梦数据库股份有限公司 Data source connection state monitoring device and method
CN113177039A (en) * 2021-04-27 2021-07-27 中通服咨询设计研究院有限公司 Data center data cleaning system based on data fusion
CN114817393A (en) * 2022-06-24 2022-07-29 深圳市信联征信有限公司 Data extraction and cleaning method and device and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101452485A (en) * 2008-12-31 2009-06-10 中国建设银行股份有限公司 Method and device for generating multidimensional cubic based on relational database
CN201600693U (en) * 2009-11-26 2010-10-06 中国移动通信集团河北有限公司 Data warehouse system
CN103577605A (en) * 2013-11-20 2014-02-12 贵州电网公司电力调度控制中心 Data warehouse based on data fusion and data mining and application method of data warehouse
CN104933160A (en) * 2015-06-26 2015-09-23 河海大学 ETL (Extract Transform and Load) framework design method for safety monitoring business analysis
CN105095327A (en) * 2014-05-23 2015-11-25 深圳市珍爱网信息技术有限公司 Distributed ELT system and scheduling method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
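Petroleum Paper Database: "Brief Description of the IBM Data Warehouse Solution", Docin *
Long Xinzheng et al.: "Research on a University Data Statistics Service Platform Based on a Data Warehouse", Journal on Communications *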

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108280084A (en) * 2017-01-06 2018-07-13 上海前隆信息科技有限公司 Data warehouse construction method, system and server
CN107688592B (en) * 2017-04-06 2020-03-17 平安科技(深圳)有限公司 Data cleaning method and terminal
CN107688592A (en) * 2017-04-06 2018-02-13 平安科技(深圳)有限公司 Data cleaning method and terminal
CN107679160A (en) * 2017-09-28 2018-02-09 深圳市华傲数据技术有限公司 Data processing method and device based on graph database
CN107832451A (en) * 2017-11-23 2018-03-23 安徽科创智慧知识产权服务有限公司 Simplified big data cleaning method
CN107895032A (en) * 2017-11-23 2018-04-10 安徽科创智慧知识产权服务有限公司 Network data acquisition method with preliminary data cleaning
CN107992552A (en) * 2017-11-28 2018-05-04 南京莱斯信息技术股份有限公司 Data exchange platform and data exchange method
CN108196912B (en) * 2018-01-03 2021-04-23 新疆熙菱信息技术股份有限公司 Data integration method based on hot plug assembly
CN108196912A (en) * 2018-01-03 2018-06-22 新疆熙菱信息技术股份有限公司 Data integration method based on hot plug assembly
CN109033291A (en) * 2018-07-13 2018-12-18 深圳市小牛在线互联网信息咨询有限公司 Job scheduling method, device, computer equipment and storage medium
CN109269557A (en) * 2018-09-19 2019-01-25 中国南方电网有限责任公司超高压输电公司广州局 Intelligent monitoring system and method for converter station equipment operating parameters and operating environment
CN109669975A (en) * 2018-11-09 2019-04-23 成都数之联科技有限公司 Industrial big data processing system and method
CN109669975B (en) * 2018-11-09 2020-12-18 成都数之联科技有限公司 Industrial big data processing system and method
CN109918437A (en) * 2019-03-08 2019-06-21 北京中油瑞飞信息技术有限责任公司 Distributed data processing method, apparatus and data assets management system
CN112667615B (en) * 2020-12-25 2022-02-15 广东电网有限责任公司电力科学研究院 Data cleaning system and method
CN112667615A (en) * 2020-12-25 2021-04-16 广东电网有限责任公司电力科学研究院 Data cleaning system and method
CN112667472A (en) * 2020-12-28 2021-04-16 武汉达梦数据库股份有限公司 Data source connection state monitoring device and method
CN112667472B (en) * 2020-12-28 2022-04-08 武汉达梦数据库股份有限公司 Data source connection state monitoring device and method
CN113177039A (en) * 2021-04-27 2021-07-27 中通服咨询设计研究院有限公司 Data center data cleaning system based on data fusion
CN113177039B (en) * 2021-04-27 2024-02-27 中通服咨询设计研究院有限公司 Data center data cleaning system based on data fusion
CN114817393A (en) * 2022-06-24 2022-07-29 深圳市信联征信有限公司 Data extraction and cleaning method and device and storage medium

Also Published As

Publication number Publication date
CN106202346B (en) 2019-11-01

Similar Documents

Publication Publication Date Title
CN106202346A (en) A kind of data load and clean engine, dispatch and storage system
CN105005570B (en) Massive intelligent power data mining method and device based on cloud computing
CN107256443B (en) Real-time line loss calculation method based on business and data integration
CN101208692B (en) Automatically moving multidimensional data between live datacubes of enterprise software systems
CN104468778B (en) Cloud manufacturing execution system based on cloud services and manufacturing execution method thereof
CN108520316A (en) Data optimization processing method for overload alarms
US20140358977A1 (en) Management of Intermediate Data Spills during the Shuffle Phase of a Map-Reduce Job
CN109271382A (en) Data lake system for opening and sharing data in all forms
CN104111996A (en) Health insurance outpatient clinic big data extraction system and method based on Hadoop platform
CN101566981A (en) Method for establishing dynamic virtual data base in analyzing and processing system
CN106599197A (en) Data acquisition and exchange engine
CN108694195A (en) Management method and system for a distributed data warehouse
CN102722355A (en) Workflow mechanism-based concurrent ETL (Extract, Transform and Load) conversion method
CN103902537A (en) Multi-service log data storage processing and inquiring system and method thereof
CN112883001A (en) Data processing method, device and medium based on marketing and distribution through data visualization platform
CN102763083A (en) Computer system and upgrade method for same
CN109508957A (en) Enterprise management system
CN102495916A (en) Multi-application-system panoramic modeling method based on object matching
CN106780157B (en) Ceph-based power grid multi-temporal model storage and management system and method
CN109359107A (en) Database cleaning method, system, device and storage medium
CN112825169B (en) Intelligent enterprise management system and management method
CN110007905A (en) Method and system for generating a software development scheme based on big data
CN104009906A (en) Structuralized theme instant communication system and method
Lee et al. A big data management system for energy consumption prediction models
CN103810258A (en) Data aggregation scheduling method based on data warehouse

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20191008

Address after: Floors 2 and 5, Building 9, No. 305 Dongfeng Middle Road, Yuexiu District, Guangzhou, Guangdong 510030

Applicant after: Guangdong Information Network Co., Ltd.

Address before: No. 928, No. 2 Street, Xiasha Higher Education Park, Jianggan District, Hangzhou, Zhejiang 310018

Applicant before: Zhejiang Sci-Tech University

GR01 Patent grant