CN108846076A - Massive multi-source data ETL processing method and system supporting interface adaptation - Google Patents

Massive multi-source data ETL processing method and system supporting interface adaptation

Info

Publication number
CN108846076A
Authority
CN
China
Prior art keywords
data
etl
job
conversion
interface
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810588231.7A
Other languages
Chinese (zh)
Inventor
史玉良
王新军
张晖
管永明
吕梁
刘智勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
DAREWAY SOFTWARE Co Ltd
Original Assignee
DAREWAY SOFTWARE Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by DAREWAY SOFTWARE Co Ltd filed Critical DAREWAY SOFTWARE Co Ltd
Priority to CN201810588231.7A priority Critical patent/CN108846076A/en
Publication of CN108846076A publication Critical patent/CN108846076A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a massive multi-source data ETL processing method and system supporting interface adaptation. The method includes: a data extraction step, which sets the basic information of the data source and target database, adaptively matches the corresponding ETL tool to each data source, and sets the ETL tool's parameters; a data conversion step, which performs ETL job control, execution, and scheduling management, buffers and manages the extracted data, and cleans and converts the data; a data loading step, which checks the quality of the converted data objects, outputs them according to the table structure defined by the data model, and updates and loads the verified data into the target database; and a data monitoring step, which monitors and manages the ETL job execution process, job resource usage, and system operation. Suitable ETL tools are matched adaptively, massive data is extracted and converted, and ETL jobs are executed efficiently and managed in an orderly way.

Description

Massive multi-source data ETL processing method and system supporting interface adaptation
Technical field
The present invention relates to the field of ETL management, and in particular to a massive multi-source data ETL processing method and system supporting interface adaptation.
Background
Industry has accumulated massive amounts of data, and the volume, variety, and rate of change of data are all increasing sharply, yet big data is not being fully used and the great value it contains remains to be mined. Big data is often multi-source and heterogeneous: it comes from different, dispersed business systems and includes structured, semi-structured, and unstructured data, which makes it difficult to extract and convert into the data required. In a big data environment, data is characterized by large volume, high variety, and frequent interaction. As the collected data keeps growing, the data processing logic becomes more complex, and transferring massive multi-source data efficiently between different databases becomes a problem.
Traditional ETL tools are expensive, depend heavily on specific business, and use a centralized architecture in which design, operation, and management are all concentrated on a single server, which places high demands on hardware. Under the traditional ETL management mode, the ETL tool is usually chosen manually according to the attributes of the source and target databases, and the ETL task flow is then set up, the parameters are set, and the task is started by hand. This manual process is complex, consumes a great deal of labor and time, and cannot meet the job requirements of massive multi-source data ETL. It is therefore necessary to explore an apparatus that can execute ETL (extract, transform, load) jobs more economically and efficiently in a big data environment.
Summary of the invention
The object of the invention is to solve the above problems by providing a massive multi-source data ETL method and system supporting interface adaptation. For massive multi-source data from different, dispersed systems, a suitable ETL tool is selected adaptively based on an interface adapter and an ETL tool engine, and big data processing technologies such as HDFS, MapReduce, and Spark are used to realize ETL job scheduling management and efficient execution, as well as centralized storage, processing, and conversion of massive complex data.
To achieve the above object, the present invention adopts the following technical solution:
As a first aspect of the invention, a massive multi-source data ETL processing method supporting interface adaptation is provided;
The massive multi-source data ETL processing method supporting interface adaptation includes:
A data extraction step: set the basic information of the data source and target database, adaptively match the corresponding ETL tool to each data source, and set the parameters of the ETL tool; extract the different data sources through a database interface, log file interface, or stream data interface;
A data conversion step: perform ETL job control, execution, and scheduling management based on the MapReduce and Spark computing frameworks; buffer and manage the extracted data based on HDFS, Hive, or HBase; and clean and convert the data;
A data loading step: check the quality of the converted data objects, output them according to the table structure defined by the data model, and update and load the verified data into the target database;
A monitoring management step: monitor and manage the ETL job execution process, job resource usage, and system operation.
As a further improvement of the present invention, the data extraction step includes:
A sub-step of setting the data source and target database: set the basic information of the data source and target database, including the database type, the connection type between the data source and target database, the database IP, database name, port, user name, and password;
A sub-step of adaptively matching the ETL tool: adaptively match the corresponding ETL tool to each data source.
In the adaptive matching sub-step, if the data source or target database is database data, Sqoop is matched when either side is the non-relational database HDFS, and Kettle is matched otherwise; if the data source is a log file, Flume is matched; if the data source is stream data, Kafka is matched.
A sub-step of configuring ETL tool parameters: after the ETL tool has been matched, set the task parameters.
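The matching rule of the adaptive matching sub-step can be illustrated with a minimal Python sketch. The names `SourceKind` and `select_etl_tool` are hypothetical and do not appear in the patent; the sketch only encodes the decision rule stated above and does not invoke any of the tools.

```python
from enum import Enum


class SourceKind(Enum):
    """Hypothetical classification of a data source."""
    DATABASE = "database"
    LOG_FILE = "log_file"
    STREAM = "stream"


def select_etl_tool(kind, source_type=None, target_type=None):
    """Return the ETL tool name for a data source, following the stated rule."""
    if kind is SourceKind.LOG_FILE:
        return "Flume"
    if kind is SourceKind.STREAM:
        return "Kafka"
    if kind is SourceKind.DATABASE:
        # Sqoop when either side is HDFS; Kettle for all other database pairs.
        types = {str(source_type).upper(), str(target_type).upper()}
        return "Sqoop" if "HDFS" in types else "Kettle"
    raise ValueError(f"unsupported source kind: {kind!r}")
```

For example, `select_etl_tool(SourceKind.DATABASE, "MySQL", "HDFS")` selects Sqoop, while the same call with an Oracle target selects Kettle.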
As a further improvement of the present invention, the data source includes database data, pictures, audio files, video files, log files, or stream data. Database data includes relational databases and non-relational databases; relational databases include Oracle, MySQL, and SQL Server; non-relational databases include HDFS, MongoDB, and HBase. Log files include log data of various types and formats from console, RPC (Thrift-RPC), text (file), tail (UNIX tail), syslog (the syslog log system, which supports two modes, TCP and UDP), exec (command execution), and so on.
As a further improvement of the present invention, the target database is used for data sharing, report queries, and system applications.
As a further improvement of the present invention, the ETL tool includes Sqoop, Kettle, Flume, or Kafka. Sqoop is an open-source tool for transferring data between Hadoop and traditional databases (Oracle, MySQL, etc.); Kettle is an open-source ETL tool that performs data extraction with workflow at its core; Flume is a system provided by Cloudera for collecting, aggregating, and transmitting massive logs; Kafka is a high-throughput open-source stream processing platform.
As a further improvement of the present invention, the data conversion step includes:
A job flow design sub-step: design the job control flow according to the actual business logic, including the extraction mode and the ETL task flow.
A job scheduling management sub-step, including the job scheduling strategy, job dependency control, job priority configuration, and job scheduling control. The job scheduling strategy includes time-triggered, event-triggered, and real-time processing modes; job dependency control means defining the dependencies between jobs according to the actual business logic; job priority configuration means setting job priorities according to the actual business logic and system resource usage; job scheduling control means setting a resource warning threshold for job scheduling and pausing low-priority jobs when resource usage exceeds the threshold.
A job execution sub-step, responsible for executing ETL jobs.
In the job execution sub-step, Sqoop starts a map-only MapReduce job and reads data row by row according to the data split value; Kettle creates a transformation (Transformation) and a job (Job) and, after the task parameters of each stage are set, starts the workflow to extract data; Flume collects log data through its source component, caches it in the channel component, and sends it to the destination through the sink component; after Kafka collects stream data, the data is split into a series of batch jobs and processed in real time by resilient distributed datasets in Spark.
A distributed cache sub-step: buffer the extracted data, where HDFS is responsible for underlying data storage, Hive for filtering, summarizing, querying, and analyzing data, and HBase for maintaining data changes and storing data that is frequently accessed during the data conversion computation;
A business rule formulation sub-step: formulate the business rules for data cleaning and conversion according to the actual business rules;
A data processing sub-step: clean and convert the data according to the formulated business rules, where data cleaning covers gap filling, correction, and cleaning of the data, and data conversion covers inconsistency conversion, data granularity conversion, and standard conversion.
Inconsistency conversion: for example, the same user is coded A01 in system A and B01 in system B; after extraction, such data is converted to a single unified code;
Data granularity conversion: for example, the data stored for user M in system A is very detailed while the data stored in system B is relatively coarse; because the granularity differs, the data must be aggregated to a common granularity after extraction;
Standard conversion: for example, because of differing business rules, the same business data follows different standards in business systems A and B, so the standards must be unified after extraction.
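As a rough illustration of the first two conversions, the following Python sketch unifies system-specific user codes and aggregates detailed rows to a coarser granularity. The mapping table `CODE_MAP`, the unified code `U001`, and the field names `user` and `amount` are assumptions made for the example; the patent does not specify them.

```python
from collections import defaultdict

# Hypothetical mapping from (system, local code) to one unified user code.
CODE_MAP = {("A", "A01"): "U001", ("B", "B01"): "U001"}


def unify_code(system, local_code):
    """Inconsistency conversion: map a system-specific code to a unified code."""
    return CODE_MAP[(system, local_code)]


def aggregate_by_user(rows):
    """Granularity conversion: roll detailed rows up to one total per user."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["user"]] += row["amount"]
    return dict(totals)
```

Here the codes A01 (system A) and B01 (system B) both resolve to the same unified code, and detailed rows from the finer-grained system are summed per user so that they match the coarser system.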
As a further improvement of the present invention, the data loading step includes:
A data quality check sub-step: check the quality of the converted data objects, verify data anomalies caused by network interruption and similar causes, and check whether the quality of the converted data meets the standard of the target database;
A data update and load sub-step: load the data that passed the check into the target database and, according to the predefined data model, update the target data tables by timestamp, log table, full-table comparison, or full-table delete-and-insert.
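Of the update modes listed, the timestamp mode can be sketched as follows. This is a simplified in-memory illustration under assumed names: the target table is modeled as a dict keyed by a hypothetical `id` field, and each row carries a hypothetical `updated_at` value; a real implementation would issue the corresponding statements against the target database.

```python
def incremental_upsert(target, source_rows, last_loaded):
    """Timestamp-based update: apply only rows modified after last_loaded.

    target is a dict keyed by primary key; the return value is the new
    high-water mark to use for the next load.
    """
    newest = last_loaded
    for row in source_rows:
        if row["updated_at"] > last_loaded:
            target[row["id"]] = row  # insert a new row or overwrite the old one
            newest = max(newest, row["updated_at"])
    return newest
```

Rows whose timestamp is not later than the previous high-water mark are skipped, which is what distinguishes this mode from a full-table comparison or a full-table delete-and-insert.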
As a further improvement of the present invention, the monitoring management step includes:
A job monitoring management sub-step: monitor the execution process and resource usage of ETL jobs;
The ETL job execution process monitoring sub-step monitors information including job execution time, job progress, timeouts, job interruptions, and job backlog. Job execution time is monitored, a timeout reminder is set, and timeout problems are analyzed manually; job execution logs are monitored, and when a job is interrupted, job execution is re-triggered according to the defined interruption recovery mechanism; job progress is monitored, and when jobs back up, they are queued by priority and higher-priority jobs are executed first.
The ETL job resource monitoring sub-step monitors job resource usage; if the resource load exceeds the threshold, the load is adjusted by pausing or stopping some low-priority jobs, which are executed again once the load falls below the threshold;
A system monitoring management sub-step: monitor machine hardware information and cluster running state information, and manage metadata, the database interface, the log file interface, or the stream data interface.
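The load adjustment described in the resource monitoring sub-step might look like the following sketch. The function name `jobs_to_pause`, the tuple layout, and the idea of a per-job load share are assumptions for illustration; the patent states only that low-priority jobs are paused or stopped when the load exceeds the threshold.

```python
def jobs_to_pause(running_jobs, load, threshold):
    """Pick the lowest-priority jobs to pause until load is within threshold.

    running_jobs is a list of (name, priority, load_share) tuples, where a
    larger priority value means a more important job.
    """
    paused = []
    for name, _priority, share in sorted(running_jobs, key=lambda job: job[1]):
        if load <= threshold:
            break
        paused.append(name)
        load -= share  # estimated load after pausing this job
    return paused
```

The paused jobs would be resumed once monitoring reports that the load has fallen back below the threshold.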
As a second aspect of the invention, a massive multi-source data ETL processing system supporting interface adaptation is provided;
The massive multi-source data ETL processing system supporting interface adaptation includes:
A data extraction module, which sets the basic information of the data source and target database, adaptively matches the corresponding ETL tool to each data source, and sets the parameters of the ETL tool; it extracts the different data sources through a database interface, log file interface, or stream data interface;
A data conversion module, which performs ETL job execution and scheduling management based on the MapReduce and Spark computing frameworks, buffers and manages the extracted data based on HDFS, Hive, or HBase, and cleans and converts the data;
A data loading module, which checks the quality of the converted data objects, outputs them according to the table structure defined by the data model, and updates and loads the verified data into the target database;
A monitoring management module, which monitors and manages the ETL job execution process, job resource usage, and system operation.
As a further improvement of the present invention, the data source includes database data, pictures, audio files, video files, log files, or stream data. Database data includes relational databases and non-relational databases; relational databases include Oracle, MySQL, SQL Server, etc.; non-relational databases include HDFS, MongoDB, HBase, etc. The log files include log data of various types and formats from console, RPC (Thrift-RPC), text (file), tail (UNIX tail), syslog (the syslog log system, which supports two modes, TCP and UDP), exec (command execution), and so on.
As a further improvement of the present invention, the target database is used for data sharing, report queries, and system applications.
As a further improvement of the present invention, the data extraction module includes:
An interface adapter, which provides data interfaces for the different data types (database data, log files, or stream data) and formulates the ETL tool adaptation rules; the interface adapter further includes an adaptation rule engine;
The data interfaces include a database interface, a log file interface, and a stream data interface: database data is extracted from relational or non-relational databases through the database interface; log files are extracted through the log file interface; stream data is extracted through the stream data interface.
The adaptation rule engine is used to set the basic information of the data source and target database, including the database type, the connection type between the data source and target database, the database IP, database name, port, user name, and password; the adaptation rules include the rules for determining the data source and target database types and the adaptation rules for the different ETL tools.
An ETL tool engine, which integrates and manages the ETL tools, including Sqoop, Flume, Kettle, and Kafka, and adaptively matches a suitable ETL tool to heterogeneous data (database data, log files, stream data) from different data sources.
Sqoop is an open-source tool for transferring data between Hadoop and databases;
Kettle is an open-source ETL tool that performs data extraction with workflow at its core;
Flume is a system provided by Cloudera for collecting, aggregating, and transmitting massive logs;
Kafka is a high-throughput open-source stream processing platform.
In the ETL tool engine, the ETL adaptation engine selects an ETL tool for each data source according to the rules formulated in the interface adapter's adaptation rule engine, namely the rules for determining the data source and target database types and the adaptation rules for the different ETL tools.
As a further improvement of the present invention, the data conversion module includes:
A job scheduling management engine, which uses the master node of the distributed cluster as the scheduling management engine and includes a job management unit and a task scheduling unit;
The job management unit covers ETL job design, job configuration, and job monitoring. ETL job design means designing the job control flow according to the actual business logic, including job dependencies, whether extraction is incremental, and extraction frequency; job configuration means configuring parameters such as job priority and job execution mode; job monitoring means monitoring and managing the execution state and resource usage of ETL jobs;
The task scheduling unit covers the job scheduling strategy and job scheduling control. The job scheduling strategy includes time-triggered, event-triggered, and real-time processing modes; job scheduling control means setting a resource warning threshold for job scheduling and pausing low-priority jobs when resource usage exceeds the threshold.
A job execution engine, responsible for executing ETL jobs, which uses the slave nodes of the distributed cluster as execution engines, realizing offline processing of ETL jobs based on the MapReduce computing framework and real-time processing of ETL jobs based on the Spark computing framework.
In the job execution engine, Sqoop starts a map-only MapReduce job and reads data row by row according to the data split value; Kettle creates a transformation (Transformation) and a job (Job) and, after the task parameters of each stage are set, starts the workflow to extract data; Flume collects log data through its source component, caches it in the channel component, and sends it to the destination through the sink component; after Kafka collects stream data, the data is split into a series of batch jobs and processed in real time by resilient distributed datasets in Spark.
A distributed cache submodule, which buffers and manages the extracted data, where HDFS is responsible for underlying data storage, Hive for filtering, summarizing, querying, or analyzing data, and HBase for maintaining data changes and storing data that is frequently accessed during the data conversion computation;
A business rule engine, which, according to the actual business rules, formulates cleaning rules for incomplete data, erroneous data, and dirty data, and formulates conversion rules for heterogeneous data from different business systems;
A data processing submodule, including a data cleaning unit and a data conversion unit, which cleans and converts the data according to the actual business rules;
The data cleaning unit performs gap filling, correction, and cleaning of the data: gap filling completes the missing and mismatched information in incomplete data; data correction fixes erroneous data according to the specific business; data cleaning designs cleaning rules for dirty data and determines its correctness;
The data conversion unit converts the massive heterogeneous data from different business systems into the data required by the target database: inconsistency conversion unifies data of the same type from different business systems; data granularity conversion aggregates the extracted business system data according to the granularity of the target database; data standard conversion converts the extracted data into the standard data required by the target database according to a pre-established standard data model.
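The work of the data cleaning unit can be pictured with a small sketch that fills gaps from default values and then filters out records that fail a validity rule. The function `clean_records`, its `defaults` table, and the `is_valid` predicate are illustrative assumptions; in the system described, the actual rules would come from the business rule engine.

```python
def clean_records(records, defaults, is_valid):
    """Fill missing fields from defaults, then drop records that fail is_valid."""
    cleaned = []
    for rec in records:
        # Gap filling: a field that is absent or None takes its default value.
        present = {key: value for key, value in rec.items() if value is not None}
        filled = {**defaults, **present}
        # Cleaning: keep only records that satisfy the business rule.
        if is_valid(filled):
            cleaned.append(filled)
    return cleaned
```

A correction rule, such as normalizing a misspelled category to its canonical value, could be added as a further mapping step between gap filling and validation.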
As a further improvement of the present invention, the data loading module includes:
A quality check submodule, which checks the quality of the converted data objects, verifies data anomalies caused by network interruption and similar causes, and checks whether the quality of the converted data meets the standard of the target database;
A data update and load submodule, which loads the data that passed the check into the target database and, according to the predefined data model, updates the target data tables by timestamp, log table, full-table comparison, or full-table delete-and-insert.
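One simple form the quality check could take is validating each converted row against the target table's column definitions. In this sketch the schema is modeled as a hypothetical `{column: type}` dict; the patent does not prescribe how the target database's standard is represented.

```python
def check_quality(rows, schema):
    """Split rows into (passed, failed) against a {column: type} schema."""
    passed, failed = [], []
    for row in rows:
        # A row passes only if it has exactly the schema's columns
        # and every value has the expected type.
        ok = set(row) == set(schema) and all(
            isinstance(row[col], typ) for col, typ in schema.items()
        )
        (passed if ok else failed).append(row)
    return passed, failed
```

Only the rows in `passed` would be handed to the update and load submodule; rows in `failed`, for example rows truncated by a network interruption, would be held back for verification.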
As a further improvement of the present invention, the monitoring management module includes:
A job monitoring submodule, which monitors the execution process of ETL jobs, including job execution time, timeouts, job interruptions, job backlog, job progress, and job resource usage;
A load balancing submodule, which assesses the resource load according to how ETL jobs are executing and, when the load exceeds the threshold, adjusts it by pausing or stopping some low-priority jobs, which are executed again once the load falls below the threshold;
A metadata management submodule: metadata, as data describing data attributes, stores the definitions of data sources, conversion rules, and execution processes; this submodule manages the metadata involved in ETL jobs;
An interface management submodule, which performs interface definition, protocol adaptation, and data encapsulation for the database interface, log file interface, and stream data interface, and supports the HTTP, FTP, REST, and web service interface protocols.
An operation monitoring submodule, which monitors and collects machine hardware information, resource information, load information, cluster component states, and cluster operation information, and perceives and analyzes the operating condition of the cluster in real time.
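The timeout check performed by the job monitoring submodule could be as simple as the following sketch, which assumes that the start time of each running job is recorded in seconds. The function name and data layout are hypothetical.

```python
def find_timed_out(job_start_times, now, timeout_seconds):
    """Return the names of jobs that have run longer than timeout_seconds."""
    return [
        name
        for name, started in job_start_times.items()
        if now - started > timeout_seconds
    ]
```

The resulting list would drive the timeout reminder, after which the cause of the timeout is analyzed manually as described above.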
The beneficial effects of the invention are:
1. For massive structured, semi-structured, and unstructured heterogeneous data from different, dispersed systems, a suitable ETL tool is matched adaptively based on the interface adapter and the ETL tool engine.
2. A high-performance cloud ETL management device performs ETL job scheduling management and execution based on a distributed cluster and the MapReduce and Spark frameworks, achieving orderly management and efficient execution of complex big data ETL jobs.
3. Massive data is cached based on the distributed system HDFS and is cleaned and converted based on the MapReduce and Spark frameworks, achieving efficient extraction, centralized storage, and processing of data from different dispersed systems, which benefits data sharing and further research.
Brief description of the drawings
The accompanying drawings, which form a part of this application, are provided for further understanding of the application. The illustrative embodiments of the application and their descriptions are used to explain the application and do not unduly limit it.
Fig. 1 is a flow chart of the method of the invention;
Fig. 2 is a functional module connection diagram of the invention;
Fig. 3 is a flow chart of cloud ETL tool matching;
Fig. 4 is a flow chart of ETL job execution and scheduling management;
Fig. 5 is a specific application embodiment of the invention.
Detailed description of the embodiments
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the application. Unless otherwise indicated, all technical and scientific terms used herein have the same meanings as commonly understood by a person of ordinary skill in the technical field to which the application belongs.
It should be noted that the terms used herein are only for describing specific embodiments and are not intended to limit the exemplary embodiments according to the application. As used herein, unless the context clearly indicates otherwise, the singular form is also intended to include the plural form. In addition, it should be understood that when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of features, steps, operations, devices, components, and/or combinations thereof.
Referring to Fig. 1, the flow chart of the massive multi-source data ETL processing method supporting interface adaptation of the present invention includes the following steps:
Process 101: Set the data source and target database. Set the basic information of the data source and target database, including the database type, connection type, database IP, database name, port, user name, and password.
Process 102: Adaptively match the ETL tool. Adaptively match ETL tools such as Sqoop, Kettle, Flume, and Kafka to the different data sources, such as database data, log files, and stream data.
Process 103: Configure the ETL tool parameters. After the ETL tool has been matched, set initial parameter values such as the task parameters.
Process 104: Job flow design. Design the ETL job control flow according to the actual business logic, including the data extraction mode and the specific ETL job flow corresponding to each ETL tool.
Process 105: Job scheduling management. Configure and manage the job scheduling strategy, job dependencies, job priorities, job scheduling, and so on.
Process 106: Job execution. Responsible for executing the specific ETL jobs.
Process 107: ETL job monitoring management. Monitor the execution process and resource usage of ETL jobs.
Process 108: Distributed cache. Buffer the extracted data, where HDFS is responsible for underlying data storage, Hive for data management, and HBase for maintaining data changes and storing the data that is frequently read during the data conversion computation.
Process 109: Business rule formulation. Formulate the business rules for data cleaning and conversion according to the actual business logic.
Process 110: Data processing. Clean and convert the data according to the formulated business rules, where data cleaning covers gap filling, correction, and cleaning of the data, and data conversion covers inconsistency conversion, data granularity conversion, and standard conversion.
Process 111: Data quality check. Check the quality of the converted data, verify data anomalies caused by network interruption and similar causes, and check whether the quality of the converted data meets the standard of the target database.
Process 112: Data update and load. Load the data that passed the check into the target database and, according to the predefined data model, update the target data tables by timestamp, log table, full-table comparison, full-table delete-and-insert, or similar modes.
Process 113: System monitoring management. Monitor information such as machine hardware information and cluster running state, and manage metadata, the database interface, the log file interface, the stream data interface, and so on.
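The job dependencies configured during job scheduling management determine a valid execution order for the jobs. One way to derive such an order, shown here as an assumption rather than as the patent's own mechanism, is a topological sort using `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later). The job names in the usage note are hypothetical.

```python
from graphlib import TopologicalSorter


def execution_order(dependencies):
    """Return job names ordered so each job runs after the jobs it depends on.

    dependencies maps a job name to the set of jobs it depends on.
    A circular dependency raises graphlib.CycleError.
    """
    return list(TopologicalSorter(dependencies).static_order())
```

For a chain in which a load job depends on a transform job, which in turn depends on an extract job, the function returns the extract job first and the load job last.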
Referring to Fig. 2, the massive multi-source data ETL processing system supporting interface adaptation of the present invention includes a data extraction module, a data conversion module, a data loading module, and a monitoring management module.
The data extraction module consists of an interface adapter and an ETL tool engine. The interface adapter includes the data interfaces and the adaptation rule engine; it specifies the data source and target database, designs the extraction rules, and completes interface adaptation. The ETL tool engine integrates and manages ETL tools such as Sqoop, Flume, Kettle, and Kafka, and adaptively matches the corresponding ETL tool to different data according to the interface adaptation rules.
The data conversion module consists of a job scheduling management engine, a job execution engine, a distributed cache area, a business rule engine, and a data processing submodule.
The job scheduling management engine is responsible for the management and task scheduling of ETL jobs.
The job execution engine is responsible for executing the specific ETL jobs, realizing offline processing of ETL jobs based on the MapReduce computing framework and real-time processing of ETL jobs based on the Spark computing framework.
The distributed cache area buffers the extracted data, where HDFS is responsible for underlying data storage, Hive for data management, and HBase for maintaining data changes and storing data that is frequently accessed during the data conversion computation;
The business rule engine formulates the business rules for data cleaning and conversion according to the actual business logic;
The data processing submodule cleans and converts the data according to the actual business rules;
The data loading module consists of a quality check submodule and a load and update submodule; it checks the converted data and then loads and updates it into the target database.
The monitoring management module includes a job monitoring submodule, a load balancing submodule, a metadata management submodule, an interface management submodule, and an operation monitoring submodule.
The job monitoring submodule monitors the execution process of ETL jobs and handles abnormal job conditions;
The load balancing submodule assesses the resource load according to how jobs are executing and adjusts the load through load migration, so that resource utilization is continuously maximized;
The metadata management submodule manages metadata such as data source definitions, conversion rule definitions, and execution process definitions;
The interface management submodule performs interface definition, protocol adaptation, and data encapsulation for interfaces such as the database interface and the log file interface;
The operation monitoring submodule monitors and collects machine hardware information, resource information, load information, cluster component states, and cluster operation information, and perceives and analyzes the operating condition of the cluster in real time.
With reference to Fig. 3, the cloud ETL tool matching flow of the present invention includes the following steps:
Process 301: The user specifies the data source and the target library and sets the database information, including database type, connection type, database IP, database name, port, user name, and password.
Process 302: According to the types of the data source and the target library, an ETL tool is adaptively matched. For database data, go to process 303; for log file data, go to process 306; for stream data, go to process 307.
Process 303: For database data, if either the data source or the target library is HDFS, go to process 304; for other types, go to process 305.
Process 304: For data extraction between HDFS and other databases, the ETL tool Sqoop is adaptively matched.
Process 305: For data extraction between relational databases such as Oracle and MySQL, and between these and non-relational databases such as MongoDB, the ETL tool Kettle is adaptively matched.
Process 306: For log file data, the ETL tool Flume is matched. It collects log data of various types and formats from sources such as console, RPC (Thrift-RPC), text (file), tail (UNIX tail), syslog (the syslog logging system, which supports both TCP and UDP modes), and exec (command execution).
Process 307: For stream data, the ETL tool Kafka is matched to collect stream data with high real-time requirements.
Process 308: ETL tool matching is complete.
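The matching flow of Fig. 3 can be sketched as a small decision function. This is a minimal illustration only and is not taken from the patent: the names `SourceKind` and `match_etl_tool` are hypothetical, and the function returns the tool name as a string rather than configuring any tool.

```python
from enum import Enum


class SourceKind(Enum):
    """Hypothetical labels for the three data source categories of process 302."""
    DATABASE = "database"
    LOG_FILE = "log_file"
    STREAM = "stream"


def match_etl_tool(kind, source_type=None, target_type=None):
    """Return the ETL tool name for a source/target pair (processes 302-307)."""
    if kind is SourceKind.LOG_FILE:
        return "Flume"       # process 306
    if kind is SourceKind.STREAM:
        return "Kafka"       # process 307
    if kind is SourceKind.DATABASE:
        # Process 303: Sqoop when either side is HDFS, otherwise Kettle.
        sides = {str(source_type).lower(), str(target_type).lower()}
        if "hdfs" in sides:
            return "Sqoop"   # process 304
        return "Kettle"      # process 305
    raise ValueError(f"unsupported data source kind: {kind!r}")
```

For example, `match_etl_tool(SourceKind.DATABASE, "Oracle", "HDFS")` selects Sqoop, while an Oracle-to-MySQL extraction selects Kettle.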
With reference to Fig. 4, the ETL job execution and scheduling management flow of the present invention includes the following steps:
Process 401: Sort out the actual business logic according to the goals and tasks of the data extraction.
Process 402: Create the ETL job control flow. The execution flow of the ETL jobs is designed according to the actual business logic, including job dependencies, whether extraction is incremental, extraction frequency, and so on. Kettle supports rapid visual configuration, whereas ETL tools such as Sqoop, Flume, and Kafka require commands to be written and executed.
Process 403: Set the ETL job scheduling strategy, which includes time triggering, event triggering, and real-time processing.
Process 404: Set ETL job priorities. Priorities are configured according to the actual business logic so that high-priority jobs are scheduled first when resources are insufficient.
Process 405: Set the ETL job scheduling control strategy. A resource warning threshold for job scheduling is set; when resource usage exceeds the threshold, low-priority ETL jobs are paused.
Process 406: Select the job execution mode, such as local execution, remote execution, or cluster execution.
Process 407: Execute the jobs. ETL job execution is started according to the job configuration parameters. Sqoop starts a MapReduce job that has only map tasks and reads data row by row according to the data split value; Kettle creates a Transformation (conversion) and a Job (task) and, after the task parameters of each step are set, starts the workflow to extract data; Flume collects log data through its source component, buffers it in the channel component, and sends it to the destination through the sink component; Kafka collects stream data, which is then decomposed into a series of batch jobs and processed in real time using resilient distributed datasets (RDDs) in Spark.
Process 408: Monitor the ETL job execution state. When memory usage exceeds the threshold, some jobs are paused; they are executed again once resource usage falls below the threshold, which achieves load balancing.
Process 409: Check whether execution is complete; if all jobs have completed, the flow ends.
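The priority and threshold control of processes 404, 405, and 408 can be illustrated with a short sketch. It is an assumption-laden simplification, not the patented scheduler: `EtlJob`, `schedule`, the abstract `memory` units, and the default threshold of 0.8 are all hypothetical, and jobs are admitted greedily in priority order until the warning threshold would be exceeded.

```python
from typing import NamedTuple


class EtlJob(NamedTuple):
    """Hypothetical job record; a larger priority value means more important."""
    name: str
    priority: int
    memory: int  # abstract resource units, an assumption for illustration


def schedule(jobs, capacity, threshold=0.8):
    """Split jobs into (running, paused) under a resource warning threshold."""
    limit = capacity * threshold
    running, paused, used = [], [], 0
    # Highest priority first; ties keep their submission order.
    for job in sorted(jobs, key=lambda j: j.priority, reverse=True):
        if used + job.memory <= limit:
            running.append(job.name)
            used += job.memory
        else:
            # Admitting this job would exceed the threshold, so it is paused
            # until resources are released (process 408).
            paused.append(job.name)
    return running, paused
```

Calling `schedule` again after running jobs finish, with the paused jobs as input, corresponds to resuming jobs once usage falls back below the threshold.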
With reference to Fig. 5, an embodiment of the present invention includes modules such as the data source, the data extraction module, the data conversion module, and the data cloud platform.
The power information big data cloud platform extracts multiple types of data, such as user profile data, acquisition data, statistical query data, management data, communication data, and monitoring data, from peripheral systems dispersed across different locations, including the power information acquisition system, the sales service application system, the production management platform, the electricity-water-gas-heat multi-in-one system, and the front-end communication platform. The data is stored and analyzed in a unified manner to provide data support for in-depth research on power information big data. The platform also collects stream data, such as server monitoring information, in real time to ensure its stable operation.
Data source: includes the database data of the peripheral systems and the stream data of the platform servers. The databases of peripheral systems such as the power information acquisition system are mostly Oracle databases and the HDFS distributed file system, and hold structured basic data as well as semi-structured and unstructured data such as documents, pictures, audio, and video. Because HDFS offers distributed storage and high access efficiency, most power information big data in the peripheral systems is stored in HDFS, covering structured data such as most basic profile data, acquisition data, statistical query data, and monitoring data, together with semi-structured and unstructured data such as documents, pictures, audio, and video. The Oracle databases store part of the basic profile data and the statistical analysis data that is used less frequently. Given the differences in access efficiency and in the types of data stored in HDFS and the Oracle databases, most of the structured data and all of the semi-structured and unstructured data are extracted from HDFS, while the part of the basic profile data and the statistical analysis data that HDFS does not hold are extracted from the Oracle databases. The details are shown in Table 1:
Table 1: Data source information of the power information big data cloud platform
The data extraction module consists of a database interface, an adaptation engine, and an ETL tool engine.
The database interface provides database interfaces for the Oracle databases and HDFS of peripheral systems such as the power information acquisition system, and a stream data interface for the stream data extracted from the target platform servers;
The adaptation engine provides ETL tool adaptation rules for power information big data, including rules for determining the data source type, rules for determining the database type, and rules for matching ETL tools.
The ETL tool engine matches the corresponding ETL tool to the different types of data from different data sources according to the adaptation rules. It first determines the type of the data source; for real-time acquisition of platform server stream data, it matches Kafka;
For data from a peripheral system database, it further determines the database type. For the Oracle databases in peripheral systems such as the power information acquisition system, the sales service application system, the production management platform, and the electricity-water-gas-heat multi-in-one system, it matches the ETL tool Kettle; for HDFS in peripheral systems such as the power information acquisition system, the electricity-water-gas-heat multi-in-one system, and the front-end communication platform, it matches the ETL tool Sqoop. After the ETL tool is matched, initial parameter values such as task parameters, database connection, user name, password, and permissions are set.
The data conversion module is implemented on a distributed cluster and consists of a job control engine, a job scheduling engine, a job execution engine, and a data processing and loading module.
The job control engine in the data conversion module designs the ETL job control flow according to the actual business logic of the power information big data cloud platform. The way the flow is designed differs between ETL tools, as shown in Table 2:
Table 2: ETL job control flow settings
The job scheduling engine in the data conversion module handles job priority configuration, job scheduling, and monitoring for power information big data extraction tasks. The scheduling strategies are shown in Table 3:
Table 3: ETL job scheduling strategies
The job execution engine in the data conversion module executes the ETL extraction jobs according to the established power information big data ETL job control flow. Sqoop starts a MapReduce job that has only map tasks, splits the data according to the data split parameter value, and assigns the resulting ranges to different map tasks; each map task reads data from HDFS row by row according to the metadata information stored in the data container. Data extraction based on Kettle is implemented mainly by creating a Transformation (conversion) and a Job (task). In a Transformation, each step is added to the main window through visual design and the steps are connected to one another; the resulting file has the extension .ktr. A Job executes the designed conversion tasks based on a workflow model; its file extension is .kjb. Stream data batch jobs are processed with Spark Streaming: the stream data is first decomposed at fixed time intervals into a series of short batch jobs, the batch jobs are then converted into resilient distributed datasets (RDDs) in Spark, and the data is processed through the RDDs, including map, filter, join, and group by operations.
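The range splitting described for the map-only job can be illustrated as follows. This is a simplified sketch and does not reproduce Sqoop's own splitter: it assumes the split column is an integer key whose minimum and maximum are already known, and the function name `split_ranges` is hypothetical.

```python
def split_ranges(low, high, num_maps):
    """Split the inclusive integer range [low, high] into contiguous chunks,
    one per map task, with sizes differing by at most one."""
    if num_maps < 1 or high < low:
        raise ValueError("need num_maps >= 1 and low <= high")
    total = high - low + 1
    num_maps = min(num_maps, total)  # never create empty ranges
    base, extra = divmod(total, num_maps)
    ranges, start = [], low
    for i in range(num_maps):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size - 1))
        start += size
    return ranges
```

Each resulting `(start, end)` pair would then bound the rows read by one map task, so the map tasks can run in parallel without overlapping.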
The data processing and loading module is implemented on the distributed computing frameworks MapReduce and Spark and includes a data cache module, a data cleaning module, a data conversion module, and a data loading module.
The data cache module in the data processing and loading module consists of HDFS, Hive, and HBase and caches the extracted power information big data: HDFS is responsible for data storage, Hive for data management, and HBase for data change maintenance.
The data cleaning module in the data processing and loading module performs gap filling, correction, and cleaning of power information big data. Gap filling completes the missing and mismatched information in incomplete data; data correction fixes, according to the specific business, erroneous data that was written to the database directly because no logical check was performed on receipt; data cleaning designs cleaning rules for dirty data and determines its correctness.
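The three cleaning stages can be sketched on plain dictionaries. This is an illustrative assumption rather than the patent's implementation: `clean_records` is a hypothetical name, and the defaults, the correction function, and the validity rule are supplied by the caller in place of real business rules.

```python
def clean_records(records, defaults, correct, is_valid):
    """Fill gaps, apply corrections, then separate clean rows from dirty rows.

    records  -- iterable of dicts; a value of None marks a missing field
    defaults -- dict of fill-in values for missing fields (gap filling)
    correct  -- function mapping a filled record to a corrected record
    is_valid -- function returning True when a record passes the cleaning rule
    """
    cleaned, dirty = [], []
    for record in records:
        present = {k: v for k, v in record.items() if v is not None}
        filled = {**defaults, **present}    # gap filling
        fixed = correct(filled)             # data correction
        (cleaned if is_valid(fixed) else dirty).append(fixed)  # data cleaning
    return cleaned, dirty
```

Rows that fail the rule are returned separately rather than discarded, so they can be reviewed or logged before any decision is made about them.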
The data conversion module in the data processing and loading module performs inconsistency conversion, data granularity conversion, and standard conversion of power information big data. Inconsistency conversion integrates data according to the characteristics of power information big data and the business rules, unifying data of the same type that comes from different business systems; data granularity conversion aggregates business system data according to the granularity of the target data warehouse; data standard conversion converts the extracted data into the standard data required by the platform, according to the standard data model formulated for the power information big data cloud platform. Data conversion tasks are implemented on MapReduce and Spark: offline computation over massive power consumption big data is implemented on the MapReduce framework, and real-time computation over big data is implemented on the Spark parallel computing architecture.
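Two of these conversions can be illustrated with a short single-process sketch; a real deployment would run equivalent logic on MapReduce or Spark. Everything named here is hypothetical: the alias table `FIELD_ALIASES`, the field names, and the assumption that timestamps are strings beginning with `YYYY-MM-DD`.

```python
from collections import defaultdict

# Hypothetical mapping from system-specific field names to standard names.
FIELD_ALIASES = {"meterNo": "meter_id", "METER_ID": "meter_id", "energy": "kwh"}


def unify(record):
    """Inconsistency conversion: rename fields to one standard naming scheme."""
    return {FIELD_ALIASES.get(key, key): value for key, value in record.items()}


def to_daily_totals(readings):
    """Granularity conversion: aggregate (meter_id, timestamp, kwh) readings
    into one total per meter per day."""
    totals = defaultdict(float)
    for meter_id, timestamp, kwh in readings:
        day = timestamp[:10]  # keep only the YYYY-MM-DD part
        totals[(meter_id, day)] += kwh
    return dict(totals)
```

Here `unify` brings records from different business systems to the same field names, and `to_daily_totals` reduces finer-grained readings to the daily granularity that a target warehouse might require.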
The data loading module in the data processing and loading module loads the cleaned and converted data into the power information big data cloud platform.
The data cloud platform is the target library for data loading. It uses the HDFS distributed file system to uniformly store the different types of massive power information data extracted from the dispersed data sources, and is divided by data type into a production library and an analysis library.
The power information big data cloud platform uses the cloud ETL management device to integrate and effectively manage ETL tools such as Sqoop, Kettle, and Kafka. It adaptively matches the corresponding ETL tool to the Oracle databases and HDFS of the dispersed peripheral systems, such as the power information acquisition system and the front-end communication platform, and to the server stream data. Based on the distributed computing framework MapReduce and the parallel computing framework Spark, it completes the extraction and conversion of massive structured, semi-structured, and unstructured data. This enables efficient execution and orderly management of ETL jobs and centralized storage and processing of massive power information big data, which facilitates data sharing and in-depth research based on the mass data in the power information big data cloud platform.
The foregoing describes merely preferred embodiments of the present application and is not intended to limit the present application. For those skilled in the art, various modifications and changes may be made to the present application. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within its scope of protection.

Claims (10)

1. A massive multi-source ETL processing method supporting interface adaptation, characterized by comprising:
A data extraction step: setting the basic information of the data source and the target database, adaptively matching the corresponding ETL tool to each data source, and setting the parameters of the ETL tool; extracting from the different data sources through a database interface, a log file interface, or a stream data interface;
A data conversion step: completing ETL job control, execution, and scheduling management based on the MapReduce and Spark computing frameworks, buffering and managing the extracted data based on HDFS, Hive, or HBase, and completing the cleaning and conversion of the data;
A data loading step: performing a quality check on the converted data objects, outputting them according to the table structure defined by the data model, and updating and loading the data that passes the check into the target database;
A monitoring management step: monitoring and managing the ETL job execution process, job resource usage, and system running status.
2. The massive multi-source ETL processing method supporting interface adaptation according to claim 1, characterized in that the data extraction step comprises:
A sub-step of setting the data source and the target library: setting the basic information of the data source and the target database, including the database type, the connection type between the data source and the target database, the database IP, the database name, the port, the user name, and the password;
A sub-step of adaptively matching the ETL tool: adaptively matching the corresponding ETL tool to each data source;
In the sub-step of adaptively matching the ETL tool, if the data source or the target database holds database data, the ETL tool Sqoop is adaptively matched when either side is the non-relational database HDFS, and the ETL tool Kettle is adaptively matched otherwise; if the data source is a log file, the ETL tool Flume is adaptively matched; if the data source is stream data, the ETL tool Kafka is adaptively matched;
An ETL tool parameter configuration sub-step: setting the task parameters after the ETL tool is matched.
3. The massive multi-source ETL processing method supporting interface adaptation according to claim 1, characterized in that the data conversion step comprises:
A job flow design sub-step: designing the job control flow according to the actual business logic, including the extraction mode and the ETL business flow;
A job scheduling management sub-step, including job scheduling strategy, job dependency control, job priority configuration, and job scheduling control, wherein the job scheduling strategy includes time triggering, event triggering, and real-time processing; job dependency control means defining the dependencies between jobs according to the actual business logic; job priority configuration means setting job priorities according to the actual business logic and system resource usage; job scheduling control means setting a resource warning threshold for job scheduling and pausing low-priority jobs when resource usage exceeds the threshold;
A job execution sub-step: executing the ETL jobs;
In the job execution sub-step, Sqoop starts a MapReduce job that has only map tasks and reads data row by row according to the data split value; Kettle creates a Transformation (conversion) and a Job (task) and, after the task parameters of each step are set, starts the workflow to extract data; Flume collects log data through its source component, buffers it in the channel component, and sends it to the destination through the sink component; Kafka collects stream data, which is then decomposed into a series of batch jobs and processed in real time using resilient distributed datasets in Spark;
A distributed cache sub-step: buffering the extracted data, wherein HDFS is responsible for storing the underlying data, Hive is responsible for filtering, summarizing, querying, and analyzing the data, and HBase is responsible for maintaining data changes and stores the data that is read frequently during data conversion computation;
A business rule formulation sub-step: formulating the business rules for data cleaning and conversion according to the actual business rules;
A data processing sub-step: completing the cleaning and conversion of the data according to the formulated business rules, wherein data cleaning performs gap filling, correction, and cleaning of the data, and data conversion performs inconsistency conversion, data granularity conversion, and standard conversion of the data.
4. The massive multi-source ETL processing method supporting interface adaptation according to claim 1, characterized in that the data loading step comprises:
A data quality check sub-step: performing a quality check on the converted data objects, verifying data anomalies caused by network interruption, and checking whether the quality of the converted data meets the standard of the target database;
A data update and loading sub-step: loading the data that passes the check into the target database and, according to the predefined data model, updating the target data tables by means of timestamps, log tables, full-table comparison, or full-table deletion and insertion.
5. The massive multi-source ETL processing method supporting interface adaptation according to claim 1, characterized in that the monitoring management step comprises:
A job monitoring management sub-step: monitoring the execution process and resource usage of the ETL jobs;
The ETL job execution process monitoring sub-step monitors information including job execution time, job progress, timeouts, job interruption, and job backlog; the job execution time is monitored, timeout reminders are set, and job timeout problems are analyzed by manual judgment; the job execution log information is monitored, and when a job is interrupted, job execution is retriggered according to the established interruption recovery mechanism; the job progress is monitored, and when jobs back up, they are queued by job priority and higher-priority jobs are executed first;
The ETL job resource monitoring sub-step monitors the usage of job resources; if the resource load exceeds the threshold, the load is adjusted by pausing or stopping some low-priority jobs, which are executed again after the load falls below the threshold;
A system monitoring management sub-step: monitoring machine hardware information and cluster running state information, and managing the metadata and the database interface, log file interface, or stream data interface.
6. A massive multi-source ETL processing system supporting interface adaptation, characterized by comprising:
A data extraction module, which sets the basic information of the data source and the target database, adaptively matches the corresponding ETL tool to each data source, and sets the parameters of the ETL tool; it extracts from the different data sources through a database interface, a log file interface, or a stream data interface;
A data conversion module, which completes ETL job execution and scheduling management based on the MapReduce and Spark computing frameworks, buffers and manages the extracted data based on HDFS, Hive, or HBase, and completes the cleaning and conversion of the data;
A data loading module, which performs a quality check on the converted data objects, outputs them according to the table structure defined by the data model, and updates and loads the data that passes the check into the target database;
A monitoring management module, which monitors and manages the ETL job execution process, job resource usage, and system running status.
7. The massive multi-source ETL processing system supporting interface adaptation according to claim 6, characterized in that
the data extraction module comprises:
An interface adapter, which provides data interfaces for the different data types of database data, log files, and stream data, and formulates the ETL tool adaptation rules; the interface adapter further includes an adaptation rule engine;
The data interfaces include a database interface, a log file interface, and a stream data interface, wherein database data is extracted from relational or non-relational databases through the database interface, log files are extracted through the log file interface, and stream data is extracted through the stream data interface;
The adaptation rule engine is used to set the basic information of the data source and the target database, including the database type, the connection type between the data source and the target database, the database IP, the database name, the port, the user name, and the password; the adaptation rules include the rules for determining the data source and target database types and the adaptation rules for the different ETL tools;
An ETL tool engine, which integrates and manages the ETL tools, including Sqoop, Flume, Kettle, and Kafka, and adaptively matches a suitable ETL tool to the heterogeneous data (database data, log files, and stream data) from different data sources;
In the ETL tool engine, the ETL adaptation engine selects an ETL tool for each data source according to the rules for determining the data source and target database and the adaptation rules for the different ETL tools formulated in the adaptation rule engine of the interface adapter.
8. The massive multi-source ETL processing system supporting interface adaptation according to claim 6, characterized in that
the data conversion module comprises:
A job scheduling management engine, which uses the master node of the distributed cluster as the scheduling management engine and includes a job management unit and a task scheduling unit;
The job management unit covers ETL job design, job configuration, and job monitoring, wherein ETL job design means designing the job control flow, including job dependencies, whether extraction is incremental, and extraction frequency, according to the actual business logic; job configuration means configuring parameters such as job priority and job execution mode; job monitoring means monitoring and managing the execution state and resource usage of the ETL jobs;
The task scheduling unit covers job scheduling strategy and job scheduling control, wherein the job scheduling strategy includes time triggering, event triggering, and real-time processing; job scheduling control means setting a resource warning threshold for job scheduling and pausing low-priority jobs when resource usage exceeds the threshold;
A job execution engine, which executes the ETL jobs using the slave nodes of the distributed cluster as the execution engine, implements offline processing of ETL jobs on the MapReduce computing framework, and implements real-time processing of ETL jobs on the Spark computing framework;
In the job execution engine, Sqoop starts a MapReduce job that has only map tasks and reads data row by row according to the data split value; Kettle creates a Transformation (conversion) and a Job (task) and, after the task parameters of each step are set, starts the workflow to extract data; Flume collects log data through its source component, buffers it in the channel component, and sends it to the destination through the sink component; Kafka collects stream data, which is then decomposed into a series of batch jobs and processed in real time using resilient distributed datasets in Spark;
A distributed cache submodule, which buffers and manages the extracted data, wherein HDFS is responsible for storing the underlying data; Hive is responsible for filtering, summarizing, querying, or analyzing the data; and HBase is responsible for maintaining data changes and stores the data that is read frequently during data conversion computation;
A business rule engine, which, according to the actual business rules, formulates cleaning rules for incomplete data, erroneous data, and dirty data and formulates conversion rules for heterogeneous data from different business systems;
A data processing submodule, including a data cleaning unit and a data conversion unit, which completes the cleaning and conversion of the data according to the actual business rules;
The data cleaning unit performs gap filling, correction, and cleaning of the data, wherein gap filling completes the missing and mismatched information in incomplete data; data correction fixes erroneous data according to the specific business; data cleaning designs cleaning rules for dirty data and determines its correctness;
The data conversion unit converts the massive heterogeneous data from different business systems into the data required by the target database, wherein inconsistency conversion unifies data of the same type that comes from different business systems; data granularity conversion aggregates the extracted business system data according to the granularity of the target database; data standard conversion converts the extracted data into the standard data required by the target library according to the pre-established standard data model.
9. The massive multi-source ETL processing system supporting interface adaptation according to claim 6, characterized in that
the data loading module comprises:
A quality check submodule, which performs a quality check on the converted data objects, verifies data anomalies caused by network interruption, and checks whether the quality of the converted data meets the standard of the target database;
A data update and loading submodule, which loads the data that passes the check into the target database and, according to the predefined data model, updates the target data tables by means of timestamps, log tables, full-table comparison, or full-table deletion and insertion.
10. The massive multi-source ETL processing system supporting interface adaptation according to claim 6, characterized in that
the monitoring management module comprises:
A job monitoring submodule, which monitors the execution process of the ETL jobs, including job execution time, timeouts, job interruption, job backlog, job progress, and job resource usage;
A load balancing submodule, which assesses the resource load according to the execution status of the ETL jobs and, when the load exceeds the threshold, adjusts the load by pausing or stopping some low-priority jobs, which are executed again after the load falls below the threshold;
A metadata management submodule, wherein metadata, as the data describing data attribute information, stores the data source definitions, conversion rule definitions, and execution process definitions; the submodule manages the metadata involved in the ETL jobs;
An interface management submodule, which performs interface definition, protocol adaptation, and data encapsulation for the database interface, log file interface, and stream data interface, and supports the HTTP, FTP, REST, and WebService interface protocols;
A runtime monitoring submodule, which monitors and collects machine hardware information, resource information, load information, cluster component status, and cluster operation information, and perceives and analyzes the running state of the cluster in real time.
CN201810588231.7A 2018-06-08 2018-06-08 The massive multi-source ETL process method and system of supporting interface adaptation Pending CN108846076A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810588231.7A CN108846076A (en) 2018-06-08 2018-06-08 The massive multi-source ETL process method and system of supporting interface adaptation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810588231.7A CN108846076A (en) 2018-06-08 2018-06-08 The massive multi-source ETL process method and system of supporting interface adaptation

Publications (1)

Publication Number Publication Date
CN108846076A true CN108846076A (en) 2018-11-20

Family

ID=64210656

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810588231.7A Pending CN108846076A (en) 2018-06-08 2018-06-08 The massive multi-source ETL process method and system of supporting interface adaptation

Country Status (1)

Country Link
CN (1) CN108846076A (en)

Cited By (107)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109246254A (en) * 2018-11-29 2019-01-18 国网重庆市电力公司 The data acquisition communications platform and communication means for supporting large-scale electric energy table directly to adopt
CN109558400A (en) * 2018-11-28 2019-04-02 北京锐安科技有限公司 Data processing method, device, equipment and storage medium
CN109656963A (en) * 2018-12-18 2019-04-19 深圳前海微众银行股份有限公司 Metadata acquisition methods, device, equipment and computer readable storage medium
CN109669977A (en) * 2018-11-30 2019-04-23 金蝶软件(中国)有限公司 Data cut-in method, device, computer equipment and the storage medium of integration across database
CN109684399A (en) * 2018-12-24 2019-04-26 成都四方伟业软件股份有限公司 Data bank access method, database access device and Data Analysis Platform
CN109697215A (en) * 2018-12-14 2019-04-30 安徽同徽网络技术有限公司 Collecting method, data collection system and nonvolatile computer storage media
CN109739851A (en) * 2019-01-21 2019-05-10 广东创能科技股份有限公司 Floating population's big data multi-source acquisition method and system
CN109753502A (en) * 2018-12-29 2019-05-14 山东浪潮商用系统有限公司 A kind of collecting method based on NiFi
CN109783314A (en) * 2018-12-26 2019-05-21 广州裕鼎信息科技有限公司 Information technoloy equipment method for managing and monitoring and server
CN109800220A (en) * 2019-01-29 2019-05-24 浙江国贸云商企业服务有限公司 A kind of big data cleaning method, system and relevant apparatus
CN109857792A (en) * 2018-12-24 2019-06-07 中译语通科技股份有限公司 A kind of method and system of asynchronous big data cleaning conversion
CN110032570A (en) * 2019-04-01 2019-07-19 江西世恒信息产业有限公司 A kind of spatial data dynamic update system based on B/S framework
CN110119422A (en) * 2019-05-16 2019-08-13 武汉神算云信息科技有限责任公司 Small and micro credit tenant data warehouse data processing system and equipment
CN110147356A (en) * 2019-05-14 2019-08-20 厦门欢乐逛科技股份有限公司 Data transmission method and device
CN110196876A (en) * 2019-06-05 2019-09-03 浪潮软件股份有限公司 Web-based method for managing and scheduling the standalone Kettle tool
CN110262945A (en) * 2019-06-25 2019-09-20 苏宁消费金融有限公司 A kind of method of intelligent monitoring data warehouse scheduling system
CN110347741A (en) * 2019-07-18 2019-10-18 普元信息技术股份有限公司 System for effectively improving output result data quality in big data processing process and control method thereof
CN110413404A (en) * 2019-06-18 2019-11-05 平安科技(深圳)有限公司 Priority-based resource allocation method, device, equipment and storage medium
CN110471968A (en) * 2019-07-11 2019-11-19 新华三大数据技术有限公司 ETL task publishing method, device, equipment and storage medium
CN110502491A (en) * 2019-07-25 2019-11-26 北京神州泰岳智能数据技术有限公司 A kind of Log Collect System and its data transmission method, device
CN110569298A (en) * 2019-09-12 2019-12-13 成都中科大旗软件股份有限公司 data docking and visualization method and system
CN110597798A (en) * 2019-09-17 2019-12-20 山东爱城市网信息技术有限公司 Data detection method based on Thrift
CN110636116A (en) * 2019-08-29 2019-12-31 武汉烽火众智数字技术有限责任公司 Multidimensional data acquisition system and method
CN110647570A (en) * 2019-09-20 2020-01-03 百度在线网络技术(北京)有限公司 Data processing method and device and electronic equipment
CN110704502A (en) * 2019-11-20 2020-01-17 中电万维信息技术有限责任公司 Componentized data quality checking method
CN110716774A (en) * 2019-08-22 2020-01-21 华信永道(北京)科技股份有限公司 Data driving method, system and storage medium for brain of financial business data
CN110781248A (en) * 2019-09-27 2020-02-11 浙江省北大信息技术高等研究院 Multi-source heterogeneous data acquisition method and device
CN110880146A (en) * 2019-11-21 2020-03-13 上海中信信息发展股份有限公司 Block chain chaining method, device, electronic equipment and storage medium
CN110990391A (en) * 2019-12-04 2020-04-10 中山市凯能集团有限公司 Integration method and system of multi-source heterogeneous data, computer equipment and storage medium
CN110990368A (en) * 2019-11-29 2020-04-10 广西电网有限责任公司 Full-link data management system and management method thereof
CN110990390A (en) * 2019-12-02 2020-04-10 东莞中国科学院云计算产业技术创新与育成中心 Data cooperative processing method and device, computer equipment and storage medium
CN111061715A (en) * 2019-12-16 2020-04-24 北京邮电大学 Web and Kafka-based distributed data integration system and method
CN111061788A (en) * 2019-11-26 2020-04-24 江苏瑞中数据股份有限公司 Multi-source heterogeneous data conversion integration system based on cloud architecture and implementation method thereof
CN111104214A (en) * 2019-12-26 2020-05-05 北京九章云极科技有限公司 Workflow application method and device
CN111125209A (en) * 2019-11-25 2020-05-08 集奥聚合(北京)人工智能科技有限公司 Access configuration system supporting multi-element heterogeneous type data
CN111125230A (en) * 2019-12-30 2020-05-08 中电工业互联网有限公司 Data processing method and system of Internet of things platform based on rule engine
CN111124679A (en) * 2019-12-19 2020-05-08 南京莱斯信息技术股份有限公司 Time-limited automatic processing method for multi-source heterogeneous mass data
CN111142942A (en) * 2019-12-26 2020-05-12 远景智能国际私人投资有限公司 Window data processing method and device, server and storage medium
CN111159268A (en) * 2019-12-19 2020-05-15 武汉达梦数据库有限公司 Method and device for running ETL (extract-transform-load) process in Spark cluster
CN111324688A (en) * 2020-02-24 2020-06-23 南京莱斯网信技术研究院有限公司 Semi-structured data and unstructured data acquisition system based on events
CN111400288A (en) * 2019-01-02 2020-07-10 中国移动通信有限公司研究院 Data quality inspection method and system
CN111460019A (en) * 2020-04-02 2020-07-28 中电工业互联网有限公司 Data conversion method and middleware of heterogeneous data source
CN111460772A (en) * 2020-02-28 2020-07-28 上海维信荟智金融科技有限公司 Automatic report processing method and system
CN111506638A (en) * 2020-03-03 2020-08-07 浙江大学 Method for automatically collecting supervision data
CN111581254A (en) * 2020-05-03 2020-08-25 上海维信荟智金融科技有限公司 ETL method and system based on internet financial data
CN111639083A (en) * 2020-04-10 2020-09-08 新智云数据服务有限公司 Management system of unified database management method
CN111666324A (en) * 2020-05-18 2020-09-15 新浪网技术(中国)有限公司 ETL scheduling method and device between relational databases
CN111721355A (en) * 2020-05-14 2020-09-29 中铁第一勘察设计院集团有限公司 Railway contact net monitoring data acquisition system
CN111737242A (en) * 2020-06-19 2020-10-02 福建南威软件有限公司 Method for monitoring mass data processing process
CN111881154A (en) * 2020-07-29 2020-11-03 北京浪潮数据技术有限公司 ETL task processing method, device and related equipment
CN111882203A (en) * 2020-07-24 2020-11-03 山东管理学院 Traditional Chinese medicine cloud service experimental system
CN111897865A (en) * 2020-08-13 2020-11-06 工银科技有限公司 Dynamic adjustment method and device for ETL (extract transform load) working load
CN111966394A (en) * 2020-08-28 2020-11-20 珠海格力电器股份有限公司 ETL-based data analysis method, device, equipment and storage medium
CN112015724A (en) * 2019-09-25 2020-12-01 国网湖北省电力有限公司黄石供电公司 Method for analyzing metering abnormality of electric power operation data
CN112035468A (en) * 2020-08-24 2020-12-04 杭州览众数据科技有限公司 Multi-data-source ETL tool based on memory calculation and web visual configuration
CN112052284A (en) * 2020-08-26 2020-12-08 南京越扬科技有限公司 Main data management method and system under big data
CN112100227A (en) * 2020-09-22 2020-12-18 国网辽宁省电力有限公司电力科学研究院 Big data processing method based on multilevel heterogeneous data storage
CN112162754A (en) * 2020-10-19 2021-01-01 科技谷(厦门)信息技术有限公司 Multi-source heterogeneous data processing system
CN112164430A (en) * 2020-10-12 2021-01-01 深圳晶泰科技有限公司 Data processing method and system for drug research and development
CN112181959A (en) * 2020-09-15 2021-01-05 山东特检鲁安工程技术服务有限公司 Special equipment multi-source data processing platform and processing method
CN112307103A (en) * 2020-10-30 2021-02-02 山东浪潮通软信息科技有限公司 Big data rendering method and device and computer readable medium
CN112434016A (en) * 2020-12-11 2021-03-02 上海中通吉网络技术有限公司 Universal billion-level data heterogeneous migration method, device and equipment
CN112433998A (en) * 2020-11-20 2021-03-02 广东电网有限责任公司佛山供电局 Multisource heterogeneous data acquisition and convergence system and method based on power system
CN112434102A (en) * 2020-11-25 2021-03-02 深圳前海微众银行股份有限公司 Data visualization system for multiple data sources
CN112527885A (en) * 2020-12-23 2021-03-19 民生科技有限责任公司 System and method for data processing based on rule configuration in ETL
CN112527783A (en) * 2020-11-27 2021-03-19 中科曙光南京研究院有限公司 Data quality probing system based on Hadoop
CN112565042A (en) * 2020-12-24 2021-03-26 航天科工网络信息发展有限公司 Method for exchanging star-structured data
CN112579676A (en) * 2019-09-30 2021-03-30 北京国双科技有限公司 Data processing method and device between heterogeneous systems, storage medium and equipment
CN112597203A (en) * 2020-12-28 2021-04-02 恩亿科(北京)数据科技有限公司 General data monitoring method and system based on big data platform
CN112632177A (en) * 2020-12-31 2021-04-09 中国农业银行股份有限公司 Data loading operation generation method
CN112667615A (en) * 2020-12-25 2021-04-16 广东电网有限责任公司电力科学研究院 Data cleaning system and method
CN112732828A (en) * 2020-12-22 2021-04-30 航天信息股份有限公司 Cross-platform data sharing method based on data warehouse tool
CN112765121A (en) * 2021-01-08 2021-05-07 北京虹信万达科技有限公司 Administration and application system based on big data service
CN112860776A (en) * 2021-01-20 2021-05-28 山东众阳健康科技集团有限公司 Method and system for extracting and scheduling various data
CN112860675A (en) * 2021-02-06 2021-05-28 高云 Big data processing method under online cloud service environment and cloud computing server
CN112905420A (en) * 2021-03-04 2021-06-04 广东电网有限责任公司 Data monitoring system, method, electronic device and storage medium
CN112925772A (en) * 2019-12-06 2021-06-08 北京沃东天骏信息技术有限公司 Data dynamic splitting method and device
CN112925767A (en) * 2021-03-03 2021-06-08 浪潮云信息技术股份公司 Multi-data-source dynamic data synchronization management method and system based on internet supervision
CN112966031A (en) * 2019-12-12 2021-06-15 北京奇艺世纪科技有限公司 Data processing method and device, electronic equipment and computer readable storage medium
CN113051329A (en) * 2021-04-12 2021-06-29 平安国际智慧城市科技股份有限公司 Interface-based data acquisition method, device, equipment and storage medium
CN113111107A (en) * 2021-04-06 2021-07-13 创意信息技术股份有限公司 Data comprehensive access system and method
CN113111109A (en) * 2021-04-06 2021-07-13 创意信息技术股份有限公司 Interface warehousing analysis access method of data source
CN113111105A (en) * 2021-04-06 2021-07-13 创意信息技术股份有限公司 Data customized access method and system based on big data
CN113111111A (en) * 2021-04-06 2021-07-13 创意信息技术股份有限公司 Multi-data source database access method
CN113177088A (en) * 2021-04-02 2021-07-27 北京科技大学 Multi-scale simulation big data management system for material irradiation damage
CN113407734A (en) * 2021-07-14 2021-09-17 重庆富民银行股份有限公司 Construction method of knowledge map system based on real-time big data
CN113407607A (en) * 2021-06-22 2021-09-17 中国联合网络通信集团有限公司 Multi-cloud heterogeneous data processing method and device and electronic equipment
CN113485894A (en) * 2021-07-14 2021-10-08 深信服科技股份有限公司 Data acquisition method, device and equipment and readable storage medium
CN113485747A (en) * 2021-07-08 2021-10-08 广州钛动科技有限公司 Data processing method, data processor, target source component and system
CN113506098A (en) * 2021-09-10 2021-10-15 国能信控互联技术有限公司 Power plant metadata management system and method based on multi-source data
CN113535835A (en) * 2021-07-12 2021-10-22 上海浦东发展银行股份有限公司 Data acquisition method, device, medium and equipment of kernel data processing software
CN114064777A (en) * 2021-11-19 2022-02-18 杭州雷数科技有限公司 Configurable method for acquiring data at fixed time, scheduling data, encrypting transmission and visualizing
CN114064643A (en) * 2021-11-11 2022-02-18 南京熊猫电子股份有限公司 Task type data conversion system based on Oracle
CN114168672A (en) * 2021-12-13 2022-03-11 明觉科技(北京)有限公司 Log data processing method, device, system and medium
WO2022077166A1 (en) * 2020-10-12 2022-04-21 深圳晶泰科技有限公司 Data processing method and system for drug research and development
CN114379608A (en) * 2021-12-13 2022-04-22 中铁南方投资集团有限公司 Multi-source heterogeneous data integration processing method for urban rail transit engineering
CN114817393A (en) * 2022-06-24 2022-07-29 深圳市信联征信有限公司 Data extraction and cleaning method and device and storage medium
CN114936245A (en) * 2022-04-28 2022-08-23 北京远舢智能科技有限公司 Method and device for integrating and processing multi-source heterogeneous data
CN115086303A (en) * 2022-06-29 2022-09-20 徐工汉云技术股份有限公司 Multi-data-source data repeater and design method thereof
CN115796457A (en) * 2023-02-03 2023-03-14 山东铁路投资控股集团有限公司 Personnel and enterprise rating method and system based on multidimensional data
CN116016032A (en) * 2023-01-06 2023-04-25 广西电子口岸有限公司 Customs service complex message packaging method
CN116775737A (en) * 2023-06-21 2023-09-19 上海腾道信息技术有限公司 Method and system for automatically generating ETL configuration
CN117271648A (en) * 2023-11-23 2023-12-22 北京邮电大学 Adaptation method of bottom data model and storage medium
CN117312103A (en) * 2023-11-30 2023-12-29 山东麦港数据系统有限公司 Hot-pluggable distributed heterogeneous data source data scheduling processing system
CN117539605A (en) * 2024-01-09 2024-02-09 无锡挚达物联科技有限公司 Data processing program assembling method, device, equipment and storage medium
CN117555586A (en) * 2024-01-11 2024-02-13 之江实验室 Algorithm application publishing, managing and scoring method
CN111966394B (en) * 2020-08-28 2024-05-31 珠海格力电器股份有限公司 ETL-based data analysis method, device, equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160292186A1 (en) * 2015-03-30 2016-10-06 International Business Machines Corporation Dynamically maintaining data structures driven by heterogeneous clients in a distributed data collection system
CN106339509A (en) * 2016-10-26 2017-01-18 国网山东省电力公司临沂供电公司 Power grid operation data sharing system based on large data technology
CN106611046A (en) * 2016-12-16 2017-05-03 武汉中地数码科技有限公司 Big data technology-based space data storage processing middleware framework
CN107402976A (en) * 2017-07-03 2017-11-28 国网山东省电力公司经济技术研究院 Power grid multi-source data fusion method and system based on multi-element heterogeneous model
CN107733986A (en) * 2017-09-15 2018-02-23 中国南方电网有限责任公司 Support the protection of integrated deployment and monitoring operation big data support platform

Cited By (140)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109558400A (en) * 2018-11-28 2019-04-02 北京锐安科技有限公司 Data processing method, device, equipment and storage medium
CN109558400B (en) * 2018-11-28 2021-04-27 北京锐安科技有限公司 Data processing method, device, equipment and storage medium
CN109246254A (en) * 2018-11-29 2019-01-18 国网重庆市电力公司 Data acquisition communication platform and communication method supporting direct collection from large-scale electric energy meters
CN109669977A (en) * 2018-11-30 2019-04-23 金蝶软件(中国)有限公司 Cross-database data access method, device, computer equipment and storage medium
CN109697215A (en) * 2018-12-14 2019-04-30 安徽同徽网络技术有限公司 Data collection method, data collection system and non-volatile computer storage medium
CN109656963A (en) * 2018-12-18 2019-04-19 深圳前海微众银行股份有限公司 Metadata acquisition methods, device, equipment and computer readable storage medium
CN109684399A (en) * 2018-12-24 2019-04-26 成都四方伟业软件股份有限公司 Data bank access method, database access device and Data Analysis Platform
CN109857792A (en) * 2018-12-24 2019-06-07 中译语通科技股份有限公司 A kind of method and system of asynchronous big data cleaning conversion
CN109783314A (en) * 2018-12-26 2019-05-21 广州裕鼎信息科技有限公司 Information technology equipment monitoring and management method, and server
CN109753502A (en) * 2018-12-29 2019-05-14 山东浪潮商用系统有限公司 Data acquisition method based on NiFi
CN109753502B (en) * 2018-12-29 2023-05-12 浪潮软件科技有限公司 Data acquisition method based on NiFi
CN111400288A (en) * 2019-01-02 2020-07-10 中国移动通信有限公司研究院 Data quality inspection method and system
CN109739851A (en) * 2019-01-21 2019-05-10 广东创能科技股份有限公司 Floating population's big data multi-source acquisition method and system
CN109800220A (en) * 2019-01-29 2019-05-24 浙江国贸云商企业服务有限公司 A kind of big data cleaning method, system and relevant apparatus
CN110032570A (en) * 2019-04-01 2019-07-19 江西世恒信息产业有限公司 A kind of spatial data dynamic update system based on B/S framework
CN110147356A (en) * 2019-05-14 2019-08-20 厦门欢乐逛科技股份有限公司 Data transmission method and device
CN110119422A (en) * 2019-05-16 2019-08-13 武汉神算云信息科技有限责任公司 Small and micro credit tenant data warehouse data processing system and equipment
CN110196876A (en) * 2019-06-05 2019-09-03 浪潮软件股份有限公司 Web-based method for managing and scheduling the standalone Kettle tool
CN110413404A (en) * 2019-06-18 2019-11-05 平安科技(深圳)有限公司 Priority-based resource allocation method, device, equipment and storage medium
CN110262945A (en) * 2019-06-25 2019-09-20 苏宁消费金融有限公司 A kind of method of intelligent monitoring data warehouse scheduling system
CN110471968A (en) * 2019-07-11 2019-11-19 新华三大数据技术有限公司 ETL task publishing method, device, equipment and storage medium
CN110347741A (en) * 2019-07-18 2019-10-18 普元信息技术股份有限公司 System for effectively improving output result data quality in big data processing process and control method thereof
CN110347741B (en) * 2019-07-18 2023-05-05 普元信息技术股份有限公司 System for effectively improving output result data quality in big data processing process and control method thereof
CN110502491A (en) * 2019-07-25 2019-11-26 北京神州泰岳智能数据技术有限公司 A kind of Log Collect System and its data transmission method, device
CN110716774A (en) * 2019-08-22 2020-01-21 华信永道(北京)科技股份有限公司 Data driving method, system and storage medium for brain of financial business data
CN110636116A (en) * 2019-08-29 2019-12-31 武汉烽火众智数字技术有限责任公司 Multidimensional data acquisition system and method
CN110636116B (en) * 2019-08-29 2022-05-10 武汉烽火众智数字技术有限责任公司 Multidimensional data acquisition system and method
CN110569298A (en) * 2019-09-12 2019-12-13 成都中科大旗软件股份有限公司 data docking and visualization method and system
CN110569298B (en) * 2019-09-12 2023-03-24 成都中科大旗软件股份有限公司 Data docking and visualization method and system
CN110597798A (en) * 2019-09-17 2019-12-20 山东爱城市网信息技术有限公司 Data detection method based on Thrift
CN110597798B (en) * 2019-09-17 2023-08-25 浪潮卓数大数据产业发展有限公司 Data detection method based on Thrift
CN110647570B (en) * 2019-09-20 2022-04-29 百度在线网络技术(北京)有限公司 Data processing method and device and electronic equipment
CN110647570A (en) * 2019-09-20 2020-01-03 百度在线网络技术(北京)有限公司 Data processing method and device and electronic equipment
CN112015724A (en) * 2019-09-25 2020-12-01 国网湖北省电力有限公司黄石供电公司 Method for analyzing metering abnormality of electric power operation data
CN110781248A (en) * 2019-09-27 2020-02-11 浙江省北大信息技术高等研究院 Multi-source heterogeneous data acquisition method and device
CN112579676A (en) * 2019-09-30 2021-03-30 北京国双科技有限公司 Data processing method and device between heterogeneous systems, storage medium and equipment
CN110704502A (en) * 2019-11-20 2020-01-17 中电万维信息技术有限责任公司 Componentized data quality checking method
CN110880146A (en) * 2019-11-21 2020-03-13 上海中信信息发展股份有限公司 Block chain chaining method, device, electronic equipment and storage medium
CN111125209A (en) * 2019-11-25 2020-05-08 集奥聚合(北京)人工智能科技有限公司 Access configuration system supporting multi-element heterogeneous type data
CN111061788B (en) * 2019-11-26 2023-10-13 江苏瑞中数据股份有限公司 Multi-source heterogeneous data conversion integration system based on cloud architecture and implementation method thereof
CN111061788A (en) * 2019-11-26 2020-04-24 江苏瑞中数据股份有限公司 Multi-source heterogeneous data conversion integration system based on cloud architecture and implementation method thereof
CN110990368A (en) * 2019-11-29 2020-04-10 广西电网有限责任公司 Full-link data management system and management method thereof
CN110990390A (en) * 2019-12-02 2020-04-10 东莞中国科学院云计算产业技术创新与育成中心 Data cooperative processing method and device, computer equipment and storage medium
CN110990390B (en) * 2019-12-02 2024-03-08 东莞中国科学院云计算产业技术创新与育成中心 Data cooperative processing method, device, computer equipment and storage medium
CN110990391A (en) * 2019-12-04 2020-04-10 中山市凯能集团有限公司 Integration method and system of multi-source heterogeneous data, computer equipment and storage medium
CN112925772A (en) * 2019-12-06 2021-06-08 北京沃东天骏信息技术有限公司 Data dynamic splitting method and device
CN112966031A (en) * 2019-12-12 2021-06-15 北京奇艺世纪科技有限公司 Data processing method and device, electronic equipment and computer readable storage medium
CN111061715B (en) * 2019-12-16 2022-07-01 北京邮电大学 Web and Kafka-based distributed data integration system and method
CN111061715A (en) * 2019-12-16 2020-04-24 北京邮电大学 Web and Kafka-based distributed data integration system and method
CN111159268A (en) * 2019-12-19 2020-05-15 武汉达梦数据库有限公司 Method and device for running ETL (extract-transform-load) process in Spark cluster
CN111124679B (en) * 2019-12-19 2023-11-21 南京莱斯信息技术股份有限公司 Multi-source heterogeneous mass data-oriented time-limited automatic processing method
CN111124679A (en) * 2019-12-19 2020-05-08 南京莱斯信息技术股份有限公司 Time-limited automatic processing method for multi-source heterogeneous mass data
CN111159268B (en) * 2019-12-19 2022-01-04 武汉达梦数据库股份有限公司 Method and device for running ETL (extract-transform-load) process in Spark cluster
CN111142942B (en) * 2019-12-26 2023-08-04 远景智能国际私人投资有限公司 Window data processing method and device, server and storage medium
CN111142942A (en) * 2019-12-26 2020-05-12 远景智能国际私人投资有限公司 Window data processing method and device, server and storage medium
CN111104214A (en) * 2019-12-26 2020-05-05 北京九章云极科技有限公司 Workflow application method and device
CN111125230A (en) * 2019-12-30 2020-05-08 中电工业互联网有限公司 Data processing method and system of Internet of things platform based on rule engine
CN111324688A (en) * 2020-02-24 2020-06-23 南京莱斯网信技术研究院有限公司 Semi-structured data and unstructured data acquisition system based on events
CN111460772A (en) * 2020-02-28 2020-07-28 上海维信荟智金融科技有限公司 Automatic report processing method and system
CN111506638A (en) * 2020-03-03 2020-08-07 浙江大学 Method for automatically collecting supervision data
CN111460019A (en) * 2020-04-02 2020-07-28 中电工业互联网有限公司 Data conversion method and middleware of heterogeneous data source
CN111639083A (en) * 2020-04-10 2020-09-08 新智云数据服务有限公司 Management system of unified database management method
CN111581254A (en) * 2020-05-03 2020-08-25 上海维信荟智金融科技有限公司 ETL method and system based on internet financial data
CN111721355A (en) * 2020-05-14 2020-09-29 中铁第一勘察设计院集团有限公司 Railway contact net monitoring data acquisition system
CN111666324A (en) * 2020-05-18 2020-09-15 新浪网技术(中国)有限公司 ETL scheduling method and device between relational databases
CN111666324B (en) * 2020-05-18 2023-06-27 新浪技术(中国)有限公司 ETL scheduling method and device between relational databases
CN111737242A (en) * 2020-06-19 2020-10-02 福建南威软件有限公司 Method for monitoring mass data processing process
CN111882203B (en) * 2020-07-24 2022-12-02 山东管理学院 Traditional Chinese medicine cloud service experimental system
CN111882203A (en) * 2020-07-24 2020-11-03 山东管理学院 Traditional Chinese medicine cloud service experimental system
CN111881154A (en) * 2020-07-29 2020-11-03 北京浪潮数据技术有限公司 ETL task processing method, device and related equipment
CN111897865A (en) * 2020-08-13 2020-11-06 工银科技有限公司 Dynamic adjustment method and device for ETL (extract transform load) working load
CN112035468A (en) * 2020-08-24 2020-12-04 杭州览众数据科技有限公司 Multi-data-source ETL tool based on memory calculation and web visual configuration
CN112052284A (en) * 2020-08-26 2020-12-08 南京越扬科技有限公司 Main data management method and system under big data
CN111966394B (en) * 2020-08-28 2024-05-31 珠海格力电器股份有限公司 ETL-based data analysis method, device, equipment and storage medium
CN111966394A (en) * 2020-08-28 2020-11-20 珠海格力电器股份有限公司 ETL-based data analysis method, device, equipment and storage medium
CN112181959A (en) * 2020-09-15 2021-01-05 山东特检鲁安工程技术服务有限公司 Special equipment multi-source data processing platform and processing method
CN112100227A (en) * 2020-09-22 2020-12-18 国网辽宁省电力有限公司电力科学研究院 Big data processing method based on multilevel heterogeneous data storage
CN112164430A (en) * 2020-10-12 2021-01-01 深圳晶泰科技有限公司 Data processing method and system for drug research and development
WO2022077166A1 (en) * 2020-10-12 2022-04-21 深圳晶泰科技有限公司 Data processing method and system for drug research and development
CN112164430B (en) * 2020-10-12 2024-05-31 深圳晶泰科技有限公司 Data processing method and system for drug development
CN112162754A (en) * 2020-10-19 2021-01-01 科技谷(厦门)信息技术有限公司 Multi-source heterogeneous data processing system
CN112307103A (en) * 2020-10-30 2021-02-02 山东浪潮通软信息科技有限公司 Big data rendering method and device and computer readable medium
CN112433998B (en) * 2020-11-20 2022-01-21 广东电网有限责任公司佛山供电局 Multisource heterogeneous data acquisition and convergence system and method based on power system
CN112433998A (en) * 2020-11-20 2021-03-02 广东电网有限责任公司佛山供电局 Multisource heterogeneous data acquisition and convergence system and method based on power system
CN112434102A (en) * 2020-11-25 2021-03-02 深圳前海微众银行股份有限公司 Data visualization system for multiple data sources
CN112527783A (en) * 2020-11-27 2021-03-19 中科曙光南京研究院有限公司 Data quality probing system based on Hadoop
CN112527783B (en) * 2020-11-27 2024-05-24 中科曙光南京研究院有限公司 Hadoop-based data quality exploration system
CN112434016A (en) * 2020-12-11 2021-03-02 上海中通吉网络技术有限公司 Universal billion-level data heterogeneous migration method, device and equipment
CN112732828A (en) * 2020-12-22 2021-04-30 航天信息股份有限公司 Cross-platform data sharing method based on data warehouse tool
CN112527885A (en) * 2020-12-23 2021-03-19 民生科技有限责任公司 System and method for data processing based on rule configuration in ETL
CN112565042A (en) * 2020-12-24 2021-03-26 航天科工网络信息发展有限公司 Method for exchanging star-structured data
CN112667615A (en) * 2020-12-25 2021-04-16 广东电网有限责任公司电力科学研究院 Data cleaning system and method
CN112667615B (en) * 2020-12-25 2022-02-15 广东电网有限责任公司电力科学研究院 Data cleaning system and method
CN112597203A (en) * 2020-12-28 2021-04-02 恩亿科(北京)数据科技有限公司 General data monitoring method and system based on big data platform
CN112632177A (en) * 2020-12-31 2021-04-09 中国农业银行股份有限公司 Data loading operation generation method
CN112765121A (en) * 2021-01-08 2021-05-07 北京虹信万达科技有限公司 Administration and application system based on big data service
CN112860776A (en) * 2021-01-20 2021-05-28 山东众阳健康科技集团有限公司 Method and system for extracting and scheduling various data
CN112860776B (en) * 2021-01-20 2022-12-06 众阳健康科技集团有限公司 Method and system for extracting and scheduling various data
CN112860675A (en) * 2021-02-06 2021-05-28 高云 Big data processing method under online cloud service environment and cloud computing server
CN112925767A (en) * 2021-03-03 2021-06-08 浪潮云信息技术股份公司 Multi-data-source dynamic data synchronization management method and system based on internet supervision
CN112905420A (en) * 2021-03-04 2021-06-04 广东电网有限责任公司 Data monitoring system, method, electronic device and storage medium
CN113177088B (en) * 2021-04-02 2023-07-04 北京科技大学 Multi-scale simulation big data management system for material irradiation damage
CN113177088A (en) * 2021-04-02 2021-07-27 北京科技大学 Multi-scale simulation big data management system for material irradiation damage
CN113111105A (en) * 2021-04-06 2021-07-13 创意信息技术股份有限公司 Data customized access method and system based on big data
CN113111109A (en) * 2021-04-06 2021-07-13 创意信息技术股份有限公司 Interface warehousing analysis access method of data source
CN113111107B (en) * 2021-04-06 2023-10-13 创意信息技术股份有限公司 Data comprehensive access system and method
CN113111107A (en) * 2021-04-06 2021-07-13 创意信息技术股份有限公司 Data comprehensive access system and method
CN113111111A (en) * 2021-04-06 2021-07-13 创意信息技术股份有限公司 Multi-data source database access method
CN113051329A (en) * 2021-04-12 2021-06-29 平安国际智慧城市科技股份有限公司 Interface-based data acquisition method, device, equipment and storage medium
CN113051329B (en) * 2021-04-12 2024-03-15 平安国际智慧城市科技股份有限公司 Data acquisition method, device, equipment and storage medium based on interface
CN113407607A (en) * 2021-06-22 2021-09-17 中国联合网络通信集团有限公司 Multi-cloud heterogeneous data processing method and device and electronic equipment
CN113407607B (en) * 2021-06-22 2023-06-27 中国联合网络通信集团有限公司 Multi-cloud heterogeneous data processing method and device and electronic equipment
CN113485747A (en) * 2021-07-08 2021-10-08 广州钛动科技有限公司 Data processing method, data processor, target source component and system
CN113535835A (en) * 2021-07-12 2021-10-22 上海浦东发展银行股份有限公司 Data acquisition method, device, medium and equipment of kernel data processing software
CN113407734B (en) * 2021-07-14 2023-05-19 重庆富民银行股份有限公司 Method for constructing knowledge graph system based on real-time big data
CN113485894A (en) * 2021-07-14 2021-10-08 深信服科技股份有限公司 Data acquisition method, device and equipment and readable storage medium
CN113407734A (en) * 2021-07-14 2021-09-17 重庆富民银行股份有限公司 Construction method of knowledge map system based on real-time big data
CN113506098A (en) * 2021-09-10 2021-10-15 国能信控互联技术有限公司 Power plant metadata management system and method based on multi-source data
CN114064643A (en) * 2021-11-11 2022-02-18 南京熊猫电子股份有限公司 Task type data conversion system based on Oracle
CN114064777A (en) * 2021-11-19 2022-02-18 杭州雷数科技有限公司 Configurable method for acquiring data at fixed time, scheduling data, encrypting transmission and visualizing
CN114168672A (en) * 2021-12-13 2022-03-11 明觉科技(北京)有限公司 Log data processing method, device, system and medium
CN114379608A (en) * 2021-12-13 2022-04-22 中铁南方投资集团有限公司 Multi-source heterogeneous data integration processing method for urban rail transit engineering
CN114168672B (en) * 2021-12-13 2022-09-23 明觉科技(北京)有限公司 Log data processing method, device, system and medium
CN114936245A (en) * 2022-04-28 2022-08-23 北京远舢智能科技有限公司 Method and device for integrating and processing multi-source heterogeneous data
CN114817393A (en) * 2022-06-24 2022-07-29 深圳市信联征信有限公司 Data extraction and cleaning method and device and storage medium
CN114817393B (en) * 2022-06-24 2022-09-16 深圳市信联征信有限公司 Data extraction and cleaning method and device and storage medium
CN115086303A (en) * 2022-06-29 2022-09-20 徐工汉云技术股份有限公司 Multi-data-source data repeater and design method thereof
CN115086303B (en) * 2022-06-29 2024-05-17 徐工汉云技术股份有限公司 Multi-data source data repeater and design method thereof
CN116016032A (en) * 2023-01-06 2023-04-25 广西电子口岸有限公司 Customs service complex message packaging method
CN116016032B (en) * 2023-01-06 2023-08-11 广西电子口岸有限公司 Customs service message packaging method
CN115796457A (en) * 2023-02-03 2023-03-14 山东铁路投资控股集团有限公司 Personnel and enterprise rating method and system based on multidimensional data
CN116775737A (en) * 2023-06-21 2023-09-19 上海腾道信息技术有限公司 Method and system for automatically generating ETL configuration
CN116775737B (en) * 2023-06-21 2024-04-30 上海腾道信息技术有限公司 Method and system for automatically generating ETL configuration
CN117271648A (en) * 2023-11-23 2023-12-22 北京邮电大学 Adaptation method of bottom data model and storage medium
CN117312103B (en) * 2023-11-30 2024-03-01 山东麦港数据系统有限公司 Hot-pluggable distributed heterogeneous data source data scheduling processing system
CN117312103A (en) * 2023-11-30 2023-12-29 山东麦港数据系统有限公司 Hot-pluggable distributed heterogeneous data source data scheduling processing system
CN117539605B (en) * 2024-01-09 2024-03-19 无锡挚达物联科技有限公司 Data processing program assembling method, device, equipment and storage medium
CN117539605A (en) * 2024-01-09 2024-02-09 无锡挚达物联科技有限公司 Data processing program assembling method, device, equipment and storage medium
CN117555586B (en) * 2024-01-11 2024-03-22 之江实验室 Algorithm application publishing, managing and scoring method
CN117555586A (en) * 2024-01-11 2024-02-13 之江实验室 Algorithm application publishing, managing and scoring method

Similar Documents

Publication Publication Date Title
CN108846076A (en) The massive multi-source ETL process method and system of supporting interface adaptation
CN103152393B (en) A kind of charging method of cloud computing and charge system
CN104639374B (en) A kind of application deployment management system
CN105608758B (en) A kind of big data analysis platform device and method calculated based on algorithm configuration and distributed stream
CN202058147U (en) Distribution type real-time database management system
CN105005570B (en) Massive intelligent power data mining method and device based on cloud computing
CN105893628A (en) Real-time data collection system and method
CN107733986A (en) Support the protection of integrated deployment and monitoring operation big data support platform
CN103235820B (en) Data storage method and device in a cluster system
JP2015146183A (en) Managing big data in process control systems
KR20150112357A (en) Sensor data processing system and method thereof
CN103973815A (en) Method for unified monitoring of storage environment across data centers
CN102955977A (en) Energy efficiency service method and energy efficiency service platform adopting same on basis of cloud technology
CN104376365A (en) Method for constructing information system running rule libraries on basis of association rule mining
CN106201754A (en) Mission bit stream analyzes method and device
CN105577411B (en) Cloud service monitoring method and device based on service origin
CN109542057A (en) Novel maintenance model and its construction method based on virtual Machine Architecture
CN107612984B (en) Big data platform based on internet
CN101072130A (en) Network performance measuring method and system
CN103795575A (en) Multi-data-centre-oriented system monitoring method
CN112365366A (en) Micro-grid management method and system based on intelligent 5G slice
CN114448094A (en) Data sharing system based on platform area intelligent service terminal edge calculation
Kalim et al. Henge: Intent-driven multi-tenant stream processing
CN115391444A (en) Heterogeneous data acquisition and interaction method, device, equipment and storage medium
CN103945005A (en) Multiple evaluation indexes based dynamic load balancing framework

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20181120)