CN108846076A - The massive multi-source ETL process method and system of supporting interface adaptation - Google Patents
Massive multi-source ETL processing method and system supporting interface adaptation
- Publication number
- CN108846076A (CN201810588231.7A)
- Authority
- CN
- China
- Prior art keywords
- data
- etl
- job
- conversion
- interface
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a massive multi-source ETL processing method and system supporting interface adaptation, comprising: a data extraction step, which sets the basic information of the data source and target database, adaptively matches a corresponding ETL tool to each data source, and sets the parameters of the ETL tool; a data conversion step, which performs ETL job control, execution, and scheduling management, buffers and manages the extracted data, and cleans and converts the data; a data loading step, which performs a quality check on the converted data objects, outputs them according to the table structure defined by the data model, and updates and loads the verified data into the target database; and a data monitoring step, which monitors and manages the ETL job execution process, job resource usage, and system operation. A suitable ETL tool is matched adaptively, massive data is extracted and converted, and ETL jobs are executed efficiently and managed in an orderly way.
Description
Technical field
The present invention relates to the field of ETL management, and in particular to a massive multi-source ETL processing method and system supporting interface adaptation.
Background
Industry has accumulated massive amounts of data, and the volume, variety, and rate of change of that data are all increasing sharply, yet big data is not being fully used and the great value it contains remains to be mined. Big data is often multi-source and heterogeneous: it comes from different, scattered business systems and spans structured, semi-structured, and unstructured types, which makes it difficult to extract and convert into the data required. In a big data environment, data is characterized by large volume, diversity, and frequent interaction. As collected data keeps growing, the data processing logic becomes increasingly complex, and transferring massive multi-source data between different databases raises a transmission-efficiency problem.
Traditional ETL tools are expensive, depend heavily on the specific business, and use a centralized architecture: design, operation, and management are all concentrated on one server, which places very high demands on hardware. Under the traditional ETL management mode, the ETL tool is generally chosen manually according to the attributes of the source and target databases, and the ETL task flow is set up, parameters are configured, and tasks are started by hand. This manual process is complicated, consumes a great deal of manpower and time, and cannot meet the ETL job requirements of massive multi-source data. A device is therefore needed that can execute ETL (extract, transform, load) jobs more economically and efficiently in a big data environment.
Summary of the invention
The object of the invention is to solve the above problems by providing a massive multi-source data ETL method and system supporting interface adaptation. For massive multi-source data from different, scattered systems, a suitable ETL tool is selected adaptively based on an interface adapter and an ETL tool engine, and big data processing technologies such as HDFS, MapReduce, and Spark are used to manage ETL job scheduling, execute jobs efficiently, and centrally store, process, and convert massive complex data.
To achieve the above object, the present invention adopts the following technical solutions:
As a first aspect of the present invention, a massive multi-source ETL processing method supporting interface adaptation is provided.
The massive multi-source ETL processing method supporting interface adaptation comprises:
A data extraction step: set the basic information of the data source and target database, adaptively match a corresponding ETL tool to each data source, and set the parameters of the ETL tool; extract the different data sources through a database interface, a log file interface, or a streaming data interface.
A data conversion step: perform ETL job control, execution, and scheduling management based on the MapReduce and Spark computing frameworks; buffer and manage the extracted data based on HDFS, Hive, or HBase; and clean and convert the data.
A data loading step: perform a quality check on the converted data objects, output them according to the table structure defined by the data model, and update and load the verified data into the target database.
A monitoring management step: monitor and manage the ETL job execution process, job resource usage, and system operation.
As a further improvement of the present invention, the data extraction step comprises:
A data source and target database setting sub-step, which sets the basic information of the data source and target database, including: database type, connection type between data source and target database, database IP, database name, port, user name, and password.
An adaptive ETL tool matching sub-step, which adaptively matches a corresponding ETL tool to each data source.
In the adaptive ETL tool matching sub-step, if the data source or target database holds database data, Sqoop is matched when either side is the non-relational store HDFS, and Kettle is matched otherwise; if the data source is a log file, Flume is matched; if the data source is streaming data, Kafka is matched.
An ETL tool parameter configuration sub-step, which sets the task parameters after ETL tool matching is complete.
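The matching rule of the adaptive ETL tool matching sub-step can be sketched as a small lookup function. This is a minimal illustration in Python, not the patent's implementation: the function name `match_etl_tool` and the source-kind labels are hypothetical, and only the tool names and the HDFS rule come from the text.

```python
from typing import Optional

# Hypothetical labels for the three kinds of source named in the text.
DATABASE, LOG_FILE, STREAM = "database", "log_file", "stream"


def match_etl_tool(source_kind: str,
                   source_type: Optional[str] = None,
                   target_type: Optional[str] = None) -> str:
    """Return the ETL tool name for a source, following the matching rule above."""
    if source_kind == DATABASE:
        # Sqoop when either side is HDFS, Kettle for every other database pair.
        endpoints = {(source_type or "").lower(), (target_type or "").lower()}
        return "Sqoop" if "hdfs" in endpoints else "Kettle"
    if source_kind == LOG_FILE:
        return "Flume"
    if source_kind == STREAM:
        return "Kafka"
    raise ValueError(f"unsupported source kind: {source_kind!r}")
```

For example, `match_etl_tool("database", "MySQL", "HDFS")` selects Sqoop, while a MySQL-to-Oracle pair selects Kettle.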
As a further improvement of the present invention, the data source includes: database data, pictures, audio files, video files, log files, or streaming data. Database data includes relational and non-relational databases. Relational databases include Oracle, MySQL, and SQL Server; non-relational databases include HDFS, MongoDB, and HBase. Log files include log data of various types and formats from console, RPC (Thrift-RPC), text (file), tail (UNIX tail), syslog (the syslog logging system, which supports both TCP and UDP modes), exec (command execution), and so on.
As a further improvement of the present invention, the target database supports data sharing, report queries, and system applications.
As a further improvement of the present invention, the ETL tool includes Sqoop, Kettle, Flume, or Kafka. Sqoop is an open-source tool for transferring data between Hadoop and traditional databases (Oracle, MySQL, etc.); Kettle is an open-source ETL tool that performs data extraction with workflow at its core; Flume is a system provided by Cloudera for collecting, aggregating, and transmitting massive logs; Kafka is a high-throughput open-source stream processing platform.
As a further improvement of the present invention, the data conversion step comprises:
A job flow design sub-step, which designs the job control flow according to the actual business logic, including the extraction mode and the ETL task flow.
A job scheduling management sub-step, which covers the job scheduling policy, job dependency control, job priority configuration, and job scheduling control. The job scheduling policy includes time-triggered, event-triggered, and real-time processing modes. Job dependency control defines the dependencies between jobs according to the actual business logic. Job priority configuration sets job priorities according to the actual business logic and system resource usage. Job scheduling control sets a resource warning threshold for job scheduling and pauses low-priority jobs when resource usage exceeds the threshold.
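As a rough illustration of job dependency control and job priority configuration, the following sketch orders jobs so that prerequisites run first and, within each group of jobs that become ready together, higher-priority jobs come first. It uses Python's standard `graphlib` module; the function name, the job names, and the integer priority scale are assumptions made for the example, not part of the patent.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def schedule_order(dependencies, priority):
    """Order jobs so prerequisites run first; among jobs ready together, higher priority first.

    dependencies: maps a job name to the set of jobs it depends on.
    priority: maps a job name to an integer, larger meaning more urgent.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    order = []
    while sorter.is_active():
        # Sort the currently ready jobs by descending priority, then by name.
        ready = sorted(sorter.get_ready(), key=lambda job: (-priority.get(job, 0), job))
        for job in ready:
            order.append(job)
            sorter.done(job)
    return order
```

A real scheduler would also handle triggers (time, event, real-time) and the resource threshold; this sketch covers only the ordering.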
A job execution sub-step, which is responsible for executing ETL jobs.
In the job execution sub-step, Sqoop starts a map-only MapReduce job and reads data row by row according to the data split value. Kettle creates a transformation (Transformation) and a job (Job), and after the task parameters of each stage are set, starts the workflow to extract data. Flume collects log data through its source component, caches it in its channel component, and sends it to the destination through its sink component. After Kafka collects streaming data, the data is split into a series of batch jobs that are processed in real time by the resilient distributed datasets (RDDs) in Spark.
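The idea of splitting collected streaming data into a series of small batch jobs can be illustrated without any Kafka or Spark dependency. The sketch below is plain Python and does not use the Kafka or Spark APIs; `micro_batches` and the fixed `batch_size` are hypothetical names chosen for illustration.

```python
from itertools import islice


def micro_batches(stream, batch_size):
    """Yield successive lists of at most batch_size records from an iterable stream."""
    iterator = iter(stream)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # the stream is exhausted
        yield batch
```

Each yielded list stands in for one batch job; in the system described here that work would be handed to Spark rather than processed locally.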
A distributed caching sub-step, which buffers the extracted data: HDFS stores the underlying data; Hive filters, summarizes, queries, and analyzes the data; HBase maintains data changes and stores data that is accessed frequently during the conversion computation.
A business rule formulation sub-step, which formulates the business rules for data cleaning and conversion according to the actual business rules.
A data processing sub-step, which cleans and converts the data according to the formulated business rules. Data cleaning fills in, corrects, and cleans the data; data conversion performs inconsistency conversion, data granularity conversion, and standard conversion.
Inconsistency conversion: for example, the same user is coded A01 in system A and B01 in system B; after extraction, such data is converted to a single unified code.
Data granularity conversion: for example, the data stored for user M is very detailed in system A but relatively simple in system B, so the granularity differs; after extraction, the data must be aggregated to a common granularity.
Standard conversion: for example, business data follows different standards in business system A and system B because of differences in business rules; after extraction, the standards must be unified.
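The three conversions above can be sketched with three small functions. This is an illustrative Python sketch under stated assumptions: the unified code `U001`, the `amount` field, and the two date formats are invented for the example, and only the A01/B01 codes and user M come from the text.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical mapping from (system, local code) to one unified user code.
UNIFIED_CODE = {("A", "A01"): "U001", ("B", "B01"): "U001"}

# Hypothetical per-system date formats that must be brought to one standard.
DATE_FORMATS = {"A": "%Y-%m-%d", "B": "%d/%m/%Y"}


def unify_code(system, local_code):
    """Inconsistency conversion: map a system-specific code to the unified one."""
    return UNIFIED_CODE[(system, local_code)]


def aggregate_by_user(detail_rows):
    """Granularity conversion: roll detailed rows up to one total per user."""
    totals = defaultdict(float)
    for row in detail_rows:
        totals[row["user"]] += row["amount"]
    return dict(totals)


def standardize_date(system, text):
    """Standard conversion: parse a system-specific date and emit ISO 8601."""
    return datetime.strptime(text, DATE_FORMATS[system]).date().isoformat()
```

In practice the mappings would come from the formulated business rules rather than from constants in code.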
As a further improvement of the present invention, the data loading step comprises:
A data quality check sub-step, which performs a quality check on the converted data objects, verifies data anomalies caused by network interruptions and similar problems, and checks whether the quality of the converted data meets the standard of the target database.
A data update and load sub-step, which loads the verified data into the target database and, according to the predefined data model, updates the target data table by means of timestamps, log tables, full-table comparison, or full-table delete-and-insert.
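A timestamp-based update combined with a basic quality check might look like the following sketch. It is an assumption-laden illustration in Python: the `updated_at` field, the `required_fields` completeness rule, and the function name are hypothetical, and the patent does not specify how the check is implemented.

```python
def rows_to_load(source_rows, last_loaded_at, required_fields):
    """Pick rows changed after the last load and split them by a basic quality check.

    Each row is a dict with an "updated_at" value comparable to last_loaded_at.
    Returns (valid_rows, rejected_rows).
    """
    valid, rejected = [], []
    for row in source_rows:
        if row["updated_at"] <= last_loaded_at:
            continue  # unchanged since the previous load, skip it
        # A row passes when every required field is present and not None.
        complete = all(row.get(name) is not None for name in required_fields)
        (valid if complete else rejected).append(row)
    return valid, rejected
```

Only the rows in `valid_rows` would be written to the target table; rejected rows can be logged for later verification.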
As a further improvement of the present invention, the monitoring management step comprises:
A job monitoring management sub-step, which monitors the execution process and resource usage of ETL jobs.
An ETL job execution process monitoring sub-step, which monitors information including job execution time, job progress, timeouts, job interruptions, and job backlog. Job execution time is monitored with a timeout alert, and timeout problems are then analyzed manually. Job execution logs are monitored, and when a job is interrupted its execution is re-triggered according to the defined interruption recovery mechanism. Job progress is monitored, and when jobs pile up they are queued by priority so that higher-priority jobs execute first.
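The three monitoring behaviors described above (timeout detection, re-triggering an interrupted job, and draining a backlog by priority) can be sketched as follows. This is a simplified Python illustration; the function names, the use of `InterruptedError` to model an interruption, and the retry limit are assumptions, not details from the patent.

```python
import heapq
import time


def is_timed_out(started_at, timeout_seconds, now=None):
    """Return True when a job has run longer than its configured timeout."""
    now = time.time() if now is None else now
    return now - started_at > timeout_seconds


def run_with_recovery(job, max_retries=3):
    """Re-trigger a job that is interrupted, up to max_retries extra attempts."""
    for attempt in range(max_retries + 1):
        try:
            return job()
        except InterruptedError:
            if attempt == max_retries:
                raise  # give up after the last allowed attempt


def drain_backlog(backlog):
    """Yield backlogged job names, highest priority first.

    backlog: iterable of (priority, name) pairs, larger priority being more urgent.
    """
    heap = [(-priority, name) for priority, name in backlog]
    heapq.heapify(heap)
    while heap:
        yield heapq.heappop(heap)[1]
```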
An ETL job resource monitoring sub-step, which monitors job resource usage. If the resource load exceeds the threshold, the load is adjusted by pausing or stopping some low-priority jobs, which are executed again once the load drops below the threshold.
A system monitoring management sub-step, which monitors machine hardware information and cluster running status, and manages metadata, the database interface, the log file interface, and the streaming data interface.
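The load adjustment in the resource monitoring sub-step can be sketched as choosing which low-priority jobs to pause. The following Python sketch assumes each running job reports a priority and an estimated share of the total load; both the tuple layout and the function name are hypothetical.

```python
def jobs_to_pause(running_jobs, load, threshold):
    """Choose lowest-priority jobs to pause until the estimated load is at or below threshold.

    running_jobs: list of (name, priority, load_share) tuples.
    load: current total load; threshold: the configured limit.
    """
    paused = []
    # Visit jobs from lowest to highest priority.
    for name, _priority, share in sorted(running_jobs, key=lambda job: job[1]):
        if load <= threshold:
            break
        paused.append(name)
        load -= share
    return paused
```

The paused jobs would be resumed once monitoring reports that the load has dropped back below the threshold.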
As a second aspect of the present invention, a massive multi-source ETL processing system supporting interface adaptation is provided.
The massive multi-source ETL processing system supporting interface adaptation comprises:
A data extraction module, which sets the basic information of the data source and target database, adaptively matches a corresponding ETL tool to each data source, and sets the parameters of the ETL tool; the different data sources are extracted through a database interface, a log file interface, or a streaming data interface.
A data conversion module, which performs ETL job execution and scheduling management based on the MapReduce and Spark computing frameworks, buffers and manages the extracted data based on HDFS, Hive, or HBase, and cleans and converts the data.
A data loading module, which performs a quality check on the converted data objects, outputs them according to the table structure defined by the data model, and updates and loads the verified data into the target database.
A monitoring management module, which monitors and manages the ETL job execution process, job resource usage, and system operation.
As a further improvement of the present invention, the data source includes: database data, pictures, audio files, video files, log files, or streaming data. Database data includes relational and non-relational databases. Relational databases include Oracle, MySQL, SQL Server, etc.; non-relational databases include HDFS, MongoDB, HBase, etc. The log files include log data of various types and formats from console, RPC (Thrift-RPC), text (file), tail (UNIX tail), syslog (the syslog logging system, which supports both TCP and UDP modes), exec (command execution), and so on.
As a further improvement of the present invention, the target database is used for data sharing, report queries, and system applications.
As a further improvement of the present invention, the data extraction module comprises:
An interface adapter, which provides data interfaces for the different data types (database data, log files, and streaming data) and formulates the ETL tool adaptation rules; the interface adapter further includes an adaptation rule engine.
The data interfaces include a database interface, a log file interface, and a streaming data interface: database data is extracted from relational or non-relational databases through the database interface, log files through the log file interface, and streaming data through the streaming data interface.
The adaptation rule engine is used to set the basic information of the data source and target database, including: database type, connection type between data source and target database, database IP, database name, port, user name, and password. The adaptation rules include the rules for determining the data source and target database types and the adaptation rules of the different ETL tools.
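The basic information held by the adaptation rule engine can be modeled as a small record type. The sketch below is one possible Python representation; the class name `EndpointInfo`, the field names, and the example values are assumptions, while the list of fields follows the text.

```python
from dataclasses import dataclass, field


@dataclass
class EndpointInfo:
    """Basic information for one data source or target database."""
    db_type: str          # e.g. "MySQL", "Oracle", "HDFS"
    connection_type: str  # e.g. "jdbc"; the set of allowed values is an assumption
    host: str             # database IP
    database: str         # database name
    port: int
    username: str
    password: str = field(repr=False)  # keep the password out of printed output
```

For instance, `EndpointInfo("MySQL", "jdbc", "10.0.0.1", "sales", 3306, "etl", "secret")` describes a hypothetical MySQL source whose password is omitted when the object is printed.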
An ETL tool engine, which integrates and manages the ETL tools, including Sqoop, Flume, Kettle, and Kafka, and adaptively matches a suitable ETL tool to heterogeneous data (database data, log files, streaming data) from different data sources.
Sqoop is an open-source tool for transferring data between Hadoop and databases;
Kettle is an open-source ETL tool that performs data extraction with workflow at its core;
Flume is a system provided by Cloudera for collecting, aggregating, and transmitting massive logs;
Kafka is a high-throughput open-source stream processing platform.
In the ETL tool engine, the ETL adaptation engine selects an ETL tool for each data source according to the rules for determining the data source and target database types and the adaptation rules of the different ETL tools, both formulated in the adaptation rule engine of the interface adapter.
As a further improvement of the present invention, the data conversion module comprises:
A job scheduling management engine, which uses the master node of the distributed cluster as the scheduling engine and includes a job management unit and a task scheduling unit.
The job management unit covers ETL job design, job configuration, and job monitoring. ETL job design designs the job control flow according to the actual business logic, including job dependencies, whether extraction is incremental, and the extraction frequency. Job configuration sets parameters such as job priority and job execution mode. Job monitoring monitors and manages the execution state and resource usage of ETL jobs.
The task scheduling unit covers the job scheduling policy and job scheduling control. The job scheduling policy includes time-triggered, event-triggered, and real-time processing modes. Job scheduling control sets a resource warning threshold for job scheduling and pauses low-priority jobs when resource usage exceeds the threshold.
A job execution engine, which is responsible for executing ETL jobs and uses the slave nodes of the distributed cluster as the execution engine; offline processing of ETL jobs is based on the MapReduce computing framework, and real-time processing of ETL jobs is based on the Spark computing framework.
In the job execution engine, Sqoop starts a map-only MapReduce job and reads data row by row according to the data split value. Kettle creates a transformation (Transformation) and a job (Job), and after the task parameters of each stage are set, starts the workflow to extract data. Flume collects log data through its source component, caches it in its channel component, and sends it to the destination through its sink component. After Kafka collects streaming data, the data is split into a series of batch jobs that are processed in real time by the resilient distributed datasets (RDDs) in Spark.
A distributed caching submodule, which buffers and manages the extracted data: HDFS stores the underlying data; Hive filters, summarizes, queries, or analyzes the data; HBase maintains data changes and stores data that is accessed frequently during the conversion computation.
A business rule engine, which, according to the actual business rules, formulates cleaning rules for incomplete, erroneous, and dirty data and conversion rules for heterogeneous data from different business systems.
A data processing submodule, which includes a data cleaning unit and a data conversion unit and cleans and converts the data according to the actual business rules.
The data cleaning unit fills in, corrects, and cleans the data: data fill-in completes missing and mismatched information in incomplete data; data correction amends erroneous data according to the specific business; data cleaning designs cleaning rules for dirty data and determines its correctness.
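The fill-in, correction, and cleaning operations of the data cleaning unit can be sketched as one pass over a list of records. This is a minimal Python illustration; the rule formats (`defaults`, `corrections`, `is_valid`) and the sample values are assumptions chosen for the example, since the patent leaves the rule representation to the business rule engine.

```python
def clean_records(records, defaults, corrections, is_valid):
    """Apply fill-in, correction, and cleaning to a list of dict records.

    defaults: field -> value used when the field is missing or None (fill-in).
    corrections: field -> {wrong_value: right_value} (correction).
    is_valid: predicate deciding whether a record is kept (cleaning).
    """
    cleaned = []
    for record in records:
        row = dict(record)  # copy so the input records stay untouched
        for name, default in defaults.items():
            if row.get(name) is None:
                row[name] = default
        for name, fixes in corrections.items():
            if name in row:
                row[name] = fixes.get(row[name], row[name])
        if is_valid(row):
            cleaned.append(row)
    return cleaned
```

Records that fail `is_valid` are dropped here; a production pipeline would more likely route them to a reject table for review.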
The data conversion unit converts the massive heterogeneous data from different business systems into the data required by the target database: inconsistency conversion unifies data of the same type from different business systems; data granularity conversion aggregates the extracted business system data according to the granularity of the target database; data standard conversion converts the extracted data into the standard data required by the target database according to a predefined standard data model.
As a further improvement of the present invention, the data loading module comprises:
A quality check submodule, which performs a quality check on the converted data objects, verifies data anomalies caused by network interruptions and similar problems, and checks whether the quality of the converted data meets the standard of the target database.
A data update and load submodule, which loads the verified data into the target database and, according to the predefined data model, updates the target data table by means of timestamps, log tables, full-table comparison, or full-table delete-and-insert.
As a further improvement of the present invention, the monitoring management module comprises:
A job monitoring submodule, which monitors the execution process of ETL jobs, including job execution time, timeouts, job interruptions, job backlog, job progress, and job resource usage.
A load balancing submodule, which assesses the resource load according to ETL job execution and, when the load exceeds the threshold, adjusts it by pausing or stopping some low-priority jobs, which are executed again once the load drops below the threshold.
A metadata management submodule: metadata, as data describing data attributes, stores the definitions of data sources, conversion rules, and execution processes; this submodule manages the metadata involved in ETL jobs.
An interface management submodule, which handles interface definition, protocol adaptation, and data encapsulation for the database interface, log file interface, and streaming data interface, and supports the HTTP, FTP, REST, and web service interface protocols.
An operation monitoring submodule, which monitors and collects machine hardware information, resource information, load information, cluster component states, and cluster operation information, and perceives and analyzes the running condition of the cluster in real time.
The beneficial effects of the invention are as follows:
1. For massive structured, semi-structured, and unstructured heterogeneous data from different, scattered systems, a suitable ETL tool is matched adaptively based on the interface adapter and the ETL tool engine.
2. The high-performance cloud ETL management device performs ETL job scheduling management and execution based on a distributed cluster and the MapReduce and Spark frameworks, achieving orderly management and efficient execution of complex big data ETL jobs.
3. Massive data is cached on the distributed file system HDFS, and data cleaning and conversion are performed with the MapReduce and Spark frameworks, so that data from different scattered systems is extracted efficiently and stored and processed centrally, which benefits data sharing and in-depth analysis.
Description of the drawings
The accompanying drawings, which form a part of this application, provide a further understanding of the application; the illustrative embodiments of the application and their descriptions serve to explain the application and do not constitute an undue limitation on it.
Fig. 1 is a flow chart of the method of the present invention;
Fig. 2 is a functional module connection diagram of the present invention;
Fig. 3 is a flow chart of cloud ETL tool matching;
Fig. 4 is a flow chart of ETL job execution and scheduling management;
Fig. 5 is a specific application embodiment of the present invention.
Detailed description of the embodiments
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the application. Unless otherwise indicated, all technical and scientific terms used herein have the same meaning as commonly understood by a person of ordinary skill in the art to which this application belongs.
It should also be noted that the terminology used herein is for describing specific embodiments only and is not intended to limit the exemplary embodiments of the application. As used herein, unless the context clearly indicates otherwise, the singular form is also intended to include the plural; in addition, when the terms "comprising" and/or "including" are used in this specification, they indicate the presence of features, steps, operations, devices, components, and/or combinations thereof.
Fig. 1 is a flow chart of the massive multi-source ETL processing method supporting interface adaptation according to the present invention, which includes the following steps:
Process 101: Set the data source and target database. Set the basic information of the data source and target database, including: database type, connection type, database IP, database name, port, user name, and password.
Process 102: Adaptive ETL tool matching. Adaptively match an ETL tool such as Sqoop, Kettle, Flume, or Kafka to each data source, such as database data, log files, or streaming data.
Process 103: ETL tool parameter configuration. After ETL tool matching is complete, set the initial values of the task parameters.
Process 104: Job flow design. Design the ETL job control flow according to the actual business logic, including the data extraction mode and the specific ETL job flow for each ETL tool.
Process 105: Job scheduling management. Configure and manage the job scheduling policy, job dependencies, job priorities, and job scheduling.
Process 106: Job execution. Execute the specific ETL jobs.
Process 107: ETL job monitoring management. Monitor the execution process and resource usage of ETL jobs.
Process 108: Distributed caching. Buffer the extracted data: HDFS stores the underlying data, Hive manages the data, and HBase maintains data changes and stores the data that is read frequently during the conversion computation.
Process 109: Business rule formulation. Formulate the business rules for data cleaning and conversion according to the actual business logic.
Process 110: Data processing. Clean and convert the data according to the formulated business rules: data cleaning fills in, corrects, and cleans the data; data conversion performs inconsistency conversion, data granularity conversion, and standard conversion.
Process 111: Data quality check. Perform a quality check on the converted data, verify data anomalies caused by network interruptions and similar problems, and check whether the quality of the converted data meets the standard of the target database.
Process 112: Data update and load. Load the verified data into the target database and, according to the predefined data model, update the target data table by means such as timestamps, log tables, full-table comparison, or full-table delete-and-insert.
Process 113: System monitoring management. Monitor information such as machine hardware information and cluster running status, and manage metadata, the database interface, the log file interface, the streaming data interface, and so on.
Fig. 2 shows the massive multi-source ETL processing system supporting interface adaptation according to the present invention, which includes a data extraction module, a data conversion module, a data loading module, and a monitoring management module.
The data extraction module consists of an interface adapter and an ETL tool engine. The interface adapter includes the data interfaces and the adaptation rule engine; it specifies the data source and target database, designs the extraction rules, and completes interface adaptation. The ETL tool engine integrates and manages ETL tools such as Sqoop, Flume, Kettle, and Kafka, and adaptively matches the corresponding ETL tool to each kind of data according to the interface adaptation rules.
The data conversion module consists of a job scheduling management engine, a job execution engine, a distributed cache area, a business rule engine, and a data processing submodule.
The job scheduling management engine is responsible for the management and task scheduling of ETL jobs.
The job execution engine is responsible for executing the specific ETL jobs; offline processing of ETL jobs is based on the MapReduce computing framework, and real-time processing of ETL jobs is based on the Spark computing framework.
The distributed cache area buffers the extracted data: HDFS stores the underlying data, Hive manages the data, and HBase maintains data changes and stores data that is accessed frequently during the conversion computation.
The business rule engine formulates the business rules for data cleaning and conversion according to the actual business logic.
The data processing submodule cleans and converts the data according to the actual business rules.
The data loading module consists of a quality check submodule and a load and update submodule; it checks the converted data and loads and updates it into the target database.
The monitoring management module includes a job monitoring submodule, a load balancing submodule, a metadata management submodule, an interface management submodule, and an operation monitoring submodule.
The job monitoring submodule monitors the execution process of ETL jobs and handles abnormal job conditions.
The load balancing submodule assesses the resource load according to job execution and adjusts the load through load migration, so that resource utilization stays as high as possible.
The metadata management submodule manages metadata such as data source definitions, conversion rule definitions, and execution process definitions.
The interface management submodule handles interface definition, protocol adaptation, and data encapsulation for interfaces such as the database interface and the log file interface.
The operation monitoring submodule monitors and collects machine hardware information, resource information, load information, cluster component states, and cluster operation information, and perceives and analyzes the running condition of the cluster in real time.
Fig. 3 is the cloud ETL tool matching flow chart of the present invention, which includes the following steps:
Process 301: The user specifies the data source and the target database and sets the database information, including database type, connection type, database IP, database name, port, user name, password, and so on.
Process 302: The ETL tool is adaptively matched according to the types of the data source and the target database. For database data, go to Process 303; for log file data, go to Process 306; for stream data, go to Process 307.
Process 303: For database data, if either the data source or the target database is HDFS, go to Process 304; for all other types, go to Process 305.
Process 304: For data extraction between HDFS and other databases, the ETL tool Sqoop is adaptively matched.
Process 305: For data extraction between relational databases such as Oracle and MySQL, and between these and non-relational databases such as MongoDB, the ETL tool Kettle is adaptively matched.
Process 306: For log file data, the ETL tool Flume is matched. It collects log data of various types and formats from sources such as console, RPC (Thrift-RPC), text (file), tail (UNIX tail), syslog (the syslog logging system, supporting the two modes TCP and UDP), and exec (command execution).
Process 307: For stream data, the ETL tool Kafka is matched to collect stream data with high real-time requirements.
Process 308: ETL tool matching is complete.
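The matching flow of Fig. 3 can be sketched as a small decision function. The following Python sketch is illustrative only: the `Endpoint` class, the `kind` and `db_type` field names, and the string labels are assumptions introduced here and do not come from the patent. Only the routing itself (Sqoop when either side is HDFS, Kettle for other databases, Flume for log files, Kafka for stream data) follows Processes 302 to 307.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical description of a data source or target database."""
    kind: str          # assumed labels: "database", "log_file", "stream"
    db_type: str = ""  # e.g. "oracle", "mysql", "mongodb", "hdfs"


def match_etl_tool(source: Endpoint, target: Endpoint) -> str:
    """Return the name of the ETL tool matched for a source/target pair."""
    # Process 302: branch on the kind of data source.
    if source.kind == "log_file":
        return "Flume"   # Process 306
    if source.kind == "stream":
        return "Kafka"   # Process 307
    if source.kind == "database":
        # Process 303: Sqoop if either side is HDFS, otherwise Kettle.
        if "hdfs" in (source.db_type.lower(), target.db_type.lower()):
            return "Sqoop"   # Process 304
        return "Kettle"      # Process 305
    raise ValueError(f"unsupported data source kind: {source.kind!r}")


if __name__ == "__main__":
    oracle = Endpoint("database", "oracle")
    hdfs = Endpoint("database", "hdfs")
    print(match_etl_tool(oracle, hdfs))
    print(match_etl_tool(Endpoint("stream"), hdfs))
```

Keeping the rules in one function mirrors the idea of a single adaptation rule set; a real deployment would presumably load such rules from configuration rather than hard-code them.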
Fig. 4 is the ETL job execution and scheduling management flow chart of the present invention, which includes the following steps:
Process 401: Work out the actual business logic according to the goals and tasks of the data extraction.
Process 402: Create the ETL job control flow. The execution flow of the ETL jobs is designed according to the actual business logic, including job dependencies, whether the extraction is incremental, the extraction frequency, and so on. Kettle is configured quickly through a visual interface, whereas ETL tools such as Sqoop, Flume, and Kafka require commands to be written and executed.
Process 403: Set the ETL job scheduling policy, including time-triggered, event-triggered, and real-time processing.
Process 404: Set the ETL job priorities according to the actual business logic, so that higher-priority jobs are scheduled first when resources are insufficient.
Process 405: Set the ETL job scheduling control policy. A resource warning threshold is set for job scheduling, and low-priority ETL jobs are suspended when resource usage exceeds the threshold.
Process 406: Select the job execution mode, such as local execution, remote execution, or cluster execution.
Process 407: Execute the jobs. ETL job execution is started according to the job configuration parameters. Sqoop starts a map-only MapReduce job and reads data line by line according to the data split value; Kettle creates a Transformation and a Job and, after the task parameters of each step are set, starts the workflow to extract data; Flume collects log data through its source component, buffers it in the channel component, and sends it to the destination through the sink component; Kafka collects stream data, which is then decomposed into a series of batch jobs and processed in real time by the resilient distributed datasets (RDDs) in Spark.
Process 408: Monitor the ETL job execution state. Some jobs are suspended when memory usage exceeds the threshold and are executed again once resource usage falls below the threshold, which achieves load balancing.
Process 409: Check whether execution is complete. If all jobs have finished, end.
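The priority and threshold rules of Processes 404, 405, and 408 can be illustrated with a short admission routine. This is a minimal sketch under stated assumptions, not the scheduler of the invention: the `Job` class, the abstract `memory` units, the `admit_jobs` function, and the default threshold of 0.8 are all hypothetical. It admits jobs in descending priority and pauses any job that would push usage above the warning threshold.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Job:
    """Hypothetical ETL job descriptor."""
    name: str
    priority: int   # larger value = higher priority (assumption)
    memory: int     # estimated memory demand in abstract units (assumption)


def admit_jobs(jobs: List[Job], capacity: int,
               threshold: float = 0.8) -> Tuple[List[str], List[str]]:
    """Split jobs into (running, paused) under a resource warning threshold.

    Jobs are considered in descending priority. A job runs only if total
    usage stays at or below capacity * threshold; otherwise it is paused.
    """
    limit = capacity * threshold
    used = 0
    running: List[str] = []
    paused: List[str] = []
    # sorted() is stable, so equal priorities keep their submission order.
    for job in sorted(jobs, key=lambda j: -j.priority):
        if used + job.memory <= limit:
            running.append(job.name)
            used += job.memory
        else:
            paused.append(job.name)
    return running, paused


if __name__ == "__main__":
    queue = [Job("A", 1, 50), Job("B", 3, 40), Job("C", 2, 30)]
    running, paused = admit_jobs(queue, capacity=100)
    print("running:", running, "paused:", paused)
    # Once B and C finish, calling admit_jobs again on the paused jobs
    # re-admits them, which corresponds to resuming below the threshold.
```

Calling the routine again whenever a job finishes gives the pause-and-resume behaviour described in Process 408 without any extra state.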
Fig. 5 shows an embodiment of the present invention, which includes data sources, a data extraction module, a data conversion module, a data cloud platform, and other modules.
The power consumption information big data cloud platform extracts multiple types of data, including user profile data, acquisition data, statistical query data, management data, communication data, and monitoring data, from geographically dispersed peripheral systems such as the power consumption information acquisition system, the marketing business application system, the production management platform, the electricity-water-gas-heat multi-meter integrated system, and the front-end communication platform. It stores and analyzes these data in a unified manner to provide data support for in-depth research on power consumption information big data, and it collects stream data such as server monitoring information in real time to ensure stable operation of the platform.
Data sources: these include the database data of the peripheral systems and the stream data of the platform servers. The databases of peripheral systems such as the power consumption information acquisition system are mostly Oracle databases and the HDFS distributed file system, and they hold structured basic data as well as semi-structured and unstructured data such as documents, pictures, audio, and video. Because HDFS offers distributed storage and high access efficiency, most of the power consumption information big data in these peripheral systems is stored in HDFS. This covers structured data such as most of the basic profile data, acquisition data, statistical query data, and monitoring data, together with semi-structured and unstructured data such as documents, pictures, audio, and video. The Oracle databases store part of the basic profile data and the less frequently used statistical analysis data. Given the differences in access efficiency and in the types of data stored in HDFS and Oracle, most structured data and all semi-structured and unstructured data are extracted from HDFS, while the basic profile data and statistical analysis data that are not present in HDFS are extracted from the Oracle databases. Details are shown in Table 1:
Table 1: Data source information of the power consumption information big data cloud platform
The data extraction module consists of the database interface, the adaptation engine, and the ETL tools engine.
The database interface provides database interfaces for the Oracle databases and HDFS of peripheral systems such as the power consumption information acquisition system; a stream data interface is provided for the stream data extracted from the platform servers;
The adaptation engine provides ETL tool adaptation rules for power consumption information big data, including decision rules for the data source type, decision rules for the database type, and matching rules for the ETL tools.
The ETL tools engine matches the corresponding ETL tool to each type of data from each data source according to the adaptation rules. It first determines the type of the data source. If the data is stream data collected in real time from the platform servers, Kafka is matched;
If the data comes from a peripheral system database, the database type is determined next. For the Oracle databases in peripheral systems such as the power consumption information acquisition system, the marketing business application system, the production management platform, and the electricity-water-gas-heat multi-meter integrated system, the ETL tool Kettle is matched; for HDFS in peripheral systems such as the power consumption information acquisition system, the electricity-water-gas-heat multi-meter integrated system, and the front-end communication platform, the ETL tool Sqoop is matched. After tool matching is complete, the tool and task parameters are set, including initial values such as the database connection, user name, password, and permissions.
The data conversion module is implemented on a distributed cluster and consists of the job control engine, the job scheduling engine, the job execution engine, and the data processing and loading module.
The job control engine in the data conversion module designs the ETL job control flow according to the actual business logic of the power consumption information big data cloud platform. The way the flow is designed differs between ETL tools, as shown in Table 2:
Table 2: ETL job control flow settings
The job scheduling engine in the data conversion module performs job priority configuration, job scheduling, and monitoring for the power consumption information big data extraction tasks. The scheduling policies are shown in Table 3:
Table 3: ETL job scheduling policies
The job execution engine in the data conversion module executes the ETL extraction jobs according to the established ETL job control flow for power consumption information big data. Sqoop starts a map-only MapReduce job, splits the data according to the data split parameter value, and assigns the resulting ranges to different map tasks; each map task reads data from HDFS line by line according to the metadata information stored in the data container. Data extraction based on Kettle is implemented mainly by creating a Transformation (conversion) and a Job (task). A Transformation is built visually by adding each step to the main window and connecting the steps; the resulting file has the extension .ktr. A Job executes the designed transformation tasks based on a workflow model; its file has the extension .kjb. Stream data batch jobs are processed with Spark Streaming: the stream data is first decomposed at fixed time intervals into a series of short batch jobs, each batch job is then turned into resilient distributed datasets (RDDs) in Spark, and the data is processed through the RDDs with operations including mapping (map), filtering (filter), joining (join), and grouping (group by).
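The micro-batch idea described above (cut the stream into short batches at fixed intervals, then apply map, filter, and group-by operations to each batch) can be imitated in plain Python. The sketch below is only an analogy and does not use the Spark API: the function names, the `"host,load"` line format, and the 0.5 load cut-off are assumptions chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def micro_batches(events: Iterable[Tuple[float, str]],
                  interval: float) -> List[List[str]]:
    """Group (timestamp, line) events into consecutive fixed-length windows."""
    windows: Dict[int, List[str]] = defaultdict(list)
    for ts, line in events:
        windows[int(ts // interval)].append(line)
    # Return the batches in time order; empty windows are simply absent.
    return [windows[k] for k in sorted(windows)]


def process_batch(batch: List[str], min_load: float = 0.5) -> Dict[str, List[float]]:
    """Apply map -> filter -> group-by to one batch of 'host,load' lines."""
    parsed = (line.split(",") for line in batch)                 # map
    typed = ((host, float(load)) for host, load in parsed)       # map
    high = ((host, load) for host, load in typed if load > min_load)  # filter
    grouped: Dict[str, List[float]] = defaultdict(list)
    for host, load in high:                                      # group by host
        grouped[host].append(load)
    return dict(grouped)


if __name__ == "__main__":
    stream = [(0.2, "s1,0.9"), (0.7, "s2,0.1"), (1.1, "s1,0.6"), (1.5, "s2,0.8")]
    for batch in micro_batches(stream, interval=1.0):
        print(process_batch(batch))
```

In Spark Streaming the same pattern would be expressed with the batch interval of the streaming context and transformations on each RDD; the plain-Python version is used here only so that the sketch runs without a cluster.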
The data processing and loading module is implemented on the distributed computing frameworks MapReduce and Spark and includes a data cache module, a data cleaning module, a data conversion module, and a data loading module.
The data cache module in the data processing and loading module consists of HDFS, Hive, and HBase and caches the extracted power consumption information big data. HDFS is responsible for data storage, Hive for data management, and HBase for maintaining data changes.
The data cleaning module in the data processing and loading module performs gap filling, correction, and cleaning of the power consumption information big data. Gap filling completes the missing and mismatched information of incomplete data. Data correction fixes, according to the specific business, erroneous data that was written to the database without logical validation when it was received. Data cleaning designs cleaning rules for dirty data and determines whether the data is correct.
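The three cleaning operations above (gap filling, correction, and cleaning of dirty data) can be sketched as one pass over a list of records. Everything concrete in this sketch is an assumption made for illustration: the field names `meter_id`, `region`, and `reading`, the default values, the normalization rule for meter IDs, and the valid reading range are not taken from the patent.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[object]]


def clean_records(records: List[Record],
                  defaults: Dict[str, object],
                  valid_range: Tuple[float, float]) -> Tuple[List[Record], List[Record]]:
    """Return (cleaned, rejected) records after fill, correct, and clean steps."""
    lo, hi = valid_range
    cleaned: List[Record] = []
    rejected: List[Record] = []
    for rec in records:
        # Gap filling: missing (None) fields fall back to the defaults.
        present = {k: v for k, v in rec.items() if v is not None}
        row: Record = {**defaults, **present}
        # Correction (hypothetical business rule): normalize the meter ID.
        if isinstance(row.get("meter_id"), str):
            row["meter_id"] = row["meter_id"].strip().upper()
        # Cleaning: a reading that is missing or out of range marks dirty data.
        reading = row.get("reading")
        if isinstance(reading, (int, float)) and lo <= reading <= hi:
            cleaned.append(row)
        else:
            rejected.append(row)
    return cleaned, rejected


if __name__ == "__main__":
    raw = [
        {"meter_id": " m01 ", "region": None, "reading": 12.5},
        {"meter_id": "M02", "region": "north", "reading": -3.0},
        {"meter_id": "M03", "region": "south", "reading": None},
    ]
    ok, bad = clean_records(raw, {"region": "unknown"}, (0.0, 10000.0))
    print(len(ok), "cleaned;", len(bad), "rejected")
```

Returning the rejected rows instead of dropping them keeps the dirty data available for later review against the cleaning rules.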
The data conversion module in the data processing and loading module performs inconsistency conversion, data granularity conversion, and standard conversion of the power consumption information big data. Inconsistency conversion integrates the data according to the characteristics and business rules of power consumption information big data and unifies data of the same type that comes from different business systems. Data granularity conversion aggregates the business system data according to the granularity of the target data warehouse. Data standard conversion converts the extracted data into the standard data required by the platform, according to the standard data model formulated for the power consumption information big data cloud platform. Data conversion tasks are implemented on MapReduce and Spark: offline computation over massive power consumption big data is implemented on the MapReduce framework, and real-time computation over big data is implemented on the Spark parallel computing architecture.
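As a hedged illustration of the conversions described above, the following sketch unifies readings that arrive in different units (inconsistency conversion), maps them to one record layout (standard conversion), and aggregates them to daily totals (granularity conversion). The record fields, the unit table, and the daily granularity are assumptions for this example; the patent does not specify them, and a production system would run such logic on MapReduce or Spark rather than in a single process.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Assumed unit table: conversion factors to kilowatt-hours.
UNIT_TO_KWH = {"kWh": 1.0, "Wh": 0.001, "MWh": 1000.0}


def to_standard(record: Dict[str, object]) -> Dict[str, object]:
    """Map one source record to a hypothetical standard layout in kWh."""
    factor = UNIT_TO_KWH[str(record["unit"])]
    return {
        "meter_id": record["meter_id"],
        "day": str(record["timestamp"])[:10],   # ISO date part, YYYY-MM-DD
        "kwh": float(record["value"]) * factor,
    }


def aggregate_daily(records: List[Dict[str, object]]) -> Dict[Tuple[str, str], float]:
    """Aggregate standardized readings to one total per (meter, day)."""
    totals: Dict[Tuple[str, str], float] = defaultdict(float)
    for rec in map(to_standard, records):
        totals[(str(rec["meter_id"]), str(rec["day"]))] += float(rec["kwh"])
    return dict(totals)


if __name__ == "__main__":
    readings = [
        {"meter_id": "M01", "timestamp": "2018-06-08T00:15", "value": 500, "unit": "Wh"},
        {"meter_id": "M01", "timestamp": "2018-06-08T00:30", "value": 1.5, "unit": "kWh"},
        {"meter_id": "M01", "timestamp": "2018-06-09T00:15", "value": 2, "unit": "kWh"},
    ]
    for key, total in sorted(aggregate_daily(readings).items()):
        print(key, round(total, 3))
```

Standardizing each record before aggregating keeps the unit handling in one place, so adding a new source system only requires extending the unit table or the mapping function.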
The data loading module in the data processing and loading module loads the cleaned and converted data into the power consumption information big data cloud platform.
The data cloud platform, that is, the target database for data loading, uses the HDFS distributed file system to uniformly store the different types of massive power consumption information data extracted from the dispersed data sources, and divides the data by type into a production database and an analysis database.
The power consumption information big data cloud platform uses the cloud ETL management device to integrate and effectively manage ETL tools such as Sqoop, Kettle, and Kafka. It adaptively matches the corresponding ETL tool to the Oracle databases and HDFS of the geographically dispersed peripheral systems, such as the power consumption information acquisition system and the front-end communication platform, and to the stream data of the servers. Based on the distributed computing framework MapReduce and the parallel computing framework Spark, it completes the extraction and conversion of massive structured, semi-structured, and unstructured data. This enables efficient execution and orderly management of ETL jobs, provides centralized storage and processing of massive power consumption information big data, and helps to share data and carry out in-depth research based on the mass data in the power consumption information big data cloud platform.
The foregoing describes only preferred embodiments of the present application and is not intended to limit the application. Those skilled in the art may make various modifications and changes to the application. Any modification, equivalent replacement, or improvement made within the spirit and principles of the application shall fall within its scope of protection.
Claims (10)
1. A massive multi-source ETL processing method supporting interface adaptation, characterized by comprising:
A data extraction step: setting the basic information of the data source and the target database, adaptively matching the corresponding ETL tool for each data source, and setting the parameters of the ETL tool; the different data sources are extracted through a database interface, a log file interface, or a stream data interface;
A data conversion step: completing ETL job control, execution, and scheduling management based on the MapReduce and Spark computing frameworks, buffering and managing the extracted data based on HDFS, Hive, or HBase, and completing the cleaning and conversion of the data;
A data loading step: performing a quality check on the converted data objects, outputting them according to the table structure defined by the data model, and updating and loading the data that passes the check into the target database;
A monitoring management step: monitoring and managing the ETL job execution process, job resource usage, and system operation.
2. The massive multi-source ETL processing method supporting interface adaptation according to claim 1, characterized in that the data extraction step comprises:
A sub-step of setting the data source and target database: setting the basic information of the data source and the target database, including the database type, the connection type between the data source and the target database, the database IP, the database name, the port, the user name, and the password;
A sub-step of adaptively matching the ETL tool: adaptively matching the corresponding ETL tool for each data source;
In the sub-step of adaptively matching the ETL tool, if the data source or target database holds database data and one side is the non-relational database HDFS, the ETL tool Sqoop is adaptively matched; otherwise the ETL tool Kettle is adaptively matched; if the data source is a log file, the ETL tool Flume is adaptively matched; if the data source is stream data, the ETL tool Kafka is adaptively matched;
An ETL tool parameter configuration sub-step: setting the tool and task parameters after ETL tool matching is complete.
3. The massive multi-source ETL processing method supporting interface adaptation according to claim 1, characterized in that the data conversion step comprises:
A job flow design sub-step: designing the job control flow according to the actual business logic, including the extraction mode and the ETL business process;
A job scheduling management sub-step, including job scheduling policy, job dependency control, job priority configuration, and job scheduling control, wherein the job scheduling policy includes time-triggered, event-triggered, and real-time processing modes; job dependency control means formulating the dependencies between jobs according to the actual business logic; job priority configuration means formulating the priorities of jobs according to the actual business logic and system resource usage; job scheduling control means setting a resource warning threshold for job scheduling and suspending low-priority jobs when resource usage exceeds the threshold;
A job execution sub-step, responsible for the execution of ETL jobs;
In the job execution sub-step, Sqoop starts a map-only MapReduce job and reads data line by line according to the data split value; Kettle creates a Transformation and a Job and, after the task parameters of each step are set, starts the workflow to extract data; Flume collects log data through its source component, buffers it in the channel component, and sends it to the destination through the sink component; Kafka collects stream data, which is then decomposed into a series of batch jobs and processed in real time by the resilient distributed datasets in Spark;
A distributed cache sub-step: buffering the extracted data, wherein HDFS is responsible for storing the underlying data, Hive is responsible for filtering, summarizing, querying, and analyzing the data, and HBase is responsible for maintaining data changes and stores the data that is read frequently during data conversion computation;
A business rule formulation sub-step: formulating the business rules for data cleaning and conversion according to the actual business rules;
A data processing sub-step: completing the cleaning and conversion of the data according to the formulated business rules, wherein data cleaning completes gap filling, correction, and cleaning of the data, and data conversion completes inconsistency conversion, data granularity conversion, and standard conversion of the data.
4. The massive multi-source ETL processing method supporting interface adaptation according to claim 1, characterized in that the data loading step comprises:
A data quality check sub-step: performing a quality check on the converted data objects, verifying data anomalies caused by network interruption, and checking whether the quality of the converted data meets the standard of the target database;
A data update and load sub-step: loading the data that passes the check into the target database and, according to the predefined data model, updating the target data table by means of timestamps, log tables, full-table comparison, or full-table deletion and insertion.
5. The massive multi-source ETL processing method supporting interface adaptation according to claim 1, characterized in that the monitoring management step comprises:
A job monitoring management sub-step: monitoring the execution process and resource usage of ETL jobs;
The ETL job execution process monitoring sub-step monitors information including job execution time, job progress, timeouts, job interruption, and job backlog; the job execution time is monitored, a timeout alert is set, and job timeout problems are analyzed by manual judgment; the job execution log information is monitored, and when a job is interrupted, job execution is retriggered according to the established interruption recovery mechanism; job progress is monitored, and when jobs pile up, they are queued by job priority and higher-priority jobs are executed first;
The ETL job resource monitoring sub-step monitors the usage of job resources; if the resource load exceeds the threshold, the load is adjusted by suspending or stopping some low-priority jobs, which are executed again once the load falls below the threshold;
A system monitoring management sub-step: monitoring machine hardware information and cluster running state information, and managing metadata and the database interface, log file interface, or stream data interface.
6. A massive multi-source ETL processing system supporting interface adaptation, characterized by comprising:
A data extraction module, which sets the basic information of the data source and the target database, adaptively matches the corresponding ETL tool for each data source, and sets the parameters of the ETL tool; the different data sources are extracted through a database interface, a log file interface, or a stream data interface;
A data conversion module, which completes ETL job execution and scheduling management based on the MapReduce and Spark computing frameworks, buffers and manages the extracted data based on HDFS, Hive, or HBase, and completes the cleaning and conversion of the data;
A data loading module, which performs a quality check on the converted data objects, outputs them according to the table structure defined by the data model, and updates and loads the data that passes the check into the target database;
A monitoring management module, which monitors and manages the ETL job execution process, job resource usage, and system operation.
7. The massive multi-source ETL processing system supporting interface adaptation according to claim 6, characterized in that
the data extraction module comprises:
An interface adapter, which provides data interfaces for the different data types (database data, log files, or stream data) and formulates the ETL tool adaptation rules; the interface adapter further includes an adaptation rule engine;
The data interfaces include a database interface, a log file interface, and a stream data interface, wherein database data is extracted from relational or non-relational databases through the database interface, log files are extracted through the log file interface, and stream data is extracted through the stream data interface;
The adaptation rule engine is used to set the basic information of the data source and the target database, including the database type, the connection type between the data source and the target database, the database IP, the database name, the port, the user name, and the password; the adaptation rules include the decision rules for the data source and target database types and the adaptation rules for the different ETL tools;
An ETL tools engine, which integrates and manages the ETL tools, the ETL tools including Sqoop, Flume, Kettle, and Kafka, and which adaptively matches a suitable ETL tool to the heterogeneous data (database data, log files, and stream data) from different data sources;
In the ETL tools engine, the ETL adaptation engine selects an ETL tool for each data source according to the decision rules for the data source and target database and the adaptation rules for the different ETL tools formulated in the adaptation rule engine of the interface adapter.
8. The massive multi-source ETL processing system supporting interface adaptation according to claim 6, characterized in that
the data conversion module comprises:
A job scheduling management engine, which uses the master node of the distributed cluster as the scheduling management engine and includes a job management unit and a task scheduling unit;
The job management unit covers ETL job design, job configuration, and job monitoring, wherein ETL job design means designing the job control flow, including job dependencies, whether extraction is incremental, and the extraction frequency, according to the actual business logic; job configuration means configuring the job priority and the job execution mode parameters; job monitoring means monitoring and managing the execution state and resource usage of ETL jobs;
The task scheduling unit covers the job scheduling policy and job scheduling control, wherein the job scheduling policy includes time-triggered, event-triggered, and real-time processing modes; job scheduling control means setting a resource warning threshold for job scheduling and suspending low-priority jobs when resource usage exceeds the threshold;
A job execution engine, which is responsible for the execution of ETL jobs, uses the slave nodes of the distributed cluster as the execution engine, implements offline processing of ETL jobs based on the MapReduce computing framework, and implements real-time processing of ETL jobs based on the Spark computing framework;
In the job execution engine, Sqoop starts a map-only MapReduce job and reads data line by line according to the data split value; Kettle creates a Transformation and a Job and, after the task parameters of each step are set, starts the workflow to extract data; Flume collects log data through its source component, buffers it in the channel component, and sends it to the destination through the sink component; Kafka collects stream data, which is then decomposed into a series of batch jobs and processed in real time by the resilient distributed datasets in Spark;
A distributed cache submodule, which buffers and manages the extracted data, wherein HDFS is responsible for storing the underlying data, Hive is responsible for filtering, summarizing, querying, or analyzing the data, and HBase is responsible for maintaining data changes and stores the data that is read frequently during data conversion computation;
A business rule engine, which, according to the actual business rules, formulates cleaning rules for incomplete data, erroneous data, and dirty data, and formulates conversion rules for heterogeneous data from different business systems;
A data processing submodule, including a data cleaning unit and a data conversion unit, which completes the cleaning and conversion of the data according to the actual business rules;
The data cleaning unit completes gap filling, correction, and cleaning of the data, wherein gap filling completes the missing and mismatched information of incomplete data; data correction fixes erroneous data according to the specific business; data cleaning designs cleaning rules for dirty data and determines whether the data is correct;
The data conversion unit converts the massive heterogeneous data from different business systems into the data required by the target database, wherein inconsistency conversion unifies data of the same type from different business systems; data granularity conversion aggregates the extracted business system data according to the granularity of the target database; data standard conversion converts the extracted data into the standard data required by the target database according to the pre-established standard data model.
9. The massive multi-source ETL processing system supporting interface adaptation according to claim 6, characterized in that
the data loading module comprises:
A quality check submodule, which performs a quality check on the converted data objects, verifies data anomalies caused by network interruption, and checks whether the quality of the converted data meets the standard of the target database;
A data update and load submodule, which loads the data that passes the check into the target database and, according to the predefined data model, updates the target data table by means of timestamps, log tables, full-table comparison, or full-table deletion and insertion.
10. The massive multi-source ETL processing system supporting interface adaptation according to claim 6, characterized in that
the monitoring management module comprises:
A job monitoring submodule, which monitors the execution process of ETL jobs, including job execution time, timeouts, job interruption, job backlog, job progress, and job resource usage;
A load balancing submodule, which assesses the resource load according to the execution status of ETL jobs; when the load exceeds the threshold, it adjusts the load by suspending or stopping some low-priority jobs, which are executed again once the load falls below the threshold;
A metadata management submodule: metadata, as data that describes data attribute information, stores the data source definitions, conversion rule definitions, and execution process definitions; the submodule manages the metadata information involved in ETL jobs;
An interface management submodule, which performs interface definition, protocol adaptation, and data encapsulation for the database interface, log file interface, and stream data interface, and supports the HTTP, FTP, REST, and web service interface protocols;
An operation monitoring submodule, which monitors and collects machine hardware information, resource information, load information, cluster component states, and cluster operation information, and perceives and analyzes the operating condition of the cluster in real time.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810588231.7A CN108846076A (en) | 2018-06-08 | 2018-06-08 | The massive multi-source ETL process method and system of supporting interface adaptation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108846076A true CN108846076A (en) | 2018-11-20 |
Family
ID=64210656
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810588231.7A Pending CN108846076A (en) | 2018-06-08 | 2018-06-08 | The massive multi-source ETL process method and system of supporting interface adaptation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108846076A (en) |
Cited By (107)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109246254A (en) * | 2018-11-29 | 2019-01-18 | 国网重庆市电力公司 | The data acquisition communications platform and communication means for supporting large-scale electric energy table directly to adopt |
CN109558400A (en) * | 2018-11-28 | 2019-04-02 | 北京锐安科技有限公司 | Data processing method, device, equipment and storage medium |
CN109656963A (en) * | 2018-12-18 | 2019-04-19 | 深圳前海微众银行股份有限公司 | Metadata acquisition methods, device, equipment and computer readable storage medium |
CN109669977A (en) * | 2018-11-30 | 2019-04-23 | 金蝶软件(中国)有限公司 | Data cut-in method, device, computer equipment and the storage medium of integration across database |
CN109684399A (en) * | 2018-12-24 | 2019-04-26 | 成都四方伟业软件股份有限公司 | Data bank access method, database access device and Data Analysis Platform |
CN109697215A (en) * | 2018-12-14 | 2019-04-30 | 安徽同徽网络技术有限公司 | Collecting method, data collection system and nonvolatile computer storage media |
CN109739851A (en) * | 2019-01-21 | 2019-05-10 | 广东创能科技股份有限公司 | Floating population's big data multi-source acquisition method and system |
CN109753502A (en) * | 2018-12-29 | 2019-05-14 | 山东浪潮商用系统有限公司 | A kind of collecting method based on NiFi |
CN109783314A (en) * | 2018-12-26 | 2019-05-21 | 广州裕鼎信息科技有限公司 | Information technoloy equipment method for managing and monitoring and server |
CN109800220A (en) * | 2019-01-29 | 2019-05-24 | 浙江国贸云商企业服务有限公司 | A kind of big data cleaning method, system and relevant apparatus |
CN109857792A (en) * | 2018-12-24 | 2019-06-07 | 中译语通科技股份有限公司 | A kind of method and system of asynchronous big data cleaning conversion |
CN110032570A (en) * | 2019-04-01 | 2019-07-19 | 江西世恒信息产业有限公司 | A kind of spatial data dynamic update system based on B/S framework |
CN110119422A (en) * | 2019-05-16 | 2019-08-13 | 武汉神算云信息科技有限责任公司 | Small wechat borrows tenant data depot data processing system and equipment |
CN110147356A (en) * | 2019-05-14 | 2019-08-20 | 厦门欢乐逛科技股份有限公司 | Data transmission method and device |
CN110196876A (en) * | 2019-06-05 | 2019-09-03 | 浪潮软件股份有限公司 | A method of it is isolated tool based on web administration and scheduling Kettle |
CN110262945A (en) * | 2019-06-25 | 2019-09-20 | 苏宁消费金融有限公司 | Method for intelligently monitoring a data warehouse scheduling system |
CN110347741A (en) * | 2019-07-18 | 2019-10-18 | 普元信息技术股份有限公司 | System for effectively improving output result data quality in big data processing process and control method thereof |
CN110413404A (en) * | 2019-06-18 | 2019-11-05 | 平安科技(深圳)有限公司 | Priority-based resource allocation method, device, equipment and storage medium |
CN110471968A (en) * | 2019-07-11 | 2019-11-19 | 新华三大数据技术有限公司 | ETL task publishing method, device, equipment and storage medium |
CN110502491A (en) * | 2019-07-25 | 2019-11-26 | 北京神州泰岳智能数据技术有限公司 | Log collection system and its data transmission method and device |
CN110569298A (en) * | 2019-09-12 | 2019-12-13 | 成都中科大旗软件股份有限公司 | Data docking and visualization method and system |
CN110597798A (en) * | 2019-09-17 | 2019-12-20 | 山东爱城市网信息技术有限公司 | Data detection method based on Thrift |
CN110636116A (en) * | 2019-08-29 | 2019-12-31 | 武汉烽火众智数字技术有限责任公司 | Multidimensional data acquisition system and method |
CN110647570A (en) * | 2019-09-20 | 2020-01-03 | 百度在线网络技术(北京)有限公司 | Data processing method and device and electronic equipment |
CN110704502A (en) * | 2019-11-20 | 2020-01-17 | 中电万维信息技术有限责任公司 | Componentized data quality checking method |
CN110716774A (en) * | 2019-08-22 | 2020-01-21 | 华信永道(北京)科技股份有限公司 | Data driving method, system and storage medium for brain of financial business data |
CN110781248A (en) * | 2019-09-27 | 2020-02-11 | 浙江省北大信息技术高等研究院 | Multi-source heterogeneous data acquisition method and device |
CN110880146A (en) * | 2019-11-21 | 2020-03-13 | 上海中信信息发展股份有限公司 | Block chain chaining method, device, electronic equipment and storage medium |
CN110990391A (en) * | 2019-12-04 | 2020-04-10 | 中山市凯能集团有限公司 | Integration method and system of multi-source heterogeneous data, computer equipment and storage medium |
CN110990368A (en) * | 2019-11-29 | 2020-04-10 | 广西电网有限责任公司 | Full-link data management system and management method thereof |
CN110990390A (en) * | 2019-12-02 | 2020-04-10 | 东莞中国科学院云计算产业技术创新与育成中心 | Data cooperative processing method and device, computer equipment and storage medium |
CN111061715A (en) * | 2019-12-16 | 2020-04-24 | 北京邮电大学 | Web and Kafka-based distributed data integration system and method |
CN111061788A (en) * | 2019-11-26 | 2020-04-24 | 江苏瑞中数据股份有限公司 | Multi-source heterogeneous data conversion integration system based on cloud architecture and implementation method thereof |
CN111104214A (en) * | 2019-12-26 | 2020-05-05 | 北京九章云极科技有限公司 | Workflow application method and device |
CN111125209A (en) * | 2019-11-25 | 2020-05-08 | 集奥聚合(北京)人工智能科技有限公司 | Access configuration system supporting multi-element heterogeneous type data |
CN111125230A (en) * | 2019-12-30 | 2020-05-08 | 中电工业互联网有限公司 | Data processing method and system of Internet of things platform based on rule engine |
CN111124679A (en) * | 2019-12-19 | 2020-05-08 | 南京莱斯信息技术股份有限公司 | Time-limited automatic processing method for multi-source heterogeneous mass data |
CN111142942A (en) * | 2019-12-26 | 2020-05-12 | 远景智能国际私人投资有限公司 | Window data processing method and device, server and storage medium |
CN111159268A (en) * | 2019-12-19 | 2020-05-15 | 武汉达梦数据库有限公司 | Method and device for running ETL (extract-transform-load) process in Spark cluster |
CN111324688A (en) * | 2020-02-24 | 2020-06-23 | 南京莱斯网信技术研究院有限公司 | Semi-structured data and unstructured data acquisition system based on events |
CN111400288A (en) * | 2019-01-02 | 2020-07-10 | 中国移动通信有限公司研究院 | Data quality inspection method and system |
CN111460019A (en) * | 2020-04-02 | 2020-07-28 | 中电工业互联网有限公司 | Data conversion method and middleware of heterogeneous data source |
CN111460772A (en) * | 2020-02-28 | 2020-07-28 | 上海维信荟智金融科技有限公司 | Automatic report processing method and system |
CN111506638A (en) * | 2020-03-03 | 2020-08-07 | 浙江大学 | Method for automatically collecting supervision data |
CN111581254A (en) * | 2020-05-03 | 2020-08-25 | 上海维信荟智金融科技有限公司 | ETL method and system based on internet financial data |
CN111639083A (en) * | 2020-04-10 | 2020-09-08 | 新智云数据服务有限公司 | Management system of unified database management method |
CN111666324A (en) * | 2020-05-18 | 2020-09-15 | 新浪网技术(中国)有限公司 | ETL scheduling method and device between relational databases |
CN111721355A (en) * | 2020-05-14 | 2020-09-29 | 中铁第一勘察设计院集团有限公司 | Railway contact net monitoring data acquisition system |
CN111737242A (en) * | 2020-06-19 | 2020-10-02 | 福建南威软件有限公司 | Method for monitoring mass data processing process |
CN111881154A (en) * | 2020-07-29 | 2020-11-03 | 北京浪潮数据技术有限公司 | ETL task processing method, device and related equipment |
CN111882203A (en) * | 2020-07-24 | 2020-11-03 | 山东管理学院 | Traditional Chinese medicine cloud service experimental system |
CN111897865A (en) * | 2020-08-13 | 2020-11-06 | 工银科技有限公司 | Dynamic adjustment method and device for ETL (extract transform load) working load |
CN111966394A (en) * | 2020-08-28 | 2020-11-20 | 珠海格力电器股份有限公司 | ETL-based data analysis method, device, equipment and storage medium |
CN112015724A (en) * | 2019-09-25 | 2020-12-01 | 国网湖北省电力有限公司黄石供电公司 | Method for analyzing metering abnormality of electric power operation data |
CN112035468A (en) * | 2020-08-24 | 2020-12-04 | 杭州览众数据科技有限公司 | Multi-data-source ETL tool based on memory calculation and web visual configuration |
CN112052284A (en) * | 2020-08-26 | 2020-12-08 | 南京越扬科技有限公司 | Main data management method and system under big data |
CN112100227A (en) * | 2020-09-22 | 2020-12-18 | 国网辽宁省电力有限公司电力科学研究院 | Big data processing method based on multilevel heterogeneous data storage |
CN112162754A (en) * | 2020-10-19 | 2021-01-01 | 科技谷(厦门)信息技术有限公司 | Multi-source heterogeneous data processing system |
CN112164430A (en) * | 2020-10-12 | 2021-01-01 | 深圳晶泰科技有限公司 | Data processing method and system for drug research and development |
CN112181959A (en) * | 2020-09-15 | 2021-01-05 | 山东特检鲁安工程技术服务有限公司 | Special equipment multi-source data processing platform and processing method |
CN112307103A (en) * | 2020-10-30 | 2021-02-02 | 山东浪潮通软信息科技有限公司 | Big data rendering method and device and computer readable medium |
CN112434016A (en) * | 2020-12-11 | 2021-03-02 | 上海中通吉网络技术有限公司 | Universal billion-level data heterogeneous migration method, device and equipment |
CN112433998A (en) * | 2020-11-20 | 2021-03-02 | 广东电网有限责任公司佛山供电局 | Multisource heterogeneous data acquisition and convergence system and method based on power system |
CN112434102A (en) * | 2020-11-25 | 2021-03-02 | 深圳前海微众银行股份有限公司 | Data visualization system for multiple data sources |
CN112527885A (en) * | 2020-12-23 | 2021-03-19 | 民生科技有限责任公司 | System and method for data processing based on rule configuration in ETL |
CN112527783A (en) * | 2020-11-27 | 2021-03-19 | 中科曙光南京研究院有限公司 | Data quality probing system based on Hadoop |
CN112565042A (en) * | 2020-12-24 | 2021-03-26 | 航天科工网络信息发展有限公司 | Method for exchanging star-structured data |
CN112579676A (en) * | 2019-09-30 | 2021-03-30 | 北京国双科技有限公司 | Data processing method and device between heterogeneous systems, storage medium and equipment |
CN112597203A (en) * | 2020-12-28 | 2021-04-02 | 恩亿科(北京)数据科技有限公司 | General data monitoring method and system based on big data platform |
CN112632177A (en) * | 2020-12-31 | 2021-04-09 | 中国农业银行股份有限公司 | Data loading operation generation method |
CN112667615A (en) * | 2020-12-25 | 2021-04-16 | 广东电网有限责任公司电力科学研究院 | Data cleaning system and method |
CN112732828A (en) * | 2020-12-22 | 2021-04-30 | 航天信息股份有限公司 | Cross-platform data sharing method based on data warehouse tool |
CN112765121A (en) * | 2021-01-08 | 2021-05-07 | 北京虹信万达科技有限公司 | Administration and application system based on big data service |
CN112860776A (en) * | 2021-01-20 | 2021-05-28 | 山东众阳健康科技集团有限公司 | Method and system for extracting and scheduling various data |
CN112860675A (en) * | 2021-02-06 | 2021-05-28 | 高云 | Big data processing method under online cloud service environment and cloud computing server |
CN112905420A (en) * | 2021-03-04 | 2021-06-04 | 广东电网有限责任公司 | Data monitoring system, method, electronic device and storage medium |
CN112925772A (en) * | 2019-12-06 | 2021-06-08 | 北京沃东天骏信息技术有限公司 | Data dynamic splitting method and device |
CN112925767A (en) * | 2021-03-03 | 2021-06-08 | 浪潮云信息技术股份公司 | Multi-data-source dynamic data synchronization management method and system based on internet supervision |
CN112966031A (en) * | 2019-12-12 | 2021-06-15 | 北京奇艺世纪科技有限公司 | Data processing method and device, electronic equipment and computer readable storage medium |
CN113051329A (en) * | 2021-04-12 | 2021-06-29 | 平安国际智慧城市科技股份有限公司 | Interface-based data acquisition method, device, equipment and storage medium |
CN113111107A (en) * | 2021-04-06 | 2021-07-13 | 创意信息技术股份有限公司 | Data comprehensive access system and method |
CN113111109A (en) * | 2021-04-06 | 2021-07-13 | 创意信息技术股份有限公司 | Interface warehousing analysis access method of data source |
CN113111105A (en) * | 2021-04-06 | 2021-07-13 | 创意信息技术股份有限公司 | Data customized access method and system based on big data |
CN113111111A (en) * | 2021-04-06 | 2021-07-13 | 创意信息技术股份有限公司 | Multi-data source database access method |
CN113177088A (en) * | 2021-04-02 | 2021-07-27 | 北京科技大学 | Multi-scale simulation big data management system for material irradiation damage |
CN113407734A (en) * | 2021-07-14 | 2021-09-17 | 重庆富民银行股份有限公司 | Construction method of knowledge map system based on real-time big data |
CN113407607A (en) * | 2021-06-22 | 2021-09-17 | 中国联合网络通信集团有限公司 | Multi-cloud heterogeneous data processing method and device and electronic equipment |
CN113485894A (en) * | 2021-07-14 | 2021-10-08 | 深信服科技股份有限公司 | Data acquisition method, device and equipment and readable storage medium |
CN113485747A (en) * | 2021-07-08 | 2021-10-08 | 广州钛动科技有限公司 | Data processing method, data processor, target source component and system |
CN113506098A (en) * | 2021-09-10 | 2021-10-15 | 国能信控互联技术有限公司 | Power plant metadata management system and method based on multi-source data |
CN113535835A (en) * | 2021-07-12 | 2021-10-22 | 上海浦东发展银行股份有限公司 | Data acquisition method, device, medium and equipment of kernel data processing software |
CN114064777A (en) * | 2021-11-19 | 2022-02-18 | 杭州雷数科技有限公司 | Configurable method for acquiring data at fixed time, scheduling data, encrypting transmission and visualizing |
CN114064643A (en) * | 2021-11-11 | 2022-02-18 | 南京熊猫电子股份有限公司 | Task type data conversion system based on Oracle |
CN114168672A (en) * | 2021-12-13 | 2022-03-11 | 明觉科技(北京)有限公司 | Log data processing method, device, system and medium |
WO2022077166A1 (en) * | 2020-10-12 | 2022-04-21 | 深圳晶泰科技有限公司 | Data processing method and system for drug research and development |
CN114379608A (en) * | 2021-12-13 | 2022-04-22 | 中铁南方投资集团有限公司 | Multi-source heterogeneous data integration processing method for urban rail transit engineering |
CN114817393A (en) * | 2022-06-24 | 2022-07-29 | 深圳市信联征信有限公司 | Data extraction and cleaning method and device and storage medium |
CN114936245A (en) * | 2022-04-28 | 2022-08-23 | 北京远舢智能科技有限公司 | Method and device for integrating and processing multi-source heterogeneous data |
CN115086303A (en) * | 2022-06-29 | 2022-09-20 | 徐工汉云技术股份有限公司 | Multi-data-source data repeater and design method thereof |
CN115796457A (en) * | 2023-02-03 | 2023-03-14 | 山东铁路投资控股集团有限公司 | Personnel and enterprise rating method and system based on multidimensional data |
CN116016032A (en) * | 2023-01-06 | 2023-04-25 | 广西电子口岸有限公司 | Customs service complex message packaging method |
CN116775737A (en) * | 2023-06-21 | 2023-09-19 | 上海腾道信息技术有限公司 | Method and system for automatically generating ETL configuration |
CN117271648A (en) * | 2023-11-23 | 2023-12-22 | 北京邮电大学 | Adaptation method of bottom data model and storage medium |
CN117312103A (en) * | 2023-11-30 | 2023-12-29 | 山东麦港数据系统有限公司 | Hot-pluggable distributed heterogeneous data source data scheduling processing system |
CN117539605A (en) * | 2024-01-09 | 2024-02-09 | 无锡挚达物联科技有限公司 | Data processing program assembling method, device, equipment and storage medium |
CN117555586A (en) * | 2024-01-11 | 2024-02-13 | 之江实验室 | Algorithm application publishing, managing and scoring method |
CN111966394B (en) * | 2020-08-28 | 2024-05-31 | 珠海格力电器股份有限公司 | ETL-based data analysis method, device, equipment and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160292186A1 (en) * | 2015-03-30 | 2016-10-06 | International Business Machines Corporation | Dynamically maintaining data structures driven by heterogeneous clients in a distributed data collection system |
CN106339509A (en) * | 2016-10-26 | 2017-01-18 | 国网山东省电力公司临沂供电公司 | Power grid operation data sharing system based on large data technology |
CN106611046A (en) * | 2016-12-16 | 2017-05-03 | 武汉中地数码科技有限公司 | Big data technology-based space data storage processing middleware framework |
CN107402976A (en) * | 2017-07-03 | 2017-11-28 | 国网山东省电力公司经济技术研究院 | Power grid multi-source data fusion method and system based on multi-element heterogeneous model |
CN107733986A (en) * | 2017-09-15 | 2018-02-23 | 中国南方电网有限责任公司 | Protection and monitoring operation big data support platform supporting integrated deployment |
2018-06-08 CN application CN201810588231.7A (patent CN108846076A): active, Pending
Cited By (140)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109558400A (en) * | 2018-11-28 | 2019-04-02 | 北京锐安科技有限公司 | Data processing method, device, equipment and storage medium |
CN109558400B (en) * | 2018-11-28 | 2021-04-27 | 北京锐安科技有限公司 | Data processing method, device, equipment and storage medium |
CN109246254A (en) * | 2018-11-29 | 2019-01-18 | 国网重庆市电力公司 | Data acquisition communication platform and communication method supporting direct collection from large-scale electric energy meters |
CN109669977A (en) * | 2018-11-30 | 2019-04-23 | 金蝶软件(中国)有限公司 | Cross-database data access method, device, computer equipment and storage medium |
CN109697215A (en) * | 2018-12-14 | 2019-04-30 | 安徽同徽网络技术有限公司 | Data collection method, data collection system and non-volatile computer storage medium |
CN109656963A (en) * | 2018-12-18 | 2019-04-19 | 深圳前海微众银行股份有限公司 | Metadata acquisition method, device, equipment and computer readable storage medium |
CN109684399A (en) * | 2018-12-24 | 2019-04-26 | 成都四方伟业软件股份有限公司 | Database access method, database access device and data analysis platform |
CN109857792A (en) * | 2018-12-24 | 2019-06-07 | 中译语通科技股份有限公司 | Method and system for asynchronous big data cleaning and conversion |
CN109783314A (en) * | 2018-12-26 | 2019-05-21 | 广州裕鼎信息科技有限公司 | Information technology equipment management and monitoring method and server |
CN109753502A (en) * | 2018-12-29 | 2019-05-14 | 山东浪潮商用系统有限公司 | Data acquisition method based on NiFi |
CN109753502B (en) * | 2018-12-29 | 2023-05-12 | 浪潮软件科技有限公司 | Data acquisition method based on NiFi |
CN111400288A (en) * | 2019-01-02 | 2020-07-10 | 中国移动通信有限公司研究院 | Data quality inspection method and system |
CN109739851A (en) * | 2019-01-21 | 2019-05-10 | 广东创能科技股份有限公司 | Floating population's big data multi-source acquisition method and system |
CN109800220A (en) * | 2019-01-29 | 2019-05-24 | 浙江国贸云商企业服务有限公司 | Big data cleaning method, system and relevant apparatus |
CN110032570A (en) * | 2019-04-01 | 2019-07-19 | 江西世恒信息产业有限公司 | Spatial data dynamic update system based on B/S architecture |
CN110147356A (en) * | 2019-05-14 | 2019-08-20 | 厦门欢乐逛科技股份有限公司 | Data transmission method and device |
CN110119422A (en) * | 2019-05-16 | 2019-08-13 | 武汉神算云信息科技有限责任公司 | Small and micro credit tenant data warehouse data processing system and equipment |
CN110196876A (en) * | 2019-06-05 | 2019-09-03 | 浪潮软件股份有限公司 | Method for managing and scheduling the Kettle extraction tool based on the web |
CN110413404A (en) * | 2019-06-18 | 2019-11-05 | 平安科技(深圳)有限公司 | Priority-based resource allocation method, device, equipment and storage medium |
CN110262945A (en) * | 2019-06-25 | 2019-09-20 | 苏宁消费金融有限公司 | Method for intelligently monitoring a data warehouse scheduling system |
CN110471968A (en) * | 2019-07-11 | 2019-11-19 | 新华三大数据技术有限公司 | ETL task publishing method, device, equipment and storage medium |
CN110347741A (en) * | 2019-07-18 | 2019-10-18 | 普元信息技术股份有限公司 | System for effectively improving output result data quality in big data processing process and control method thereof |
CN110347741B (en) * | 2019-07-18 | 2023-05-05 | 普元信息技术股份有限公司 | System for effectively improving output result data quality in big data processing process and control method thereof |
CN110502491A (en) * | 2019-07-25 | 2019-11-26 | 北京神州泰岳智能数据技术有限公司 | Log collection system and its data transmission method and device |
CN110716774A (en) * | 2019-08-22 | 2020-01-21 | 华信永道(北京)科技股份有限公司 | Data driving method, system and storage medium for brain of financial business data |
CN110636116A (en) * | 2019-08-29 | 2019-12-31 | 武汉烽火众智数字技术有限责任公司 | Multidimensional data acquisition system and method |
CN110636116B (en) * | 2019-08-29 | 2022-05-10 | 武汉烽火众智数字技术有限责任公司 | Multidimensional data acquisition system and method |
CN110569298A (en) * | 2019-09-12 | 2019-12-13 | 成都中科大旗软件股份有限公司 | Data docking and visualization method and system |
CN110569298B (en) * | 2019-09-12 | 2023-03-24 | 成都中科大旗软件股份有限公司 | Data docking and visualization method and system |
CN110597798A (en) * | 2019-09-17 | 2019-12-20 | 山东爱城市网信息技术有限公司 | Data detection method based on Thrift |
CN110597798B (en) * | 2019-09-17 | 2023-08-25 | 浪潮卓数大数据产业发展有限公司 | Data detection method based on Thrift |
CN110647570B (en) * | 2019-09-20 | 2022-04-29 | 百度在线网络技术(北京)有限公司 | Data processing method and device and electronic equipment |
CN110647570A (en) * | 2019-09-20 | 2020-01-03 | 百度在线网络技术(北京)有限公司 | Data processing method and device and electronic equipment |
CN112015724A (en) * | 2019-09-25 | 2020-12-01 | 国网湖北省电力有限公司黄石供电公司 | Method for analyzing metering abnormality of electric power operation data |
CN110781248A (en) * | 2019-09-27 | 2020-02-11 | 浙江省北大信息技术高等研究院 | Multi-source heterogeneous data acquisition method and device |
CN112579676A (en) * | 2019-09-30 | 2021-03-30 | 北京国双科技有限公司 | Data processing method and device between heterogeneous systems, storage medium and equipment |
CN110704502A (en) * | 2019-11-20 | 2020-01-17 | 中电万维信息技术有限责任公司 | Componentized data quality checking method |
CN110880146A (en) * | 2019-11-21 | 2020-03-13 | 上海中信信息发展股份有限公司 | Block chain chaining method, device, electronic equipment and storage medium |
CN111125209A (en) * | 2019-11-25 | 2020-05-08 | 集奥聚合(北京)人工智能科技有限公司 | Access configuration system supporting multi-element heterogeneous type data |
CN111061788B (en) * | 2019-11-26 | 2023-10-13 | 江苏瑞中数据股份有限公司 | Multi-source heterogeneous data conversion integration system based on cloud architecture and implementation method thereof |
CN111061788A (en) * | 2019-11-26 | 2020-04-24 | 江苏瑞中数据股份有限公司 | Multi-source heterogeneous data conversion integration system based on cloud architecture and implementation method thereof |
CN110990368A (en) * | 2019-11-29 | 2020-04-10 | 广西电网有限责任公司 | Full-link data management system and management method thereof |
CN110990390A (en) * | 2019-12-02 | 2020-04-10 | 东莞中国科学院云计算产业技术创新与育成中心 | Data cooperative processing method and device, computer equipment and storage medium |
CN110990390B (en) * | 2019-12-02 | 2024-03-08 | 东莞中国科学院云计算产业技术创新与育成中心 | Data cooperative processing method, device, computer equipment and storage medium |
CN110990391A (en) * | 2019-12-04 | 2020-04-10 | 中山市凯能集团有限公司 | Integration method and system of multi-source heterogeneous data, computer equipment and storage medium |
CN112925772A (en) * | 2019-12-06 | 2021-06-08 | 北京沃东天骏信息技术有限公司 | Data dynamic splitting method and device |
CN112966031A (en) * | 2019-12-12 | 2021-06-15 | 北京奇艺世纪科技有限公司 | Data processing method and device, electronic equipment and computer readable storage medium |
CN111061715B (en) * | 2019-12-16 | 2022-07-01 | 北京邮电大学 | Web and Kafka-based distributed data integration system and method |
CN111061715A (en) * | 2019-12-16 | 2020-04-24 | 北京邮电大学 | Web and Kafka-based distributed data integration system and method |
CN111159268A (en) * | 2019-12-19 | 2020-05-15 | 武汉达梦数据库有限公司 | Method and device for running ETL (extract-transform-load) process in Spark cluster |
CN111124679B (en) * | 2019-12-19 | 2023-11-21 | 南京莱斯信息技术股份有限公司 | Multi-source heterogeneous mass data-oriented time-limited automatic processing method |
CN111124679A (en) * | 2019-12-19 | 2020-05-08 | 南京莱斯信息技术股份有限公司 | Time-limited automatic processing method for multi-source heterogeneous mass data |
CN111159268B (en) * | 2019-12-19 | 2022-01-04 | 武汉达梦数据库股份有限公司 | Method and device for running ETL (extract-transform-load) process in Spark cluster |
CN111142942B (en) * | 2019-12-26 | 2023-08-04 | 远景智能国际私人投资有限公司 | Window data processing method and device, server and storage medium |
CN111142942A (en) * | 2019-12-26 | 2020-05-12 | 远景智能国际私人投资有限公司 | Window data processing method and device, server and storage medium |
CN111104214A (en) * | 2019-12-26 | 2020-05-05 | 北京九章云极科技有限公司 | Workflow application method and device |
CN111125230A (en) * | 2019-12-30 | 2020-05-08 | 中电工业互联网有限公司 | Data processing method and system of Internet of things platform based on rule engine |
CN111324688A (en) * | 2020-02-24 | 2020-06-23 | 南京莱斯网信技术研究院有限公司 | Semi-structured data and unstructured data acquisition system based on events |
CN111460772A (en) * | 2020-02-28 | 2020-07-28 | 上海维信荟智金融科技有限公司 | Automatic report processing method and system |
CN111506638A (en) * | 2020-03-03 | 2020-08-07 | 浙江大学 | Method for automatically collecting supervision data |
CN111460019A (en) * | 2020-04-02 | 2020-07-28 | 中电工业互联网有限公司 | Data conversion method and middleware of heterogeneous data source |
CN111639083A (en) * | 2020-04-10 | 2020-09-08 | 新智云数据服务有限公司 | Management system of unified database management method |
CN111581254A (en) * | 2020-05-03 | 2020-08-25 | 上海维信荟智金融科技有限公司 | ETL method and system based on internet financial data |
CN111721355A (en) * | 2020-05-14 | 2020-09-29 | 中铁第一勘察设计院集团有限公司 | Railway contact net monitoring data acquisition system |
CN111666324A (en) * | 2020-05-18 | 2020-09-15 | 新浪网技术(中国)有限公司 | ETL scheduling method and device between relational databases |
CN111666324B (en) * | 2020-05-18 | 2023-06-27 | 新浪技术(中国)有限公司 | ETL scheduling method and device between relational databases |
CN111737242A (en) * | 2020-06-19 | 2020-10-02 | 福建南威软件有限公司 | Method for monitoring mass data processing process |
CN111882203B (en) * | 2020-07-24 | 2022-12-02 | 山东管理学院 | Traditional Chinese medicine cloud service experimental system |
CN111882203A (en) * | 2020-07-24 | 2020-11-03 | 山东管理学院 | Traditional Chinese medicine cloud service experimental system |
CN111881154A (en) * | 2020-07-29 | 2020-11-03 | 北京浪潮数据技术有限公司 | ETL task processing method, device and related equipment |
CN111897865A (en) * | 2020-08-13 | 2020-11-06 | 工银科技有限公司 | Dynamic adjustment method and device for ETL (extract transform load) working load |
CN112035468A (en) * | 2020-08-24 | 2020-12-04 | 杭州览众数据科技有限公司 | Multi-data-source ETL tool based on memory calculation and web visual configuration |
CN112052284A (en) * | 2020-08-26 | 2020-12-08 | 南京越扬科技有限公司 | Main data management method and system under big data |
CN111966394B (en) * | 2020-08-28 | 2024-05-31 | 珠海格力电器股份有限公司 | ETL-based data analysis method, device, equipment and storage medium |
CN111966394A (en) * | 2020-08-28 | 2020-11-20 | 珠海格力电器股份有限公司 | ETL-based data analysis method, device, equipment and storage medium |
CN112181959A (en) * | 2020-09-15 | 2021-01-05 | 山东特检鲁安工程技术服务有限公司 | Special equipment multi-source data processing platform and processing method |
CN112100227A (en) * | 2020-09-22 | 2020-12-18 | 国网辽宁省电力有限公司电力科学研究院 | Big data processing method based on multilevel heterogeneous data storage |
CN112164430A (en) * | 2020-10-12 | 2021-01-01 | 深圳晶泰科技有限公司 | Data processing method and system for drug research and development |
WO2022077166A1 (en) * | 2020-10-12 | 2022-04-21 | 深圳晶泰科技有限公司 | Data processing method and system for drug research and development |
CN112164430B (en) * | 2020-10-12 | 2024-05-31 | 深圳晶泰科技有限公司 | Data processing method and system for drug development |
CN112162754A (en) * | 2020-10-19 | 2021-01-01 | 科技谷(厦门)信息技术有限公司 | Multi-source heterogeneous data processing system |
CN112307103A (en) * | 2020-10-30 | 2021-02-02 | 山东浪潮通软信息科技有限公司 | Big data rendering method and device and computer readable medium |
CN112433998B (en) * | 2020-11-20 | 2022-01-21 | 广东电网有限责任公司佛山供电局 | Multisource heterogeneous data acquisition and convergence system and method based on power system |
CN112433998A (en) * | 2020-11-20 | 2021-03-02 | 广东电网有限责任公司佛山供电局 | Multisource heterogeneous data acquisition and convergence system and method based on power system |
CN112434102A (en) * | 2020-11-25 | 2021-03-02 | 深圳前海微众银行股份有限公司 | Data visualization system for multiple data sources |
CN112527783A (en) * | 2020-11-27 | 2021-03-19 | 中科曙光南京研究院有限公司 | Data quality probing system based on Hadoop |
CN112527783B (en) * | 2020-11-27 | 2024-05-24 | 中科曙光南京研究院有限公司 | Hadoop-based data quality exploration system |
CN112434016A (en) * | 2020-12-11 | 2021-03-02 | 上海中通吉网络技术有限公司 | Universal billion-level data heterogeneous migration method, device and equipment |
CN112732828A (en) * | 2020-12-22 | 2021-04-30 | 航天信息股份有限公司 | Cross-platform data sharing method based on data warehouse tool |
CN112527885A (en) * | 2020-12-23 | 2021-03-19 | 民生科技有限责任公司 | System and method for data processing based on rule configuration in ETL |
CN112565042A (en) * | 2020-12-24 | 2021-03-26 | 航天科工网络信息发展有限公司 | Method for exchanging star-structured data |
CN112667615A (en) * | 2020-12-25 | 2021-04-16 | 广东电网有限责任公司电力科学研究院 | Data cleaning system and method |
CN112667615B (en) * | 2020-12-25 | 2022-02-15 | 广东电网有限责任公司电力科学研究院 | Data cleaning system and method |
CN112597203A (en) * | 2020-12-28 | 2021-04-02 | 恩亿科(北京)数据科技有限公司 | General data monitoring method and system based on big data platform |
CN112632177A (en) * | 2020-12-31 | 2021-04-09 | 中国农业银行股份有限公司 | Data loading operation generation method |
CN112765121A (en) * | 2021-01-08 | 2021-05-07 | 北京虹信万达科技有限公司 | Administration and application system based on big data service |
CN112860776A (en) * | 2021-01-20 | 2021-05-28 | 山东众阳健康科技集团有限公司 | Method and system for extracting and scheduling various data |
CN112860776B (en) * | 2021-01-20 | 2022-12-06 | 众阳健康科技集团有限公司 | Method and system for extracting and scheduling various data |
CN112860675A (en) * | 2021-02-06 | 2021-05-28 | 高云 | Big data processing method under online cloud service environment and cloud computing server |
CN112925767A (en) * | 2021-03-03 | 2021-06-08 | 浪潮云信息技术股份公司 | Multi-data-source dynamic data synchronization management method and system based on internet supervision |
CN112905420A (en) * | 2021-03-04 | 2021-06-04 | 广东电网有限责任公司 | Data monitoring system, method, electronic device and storage medium |
CN113177088B (en) * | 2021-04-02 | 2023-07-04 | 北京科技大学 | Multi-scale simulation big data management system for material irradiation damage |
CN113177088A (en) * | 2021-04-02 | 2021-07-27 | 北京科技大学 | Multi-scale simulation big data management system for material irradiation damage |
CN113111105A (en) * | 2021-04-06 | 2021-07-13 | 创意信息技术股份有限公司 | Data customized access method and system based on big data |
CN113111109A (en) * | 2021-04-06 | 2021-07-13 | 创意信息技术股份有限公司 | Interface warehousing analysis access method of data source |
CN113111107B (en) * | 2021-04-06 | 2023-10-13 | 创意信息技术股份有限公司 | Data comprehensive access system and method |
CN113111107A (en) * | 2021-04-06 | 2021-07-13 | 创意信息技术股份有限公司 | Data comprehensive access system and method |
CN113111111A (en) * | 2021-04-06 | 2021-07-13 | 创意信息技术股份有限公司 | Multi-data source database access method |
CN113051329A (en) * | 2021-04-12 | 2021-06-29 | 平安国际智慧城市科技股份有限公司 | Interface-based data acquisition method, device, equipment and storage medium |
CN113051329B (en) * | 2021-04-12 | 2024-03-15 | 平安国际智慧城市科技股份有限公司 | Data acquisition method, device, equipment and storage medium based on interface |
CN113407607A (en) * | 2021-06-22 | 2021-09-17 | 中国联合网络通信集团有限公司 | Multi-cloud heterogeneous data processing method and device and electronic equipment |
CN113407607B (en) * | 2021-06-22 | 2023-06-27 | 中国联合网络通信集团有限公司 | Multi-cloud heterogeneous data processing method and device and electronic equipment |
CN113485747A (en) * | 2021-07-08 | 2021-10-08 | 广州钛动科技有限公司 | Data processing method, data processor, target source component and system |
CN113535835A (en) * | 2021-07-12 | 2021-10-22 | 上海浦东发展银行股份有限公司 | Data acquisition method, device, medium and equipment of kernel data processing software |
CN113407734B (en) * | 2021-07-14 | 2023-05-19 | 重庆富民银行股份有限公司 | Method for constructing knowledge graph system based on real-time big data |
CN113485894A (en) * | 2021-07-14 | 2021-10-08 | 深信服科技股份有限公司 | Data acquisition method, device and equipment and readable storage medium |
CN113407734A (en) * | 2021-07-14 | 2021-09-17 | 重庆富民银行股份有限公司 | Construction method of knowledge map system based on real-time big data |
CN113506098A (en) * | 2021-09-10 | 2021-10-15 | 国能信控互联技术有限公司 | Power plant metadata management system and method based on multi-source data |
CN114064643A (en) * | 2021-11-11 | 2022-02-18 | 南京熊猫电子股份有限公司 | Task type data conversion system based on Oracle |
CN114064777A (en) * | 2021-11-19 | 2022-02-18 | 杭州雷数科技有限公司 | Configurable method for acquiring data at fixed time, scheduling data, encrypting transmission and visualizing |
CN114168672A (en) * | 2021-12-13 | 2022-03-11 | 明觉科技(北京)有限公司 | Log data processing method, device, system and medium |
CN114379608A (en) * | 2021-12-13 | 2022-04-22 | 中铁南方投资集团有限公司 | Multi-source heterogeneous data integration processing method for urban rail transit engineering |
CN114168672B (en) * | 2021-12-13 | 2022-09-23 | 明觉科技(北京)有限公司 | Log data processing method, device, system and medium |
CN114936245A (en) * | 2022-04-28 | 2022-08-23 | 北京远舢智能科技有限公司 | Method and device for integrating and processing multi-source heterogeneous data |
CN114817393A (en) * | 2022-06-24 | 2022-07-29 | 深圳市信联征信有限公司 | Data extraction and cleaning method and device and storage medium |
CN114817393B (en) * | 2022-06-24 | 2022-09-16 | 深圳市信联征信有限公司 | Data extraction and cleaning method and device and storage medium |
CN115086303A (en) * | 2022-06-29 | 2022-09-20 | 徐工汉云技术股份有限公司 | Multi-data-source data repeater and design method thereof |
CN115086303B (en) * | 2022-06-29 | 2024-05-17 | 徐工汉云技术股份有限公司 | Multi-data source data repeater and design method thereof |
CN116016032A (en) * | 2023-01-06 | 2023-04-25 | 广西电子口岸有限公司 | Customs service complex message packaging method |
CN116016032B (en) * | 2023-01-06 | 2023-08-11 | 广西电子口岸有限公司 | Customs service message packaging method |
CN115796457A (en) * | 2023-02-03 | 2023-03-14 | 山东铁路投资控股集团有限公司 | Personnel and enterprise rating method and system based on multidimensional data |
CN116775737A (en) * | 2023-06-21 | 2023-09-19 | 上海腾道信息技术有限公司 | Method and system for automatically generating ETL configuration |
CN116775737B (en) * | 2023-06-21 | 2024-04-30 | 上海腾道信息技术有限公司 | Method and system for automatically generating ETL configuration |
CN117271648A (en) * | 2023-11-23 | 2023-12-22 | 北京邮电大学 | Adaptation method of bottom data model and storage medium |
CN117312103B (en) * | 2023-11-30 | 2024-03-01 | 山东麦港数据系统有限公司 | Hot-pluggable distributed heterogeneous data source data scheduling processing system |
CN117312103A (en) * | 2023-11-30 | 2023-12-29 | 山东麦港数据系统有限公司 | Hot-pluggable distributed heterogeneous data source data scheduling processing system |
CN117539605B (en) * | 2024-01-09 | 2024-03-19 | 无锡挚达物联科技有限公司 | Data processing program assembling method, device, equipment and storage medium |
CN117539605A (en) * | 2024-01-09 | 2024-02-09 | 无锡挚达物联科技有限公司 | Data processing program assembling method, device, equipment and storage medium |
CN117555586B (en) * | 2024-01-11 | 2024-03-22 | 之江实验室 | Algorithm application publishing, managing and scoring method |
CN117555586A (en) * | 2024-01-11 | 2024-02-13 | 之江实验室 | Algorithm application publishing, managing and scoring method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108846076A (en) | The massive multi-source ETL process method and system of supporting interface adaptation | |
CN103152393B (en) | Charging method and charging system for cloud computing | |
CN104639374B (en) | Application deployment management system | |
CN105608758B (en) | Big data analysis platform device and method based on algorithm configuration and distributed stream computing | |
CN202058147U (en) | Distribution type real-time database management system | |
CN105005570B (en) | Massive intelligent power data mining method and device based on cloud computing | |
CN105893628A (en) | Real-time data collection system and method | |
CN107733986A (en) | Support the protection of integrated deployment and monitoring operation big data support platform | |
CN103235820B (en) | Data storage method and device in a cluster system | |
JP2015146183A (en) | Managing big data in process control systems | |
KR20150112357A (en) | Sensor data processing system and method thereof | |
CN103973815A (en) | Method for unified monitoring of storage environment across data centers | |
CN102955977A (en) | Energy efficiency service method and energy efficiency service platform adopting same on basis of cloud technology | |
CN104376365A (en) | Method for constructing information system running rule libraries on basis of association rule mining | |
CN106201754A (en) | Task information analysis method and device | |
CN105577411B (en) | Cloud service monitoring method and device based on service origin | |
CN109542057A (en) | Novel maintenance model and its construction method based on virtual Machine Architecture | |
CN107612984B (en) | Big data platform based on internet | |
CN101072130A (en) | Network performance measuring method and system | |
CN103795575A (en) | Multi-data-centre-oriented system monitoring method | |
CN112365366A (en) | Micro-grid management method and system based on intelligent 5G slice | |
CN114448094A (en) | Data sharing system based on platform area intelligent service terminal edge calculation | |
Kalim et al. | Henge: Intent-driven multi-tenant stream processing | |
CN115391444A (en) | Heterogeneous data acquisition and interaction method, device, equipment and storage medium | |
CN103945005A (en) | Multiple evaluation indexes based dynamic load balancing framework |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 2018-11-20 |