CN110619014A - ETL-based data extraction method - Google Patents
ETL-based data extraction method
- Publication number
- CN110619014A (application CN201910882117.XA)
- Authority
- CN
- China
- Prior art keywords
- data
- processing
- data extraction
- etl
- extraction
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention relates to the field of data processing, and in particular to an ETL-based data extraction method that extracts enterprise project visualization management and control data efficiently and reliably. In a distributed, integrated environment that combines multi-system and multi-source extraction requirements, a message notification mechanism for data extraction and data processing, and data extraction processing, the invention achieves the following technical effects: 1. Awareness of extraction and analysis processing for project visualization management and control data: rules can be flexibly configured or extended to support dynamically aware message delivery for data extracted from different business systems and different relational data. 2. Resource scheduling for extraction and analysis processing of project visualization management and control data: when extraction and processing take place in a complex physical environment, data extraction resources are scheduled dynamically across a multi-node distributed cluster environment.
Description
Technical Field
The invention relates to the field of data processing, in particular to a data extraction method based on ETL.
Background
When an enterprise carries out project visualization management and control, its business systems differ, come from different suppliers and use a variety of databases, and the main purpose of a traditional business system is to add, modify and delete content. As a result, theme data often cannot be displayed in a unified visual form across project dimensions, time dimensions, multiple departments and data shared among multiple business systems. General data extraction technology only handles multi-service data access and simple time series, and lacks a complete notification mechanism for data extraction processing.
Existing techniques for extracting project visualization management and control data provide data extraction definitions and time series respectively, and solve the problem of unified extraction into a unified data repository in a multi-service-system, multi-data environment. However, they provide neither a message mechanism nor a data scheduling method that guarantees data extraction under normal business processes, such as a message notification mechanism for data extraction and data analysis or a distributed cluster data processing environment, and they provide no exception handling measures for when erroneous data occurs.
Disclosure of Invention
To address the above problems, the invention provides an ETL-based data extraction method that extracts enterprise project visualization management and control data efficiently and reliably.
The technical solution of the invention comprises the following steps:
S1, configuring extraction parameters for the target data to be extracted;
S2, writing the SQL extraction command;
S3, extracting and processing the target data, and integrating it into a base library;
S4, completing data extraction into the base library and generating a message notification;
S5, the data processing bus monitoring the queue of received message notifications and performing data processing calculation;
S6, completing the data processing calculation, the data then entering a project visualization theme database;
S7, displaying the data in the theme database as a project visualization.
In step S1,
the data extraction parameters include: basic information about the source data, the access mode for extracting data, and the frequency of data extraction;
the basic information about the source data includes basic information such as the database type of the source data;
the access modes for extracting data include a Web Service interface, a REST interface, a database interface, online filling, file upload and batch import;
the frequency of data extraction is set by creating a job task that runs at the required time interval.
In step S3,
the target data is processed after extraction, the processing including filtering, cleaning, format conversion, desensitization, decryption and analysis; data definitions, data structures, data identifiers, data codes, data catalogs, sources, conversion relationships, quality levels, dependency relationships and security permission content are also managed.
The filtering includes filtering out incomplete data, erroneous data and duplicate data before the data is integrated into the base library.
In step S5, the data processing calculation includes the following steps:
1) establishing a list of all data extraction computing resources in the data extraction front-end processor;
2) when data extraction is initialized, initializing the maximum number of threads in the thread pool of the computing resources;
3) converting the received message notifications into a message processing queue to await processing;
4) monitoring the computing resource thread pool and checking whether an idle processing thread exists, waiting until one does;
5) when idle processing threads exist, selecting the resource with the most idle threads to process the message processing queue;
6) after the calculation is completed, releasing the processing resources occupied by the thread.
In a distributed, integrated environment that combines multi-system and multi-source extraction requirements, a message notification mechanism for data extraction and data processing, and data extraction processing, the invention achieves the following technical effects:
1. Awareness of extraction and analysis processing for project visualization management and control data: rules can be flexibly configured or extended to support dynamically aware message delivery for data extracted from different business systems and different relational data;
2. Resource scheduling for extraction and analysis processing of project visualization management and control data: when extraction and processing take place in a complex physical environment, data extraction resources are scheduled dynamically across a multi-node distributed cluster environment.
Drawings
Fig. 1 is a flowchart of the invention;
Fig. 2 is a flowchart of the data processing calculation in step S5.
Detailed Description
As shown in Figs. 1 and 2, the invention comprises the following steps:
S1, configuring extraction parameters for the target data to be extracted;
the data extraction parameters include: basic information about the source data, the access mode for extracting data, and the frequency of data extraction.
The basic information about the source data includes basic information such as the database type of the source data; the access modes for extracting data include a Web Service interface, a REST interface, a database interface, online filling, file upload and batch import; the frequency of data extraction is, for example, 10 minutes, 1 hour, 2 hours or 1 day, and a job task is created to run at the required interval.
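The parameters of step S1 can be pictured as a small configuration record. The following is a minimal sketch only: the class names `AccessMode` and `ExtractionConfig`, the field names, and the example values `erp` and `mysql` are hypothetical and do not come from the patent, which does not specify a data model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum


class AccessMode(Enum):
    # The six access modes listed in step S1.
    WEB_SERVICE = "web_service"
    REST = "rest"
    DATABASE = "database"
    ONLINE_FILLING = "online_filling"
    FILE_UPLOAD = "file_upload"
    BATCH_IMPORT = "batch_import"


@dataclass(frozen=True)
class ExtractionConfig:
    source_name: str         # hypothetical label for the business system
    database_type: str       # basic information about the source data
    access_mode: AccessMode  # how the source data is reached
    frequency: timedelta     # e.g. 10 minutes, 1 hour, 1 day

    def next_run(self, last_run: datetime) -> datetime:
        """Return the time at which the next job task should fire."""
        return last_run + self.frequency


config = ExtractionConfig(
    source_name="erp",
    database_type="mysql",
    access_mode=AccessMode.DATABASE,
    frequency=timedelta(hours=1),
)
print(config.next_run(datetime(2019, 9, 18, 8, 0)))  # 2019-09-18 09:00:00
```

Keeping the frequency as a `timedelta` lets one record describe any of the example intervals without a separate unit field.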
S2, writing the SQL extraction command; SQL scripts can be used for batch and scheduled extraction.
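One way to picture a batch, scheduled SQL extraction is an incremental query that selects only rows changed since the previous run and reads them in fixed-size batches. This is a sketch under assumptions: the table `project_cost`, its columns, and the `updated_at` watermark are illustrative and not described in the patent, and an in-memory SQLite database stands in for the source business system.

```python
import sqlite3

# In-memory stand-in for a source business database (assumption).
source = sqlite3.connect(":memory:")
source.execute(
    "CREATE TABLE project_cost (id INTEGER PRIMARY KEY, project TEXT, "
    "amount REAL, updated_at TEXT)"
)
source.executemany(
    "INSERT INTO project_cost (project, amount, updated_at) VALUES (?, ?, ?)",
    [
        ("A", 100.0, "2019-09-18 07:30:00"),
        ("B", 250.0, "2019-09-18 08:15:00"),
        ("C", 80.0, "2019-09-18 08:45:00"),
    ],
)

# Extraction command: only rows changed since the previous scheduled run.
EXTRACT_SQL = (
    "SELECT id, project, amount, updated_at FROM project_cost "
    "WHERE updated_at > ? ORDER BY id"
)


def extract_batches(conn, since, batch_size=2):
    """Yield lists of rows changed after `since`, at most `batch_size` each."""
    cursor = conn.execute(EXTRACT_SQL, (since,))
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield rows


batches = list(extract_batches(source, "2019-09-18 08:00:00", batch_size=1))
```

Passing the watermark as a bound parameter rather than formatting it into the SQL string keeps the command reusable across scheduled runs.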
S3, extracting and processing the target data, and integrating it into a base library;
the target data is processed after extraction, the processing including filtering, cleaning, format conversion, desensitization, decryption and analysis; data definitions, data structures, data identifiers, data codes, data catalogs, sources, conversion relationships, quality levels, dependency relationships and security permission content are also managed;
the filtering includes filtering out dirty and useless data, such as incomplete data, erroneous data and duplicate data, before the data is integrated into the base library; data extraction can use real-time synchronization, scheduled synchronization or manual synchronization.
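The filtering and desensitization in step S3 can be sketched as follows. The rules here are assumptions chosen for illustration: a record counts as incomplete if a required field is missing, as erroneous if `amount` is non-numeric or negative, and as a duplicate if its `id` was already seen. The field names and the phone-masking rule are hypothetical; the patent does not define them.

```python
def clean(records, required=("id", "project", "amount")):
    """Drop incomplete, erroneous and duplicate records.

    Returns (kept, rejected), where rejected pairs each record with a reason.
    """
    kept, rejected, seen = [], [], set()
    for rec in records:
        if any(rec.get(field) is None for field in required):
            rejected.append((rec, "incomplete"))
        elif not isinstance(rec["amount"], (int, float)) or rec["amount"] < 0:
            rejected.append((rec, "error"))
        elif rec["id"] in seen:
            rejected.append((rec, "duplicate"))
        else:
            seen.add(rec["id"])
            kept.append(rec)
    return kept, rejected


def mask_phone(value):
    """Desensitize a phone number by masking all but the last four digits."""
    if len(value) <= 4:
        return "*" * len(value)
    return "*" * (len(value) - 4) + value[-4:]


records = [
    {"id": 1, "project": "A", "amount": 100.0, "phone": "13800001234"},
    {"id": 2, "project": None, "amount": 5.0, "phone": "13800005678"},
    {"id": 3, "project": "C", "amount": -1, "phone": "13800009999"},
    {"id": 1, "project": "A", "amount": 100.0, "phone": "13800001234"},
]
kept, rejected = clean(records)
# Rows bound for the base library, with the sensitive field masked.
base_rows = [dict(rec, phone=mask_phone(rec["phone"])) for rec in kept]
```

Returning the rejected records with a reason, instead of silently dropping them, leaves room for the exception handling that the Background section says earlier techniques lacked.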
S4, completing data extraction into the base library and generating a message notification;
S5, the data processing bus monitoring the queue of received message notifications and performing data processing calculation;
when data extraction for project visualization management and control finishes and the monitor detects that one type of index data is complete, a message notification carrying an identifier of that type is created and pushed to the data processing bus of the data extraction front-end processor, which monitors the queue of received messages. To guard against abnormal conditions during data processing, the message task queue is persisted, so that the message queue can be recovered after an extreme abnormal condition (such as an unexpected power failure).
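A persisted message queue of this kind can be sketched with a small SQLite table: each notification is written to disk when published and marked done only after processing, so unprocessed messages are still there after a restart. This is an illustrative sketch, not the patent's implementation; the class `DurableQueue`, the message fields `index_type` and `rows`, and the use of SQLite are all assumptions. A clean close stands in for a crash here, and real power-failure safety would also depend on the storage engine's durability settings.

```python
import json
import os
import sqlite3
import tempfile


class DurableQueue:
    """Message queue persisted in SQLite so pending messages survive a restart."""

    def __init__(self, path):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            "id INTEGER PRIMARY KEY AUTOINCREMENT, body TEXT NOT NULL, "
            "done INTEGER NOT NULL DEFAULT 0)"
        )
        self.conn.commit()

    def publish(self, index_type, row_count):
        """Store a notification identifying which type of index data is ready."""
        body = json.dumps({"index_type": index_type, "rows": row_count})
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute("INSERT INTO messages (body) VALUES (?)", (body,))

    def pending(self):
        """Return unprocessed messages in publication order."""
        rows = self.conn.execute(
            "SELECT id, body FROM messages WHERE done = 0 ORDER BY id"
        ).fetchall()
        return [(msg_id, json.loads(body)) for msg_id, body in rows]

    def ack(self, msg_id):
        """Mark a message as processed."""
        with self.conn:
            self.conn.execute("UPDATE messages SET done = 1 WHERE id = ?", (msg_id,))

    def close(self):
        self.conn.close()


path = os.path.join(tempfile.mkdtemp(), "queue.db")
queue = DurableQueue(path)
queue.publish("project_cost", 2)
queue.publish("project_schedule", 5)
queue.ack(queue.pending()[0][0])  # first message processed
queue.close()                     # simulated restart

recovered = DurableQueue(path).pending()  # only the unprocessed message remains
```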
The data processing calculation specifically comprises the following steps:
1) establishing a list of all data extraction computing resources in the data extraction front-end processor;
2) when data extraction is initialized, initializing the maximum number of threads in the thread pool of the computing resources; the maximum number of threads is determined by the application program according to the computing resource configuration list and the computing resource list.
3) converting the received message notifications into a message processing queue to await processing; the message processing queue is persisted, so that it can be restored when data processing is abnormal.
4) monitoring the computing resource thread pool and checking whether an idle processing thread exists, waiting until one does;
5) when idle processing threads exist, selecting the resource with the most idle threads to process the message processing queue;
6) after the calculation is completed, releasing the processing resources occupied by the thread.
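The six steps above can be sketched with one thread pool per computing resource and a condition variable that tracks idle threads. This is a minimal single-process sketch under assumptions: the classes `ComputeResource` and `Scheduler`, the node names, and the doubling task are hypothetical, and a real distributed cluster would track idle capacity across machines rather than in local memory.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class ComputeResource:
    """One entry in the computing resource list, with its own thread pool."""

    def __init__(self, name, max_threads):
        self.name = name
        self.max_threads = max_threads  # step 2: maximum thread number
        self.busy = 0
        self.pool = ThreadPoolExecutor(max_workers=max_threads)


class Scheduler:
    def __init__(self, resources):
        self.resources = list(resources)  # step 1: resource list
        self.cond = threading.Condition()

    def _idle(self, res):
        return res.max_threads - res.busy

    def submit(self, task, message):
        """Run task(message) on the resource with the most idle threads."""
        with self.cond:
            # Step 4: block until at least one resource has an idle thread.
            while all(self._idle(r) == 0 for r in self.resources):
                self.cond.wait()
            # Step 5: prefer the resource with the most idle threads.
            res = max(self.resources, key=self._idle)
            res.busy += 1

        def run():
            try:
                return res.name, task(message)
            finally:
                # Step 6: release the thread once the calculation completes.
                with self.cond:
                    res.busy -= 1
                    self.cond.notify_all()

        return res.pool.submit(run)

    def shutdown(self):
        for res in self.resources:
            res.pool.shutdown(wait=True)


scheduler = Scheduler([ComputeResource("node-a", 1), ComputeResource("node-b", 3)])
# Step 3: each queued message is handed to the scheduler in turn.
futures = [scheduler.submit(lambda m: m["rows"] * 2, {"rows": n}) for n in range(5)]
results = [f.result() for f in futures]
scheduler.shutdown()
```

Which node handles each later message depends on timing, but the first message always goes to `node-b`, since it starts with three idle threads against one on `node-a`.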
S6, completing the data processing calculation, the data then entering a project visualization theme database; the theme database is a database built according to current viewing and analysis requirements.
S7, displaying the data in the theme database as a project visualization.
The following points also apply to the disclosure of the present application:
(1) the drawings of the disclosed embodiments relate only to the structures involved in those embodiments; other structures can follow general designs;
(2) where no conflict arises, the embodiments disclosed in this application and the features of those embodiments can be combined with each other to arrive at new embodiments;
the above are only embodiments of the present disclosure, but the scope of protection of the disclosure is not limited thereto and shall be determined by the scope of the claims.
Claims (5)
1. An ETL-based data extraction method, characterized by comprising the following steps:
S1, configuring extraction parameters for the target data;
S2, writing the SQL extraction command;
S3, extracting and processing the target data, and integrating it into a base library;
S4, completing data extraction into the base library and generating a message notification;
S5, the data processing bus monitoring the queue of received message notifications and performing data processing calculation;
S6, completing the data processing calculation, the data then entering a project visualization theme database;
S7, displaying the data in the theme database as a project visualization.
2. The ETL-based data extraction method as claimed in claim 1, wherein, in step S1,
the extraction parameters include: basic information about the source data, the access mode for extracting data, and the frequency of data extraction;
the basic information about the source data includes the database type of the source data;
the access modes for extracting data include a Web Service interface, a REST interface, a database interface, online filling, file upload and batch import;
the frequency of data extraction is set by creating a job task that runs at the required time interval.
3. The ETL-based data extraction method as claimed in claim 1, wherein, in step S3,
the processing after the target data is extracted includes filtering, cleaning, format conversion, desensitization, decryption and analysis; data definitions, data structures, data identifiers, data codes, data catalogs, sources, conversion relationships, quality levels, dependency relationships and security permission content are also managed.
4. The ETL-based data extraction method of claim 3, wherein the filtering includes filtering out incomplete data, erroneous data and duplicate data before the data is integrated into the base library.
5. The ETL-based data extraction method of claim 1, wherein, in step S5, the data processing calculation comprises the following steps:
1) establishing a list of all data extraction computing resources in the data extraction front-end processor;
2) when data extraction is initialized, initializing the maximum number of threads in the thread pool of the computing resources;
3) converting the received message notifications into a message processing queue to await processing;
4) monitoring the computing resource thread pool and checking whether an idle processing thread exists, waiting until one does;
5) when idle processing threads exist, selecting the resource with the most idle threads to process the message processing queue;
6) after the calculation is completed, releasing the processing resources occupied by the thread.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910882117.XA CN110619014A (en) | 2019-09-18 | 2019-09-18 | ETL-based data extraction method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910882117.XA CN110619014A (en) | 2019-09-18 | 2019-09-18 | ETL-based data extraction method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110619014A (en) | 2019-12-27 |
Family
ID=68923520
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910882117.XA Withdrawn CN110619014A (en) | 2019-09-18 | 2019-09-18 | ETL-based data extraction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110619014A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111767327A (en) * | 2020-05-14 | 2020-10-13 | 中邮消费金融有限公司 | Data warehouse component method and system with dependency relationship among data streams |
CN113127919A (en) * | 2019-12-30 | 2021-07-16 | 航天信息股份有限公司 | Data processing method and device, computing equipment and storage medium |
CN113158233A (en) * | 2021-03-29 | 2021-07-23 | 重庆首亨软件股份有限公司 | Data preprocessing method and device and computer storage medium |
CN113986909A (en) * | 2021-12-24 | 2022-01-28 | 畅捷通信息技术股份有限公司 | Real-time data synchronization method, system and medium for reversely recording synchronization state |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108062407A (en) * | 2017-12-28 | 2018-05-22 | 成都飞机工业(集团)有限责任公司 | A kind of project visualizes management and control data pick-up method |
-
2019
- 2019-09-18 CN CN201910882117.XA patent/CN110619014A/en not_active Withdrawn
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108062407A (en) * | 2017-12-28 | 2018-05-22 | 成都飞机工业(集团)有限责任公司 | A kind of project visualizes management and control data pick-up method |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113127919A (en) * | 2019-12-30 | 2021-07-16 | 航天信息股份有限公司 | Data processing method and device, computing equipment and storage medium |
CN111767327A (en) * | 2020-05-14 | 2020-10-13 | 中邮消费金融有限公司 | Data warehouse component method and system with dependency relationship among data streams |
CN111767327B (en) * | 2020-05-14 | 2021-06-15 | 中邮消费金融有限公司 | Data warehouse construction method and system with dependency relationship among data streams |
CN113158233A (en) * | 2021-03-29 | 2021-07-23 | 重庆首亨软件股份有限公司 | Data preprocessing method and device and computer storage medium |
CN113986909A (en) * | 2021-12-24 | 2022-01-28 | 畅捷通信息技术股份有限公司 | Real-time data synchronization method, system and medium for reversely recording synchronization state |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110619014A (en) | ETL-based data extraction method | |
CN109684053B (en) | Task scheduling method and system for big data | |
CN101477543B (en) | System and method for automating ETL application | |
CN113569987A (en) | Model training method and device | |
CN102467532A (en) | Task processing method and task processing device | |
CN110895506B (en) | Method and system for constructing test data | |
CN113448712A (en) | Task scheduling execution method and device | |
CN110569113A (en) | Method and system for scheduling distributed tasks and computer readable storage medium | |
CN111026602A (en) | Health inspection scheduling management method and device of cloud platform and electronic equipment | |
US11016736B2 (en) | Constraint programming using block-based workflows | |
EP3502872B1 (en) | Pipeline task verification for a data processing platform | |
CN110611707A (en) | Task scheduling method and device | |
CN114416703A (en) | Method, device, equipment and medium for automatically monitoring data integrity | |
CN107145576B (en) | Big data ETL scheduling system supporting visualization and process | |
CN115168457A (en) | Visualization processing method and visualization processing device based on metadata management | |
US7475073B2 (en) | Technique for improving staff queries in a workflow management system | |
CN107153679B (en) | Extraction statistical method and system for semi-structured big data | |
CN111756778A (en) | Server disk cleaning script pushing method and device and storage medium | |
CN108062407A (en) | A kind of project visualizes management and control data pick-up method | |
CN109918363B (en) | Method for carrying out data model consistency management based on view cross-database type | |
CN115185673B (en) | Distributed timing task scheduling method, system, storage medium and program product | |
US20200019910A1 (en) | Block-based prediction for manufacturing environments | |
CN114723080A (en) | Equipment maintenance management method, system, device and storage medium | |
CN111767299A (en) | Database operation method, device and system, storage medium and electronic equipment | |
CN112507013B (en) | Industrial equipment data storage method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | Application publication date: 20191227 |