CN110619014A - ETL-based data extraction method - Google Patents

ETL-based data extraction method

Info

Publication number
CN110619014A
Authority
CN
China
Prior art keywords
data
processing
data extraction
etl
extraction
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201910882117.XA
Other languages
Chinese (zh)
Inventor
吴鹏
章跃俊
潘康
梁晔
金明明
刘耀庭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Power Supply Branch Of Baoying County Jiangsu Electric Power Co Ltd
State Grid Jiangsu Electric Power Co Ltd
Yangzhou Power Supply Co of Jiangsu Electric Power Co
Original Assignee
Power Supply Branch Of Baoying County Jiangsu Electric Power Co Ltd
State Grid Jiangsu Electric Power Co Ltd
Yangzhou Power Supply Co of Jiangsu Electric Power Co
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-09-18
Filing date
2019-09-18
Publication date
2019-12-27
Application filed by Power Supply Branch Of Baoying County Jiangsu Electric Power Co Ltd, State Grid Jiangsu Electric Power Co Ltd, and Yangzhou Power Supply Co of Jiangsu Electric Power Co
Priority to CN201910882117.XA
Publication of CN110619014A
Legal status: Withdrawn (current)


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/254Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An ETL-based data extraction method, relating to the field of data processing. The invention provides an ETL-based data extraction method for efficiently and reliably extracting visual management and control data for enterprise projects. In a distributed, integrated environment with multiple systems and multiple data sources to extract from, and with a message notification mechanism for data extraction and data processing, the invention achieves the following technical effects: 1. awareness in extracting and analyzing project visualization management and control data: rules can be flexibly configured or extended so that messages about data extraction are sensed and transmitted dynamically across different business systems and different relational data sources; 2. resource scheduling in extracting and analyzing project visualization management and control data: when extraction and processing run in a complex physical environment, data extraction resources are scheduled dynamically across a multi-node distributed cluster.

Description

ETL-based data extraction method
Technical Field
The invention relates to the field of data processing, in particular to a data extraction method based on ETL.
Background
When an enterprise performs project visualization management and control, its business systems differ, come from different suppliers and use diverse databases, and traditional business systems are designed mainly for adding, modifying and deleting records. As a result, subject data often cannot be displayed in a unified way across project dimensions, time dimensions, multiple departments, or data shared among multiple business systems. General data extraction techniques handle only access to multi-business data and simple time series, and lack a complete notification mechanism for data extraction processing.
In existing technology for extracting project visualization management and control data, data extraction definitions and time series are provided separately. This solves unified extraction in a multi-service, multi-data environment when data is extracted into a unified data resource library, but it provides no message mechanism or data scheduling method that guarantees data extraction under normal business processes (such as a message notification mechanism for data extraction and analysis, or a distributed cluster data processing environment), and no exception handling measures for when erroneous data occurs.
Disclosure of Invention
To address these problems, the invention provides an ETL-based data extraction method for efficiently and reliably extracting visual management and control data for enterprise projects.
The technical solution of the invention comprises the following steps:
S1, configure the extraction parameters of the target data;
S2, write the SQL extraction command;
S3, extract and process the target data, and integrate it into the basic library;
S4, complete the extraction into the basic library and generate a message notification;
S5, the data processing bus monitors the received message notification queue and performs the data processing calculation;
S6, complete the data processing calculation and load the data into the project visualization theme database;
S7, display the data in the theme database visually by project.
In step S1:
the data extraction parameters include basic information about the source data, the access mode for extracting data, and the extraction frequency;
the basic information about the source data includes, for example, the database type of the source data;
the access modes for extracting data include a Web Service interface, a REST interface, a database interface, online form entry, file upload and batch import;
the extraction frequency sets the time interval at which a job task is created.
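The S1 parameters can be captured in a small configuration type. This is a hypothetical sketch: the field names, the `AccessMode` values, and the use of a `timedelta` for the frequency are assumptions, since the text names only the categories of parameters.

```python
from dataclasses import dataclass
from datetime import timedelta
from enum import Enum


class AccessMode(Enum):
    """Access modes listed in step S1 (value strings are hypothetical)."""
    WEB_SERVICE = "web_service"
    REST = "rest"
    DATABASE = "database"
    ONLINE_ENTRY = "online_entry"
    FILE_UPLOAD = "file_upload"
    BATCH_IMPORT = "batch_import"


@dataclass(frozen=True)
class SourceInfo:
    """Basic information about the source data (hypothetical fields)."""
    name: str
    db_type: str  # e.g. "mysql"; the patent names only the "database type"


@dataclass(frozen=True)
class ExtractionParams:
    """Extraction parameters configured in step S1."""
    source: SourceInfo
    access_mode: AccessMode
    frequency: timedelta  # interval at which a job task is created

    def __post_init__(self) -> None:
        # Reject a zero or negative interval, which could not schedule jobs.
        if self.frequency <= timedelta(0):
            raise ValueError("extraction frequency must be positive")


params = ExtractionParams(
    source=SourceInfo(name="project_ledger", db_type="mysql"),
    access_mode=AccessMode.DATABASE,
    frequency=timedelta(hours=1),
)
print(params.access_mode.value, params.frequency)  # database 1:00:00
```

Freezing the dataclasses keeps a configured job's parameters from changing after its schedule is created.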
In step S3:
after the target data is extracted, it is processed; the processing includes filtering, cleaning, format conversion, desensitization, decryption and analysis. Management also covers data definition, data structure, data identification, data coding, data cataloging, source, transformation relationships, quality level, dependencies and security permissions.
Filtering removes incomplete, erroneous and duplicate data, and the remaining data is integrated into the basic library.
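A minimal sketch of the filtering rule, assuming records arrive as Python dictionaries with a hypothetical schema (`project_id`, `amount`, `date`). What counts as erroneous is business-specific; a non-negative amount check stands in for it here, and duplicates are detected on the required fields.

```python
from typing import Iterable

REQUIRED_FIELDS = ("project_id", "amount", "date")  # hypothetical schema


def is_complete(rec: dict) -> bool:
    """Incomplete data: any required field missing or empty."""
    return all(rec.get(f) not in (None, "") for f in REQUIRED_FIELDS)


def is_valid(rec: dict) -> bool:
    """Erroneous data (hypothetical rule): amount must be a non-negative number."""
    amount = rec["amount"]
    return isinstance(amount, (int, float)) and amount >= 0


def filter_records(records: Iterable[dict]) -> list[dict]:
    """Drop incomplete, erroneous and duplicate records; keep the rest in order."""
    seen: set[tuple] = set()
    kept = []
    for rec in records:
        if not is_complete(rec) or not is_valid(rec):
            continue
        key = tuple(rec[f] for f in REQUIRED_FIELDS)
        if key in seen:  # duplicate of a record already kept
            continue
        seen.add(key)
        kept.append(rec)
    return kept


rows = [
    {"project_id": "P1", "amount": 100, "date": "2019-09-01"},
    {"project_id": "P1", "amount": 100, "date": "2019-09-01"},  # duplicate
    {"project_id": "P2", "amount": None, "date": "2019-09-02"},  # incomplete
    {"project_id": "P3", "amount": -5, "date": "2019-09-03"},  # erroneous
]
print(filter_records(rows))  # only the first record survives
```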
In step S5, the data processing calculation comprises the following steps:
1) establish a list of all data extraction computing resources on the data extraction front-end processor;
2) when data extraction is initialized, set the maximum number of threads in each computing resource's thread pool;
3) convert the received message notifications into a message processing queue awaiting processing;
4) monitor the computing-resource thread pools and check whether an idle processing thread exists, waiting until one becomes available;
5) when idle threads exist, select the resource with the most idle threads to process the message processing queue;
6) after the calculation is complete, release the thread and the processing resources it occupied.
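Steps 1) to 6) can be sketched in a single process with Python's standard `concurrent.futures` and `threading` modules. This is a minimal illustration, not the patented implementation: it assumes each computing resource is modeled as a local thread pool (the patent targets a distributed cluster), the names `ComputeResource`, `Dispatcher` and `handler` are hypothetical, and "more spare threads" is read as "the most idle threads".

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class ComputeResource:
    """A computing resource on the front-end processor (hypothetical model)."""
    name: str
    max_threads: int  # step 2: maximum thread number, fixed at initialization
    busy: int = 0

    @property
    def idle(self) -> int:
        return self.max_threads - self.busy


class Dispatcher:
    """Dispatches queued messages to the resource with the most idle threads."""

    def __init__(self, resources):
        self.resources = list(resources)  # step 1: list of computing resources
        self.pools = {
            r.name: ThreadPoolExecutor(max_workers=r.max_threads)
            for r in self.resources
        }  # step 2: one thread pool per resource, sized to its maximum
        self.messages = queue.Queue()  # step 3: message processing queue
        self.cond = threading.Condition()

    def _acquire(self) -> ComputeResource:
        with self.cond:
            # Step 4: block until at least one resource has an idle thread.
            self.cond.wait_for(lambda: any(r.idle > 0 for r in self.resources))
            # Step 5: choose the resource with the most idle threads.
            best = max(self.resources, key=lambda r: r.idle)
            best.busy += 1
            return best

    def _release(self, res: ComputeResource) -> None:
        with self.cond:
            res.busy -= 1  # step 6: free the thread for other messages
            self.cond.notify_all()

    def run(self, handler):
        """Drain the queue, calling handler(message); return results in queue order."""
        futures = []
        while True:
            try:
                msg = self.messages.get_nowait()
            except queue.Empty:
                break
            res = self._acquire()

            def task(m=msg, r=res):
                try:
                    return handler(m)
                finally:
                    self._release(r)

            futures.append(self.pools[res.name].submit(task))
        return [f.result() for f in futures]

    def close(self) -> None:
        for pool in self.pools.values():
            pool.shutdown()


dispatcher = Dispatcher([ComputeResource("node-a", 2), ComputeResource("node-b", 1)])
for i in range(5):
    dispatcher.messages.put(i)
print(dispatcher.run(lambda m: m * m))  # [0, 1, 4, 9, 16]
dispatcher.close()
```

Because a resource's busy count never exceeds its `max_threads`, every submitted task starts immediately in its pool instead of queueing inside the executor, so the explicit idle-thread check in step 4 is what actually throttles work.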
In a distributed, integrated environment with multiple systems and multiple data sources to extract from, and with a message notification mechanism for data extraction and data processing, the invention achieves the following technical effects:
1. Awareness in extracting and analyzing project visualization management and control data: rules can be flexibly configured or extended so that messages about data extraction are sensed and transmitted dynamically across different business systems and different relational data sources;
2. Resource scheduling in extracting and analyzing project visualization management and control data: when extraction and processing run in a complex physical environment, data extraction resources are scheduled dynamically across a multi-node distributed cluster.
Drawings
Figure 1 is a flowchart of the present invention;
Figure 2 is a flowchart of the data processing calculation in step S5.
Detailed Description
As shown in Figures 1 and 2, the invention comprises the following steps:
S1, configure the extraction parameters of the target data;
the data extraction parameters include basic information about the source data, the access mode for extracting data, and the extraction frequency.
The basic information about the source data includes, for example, the database type of the source data; the access modes for extracting data include a Web Service interface, a REST interface, a database interface, online form entry, file upload and batch import; the extraction frequency is, for example, 1 hour, 2 hours, 10 minutes or 1 day, and a job task is created at the required interval.
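Creating a job task at the required interval amounts to computing the next run time on a fixed grid. A hedged sketch, assuming runs are aligned to a hypothetical anchor time (the patent does not say how runs are aligned) and using the example frequencies above:

```python
from datetime import datetime, timedelta

# Example frequencies from step S1.
FREQUENCIES = {
    "1h": timedelta(hours=1),
    "2h": timedelta(hours=2),
    "10min": timedelta(minutes=10),
    "1d": timedelta(days=1),
}


def next_run(anchor: datetime, interval: timedelta, now: datetime) -> datetime:
    """Return the first scheduled run strictly after `now`.

    Runs occur at anchor, anchor + interval, anchor + 2 * interval, ...
    """
    if now < anchor:
        return anchor
    periods = (now - anchor) // interval + 1  # timedelta // timedelta -> int
    return anchor + periods * interval


anchor = datetime(2019, 9, 18, 0, 0)  # hypothetical alignment point
now = datetime(2019, 9, 18, 9, 25)
print(next_run(anchor, FREQUENCIES["2h"], now))  # 2019-09-18 10:00:00
print(next_run(anchor, FREQUENCIES["10min"], now))  # 2019-09-18 09:30:00
```

Aligning to an anchor rather than to "last run + interval" keeps runs from drifting when a job starts late.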
S2, write the SQL extraction command; SQL scripts can be used for batch and scheduled extraction.
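One common way to make a scheduled SQL extraction incremental is a watermark on a modification timestamp. This is an assumption for illustration; the patent does not give the SQL. The table `project_cost` and column `updated_at` are hypothetical, and Python's built-in `sqlite3` stands in for the source database.

```python
import sqlite3

# Hypothetical extraction command: fetch rows changed since the last run.
EXTRACT_SQL = """
SELECT project_id, amount, updated_at
FROM project_cost
WHERE updated_at > ?
ORDER BY updated_at
"""


def extract_since(conn: sqlite3.Connection, watermark: str):
    """Run the extraction command and return (rows, new_watermark)."""
    rows = conn.execute(EXTRACT_SQL, (watermark,)).fetchall()
    new_mark = rows[-1][2] if rows else watermark
    return rows, new_mark


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE project_cost (project_id TEXT, amount REAL, updated_at TEXT)")
conn.executemany(
    "INSERT INTO project_cost VALUES (?, ?, ?)",
    [("P1", 10.0, "2019-09-18 08:00"), ("P2", 20.0, "2019-09-18 09:00")],
)
rows, mark = extract_since(conn, "2019-09-18 08:30")
print(rows, mark)  # [('P2', 20.0, '2019-09-18 09:00')] 2019-09-18 09:00
```

Timestamps in this fixed-width format compare correctly as text. Rows written later with the same timestamp as the watermark would be skipped by `>`, so a real deployment may need a finer key or overlap plus de-duplication.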
S3, extract and process the target data, and integrate it into the basic library;
after the target data is extracted, it is processed; the processing includes filtering, cleaning, format conversion, desensitization, decryption and analysis. Management also covers data definition, data structure, data identification, data coding, data cataloging, source, transformation relationships, quality level, dependencies and security permissions;
filtering removes dirty and waste data such as incomplete, erroneous and duplicate data, and the remaining data is integrated into the basic library; data extraction can use real-time, scheduled or manual synchronization.
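Format conversion and desensitization from step S3 can be illustrated on a single record. The accepted date formats and the masking rule (keep the first three and last four digits of an 11-digit mobile number) are assumptions chosen for the example, not requirements stated in the patent.

```python
import re
from datetime import datetime

PHONE_RE = re.compile(r"^(\d{3})\d{4}(\d{4})$")  # 11-digit mobile number
DATE_FORMATS = ("%Y-%m-%d", "%Y/%m/%d", "%Y%m%d")  # hypothetical source formats


def mask_phone(phone: str) -> str:
    """Desensitize: keep the first 3 and last 4 digits, mask everything else."""
    m = PHONE_RE.match(phone)
    return f"{m.group(1)}****{m.group(2)}" if m else "*" * len(phone)


def normalize_date(value: str) -> str:
    """Format conversion: accept several source formats, emit ISO 8601."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {value!r}")


def transform(rec: dict) -> dict:
    """Return a converted, desensitized copy; the input record is unchanged."""
    out = dict(rec)
    out["date"] = normalize_date(rec["date"])
    out["contact"] = mask_phone(rec["contact"])
    return out


print(transform({"project_id": "P1", "date": "2019/09/18", "contact": "13812345678"}))
# {'project_id': 'P1', 'date': '2019-09-18', 'contact': '138****5678'}
```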
S4, complete the extraction into the basic library and generate a message notification;
S5, the data processing bus monitors the received message notification queue and performs the data processing calculation;
when extraction of the project visualization management and control data is complete, a listener detects that one type of index data has finished, creates a message notification carrying an identifier of that type, and pushes the message to the data processing bus of the data extraction front-end processor, which monitors the received message queue. The message task queue is persisted to guard against abnormal conditions in data processing, so that the queue can be recovered after extreme failures such as an unexpected power outage.
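The persisted message queue described above can be sketched with SQLite so that completion notifications survive a restart. This is an illustration under assumptions: a real deployment would more likely use a message broker with durable queues, and the `DurableQueue` class, its table layout, and the `cost_index`/`progress_index` type names are hypothetical. Messages are acknowledged only after processing, so anything unacknowledged at a crash is still pending when the queue is reopened.

```python
import json
import os
import sqlite3
import tempfile


class DurableQueue:
    """A minimal persistent FIFO queue backed by SQLite (illustrative only)."""

    def __init__(self, path: str):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS messages ("
            " id INTEGER PRIMARY KEY AUTOINCREMENT,"
            " body TEXT NOT NULL,"
            " done INTEGER NOT NULL DEFAULT 0)"
        )
        self.conn.commit()

    def publish(self, index_type: str, payload: dict) -> int:
        """Persist a completion notification tagged with its index-data type."""
        body = json.dumps({"type": index_type, "payload": payload})
        with self.conn:  # commits on success, rolls back on error
            cur = self.conn.execute("INSERT INTO messages (body) VALUES (?)", (body,))
        return cur.lastrowid

    def pending(self) -> list[tuple[int, dict]]:
        """Unacknowledged messages, oldest first (what a restart must replay)."""
        rows = self.conn.execute(
            "SELECT id, body FROM messages WHERE done = 0 ORDER BY id"
        ).fetchall()
        return [(mid, json.loads(body)) for mid, body in rows]

    def ack(self, msg_id: int) -> None:
        """Mark a message processed only after its calculation succeeds."""
        with self.conn:
            self.conn.execute("UPDATE messages SET done = 1 WHERE id = ?", (msg_id,))

    def close(self) -> None:
        self.conn.close()


path = os.path.join(tempfile.mkdtemp(), "bus.db")
q = DurableQueue(path)
first = q.publish("cost_index", {"project": "P1"})
q.publish("progress_index", {"project": "P1"})
q.ack(first)
q.close()  # simulate a crash or restart

q = DurableQueue(path)
print([m["type"] for _, m in q.pending()])  # ['progress_index']
```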
The specific data processing calculation comprises the following steps:
1) establish a list of all data extraction computing resources on the data extraction front-end processor;
2) when data extraction is initialized, set the maximum number of threads in each computing resource's thread pool; the maximum thread number is determined by the application according to the computing resource configuration list and the computing resource list.
3) Convert the received message notifications into a message processing queue awaiting processing; the message processing queue is persisted so that it can be restored if data processing fails.
4) Monitor the computing-resource thread pools and check whether an idle processing thread exists, waiting until one becomes available;
5) when idle threads exist, select the resource with the most idle threads to process the message processing queue;
6) after the calculation is complete, release the thread and the processing resources it occupied.
S6, complete the data processing calculation and load the data into the project visualization theme database; the theme (subject) database is built according to current viewing and analysis requirements.
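Step S6 can be pictured as an aggregation from the basic library into a theme table shaped for a particular view. The sketch assumes hypothetical tables `base_project_cost` and `subject_project_monthly` and a per-project, per-month rollup, using `sqlite3` as a stand-in; the actual theme structure depends on the viewing and analysis requirements.

```python
import sqlite3

# Hypothetical rollup: total cost and record count per project per month.
BUILD_SUBJECT_SQL = """
INSERT INTO subject_project_monthly (project_id, month, total_amount, record_count)
SELECT project_id, substr(date, 1, 7), SUM(amount), COUNT(*)
FROM base_project_cost
GROUP BY project_id, substr(date, 1, 7)
"""


def build_subject_table(conn: sqlite3.Connection) -> None:
    """Rebuild the theme table from the basic library in one transaction."""
    with conn:
        conn.execute("DELETE FROM subject_project_monthly")  # full rebuild
        conn.execute(BUILD_SUBJECT_SQL)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE base_project_cost (project_id TEXT, date TEXT, amount REAL)")
conn.execute(
    "CREATE TABLE subject_project_monthly "
    "(project_id TEXT, month TEXT, total_amount REAL, record_count INTEGER)"
)
conn.executemany(
    "INSERT INTO base_project_cost VALUES (?, ?, ?)",
    [("P1", "2019-09-01", 10.0), ("P1", "2019-09-15", 5.0),
     ("P1", "2019-10-02", 7.0), ("P2", "2019-09-03", 3.0)],
)
build_subject_table(conn)
rows = conn.execute(
    "SELECT * FROM subject_project_monthly ORDER BY project_id, month"
).fetchall()
print(rows)
# [('P1', '2019-09', 15.0, 2), ('P1', '2019-10', 7.0, 1), ('P2', '2019-09', 3.0, 1)]
```

Rebuilding inside one transaction means a failed calculation leaves the previous theme data in place for display.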
S7, display the data in the theme database visually by project.
The disclosure of the present application also includes the following points:
(1) the drawings of the embodiments disclosed herein show only the structures related to those embodiments; other structures may follow general designs;
(2) where there is no conflict, the embodiments disclosed in this application and their features can be combined with each other to obtain new embodiments.
The above are only specific embodiments of the present disclosure, but the scope of the disclosure is not limited thereto; the scope of the disclosure shall be determined by the scope of the claims.

Claims (5)

1. An ETL-based data extraction method, characterized by comprising the following steps:
S1, configuring extraction parameters of the target data;
S2, writing an SQL extraction command;
S3, extracting and processing the target data, and integrating it into a basic library;
S4, completing the extraction into the basic library and generating a message notification;
S5, monitoring, by a data processing bus, the received message notification queue, and performing data processing calculation;
S6, completing the data processing calculation, and loading the data into a project visualization theme database;
S7, displaying the data in the theme database visually by project.
2. The ETL-based data extraction method according to claim 1, wherein, in step S1,
the extraction parameters comprise: basic information about the source data, an access mode for extracting data, and an extraction frequency;
the basic information about the source data comprises a database type of the source data;
the access mode for extracting data comprises a Web Service interface, a REST interface, a database interface, online form entry, file upload and batch import;
the extraction frequency is set so that a job task is created at the required time interval.
3. The ETL-based data extraction method according to claim 1, wherein, in step S3,
the processing after the target data is extracted comprises filtering, cleaning, format conversion, desensitization, decryption and analysis; and management of data definition, data structure, data identification, data coding, data cataloging, source, transformation relationships, quality level, dependencies and security permissions.
4. The ETL-based data extraction method according to claim 3, wherein the filtering comprises filtering out incomplete data, erroneous data and duplicate data, and integrating the remaining data into the basic library.
5. The ETL-based data extraction method according to claim 1, wherein, in step S5, the data processing calculation comprises the following steps:
1) establishing a list of all data extraction computing resources on a data extraction front-end processor;
2) when data extraction is initialized, setting the maximum number of threads in the thread pool of each computing resource;
3) converting the received message notifications into a message processing queue awaiting processing;
4) monitoring the computing-resource thread pools and determining whether an idle processing thread exists, waiting until one becomes available;
5) when idle threads exist, selecting the resource with the most idle threads to process the message processing queue;
6) after the calculation is complete, releasing the thread and the processing resources it occupied.
CN201910882117.XA 2019-09-18 2019-09-18 ETL-based data extraction method Withdrawn CN110619014A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910882117.XA CN110619014A (en) 2019-09-18 2019-09-18 ETL-based data extraction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910882117.XA CN110619014A (en) 2019-09-18 2019-09-18 ETL-based data extraction method

Publications (1)

Publication Number Publication Date
CN110619014A true CN110619014A (en) 2019-12-27

Family

ID=68923520

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910882117.XA Withdrawn CN110619014A (en) 2019-09-18 2019-09-18 ETL-based data extraction method

Country Status (1)

Country Link
CN (1) CN110619014A (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108062407A (en) * 2017-12-28 2018-05-22 成都飞机工业(集团)有限责任公司 A kind of project visualizes management and control data pick-up method

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113127919A (en) * 2019-12-30 2021-07-16 航天信息股份有限公司 Data processing method and device, computing equipment and storage medium
CN111767327A (en) * 2020-05-14 2020-10-13 中邮消费金融有限公司 Data warehouse component method and system with dependency relationship among data streams
CN111767327B (en) * 2020-05-14 2021-06-15 中邮消费金融有限公司 Data warehouse construction method and system with dependency relationship among data streams
CN113158233A (en) * 2021-03-29 2021-07-23 重庆首亨软件股份有限公司 Data preprocessing method and device and computer storage medium
CN113986909A (en) * 2021-12-24 2022-01-28 畅捷通信息技术股份有限公司 Real-time data synchronization method, system and medium for reversely recording synchronization state

Similar Documents

Publication Publication Date Title
CN110619014A (en) ETL-based data extraction method
CN109684053B (en) Task scheduling method and system for big data
CN101477543B (en) System and method for automating ETL application
CN113569987A (en) Model training method and device
CN102467532A (en) Task processing method and task processing device
CN110895506B (en) Method and system for constructing test data
CN113448712A (en) Task scheduling execution method and device
CN110569113A (en) Method and system for scheduling distributed tasks and computer readable storage medium
CN111026602A (en) Health inspection scheduling management method and device of cloud platform and electronic equipment
US11016736B2 (en) Constraint programming using block-based workflows
EP3502872B1 (en) Pipeline task verification for a data processing platform
CN110611707A (en) Task scheduling method and device
CN114416703A (en) Method, device, equipment and medium for automatically monitoring data integrity
CN107145576B (en) Big data ETL scheduling system supporting visualization and process
CN115168457A (en) Visualization processing method and visualization processing device based on metadata management
US7475073B2 (en) Technique for improving staff queries in a workflow management system
CN107153679B (en) Extraction statistical method and system for semi-structured big data
CN111756778A (en) Server disk cleaning script pushing method and device and storage medium
CN108062407A (en) A kind of project visualizes management and control data pick-up method
CN109918363B (en) Method for carrying out data model consistency management based on view cross-database type
CN115185673B (en) Distributed timing task scheduling method, system, storage medium and program product
US20200019910A1 (en) Block-based prediction for manufacturing environments
CN114723080A (en) Equipment maintenance management method, system, device and storage medium
CN111767299A (en) Database operation method, device and system, storage medium and electronic equipment
CN112507013B (en) Industrial equipment data storage method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20191227