CN108228908B - Data extraction method and device - Google Patents

Publication number
CN108228908B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810132705.7A
Other languages
Chinese (zh)
Other versions
CN108228908A (en
Inventor
林明
欧阳小兵
戴丽玛
于鸿鹏
陈宏亮
张丹
张素钊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bank of China Ltd
Original Assignee
Bank of China Ltd
Application filed by Bank of China Ltd
Priority to CN201810132705.7A
Publication of CN108228908A
Application granted
Publication of CN108228908B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G06F16/258 Data format conversion from or to a database

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data extraction method and a device, wherein the method comprises the following steps: analyzing the acquired data extraction task of the source system, and generating a data extraction list corresponding to the data extraction task according to the data partition granularity; determining a data extraction mode for the source system according to the data capacity in the data extraction list; if a first preset data extraction mode is adopted for data extraction, extracting data of each data partition, generating a first data file, and storing the first data file to a target system; and if the data extraction is carried out by adopting a second preset data extraction mode, carrying out data extraction on the source system, and storing the extracted data file to the target system. The invention realizes the partition extraction of the data table, improves the data extraction efficiency and reduces the data extraction errors.

Description

Data extraction method and device
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a data extraction method and apparatus.
Background
With the development of internet technology, more and more systems need to transmit and apply data, and data of some systems need to be extracted and imported or exported to corresponding destination systems.
An existing data extraction scheme is generally implemented by the following steps: identifying the range of source system data tables that needs to be exported; and writing export and import statements, exporting a DMP file, and importing the DMP file into the target system. The entire processing flow must be controlled and executed by an operator, so manual intervention makes data extraction inefficient. Moreover, errors occur easily when a large number of data tables are exported and imported: hundreds or even thousands of tables are exported at one time, and once the exported DMP file has a problem, none of them can be exported and imported successfully, which lowers both efficiency and accuracy. In addition, because the data tables differ in structure, the existing scheme cannot use a data extraction statement to extract only the part of the data in each table that meets the requirements, and it is affected by data extraction errors.
Disclosure of Invention
In view of the above problems, the present invention provides a data extraction method and apparatus, which achieve the purpose of extracting data table partitions, improving data extraction efficiency, and reducing data extraction errors.
In order to achieve the purpose, the invention provides the following technical scheme:
a method of data extraction, comprising:
analyzing the acquired data extraction task of the source system, and generating a data extraction list corresponding to the data extraction task according to the data partition granularity;
determining a data extraction mode for the source system according to the data capacity in the data extraction list;
if a first preset data extraction mode is adopted for data extraction, extracting data of each data partition, generating a first data file, and storing the first data file to a target system;
and if the data extraction is carried out by adopting a second preset data extraction mode, carrying out data extraction on the source system, and storing the extracted data file to the target system.
Preferably, the analyzing the acquired data extraction task of the source system, and generating a data extraction list corresponding to the data extraction task according to the data partition granularity includes:
analyzing the acquired data extraction task to obtain configuration information corresponding to the data extraction task;
partitioning the data extraction tasks according to the configuration information, and generating data extraction subtasks corresponding to the partitions for the data extraction tasks of each partition;
and generating a data extraction list by each data extraction subtask.
Preferably, before saving the data file to the target system, the method further comprises:
judging whether the target system has a partition corresponding to the data file, if not, adding the partition corresponding to the data file in the target system;
if the partition exists, finding the target partition corresponding to the data file in the target system, and emptying the data in the target partition.
Preferably, the determining a data extraction manner for the source system according to the data capacity in the data extraction list includes:
judging whether the data capacity in the data extraction list is larger than a preset data quantity threshold value, if so, extracting data from the source system by adopting a first preset data extraction mode, and otherwise, extracting data by adopting a second preset data extraction mode;
the first data extraction mode is a DMP file extraction mode, and the second data extraction mode is a DBLINK data extraction mode.
Preferably, when saving the first data file to the target system, the method further comprises:
the first data file is transmitted to the target system in parallel according to a preset parallelism degree through a preset data transmission mode;
judging whether the target system successfully acquires the first data file according to the data extraction list in parallel, if so, continuing to upload the first data file; if not, judging whether the first data file has errors or not.
A data extraction apparatus comprising:
the generating module is used for analyzing the acquired data extraction task of the source system and generating a data extraction list corresponding to the data extraction task according to the data partition granularity;
the determining module is used for determining a data extraction mode of the source system according to the data capacity in the data extraction list;
the first extraction module is used for extracting the data of each data partition if a first preset data extraction mode is adopted for data extraction, generating a first data file and storing the first data file to a target system;
and the second extraction module is used for extracting data from the source system and storing the extracted data file to the target system if a second preset data extraction mode is adopted for data extraction.
Preferably, the generating module comprises:
the analysis unit is used for analyzing the acquired data extraction task to obtain configuration information corresponding to the data extraction task;
the partitioning unit is used for partitioning the data extraction tasks according to the configuration information and generating data extraction subtasks corresponding to the partitions for the data extraction tasks of each partition;
and the generating unit is used for generating a data extraction list from each data extraction subtask.
Preferably, the apparatus further comprises:
the judging module is used for judging, before the data file is saved to the target system, whether the target system has a partition corresponding to the data file, and if not, adding the partition corresponding to the data file in the target system;
if the partition exists, finding the target partition corresponding to the data file in the target system, and emptying the data in the target partition.
Preferably, the determining module comprises:
a capacity judging unit, configured to judge whether a data capacity in the data extraction list is greater than a preset data amount threshold, if so, perform data extraction on the source system in a first preset data extraction manner, and otherwise, perform data extraction in a second preset data extraction manner;
the first data extraction mode is a DMP file extraction mode, and the second data extraction mode is a DBLINK data extraction mode.
Preferably, the apparatus further comprises, for use when saving the first data file to the target system:
the transmission unit is used for transmitting the first data file to the target system in parallel according to a preset parallelism by a preset data transmission mode;
the acquisition judging unit is used for judging whether the target system successfully acquires the first data file in parallel according to the data extraction list, and if so, continuously uploading the first data file; if not, judging whether the first data file has errors or not.
Compared with the prior art, the data extraction method and device provided by the invention parse the acquired data extraction task of the source system, divide it into subtasks according to the partition granularity, and finally generate a task extraction list. Data can therefore be extracted (that is, imported or exported) in parallel by partition, which solves the prior-art problem of low extraction efficiency caused by extracting all of the data in a single pass. In addition, two data extraction modes are provided, so that the mode suited to the data capacity is selected to improve extraction efficiency; no manual intervention is required, the data of the source system and the target system can be synchronized, and the impact of data extraction errors is reduced.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a schematic flow chart of a data extraction method according to an embodiment of the present invention;
fig. 2 is a schematic flow chart of another data extraction method according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a data extraction device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first" and "second," and the like in the description and claims of the present invention and the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "comprising" and "having," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to the listed steps or elements but may include steps or elements not listed.
An embodiment of the present invention provides a data extraction method, and referring to fig. 1, the method may include the following steps:
S11, analyzing the acquired data extraction task of the source system, and generating a data extraction list corresponding to the data extraction task according to the data partition granularity;
in another embodiment of the present invention, a method for generating a data extraction list is further provided, which may include the following steps:
analyzing the acquired data extraction task to obtain configuration information corresponding to the data extraction task;
partitioning the data extraction tasks according to the configuration information, and generating data extraction subtasks corresponding to the partitions for the data extraction tasks of each partition;
and generating a data extraction list by each data extraction subtask.
The data extraction task configured at the front end is analyzed, wherein the data extraction task is generated by front-end task personnel selecting a table range, a date range, a region range and the like according to needs, and the data extraction can be specifically data export or data import in the embodiment of the invention.
Then, the background parses the configuration information of the data extraction task through a preset program, for example p_gen_task_file, and finally generates the data extraction list such that one data extraction subtask is generated per sub-partition; that is, the data extraction list includes a plurality of data extraction subtasks.
It should be noted that p_gen_task_file is only the program that parses the import task configured at the front end and generates the subtasks (file list). The program that actually performs the export is exp_process.sh: the shell script reads the subtasks (file list) and, according to the set parallelism, calls expdp for each subtask to export a dmp file. For example, if the parallelism is set to 10, ten expdp statements run in the background at the same time and generate 10 dmp files. The program that performs the import is imp_process.sh: it reads the subtasks (file list) and, according to the set parallelism, calls impdp for each subtask to perform the import.
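The bounded parallelism described above can be illustrated with a minimal Python sketch. This is not the exp_process.sh script itself, which the text does not reproduce; the function names `export_subtask` and `run_exports` and the subtask names are hypothetical, and the export call is a stand-in that only returns a dump file name where a real script would invoke expdp.

```python
from concurrent.futures import ThreadPoolExecutor


def export_subtask(subtask: str) -> str:
    """Stand-in for one export call; a real script would invoke expdp here."""
    return f"{subtask}.dmp"


def run_exports(subtasks: list[str], parallelism: int) -> dict[str, str]:
    """Run at most `parallelism` exports at a time, one dump file per subtask."""
    results: dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        # map() preserves input order, so zip pairs each subtask with its file.
        for subtask, dump_file in zip(subtasks, pool.map(export_subtask, subtasks)):
            results[subtask] = dump_file
    return results


if __name__ == "__main__":
    # Ten hypothetical sub-partition subtasks, exported with parallelism 10.
    file_list = [f"SALES_P{i:02d}" for i in range(1, 11)]
    print(run_exports(file_list, parallelism=10))
```

With a parallelism of 10 and ten subtasks, all ten stand-in exports may run concurrently, which mirrors the ten simultaneous expdp statements in the example above.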
S12, determining a data extraction mode for the source system according to the data capacity in the data extraction list;
the embodiment of the invention also provides a method for determining a data extraction mode, which can comprise the following steps:
judging whether the data capacity in the data extraction list is larger than a preset data quantity threshold value, if so, extracting data from the source system by adopting a first preset data extraction mode, and otherwise, extracting data by adopting a second preset data extraction mode;
the first data extraction mode is a DMP file extraction mode, and the second data extraction mode is a DBLINK data extraction mode.
The data extraction mode is selected according to the data capacity of the data extraction list: the DBLINK data extraction mode is adopted when the data volume is small, and the DMP file extraction mode is adopted when the data volume is large. The data volume is judged against a preset threshold; for example, a table smaller than 100M is generally considered small, and such small tables are configured in a table for the program to use.
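The threshold decision can be sketched as follows. This is an illustrative sketch only: the constant and function names are hypothetical, and reading the "100M" example as 100 MiB measured in bytes is an assumption.

```python
DMP_MODE = "DMP"        # first preset data extraction mode
DBLINK_MODE = "DBLINK"  # second preset data extraction mode

# Assumption: the "100M" example from the text is read as 100 MiB.
THRESHOLD_BYTES = 100 * 1024 * 1024


def choose_extraction_mode(data_size_bytes: int, threshold: int = THRESHOLD_BYTES) -> str:
    """Return the DMP mode only when the data capacity exceeds the threshold."""
    return DMP_MODE if data_size_bytes > threshold else DBLINK_MODE


if __name__ == "__main__":
    print(choose_extraction_mode(20 * 1024 * 1024))
    print(choose_extraction_mode(5 * 1024 * 1024 * 1024))
```

A capacity exactly equal to the threshold falls to the DBLINK mode, because the text selects the first mode only when the capacity is larger than the threshold.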
S13, if a first preset data extraction mode is adopted for data extraction, extracting data of each data partition, generating a first data file, and storing the first data file to a target system;
and S14, if a second preset data extraction mode is adopted for data extraction, performing data extraction on the source system, and storing the extracted data file to the target system.
For example, when data is extracted, an export file list (EXP_TASK_FILES) is designed as follows: its primary keys are DMP_TASK_ID and DMP_file; the table is created in the source system, provides the export file information, and records the export file state; each record corresponds to one partition or to one table (for a table without partitions); and the records of EXP_TASK_FILES are inserted by the target system program through the DBLINK.
The data extraction method provided by the invention parses the acquired data extraction task of the source system, divides it into subtasks according to the partition granularity, and finally generates a task extraction list. Data can therefore be extracted (that is, imported or exported) in parallel by partition, which solves the prior-art problem of low extraction efficiency caused by extracting all of the data in a single pass. In addition, two data extraction modes are provided, so that the mode suited to the data capacity is selected to improve extraction efficiency; no manual intervention is required, the data of the source system and the target system can be synchronized, and the impact of data extraction errors is reduced.
The embodiment of the invention also provides a partition adding and cleaning method, which comprises the following steps:
judging whether the target system has a partition corresponding to the data file, if not, adding the partition corresponding to the data file in the target system;
if the partition exists, finding the target partition corresponding to the data file in the target system, and emptying the data in the target partition.
After the data file is generated, the corresponding partition may not exist in the target system, so it must first be determined whether the partition exists. If it does not exist, the partition is added in the target system; if it exists, the partition is cleaned, that is, its original data is deleted. For a table without partitions, the data of the entire table is emptied before the data file is imported.
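The add-or-empty decision can be sketched with an in-memory model of the target table. The dictionary representation and the name `prepare_target_partition` are assumptions made for illustration; a real implementation would presumably issue the corresponding partition statements against the target database instead.

```python
def prepare_target_partition(target: dict[str, list], partition: str) -> str:
    """Make sure `partition` exists in the target and holds no stale rows.

    `target` maps a partition name to the rows currently stored in it.
    """
    if partition not in target:
        # The partition is missing in the target system: add it.
        target[partition] = []
        return "added"
    # The partition already exists: delete its original data before import.
    target[partition].clear()
    return "emptied"


if __name__ == "__main__":
    table = {"P201801": ["old row"]}
    print(prepare_target_partition(table, "P201802"))
    print(prepare_target_partition(table, "P201801"))
    print(table)
```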
A table without partitions generally holds very little data, so its data can basically be extracted in the dblink data extraction mode; if the dmp file extraction mode is used, only one dmp file is generated. In addition, subtasks (file list entries) are generated at the finest available granularity: a table without partitions yields only one subtask, namely the table itself; for a table with only single-level partitions (not composite partitions, that is, no sub-partitions), the number of subtasks equals the number of partitions; and for a table with composite partitions, the number of subtasks equals the number of sub-partitions.
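The granularity rule can be sketched as follows, assuming a simple description of each table's layout. The `TableLayout` class, the `build_subtasks` function, and the `table:partition` naming of subtasks are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TableLayout:
    name: str
    # Partition name -> names of its sub-partitions (empty list if it has none).
    partitions: dict[str, list[str]] = field(default_factory=dict)


def build_subtasks(table: TableLayout) -> list[str]:
    """Generate one subtask per finest-grained unit of the table."""
    if not table.partitions:
        # Table without partitions: the only subtask is the table itself.
        return [table.name]
    subtasks: list[str] = []
    for partition, subpartitions in table.partitions.items():
        if subpartitions:
            # Composite partition: one subtask per sub-partition.
            subtasks.extend(f"{table.name}:{sub}" for sub in subpartitions)
        else:
            # Single-level partition: one subtask per partition.
            subtasks.append(f"{table.name}:{partition}")
    return subtasks


if __name__ == "__main__":
    print(build_subtasks(TableLayout("T_CODE")))
    print(build_subtasks(TableLayout("T_DAILY", {"P1": [], "P2": []})))
    print(build_subtasks(TableLayout("T_TXN", {"P1": ["P1_S1", "P1_S2"], "P2": ["P2_S1"]})))
```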
The embodiment of the invention also provides a data file transmission method, which comprises the following steps:
the first data file is transmitted to the target system in parallel according to a preset parallelism degree through a preset data transmission mode;
judging whether the target system successfully acquires the first data file according to the data extraction list in parallel, if so, continuing to upload the first data file; if not, judging whether the first data file has errors or not.
In the embodiment of the present invention, the preset data transmission mode is FTP (File Transfer Protocol), and FTP is used to transmit the dmp file to the target system. Specifically, the generated DMP files are transferred by FTP to the target system in parallel according to the set parallelism; if the transfer succeeds, the file status in imp_task_files is set to "ready" for the import function to use:
for the case where a DBLINK can be established between the target system and the source system, the source system updates the file state to "ready" via the DBLINK;
for the case where a DBLINK cannot be established between the target system and the source system, the target system determines whether all tables of the source system under the DMP_TASK_ID have been received successfully by checking whether an empty file named DMP_TASK_ID + source system name exists in the receiving directory of the target system; if so, the target system sets the file states of that source system under the task to "ready".
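The marker-file check for the case without a DBLINK can be sketched as follows. The helper names are hypothetical, the file states are modeled as a dictionary rather than as rows of imp_task_files, and it is an assumption that the marker name is the task ID directly followed by the source system name with no separator.

```python
import tempfile
from pathlib import Path


def marker_path(recv_dir: Path, dmp_task_id: str, source_system: str) -> Path:
    # Assumption: the marker name is the task ID directly followed by the system name.
    return recv_dir / f"{dmp_task_id}{source_system}"


def mark_ready_if_complete(recv_dir: Path, dmp_task_id: str, source_system: str,
                           file_states: dict[str, str]) -> bool:
    """Set every file state to "ready" once the empty marker file has arrived."""
    marker = marker_path(recv_dir, dmp_task_id, source_system)
    if not marker.is_file() or marker.stat().st_size != 0:
        return False
    for dmp_file in file_states:
        file_states[dmp_file] = "ready"
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        recv = Path(tmp)
        states = {"T1_P1.dmp": "transferring", "T1_P2.dmp": "transferring"}
        print(mark_ready_if_complete(recv, "1001", "SYS_A", states))  # marker absent
        marker_path(recv, "1001", "SYS_A").touch()
        print(mark_ready_if_complete(recv, "1001", "SYS_A", states))  # marker present
        print(states)
```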
The data import function in the data extraction process is divided into a DMP file extraction mode and a DBLINK data extraction mode:
the DMP file extraction mode comprises the following steps:
the target system executes the import according to the configured parallelism parameter once the file state in the import file list is "ready" and the corresponding partition has been processed;
when the data volume is large, the DMP file extraction mode is selected: each sub-partition exports one DMP file, and a table without partitions exports the data of the whole table to generate one DMP file; the source system transmits the DMP files to the target system through the preset transmission mode, FTP (File Transfer Protocol). The advantage of import and export in the DMP file extraction mode is that resuming from a breakpoint is supported: if an export or import reports an error, the program automatically recognizes it and performs the export or import again. Because resuming from a breakpoint is supported, batch peak periods can be avoided, which makes the import and export of files more efficient.
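One way to picture resuming from a breakpoint is to keep a per-subtask state and to reprocess only the subtasks that have not finished. The sketch below assumes hypothetical state values ("pending", "failed", "done") and a caller-supplied `process` function; the text does not specify how the actual program records its progress.

```python
from typing import Callable


def run_resumable(states: dict[str, str], process: Callable[[str], None]) -> dict[str, str]:
    """Process every subtask that is not yet "done".

    A failure is recorded as "failed" and does not stop the other subtasks;
    calling the function again retries only the unfinished ones.
    """
    for subtask, state in list(states.items()):
        if state == "done":
            continue  # finished in an earlier run, so it is skipped
        try:
            process(subtask)
            states[subtask] = "done"
        except Exception:
            states[subtask] = "failed"
    return states


if __name__ == "__main__":
    attempts: dict[str, int] = {}

    def flaky_import(subtask: str) -> None:
        # Hypothetical import that fails the first time it sees subtask "P2".
        attempts[subtask] = attempts.get(subtask, 0) + 1
        if subtask == "P2" and attempts[subtask] == 1:
            raise RuntimeError("simulated import error")

    progress = {"P1": "pending", "P2": "pending", "P3": "pending"}
    print(run_resumable(progress, flaky_import))  # first run
    print(run_resumable(progress, flaky_import))  # rerun retries only P2
```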
The DBLINK data extraction mode is as follows:
for the case where the target system and the source system can be connected by a DBLINK, the DBLINK data extraction mode is provided. The specific configuration data is as follows: cfg_value is the specific table name, part_col is the main partition field, and subpart_col is the sub-partition field. Select statements are spliced together from this configuration information and the generated import file list, and the different select statements are executed in parallel through dbms_schedule according to the set parallelism.
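The splicing of select statements from the configuration can be sketched as follows. The exact statement shape is not given in the text, so the WHERE clause layout, the database link name `SRC_LINK`, the sample table and column names, and the quoting of values as strings are all assumptions; plain string formatting is used only for illustration, and production code would normally use bind variables.

```python
from typing import Optional


def build_select(cfg_value: str, part_col: str, part_value: str,
                 subpart_col: Optional[str] = None,
                 subpart_value: Optional[str] = None,
                 dblink: str = "SRC_LINK") -> str:
    """Splice one select statement for a single partition or sub-partition."""
    sql = f"SELECT * FROM {cfg_value}@{dblink} WHERE {part_col} = '{part_value}'"
    if subpart_col and subpart_value is not None:
        # Composite partition: narrow the statement to one sub-partition.
        sql += f" AND {subpart_col} = '{subpart_value}'"
    return sql


if __name__ == "__main__":
    # One statement per entry of a hypothetical import file list.
    for province in ("110000", "310000"):
        print(build_select("T_ACCT", "DATA_DT", "20180101", "PROV_CD", province))
```

Each generated statement covers exactly one entry of the import file list, so the statements can be handed to separate jobs and run in parallel up to the configured parallelism.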
The embodiment of the present invention further provides another data extraction method, which is shown in fig. 2 and mainly includes:
S21, a data import configuration step;
S22, an import configuration analysis step;
S23, a data export step;
S24, a file transmission step;
S25, a data import step;
and S26, a front-end display step.
In the data import configuration step of S21, data import tasks are configured by province range, using the data table range and the date range provided by the front-end interface;
in the import configuration analysis step of S22, the data import configuration is parsed by a background process according to the partition information, and a corresponding export and import file list is generated, in which each partition or each table (for a table without partitions) corresponds to one record;
in the data export step of S23, a DMP file is generated for each record based on the export file list, and the export is performed;
in the file transmission step of S24, the generated DMP files are transferred to the target system by FTP, and the file status is updated;
in the data import step of S25, according to the import file list, the DMP files are imported, or data is imported directly from the source system through the DBLINK;
in the front-end display step of S26, the front end displays the execution state of the configured import task and provides the list of already imported partitions for each table.
The entire traditional data extraction process requires manual intervention. Determining the data range, writing the execution statements, executing the export, generating the DMP file, and executing the import are all carried out step by step by an operator, which is time-consuming, labor-intensive, and inefficient. In the embodiment of the invention, the data range is configured at the front end, and all subsequent operations are automated without manual intervention.
In the prior art, errors occur easily when a large number of data tables are exported and imported: hundreds or even thousands of tables are exported at one time, and once the exported DMP file has a problem, none of the tables can be exported and imported successfully, so the whole export and import must be executed again. In the embodiment of the invention, the export and import parallelism is controlled by a parameter: after one file is imported successfully, a new import process is started automatically, and the number of concurrent export and import processes is kept within the parallelism set by the parameter, which prevents the system from going down under the load of excessive parallelism. The export and import of different files do not affect one another; if an individual partition has a problem, the export and import of the other partitions are not affected.
The embodiment of the invention determines whether the export or import of a file has finished by checking whether the export or import process still exists, then automatically reads the generated log file and searches it for error keywords to determine whether the export or import succeeded. Because the finest granularity can be the sub-partition level, a data extraction requirement is divided into a number of sub-partitions for export and import, so that part of the data of a data table can be extracted without exporting and importing the whole table. In addition, because table structures may differ from day to day, the data extraction program automatically synchronizes them between the source system and the target system during execution, so that the data extraction process is more efficient and requires no manual intervention.
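The log check can be sketched as a keyword search over the log text. The keyword list below is hypothetical (the text does not name the keywords); "ORA-" is the usual prefix of Oracle error codes, and the sample log lines are illustrative only.

```python
# Hypothetical keyword list; the text does not specify which keywords are used.
ERROR_KEYWORDS = ("ORA-", "error")


def log_indicates_success(log_text: str, keywords: tuple[str, ...] = ERROR_KEYWORDS) -> bool:
    """Return True when none of the error keywords occurs in the log (case-insensitive)."""
    lowered = log_text.lower()
    return not any(keyword.lower() in lowered for keyword in keywords)


if __name__ == "__main__":
    print(log_indicates_success("Job completed successfully"))
    print(log_indicates_success("ORA-39002: invalid operation"))
```

A subtask whose log fails this check could be left in a failed state so that it is exported or imported again, without affecting the other partitions.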
Corresponding to the data extraction method provided in the embodiment of the present invention, a data extraction device is further provided in the embodiment of the present invention, with reference to fig. 3, including:
the generation module 1 is used for analyzing the acquired data extraction task of the source system and generating a data extraction list corresponding to the data extraction task according to the data partition granularity;
a determining module 2, configured to determine a data extraction manner for the source system according to the data capacity in the data extraction list;
the first extraction module 3 is configured to extract data of each data partition if a first preset data extraction manner is adopted for data extraction, generate a first data file, and store the first data file to a target system;
and the second extraction module 4 is configured to, if data extraction is performed in a second preset data extraction manner, perform data extraction on the source system, and store the extracted data file to the target system.
Optionally, in another embodiment of the present invention, the generating module includes:
the analysis unit is used for analyzing the acquired data extraction task to obtain configuration information corresponding to the data extraction task;
the partitioning unit is used for partitioning the data extraction tasks according to the configuration information and generating data extraction subtasks corresponding to the partitions for the data extraction tasks of each partition;
and the generating unit is used for generating a data extraction list from each data extraction subtask.
Optionally, in another embodiment of the present invention, the apparatus further includes:
the judging module is used for judging, before the data file is saved to the target system, whether the target system has a partition corresponding to the data file, and if not, adding the partition corresponding to the data file in the target system;
if the partition exists, finding the target partition corresponding to the data file in the target system, and emptying the data in the target partition.
Optionally, in another embodiment of the present invention, the determining module includes:
a capacity judging unit, configured to judge whether a data capacity in the data extraction list is greater than a preset data amount threshold, if so, perform data extraction on the source system in a first preset data extraction manner, and otherwise, perform data extraction in a second preset data extraction manner;
the first data extraction mode is a DMP file extraction mode, and the second data extraction mode is a DBLINK data extraction mode.
Optionally, in another embodiment of the present invention, the apparatus further includes, for use when saving the first data file to the target system:
the transmission unit is used for transmitting the first data file to the target system in parallel according to a preset parallelism by a preset data transmission mode;
the acquisition judging unit is used for judging whether the target system successfully acquires the first data file in parallel according to the data extraction list, and if so, continuously uploading the first data file; if not, judging whether the first data file has errors or not.
The data extraction device provided by the invention parses the acquired data extraction task of the source system, divides it into subtasks according to the partition granularity, and finally generates a task extraction list. Data can therefore be extracted (that is, imported or exported) in parallel by partition, which solves the prior-art problem of low extraction efficiency caused by extracting all of the data in a single pass. In addition, two data extraction modes are provided, so that the mode suited to the data capacity is selected to improve extraction efficiency; no manual intervention is required, the data of the source system and the target system can be synchronized, and the impact of data extraction errors is reduced.
The embodiments in this description are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar, the embodiments may be referred to one another. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for the relevant details, refer to the description of the method.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (8)

1. A data extraction method, comprising:
analyzing the acquired data extraction task of the source system, and generating a data extraction list corresponding to the data extraction task according to the data partition granularity;
judging whether the data capacity in the data extraction list is larger than a preset data quantity threshold value, if so, extracting data from the source system by adopting a first preset data extraction mode, and otherwise, extracting data by adopting a second preset data extraction mode; the first preset data extraction mode is a DMP file extraction mode, and the second preset data extraction mode is a DBLINK data extraction mode;
if a first preset data extraction mode is adopted for data extraction, extracting data of each data partition, generating a first data file, and storing the first data file to a target system;
and if the data extraction is carried out by adopting a second preset data extraction mode, carrying out data extraction on the source system, and storing the extracted data file to the target system.
2. The method according to claim 1, wherein the parsing the acquired data extraction task of the source system and generating a data extraction list corresponding to the data extraction task according to a data partition granularity includes:
analyzing the acquired data extraction task to obtain configuration information corresponding to the data extraction task;
partitioning the data extraction tasks according to the configuration information, and generating data extraction subtasks corresponding to the partitions for the data extraction tasks of each partition;
and generating a data extraction list by each data extraction subtask.
3. The method of claim 1, further comprising, prior to saving a data file to the target system:
judging whether the target system has a partition corresponding to the data file; if not, adding the partition corresponding to the data file in the target system;
if so, locating the target partition corresponding to the data file in the target system and emptying the data in the target partition.
4. The method of claim 1, when saving the first data file to a target system, further comprising:
the first data file is transmitted to the target system in parallel according to a preset parallelism degree through a preset data transmission mode;
judging whether the target system successfully acquires the first data file according to the data extraction list in parallel, if so, continuing to upload the first data file; if not, judging whether the first data file has errors or not.
5. A data extraction apparatus, comprising:
the generating module is used for analyzing the acquired data extraction task of the source system and generating a data extraction list corresponding to the data extraction task according to the data partition granularity;
the determining module is used for determining a data extraction mode of the source system according to the data capacity in the data extraction list;
the first extraction module is used for extracting the data of each data partition if a first preset data extraction mode is adopted for data extraction, generating a first data file and storing the first data file to a target system;
the second extraction module is used for extracting data from the source system and storing the extracted data file to the target system if a second preset data extraction mode is adopted for data extraction;
the determining module comprises:
a capacity judging unit, configured to judge whether the data capacity in the data extraction list is greater than a preset data amount threshold; if so, data extraction is performed on the source system in a first preset data extraction mode, and otherwise data extraction is performed in a second preset data extraction mode;
the first preset data extraction mode is a DMP file extraction mode, and the second preset data extraction mode is a DBLINK data extraction mode.
6. The apparatus of claim 5, wherein the generating module comprises:
the analysis unit is used for analyzing the acquired data extraction task to obtain configuration information corresponding to the data extraction task;
the partitioning unit is used for partitioning the data extraction tasks according to the configuration information and generating data extraction subtasks corresponding to the partitions for the data extraction tasks of each partition;
and the generating unit is used for generating a data extraction list from each data extraction subtask.
7. The apparatus of claim 5, further comprising, prior to saving a data file to the target system:
the judging module is used for judging whether the target system has a partition corresponding to the data file, and if not, adding the partition corresponding to the data file in the target system;
if so, locating the target partition corresponding to the data file in the target system and emptying the data in the target partition.
8. The apparatus of claim 5, when saving the first data file to a target system, further comprising:
the transmission unit is used for transmitting the first data file to the target system in parallel according to a preset parallelism by a preset data transmission mode;
the acquisition judging unit is used for judging whether the target system successfully acquires the first data file in parallel according to the data extraction list, and if so, continuously uploading the first data file; if not, judging whether the first data file has errors or not.
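The target-partition preparation recited in claims 3 and 7 can be sketched with an in-memory stand-in for the target system. This is an illustration only: the target is modelled as a dictionary from partition name to a list of rows, and `prepare_target_partition` and its return values are hypothetical names, not taken from the patent.

```python
from typing import Dict, List

Row = dict


def prepare_target_partition(target: Dict[str, List[Row]], partition: str) -> str:
    """Make sure the target partition exists and is empty before loading."""
    if partition not in target:
        # No matching partition on the target: add it.
        target[partition] = []
        return "added"
    # A matching partition exists: empty it so the load does not duplicate rows.
    target[partition].clear()
    return "emptied"
```

Either branch leaves the partition present and empty, so repeating the load of the same data file yields the same target contents.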
CN201810132705.7A 2018-02-09 2018-02-09 Data extraction method and device Active CN108228908B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810132705.7A CN108228908B (en) 2018-02-09 2018-02-09 Data extraction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810132705.7A CN108228908B (en) 2018-02-09 2018-02-09 Data extraction method and device

Publications (2)

Publication Number Publication Date
CN108228908A (en) 2018-06-29
CN108228908B (en) 2021-11-12

Family

ID=62661325

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810132705.7A Active CN108228908B (en) 2018-02-09 2018-02-09 Data extraction method and device

Country Status (1)

Country Link
CN (1) CN108228908B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108984738A (en) * 2018-07-16 2018-12-11 中国银行股份有限公司 A kind of data shop fixtures method and device
CN110032559A (en) * 2019-04-19 2019-07-19 成都四方伟业软件股份有限公司 A kind of data pick-up method and device
KR102188132B1 (en) * 2020-05-27 2020-12-07 비코어(주) System for loading and processing data and method thereof

Citations (5)

Publication number Priority date Publication date Assignee Title
CN101216821A (en) * 2007-01-05 2008-07-09 中兴通讯股份有限公司 Data acquisition system storage management method
CN101329676A (en) * 2007-06-20 2008-12-24 华为技术有限公司 Data paralleling abstracting method and apparatus and database system
US7769648B1 (en) * 2003-12-04 2010-08-03 Drugstore.Com Method and system for automating keyword generation, management, and determining effectiveness
US9426219B1 (en) * 2013-12-06 2016-08-23 Amazon Technologies, Inc. Efficient multi-part upload for a data warehouse
CN107040608A (en) * 2017-05-19 2017-08-11 宁波绮耘软件股份有限公司 A kind of data processing method and system

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US6721749B1 (en) * 2000-07-06 2004-04-13 Microsoft Corporation Populating a data warehouse using a pipeline approach

Patent Citations (5)

Publication number Priority date Publication date Assignee Title
US7769648B1 (en) * 2003-12-04 2010-08-03 Drugstore.Com Method and system for automating keyword generation, management, and determining effectiveness
CN101216821A (en) * 2007-01-05 2008-07-09 中兴通讯股份有限公司 Data acquisition system storage management method
CN101329676A (en) * 2007-06-20 2008-12-24 华为技术有限公司 Data paralleling abstracting method and apparatus and database system
US9426219B1 (en) * 2013-12-06 2016-08-23 Amazon Technologies, Inc. Efficient multi-part upload for a data warehouse
CN107040608A (en) * 2017-05-19 2017-08-11 宁波绮耘软件股份有限公司 A kind of data processing method and system

Non-Patent Citations (1)

Title
面向复杂数据源的数据抽取模型和算法研究;邓绪斌;《中国博士学位论文全文数据库》;20050731;全文 *

Also Published As

Publication number Publication date
CN108228908A (en) 2018-06-29

Similar Documents

Publication Publication Date Title
CN108228908B (en) Data extraction method and device
EP2891994A1 (en) Method for achieving automatic synchronization of multisource heterogeneous data resources
US8191061B2 (en) Method for managing internal software of terminal through device management server
US20180081956A1 (en) Method for automatically synchronizing multi-source heterogeneous data resources
CN108280023B (en) Task execution method and device and server
CN111125444A (en) Big data task scheduling management method, device, equipment and storage medium
CN101510167A (en) Plug-in component operation method, apparatus and system
CN103200247B (en) A kind of data download method and PC download client
CN108830389A (en) A kind of method and system of information system automatic detecting
CN109634970A (en) Table method of data synchronization, equipment, storage medium and device
CN110198327B (en) Data transmission method and related equipment
CN104679500A (en) Automatic generation realizing method and device for entity classes
US20050144596A1 (en) Method and apparatus for parallel action processing
CN111274325B (en) Platform automatic test method and system
CN112395307A (en) Statement execution method, statement execution device, server and storage medium
CN112069144A (en) Method and device for collecting system logs by multi-control cluster
CN112202909A (en) Online upgrading method and system for computer storage system
CN111625300A (en) Efficient data acquisition loading method and system
CN114416546A (en) Code coverage rate determining method and device
CN108629002A (en) A kind of big data comparison method and device based on kettle
CN114357068A (en) Method for synchronizing data from kafka to database
CN111045779B (en) System memory recovery configuration method and storage medium
CN104199930A (en) System and method for acquiring and processing data
CN117009251B (en) Data analysis system, data analysis algorithm library, dynamic loading method and system
US20100257152A1 (en) Enhanced identification of relevant database indices

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant