CN108228908A - Data extraction method and device - Google Patents


Info

Publication number
CN108228908A
Authority
CN
China
Legal status
Granted
Application number
CN201810132705.7A
Other languages
Chinese (zh)
Other versions
CN108228908B (en)
Inventor
林明
欧阳小兵
戴丽玛
于鸿鹏
陈宏亮
张丹
张素钊
Current Assignee
Bank of China Ltd
Original Assignee
Bank of China Ltd
Application filed by Bank of China Ltd
Priority to CN201810132705.7A
Publication of CN108228908A
Application granted
Publication of CN108228908B
Status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G06F16/258 Data format conversion from or to a database

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data extraction method and device. The method includes: parsing an acquired data extraction task of a source system, and generating, according to data partition granularity, a data extraction list corresponding to the data extraction task; determining, according to the data volume in the data extraction list, the data extraction mode for the source system; when extracting data in a first preset data extraction mode, extracting the data of each data partition, generating a first data file, and saving the first data file to a target system; and when extracting data in a second preset data extraction mode, extracting data from the source system and saving the extracted data file to the target system. The invention extracts data tables partition by partition, which improves data extraction efficiency and reduces data extraction errors.

Description

Data extraction method and device
Technical field
The present invention relates to the technical field of data processing, and in particular to a data extraction method and device.
Background
With the development of Internet technology, more and more systems and applications need to exchange data. This requires extracting the data of certain systems and importing or exporting it into the corresponding target systems.
Existing data extraction schemes are usually implemented as follows: identify the range of source-system data tables to be exported; write export and import statements; then export DMP files and transfer them to the target system, or import DMP files. The entire flow of the existing schemes must be controlled and executed by operators, and this manual intervention makes data extraction inefficient. Moreover, exporting and importing a large number of data tables is error-prone: when hundreds or even thousands of tables are exported in one go and the exported DMP file is faulty, none of the tables can be exported and imported successfully, so both efficiency and accuracy are low. In addition, because every data table has a different structure, existing schemes cannot use a single extraction statement to extract only the required subset of data from all tables, which leads to data extraction errors.
Summary of the invention
In view of the above problems, the present invention provides a data extraction method and device that extract data tables partition by partition, improving data extraction efficiency and reducing data extraction errors.
To achieve the above objectives, the present invention provides the following technical solutions:
A data extraction method, including:
parsing an acquired data extraction task of a source system, and generating, according to data partition granularity, a data extraction list corresponding to the data extraction task;
determining, according to the data volume in the data extraction list, the data extraction mode for the source system;
when extracting data in a first preset data extraction mode, extracting the data of each data partition, generating a first data file, and saving the first data file to a target system;
when extracting data in a second preset data extraction mode, extracting data from the source system and saving the extracted data file to the target system.
Preferably, parsing the acquired data extraction task of the source system and generating, according to data partition granularity, the data extraction list corresponding to the data extraction task includes:
parsing the acquired data extraction task to obtain configuration information corresponding to the data extraction task;
partitioning the data extraction task according to the configuration information, and generating, from the data extraction task of each partition, a data extraction subtask corresponding to that partition;
generating the data extraction list from the data extraction subtasks.
Preferably, before the data file is saved to the target system, the method further includes:
determining whether the target system has a partition corresponding to the data file, and if not, adding a partition corresponding to the data file to the target system;
if so, locating the target partition corresponding to the data file in the target system and emptying the data in that target partition.
Preferably, determining the data extraction mode for the source system according to the data volume in the data extraction list includes:
determining whether the data volume in the data extraction list exceeds a preset data volume threshold; if so, extracting data from the source system in the first preset data extraction mode; otherwise, extracting data in the second preset data extraction mode;
the first data extraction mode is the DMP file extraction mode, and the second data extraction mode is the DBLINK data extraction mode.
Preferably, saving the first data file to the target system further includes:
transferring the first data files to the target system in parallel, at a preset degree of parallelism, using a preset data transfer mode;
determining whether the target system has successfully obtained the first data files in parallel according to the data extraction list; if so, continuing to upload first data files; if not, determining whether the first data file contains an error.
A data extraction device, including:
a generation module, configured to parse an acquired data extraction task of a source system and generate, according to data partition granularity, a data extraction list corresponding to the data extraction task;
a determining module, configured to determine, according to the data volume in the data extraction list, the data extraction mode for the source system;
a first extraction module, configured to, when data is extracted in a first preset data extraction mode, extract the data of each data partition, generate a first data file, and save the first data file to a target system;
a second extraction module, configured to, when data is extracted in a second preset data extraction mode, extract data from the source system and save the extracted data file to the target system.
Preferably, the generation module includes:
a parsing unit, configured to parse the acquired data extraction task and obtain configuration information corresponding to the data extraction task;
a partitioning unit, configured to partition the data extraction task according to the configuration information and generate, from the data extraction task of each partition, a data extraction subtask corresponding to that partition;
a generation unit, configured to generate the data extraction list from the data extraction subtasks.
Preferably, before the data file is saved to the target system, the device further includes:
a judgment module, configured to determine whether the target system has a partition corresponding to the data file, and if not, add a partition corresponding to the data file to the target system;
and, if so, locate the target partition corresponding to the data file in the target system and empty the data in that target partition.
Preferably, the determining module includes:
a capacity judgment unit, configured to determine whether the data volume in the data extraction list exceeds a preset data volume threshold; if so, extract data from the source system in the first preset data extraction mode; otherwise, extract data in the second preset data extraction mode;
the first data extraction mode is the DMP file extraction mode, and the second data extraction mode is the DBLINK data extraction mode.
Preferably, for saving the first data file to the target system, the device further includes:
a transmission unit, configured to transfer the first data files to the target system in parallel, at a preset degree of parallelism, using a preset data transfer mode;
an acquisition judgment unit, configured to determine whether the target system has successfully obtained the first data files in parallel according to the data extraction list; if so, continue uploading first data files; if not, determine whether the first data file contains an error.
Compared with the prior art, the data extraction method and device provided by the present invention parse the acquired data extraction task of the source system, split it into individual subtasks at partition granularity, and finally generate a task extraction list, so that data are extracted (that is, imported or exported) in parallel. This solves the low extraction efficiency caused in the prior art by extracting all the data to be extracted as a whole. In addition, two data extraction modes are provided and the appropriate one is selected according to the data volume. This improves extraction efficiency without manual intervention, keeps the data of the source system and the target system synchronized, and reduces the impact of data extraction errors.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow diagram of a data extraction method provided by an embodiment of the present invention;
Fig. 2 is a flow diagram of another data extraction method provided by an embodiment of the present invention;
Fig. 3 is a structural diagram of a data extraction device provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort shall fall within the protection scope of the present invention.
The terms "first" and "second" in the description, the claims, and the drawings are used to distinguish different objects rather than to describe a specific order. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but may include steps or units that are not listed.
An embodiment of the present invention provides a data extraction method. Referring to Fig. 1, the method may include the following steps:
S11: parse the acquired data extraction task of the source system, and generate, according to data partition granularity, the data extraction list corresponding to the data extraction task.
Another embodiment of the present invention further provides a method for generating the data extraction list, which may include the following steps:
parsing the acquired data extraction task to obtain configuration information corresponding to the data extraction task;
partitioning the data extraction task according to the configuration information, and generating, from the data extraction task of each partition, a data extraction subtask corresponding to that partition;
generating the data extraction list from the data extraction subtasks.
The data extraction task configured at the front end is parsed. A data extraction task is created by front-end staff who, as needed, select a table range, date range, regional range, and so on. In the embodiments of the present invention, data extraction may specifically be data export or data import.
Then the back end parses the configuration information of the data extraction task with preset code, for example p_gen_task_file, generates one data export subtask per subpartition, and finally generates the data extraction list; that is, the data extraction list contains multiple data extraction subtasks.
It should be noted that p_gen_task_file is the program that parses the import task configured at the front end and generates the subtasks (the file list). The actual export program is exp_process.sh: this shell script reads the subtasks (file list) and, according to the configured degree of parallelism, calls expdp to export a DMP file for each subtask. For example, if the degree of parallelism is set to 10, ten expdp statements run in the background at the same time, producing 10 DMP files. The import program is imp_process.sh: it reads the subtasks (file list) and, according to the configured degree of parallelism, calls impdp to import each subtask.
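The exp_process.sh behaviour can be sketched in Python as follows, under stated assumptions. Each subtask names one table, or one partition or subpartition of a table. The directory object `DP_DIR`, the dump-file naming scheme, and the `SubTask` type are hypothetical. Credentials are omitted, so expdp would prompt for them. `runner` stands in for actually launching the command, for example with `subprocess.run`. The `TABLES=schema.table:partition`, `DIRECTORY`, `DUMPFILE`, and `LOGFILE` parameters are standard Data Pump parameters; everything else is illustrative and does not reproduce the patent's script.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class SubTask:
    """One row of a hypothetical subtask (file) list."""
    task_id: str
    schema: str
    table: str
    partition: Optional[str] = None  # None for a non-partitioned table

    @property
    def dump_name(self) -> str:
        # Hypothetical naming: <task>_<table>[_<partition>].dmp
        parts = [self.task_id, self.table]
        if self.partition:
            parts.append(self.partition)
        return "_".join(parts) + ".dmp"


def expdp_args(t: SubTask, directory: str = "DP_DIR") -> List[str]:
    """Build an expdp argument list that exports exactly one table or partition."""
    target = f"{t.schema}.{t.table}" + (f":{t.partition}" if t.partition else "")
    return [
        "expdp",
        f"TABLES={target}",
        f"DIRECTORY={directory}",
        f"DUMPFILE={t.dump_name}",
        f"LOGFILE={t.dump_name[:-4]}.log",  # same stem as the dump file
    ]


def run_all(tasks: Iterable[SubTask], parallelism: int,
            runner: Callable[[List[str]], int]) -> List[int]:
    """Run one export per subtask with at most `parallelism` running at once."""
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        return list(pool.map(lambda t: runner(expdp_args(t)), tasks))
```

Bounding the pool at the degree of parallelism matches the text's example: with a parallelism of 10, ten expdp processes run at the same time. A new subtask starts as soon as one finishes.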
S12: determine, according to the data volume in the data extraction list, the data extraction mode for the source system.
An embodiment of the present invention further provides a method for determining the data extraction mode, which may include:
determining whether the data volume in the data extraction list exceeds a preset data volume threshold; if so, extracting data from the source system in the first preset data extraction mode; otherwise, extracting data in the second preset data extraction mode;
the first data extraction mode is the DMP file extraction mode, and the second data extraction mode is the DBLINK data extraction mode.
The data extraction mode is chosen according to the data volume of the data extraction list: the DBLINK mode is used for small volumes and the DMP file mode for large ones. Whether a volume counts as small or large is judged against a preset threshold; for example, a volume below 100 MB is generally regarded as small. The small-volume tables can be listed in a single configuration table for the program to use.
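The mode selection reduces to a single comparison. The following minimal sketch assumes that "data volume" means the summed byte size of every subtask in one extraction list and that the example "100M" means 100 MiB. In an Oracle source, per-segment sizes could plausibly be read from the `BYTES` column of `DBA_SEGMENTS`, but that is not stated in the text. The mode names are labels chosen for this sketch.

```python
from typing import Iterable

DMP_MODE = "DMP"        # first preset mode: DMP file extraction
DBLINK_MODE = "DBLINK"  # second preset mode: DBLINK extraction

# The text's example threshold is "100M"; reading it as 100 MiB is an assumption.
DEFAULT_THRESHOLD_BYTES = 100 * 1024 * 1024


def choose_mode(subtask_sizes: Iterable[int],
                threshold: int = DEFAULT_THRESHOLD_BYTES) -> str:
    """Pick the extraction mode from the total size of one extraction list."""
    total = sum(subtask_sizes)
    # Strictly greater than the threshold -> DMP files; otherwise DBLINK.
    return DMP_MODE if total > threshold else DBLINK_MODE
```

A list whose total is exactly at the threshold goes to DBLINK. This follows the claim wording "exceeds the threshold" for the first mode.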
S13: when extracting data in the first preset data extraction mode, extract the data of each data partition, generate a first data file, and save the first data file to the target system.
S14: when extracting data in the second preset data extraction mode, extract data from the source system and save the extracted data file to the target system.
For example, the export list used during extraction (EXP_TASK_FILES) is designed as follows. Its primary key is DMP_TASK_ID plus DMP_FILENAME. The table is created in the source system. It provides the export information and records the status of each exported file, and each record corresponds to one partition, or to one table in the case of a non-partitioned table. When the target system and the source system can be connected through a DBLINK, the target-system program populates EXP_TASK_FILES over the DBLINK. When they cannot, the target system and the source system must agree on rules in advance, and the source system generates the DMP files itself according to those rules.
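The text specifies only the table name EXP_TASK_FILES and its composite primary key (DMP_TASK_ID plus DMP_FILENAME). The sketch below models the table with Python's built-in `sqlite3` for illustration only. The `TABLE_NAME`, `PARTITION_NAME`, and `FILE_STATUS` columns and the status values are hypothetical, and Oracle DDL would use Oracle types.

```python
import sqlite3
from typing import Optional

DDL = """
CREATE TABLE EXP_TASK_FILES (
    DMP_TASK_ID    TEXT NOT NULL,
    DMP_FILENAME   TEXT NOT NULL,
    TABLE_NAME     TEXT NOT NULL,             -- hypothetical column
    PARTITION_NAME TEXT,                      -- NULL for a non-partitioned table (hypothetical)
    FILE_STATUS    TEXT NOT NULL DEFAULT 'NEW',  -- hypothetical status column
    PRIMARY KEY (DMP_TASK_ID, DMP_FILENAME)
)
"""


def create_export_list(conn: sqlite3.Connection) -> None:
    conn.execute(DDL)


def add_file(conn: sqlite3.Connection, task_id: str, filename: str,
             table: str, partition: Optional[str] = None) -> None:
    """One record per partition, or per table when the table is not partitioned."""
    conn.execute(
        "INSERT INTO EXP_TASK_FILES (DMP_TASK_ID, DMP_FILENAME, TABLE_NAME, PARTITION_NAME) "
        "VALUES (?, ?, ?, ?)",
        (task_id, filename, table, partition),
    )


def set_status(conn: sqlite3.Connection, task_id: str, filename: str, status: str) -> None:
    """Record the state of one exported file."""
    conn.execute(
        "UPDATE EXP_TASK_FILES SET FILE_STATUS = ? WHERE DMP_TASK_ID = ? AND DMP_FILENAME = ?",
        (status, task_id, filename),
    )
```

The composite key lets the same file name recur under different tasks while rejecting a duplicate file within one task.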
The data extraction method provided by the present invention parses the acquired data extraction task of the source system, splits it into subtasks at partition granularity, and finally generates a task extraction list, so that data are extracted (that is, imported or exported) in parallel. This solves the low extraction efficiency caused in the prior art by extracting all the data to be extracted as a whole. In addition, two data extraction modes are provided and the appropriate one is selected according to the data volume. This improves extraction efficiency without manual intervention, keeps the data of the source system and the target system synchronized, and reduces the impact of data extraction errors.
An embodiment of the present invention further provides a method for adding and cleaning partitions, which may include:
determining whether the target system has a partition corresponding to the data file, and if not, adding a partition corresponding to the data file to the target system;
if so, locating the target partition corresponding to the data file in the target system and emptying the data in that target partition.
After a data file is generated, the corresponding partition may not yet exist in the target system, so the method first checks whether it exists. If it does not exist, the partition is added to the target system. If it does exist, the partition is cleaned up by deleting all of its existing data. For a non-partitioned table, the whole table is emptied before the data file is imported.
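The add-or-clean rule can be written as a small function that returns the SQL to run on the target before one file is imported. This is a sketch, not the patent's program. The caller supplies the existing partition names; in Oracle these could be read from a dictionary view such as `ALL_TAB_PARTITIONS`. Because the bounds of a new partition depend on the table's partitioning scheme, the caller also supplies the bounds clause. Only partition-level statements are shown, not subpartition-level ones.

```python
from typing import Iterable, List, Optional


def prepare_target(table: str, partition: Optional[str], existing: Iterable[str],
                   bounds_clause: Optional[str] = None) -> List[str]:
    """Return the Oracle SQL to run on the target system before importing one file."""
    if partition is None:
        # Non-partitioned table: empty the whole table before the import.
        return [f"TRUNCATE TABLE {table}"]
    if partition in set(existing):
        # Partition already exists: delete its old data.
        return [f"ALTER TABLE {table} TRUNCATE PARTITION {partition}"]
    if bounds_clause is None:
        raise ValueError(f"partition {partition} is missing and no bounds clause was given")
    # Partition is missing: add it. The bounds depend on the partitioning scheme.
    return [f"ALTER TABLE {table} ADD PARTITION {partition} {bounds_clause}"]
```

Truncating a partition, rather than deleting its rows, keeps re-imports idempotent: importing the same file twice leaves the partition with one copy of its data.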
Non-partitioned tables generally hold very little data, so they can usually be extracted in DBLINK mode. The DMP file mode can also be used, in which case only one DMP file is generated. Because such tables have no subpartitions, subtasks (file list entries) are generated at the finest granularity available. For a non-partitioned table there is only one subtask, namely the table itself. For a table with single-level partitioning (not composite, i.e. without subpartitions), the number of subtasks equals the number of partitions. For a table with composite partitioning, the number of subtasks equals the number of subpartitions.
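The granularity rule can be expressed as a function that splits one table into extraction units. In this minimal sketch, the table's layout is passed in as a dictionary mapping each partition name to its subpartition names. An Oracle implementation might build that dictionary from `ALL_TAB_PARTITIONS` and `ALL_TAB_SUBPARTITIONS`; that step is an assumption and is not shown.

```python
from typing import Dict, List, Optional, Tuple

Unit = Tuple[str, Optional[str]]  # (table, partition or subpartition; None = whole table)


def subtasks_for_table(table: str, partitions: Dict[str, List[str]]) -> List[Unit]:
    """Split one table into extraction units at the finest available granularity.

    `partitions` maps each partition name to its subpartition names: an empty
    dict for a non-partitioned table, empty lists for single-level partitioning.
    """
    if not partitions:
        return [(table, None)]  # non-partitioned: the table itself is the only unit
    units: List[Unit] = []
    for part, subparts in partitions.items():
        if subparts:
            units.extend((table, sp) for sp in subparts)  # composite: one per subpartition
        else:
            units.append((table, part))                   # single-level: one per partition
    return units
```

For example, a composite-partitioned table with partitions P1 (subpartitions P1_A, P1_B) and P2 (subpartition P2_A) yields three units, one per subpartition.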
An embodiment of the present invention further provides a data file transmission method, which may include:
transferring the first data files to the target system in parallel, at a preset degree of parallelism, using a preset data transfer mode;
determining whether the target system has successfully obtained the first data files in parallel according to the data extraction list; if so, continuing to upload first data files; if not, determining whether the first data file contains an error.
In the embodiments of the present invention, the preset transfer mode is FTP (File Transfer Protocol), which is used to transfer the DMP files to the target system. Specifically, the generated DMP files are sent by FTP to the target system in parallel at the configured degree of parallelism. If the FTP transfer succeeds, the file status in imp_task_files can be set to "ready" for use by the import function:
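A minimal sketch of the parallel FTP step follows, assuming one FTP connection per file and a simple per-file status of "ready" or "failed". The host, credentials, and status handling are hypothetical. The `ftp_uploader` helper uses only standard `ftplib` calls (`FTP`, `login`, `storbinary`), and the transfer logic takes the upload function as a parameter so it can be exercised without a server.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from ftplib import FTP, all_errors
from typing import Callable, Dict, Iterable, Tuple


def ftp_uploader(host: str, user: str, password: str) -> Callable[[str], None]:
    """Return a callable that uploads one local file in binary mode via ftplib."""
    def upload(path: str) -> None:
        with FTP(host) as ftp:
            ftp.login(user, password)
            with open(path, "rb") as fh:
                ftp.storbinary(f"STOR {os.path.basename(path)}", fh)
    return upload


def transfer_all(paths: Iterable[str], parallelism: int,
                 upload: Callable[[str], None]) -> Dict[str, str]:
    """Upload every file with at most `parallelism` concurrent transfers.

    Returns file -> "ready" (transfer succeeded) or "failed" (file needs checking).
    """
    def one(path: str) -> Tuple[str, str]:
        try:
            upload(path)
            return path, "ready"
        except all_errors:  # ftplib errors, OSError (incl. missing file), EOFError
            return path, "failed"

    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        return dict(pool.map(one, paths))
```

A real deployment would probably reuse connections and retry transient failures. The sketch only shows how a failed transfer is marked for inspection without stopping the other files.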
when the target system and the source system can be connected through a DBLINK, the source system updates the file status to "ready" over the DBLINK;
when they cannot be connected through a DBLINK, the target system determines whether the tables of the source systems involved in a DMP_TASK_ID have all been received by checking whether its receiving directory contains an empty file named DMP_TASK_ID + source system name. If such a marker file is present for every source system under the task, the target system sets the status of that task's files to "ready".
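The marker-file check for the no-DBLINK case can be sketched as follows. The text says the marker is an empty file named DMP_TASK_ID + source system name. Plain concatenation with no separator is an assumption here, and `marker_name` and `task_ready` are hypothetical helpers.

```python
import os
import tempfile
from typing import Iterable


def marker_name(task_id: str, source_system: str) -> str:
    # "DMP_TASK_ID + source system name"; plain concatenation is an assumption.
    return f"{task_id}{source_system}"


def task_ready(receive_dir: str, task_id: str, source_systems: Iterable[str]) -> bool:
    """True once every source system has dropped its empty marker file for the task."""
    for system in source_systems:
        path = os.path.join(receive_dir, marker_name(task_id, system))
        if not os.path.isfile(path) or os.path.getsize(path) != 0:
            return False
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(task_ready(d, "T1", ["SRCA"]))  # False: no marker yet
        open(os.path.join(d, marker_name("T1", "SRCA")), "w").close()
        print(task_ready(d, "T1", ["SRCA"]))  # True: empty marker present
```

Requiring the marker to be empty distinguishes it from a data file that happens to share its name.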
In the data extraction process, the data import function is divided into the DMP file mode and the DBLINK mode:
DMP file mode:
The target system processes the partitions whose file status in the import file list is "ready" and performs the import according to the configured degree-of-parallelism parameter.
The DMP file mode is selected when the data volume is large. Each subpartition is exported to one DMP file, and a non-partitioned table is exported as a whole into one DMP file. The source system passes the DMP files to the target system through the preset transfer mode, FTP. The advantage of the DMP file mode is that export and import can resume from a breakpoint: if an export or import reports an error, the program automatically detects it and re-runs that export or import. Because resumption is supported, the work can also be scheduled to avoid the batch peak period, which makes exporting and importing files more efficient.
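Because every partition has its own DMP file, resuming after a failure amounts to re-running only the files that have not finished. The sketch below illustrates that idea. The status values, the retry limit, and the `run` callback are hypothetical.

```python
from typing import Callable, Dict


def resume(statuses: Dict[str, str], run: Callable[[str], bool],
           max_rounds: int = 3) -> Dict[str, str]:
    """Re-run only unfinished subtasks; finished ones are never repeated.

    `statuses` maps a DMP file name to "done", "failed", or any other
    (not yet run) value; `run` returns True when the export/import succeeded.
    """
    for _ in range(max_rounds):
        todo = [name for name, state in statuses.items() if state != "done"]
        if not todo:
            break
        for name in todo:
            statuses[name] = "done" if run(name) else "failed"
    return statuses
```

A file that fails is retried in the next round, while files that already succeeded are skipped. This is why a single bad partition does not force the whole batch to be exported again.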
DBLINK mode:
When the target system can connect to the source system through a DBLINK, the DBLINK data extraction mode is provided. The configuration data are as follows: cfg_value is the table name, part_col is the main partition column, and subpart_col is the subpartition column. Based on this configuration and the generated import file list, the information is spliced into SELECT statements. According to the configured degree of parallelism, dbms_scheduler.create_job then executes the different SELECT statements in parallel, extracting the data from the source system into the target system.
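The statement splicing can be sketched as string construction, under several assumptions. The target table has the same name and structure as the source table. Partition values are compared as string literals. The spliced SELECT is wrapped in an INSERT so that it loads the target. The job and link names are hypothetical. `DBMS_SCHEDULER.CREATE_JOB` with `job_name`, `job_type => 'PLSQL_BLOCK'`, `job_action`, and `enabled` is the standard Oracle procedure. Enabling the job starts it immediately, and its default `auto_drop` removes it after it runs.

```python
from typing import Optional


def quote(value: str) -> str:
    """Quote a value as an Oracle string literal, doubling embedded quotes."""
    return "'" + value.replace("'", "''") + "'"


def dblink_insert(table: str, dblink: str,
                  part_col: Optional[str] = None, part_value: Optional[str] = None,
                  subpart_col: Optional[str] = None, subpart_value: Optional[str] = None) -> str:
    """Splice cfg_value (table), part_col, and subpart_col into one load statement."""
    sql = f"INSERT INTO {table} SELECT * FROM {table}@{dblink}"
    conditions = []
    if part_col is not None and part_value is not None:
        conditions.append(f"{part_col} = {quote(part_value)}")
    if subpart_col is not None and subpart_value is not None:
        conditions.append(f"{subpart_col} = {quote(subpart_value)}")
    if conditions:
        sql += " WHERE " + " AND ".join(conditions)
    return sql


def create_job_call(job_name: str, statement: str) -> str:
    """Wrap a statement in a DBMS_SCHEDULER.CREATE_JOB call that runs it once."""
    action = "BEGIN EXECUTE IMMEDIATE " + quote(statement) + "; COMMIT; END;"
    return ("BEGIN DBMS_SCHEDULER.CREATE_JOB("
            f"job_name => {quote(job_name)}, "
            "job_type => 'PLSQL_BLOCK', "
            f"job_action => {quote(action)}, "
            "enabled => TRUE); END;")
```

To honour the configured degree of parallelism, a caller would submit at most that many such jobs at a time and wait for one to finish before creating the next. That scheduling loop is not shown here.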
An embodiment of the present invention further provides another data extraction method. Referring to Fig. 2, it mainly includes:
S21: data import configuration;
S22: import task parsing;
S23: data export;
S24: file transfer;
S25: data import;
S26: front-end display.
In the S21 data import configuration step, a data import task is configured using the data table range, date range, and province range offered by the front-end interface.
In the S22 import task parsing step, the data import configuration is parsed according to the partition information processed in the back end. This produces the corresponding export and import file lists, with one record per partition, or per table in the case of a non-partitioned table.
In the S23 data export step, the export is performed according to the export list, producing one DMP file per record.
In the S24 file transfer step, the generated DMP files are sent by FTP to the target system and the file status is updated.
In the S25 data import step, according to the import file list, either the DMP files are imported or data is imported directly from the source system over a DBLINK.
In the S26 front-end display step, the front end shows the execution status of the configured import tasks and, for each table, the list of partitions already imported.
In traditional data extraction, the whole flow requires manual intervention. Determining the data range, writing the statements, running the export, generating the DMP files, and running the import must all be performed step by step by operators, which is time-consuming, laborious, and inefficient. In the embodiments of the present invention, the data range only needs to be configured at the front end, and all subsequent operations are automated by programs without manual intervention.
Exporting and importing a large number of data tables is error-prone. When hundreds or even thousands of tables are exported at once and the exported DMP file is faulty, none of the tables can be exported and imported successfully, and the whole export and import must be re-executed. With the present invention, the finest granularity of exported files is the subpartition level: each subpartition produces one DMP file, and the program passes the DMP files to the target system one by one. On the target system, a polling daemon watches for new files and imports each one automatically as it arrives. The degree of parallelism of export and import is controlled by a parameter: when one file has been imported successfully, a new import process is started automatically. This keeps the number of concurrent processes within the configured degree of parallelism and prevents the system from crashing under the load of excessive parallelism. Files are exported and imported independently of each other, so a problem with an individual partition does not affect the export or import of other partitions.
The embodiments of the present invention determine whether a file's import has finished by checking whether the export or import process still exists. They also automatically read the generated log file and search it for error keywords to determine whether the export or import succeeded. Because the finest granularity is the subpartition level, a data extraction requirement is split into many subpartition-level exports and imports. This allows partial data of a data table to be extracted without exporting and importing the whole table. In addition, because table structures differ from day to day, the data extraction programs automatically synchronize table structures between the source system and the target system. This makes the data extraction process efficient and concise without manual intervention.
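The log check can be sketched as a keyword search. The text does not list its error keywords, so the pattern below is an assumption. It treats any `ORA-nnnnn` code, and a Data Pump summary of the form "completed with N error(s)" with N greater than zero, as a failure. Treating every ORA code as fatal is conservative, because some codes are only warnings.

```python
import re
from pathlib import Path

# Hypothetical keyword set: Oracle error codes and a non-zero Data Pump error summary.
ERROR_PATTERN = re.compile(r"\bORA-\d{5}\b|completed with [1-9]\d* error\(s\)", re.IGNORECASE)


def log_has_errors(text: str) -> bool:
    """True if the log text contains any of the error keywords."""
    return ERROR_PATTERN.search(text) is not None


def log_file_ok(path: str) -> bool:
    """Read a generated export/import log and report whether it looks clean."""
    text = Path(path).read_text(encoding="utf-8", errors="replace")
    return not log_has_errors(text)
```

Combined with a check that the expdp or impdp process has exited, a clean log is what would allow a file to be marked as imported.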
Corresponding to the data extraction method provided by the embodiments of the present invention, an embodiment of the present invention further provides a data extraction device. Referring to Fig. 3, the device includes:
a generation module 1, configured to parse an acquired data extraction task of a source system and generate, according to data partition granularity, a data extraction list corresponding to the data extraction task;
a determining module 2, configured to determine, according to the data volume in the data extraction list, the data extraction mode for the source system;
a first extraction module 3, configured to, when data is extracted in a first preset data extraction mode, extract the data of each data partition, generate a first data file, and save the first data file to a target system;
a second extraction module 4, configured to, when data is extracted in a second preset data extraction mode, extract data from the source system and save the extracted data file to the target system.
Optionally, in another embodiment of the present invention, the generation module includes:
a parsing unit, configured to parse the acquired data extraction task and obtain configuration information corresponding to the data extraction task;
a partitioning unit, configured to partition the data extraction task according to the configuration information and generate, from the data extraction task of each partition, a data extraction subtask corresponding to that partition;
a generation unit, configured to generate the data extraction list from the data extraction subtasks.
Optionally, in another embodiment of the present invention, before the data file is saved to the target system, the device further includes:
a judgment module, configured to determine whether the target system has a partition corresponding to the data file, and if not, add a partition corresponding to the data file to the target system;
and, if so, locate the target partition corresponding to the data file in the target system and empty the data in that target partition.
Optionally, in another embodiment of the present invention, the determining module includes:
a capacity judgment unit, configured to determine whether the data volume in the data extraction list exceeds a preset data volume threshold; if so, extract data from the source system in the first preset data extraction mode; otherwise, extract data in the second preset data extraction mode;
the first data extraction mode is the DMP file extraction mode, and the second data extraction mode is the DBLINK data extraction mode.
Optionally, in another embodiment of the present invention, for saving the first data file to the target system, the device further includes:
a transmission unit, configured to transfer the first data files to the target system in parallel, at a preset degree of parallelism, using a preset data transfer mode;
an acquisition judgment unit, configured to determine whether the target system has successfully obtained the first data files in parallel according to the data extraction list; if so, continue uploading first data files; if not, determine whether the first data file contains an error.
Data pick-up device provided by the invention is parsed by the data pick-up task of the source system to acquisition, will Data pick-up task is divided into a rule subtask according to subregion granularity and ultimately generates task extraction list, such Paralleled logarithm According to extracted namely data are imported or are exported when, can solve in the prior art by all data to be extracted into The problem of data pick-up efficiency is low caused by row extracts.Two kinds of data pick-up modes are set simultaneously, it can be according to data capacity While so that extraction efficiency improves using corresponding data pick-up mode, without manual intervention, can also realize source system and The data of goal systems synchronize, and decrease the erroneous effects of data pick-up.
The embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and identical or similar parts of the embodiments may be cross-referenced. The device disclosed in the embodiments corresponds to the disclosed method, so its description is relatively brief; for the relevant details, refer to the description of the method.
The foregoing description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention is not limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

  1. A data extraction method, characterized by comprising:
    parsing an acquired data extraction task of a source system, and generating, according to data partition granularity, a data extraction list corresponding to the data extraction task;
    determining, according to the data capacity in the data extraction list, a data extraction mode for the source system;
    when data extraction is performed according to a first preset data extraction mode, extracting the data of each data partition, generating a first data file, and saving the first data file to a target system;
    when data extraction is performed according to a second preset data extraction mode, performing data extraction on the source system and saving the extracted data file to the target system.
  2. The method according to claim 1, characterized in that parsing the acquired data extraction task of the source system and generating, according to data partition granularity, the data extraction list corresponding to the data extraction task comprises:
    parsing the acquired data extraction task to obtain configuration information corresponding to the data extraction task;
    partitioning the data extraction task according to the configuration information, and generating, from the data extraction task of each partition, a data extraction subtask corresponding to that partition;
    generating the data extraction list from the data extraction subtasks.
  3. The method according to claim 1, characterized by further comprising, before a data file is saved to the target system:
    determining whether the target system has a partition corresponding to the data file, and if not, adding a partition corresponding to the data file in the target system;
    if so, locating the target partition corresponding to the data file in the target system and emptying the data in the target partition.
  4. The method according to claim 1, characterized in that determining, according to the data capacity in the data extraction list, the data extraction mode for the source system comprises:
    determining whether the data capacity in the data extraction list exceeds a preset data volume threshold; if so, performing data extraction on the source system using the first preset data extraction mode; otherwise, performing data extraction using the second preset data extraction mode;
    wherein the first data extraction mode is a DMP file extraction mode and the second data extraction mode is a DBLINK data extraction mode.
  5. The method according to claim 1, characterized by further comprising, when the first data file is saved to the target system:
    transmitting the first data file to the target system in parallel, with a preset degree of parallelism, using a preset data transfer mode;
    determining whether the target system has successfully obtained the first data file in parallel according to the data extraction list; if so, continuing to upload the first data file; if not, determining whether the first data file contains an error.
  6. A data extraction device, characterized by comprising:
    a generation module, configured to parse an acquired data extraction task of a source system and to generate, according to data partition granularity, a data extraction list corresponding to the data extraction task;
    a determining module, configured to determine, according to the data capacity in the data extraction list, a data extraction mode for the source system;
    a first extraction module, configured to, when data extraction is performed according to a first preset data extraction mode, extract the data of each data partition, generate a first data file, and save the first data file to a target system;
    a second extraction module, configured to, when data extraction is performed according to a second preset data extraction mode, perform data extraction on the source system and save the extracted data file to the target system.
  7. The device according to claim 6, characterized in that the generation module comprises:
    a parsing unit, configured to parse the acquired data extraction task to obtain configuration information corresponding to the data extraction task;
    a partitioning unit, configured to partition the data extraction task according to the configuration information and to generate, from the data extraction task of each partition, a data extraction subtask corresponding to that partition;
    a generation unit, configured to generate the data extraction list from the data extraction subtasks.
  8. The device according to claim 6, characterized by further comprising, for use before a data file is saved to the target system:
    a judgment module, configured to determine whether the target system has a partition corresponding to the data file and, if not, to add a partition corresponding to the data file in the target system;
    and, if so, to locate the target partition corresponding to the data file in the target system and empty the data in the target partition.
  9. The device according to claim 6, characterized in that the determining module comprises:
    a capacity judging unit, configured to determine whether the data capacity in the data extraction list exceeds a preset data volume threshold; if so, to perform data extraction on the source system using the first preset data extraction mode; otherwise, to perform data extraction using the second preset data extraction mode;
    wherein the first data extraction mode is a DMP file extraction mode and the second data extraction mode is a DBLINK data extraction mode.
  10. The device according to claim 6, characterized by further comprising, for saving the first data file to the target system:
    a transmission unit, configured to transmit the first data file to the target system in parallel, with a preset degree of parallelism, using a preset data transfer mode;
    an acquisition judging unit, configured to determine whether the target system has successfully obtained the first data file in parallel according to the data extraction list; if so, to continue uploading the first data file; if not, to determine whether the first data file contains an error.
CN201810132705.7A 2018-02-09 2018-02-09 Data extraction method and device Active CN108228908B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810132705.7A CN108228908B (en) 2018-02-09 2018-02-09 Data extraction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810132705.7A CN108228908B (en) 2018-02-09 2018-02-09 Data extraction method and device

Publications (2)

Publication Number Publication Date
CN108228908A true CN108228908A (en) 2018-06-29
CN108228908B CN108228908B (en) 2021-11-12

Family

ID=62661325

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810132705.7A Active CN108228908B (en) 2018-02-09 2018-02-09 Data extraction method and device

Country Status (1)

Country Link
CN (1) CN108228908B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108984738A (en) * 2018-07-16 2018-12-11 中国银行股份有限公司 Initial data loading method and device
CN110032559A (en) * 2019-04-19 2019-07-19 成都四方伟业软件股份有限公司 Data extraction method and device
EP4160432A4 (en) * 2020-05-27 2024-06-12 Bcore Data loading and processing system, and method therefor

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060173926A1 (en) * 2000-07-06 2006-08-03 Microsoft Corporation Data transformation to maintain detailed user information in a data warehouse
CN101216821A (en) * 2007-01-05 2008-07-09 中兴通讯股份有限公司 Data acquisition system storage management method
CN101329676A (en) * 2007-06-20 2008-12-24 华为技术有限公司 Parallel data extraction method and apparatus, and database system
US7769648B1 (en) * 2003-12-04 2010-08-03 Drugstore.Com Method and system for automating keyword generation, management, and determining effectiveness
US9426219B1 (en) * 2013-12-06 2016-08-23 Amazon Technologies, Inc. Efficient multi-part upload for a data warehouse
CN107040608A (en) * 2017-05-19 2017-08-11 宁波绮耘软件股份有限公司 Data processing method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
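Deng Xubin (邓绪斌): "Research on Data Extraction Models and Algorithms for Complex Data Sources" (面向复杂数据源的数据抽取模型和算法研究), China Doctoral Dissertations Full-text Database (中国博士学位论文全文数据库) *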

Also Published As

Publication number Publication date
CN108228908B (en) 2021-11-12

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant