CN111309792B - Data extraction and conversion method covering complex heterogeneous conditions - Google Patents
- Publication number
- CN111309792B
- Authority
- CN
- China
- Prior art keywords
- data
- conversion
- field
- fields
- marking
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a data extraction and conversion method covering complex heterogeneous conditions. The method first studies the difficulties and problems encountered during heterogeneous data conversion and compiles them into a heterogeneous data conversion problem library. A solution is then proposed for each problem, and the solutions are assembled into a solution set. The invention handles a variety of heterogeneous conditions that arise in heterogeneous data conversion, including differing data organization structures, differing storage forms, differing metadata, and attachment migration, and improves the efficiency of conversion between heterogeneous data.
Description
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to a data extraction and conversion method covering complex heterogeneous conditions.
Background
Data extraction and conversion is the processing flow that moves data from different heterogeneous data sources into unified target data as data flows between systems. It is the basis of data applications and is widely used in big-data computation and data-mining analysis across industries. Demand keeps growing for extraction and conversion algorithms with complete functionality and good performance, and such algorithms are key to whether data applications can run efficiently.
However, the data extraction and conversion methods currently used in the industry have two shortcomings. On the one hand, they do not consider complex heterogeneous conditions such as master-slave table splitting, file migration, and file storage format conversion, so no existing solution is available when these conditions arise, and resources must be spent on custom development for each specific case. On the other hand, some tools do provide conversion methods and functions, but their coverage is incomplete, which makes it difficult to satisfy, in one pass, all of the complex heterogeneous extraction and conversion conditions a project requires.
Disclosure of Invention
The invention aims to provide a data extraction and conversion method that covers multiple complex heterogeneous data conversion conditions and can meet the requirements of multiple conversion methods at the same time.
The technical solution for achieving this purpose is a data extraction and conversion method covering complex heterogeneous conditions, comprising the following steps:
Step 1: Sort out the heterogeneous data structures before and after conversion, and mark the structural differences and organizational correspondences between the tables and fields of the source and target structures.
Step 2: Search for newly added or missing field information in the source and target data structures, and retain, delete, or supplement the fields as required.
Step 3: Compare the names of synonymous fields in the source and target heterogeneous data, and mark and map the fields that have the same meaning but different names.
Step 4: Check whether any files are stored before conversion, mark how the file paths are stored, and select a migration tool to migrate them.
Step 5: Check how the storage modes of file formats differ between the source and target heterogeneous data, mark the storage modes, and select a corresponding conversion tool to define the conversion method.
Step 6: Compare the differing metadata in the source and target heterogeneous data, and mark and map the fields whose metadata differ.
Step 1 is implemented as follows:
Step 1-1: Compare the organization of the tables in the source and target heterogeneous data structures, including the number of tables, the types of information the tables describe, and the form of the master and slave tables. Identify the differences in these aspects and, based on them, mark the correspondence between the source and target table structures according to the number of tables and the master-slave form.
Step 1-2: Compare the field correspondences of the source and target heterogeneous data structures, and mark the matching fields in the software.
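The comparison described in steps 1-1 and 1-2 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `TableSpec` class, the `compare_structures` function, and the example table and field names are all assumptions introduced here. The sketch reports the table counts, the master-slave relations on each side, and the field names that occur on both sides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TableSpec:
    """Hypothetical description of one table in a data structure."""
    name: str
    fields: set
    master: Optional[str] = None  # name of the master table if this is a slave table


def compare_structures(source: list, target: list) -> dict:
    """Summarize table-level differences and mark identically named fields."""
    def slaves(tables):
        # Map each slave table to its master table.
        return {t.name: t.master for t in tables if t.master is not None}

    def all_fields(tables):
        # Union of the field names of every table on one side.
        return set().union(*(t.fields for t in tables))

    return {
        "table_count": (len(source), len(target)),
        "slave_tables": (slaves(source), slaves(target)),
        "same_fields": sorted(all_fields(source) & all_fields(target)),
    }


# Example: one wide source table is split into a master table and a slave table.
source = [TableSpec("person", {"id", "name", "city", "street"})]
target = [
    TableSpec("person", {"id", "name"}),
    TableSpec("address", {"person_id", "city", "street"}, master="person"),
]
report = compare_structures(source, target)
print(report)
```

In this example the report shows one table on the source side and two on the target side, which is the kind of difference that step 1-1 asks to be marked before any field-level mapping is attempted.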
Step 2 is implemented as follows:
Step 2-1: Compare and analyze which table information and field information is missing from the data structure to be converted relative to the target data structure.
Step 2-2: Compare and analyze which table information and field information in the data structure to be converted is redundant relative to the target data structure.
Step 2-3: Obtain the conversion requirements, and either supplement the missing tables and fields by calculation or forgo supplementing them.
Step 2-4: Obtain the conversion requirements, and either delete the redundant tables and fields or retain them and carry them over.
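As a rough illustration of steps 2-1 to 2-4, the sketch below derives the missing and redundant fields with set differences and then applies a simple plan to one record. The function names, the `computed` and `keep_extra` parameters, and the sample fields are hypothetical and stand in for whatever the conversion requirements specify.

```python
def plan_field_changes(source_fields, target_fields, computed=None, keep_extra=()):
    """Decide what to do with missing and redundant fields.

    computed:   mapping of target field -> function(row) that derives its value
    keep_extra: redundant source fields that should be retained
    """
    computed = computed or {}
    missing = sorted(set(target_fields) - set(source_fields))
    redundant = sorted(set(source_fields) - set(target_fields))
    return {
        "compute": [f for f in missing if f in computed],
        "leave_empty": [f for f in missing if f not in computed],
        "retain": [f for f in redundant if f in keep_extra],
        "delete": [f for f in redundant if f not in keep_extra],
    }


def apply_plan(row, plan, computed):
    """Apply the plan to one record, given as a dict of field -> value."""
    out = {k: v for k, v in row.items() if k not in plan["delete"]}
    for f in plan["compute"]:
        out[f] = computed[f](row)  # supplement by calculation
    for f in plan["leave_empty"]:
        out[f] = None  # no supplement available
    return out


computed = {"full_name": lambda r: f"{r['first']} {r['last']}"}
plan = plan_field_changes(
    ["id", "first", "last", "legacy_flag", "notes"],
    ["id", "full_name", "email"],
    computed=computed,
    keep_extra=("notes",),
)
row = {"id": 1, "first": "Ada", "last": "Lovelace", "legacy_flag": 0, "notes": "x"}
print(apply_plan(row, plan, computed))
```

Keeping the plan separate from its application means the decisions taken from the conversion requirements can be reviewed once and then reused for every record.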
Step 3 is implemented as follows:
Step 3-1: Through analysis and comparison, identify synonymous fields whose names are inconsistent.
Step 3-2: Through analysis and comparison, identify synonymous fields whose names differ because of different naming conventions.
Step 3-3: Convert and associate all fields that have the same meaning but different names.
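One possible way to approach steps 3-1 to 3-3 is sketched below, under two assumptions that are not taken from the patent: names that differ only by naming convention (for example `userName` and `user_name`) can be matched by normalizing them, while names that differ in wording need an explicit alias table supplied by an analyst. The function and field names are illustrative.

```python
import re


def normalize(name: str) -> str:
    """Reduce a field name to a convention-independent key: lower case, no separators."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def match_synonyms(source_fields, target_fields, aliases=None):
    """Map each source field to its synonymous target field, where one is found.

    aliases: explicit source -> target pairs for names that differ in wording.
    """
    aliases = aliases or {}
    by_key = {normalize(t): t for t in target_fields}
    mapping = {}
    for s in source_fields:
        if s in aliases:
            mapping[s] = aliases[s]  # inconsistent wording, resolved by hand
        elif normalize(s) in by_key:
            mapping[s] = by_key[normalize(s)]  # same word, different convention
    return mapping


mapping = match_synonyms(
    ["userName", "tel", "ID", "remark"],
    ["user_name", "phone", "id"],
    aliases={"tel": "phone"},
)
print(mapping)
```

Fields that find no counterpart, such as `remark` here, are simply left out of the mapping and would be handled as redundant fields under step 2.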
Step 4 is implemented as follows:
Step 4-1: Obtain information on all fields in the data to be converted that store files by means of a storage path.
Step 4-2: Mark the selected fields to be converted.
Step 4-3: Select a migration tool, or use a relevant migration method, to convert and migrate the paths and files.
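The path-based file migration of steps 4-1 to 4-3 might look roughly like the sketch below. It is not the algorithm of FIG. 3, which is not reproduced in this text. It assumes records are plain dicts, that the marked fields hold file paths under a common old root, and it uses only the Python standard library; the function name and the `attachment` field are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def migrate_file_fields(rows, path_fields, old_root, new_root):
    """Copy the files referenced by the marked fields and rewrite the stored paths."""
    old_root, new_root = Path(old_root), Path(new_root)
    migrated = []
    for row in rows:
        new_row = dict(row)
        for f in path_fields:
            if not row.get(f):
                continue  # this record stores no file in the field
            src = Path(row[f])
            # relative_to raises ValueError if the file is not under old_root.
            dst = new_root / src.relative_to(old_root)
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(src, dst)  # copies content and file metadata
            new_row[f] = str(dst)
        migrated.append(new_row)
    return migrated


# Small demonstration inside a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    old = Path(tmp) / "old"
    (old / "docs").mkdir(parents=True)
    (old / "docs" / "a.txt").write_text("hello")
    rows = [{"id": 1, "attachment": str(old / "docs" / "a.txt")}]
    print(migrate_file_fields(rows, ["attachment"], old, Path(tmp) / "new"))
```

The original records are left untouched, so a failed migration can be retried without first repairing the source data.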
Step 5 is implemented as follows:
Step 5-1: Obtain information on all fields in the data to be converted that store data blocks directly in BLOB form.
Step 5-2: Mark the selected fields to be converted.
Step 5-3: Call a relevant migration method, or manually write a conversion method block, to copy, transfer, and convert the data blocks.
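A hand-written conversion block of the kind mentioned in step 5-3 could resemble the following sketch. It assumes, purely for illustration, that the source is an SQLite database with a BLOB column and that the target stores files on disk and keeps only their paths. The table name `attachment`, its columns, and the `.bin` suffix are hypothetical.

```python
import sqlite3
import tempfile
from pathlib import Path


def blobs_to_files(conn, table, key_field, blob_field, out_dir, suffix=".bin"):
    """Write each BLOB value to a file and return a mapping of key -> file path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {}
    # Identifiers are interpolated into the SQL text, so they must come from
    # trusted configuration rather than from user input.
    query = f'SELECT "{key_field}", "{blob_field}" FROM "{table}"'
    for key, blob in conn.execute(query):
        if blob is None:
            continue  # nothing stored for this record
        target = out_dir / f"{key}{suffix}"
        target.write_bytes(blob)
        paths[key] = str(target)
    return paths


# Demonstration with an in-memory database and a temporary output directory.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE attachment (id INTEGER PRIMARY KEY, content BLOB)")
conn.executemany(
    "INSERT INTO attachment (id, content) VALUES (?, ?)",
    [(1, b"\x00\x01hello"), (2, None)],
)
with tempfile.TemporaryDirectory() as tmp:
    print(blobs_to_files(conn, "attachment", "id", "content", tmp))
```

The returned mapping can then be written into the path field of the target structure, which turns a BLOB-storing field into a path-storing field.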
Step 6 is implemented as follows:
Step 6-1: Analyze the metadata and compare the source and target heterogeneous data to collect the set of all fields whose metadata differ.
Step 6-2: Mark the selected metadata difference fields.
Step 6-3: Associate the metadata difference fields using the software, and execute the conversion.
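Steps 6-1 to 6-3 can be illustrated with the sketch below. It is not the algorithm of FIG. 4, which is not reproduced in this text. It assumes that the relevant metadata is a declared type per field, expressed here as simple strings, and that a converter is registered for each pair of source and target types. The `CONVERTERS` table and the field names are hypothetical.

```python
from datetime import date

# Hypothetical converters keyed by (source type, target type).
CONVERTERS = {
    ("str", "int"): int,
    ("str", "date"): date.fromisoformat,
    ("int", "str"): str,
}


def metadata_differences(source_meta, target_meta):
    """Collect the fields whose declared type differs between source and target."""
    return {
        f: (source_meta[f], target_meta[f])
        for f in source_meta
        if f in target_meta and source_meta[f] != target_meta[f]
    }


def convert_row(row, differences):
    """Convert the values of the marked difference fields in one record."""
    out = dict(row)
    for f, type_pair in differences.items():
        if out.get(f) is not None:
            out[f] = CONVERTERS[type_pair](out[f])
    return out


source_meta = {"id": "str", "born": "str", "name": "str"}
target_meta = {"id": "int", "born": "date", "name": "str"}
differences = metadata_differences(source_meta, target_meta)
print(convert_row({"id": "42", "born": "2019-12-31", "name": "Li"}, differences))
```

A type pair with no registered converter raises `KeyError`, which surfaces unsupported metadata differences early instead of silently passing values through.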
Compared with the prior art, the invention has the following notable advantage: the method integrates a variety of complex heterogeneous extraction and conversion conditions and provides solutions for table splitting, file migration, file storage format conversion, and similar problems.
1) For table splitting, the organization of the tables in the source and target heterogeneous data structures is compared, including the number of tables, the types of information the tables describe, the form of the master and slave tables, and the field correspondences. The differences in these aspects are analyzed, and the correspondence between the source and target table structures is marked according to the number of tables and the master-slave form.
2) For file migration, information on the fields in the data to be converted that store files by means of a storage path is obtained, the selected fields to be converted are marked, and a migration tool is selected, or a relevant migration method is used, to convert and migrate the paths and files.
3) For file storage format conversion, information on all fields in the data to be converted that store data blocks directly in BLOB or similar form is obtained, the selected fields to be converted are marked, and a relevant migration method is called, or a conversion method block is written manually, to copy, transfer, and convert the data blocks.
In addition, the method offers multiple schemes and can meet a project team's requirements for multiple conversion methods.
Drawings
FIG. 1 is a general flow chart of an embodiment of the present invention.
FIG. 2 is a schematic diagram of an embodiment of the present invention.
FIG. 3 is a diagram illustrating an exemplary file migration algorithm according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating an example of a metadata conversion algorithm according to an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the drawings and examples.
As shown in FIG. 1 and FIG. 2, this embodiment uses two heterogeneous data structures for illustration. The implementation steps are as follows.
a) Compare the organization of the tables in the source and target heterogeneous data structures, including the number of tables, the types of information the tables describe, and the form of the master and slave tables. Identify the differences in these aspects and, based on them, mark the correspondence between the source and target table structures according to the number of tables and the master-slave form.
b) Compare the field correspondences of the source and target heterogeneous data structures, and mark the matching fields in the software.
c) Compare and analyze which table information and field information is missing from the data structure to be converted relative to the target data structure.
d) Compare and analyze which table information and field information in the data structure to be converted is redundant relative to the target data structure.
e) Obtain the conversion requirements, and either supplement the missing tables and fields by calculation or forgo supplementing them.
f) Obtain the conversion requirements, and either delete the redundant tables and fields or retain them and carry them over.
g) Through analysis and comparison, identify synonymous fields whose names are inconsistent.
h) Through analysis and comparison, identify synonymous fields whose names differ because of different naming conventions.
i) Convert and associate all fields that have the same meaning but different names.
j) Through the software data manager, obtain information on all fields in the data to be converted that store files by means of a storage path.
k) Mark the selected fields to be converted.
l) Select a migration tool, or call a relevant migration method, to convert and migrate the paths and files.
m) Through the software data manager, obtain information on all fields in the data to be converted that store data blocks directly in BLOB or similar form.
n) Mark the selected fields to be converted.
o) Call a relevant migration method, or manually write a conversion method block, to copy, transfer, and convert the data blocks; a conversion code example is shown in FIG. 3.
p) Analyze the metadata and compare the source and target heterogeneous data to collect the set of all fields whose metadata differ.
q) Mark the selected metadata difference fields.
r) Associate the metadata difference fields using the software and execute the conversion; an example is shown in FIG. 4.
Claims (1)
1. A data extraction and conversion method covering complex heterogeneous conditions, characterized by comprising the following steps:
step 1: sorting out the heterogeneous data structures before and after conversion, and marking the structural differences and organizational correspondences between the tables and fields of the source and target structures:
step 1-1, comparing the organization of the tables in the source and target heterogeneous data structures, including the number of tables, the types of information the tables describe, and the form of the master and slave tables; identifying the differences in these aspects and, based on them, marking the correspondence between the source and target table structures according to the number of tables and the master-slave form;
step 1-2, comparing the field correspondences of the source and target heterogeneous data structures, and marking the matching fields in the software;
step 2: searching for newly added or missing field information in the source and target data structures, and retaining, deleting, or supplementing the fields as required:
step 2-1, comparing and analyzing which table information and field information is missing from the data structure to be converted relative to the target data structure;
step 2-2, comparing and analyzing which table information and field information in the data structure to be converted is redundant relative to the target data structure;
step 2-3, obtaining the conversion requirements, and either supplementing the missing tables and fields by calculation or forgoing supplementing them;
step 2-4, obtaining the conversion requirements, and either deleting the redundant tables and fields or retaining them and carrying them over;
step 3: comparing the names of synonymous fields in the source and target heterogeneous data, and marking and mapping the fields that have the same meaning but different names:
step 3-1, identifying, through analysis and comparison, synonymous fields whose names are inconsistent;
step 3-2, identifying, through analysis and comparison, synonymous fields whose names differ because of different naming conventions;
step 3-3, converting and associating all fields that have the same meaning but different names;
step 4: checking whether any files are stored before conversion, marking how the file paths are stored, and selecting a migration tool to migrate them, specifically implemented as follows:
step 4-1, obtaining information on all fields in the data to be converted that store files by means of a storage path;
step 4-2, marking the selected fields to be converted;
step 4-3, selecting a migration tool, or using a relevant migration method, to convert and migrate the paths and files;
step 5: checking how the storage modes of file formats differ between the source and target heterogeneous data, marking the changed storage modes, and selecting a corresponding conversion tool to define the conversion method, specifically implemented as follows:
step 5-1, obtaining information on all fields in the data to be converted that store data blocks directly in BLOB form;
step 5-2, marking the selected fields to be converted;
step 5-3, calling a relevant migration method, or manually writing a conversion method block, to copy, transfer, and convert the data blocks;
step 6: comparing the differing metadata in the source and target heterogeneous data, and marking and mapping the fields whose metadata differ, specifically implemented as follows:
step 6-1, analyzing the metadata and comparing the source and target heterogeneous data to collect the set of all fields whose metadata differ;
step 6-2, marking the selected metadata difference fields;
step 6-3, associating the metadata difference fields using the software, and executing the conversion.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911419254.6A CN111309792B (en) | 2019-12-31 | 2019-12-31 | Data extraction and conversion method covering complex heterogeneous conditions |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911419254.6A CN111309792B (en) | 2019-12-31 | 2019-12-31 | Data extraction and conversion method covering complex heterogeneous conditions |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111309792A (en) | 2020-06-19 |
CN111309792B (en) | 2023-12-08 |
Family
ID=71156381
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911419254.6A Active CN111309792B (en) | 2019-12-31 | 2019-12-31 | Data extraction and conversion method covering complex heterogeneous conditions |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111309792B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||