CN111309792A - Data extraction and conversion method for covering complex heterogeneous situation - Google Patents
Data extraction and conversion method for covering complex heterogeneous situation
- Publication number
- CN111309792A (application CN201911419254.6A)
- Authority
- CN
- China
- Prior art keywords
- data
- conversion
- fields
- heterogeneous
- marking
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a data extraction and conversion method covering complex heterogeneous conditions. The method first investigates the difficulties and problems encountered during heterogeneous data conversion and summarizes them into a heterogeneous data conversion problem library. A solution is then proposed for each problem, and the solutions are combined into a complete solution set. The invention handles a variety of heterogeneous conditions in heterogeneous data conversion, including different data organization structures, different storage forms, different metadata, and attachment migration, and improves the efficiency of conversion between heterogeneous data.
Description
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to a data extraction and conversion method covering complex heterogeneous conditions.
Background
Data extraction and conversion is the data processing flow that, as data moves through a system, brings data from different heterogeneous sources into a uniform target form. It is the basis of data application and is widely used in big data computation and in data mining and analysis across industries. Because efficient data application depends on it, the demand for data extraction and conversion algorithms with complete functionality and excellent performance keeps growing.
However, existing data extraction and conversion methods in the industry have two shortcomings. On the one hand, they do not consider complex heterogeneous conditions such as splitting into master and slave tables, file migration, and file storage format conversion; when these conditions arise no existing solution is available, and resources must be spent on customized development for each specific case. On the other hand, some tools do provide conversion methods and functions, but their coverage is incomplete, so they can hardly handle in one pass all the complex heterogeneous extraction and conversion cases that a project requires.
Disclosure of Invention
The invention aims to provide a data extraction and conversion method that covers the conversion of various kinds of complex heterogeneous data and can satisfy the need for multiple conversion methods at the same time.
The technical solution for realizing the purpose of the invention is a data extraction and conversion method covering complex heterogeneous conditions, comprising the following steps:
Step 1: Sort out the heterogeneous data structures before and after conversion, and mark the structural differences and the organizational correspondence between the detailed tables and fields of the structures before and after conversion.
Step 2: Search the data structures before and after conversion for newly added or missing field information, and retain, delete, or supplement the fields as required.
Step 3: Compare the names of synonymous fields in the heterogeneous data before and after conversion, and mark and map the fields that have the same meaning but different names.
Step 4: Check whether file storage exists before conversion, mark where file paths are stored, and select a migration tool to perform the migration.
Step 5: Check the differences in the storage modes of file formats in the heterogeneous data before and after conversion, mark the storage modes that change, and select a corresponding conversion tool to define a conversion method.
Step 6: Compare the metadata in the heterogeneous data before and after conversion, and mark and map the fields whose metadata differs.
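As a rough illustration of how the marks produced by the six steps might be gathered and applied to one record, the following Python sketch uses a hypothetical `ConversionPlan` structure. The class, its attribute names, and `apply_plan` are assumptions made for this example only; the patent does not prescribe any particular data structure, and file and BLOB handling (steps 4 and 5) are only recorded here, not executed.

```python
from dataclasses import dataclass, field


@dataclass
class ConversionPlan:
    """Hypothetical container for the marks produced by the six steps."""
    table_map: dict = field(default_factory=dict)        # step 1: source table -> target tables
    dropped_fields: set = field(default_factory=set)     # step 2: redundant fields to delete
    computed_fields: dict = field(default_factory=dict)  # step 2: missing field -> function(row)
    renames: dict = field(default_factory=dict)          # step 3: source name -> target name
    path_fields: set = field(default_factory=set)        # step 4: fields holding file paths
    blob_fields: set = field(default_factory=set)        # step 5: fields holding BLOB data
    casts: dict = field(default_factory=dict)            # step 6: field -> type conversion


def apply_plan(plan, row):
    """Apply the field-level marks (steps 2, 3 and 6) to a single record."""
    out = {}
    for name, value in row.items():
        if name in plan.dropped_fields:
            continue
        if name in plan.casts:
            value = plan.casts[name](value)
        out[plan.renames.get(name, name)] = value
    for name, compute in plan.computed_fields.items():
        out[name] = compute(row)
    return out


plan = ConversionPlan(
    dropped_fields={"fax"},
    renames={"custNm": "customer_name"},
    casts={"id": int},
)
result = apply_plan(plan, {"id": "3", "custNm": "Li", "fax": "n/a"})
```

Keeping the marks separate from the code that applies them mirrors the "mark first, then convert" order of the steps and makes the plan easy to review before any data is touched.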
Step 1 is implemented as follows:
Step 1-1: Compare the organizational relations of the tables in the heterogeneous data structures before and after conversion, including the data of the tables, the types of information the tables describe, and the master-table and slave-table forms; identify the similarities and differences in these respects and, based on them, mark the correspondence between the table structures before and after conversion in terms of the number of tables and the master and slave table form.
Step 1-2: Compare the field correspondences of the heterogeneous data structures before and after conversion, and mark the identical corresponding fields in software.
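Steps 1-1 and 1-2 can be pictured with a small Python sketch. The schemas, table names, and the `mark_identical_fields` helper below are hypothetical; they assume a single source table that is split into a master table and a slave table in the target structure.

```python
# Hypothetical schemas: table name -> list of field names.
source_schema = {
    "orders": ["id", "customer", "item", "qty"],
}
target_schema = {
    "order_master": ["id", "customer"],
    "order_detail": ["order_id", "item", "qty"],
}

# Step 1-1: mark which target tables each source table corresponds to.
# Here one source table is split into a master table and a slave table.
table_map = {"orders": ["order_master", "order_detail"]}


def mark_identical_fields(source_schema, target_schema, table_map):
    """Step 1-2: mark fields that appear under the same name in corresponding tables."""
    marks = []
    for src_table, dst_tables in table_map.items():
        for dst_table in dst_tables:
            for name in source_schema[src_table]:
                if name in target_schema[dst_table]:
                    marks.append((src_table, name, dst_table, name))
    return marks


field_marks = mark_identical_fields(source_schema, target_schema, table_map)
```

Fields that do not match by name, such as `order_id` in this example, are left unmarked at this stage and are picked up by the later steps.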
Step 2 is implemented as follows:
Step 2-1: Compare and analyze the table information and field information that the data structure to be converted lacks relative to the target data structure.
Step 2-2: Compare and analyze the table information and field information that the data structure to be converted has in excess of the target data structure.
Step 2-3: Obtain the conversion requirements, and either supplement the missing tables and fields by calculation or leave them unsupplemented.
Step 2-4: Obtain the conversion requirements, and either delete the redundant tables and fields or translate and retain them.
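A minimal Python sketch of step 2 follows, using set differences to find missing and redundant fields. The field names, the `REFERENCE_YEAR` constant, and the `supplement` and `keep_extra` arguments are assumptions chosen for illustration; they stand in for whatever the conversion requirements specify.

```python
REFERENCE_YEAR = 2019  # hypothetical constant used by the computed field below

source_fields = {"id", "name", "birth_year", "fax"}
target_fields = {"id", "name", "age"}

missing = target_fields - source_fields  # step 2-1: fields the source lacks
extra = source_fields - target_fields    # step 2-2: fields the target does not have


def convert_row(row, supplement, keep_extra):
    """Apply the conversion requirements to one source record."""
    out = {name: row[name] for name in source_fields & target_fields}
    for name in missing:
        if name in supplement:   # step 2-3: supplement by calculation, else leave out
            out[name] = supplement[name](row)
    for name in extra:
        if name in keep_extra:   # step 2-4: retain, else delete
            out[name] = row[name]
    return out


converted = convert_row(
    {"id": 1, "name": "Li", "birth_year": 1990, "fax": "n/a"},
    supplement={"age": lambda row: REFERENCE_YEAR - row["birth_year"]},
    keep_extra=set(),
)
```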
Step 3 is implemented as follows:
Step 3-1: Through analysis and comparison, find the synonymous field names that differ because of inconsistent field-name lengths.
Step 3-2: Through analysis and comparison, find the synonymous field names that differ because of different field naming habits.
Step 3-3: Establish a conversion association for all synonymous fields with different names.
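One possible way to propose candidates for step 3 is a simple name-normalization heuristic, sketched below in Python. The matching rules (case and separator folding for naming habits, prefix matching for length-limited names) are assumptions of this example, not the patent's method, and a prefix match can produce false positives, so the resulting associations would still need manual review.

```python
import re


def normalize(name):
    """Lower-case a field name and strip separators, so that
    'orderDate' and 'ORDER_DATE' compare equal."""
    return re.sub(r"[^a-z0-9]", "", name.lower())


def match_synonyms(source_fields, target_fields):
    """Return a source-name -> target-name association (step 3-3)."""
    mapping = {}
    for src in source_fields:
        s = normalize(src)
        for dst in target_fields:
            d = normalize(dst)
            # s == d: the names differ only by naming habit (step 3-2).
            # Prefix match: one name was cut short by a length limit (step 3-1).
            if s == d or d.startswith(s) or s.startswith(d):
                mapping[src] = dst
                break
    return mapping


synonyms = match_synonyms(
    ["CUSTOMER_NA", "orderDate"],
    ["customer_name", "order_date"],
)
```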
Step 4 is implemented as follows:
Step 4-1: Obtain information on all fields in the data to be converted that store files by means of a storage path.
Step 4-2: Mark the selected fields to be converted.
Step 4-3: Select a migration tool, or call a relevant migration method, to convert and migrate the paths and files.
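The file migration in step 4 could look roughly like the following Python sketch, which copies each referenced file to a new storage root and rewrites the stored path. The `migrate_files` function, the `attachment_path` field, and the temporary directories are hypothetical stand-ins; a real project might instead call a dedicated migration tool.

```python
import shutil
import tempfile
from pathlib import Path


def migrate_files(rows, path_field, old_root, new_root):
    """Copy each referenced file under new_root and rewrite the stored path."""
    migrated = []
    for row in rows:
        src = Path(row[path_field])
        dst = Path(new_root) / src.relative_to(old_root)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copies file contents and metadata
        migrated.append({**row, path_field: str(dst)})
    return migrated


# Throwaway directories stand in for the old and new storage locations.
old_root = Path(tempfile.mkdtemp())
new_root = Path(tempfile.mkdtemp())
(old_root / "docs").mkdir()
(old_root / "docs" / "a.txt").write_text("attachment")

rows = [{"id": 1, "attachment_path": str(old_root / "docs" / "a.txt")}]
migrated = migrate_files(rows, "attachment_path", old_root, new_root)
```

Copying rather than moving keeps the source files intact until the converted records have been verified.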
Step 5 is implemented as follows:
Step 5-1: Obtain information on all fields in the data to be converted that store data blocks directly in BLOB form.
Step 5-2: Mark the selected fields to be converted.
Step 5-3: Call a relevant migration method, or manually write a conversion method block, to copy, transmit, and convert the data blocks.
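The Python sketch below illustrates one conceivable case of step 5: the source stores content as a BLOB, while the target is assumed to store a file path. It uses an in-memory SQLite database as a stand-in for the real source and target systems; the table names, column names, and the direction of the conversion are all assumptions of this example.

```python
import sqlite3
import tempfile
from pathlib import Path

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src_docs (id INTEGER PRIMARY KEY, body BLOB)")
conn.execute("CREATE TABLE dst_docs (id INTEGER PRIMARY KEY, body_path TEXT)")
conn.execute("INSERT INTO src_docs VALUES (1, ?)", (b"\x89PNG-demo",))

# Step 5-1: find the columns declared as BLOB in the source table.
# PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
blob_columns = [
    row[1]
    for row in conn.execute("PRAGMA table_info(src_docs)")
    if row[2].upper() == "BLOB"
]

out_dir = Path(tempfile.mkdtemp())


def blob_to_file(conn, out_dir):
    """Step 5-3: write each BLOB to a file and store its path in the target table."""
    for doc_id, body in conn.execute("SELECT id, body FROM src_docs").fetchall():
        path = out_dir / f"{doc_id}.bin"
        path.write_bytes(body)
        conn.execute("INSERT INTO dst_docs VALUES (?, ?)", (doc_id, str(path)))


blob_to_file(conn, out_dir)
stored_path = conn.execute(
    "SELECT body_path FROM dst_docs WHERE id = 1"
).fetchone()[0]
```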
Step 6 is implemented as follows:
Step 6-1: Analyze the metadata and compare it across the heterogeneous data before and after conversion, and collect all fields whose metadata differs into a metadata difference field set.
Step 6-2: Mark the selected metadata difference fields.
Step 6-3: Associate the metadata difference fields using software, and execute the conversion.
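Step 6 can be sketched in Python as a comparison of per-field metadata followed by a type or length conversion. The metadata layout (a `type` and an optional `max_length` per field) and the conversion rules are hypothetical; real metadata would typically also cover precision, nullability, encoding, and similar properties.

```python
source_meta = {
    "id": {"type": "str"},
    "price": {"type": "str"},
    "title": {"type": "str", "max_length": 50},
    "note": {"type": "str"},
}
target_meta = {
    "id": {"type": "int"},
    "price": {"type": "float"},
    "title": {"type": "str", "max_length": 20},
    "note": {"type": "str"},
}

# Step 6-1: collect the fields whose metadata differs.
diff_fields = {
    name
    for name in source_meta
    if name in target_meta and source_meta[name] != target_meta[name]
}

CASTS = {"int": int, "float": float, "str": str}


def convert_value(name, value):
    """Step 6-3: convert one value so that it fits the target metadata."""
    meta = target_meta[name]
    value = CASTS[meta["type"]](value)
    if meta["type"] == "str" and "max_length" in meta:
        value = value[: meta["max_length"]]  # truncation discards data
    return value


def convert_row(row):
    return {
        name: convert_value(name, value) if name in diff_fields else value
        for name, value in row.items()
    }


converted = convert_row({"id": "7", "price": "9.50", "title": "x" * 30, "note": "ok"})
```

Because truncating to a shorter `max_length` loses information, fields marked this way are worth flagging for review in step 6-2 before the conversion is executed.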
Compared with the prior art, the invention has the following notable advantages: the method integrates a variety of complex heterogeneous extraction and conversion conditions and provides solutions for difficult problems such as table splitting, file migration, and file storage format conversion.
1) For table splitting, the organizational relations of the tables in the heterogeneous data structures before and after conversion are compared, including the data of the tables, the types of information the tables describe, the master-table and slave-table forms, and the field correspondences. The similarities and differences in these respects are analyzed, and the correspondence between the table structures before and after conversion is marked according to the number of tables and the master and slave table form.
2) For file migration, information on all fields in the data to be converted that store files by means of a storage path is obtained, the selected fields to be converted are marked, and a migration tool is selected, or a relevant migration method is called, to convert and migrate the paths and files.
3) For file storage format conversion, information on all fields in the data to be converted that store data blocks directly, in BLOB or similar form, is obtained, the selected fields to be converted are marked, and a relevant migration method is called, or a conversion method block is written manually, to copy, transmit, and convert the data blocks.
In addition, the method offers multiple schemes and can simultaneously satisfy a project team's need for several conversion methods.
Drawings
FIG. 1 is a general flow chart of an embodiment of the present invention.
FIG. 2 is a schematic diagram of an exemplary embodiment of the present invention.
FIG. 3 is a diagram illustrating a file migration algorithm according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating an example of a metadata conversion algorithm according to an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the figures and examples.
As shown in FIG. 1 and FIG. 2, two sets of heterogeneous data structures are selected for description in this embodiment, and the implementation steps are as follows.
a) Compare the organizational relations of the tables in the heterogeneous data structures before and after conversion, including the data of the tables, the types of information the tables describe, and the master-table and slave-table forms. Identify the similarities and differences in these respects and, based on them, mark the correspondence between the table structures before and after conversion according to the number of tables and the master and slave table form.
b) Compare the field correspondences of the heterogeneous data structures before and after conversion, and mark the identical corresponding fields in software.
c) Compare and analyze the table information and field information that the data structure to be converted lacks relative to the target data structure.
d) Compare and analyze the table information and field information that the data structure to be converted has in excess of the target data structure.
e) Obtain the conversion requirements, and either supplement the missing tables and fields by calculation or leave them unsupplemented.
f) Obtain the conversion requirements, and either delete the redundant tables and fields or translate and retain them.
g) Through analysis and comparison, find the synonymous field names that differ because of inconsistent field-name lengths.
h) Through analysis and comparison, find the synonymous field names that differ because of different field naming habits.
i) Establish a conversion association for all synonymous fields with different names.
j) Software data management personnel obtain information on all fields in the data to be converted that store files by means of a storage path.
k) Mark the selected fields to be converted.
l) Select a migration tool, or call a relevant migration method, to convert and migrate the paths and files.
m) Software data management personnel obtain information on all fields in the data to be converted that store data blocks directly, in BLOB or similar form.
n) Mark the selected fields to be converted.
o) Call a relevant migration method, or manually write a conversion method block, to copy, transmit, and convert the data blocks; an example of the conversion code is shown in FIG. 3.
p) Analyze the metadata and compare it across the heterogeneous data before and after conversion, and collect all fields whose metadata differs into a metadata difference field set.
q) Mark the selected metadata difference fields.
r) Associate the metadata difference fields using software and execute the conversion; an example is shown in FIG. 4.
Claims (7)
1. A data extraction and conversion method covering complex heterogeneous conditions, characterized by comprising the following steps:
step 1: sorting out the heterogeneous data structures before and after conversion, and marking the structural differences and the organizational correspondence between the detailed tables and fields of the structures before and after conversion;
step 2: searching the data structures before and after conversion for newly added or missing field information, and retaining, deleting, or supplementing the fields as required;
step 3: comparing the names of synonymous fields in the heterogeneous data before and after conversion, and marking and mapping the fields that have the same meaning but different names;
step 4: checking whether file storage exists before conversion, marking where file paths are stored, and selecting a migration tool to perform the migration;
step 5: checking the similarities and differences in the storage modes of file formats in the heterogeneous data before and after conversion, marking the storage modes that change, and selecting a corresponding conversion tool to define a conversion method;
step 6: comparing the metadata in the heterogeneous data before and after conversion, and marking and mapping the fields whose metadata differs.
2. The data extraction and conversion method covering complex heterogeneous conditions according to claim 1, wherein step 1 is implemented as follows:
step 1-1: comparing the organizational relations of the tables in the heterogeneous data structures before and after conversion, including the data of the tables, the types of information the tables describe, and the master-table and slave-table forms; identifying the similarities and differences in these respects and, based on them, marking the correspondence between the table structures before and after conversion according to the number of tables and the master and slave table form;
step 1-2: comparing the field correspondences of the heterogeneous data structures before and after conversion, and marking the identical corresponding fields in software.
3. The data extraction and conversion method covering complex heterogeneous conditions according to claim 1, wherein step 2 is implemented as follows:
step 2-1: comparing and analyzing the table information and field information that the data structure to be converted lacks relative to the target data structure;
step 2-2: comparing and analyzing the table information and field information that the data structure to be converted has in excess of the target data structure;
step 2-3: obtaining the conversion requirements, and either supplementing the missing tables and fields by calculation or leaving them unsupplemented;
step 2-4: obtaining the conversion requirements, and either deleting the redundant tables and fields or translating and retaining them.
4. The data extraction and conversion method covering complex heterogeneous conditions according to claim 1, wherein step 3 is implemented as follows:
step 3-1: through analysis and comparison, finding the synonymous field names that differ because of inconsistent field-name lengths;
step 3-2: through analysis and comparison, finding the synonymous field names that differ because of different field naming habits;
step 3-3: establishing a conversion association for all synonymous fields with different names.
5. The data extraction and conversion method according to claim 1, wherein step 4 is implemented as follows:
step 4-1: obtaining information on all fields in the data to be converted that store files by means of a storage path;
step 4-2: marking the selected fields to be converted;
step 4-3: selecting a migration tool, or calling a relevant migration method, to convert and migrate the paths and files.
6. The data extraction and conversion method according to claim 1, wherein step 5 is implemented as follows:
step 5-1: obtaining information on all fields in the data to be converted that store data blocks directly in BLOB form;
step 5-2: marking the selected fields to be converted;
step 5-3: calling a relevant migration method, or manually writing a conversion method block, to copy, transmit, and convert the data blocks.
7. The data extraction and conversion method according to claim 1, wherein step 6 is implemented as follows:
step 6-1: analyzing the metadata and comparing it across the heterogeneous data before and after conversion, and collecting all fields whose metadata differs into a metadata difference field set;
step 6-2: marking the selected metadata difference fields;
step 6-3: associating the metadata difference fields using software, and executing the conversion.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911419254.6A CN111309792B (en) | 2019-12-31 | 2019-12-31 | Data extraction and conversion method covering complex heterogeneous conditions |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111309792A (en) | 2020-06-19 |
CN111309792B (en) | 2023-12-08 |
Family
ID=71156381
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911419254.6A Active CN111309792B (en) | 2019-12-31 | 2019-12-31 | Data extraction and conversion method covering complex heterogeneous conditions |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111309792B (en) |
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102902750A (en) * | 2012-09-20 | 2013-01-30 | 浪潮齐鲁软件产业有限公司 | Universal data extraction and conversion method |
CN105373599A (en) * | 2015-10-28 | 2016-03-02 | 北京汇商融通信息技术有限公司 | Data migration system based on various data storage platforms |
WO2017071135A1 (en) * | 2015-10-28 | 2017-05-04 | 北京汇商融通信息技术有限公司 | Data migration system based on various data storage platforms |
CN110019127A (en) * | 2017-11-28 | 2019-07-16 | 清远市易通科技有限公司 | It is a kind of to fast implement the asynchronous information synchronization method of MYSQL database |
Non-Patent Citations (2)
Title |
---|
张民 (Zhang Min): "科研数据的迁移和保存元数据研究" (Research on migration and preservation metadata of scientific research data), 《中国优秀博硕士学位论文全文数据库(硕士)信息科技辑》, 15 November 2013 (2013-11-15), pages 16 - 38 * |
徐燕等 (Xu Yan et al.): "信息系统中的通用数据迁移工具的研究与设计" (Research and design of a general data migration tool in information systems), 《计算机与现代化》 (Computer and Modernization), 15 June 2010 (2010-06-15), pages 156 - 165 * |
Also Published As
Publication number | Publication date |
---|---|
CN111309792B (en) | 2023-12-08 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||