CN111309792B - Data extraction and conversion method covering complex heterogeneous conditions - Google Patents

Data extraction and conversion method covering complex heterogeneous conditions

Info

Publication number
CN111309792B
Authority
CN
China
Prior art keywords
data
conversion
field
fields
marking
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911419254.6A
Other languages
Chinese (zh)
Other versions
CN111309792A (en)
Inventor
刘太敏
张翠侠
杨博文
张永伟
段然
陈奡
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 28 Research Institute
Original Assignee
CETC 28 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-12-31
Filing date
2019-12-31
Publication date
2023-12-08
Application filed by CETC 28 Research Institute
Priority to CN201911419254.6A
Publication of CN111309792A
Application granted
Publication of CN111309792B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data extraction and conversion method covering complex heterogeneous conditions. The method first studies the difficulties and problems encountered during heterogeneous data conversion and summarizes them into a heterogeneous data conversion problem library. A solution is then proposed for each problem, and the solutions are assembled into a single solution set. The invention addresses various heterogeneous conditions in heterogeneous data conversion, including different data organization structures, different storage forms, different metadata, and attachment migration, and improves the efficiency of conversion between heterogeneous data.

Description

Data extraction and conversion method covering complex heterogeneous conditions
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to a data extraction and conversion method covering complex heterogeneous conditions.
Background
Data extraction and conversion is the data processing flow that carries data from different heterogeneous data sources to unified target data as the data circulates. It is the basis of data applications and is widely used in big data computation and data mining analysis across industries. Demand is growing for extraction and conversion algorithms that are functionally complete and perform well, and such algorithms are key to whether data applications can be carried out efficiently.
However, existing data extraction and conversion methods in the industry have two shortcomings. On the one hand, they do not consider complex heterogeneous conditions such as master-slave table splitting, file migration, and file storage format conversion, so no existing result can be reused when these conditions arise, and resources must be spent on custom development for each specific case. On the other hand, some tools do provide conversion methods and functions, but their coverage is incomplete, so it is difficult to satisfy at one time all the complex heterogeneous extraction and conversion conditions that a project requires.
Disclosure of Invention
The invention aims to provide a data extraction and conversion method that covers multiple complex heterogeneous data conversion conditions and can meet the requirements for multiple conversion methods at the same time.
The technical solution for realizing the purpose of the invention is as follows: a data extraction and conversion method covering complex heterogeneous conditions comprises the following steps:
Step 1: sort out the heterogeneous data structures before and after conversion, and mark the structural differences and organizational correspondences between the detailed tables and fields of the two structures.
Step 2: search for newly added or missing field information in the data structures before and after conversion, and retain, delete, or supplement the fields according to requirements.
Step 3: compare the names of synonymous fields in the heterogeneous data before and after conversion, and mark and map the fields that have the same meaning but different names.
Step 4: check whether file storage exists before conversion, mark how the file paths are stored, and select a migration tool for migration.
Step 5: check for differences in how file formats are stored in the heterogeneous data before and after conversion, mark the storage modes, and select a corresponding conversion tool and define the conversion method.
Step 6: compare the differing metadata in the heterogeneous data before and after conversion, and mark and map the fields whose metadata differ.
Step 1 is specifically implemented as follows:
Step 1-1: compare the organization relations of the tables in the heterogeneous data structures before and after conversion, where the organization relations comprise the data of the tables, the information types described by the tables, and the forms in which master and slave tables are expressed; find the differences in these aspects and, based on them, mark the correspondence between the two table structures according to the number of tables and the master-slave table form;
Step 1-2: compare the field correspondences of the heterogeneous data structures before and after conversion, and mark the identical corresponding fields in software.
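As an illustration only, the comparison in steps 1-1 and 1-2 might be sketched in Python as follows. The `Table` class, the `compare_structures` function, and the sample schemas are hypothetical and do not appear in the original text; the sketch assumes that each structure can be described as a list of tables with field names and an optional master table.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Table:
    name: str
    fields: List[str]
    parent: Optional[str] = None  # master table name if this is a slave table


def compare_structures(source, target):
    """Summarize organizational differences between two structures."""
    src_fields = {f for t in source for f in t.fields}
    tgt_fields = {f for t in target for f in t.fields}
    return {
        # Step 1-1: number of tables and master-slave form on each side.
        "table_count": (len(source), len(target)),
        "slave_tables": (
            sorted(t.name for t in source if t.parent),
            sorted(t.name for t in target if t.parent),
        ),
        # Step 1-2: fields with identical names are marked as corresponding.
        "same_fields": sorted(src_fields & tgt_fields),
    }


# Hypothetical example: one flat table is split into a master and a slave table.
source = [Table("person", ["id", "name", "phone", "address"])]
target = [
    Table("person", ["id", "name"]),
    Table("contact", ["person_id", "phone", "address"], parent="person"),
]
report = compare_structures(source, target)
```

Here `report` records that the target has one more table than the source and introduces a slave table, which is the kind of difference that step 1-1 asks to be marked.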
Step 2 is specifically implemented as follows:
Step 2-1: by comparison and analysis, identify the table information and field information that the data structure to be converted lacks relative to the target data structure;
Step 2-2: by comparison and analysis, identify the table information and field information in the data structure to be converted that is redundant relative to the target data structure;
Step 2-3: obtain the conversion requirements, and either supplement the missing tables and fields by calculation or forgo supplementing them;
Step 2-4: obtain the conversion requirements, and either delete the redundant tables and fields or retain them and carry them over.
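A minimal sketch of steps 2-1 to 2-4 follows, assuming that records are plain dictionaries and that the conversion requirements are supplied as a table of derivation rules plus a list of redundant fields to keep. The function names, the `derive` rules, and the sample record are illustrative assumptions, not part of the original method description.

```python
def diff_fields(source_fields, target_fields):
    """Steps 2-1 and 2-2: return (missing, redundant) relative to the target."""
    missing = [f for f in target_fields if f not in source_fields]
    redundant = [f for f in source_fields if f not in target_fields]
    return missing, redundant


def convert_record(record, target_fields, derive, keep_redundant=()):
    """Build a target record from a source record according to the requirements."""
    missing, redundant = diff_fields(list(record), target_fields)
    out = {f: record[f] for f in target_fields if f in record}
    for f in missing:
        # Step 2-3: supplement by calculation when a rule exists, else leave it out.
        if f in derive:
            out[f] = derive[f](record)
    for f in redundant:
        # Step 2-4: redundant fields are dropped unless they are explicitly retained.
        if f in keep_redundant:
            out[f] = record[f]
    return out


record = {"first": "Ada", "last": "Lovelace", "fax": "n/a", "note": "vip"}
target_fields = ["first", "last", "full_name", "age"]
derive = {"full_name": lambda r: f"{r['first']} {r['last']}"}
result = convert_record(record, target_fields, derive, keep_redundant=("note",))
```

In this example `full_name` is computed, `age` is left unsupplemented because no rule exists for it, `fax` is deleted, and `note` is carried over.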
Step 3 is specifically implemented as follows:
Step 3-1: by analysis and comparison, find the synonymous fields whose names are simply inconsistent;
Step 3-2: by analysis and comparison, find the synonymous fields whose names differ because of different naming habits;
Step 3-3: convert and associate all fields that have the same meaning but different names.
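One possible way to realize steps 3-1 to 3-3 is sketched below. It assumes that outright name differences are resolved through a manually maintained synonym table, and that naming-habit differences (camelCase, snake_case, kebab-case) can be resolved by normalizing names. The functions `normalize` and `match_synonyms` and the sample field names are hypothetical.

```python
import re


def normalize(name):
    """Reduce camelCase, snake_case, and kebab-case names to one canonical form."""
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)  # split camelCase words
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")


def match_synonyms(source_fields, target_fields, synonyms):
    """Step 3-3: map each source field to its target counterpart, if any."""
    by_norm = {normalize(t): t for t in target_fields}
    mapping = {}
    for s in source_fields:
        if s in synonyms:
            # Step 3-1: the names are inconsistent; use the synonym table.
            mapping[s] = synonyms[s]
        elif normalize(s) in by_norm:
            # Step 3-2: only the naming habit differs.
            mapping[s] = by_norm[normalize(s)]
    return mapping


mapping = match_synonyms(
    ["userName", "tel", "birth-date"],
    ["user_name", "phone", "birth_date"],
    synonyms={"tel": "phone"},
)
```

Fields that match neither rule are simply absent from `mapping`, so they can be reviewed by hand.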
Step 4 is specifically implemented as follows:
Step 4-1: obtain the information of all fields in the data to be converted that store files by means of a storage path;
Step 4-2: mark the selected fields to be converted;
Step 4-3: select a migration tool, or use the relevant migration method, to convert and migrate the paths and files.
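The path-based file migration of step 4-3 could look roughly like the following sketch, which assumes that a marked field holds an absolute path under a known source root and that migration means copying the file to a new root and rewriting the stored path. The function `migrate_file_field`, the field name `attachment`, and the directory layout are assumptions made for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def migrate_file_field(record, field_name, old_root, new_root):
    """Copy the file referenced by a path field and rewrite the stored path."""
    old_path = Path(record[field_name])
    relative = old_path.relative_to(old_root)  # keep the directory layout
    new_path = Path(new_root) / relative
    new_path.parent.mkdir(parents=True, exist_ok=True)
    shutil.copy2(old_path, new_path)  # copies content and file metadata
    return {**record, field_name: str(new_path)}


# Temporary directories stand in for the source and target storage roots.
with tempfile.TemporaryDirectory() as tmp:
    old_root, new_root = Path(tmp) / "old", Path(tmp) / "new"
    (old_root / "docs").mkdir(parents=True)
    (old_root / "docs" / "a.txt").write_text("attachment")
    record = {"id": 1, "attachment": str(old_root / "docs" / "a.txt")}
    migrated = migrate_file_field(record, "attachment", old_root, new_root)
    copied_text = Path(migrated["attachment"]).read_text()
    path_rewritten = migrated["attachment"] == str(new_root / "docs" / "a.txt")
```

Copying rather than moving leaves the source data intact, so the migration can be repeated if it fails part-way.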
Step 5 is specifically implemented as follows:
Step 5-1: obtain the information of all fields in the data to be converted that store data blocks directly in BLOB form;
Step 5-2: mark the selected fields to be converted;
Step 5-3: call the relevant migration method, or manually write a conversion method block, to copy, transmit, and convert the data blocks.
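The following sketch illustrates steps 5-1 to 5-3 using an in-memory SQLite database from the Python standard library. The table names `src` and `dst`, the column names, and the choice of Base64 text as the target storage form are all assumptions chosen for the example; the original text does not prescribe a database or a target format.

```python
import base64
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (id INTEGER PRIMARY KEY, name TEXT, payload BLOB)")
conn.execute("CREATE TABLE dst (id INTEGER PRIMARY KEY, name TEXT, payload_b64 TEXT)")
conn.execute("INSERT INTO src VALUES (1, 'logo', ?)", (b"\x89PNG\x00\x01",))

# Step 5-1: list the columns declared as BLOB in the source table.
# PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
blob_columns = [
    row[1] for row in conn.execute("PRAGMA table_info(src)") if row[2].upper() == "BLOB"
]


def convert_block(data):
    """Hand-written conversion block: re-encode raw bytes as Base64 text."""
    return base64.b64encode(data).decode("ascii")


# Step 5-3: copy each row, converting the marked BLOB field on the way.
rows = conn.execute("SELECT id, name, payload FROM src").fetchall()
for row_id, name, payload in rows:
    conn.execute(
        "INSERT INTO dst VALUES (?, ?, ?)", (row_id, name, convert_block(payload))
    )

stored = conn.execute("SELECT payload_b64 FROM dst WHERE id = 1").fetchone()[0]
```

Decoding `stored` with `base64.b64decode` returns the original bytes, which gives a simple check that the conversion block is lossless.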
Step 6 is specifically implemented as follows:
Step 6-1: analyze the metadata and compare the heterogeneous data before and after conversion to collect the set of all fields whose metadata differ;
Step 6-2: mark the selected metadata difference fields;
Step 6-3: associate the metadata difference fields using software, and execute the conversion.
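A minimal sketch of steps 6-1 to 6-3 is given below, assuming that metadata is available as a dictionary per field describing its type, length, and date format. The metadata layout, the `convert_value` function, and the policy of truncating strings that exceed the target length are illustrative assumptions rather than rules stated in the original text.

```python
from datetime import datetime

source_meta = {
    "id": {"type": "str"},
    "created": {"type": "str", "format": "%d/%m/%Y"},
    "name": {"type": "str", "length": 50},
}
target_meta = {
    "id": {"type": "int"},
    "created": {"type": "date"},
    "name": {"type": "str", "length": 10},
}

# Step 6-1: collect the fields whose metadata differ between the two sides.
diff_fields = sorted(
    f for f in source_meta if f in target_meta and source_meta[f] != target_meta[f]
)


def convert_value(field, value):
    """Step 6-3: convert one value according to the target metadata."""
    src, tgt = source_meta[field], target_meta[field]
    if tgt["type"] == "int":
        return int(value)
    if tgt["type"] == "date":
        return datetime.strptime(value, src["format"]).date()
    if tgt["type"] == "str" and "length" in tgt:
        return value[: tgt["length"]]  # assumed policy: truncate to target length
    return value


record = {"id": "42", "created": "31/12/2019", "name": "Heterogeneous data"}
converted = {
    f: convert_value(f, v) if f in diff_fields else v for f, v in record.items()
}
```

In practice a project may prefer to reject over-length values instead of truncating them; the sketch only shows where such a rule would be attached.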
Compared with the prior art, the invention has the following notable advantages: the method integrates a variety of complex heterogeneous extraction and conversion conditions and provides solutions for problems such as table splitting, file migration, and file storage format conversion.
1) For table splitting, the organization relations of the tables in the heterogeneous data structures before and after conversion are compared, where the organization relations comprise the data of the tables, the information types described by the tables, the forms in which master and slave tables are expressed, the field correspondences, and the like. The differences in these aspects are analyzed, and the correspondence between the two table structures is marked according to the number of tables and the master-slave table form.
2) For file migration, the information of the fields in the data to be converted that store files by means of a storage path is obtained, the selected fields to be converted are marked, and a migration tool is selected, or the relevant migration method is used, to convert and migrate the paths and files.
3) For file storage format conversion, the information of all fields in the data to be converted that store data blocks directly in a form such as BLOB is obtained, the selected fields to be converted are marked, and the relevant migration method is called, or a conversion method block is written manually, to copy, transmit, and convert the data blocks.
In addition, the method offers multiple schemes and can therefore meet a project team's requirements for multiple conversion methods.
Drawings
FIG. 1 is a general flow chart of an embodiment of the present invention.
FIG. 2 is a schematic diagram of an embodiment of the present invention.
FIG. 3 is a diagram illustrating an exemplary file migration algorithm according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating an example of a metadata transformation algorithm according to an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the drawings and examples.
As shown in FIG. 1 and FIG. 2, this embodiment uses two heterogeneous data structures for illustration, and the implementation steps are as follows.
a) Compare the organization relations of the tables in the heterogeneous data structures before and after conversion, where the organization relations comprise the data of the tables, the information types described by the tables, the forms in which master and slave tables are expressed, and the like. Find the differences in these aspects and, based on them, mark the correspondence between the two table structures according to the number of tables and the master-slave table form.
b) Compare the field correspondences of the heterogeneous data structures before and after conversion, and mark the identical corresponding fields in the software.
c) By comparison and analysis, identify the table information and field information that the data structure to be converted lacks relative to the target data structure.
d) By comparison and analysis, identify the table information and field information in the data structure to be converted that is redundant relative to the target data structure.
e) Obtain the conversion requirements, and either supplement the missing tables and fields by calculation or forgo supplementing them.
f) Obtain the conversion requirements, and either delete the redundant tables and fields or retain them and carry them over.
g) By analysis and comparison, find the synonymous fields whose names are simply inconsistent.
h) By analysis and comparison, find the synonymous fields whose names differ because of different naming habits.
i) Convert and associate all fields that have the same meaning but different names.
j) Through the software data manager, obtain the information of all fields in the data to be converted that store files by means of a storage path.
k) Mark the selected fields to be converted.
l) Select a migration tool, or call the relevant migration method, to convert and migrate the paths and files.
m) Through the software data manager, obtain the information of all fields in the data to be converted that store data blocks directly in a form such as BLOB.
n) Mark the selected fields to be converted.
o) Call the relevant migration method, or manually write a conversion method block, to copy, transmit, and convert the data blocks; an example of the conversion code is shown in FIG. 3.
p) Analyze the metadata and compare the heterogeneous data before and after conversion to collect the set of all fields whose metadata differ.
q) Mark the selected metadata difference fields.
r) Associate the metadata difference fields using the software and execute the conversion; an example is shown in FIG. 4.

Claims (1)

1. A data extraction and conversion method covering complex heterogeneous conditions, characterized by comprising the following steps:
step 1: sorting out the heterogeneous data structures before and after conversion, and marking the structural differences and organizational correspondences between the detailed tables and fields of the two structures, specifically:
step 1-1, comparing the organization relations of the tables in the heterogeneous data structures before and after conversion, wherein the organization relations comprise the data of the tables, the information types described by the tables, and the forms in which master and slave tables are expressed; finding the differences in these aspects and, based on them, marking the correspondence between the two table structures according to the number of tables and the master-slave table form;
step 1-2, comparing the field correspondences of the heterogeneous data structures before and after conversion, and marking the identical corresponding fields in software;
step 2: searching for newly added or missing field information in the data structures before and after conversion, and retaining, deleting, or supplementing the fields according to requirements, specifically:
step 2-1, identifying, by comparison and analysis, the table information and field information that the data structure to be converted lacks relative to the target data structure;
step 2-2, identifying, by comparison and analysis, the table information and field information in the data structure to be converted that is redundant relative to the target data structure;
step 2-3, obtaining the conversion requirements, and either supplementing the missing tables and fields by calculation or forgoing supplementing them;
step 2-4, obtaining the conversion requirements, and either deleting the redundant tables and fields or retaining them and carrying them over;
step 3: comparing the names of synonymous fields in the heterogeneous data before and after conversion, and marking and mapping the fields that have the same meaning but different names, specifically:
step 3-1, finding, by analysis and comparison, the synonymous fields whose names are simply inconsistent;
step 3-2, finding, by analysis and comparison, the synonymous fields whose names differ because of different naming habits;
step 3-3, converting and associating all fields that have the same meaning but different names;
step 4: checking whether file storage exists before conversion, marking how the file paths are stored, and selecting a migration tool for migration, specifically:
step 4-1, obtaining the information of all fields in the data to be converted that store files by means of a storage path;
step 4-2, marking the selected fields to be converted;
step 4-3, selecting a migration tool, or using the relevant migration method, to convert and migrate the paths and files;
step 5: checking for differences in how file formats are stored in the heterogeneous data before and after conversion, marking the changed storage modes, and selecting a corresponding conversion tool and defining the conversion method, specifically:
step 5-1, obtaining the information of all fields in the data to be converted that store data blocks directly in BLOB form;
step 5-2, marking the selected fields to be converted;
step 5-3, calling the relevant migration method, or manually writing a conversion method block, to copy, transmit, and convert the data blocks;
step 6: comparing the differing metadata in the heterogeneous data before and after conversion, and marking and mapping the fields whose metadata differ, specifically:
step 6-1, analyzing the metadata and comparing the heterogeneous data before and after conversion to collect the set of all fields whose metadata differ;
step 6-2, marking the selected metadata difference fields;
step 6-3, associating the metadata difference fields using software, and executing the conversion.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911419254.6A CN111309792B (en) 2019-12-31 2019-12-31 Data extraction and conversion method covering complex heterogeneous conditions

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911419254.6A CN111309792B (en) 2019-12-31 2019-12-31 Data extraction and conversion method covering complex heterogeneous conditions

Publications (2)

Publication Number Publication Date
CN111309792A (en) 2020-06-19
CN111309792B (en) 2023-12-08

Family

ID=71156381

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911419254.6A Active CN111309792B (en) 2019-12-31 2019-12-31 Data extraction and conversion method covering complex heterogeneous conditions

Country Status (1)

Country Link
CN (1) CN111309792B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102902750A (en) * 2012-09-20 2013-01-30 浪潮齐鲁软件产业有限公司 Universal data extraction and conversion method
CN105373599A (en) * 2015-10-28 2016-03-02 北京汇商融通信息技术有限公司 Data migration system based on various data storage platforms
WO2017071135A1 (en) * 2015-10-28 2017-05-04 北京汇商融通信息技术有限公司 Data migration system based on various data storage platforms
CN110019127A (en) * 2017-11-28 2019-07-16 清远市易通科技有限公司 It is a kind of to fast implement the asynchronous information synchronization method of MYSQL database

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
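Research and Design of a General Data Migration Tool in Information Systems; 徐燕 et al.; 《计算机与现代化》 (Computer and Modernization); 2010-06-15; pp. 156-165 *
Research on Migration and Preservation Metadata for Scientific Research Data; 张民; 《中国优秀博硕士学位论文全文数据库(硕士)信息科技辑》 (China Excellent Doctoral and Master's Theses Full-text Database (Master's), Information Science and Technology); 2013-11-15; main text pp. 16-38 *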

Also Published As

Publication number Publication date
CN111309792A (en) 2020-06-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant