CN111309792A - Data extraction and conversion method for covering complex heterogeneous situation - Google Patents

Data extraction and conversion method for covering complex heterogeneous situation

Info

Publication number
CN111309792A
Authority
CN
China
Prior art keywords
data
conversion
fields
heterogeneous
marking
Prior art date
Legal status
Granted
Application number
CN201911419254.6A
Other languages
Chinese (zh)
Other versions
CN111309792B (en)
Inventor
刘太敏
张翠侠
杨博文
张永伟
段然
陈奡
Current Assignee
CETC 28 Research Institute
Original Assignee
CETC 28 Research Institute
Priority date 2019-12-31
Filing date 2019-12-31
Publication date 2020-06-19
Application filed by CETC 28 Research Institute
Priority to CN201911419254.6A
Publication of CN111309792A
Application granted
Publication of CN111309792B
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data extraction and conversion method that covers complex heterogeneous situations. The method first investigates the difficulties and problems encountered during heterogeneous data conversion, then summarizes and classifies them into a library of heterogeneous data conversion problems. A solution is proposed for each problem, and the solutions are gathered into a unified solution set. The invention handles a variety of heterogeneous situations that arise in data conversion, including differences in data organization structure, storage form and metadata, as well as attachment migration, and improves the efficiency of conversion between heterogeneous data.

Description

Data extraction and conversion method for covering complex heterogeneous situation
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to a data extraction and conversion method covering complex heterogeneous conditions.
Background
Data extraction and conversion is the processing step, within a data flow, that turns data from different heterogeneous sources into uniform target data. It is the foundation of data applications and is widely used for big-data computation and for data mining and analysis across many industries. Because efficient data application depends on it, demand is steadily growing for extraction and conversion algorithms that are functionally complete and perform well.
However, on the one hand, existing industry methods for data extraction and conversion do not consider complex heterogeneous situations such as splitting a table into master and slave tables, file migration, or file storage format conversion. When these situations arise, no ready-made solution exists, and resources must be spent on custom development for each specific case. On the other hand, some tools do offer conversion methods and functions, but their coverage is incomplete, so they can hardly handle in a single pass all of the complex heterogeneous extraction and conversion situations a project requires.
Disclosure of Invention
The purpose of the invention is to provide a data extraction and conversion method that covers the conversion of many kinds of complex heterogeneous data and can satisfy the requirements of several conversion methods at the same time.
The technical solution for achieving this purpose is a data extraction and conversion method covering complex heterogeneous conditions, comprising the following steps:
Step 1: Sort out the heterogeneous data structures before conversion (source) and after conversion (target), and mark the structural differences and the organizational correspondences between the tables and fields of the two structures.
Step 2: Search the source and target data structures for added or missing fields, and retain, delete or supplement fields as required.
Step 3: Compare the names of synonymous fields in the source and target heterogeneous data, and mark and associate fields that have the same meaning but different names.
Step 4: Check whether the source data stores files, mark the fields that store files by path, and select a migration tool to migrate them.
Step 5: Check how the storage modes of file formats differ between the source and target heterogeneous data, mark the storage modes that change, and select a corresponding conversion tool to define the conversion method.
Step 6: Compare the metadata of the source and target heterogeneous data, and mark and associate the fields whose metadata differ.
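Every step above ends in "marking" some correspondence. As a minimal sketch, assuming the marks are kept as plain records, the result of all six steps could be held in a structure like the one below. The class names, the `kind` values and the `action` values are hypothetical illustrations, not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class FieldMark:
    """A marked correspondence between one source field and one target field."""
    source: Optional[str]    # None: the field is missing from the source (step 2)
    target: Optional[str]    # None: the field is redundant in the source (step 2)
    kind: str = "same"       # "same", "missing", "redundant", "synonym", "file_path", "blob" or "metadata"
    action: str = "copy"     # e.g. "copy", "compute", "drop", "retain", "migrate", "convert"


@dataclass
class TableMark:
    """Marked correspondence between source and target tables (step 1).
    One source table mapped to several target tables indicates a master/slave split."""
    source_tables: List[str]
    target_tables: List[str]
    fields: List[FieldMark] = field(default_factory=list)

    def by_kind(self, kind: str) -> List[FieldMark]:
        """Return the marks produced by one step, e.g. all synonym associations."""
        return [f for f in self.fields if f.kind == kind]
```

Keeping one record per table correspondence lets the later conversion code iterate over the marks of a single step, for example `mark.by_kind("blob")`, without re-deriving them.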
Step 1 is implemented as follows:
Step 1-1: Compare the organizational relationships of the tables in the source and target heterogeneous data structures, including the table data, the type of information each table describes, and the form in which master and slave tables are expressed. Identify the similarities and differences in these respects and, based on them, mark the correspondence between the source and target table structures in terms of the number of tables and the master/slave table form.
Step 1-2: Compare the field correspondences of the source and target heterogeneous data structures, and mark the matching fields in software.
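A minimal sketch of steps 1-1 and 1-2, assuming each schema is available as a mapping from table name to field names and that shared field names are a usable first hint of correspondence. The patent does not specify a matching rule; the name-overlap heuristic and the function names are hypothetical.

```python
from typing import Dict, List, Set

Schema = Dict[str, Set[str]]  # table name -> set of field names


def match_tables(src: Schema, dst: Schema) -> Dict[str, List[str]]:
    """Step 1-1 (simplified): for each source table, list the target tables that
    share at least one field name with it. One source table matching several
    target tables hints at a master/slave split; a name-based match is only a
    first guess and should be reviewed by a person."""
    return {
        s_table: sorted(t for t, t_fields in dst.items() if s_fields & t_fields)
        for s_table, s_fields in src.items()
    }


def same_fields(src_fields: Set[str], dst_fields: Set[str]) -> List[str]:
    """Step 1-2 (simplified): fields whose names are identical on both sides."""
    return sorted(src_fields & dst_fields)
```

For example, a flat `order` table that becomes `order_master` plus `order_detail` in the target is reported as mapping to both target tables, which is the master/slave case the step asks to mark.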
Step 2 is implemented as follows:
Step 2-1: Compare and analyze the table and field information that the data structure to be converted lacks relative to the target data structure.
Step 2-2: Compare and analyze the table and field information that the data structure to be converted has in excess of the target data structure.
Step 2-3: Obtain the conversion requirements, and either supplement the missing tables and fields by calculation or abandon supplementing them.
Step 2-4: Obtain the conversion requirements, and either delete the redundant tables and fields or translate and retain them.
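The following is a minimal sketch of step 2 for a single row. It assumes the conversion requirements arrive as two inputs: a set of calculation rules for missing fields and a set of surplus fields to keep. Both inputs and both function names are hypothetical. The "translate and retain" case is simplified here to keeping the field unchanged.

```python
from typing import Callable, Dict, Set, Tuple

Row = Dict[str, object]


def diff_fields(src: Set[str], dst: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Steps 2-1 and 2-2: (fields missing from the source, surplus source fields)."""
    return dst - src, src - dst


def apply_requirements(row: Row, dst: Set[str],
                       compute: Dict[str, Callable[[Row], object]],
                       retain: Set[str]) -> Row:
    """Steps 2-3 and 2-4 for one row:
    - missing target fields that have a rule in `compute` are calculated from the
      row; the others are set to None (supplementing abandoned);
    - surplus source fields listed in `retain` are kept, all others are dropped."""
    out = {k: v for k, v in row.items() if k in dst or k in retain}
    for name in sorted(dst - set(row)):
        rule = compute.get(name)
        out[name] = rule(row) if rule else None
    return out
```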
Step 3 is implemented as follows:
Step 3-1: Through analysis and comparison, find synonymous field names that differ because the names have inconsistent lengths.
Step 3-2: Through analysis and comparison, find synonymous field names that differ because of different naming conventions.
Step 3-3: Associate all synonymous fields that have different names, so that they are converted into one another.
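A minimal sketch of step 3. It assumes two things the patent does not state: that length differences come from abbreviations listed in a project-specific table, and that naming-convention differences are camelCase versus snake_case or upper versus lower case. Both names are reduced to one normalized form and matched on it. `ABBREVIATIONS`, `normalize` and `associate` are hypothetical names.

```python
import re
from typing import Dict, Iterable, Optional

# Hypothetical abbreviation table (step 3-1); a real project would curate its own.
ABBREVIATIONS = {"no": "number", "num": "number", "addr": "address", "dept": "department"}


def normalize(name: str) -> str:
    """Reduce a field name to lower-case snake_case with abbreviations expanded."""
    # Step 3-2: insert "_" at camelCase boundaries, e.g. "deptName" -> "dept_Name".
    snake = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
    tokens = [t for t in re.split(r"[^0-9a-zA-Z]+", snake.lower()) if t]
    # Step 3-1: expand abbreviated tokens so names of different length can match.
    return "_".join(ABBREVIATIONS.get(t, t) for t in tokens)


def associate(src: Iterable[str], dst: Iterable[str]) -> Dict[str, Optional[str]]:
    """Step 3-3: map each source field to the target field with the same
    normalized name, or to None when no synonym is found."""
    index = {normalize(d): d for d in dst}
    return {s: index.get(normalize(s)) for s in src}
```

Fields left unmatched (mapped to `None`) are the ones a person still has to associate by hand.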
Step 4 is implemented as follows:
Step 4-1: Obtain information on all fields in the data to be converted that store files in the form of storage paths.
Step 4-2: Mark the fields selected for conversion.
Step 4-3: Select a migration tool, or call a related migration method, to convert and migrate the paths and the files.
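As a minimal sketch of step 4-3, the migration might look like the function below. It assumes that path-type fields hold absolute local paths under a known source root, and that files are copied to a new root with the same relative layout. The function name and parameters are hypothetical, and a real project might use a dedicated migration tool instead.

```python
import shutil
from pathlib import Path
from typing import Dict, Iterable, List


def migrate_path_fields(rows: Iterable[Dict[str, object]], path_fields: List[str],
                        src_root: Path, dst_root: Path) -> List[Dict[str, object]]:
    """Copy each file referenced by a marked path field to dst_root, keeping its
    path relative to src_root, and rewrite the field to the new location.
    Raises ValueError if a referenced path does not lie under src_root."""
    out = []
    for row in rows:
        new_row = dict(row)
        for f in path_fields:
            value = row.get(f)
            if not value:          # empty path: nothing to migrate
                continue
            old = Path(str(value))
            new = dst_root / old.relative_to(src_root)
            new.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(old, new)  # copies content and file metadata
            new_row[f] = str(new)
        out.append(new_row)
    return out
```

The source files are copied rather than moved, so the original data stays intact until the migration has been verified.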
Step 5 is implemented as follows:
Step 5-1: Obtain information on all fields in the data to be converted that store data blocks directly in BLOB form.
Step 5-2: Mark the fields selected for conversion.
Step 5-3: Call a related migration method, or manually write a conversion code block, to copy, transfer and convert the data blocks.
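A minimal sketch of step 5-3, using Python's built-in `sqlite3` module as a stand-in for the real source and target databases, which the patent does not name. It assumes that the table and key column already exist on both sides and that the storage-format change can be expressed as a bytes-to-bytes function. The function name and parameters are hypothetical.

```python
import sqlite3
from typing import Callable, Optional


def copy_blob_column(src: sqlite3.Connection, dst: sqlite3.Connection,
                     table: str, key: str, blob_col: str,
                     convert: Optional[Callable[[bytes], bytes]] = None) -> int:
    """Copy the marked BLOB column of `table` from src to dst, passing each
    non-NULL data block through `convert`. Returns the number of rows copied.
    Identifiers are interpolated into the SQL, so they must come from the
    marked schema, never from user input."""
    rows = src.execute(f"SELECT {key}, {blob_col} FROM {table}").fetchall()
    converted = [
        (k, None if blob is None else (convert(bytes(blob)) if convert else bytes(blob)))
        for k, blob in rows
    ]
    dst.executemany(f"INSERT INTO {table} ({key}, {blob_col}) VALUES (?, ?)", converted)
    dst.commit()
    return len(converted)
```

Passing `convert=zlib.compress`, for example, would store the blocks compressed in the target. Any other format change can be plugged in the same way.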
Step 6 is implemented as follows:
Step 6-1: Analyze the metadata and compare the source and target heterogeneous data to collect the set of all fields whose metadata differ.
Step 6-2: Mark the selected metadata-difference fields.
Step 6-3: Associate the metadata-difference fields in software and perform the conversion.
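A minimal sketch of step 6, assuming that column metadata consists of a type name, a declared length and nullability, and that type changes are handled by a lookup table of cast functions. The patent leaves the metadata model open, so `ColumnMeta`, `CONVERTERS` and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple


@dataclass(frozen=True)
class ColumnMeta:
    dtype: str             # e.g. "VARCHAR", "INTEGER"
    length: int = 0        # declared length, 0 when not applicable
    nullable: bool = True


def metadata_diffs(src: Dict[str, ColumnMeta], dst: Dict[str, ColumnMeta]) -> List[str]:
    """Step 6-1: fields present on both sides whose metadata differ."""
    return sorted(name for name in src.keys() & dst.keys() if src[name] != dst[name])


# Hypothetical converter table keyed by (source type, target type).
CONVERTERS: Dict[Tuple[str, str], Callable[[Any], Any]] = {
    ("VARCHAR", "INTEGER"): int,
    ("INTEGER", "VARCHAR"): str,
}


def convert_value(value: Any, s: ColumnMeta, d: ColumnMeta) -> Any:
    """Step 6-3 (simplified): cast one value and check the target constraints."""
    if value is None:
        if not d.nullable:
            raise ValueError("target column does not accept NULL")
        return None
    out = CONVERTERS.get((s.dtype, d.dtype), lambda v: v)(value)
    if d.dtype == "VARCHAR" and d.length and len(out) > d.length:
        raise ValueError(f"{out!r} exceeds VARCHAR({d.length})")
    return out
```

Raising on values that violate the target constraints, instead of silently truncating them, leaves the decision about such rows to the person running the conversion.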
Compared with the prior art, the invention has the following notable advantages. It integrates a wide range of complex heterogeneous extraction and conversion situations and provides solutions for difficult problems such as table splitting, file migration and file storage format conversion. 1) For table splitting, the method compares the organizational relationships of the tables in the source and target heterogeneous data structures. These include the table data, the type of information each table describes, the form of the master and slave tables, and the field correspondences. It analyzes the similarities and differences in these respects and, based on them, marks the correspondence between the source and target table structures in terms of the number of tables and the master/slave table form. 2) For file migration, the method obtains information on all fields in the data to be converted that store files as storage paths and marks the fields selected for conversion. It then selects a migration tool, or calls a related migration method, to convert and migrate the paths and files. 3) For file storage format conversion, the method obtains information on all fields in the data to be converted that store data blocks directly, for example in BLOB form, and marks the fields selected for conversion. It then calls a related migration method, or a manually written conversion code block, to copy, transfer and convert the data blocks. In addition, the method offers multiple schemes and can therefore meet a project team's need for several conversion methods at once.
Drawings
FIG. 1 is a general flow chart of an embodiment of the present invention.
FIG. 2 is a schematic diagram of an exemplary embodiment of the present invention.
FIG. 3 is a diagram illustrating a file migration algorithm according to an embodiment of the present invention.
FIG. 4 is a diagram illustrating an example of a metadata conversion algorithm according to an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the figures and examples.
As shown in FIG. 1 and FIG. 2, this embodiment uses two sets of heterogeneous data structures for illustration. The implementation steps are as follows.
a) Compare the organizational relationships of the tables in the source and target heterogeneous data structures, including the table data, the type of information each table describes, and the form in which master and slave tables are expressed. Identify the similarities and differences in these respects and, based on them, mark the correspondence between the source and target table structures in terms of the number of tables and the master/slave table form.
b) Compare the field correspondences of the source and target heterogeneous data structures, and mark the matching fields in software.
c) Compare and analyze the table and field information that the data structure to be converted lacks relative to the target data structure.
d) Compare and analyze the table and field information that the data structure to be converted has in excess of the target data structure.
e) Obtain the conversion requirements, and either supplement the missing tables and fields by calculation or abandon them.
f) Obtain the conversion requirements, and either delete the redundant tables and fields or translate and retain them.
g) Through analysis and comparison, find synonymous field names that differ because the names have inconsistent lengths.
h) Through analysis and comparison, find synonymous field names that differ because of different naming conventions.
i) Associate all synonymous fields that have different names for conversion.
j) The software data manager obtains information on all fields in the data to be converted that store files as storage paths.
k) Mark the fields selected for conversion.
l) Select a migration tool, or call a related migration method, to convert and migrate the paths and files.
m) The software data manager obtains information on all fields in the data to be converted that store data blocks directly, for example in BLOB form.
n) Mark the fields selected for conversion.
o) Call a related migration method, or manually write a conversion code block, to copy, transfer and convert the data blocks; an example of the conversion code is shown in FIG. 3.
p) Analyze the metadata and compare the source and target heterogeneous data to collect the set of all fields whose metadata differ.
q) Mark the selected metadata-difference fields.
r) Associate the metadata-difference fields in software and perform the conversion, an example of which is shown in FIG. 4.

Claims (7)

1. A data extraction and conversion method covering complex heterogeneous conditions, characterized by comprising the following steps:
step 1: sorting out the heterogeneous data structures before conversion (source) and after conversion (target), and marking the structural differences and the organizational correspondences between the tables and fields of the two structures;
step 2: searching the source and target data structures for newly added or missing field information, and retaining, deleting or supplementing fields as required;
step 3: comparing the names of synonymous fields in the source and target heterogeneous data, and marking and associating fields that have the same meaning but different names;
step 4: checking whether file storage exists in the data before conversion, marking the fields that store file paths, and selecting a migration tool for migration;
step 5: checking the similarities and differences between the storage modes of file formats in the source and target heterogeneous data, marking the storage modes that change, and selecting a corresponding conversion tool to define a conversion method;
step 6: comparing the metadata in the source and target heterogeneous data, and marking and associating the fields whose metadata differ.
2. The data extraction and conversion method covering complex heterogeneous conditions according to claim 1, wherein step 1 is specifically implemented as follows:
step 1-1: comparing the organizational relationships of the tables in the source and target heterogeneous data structures, including the table data, the type of information each table describes, and the form in which master and slave tables are expressed; identifying the similarities and differences in these respects, and, based on them, marking the correspondence between the source and target table structures in terms of the number of tables and the master/slave table form;
step 1-2: comparing the field correspondences of the source and target heterogeneous data structures, and marking the matching fields in software.
3. The data extraction and conversion method covering complex heterogeneous conditions according to claim 1, wherein step 2 is specifically implemented as follows:
step 2-1: comparing and analyzing the table information and field information that the data structure to be converted lacks relative to the target data structure;
step 2-2: comparing and analyzing the table information and field information that the data structure to be converted has in excess of the target data structure;
step 2-3: obtaining the conversion requirements, and supplementing the missing tables and fields by calculation or abandoning their supplementation;
step 2-4: obtaining the conversion requirements, and deleting the redundant tables and fields or translating and retaining them.
4. The data extraction and conversion method covering complex heterogeneous conditions according to claim 1, wherein step 3 is specifically implemented as follows:
step 3-1: finding, through analysis and comparison, synonymous field names that differ because the names have inconsistent lengths;
step 3-2: finding, through analysis and comparison, synonymous field names that differ because of different naming conventions;
step 3-3: associating all synonymous fields that have different names for conversion.
5. The data extraction and conversion method covering complex heterogeneous conditions according to claim 1, wherein step 4 is specifically implemented as follows:
step 4-1: obtaining information on all fields in the data to be converted that store files in the form of storage paths;
step 4-2: marking the fields selected for conversion;
step 4-3: selecting a migration tool, or calling a related migration method, to convert and migrate the paths and the files.
6. The data extraction and conversion method covering complex heterogeneous conditions according to claim 1, wherein step 5 is specifically implemented as follows:
step 5-1: obtaining information on all fields in the data to be converted that store data blocks directly in BLOB form;
step 5-2: marking the fields selected for conversion;
step 5-3: calling a related migration method, or manually writing a conversion code block, to copy, transfer and convert the data blocks.
7. The data extraction and conversion method covering complex heterogeneous conditions according to claim 1, wherein step 6 is specifically implemented as follows:
step 6-1: analyzing the metadata and comparing the source and target heterogeneous data to collect the set of all fields whose metadata differ;
step 6-2: marking the selected metadata-difference fields;
step 6-3: associating the metadata-difference fields in software and performing the conversion.
CN201911419254.6A 2019-12-31 2019-12-31 Data extraction and conversion method covering complex heterogeneous conditions Active CN111309792B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911419254.6A CN111309792B (en) 2019-12-31 2019-12-31 Data extraction and conversion method covering complex heterogeneous conditions

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911419254.6A CN111309792B (en) 2019-12-31 2019-12-31 Data extraction and conversion method covering complex heterogeneous conditions

Publications (2)

Publication Number Publication Date
CN111309792A 2020-06-19
CN111309792B 2023-12-08

Family

ID=71156381

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911419254.6A Active CN111309792B (en) 2019-12-31 2019-12-31 Data extraction and conversion method covering complex heterogeneous conditions

Country Status (1)

Country Link
CN (1) CN111309792B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102902750A (en) * 2012-09-20 2013-01-30 浪潮齐鲁软件产业有限公司 Universal data extraction and conversion method
CN105373599A (en) * 2015-10-28 2016-03-02 北京汇商融通信息技术有限公司 Data migration system based on various data storage platforms
WO2017071135A1 (en) * 2015-10-28 2017-05-04 北京汇商融通信息技术有限公司 Data migration system based on various data storage platforms
CN110019127A (en) * 2017-11-28 2019-07-16 清远市易通科技有限公司 Method for quickly implementing asynchronous information synchronization of a MySQL database

Non-Patent Citations

* Cited by examiner, † Cited by third party
Title
张民: "Research on Metadata for the Migration and Preservation of Scientific Research Data" (科研数据的迁移和保存元数据研究), China Excellent Doctoral and Master's Dissertations Full-text Database (Master), Information Science and Technology, 15 November 2013, pages 16-38 *
徐燕 et al.: "Research and Design of a General Data Migration Tool for Information Systems" (信息系统中的通用数据迁移工具的研究与设计), Computer and Modernization (计算机与现代化), 15 June 2010, pages 156-165 *

Also Published As

Publication number Publication date
CN111309792B (en) 2023-12-08

Similar Documents

Publication Publication Date Title
US10657111B2 (en) Computer-implemented method for storing unlimited amount of data as a mind map in relational database systems
CN106649378B (en) Data synchronization method and device
US9026901B2 (en) Viewing annotations across multiple applications
CN102663076B (en) Method for processing file data
CN104331285A (en) Automatic code generation method and system
CN113553313B (en) Data migration method and system, storage medium and electronic equipment
WO2023029275A1 (en) Data association analysis method and apparatus, and computer device and storage medium
CN110134663B (en) Organization structure data processing method and device and electronic equipment
CN112000649B (en) Method and device for synchronizing incremental data based on map reduce
CN113434482A (en) Data migration method and device, computer equipment and storage medium
CN114218218A (en) Data processing method, device and equipment based on data warehouse and storage medium
CN115905628A (en) Dynamic resource directory construction method, device, equipment and storage medium
CN104636401A (en) Data rollback method and device for SCADA system
CN105224663A (en) Data access task management method and device based on multiple data sources
CN111309792A (en) Data extraction and conversion method for covering complex heterogeneous situation
CN108228592B (en) Data archiving method and data archiving device based on binary log
CN109446201A (en) Method, device and equipment for sorting annotation information in Excel tables
CN115114297A (en) Data lightweight storage and search method and device, electronic equipment and storage medium
CN104679740A (en) Data processing system
CN114020719A (en) License data migration method applied to heterogeneous database
CN115617773A (en) Data migration method, device and system
CN105320562A (en) Distributed operation accelerating running method and system based on operation characteristic fingerprints
CN107844639B (en) Project standard structure automatic generation method and system
CN111399838A (en) Data modeling method and device based on Spark SQL and materialized view
CN109522216A (en) Team's interface exploitation cooperative system and method based on API testing tool export data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant