CN104462562A - Data migration system and method based on data warehouse automation - Google Patents


Publication number
CN104462562A
Authority
CN
China
Prior art keywords
data
data file
interface unit
warehouse
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410832607.6A
Other languages
Chinese (zh)
Other versions
CN104462562B (en)
Inventor
郭凤
杨培强
王永军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Software Technology Co Ltd
Original Assignee
Inspur Software Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Software Group Co Ltd filed Critical Inspur Software Group Co Ltd
Priority to CN201410832607.6A priority Critical patent/CN104462562B/en
Publication of CN104462562A publication Critical patent/CN104462562A/en
Application granted granted Critical
Publication of CN104462562B publication Critical patent/CN104462562B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/185 Hierarchical storage management [HSM] systems, e.g. file migration or policies thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2107 File encryption

Abstract

The invention relates to a data migration system and method based on data warehouse automation. The system comprises a triggering module, an interface unit, a warehousing module and a server module, wherein the triggering module is connected with the interface unit, the interface unit is connected with the warehousing module, and the triggering module, the interface unit and the warehousing module are all connected with the server module. The invention automatically executes timed tasks through ETL (the process of extracting, transforming and loading data from a source end to a destination end) and transmits data files over FTP, a mainstream network transmission protocol. The whole process requires no manual intervention and achieves safe, reliable and effective automatic data migration.

Description

A data migration system and method based on data warehouse automation
Technical field
The present invention relates to the field of data processing technology, and in particular to a data migration system and method that transmits data files over a mainstream network transmission protocol (FTP).
Background art
With the arrival of the big data era and the rapid development of data warehouse technology, people increasingly recognize the importance of data to an enterprise, and the demand for data exchange between systems keeps growing. Data migration between databases has always been a difficult problem. Traditional data migration schemes depend too heavily on the database, for example by connecting databases through a dblink. The shortcomings are as follows: 1. the scheme depends too heavily on the database, and a dropped connection cannot recover automatically; 2. long-running data exchange adds load to the database and degrades its performance; 3. manual intervention is required, and the workload is large; 4. the databases are too tightly coupled, and security cannot be guaranteed.
Summary of the invention
The object of the present invention is to provide an automated data migration system and method that requires no manual intervention at any stage and is safe, reliable and effective.
To achieve the above object, the present invention adopts the following technical scheme:
A data migration system based on data warehouse automation comprises a trigger module, an interface unit, a warehousing module and a server module, wherein the trigger module is connected with the interface unit, the interface unit is connected with the warehousing module, and the trigger module, the interface unit and the warehousing module are all connected with the server module.
In one embodiment, the trigger module triggers data migration through an automatically executed ETL timed task.
In one embodiment, the interface unit comprises data files and a control file.
In one embodiment, the warehousing module includes a warehousing log and a data quality report.
In one embodiment, the server module comprises an interface server, a target server and an FTP server.
In one embodiment, the data files are transmitted as split volumes over FTP, and the transmission may be parallel or serial.
In one embodiment, the control file is transmitted over FTP; the control file is MD5-encrypted and carries an identification code, and the identification code comprises the record count and the ciphertext information.
Another technical scheme of the present invention is:
A data migration method comprises the following steps:
A: the data source ETL task generates data files from the data in the interface table, applies MD5 encryption to the information about the generated data files, and generates a control file;
B: the group of data files and the control file are uploaded to the FTP server as one interface unit;
C: the ETL task on the target host calls a shell script to detect whether the control file of the interface unit exists on the FTP server; if it exists, all data files in the interface unit have been received. The information about the received data files is MD5-encrypted and compared with the information in the control file, which guards against files lost to packet loss during network transmission, ensures the integrity of the received files, and thereby ensures the integrity of the data;
D: after the data files pass verification, the warehousing operation is carried out: the ETL task loads the data files into the warehouse one by one, and after loading is complete calls a check script to perform a business data quality check.
Step A, in which the data source ETL task generates data files from the data in the interface table, applies MD5 encryption to the information about the generated data files, and generates a control file, comprises:
A1: the data source starts the data transmission;
A2: data files are generated at the interface unit, and the data is split into volume files according to its size;
A3: a control file is generated at the interface unit and MD5-encrypted, and an identification code is written to the control file to ensure the integrity of the data files; the identification code comprises the number of data files and the ciphertext information;
A4: the generated data files and control file are uploaded over FTP, using either serial or parallel transmission;
A5: during transmission, the interface unit sends a mail notification to the warehousing module;
A6: the transmission is complete.
In one embodiment, step D, in which the warehousing operation is carried out after the data files pass verification, the ETL task loads the data files one by one, and a check script is called after loading to perform a business data quality check, comprises:
D1: the warehousing module starts to take over the loading of the data files;
D2: the warehousing module checks whether the control file exists; if it exists, the next step is entered, and if not, the module continues to wait;
D3: the warehousing module compares the control file with the check file to see whether the requirements are met; if not, loading is abandoned, and if so, the next step is entered;
D4: the warehousing module loads the data files into the warehouse and generates a warehousing log;
D5: the data file check script produces a data quality report;
D6: a mail notification is sent to the interface unit;
D7: data loading is complete.
The present invention requires no manual intervention at any stage and achieves a safe, reliable and effective automated data migration scheme. Because data is transferred as exported data files, the scheme does not depend on any particular database platform and can migrate data between databases of different types. It can be applied to any scenario that requires data migration and is well suited as an automated data migration solution.
Brief description of the drawings
Fig. 1 is a composition diagram of the system of the present invention.
Fig. 2 is a schematic diagram of the interface unit of the present invention.
Fig. 3 is a schematic diagram of the workflow of the present invention.
Fig. 4 is a flow chart of data file generation at the data source of the present invention.
Fig. 5 is a schematic flow chart of loading into the target database of the present invention.
Detailed description of the embodiments
Specific embodiments of the present invention are described below with reference to the accompanying drawings.
Fig. 1 is a composition diagram of the system of the present invention. As shown in Fig. 1, a data migration system of the present invention comprises a trigger module, an interface unit, a warehousing module and a server module, wherein the trigger module is connected with the interface unit, the interface unit is connected with the warehousing module, and the trigger module, the interface unit and the warehousing module are all connected with the server module.
The trigger module triggers data migration through an automatically executed ETL timed task.
The warehousing module includes a warehousing log and a data quality report.
The server module comprises an interface server, a target server and an FTP server.
The control file is transmitted over FTP; the control file is MD5-encrypted and carries an identification code, and the identification code comprises the record count and the ciphertext information.
Fig. 2 is a schematic diagram of the interface unit of the present invention. As can be seen from Fig. 2, the interface unit comprises data files and a control file.
The data files are transmitted as split volumes over FTP, and the transmission may be parallel or serial.
Fig. 3 is a schematic diagram of the workflow of the present invention. Fig. 3 discloses a data migration method based on data warehouse automation according to the present invention, comprising the following steps:
A: the data source ETL task generates data files from the data in the interface table, applies MD5 encryption to the information about the generated data files, and generates a control file;
B: the group of data files and the control file are uploaded to the FTP server as one interface unit;
C: the ETL task on the target host calls a shell script to detect whether the control file of the interface unit exists on the FTP server; if it exists, all data files in the interface unit have been received. The information about the received data files is MD5-encrypted and compared with the information in the control file, which guards against files lost to packet loss during network transmission, ensures the integrity of the received files, and thereby ensures the integrity of the data;
D: after the data files pass verification, the warehousing operation is carried out: the ETL task loads the data files into the warehouse one by one, and after loading is complete calls a check script to perform a business data quality check.
To keep exported data files from becoming too large to transmit and manage, the present invention limits the maximum number of records in a single file; multiple data files are generated according to this per-file limit and the amount of data actually in the table. Exporting the data as split volumes is better suited to network transmission: the volumes can be transmitted in parallel to make full use of the network bandwidth, and the network load of retransmitting files corrupted by packet loss is also reduced.
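The volume splitting described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the `.dat` extension and the four-digit sequence number are assumptions introduced here, and rows are treated as plain strings.

```python
from typing import Iterable, Iterator, List


def split_into_volumes(rows: Iterable[str], max_records: int) -> Iterator[List[str]]:
    """Yield lists of at most max_records rows, one list per data file."""
    if max_records <= 0:
        raise ValueError("max_records must be positive")
    volume: List[str] = []
    for row in rows:
        volume.append(row)
        if len(volume) == max_records:
            yield volume
            volume = []
    if volume:  # final, partially filled volume
        yield volume


def volume_name(table: str, index: int) -> str:
    """Hypothetical naming scheme: <table>_<4-digit sequence>.dat"""
    return f"{table}_{index:04d}.dat"
```

With a limit of 3 records per file, a table of 7 rows would produce three volumes holding 3, 3 and 1 rows.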
The present invention combines information such as the exported record count and the file size, in order, into a character string and applies MD5 encryption to it to generate the ciphertext. The total exported record count and the ciphertext information are written into the control file as the identification code. The data files and the control file together form one interface unit.
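A minimal sketch of how such a control file could be built is shown below. The field separator, the sort order and the two-line layout (total record count, then digest) are assumptions made for illustration; the text only states that the information is combined in order and hashed. Note that MD5 is a hash function rather than encryption, so the "ciphertext" is a 32-character hex digest that can reveal accidental loss or corruption.

```python
import hashlib
from typing import List, Tuple

# Each entry: (file name, record count, size in bytes).
FileInfo = Tuple[str, int, int]


def digest_of(files: List[FileInfo]) -> str:
    """MD5 hex digest over the ordered per-file information."""
    ordered = sorted(files)  # fixed order so both ends build the same string
    text = "|".join(f"{name}:{count}:{size}" for name, count, size in ordered)
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def build_control_text(files: List[FileInfo]) -> str:
    """Control file content: total record count, then the digest."""
    total = sum(count for _, count, _ in files)
    return f"{total}\n{digest_of(files)}\n"
```

Sorting before concatenation means the digest does not depend on the order in which the volumes were listed, so the receiving side can rebuild the same string from whatever order it finds the files in.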
Fig. 4 is a flow chart of data file generation at the data source of the present invention. It corresponds to step A in Fig. 3, which further comprises:
A1: the data source starts the data transmission;
A2: data files are generated at the interface unit, and the data is split into volume files according to its size;
A3: a control file is generated at the interface unit and MD5-encrypted, and an identification code is written to the control file to ensure the integrity of the data files; the identification code comprises the number of data files and the ciphertext information;
A4: the generated data files and control file are uploaded over FTP, using either serial or parallel transmission;
A5: during transmission, the interface unit sends a mail notification to the warehousing module;
A6: the transmission is complete.
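The upload in step A4 might look like the sketch below, which uses Python's standard `ftplib`. Two things are assumptions rather than statements from the text: the upload routine is passed in as a callable so that serial and parallel modes share one code path, and the control file is sent last, which fits step C, where the presence of the control file is taken to mean that all data files have arrived.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from ftplib import FTP
from typing import Callable, List


def ftp_upload(host: str, user: str, password: str, path: str) -> None:
    """Upload one local file under its base name, using one connection per file."""
    with FTP(host) as ftp, open(path, "rb") as fh:
        ftp.login(user, password)
        ftp.storbinary(f"STOR {os.path.basename(path)}", fh)


def upload_interface_unit(
    data_files: List[str],
    control_file: str,
    upload: Callable[[str], None],
    parallel: bool = False,
    workers: int = 4,
) -> None:
    """Send all data files (serially or in parallel), then the control file."""
    if parallel:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # list() waits for every upload and re-raises the first failure
            list(pool.map(upload, data_files))
    else:
        for path in data_files:
            upload(path)
    upload(control_file)  # sent last so it signals a complete unit
```

In use, `upload` could be bound with `functools.partial(ftp_upload, "ftp.example.com", "user", "secret")`, where the host and credentials are placeholders.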
Fig. 5 is a schematic flow chart of loading into the target database of the present invention. It corresponds to step D in Fig. 3, in which the warehousing operation is carried out after the data files pass verification, the ETL task loads the data files one by one, and a check script is called after loading to perform a business data quality check. Step D further comprises:
D1: the warehousing module starts to take over the loading of the data files;
D2: the warehousing module checks whether the control file exists; if it exists, the next step is entered, and if not, the module continues to wait;
D3: the warehousing module compares the control file with the check file to see whether the requirements are met; if not, loading is abandoned, and if so, the next step is entered;
D4: the warehousing module loads the data files into the warehouse and generates a warehousing log;
D5: the data file check script produces a data quality report;
D6: a mail notification is sent to the interface unit;
D7: data loading is complete.
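Steps D2 to D7 can be sketched as a small driver function. Everything it calls is a hypothetical placeholder supplied by the caller (`control_exists`, `verify`, `load_file`, `quality_check`, `notify`), since the text does not name the scripts involved. The `max_polls` limit is also an assumption: the text simply says the module continues to wait.

```python
import time
from typing import Callable, List, Optional


def run_warehousing(
    control_exists: Callable[[], bool],
    verify: Callable[[], bool],
    load_file: Callable[[str], None],
    data_files: List[str],
    quality_check: Callable[[], str],
    notify: Callable[[str], None],
    poll_seconds: float = 0.0,
    max_polls: Optional[int] = None,
) -> List[str]:
    """Run D2-D7 and return the warehousing log as a list of lines."""
    log: List[str] = []
    polls = 0
    while not control_exists():  # D2: wait for the control file
        polls += 1
        if max_polls is not None and polls >= max_polls:
            log.append("control file not found; gave up waiting")
            return log
        time.sleep(poll_seconds)
    if not verify():  # D3: abandon loading on mismatch
        log.append("verification failed; load abandoned")
        return log
    for path in data_files:  # D4: load files one by one
        load_file(path)
        log.append(f"loaded {path}")
    log.append(f"quality report: {quality_check()}")  # D5
    notify("load complete")  # D6: mail notification
    return log  # D7
```

Passing the steps in as callables keeps the control flow testable without a database, an FTP server or a mail server.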
The present invention is triggered automatically by an ETL task. Interface data is exported as data files, a control file is generated from the data file information, and both are transferred to the target file server over FTP. A task on the target host scans for the files, generates the encrypted information for the data files of the interface unit using the same algorithm, and compares it with the information in the control file. This guards against files lost to packet loss during network transmission, ensures the integrity of the received files, and thereby ensures the integrity of the data. After the data files pass verification, the loading program is called to load them into the warehouse; after loading, the check script is called automatically to perform the business data quality check and generate a data quality report.
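The receiving-side check can be illustrated with the sketch below, which recomputes the digest from the files actually received and compares it with the control file. It assumes, purely for illustration, one record per line, a control file holding the total record count and the MD5 hex digest on two lines, and a `name:records:size` string format; none of these details are specified in the text.

```python
import hashlib
import os
from typing import List


def count_records(path: str) -> int:
    """Number of records in a file, assuming one record per line."""
    with open(path, "rb") as fh:
        return sum(1 for _ in fh)


def unit_digest(paths: List[str]) -> str:
    """MD5 hex digest over name, record count and size of each received file."""
    parts = [
        f"{os.path.basename(p)}:{count_records(p)}:{os.path.getsize(p)}"
        for p in sorted(paths, key=os.path.basename)
    ]
    return hashlib.md5("|".join(parts).encode("utf-8")).hexdigest()


def verify_unit(paths: List[str], control_path: str) -> bool:
    """True if the record total and digest both match the control file."""
    with open(control_path, encoding="utf-8") as fh:
        fields = fh.read().split()
    if len(fields) != 2:
        return False
    expected_total, expected_digest = fields
    actual_total = sum(count_records(p) for p in paths)
    return str(actual_total) == expected_total and unit_digest(paths) == expected_digest
```

A missing or truncated volume changes the record total, the file size, or both, so the comparison fails and loading can be abandoned before any data reaches the warehouse.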
In summary, the above embodiments are intended only to illustrate the technical scheme of the present invention, not to limit it. Although the present invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art should understand that the technical schemes described in the above embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical schemes to depart from the spirit and scope of the technical schemes of the embodiments of the present invention.

Claims (10)

1. A data migration system based on data warehouse automation, characterized in that it comprises a trigger module, an interface unit, a warehousing module and a server module, wherein the trigger module is connected with the interface unit, the interface unit is connected with the warehousing module, and the trigger module, the interface unit and the warehousing module are all connected with the server module.
2. The data migration system based on data warehouse automation according to claim 1, characterized in that the trigger module triggers data migration through an automatically executed ETL timed task.
3. The data migration system based on data warehouse automation according to claim 1, characterized in that the interface unit comprises data files and a control file.
4. The data migration system based on data warehouse automation according to claim 1, characterized in that the warehousing module includes a warehousing log and a data quality report.
5. The data migration system based on data warehouse automation according to claim 1, characterized in that the server module comprises an interface server, a target server and an FTP server.
6. The data migration system based on data warehouse automation according to claim 3, characterized in that the data files are transmitted as split volumes over FTP, and the transmission may be parallel or serial.
7. The data migration system based on data warehouse automation according to claim 3, characterized in that the control file is transmitted over FTP, the control file is MD5-encrypted and carries an identification code, and the identification code comprises the record count and the ciphertext information.
8. A data migration method based on data warehouse automation, characterized in that it comprises the following steps:
A: the data source ETL task generates data files from the data in the interface table, applies MD5 encryption to the information about the generated data files, and generates a control file;
B: the group of data files and the control file are uploaded to the FTP server as one interface unit;
C: the ETL task on the target host calls a shell script to detect whether the control file of the interface unit exists on the FTP server; if it exists, all data files in the interface unit have been received. The information about the received data files is MD5-encrypted and compared with the information in the control file, which guards against files lost to packet loss during network transmission, ensures the integrity of the received files, and thereby ensures the integrity of the data;
D: after the data files pass verification, the warehousing operation is carried out: the ETL task loads the data files into the warehouse one by one, and after loading is complete calls a check script to perform a business data quality check.
9. The data migration method based on data warehouse automation according to claim 8, characterized in that the step in which the data source ETL task generates data files from the data in the interface table, applies MD5 encryption to the information about the generated data files, and generates a control file comprises:
A1: the data source starts the data transmission;
A2: data files are generated at the interface unit, and the data is split into volume files according to its size;
A3: a control file is generated at the interface unit and MD5-encrypted, and an identification code is written to the control file to ensure the integrity of the data files; the identification code comprises the number of data files and the ciphertext information;
A4: the generated data files and control file are uploaded over FTP, using either serial or parallel transmission;
A5: during transmission, the interface unit sends a mail notification to the warehousing module;
A6: the transmission is complete.
10. The data migration method based on data warehouse automation according to claim 8, characterized in that the step in which the warehousing operation is carried out after the data files pass verification, the ETL task loads the data files one by one, and a check script is called after loading to perform a business data quality check comprises:
D1: the warehousing module starts to take over the loading of the data files;
D2: the warehousing module checks whether the control file exists; if it exists, the next step is entered, and if not, the module continues to wait;
D3: the warehousing module compares the control file with the check file to see whether the requirements are met; if not, loading is abandoned, and if so, the next step is entered;
D4: the warehousing module loads the data files into the warehouse and generates a warehousing log;
D5: the data file check script produces a data quality report;
D6: a mail notification is sent to the interface unit;
D7: data loading is complete.
CN201410832607.6A 2014-12-29 2014-12-29 Data migration system and method based on data warehouse automation Active CN104462562B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410832607.6A CN104462562B (en) 2014-12-29 2014-12-29 Data migration system and method based on data warehouse automation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410832607.6A CN104462562B (en) 2014-12-29 2014-12-29 Data migration system and method based on data warehouse automation

Publications (2)

Publication Number Publication Date
CN104462562A true CN104462562A (en) 2015-03-25
CN104462562B CN104462562B (en) 2018-05-18

Family

ID=52908597

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410832607.6A Active CN104462562B (en) 2014-12-29 2014-12-29 Data migration system and method based on data warehouse automation

Country Status (1)

Country Link
CN (1) CN104462562B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105068805A (en) * 2015-08-07 2015-11-18 北京思特奇信息技术股份有限公司 Method and system for auditing data in data transplantation process
CN107045610A (en) * 2017-05-08 2017-08-15 广东欧珀移动通信有限公司 Data migration method, terminal device and computer-readable recording medium
CN107203564A (en) * 2016-03-18 2017-09-26 北京京东尚科信息技术有限公司 The method of data transfer, apparatus and system
CN107818106A (en) * 2016-09-13 2018-03-20 腾讯科技(深圳)有限公司 A kind of big data off-line calculation quality of data method of calibration and device
CN108874825A (en) * 2017-05-12 2018-11-23 北京京东尚科信息技术有限公司 A kind of method of calibration and device of abnormal data
CN109871410A (en) * 2019-02-14 2019-06-11 深圳市盟天科技有限公司 A kind of method, apparatus of data assembling storage, server, storage medium
CN111831755A (en) * 2020-07-23 2020-10-27 北京思特奇信息技术股份有限公司 Cross-database data synchronization method, system, medium and equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101621529A (en) * 2008-06-30 2010-01-06 上海全成通信技术有限公司 High-efficient and low-cost loading method for heterogeneous mass data
CN102375891A (en) * 2011-11-15 2012-03-14 山东浪潮金融信息系统有限公司 Implementation tool for unloading and loading incremental data
CN102999537A (en) * 2011-09-19 2013-03-27 阿里巴巴集团控股有限公司 System and method for data migration
CN103455526A (en) * 2012-06-05 2013-12-18 杭州勒卡斯广告策划有限公司 ETL (extract-transform-load) data processing method, device and system
US20140310231A1 (en) * 2013-04-16 2014-10-16 Cognizant Technology Solutions India Pvt. Ltd. System and method for automating data warehousing processes

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101621529A (en) * 2008-06-30 2010-01-06 上海全成通信技术有限公司 High-efficient and low-cost loading method for heterogeneous mass data
CN102999537A (en) * 2011-09-19 2013-03-27 阿里巴巴集团控股有限公司 System and method for data migration
CN102375891A (en) * 2011-11-15 2012-03-14 山东浪潮金融信息系统有限公司 Implementation tool for unloading and loading incremental data
CN103455526A (en) * 2012-06-05 2013-12-18 杭州勒卡斯广告策划有限公司 ETL (extract-transform-load) data processing method, device and system
US20140310231A1 (en) * 2013-04-16 2014-10-16 Cognizant Technology Solutions India Pvt. Ltd. System and method for automating data warehousing processes

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
黄觉明: "可扩展的ETL技术研究与工具设计", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105068805A (en) * 2015-08-07 2015-11-18 北京思特奇信息技术股份有限公司 Method and system for auditing data in data transplantation process
CN105068805B (en) * 2015-08-07 2018-09-11 北京思特奇信息技术股份有限公司 The method and system of data auditing during a kind of data migration
CN107203564A (en) * 2016-03-18 2017-09-26 北京京东尚科信息技术有限公司 The method of data transfer, apparatus and system
CN107818106A (en) * 2016-09-13 2018-03-20 腾讯科技(深圳)有限公司 A kind of big data off-line calculation quality of data method of calibration and device
CN107818106B (en) * 2016-09-13 2021-11-16 腾讯科技(深圳)有限公司 Big data offline calculation data quality verification method and device
CN107045610A (en) * 2017-05-08 2017-08-15 广东欧珀移动通信有限公司 Data migration method, terminal device and computer-readable recording medium
CN108874825A (en) * 2017-05-12 2018-11-23 北京京东尚科信息技术有限公司 A kind of method of calibration and device of abnormal data
CN109871410A (en) * 2019-02-14 2019-06-11 深圳市盟天科技有限公司 A kind of method, apparatus of data assembling storage, server, storage medium
CN111831755A (en) * 2020-07-23 2020-10-27 北京思特奇信息技术股份有限公司 Cross-database data synchronization method, system, medium and equipment
CN111831755B (en) * 2020-07-23 2024-01-16 北京思特奇信息技术股份有限公司 Cross-database data synchronization method, system, medium and device

Also Published As

Publication number Publication date
CN104462562B (en) 2018-05-18

Similar Documents

Publication Publication Date Title
CN104462562A (en) Data migration system and method based on data warehouse automation
CN107729366B (en) Universal multi-source heterogeneous large-scale data synchronization system
US10878248B2 (en) Media authentication using distributed ledger
CN101009516B (en) A method, system and device for data synchronization
WO2020019943A1 (en) Method and device for transmitting data, and method and apparatus for receiving data
CN104539690B (en) A kind of Server remote method of data synchronization detected based on feedback mechanism and MD5 codes
CN110502364B (en) Cross-cloud backup recovery method for big data sandbox cluster under OpenStack platform
CN104063353B (en) The method of synchronizing information between master-slave equipment
WO2016155492A1 (en) Remote data synchronization method and apparatus for database
CN102185841A (en) Classified data transmission method and system
CN103259797A (en) Data file transmission method and platform
CN109189749A (en) File synchronisation method and terminal device
CN104679596A (en) Message processing method and system for improving concurrence performance of server-side
CN110222117A (en) A kind of data conversion synchronous method, equipment and the storage medium of heterogeneous database
US10506392B1 (en) Stream-processing of telecommunication diameter event records
CN113326165A (en) Data processing method and device based on block chain and computer readable storage medium
CN101771718A (en) Clipboard synchronous method and system
CN104933059B (en) File prestige acquisition methods, gateway and file reputation server
CN110417892B (en) Message analysis-based data replication link optimization method and device
CN108614820A (en) The method and apparatus for realizing the parsing of streaming source data
CN103092932A (en) Distributed document transcoding system
US20100268784A1 (en) Data synchronization system and method
US20150088958A1 (en) Information Processing System and Distributed Processing Method
CN106557530B (en) Operation system, data recovery method and device
CN114785805A (en) Data transmission method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20200604

Address after: 250100, No. 2877, fairway, Sun Town, Ji'nan hi tech Zone, Shandong

Patentee after: Inspur Software Technology Co.,Ltd.

Address before: 250100 Ji'nan science and Technology Development Zone, Shandong Branch Road No. 2877

Patentee before: INSPUR GROUP Co.,Ltd.

TR01 Transfer of patent right