CN104462562B - Data migration system and method based on data warehouse automation - Google Patents

Data migration system and method based on data warehouse automation

Info

Publication number
CN104462562B
Authority
CN
China
Prior art keywords
data
file
data file
storage
interface unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201410832607.6A
Other languages
Chinese (zh)
Other versions
CN104462562A (en)
Inventor
郭凤
杨培强
王永军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Software Technology Co Ltd
Original Assignee
Inspur Software Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Software Group Co Ltd
Priority to CN201410832607.6A
Publication of CN104462562A
Application granted
Publication of CN104462562B
Active legal status
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/185 Hierarchical storage management [HSM] systems, e.g. file migration or policies thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/01 Protocols
    • H04L67/06 Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2107 File encryption

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioethics (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Storage Device Security (AREA)
  • Computer And Data Communications (AREA)

Abstract

The invention relates to a data migration system and method based on data warehouse automation. The system comprises a trigger module, an interface unit, a warehousing module, and a server module, wherein the trigger module is connected to the interface unit, the interface unit is connected to the warehousing module, and the trigger module, the interface unit, and the warehousing module are all connected to the server module.

Description

A data migration system and method based on data warehouse automation
Technical field
The present invention relates to the field of data processing technology, and in particular to a data migration system and method that transfers data files using a mainstream network transfer protocol (FTP).
Background
With the arrival of the big data era and the rapid development of data warehouse technology, people increasingly recognize the importance of data to an enterprise, and the demand for data interaction between systems keeps growing. Data migration between databases has always been a problem. Traditional data migration schemes depend excessively on the database, for example by connecting databases through dblink. Their shortcomings are as follows:
1. They depend excessively on the database; if the connection is broken, it cannot recover automatically.
2. Prolonged data interaction increases the database load and degrades database performance.
3. Manual intervention is required, and the workload is heavy.
4. The coupling between databases is too high, and security cannot be guaranteed.
Summary of the invention
The object of the invention is to provide an automated data migration system and method that requires no manual intervention at any point in the process and is genuinely secure, reliable, and effective.
To achieve the above object, the present invention adopts the following technical solution:
A data migration system based on data warehouse automation comprises a trigger module, an interface unit, a warehousing module, and a server module, wherein the trigger module is connected to the interface unit, the interface unit is connected to the warehousing module, and the trigger module, the interface unit, and the warehousing module are each connected to the server module.
In one embodiment, the trigger module triggers data migration through the automated execution of ETL scheduled tasks.
In one embodiment, the interface unit comprises data files and a control file.
In one embodiment, the warehousing module comprises a warehousing log and a data quality report.
In one embodiment, the server module comprises an interface server, a target server, and an FTP server.
In one embodiment, the data files are transmitted in split volumes over FTP, and the transmission mode may be parallel or serial.
In one embodiment, the control file is transmitted over FTP; it uses MD5 encryption and carries an identification code, which comprises the record count and the ciphertext.
Another technical solution of the present invention is:
A data migration method, comprising the following steps:
A: The data source ETL task generates data files from the data in the interface table, applies MD5 encryption to the information about the generated data files, and generates a control file;
B: The group of data files and the control file are uploaded to the FTP server as one interface unit;
C: The target host ETL task invokes a shell script that checks whether the interface unit's control file exists on the FTP server. If it exists, all data files in the interface unit have been received. The information about the received data files is put through MD5 decryption and compared with the information in the control file, which avoids files going missing because of packet loss during network transmission, ensures the integrity of the received files, and thereby ensures the integrity of the data;
D: After the data files pass verification, the warehousing operation is carried out: the ETL task loads the data files into the warehouse one by one and, once loading is complete, calls the check script to check business data quality.
Step A, in which the data source ETL task generates data files from the data in the interface table, applies MD5 encryption to the information about the generated data files, and generates a control file, comprises:
A1: The data source side starts the data transfer;
A2: Data files are generated in the interface unit and, depending on their size, are split into multiple volumes;
A3: A control file is generated in the interface unit and MD5-encrypted, and an identification code is attached to the control file to ensure the integrity of the data files; the identification code comprises the record count of the data files and the ciphertext;
A4: The generated data files and control file are uploaded over FTP; either serial or parallel transmission may be used;
A5: During transmission, the interface unit sends an email notification to the warehousing module;
A6: The transfer is complete.
In one embodiment, step D, in which the warehousing operation is carried out after the data files pass verification, the ETL task loads the data files one by one, and the check script is called after loading to check business data quality, comprises:
D1: The warehousing module takes over and starts loading the data files;
D2: The warehousing module checks whether the control file exists; if it exists, it proceeds to the next step; if not, it continues waiting;
D3: The warehousing module compares against the control file to check whether the files meet the requirements; if they do not, loading is abandoned; if they do, it proceeds to the next step;
D4: The warehousing module loads the data files and generates a warehousing log;
D5: The data file check script produces a data quality report;
D6: An email notification is sent to the interface module;
D7: Data loading is complete.
The present invention requires no manual intervention at any point in the process and provides a genuinely secure, reliable, and effective automated data migration scheme. Because data is transferred as exported data files, the scheme does not depend on any particular database platform and supports data migration between different types of databases. It can be applied to any scenario that requires data migration and is a sound automated data migration solution.
Brief description of the drawings
Fig. 1 is a system composition diagram of the present invention.
Fig. 2 is a schematic diagram of the interface unit of the present invention.
Fig. 3 is a schematic workflow diagram of the present invention.
Fig. 4 is a flow chart of data file generation at the data source side of the present invention.
Fig. 5 is a flow chart of loading into the target database of the present invention.
Detailed description of the embodiments
Specific embodiments of the present invention are described below with reference to the accompanying drawings.
Fig. 1 is a system composition diagram of the present invention. As shown in Fig. 1, the data migration system of the present invention comprises a trigger module, an interface unit, a warehousing module, and a server module, wherein the trigger module is connected to the interface unit, the interface unit is connected to the warehousing module, and the trigger module, the interface unit, and the warehousing module are each connected to the server module.
The trigger module triggers data migration through the automated execution of ETL scheduled tasks.
The warehousing module comprises a warehousing log and a data quality report.
The server module comprises an interface server, a target server, and an FTP server.
The control file is transmitted over FTP; it uses MD5 encryption and carries an identification code, which comprises the record count and the ciphertext.
Fig. 2 is a schematic diagram of the interface unit of the present invention. As Fig. 2 shows, the interface unit comprises data files and a control file.
The data files are transmitted in split volumes over FTP, and the transmission mode may be parallel or serial.
Fig. 3 is a schematic workflow diagram of the present invention. Fig. 3 discloses the data migration method based on data warehouse automation of the present invention, which comprises the following steps:
A: The data source ETL task generates data files from the data in the interface table, applies MD5 encryption to the information about the generated data files, and generates a control file;
B: The group of data files and the control file are uploaded to the FTP server as one interface unit;
C: The target host ETL task invokes a shell script that checks whether the interface unit's control file exists on the FTP server. If it exists, all data files in the interface unit have been received. The information about the received data files is put through MD5 decryption and compared with the information in the control file, which avoids files going missing because of packet loss during network transmission, ensures the integrity of the received files, and thereby ensures the integrity of the data;
D: After the data files pass verification, the warehousing operation is carried out: the ETL task loads the data files into the warehouse one by one and, once loading is complete, calls the check script to check business data quality.
To prevent exported data files from becoming too large to transfer and manage conveniently, the present invention limits the maximum number of records in a single file, so that multiple data files can be generated according to that limit and the actual volume of data in the table. Exporting the data files as split volumes is better suited to network transmission: the volumes can be transmitted in parallel to make full use of network bandwidth, and when packet loss corrupts a file, only that file needs to be retransmitted, which reduces the network load of retransmission.
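The volume splitting described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `write_volumes`, the `.dat` suffix, the three-digit volume numbering, and the one-record-per-line layout are all assumptions made for the example.

```python
from pathlib import Path
from typing import Iterable, List


def write_volumes(rows: Iterable[str], out_dir: Path, base_name: str,
                  max_records: int) -> List[Path]:
    """Write rows into numbered volume files with at most max_records rows each.

    Naming (base_001.dat, base_002.dat, ...) is a hypothetical convention.
    """
    if max_records <= 0:
        raise ValueError("max_records must be positive")
    out_dir.mkdir(parents=True, exist_ok=True)
    volumes: List[Path] = []
    handle = None
    count_in_volume = 0
    try:
        for row in rows:
            # Start a new volume for the first row or when the current one is full.
            if handle is None or count_in_volume == max_records:
                if handle is not None:
                    handle.close()
                path = out_dir / f"{base_name}_{len(volumes) + 1:03d}.dat"
                handle = path.open("w", encoding="utf-8", newline="\n")
                volumes.append(path)
                count_in_volume = 0
            handle.write(row + "\n")
            count_in_volume += 1
    finally:
        if handle is not None:
            handle.close()
    return volumes
```

Because each volume holds a bounded number of records, a corrupted transfer costs at most one volume's worth of retransmission.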
The present invention combines information such as the exported record count and the file size into a character string in a fixed order and applies MD5 encryption to it to generate the ciphertext. The total exported record count and the ciphertext are written to the control file as the identification code. The data files and the control file together form one interface unit.
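A minimal sketch of how such a control file might be produced is shown below. The patent does not specify the string layout or the control file format, so the `name:records:size` fields joined with `|`, the two-line control file, and all function names are assumptions. Note that MD5 is a one-way hash, so the "ciphertext" here is simply a hex digest computed with Python's standard `hashlib.md5`.

```python
import hashlib
from pathlib import Path
from typing import List, Tuple


def describe_volume(path: Path) -> Tuple[int, int]:
    """Return (record_count, size_in_bytes), assuming one record per line."""
    records = 0
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            records += chunk.count(b"\n")
    return records, path.stat().st_size


def build_identification_code(volumes: List[Path]) -> Tuple[int, str]:
    """Combine per-file facts in a fixed order and hash the resulting string."""
    parts = []
    total = 0
    for path in sorted(volumes):  # fixed order so both sides build the same string
        records, size = describe_volume(path)
        total += records
        parts.append(f"{path.name}:{records}:{size}")
    digest = hashlib.md5("|".join(parts).encode("utf-8")).hexdigest()
    return total, digest


def write_control_file(volumes: List[Path], control_path: Path) -> None:
    """Write the total record count and the digest, one per line (assumed format)."""
    total, digest = build_identification_code(volumes)
    control_path.write_text(f"{total}\n{digest}\n", encoding="utf-8")
```

Sorting the volumes before building the string matters: the receiving side must be able to rebuild exactly the same string without knowing the order in which the files were exported.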
Fig. 4 is a flow chart of data file generation at the data source side of the present invention. It corresponds to step A in Fig. 3; step A further comprises:
A1: The data source side starts the data transfer;
A2: Data files are generated in the interface unit and, depending on their size, are split into multiple volumes;
A3: A control file is generated in the interface unit and MD5-encrypted, and an identification code is attached to the control file to ensure the integrity of the data files; the identification code comprises the record count of the data files and the ciphertext;
A4: The generated data files and control file are uploaded over FTP; either serial or parallel transmission may be used;
A5: During transmission, the interface unit sends an email notification to the warehousing module;
A6: The transfer is complete.
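Step A4 can be sketched with Python's standard `ftplib` and a thread pool. This is an illustration under assumptions: the helper names, the one-connection-per-file design, and the worker count are not from the patent. Uploading the control file last is also an assumption; it is chosen here because step C treats the presence of the control file as the signal that all data files have arrived. The email notification of step A5 is omitted.

```python
from concurrent.futures import ThreadPoolExecutor
from ftplib import FTP
from pathlib import Path
from typing import Callable, Iterable


def ftp_uploader(host: str, user: str, password: str,
                 remote_dir: str) -> Callable[[Path], None]:
    """Build an upload function; host, credentials and directory are placeholders."""
    def upload(path: Path) -> None:
        # One connection per file keeps the function safe to call from threads.
        with FTP(host) as ftp:
            ftp.login(user, password)
            ftp.cwd(remote_dir)
            with path.open("rb") as fh:
                ftp.storbinary(f"STOR {path.name}", fh)
    return upload


def upload_interface_unit(data_files: Iterable[Path], control_file: Path,
                          upload: Callable[[Path], None],
                          parallel: bool = True, workers: int = 4) -> None:
    """Upload all data files (in parallel or serially), then the control file."""
    files = list(data_files)
    if parallel:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # list() waits for every upload and re-raises the first failure.
            list(pool.map(upload, files))
    else:
        for path in files:
            upload(path)
    # The control file goes last so its arrival marks the unit as complete.
    upload(control_file)
```

A call such as `upload_interface_unit(volumes, control, ftp_uploader("ftp.example.com", "etl", "secret", "/inbox"))` would use the real FTP uploader; passing any other callable makes the ordering logic easy to exercise without a server.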
Fig. 5 is a flow chart of loading into the target database of the present invention. It corresponds to step D in Fig. 3, in which the warehousing operation is carried out after the data files pass verification, the ETL task loads the data files one by one, and the check script is called after loading to check business data quality. Step D further comprises:
D1: The warehousing module takes over and starts loading the data files;
D2: The warehousing module checks whether the control file exists; if it exists, it proceeds to the next step; if not, it continues waiting;
D3: The warehousing module compares against the control file to check whether the files meet the requirements; if they do not, loading is abandoned; if they do, it proceeds to the next step;
D4: The warehousing module loads the data files and generates a warehousing log;
D5: The data file check script produces a data quality report;
D6: An email notification is sent to the interface module;
D7: Data loading is complete.
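The control flow of steps D2 to D4 can be sketched as a small driver that receives the verification and loading logic as callables. Everything named here is hypothetical: the function `run_warehousing`, the local `inbox` directory standing in for the received files, the `.dat` suffix, and the log messages. The patent says the module simply continues waiting when the control file is absent; the `max_polls` cap is an added assumption so the sketch terminates. The quality report (D5) and email notification (D6) are left out.

```python
import time
from pathlib import Path
from typing import Callable, List


def run_warehousing(inbox: Path, control_name: str,
                    verify: Callable[[Path], bool],
                    load: Callable[[Path], int],
                    poll_seconds: float = 30.0,
                    max_polls: int = 120) -> List[str]:
    """Wait for the control file, verify the unit, load each data file, return a log."""
    log: List[str] = []
    control = inbox / control_name

    # D2: poll until the control file appears (bounded here, unlike the patent text).
    for _ in range(max_polls):
        if control.exists():
            break
        time.sleep(poll_seconds)
    else:
        log.append("control file not found; gave up waiting")
        return log

    # D3: abandon loading when the received files do not match the control file.
    if not verify(control):
        log.append("verification failed; warehousing abandoned")
        return log

    # D4: load the data files one by one and record each result in the log.
    for data_file in sorted(inbox.glob("*.dat")):
        rows = load(data_file)
        log.append(f"loaded {data_file.name}: {rows} rows")
    return log
```

Keeping `verify` and `load` as parameters lets the same driver work with any database loader and any integrity check.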
The present invention is triggered automatically by ETL tasks. Interface data is exported as data files, a control file is generated from the data file information, and both are transferred to the target file server over FTP. A task on the target host scans for the files, generates encrypted information from the data file information of the interface unit using the same algorithm, and compares it with the information in the control file. This avoids files going missing because of packet loss during network transmission, ensures the integrity of the received files, and thereby ensures the integrity of the data. After the data files pass verification, the loading procedure is called; after loading, the check script is called automatically to check business data quality and generate a data quality report.
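The target-side comparison can be sketched as recomputing the digest from the received files and checking it against the control file. Since MD5 is a one-way hash, verification means hashing again with the same algorithm rather than decrypting. The control file layout (record count on the first line, digest on the second), the `name:records:size` string, the `.dat` suffix, and the function names are assumptions for illustration only.

```python
import hashlib
from pathlib import Path
from typing import Iterable, Tuple


def fingerprint(paths: Iterable[Path]) -> Tuple[int, str]:
    """Recompute total record count and MD5 digest over the received data files."""
    total = 0
    parts = []
    for path in sorted(paths):
        data = path.read_bytes()
        records = data.count(b"\n")  # assumes one record per line
        total += records
        parts.append(f"{path.name}:{records}:{len(data)}")
    return total, hashlib.md5("|".join(parts).encode("utf-8")).hexdigest()


def verify_interface_unit(inbox: Path, control_name: str) -> bool:
    """Return True only if count and digest match the values in the control file."""
    fields = (inbox / control_name).read_text(encoding="utf-8").split()
    if len(fields) != 2:
        return False  # malformed control file
    expected_total, expected_digest = fields
    total, digest = fingerprint(inbox.glob("*.dat"))
    return str(total) == expected_total and digest == expected_digest
```

A missing or truncated volume changes the recomputed string, so the digest no longer matches and the unit is rejected before any data reaches the warehouse.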
In conclusion, the above embodiments merely illustrate the technical solutions of the present invention and do not limit them. Although the present invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art will understand that the technical solutions described in the embodiments may still be modified, or some of their technical features replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (2)

1. A data migration method based on data warehouse automation, characterized in that it comprises the following steps:
A: The data source ETL task generates data files from the data in the interface table, applies MD5 encryption to the information about the generated data files, and generates a control file;
B: The group of data files and the control file are uploaded to the FTP server as one interface unit;
C: The target host ETL task invokes a shell script that checks whether the interface unit's control file exists on the FTP server; if it exists, all data files in the interface unit have been received; the information about the received data files is put through MD5 decryption and compared with the information in the control file, which avoids files going missing because of packet loss during network transmission, ensures the integrity of the received files, and thereby ensures the integrity of the data;
D: After the data files pass verification, the warehousing operation is carried out: the ETL task loads the data files into the warehouse one by one and, once loading is complete, calls the check script to check business data quality;
wherein the step in which the data source ETL task generates data files from the data in the interface table, applies MD5 encryption to the information about the generated data files, and generates a control file comprises:
A1: The data source side starts the data transfer;
A2: Data files are generated in the interface unit and, depending on their size, are split into multiple volumes;
A3: A control file is generated in the interface unit and MD5-encrypted, and an identification code is attached to the control file to ensure the integrity of the data files; the identification code comprises the record count of the data files and the ciphertext;
A4: The generated data files and control file are uploaded over FTP; either serial or parallel transmission may be used;
A5: During transmission, the interface unit sends an email notification to the warehousing module;
A6: The transfer is complete.
2. The data migration method based on data warehouse automation according to claim 1, characterized in that the step in which the warehousing operation is carried out after the data files pass verification, the ETL task loads the data files one by one, and the check script is called after loading to check business data quality comprises:
D1: The warehousing module takes over and starts loading the data files;
D2: The warehousing module checks whether the control file exists; if it exists, it proceeds to the next step; if not, it continues waiting;
D3: The warehousing module compares against the control file to check whether the files meet the requirements; if they do not, loading is abandoned; if they do, it proceeds to the next step;
D4: The warehousing module loads the data files and generates a warehousing log;
D5: The data file check script produces a data quality report;
D6: An email notification is sent to the interface module;
D7: Data loading is complete.
CN201410832607.6A 2014-12-29 2014-12-29 Data migration system and method based on data warehouse automation Active CN104462562B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410832607.6A CN104462562B (en) 2014-12-29 2014-12-29 Data migration system and method based on data warehouse automation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410832607.6A CN104462562B (en) 2014-12-29 2014-12-29 Data migration system and method based on data warehouse automation

Publications (2)

Publication Number Publication Date
CN104462562A CN104462562A (en) 2015-03-25
CN104462562B true CN104462562B (en) 2018-05-18

Family

ID=52908597

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410832607.6A Active CN104462562B (en) 2014-12-29 2014-12-29 Data migration system and method based on data warehouse automation

Country Status (1)

Country Link
CN (1) CN104462562B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105068805B (en) * 2015-08-07 2018-09-11 北京思特奇信息技术股份有限公司 The method and system of data auditing during a kind of data migration
CN107203564B (en) * 2016-03-18 2020-11-24 北京京东尚科信息技术有限公司 Data transmission method, device and system
CN107818106B (en) * 2016-09-13 2021-11-16 腾讯科技(深圳)有限公司 Big data offline calculation data quality verification method and device
CN107045610B (en) * 2017-05-08 2020-06-12 Oppo广东移动通信有限公司 Data migration method, terminal device and computer readable storage medium
CN108874825B (en) * 2017-05-12 2021-11-02 北京京东尚科信息技术有限公司 Abnormal data verification method and device
CN109871410A (en) * 2019-02-14 2019-06-11 深圳市盟天科技有限公司 A kind of method, apparatus of data assembling storage, server, storage medium
CN111831755B (en) * 2020-07-23 2024-01-16 北京思特奇信息技术股份有限公司 Cross-database data synchronization method, system, medium and device
CN114049214A (en) * 2021-11-15 2022-02-15 深圳前海鸿泰源兴科技发展有限公司 Big data information acquisition and processing system and operation method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101621529A (en) * 2008-06-30 2010-01-06 上海全成通信技术有限公司 High-efficient and low-cost loading method for heterogeneous mass data
CN102375891A (en) * 2011-11-15 2012-03-14 山东浪潮金融信息系统有限公司 Implementation tool for unloading and loading incremental data
CN102999537A (en) * 2011-09-19 2013-03-27 阿里巴巴集团控股有限公司 System and method for data migration
CN103455526A (en) * 2012-06-05 2013-12-18 杭州勒卡斯广告策划有限公司 ETL (extract-transform-load) data processing method, device and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9519695B2 (en) * 2013-04-16 2016-12-13 Cognizant Technology Solutions India Pvt. Ltd. System and method for automating data warehousing processes

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
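Research on Extensible ETL Technology and Tool Design; Huang Jueming; China Master's Theses Full-text Database, Information Science and Technology; 20110315; Vol. 2011 (No. 03); main text pp. 18, 36-39, 53, Fig. 4-1 *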

Also Published As

Publication number Publication date
CN104462562A (en) 2015-03-25

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20200604

Address after: 250100, No. 2877, fairway, Sun Town, Ji'nan hi tech Zone, Shandong

Patentee after: Inspur Software Technology Co.,Ltd.

Address before: 250100 Ji'nan science and Technology Development Zone, Shandong Branch Road No. 2877

Patentee before: INSPUR GROUP Co.,Ltd.
