CN104462562B - Data migration system and method based on data warehouse automation - Google Patents
Data migration system and method based on data warehouse automation
- Publication number
- CN104462562B (application CN201410832607.6A)
- Authority
- CN
- China
- Prior art keywords
- data
- file
- data file
- storage
- interface unit
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/18—File system types
- G06F16/185—Hierarchical storage management [HSM] systems, e.g. file migration or policies thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/602—Providing cryptographic facilities or services
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L67/00—Network arrangements or protocols for supporting network services or applications
- H04L67/01—Protocols
- H04L67/06—Protocols specially adapted for file transfer, e.g. file transfer protocol [FTP]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2221/00—Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/21—Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/2107—File encryption
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Bioethics (AREA)
- Databases & Information Systems (AREA)
- Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Computer Hardware Design (AREA)
- Computer Security & Cryptography (AREA)
- Software Systems (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Storage Device Security (AREA)
- Computer And Data Communications (AREA)
Abstract
The invention relates to a data migration system and method based on data warehouse automation. The system comprises a trigger module, an interface unit, a warehousing module and a server module, wherein the trigger module is connected with the interface unit, the interface unit is connected with the warehousing module, and the trigger module, the interface unit and the warehousing module are all connected with the server module.
Description
Technical field
The present invention relates to the field of data processing technology, and in particular to a data migration system and method that transfers data files using a mainstream network transfer protocol (FTP).
Background technology
With the arrival of the big data era and the rapid development of data warehouse technology, people increasingly recognize the importance of data to an enterprise, and the demand for data exchange between systems keeps growing. Data migration between databases has always been a problem. Traditional data migration schemes depend too heavily on the database, for example by connecting databases through dblink. The drawbacks are as follows: 1. excessive dependence on the database, with no automatic recovery if the connection is dropped; 2. prolonged data exchange increases the load on the database and degrades its performance; 3. manual intervention is required and the workload is heavy; 4. the coupling between databases is too high, and security cannot be guaranteed.
Summary of the invention
The object of the present invention is to provide an automated data migration system and method that requires no manual intervention throughout and is truly safe, reliable and effective.
To achieve the above object, the present invention adopts the following technical solution:
A data migration system based on data warehouse automation comprises a trigger module, an interface unit, a warehousing module and a server module, wherein the trigger module is connected with the interface unit, the interface unit is connected with the warehousing module, and the trigger module, the interface unit and the warehousing module are each connected with the server module.
In one embodiment, the trigger module triggers data migration through the automated execution of scheduled ETL tasks.
In one embodiment, the interface unit includes data files and a control file.
In one embodiment, the warehousing module includes a warehousing log and a data quality report.
In one embodiment, the server module includes an interface server, a target server and an FTP server.
In one embodiment, the data files are transferred in split volumes over FTP, and the transfer mode is parallel transfer or serial transfer.
In one embodiment, the control file is transferred over FTP; the control file uses MD5 encryption and carries an identification code, and the identification code includes the record count and ciphertext information.
Another technical solution of the present invention is:
A data migration method comprises the following steps:
A: The data source ETL task generates data files from the data in the interface table, performs MD5 encryption on the information of the data files generated from the interface table, and generates a control file;
B: The group of data files and the control file are uploaded to the FTP server as one interface unit;
C: The target host ETL task invokes a shell script to detect whether the control file of the interface unit exists on the FTP server. If it exists, all data files in the interface unit have been received. MD5 decryption is performed on the information of the received data files and the result is compared with the information in the control file, which avoids missing files caused by packet loss during network transmission, ensures the integrity of the received files, and thereby ensures the integrity of the data;
D: After the data files pass verification, the warehousing operation is carried out: the ETL task loads the data files in sequence and, after loading completes, calls the check script to check business data quality.
Step A, in which the data source ETL task generates data files from the data in the interface table, performs MD5 encryption on the information of the data files generated from the interface table, and generates a control file, includes:
A1: The data source end starts transferring data;
A2: Data files are generated in the interface unit, and the data files are split into volumes according to their size;
A3: A control file is generated in the interface unit, MD5 encryption is performed on the control file, and the control file is marked with an identification code to ensure the integrity of the data files; the identification code includes the record count of the data files and ciphertext information;
A4: The generated data files and control file are uploaded over FTP; serial transfer or parallel transfer may be used during transfer;
A5: During transfer, the interface unit sends a mail notification to the warehousing module;
A6: The transfer is complete.
In one embodiment, step D, in which the warehousing operation is carried out after the data files pass verification, the ETL task loads the data files in sequence, and the check script is called after loading completes to check business data quality, includes:
D1: The warehousing module starts to take over loading of the data files;
D2: The warehousing module checks whether the control file exists; if it exists, proceed to the next step; if it does not, continue waiting;
D3: The control file is compared in the warehousing module to verify whether the files meet the requirements; if they do not, loading is abandoned; if they do, proceed to the next step;
D4: The warehousing module loads the data files and generates a warehousing log;
D5: The data file check script produces a data quality report;
D6: A mail notification is sent to the interface module;
D7: Data loading is complete.
The present invention requires no manual intervention throughout and truly achieves a safe, reliable and effective automated data migration scheme. Because data is transferred as exported data files, the scheme does not depend on any particular database platform and is compatible with data migration between different types of databases. It can be applied to any scenario that requires data migration and is a preferred automated data migration solution.
Description of the drawings
Fig. 1 is a system composition diagram of the present invention.
Fig. 2 is a schematic diagram of the interface unit of the present invention.
Fig. 3 is a schematic workflow diagram of the present invention.
Fig. 4 is a flow chart of data file generation at the data source end of the present invention.
Fig. 5 is a flow chart of loading into the target database of the present invention.
Specific embodiment
Specific embodiments of the present invention are described below with reference to the accompanying drawings.
Fig. 1 is a system composition diagram of the present invention. As shown in Fig. 1, a data migration system of the present invention comprises a trigger module, an interface unit, a warehousing module and a server module, wherein the trigger module is connected with the interface unit, the interface unit is connected with the warehousing module, and the trigger module, the interface unit and the warehousing module are each connected with the server module.
The trigger module triggers data migration through the automated execution of scheduled ETL tasks.
The warehousing module includes a warehousing log and a data quality report.
The server module includes an interface server, a target server and an FTP server.
The control file is transferred over FTP; it uses MD5 encryption and carries an identification code, and the identification code includes the record count and ciphertext information.
Fig. 2 is a schematic diagram of the interface unit of the present invention. As can be seen from Fig. 2, the interface unit includes data files and a control file.
The data files are transferred in split volumes over FTP, and the transfer modes include parallel transfer and serial transfer.
Fig. 3 is a schematic workflow diagram of the present invention. Fig. 3 discloses a data migration method based on data warehouse automation according to the present invention, and the method comprises the following steps:
A: The data source ETL task generates data files from the data in the interface table, performs MD5 encryption on the information of the data files generated from the interface table, and generates a control file;
B: The group of data files and the control file are uploaded to the FTP server as one interface unit;
C: The target host ETL task invokes a shell script to detect whether the control file of the interface unit exists on the FTP server. If it exists, all data files in the interface unit have been received. MD5 decryption is performed on the information of the received data files and the result is compared with the information in the control file, which avoids missing files caused by packet loss during network transmission, ensures the integrity of the received files, and thereby ensures the integrity of the data;
D: After the data files pass verification, the warehousing operation is carried out: the ETL task loads the data files in sequence and, after loading completes, calls the check script to check business data quality.
To prevent exported data files from becoming too large to transfer and manage, the present invention limits the maximum number of records in a single file; multiple data files can be generated according to this per-file maximum record count and the actual data volume in the table. Exporting the data as split volumes is better suited to network transmission: the files can be transferred in parallel to make full use of network bandwidth, and the network load of retransmitting a file corrupted by packet loss is also reduced.
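The volume splitting described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the `.dat` extension and the zero-padded numbering are hypothetical, and records are assumed to be plain strings.

```python
from typing import Iterable, Iterator, List


def split_into_volumes(records: Iterable[str], max_records: int) -> Iterator[List[str]]:
    """Yield consecutive volumes holding at most max_records records each."""
    if max_records <= 0:
        raise ValueError("max_records must be positive")
    volume: List[str] = []
    for record in records:
        volume.append(record)
        if len(volume) == max_records:
            yield volume
            volume = []
    if volume:  # final, partially filled volume
        yield volume


def volume_name(prefix: str, index: int) -> str:
    """Hypothetical naming scheme: <prefix>_0001.dat, <prefix>_0002.dat, ..."""
    return f"{prefix}_{index:04d}.dat"


# Seven exported rows with a per-file limit of three records.
rows = [f"row-{i}" for i in range(7)]
volumes = list(split_into_volumes(rows, max_records=3))
names = [volume_name("orders", i) for i, _ in enumerate(volumes, start=1)]
```

With a limit of three records per file, seven rows produce two full volumes and one partial volume, so a corrupted transfer only requires resending one small file.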
The present invention combines information such as the exported record count and the file size into a character string in order and performs MD5 encryption on it to generate the ciphertext. The total exported record count and the ciphertext information are written into the control file as the identification code. The data files and the control file form one interface unit.
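A possible shape for the control file is sketched below. MD5 is a one-way hash function, so the "encryption" step is modelled here as computing a digest. The text does not specify the field order, the separators or the control file layout, so the `count:size` fields joined by `|` and the two-line layout (total record count, then digest) are assumptions made for illustration.

```python
import hashlib
from typing import List, Tuple

# (file name, record count, size in bytes) for each data file in the unit
FileInfo = Tuple[str, int, int]


def build_signature(files: List[FileInfo]) -> str:
    """Join record count and size of each file, in file-name order, and hash with MD5."""
    parts = [f"{count}:{size}" for _, count, size in sorted(files)]
    return hashlib.md5("|".join(parts).encode("utf-8")).hexdigest()


def build_control_file(files: List[FileInfo]) -> str:
    """Assumed layout: line 1 = total record count, line 2 = MD5 digest."""
    total = sum(count for _, count, _ in files)
    return f"{total}\n{build_signature(files)}\n"


files = [("orders_0002.dat", 1, 20), ("orders_0001.dat", 3, 60)]
control = build_control_file(files)
```

Sorting by file name keeps the digest independent of the order in which the files happen to be listed, which matters once the receiver has to rebuild the same string.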
Fig. 4 is a flow chart of data file generation at the data source end of the present invention. It corresponds to step A in Fig. 3; step A further includes:
A1: The data source end starts transferring data;
A2: Data files are generated in the interface unit, and the data files are split into volumes according to their size;
A3: A control file is generated in the interface unit, MD5 encryption is performed on the control file, and the control file is marked with an identification code to ensure the integrity of the data files; the identification code includes the record count of the data files and ciphertext information;
A4: The generated data files and control file are uploaded over FTP; serial transfer or parallel transfer may be used during transfer;
A5: During transfer, the interface unit sends a mail notification to the warehousing module;
A6: The transfer is complete.
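The serial and parallel upload options in step A4 can be sketched as below. The uploader is passed in as a callable so the sketch runs without a server; `upload_unit`, `upload_one` and the file names are hypothetical. Sending the control file last is an assumption, chosen because step C treats the presence of the control file as the sign that every data file has arrived.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def upload_unit(data_files: List[str], control_file: str,
                upload_one: Callable[[str], None],
                parallel: bool = True, workers: int = 4) -> None:
    """Upload all data files, then the control file as the completion marker."""
    if parallel:
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # list() forces evaluation so any upload error is raised here
            list(pool.map(upload_one, data_files))
    else:
        for path in data_files:
            upload_one(path)
    # Only reached once every data file has been uploaded successfully.
    upload_one(control_file)


# Stand-in uploader that just records what would have been sent.
sent: List[str] = []
data_files = ["orders_0001.dat", "orders_0002.dat", "orders_0003.dat"]
upload_unit(data_files, "orders.ctl", upload_one=sent.append, parallel=True)

sent_serial: List[str] = []
upload_unit(data_files, "orders.ctl", upload_one=sent_serial.append, parallel=False)
```

In a real deployment `upload_one` could wrap `ftplib.FTP.storbinary` from the Python standard library, most likely with one connection per worker, since a single FTP control connection handles one transfer at a time.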
Fig. 5 is a flow chart of loading into the target database of the present invention. It corresponds to step D in Fig. 3, in which the warehousing operation is carried out after the data files pass verification, the ETL task loads the data files in sequence, and the check script is called after loading completes to check business data quality; step D further includes:
D1: The warehousing module starts to take over loading of the data files;
D2: The warehousing module checks whether the control file exists; if it exists, proceed to the next step; if it does not, continue waiting;
D3: The control file is compared in the warehousing module to verify whether the files meet the requirements; if they do not, loading is abandoned; if they do, proceed to the next step;
D4: The warehousing module loads the data files and generates a warehousing log;
D5: The data file check script produces a data quality report;
D6: A mail notification is sent to the interface module;
D7: Data loading is complete.
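Steps D2, D4 and D5 above can be sketched as follows, using an in-memory SQLite database as a stand-in for the target warehouse. Everything here is illustrative: the `staging` table, the one-record-per-line file format, the control file holding only a total record count, and the `max_polls` limit (the text simply keeps waiting) are assumptions, and the quality check is reduced to a row-count comparison.

```python
import os
import sqlite3
import tempfile
import time
from typing import List


def wait_for_control_file(path: str, poll_seconds: float = 1.0, max_polls: int = 60) -> bool:
    """D2: poll until the control file exists; give up after max_polls attempts."""
    for _ in range(max_polls):
        if os.path.exists(path):
            return True
        time.sleep(poll_seconds)
    return False


def load_unit(conn: sqlite3.Connection, data_files: List[str], expected_total: int) -> dict:
    """D4-D5: load files in sequence, keep a log, and report a row-count check."""
    conn.execute("CREATE TABLE IF NOT EXISTS staging (line TEXT)")
    log = []
    for path in data_files:
        with open(path, encoding="utf-8") as handle:
            rows = [(line.rstrip("\n"),) for line in handle if line.strip()]
        conn.executemany("INSERT INTO staging (line) VALUES (?)", rows)
        log.append(f"loaded {len(rows)} rows from {os.path.basename(path)}")
    conn.commit()
    loaded = conn.execute("SELECT COUNT(*) FROM staging").fetchone()[0]
    return {"log": log, "expected": expected_total, "loaded": loaded,
            "passed": loaded == expected_total}


with tempfile.TemporaryDirectory() as unit_dir:
    paths = []
    for name, lines in [("part_0001.dat", ["a", "b", "c"]), ("part_0002.dat", ["d"])]:
        path = os.path.join(unit_dir, name)
        with open(path, "w", encoding="utf-8") as handle:
            handle.write("\n".join(lines) + "\n")
        paths.append(path)
    control_path = os.path.join(unit_dir, "part.ctl")
    with open(control_path, "w", encoding="utf-8") as handle:
        handle.write("4\n")  # total record count only, for this sketch

    found = wait_for_control_file(control_path, poll_seconds=0.01, max_polls=3)
    with open(control_path, encoding="utf-8") as handle:
        expected = int(handle.readline())
    conn = sqlite3.connect(":memory:")
    report = load_unit(conn, paths, expected)
    conn.close()
    never_found = wait_for_control_file(os.path.join(unit_dir, "missing.ctl"), 0.0, 2)
```

The returned dictionary plays the role of both the warehousing log and a very small data quality report; a production check script would compare business-level totals rather than only row counts.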
The present invention is triggered automatically by ETL tasks. Interface data is exported as data files, and a control file is generated from the data file information. The files are transferred over FTP to the target file server. The target host task scans for the files, generates encrypted information from the data file information of the interface group using the same algorithm, and compares it with the information in the control file, which avoids missing files caused by packet loss during network transmission, ensures the integrity of the received files, and thereby ensures the integrity of the data. After the data files pass verification, the loading procedure is called to load them into the warehouse; after loading, the check script is called automatically to check business data quality and generate a data quality report.
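The receiver-side comparison described above can be sketched as recomputing the digest from the files that actually arrived and comparing it with the control file. The recipe used here (`count:size` per file, joined by `|`, files taken in name order, control file holding the total record count and the digest on two lines) is an assumption; the only requirement taken from the text is that both ends apply the same algorithm.

```python
import hashlib
import os
import tempfile
from typing import List, Tuple


def describe(path: str) -> Tuple[int, int]:
    """Return (record count, size in bytes) of one received data file."""
    with open(path, "rb") as handle:
        data = handle.read()
    return data.count(b"\n"), len(data)


def signature(paths: List[str]) -> Tuple[int, str]:
    """Recompute total record count and MD5 digest with the sender's recipe."""
    infos = [describe(p) for p in sorted(paths)]
    text = "|".join(f"{count}:{size}" for count, size in infos)
    total = sum(count for count, _ in infos)
    return total, hashlib.md5(text.encode("utf-8")).hexdigest()


def verify_unit(paths: List[str], control_text: str) -> bool:
    """Compare the recomputed values with the two fields of the control file."""
    fields = control_text.split()
    if len(fields) != 2:
        return False
    total, digest = signature(paths)
    return fields[0] == str(total) and fields[1] == digest


with tempfile.TemporaryDirectory() as unit_dir:
    paths = []
    for name, body in [("part_0001.dat", b"a\nb\n"), ("part_0002.dat", b"c\n")]:
        path = os.path.join(unit_dir, name)
        with open(path, "wb") as handle:
            handle.write(body)
        paths.append(path)
    total, digest = signature(paths)
    control_text = f"{total}\n{digest}\n"  # what the sender would have written
    complete_ok = verify_unit(paths, control_text)
    missing_ok = verify_unit(paths[:1], control_text)  # one volume lost in transit
```

A missing or truncated volume changes the record count or the size string, so the recomputed values no longer match and loading can be abandoned before any data reaches the warehouse.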
In conclusion, the above embodiments merely illustrate the technical solutions of the present invention and do not limit them. Although the present invention has been described in detail with reference to the above embodiments, those of ordinary skill in the art will understand that the technical solutions recorded in the above embodiments may still be modified, or some of their technical features may be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
Claims (2)
1. A data migration method based on data warehouse automation, characterized by comprising the following steps:
A: The data source ETL task generates data files from the data in the interface table, performs MD5 encryption on the information of the data files generated from the interface table, and generates a control file;
B: The group of data files and the control file are uploaded to the FTP server as one interface unit;
C: The target host ETL task invokes a shell script to detect whether the control file of the interface unit exists on the FTP server; if it exists, all data files in the interface unit have been received; MD5 decryption is performed on the information of the received data files and the result is compared with the information in the control file, which avoids missing files caused by packet loss during network transmission, ensures the integrity of the received files, and thereby ensures the integrity of the data;
D: After the data files pass verification, the warehousing operation is carried out: the ETL task loads the data files in sequence and, after loading completes, calls the check script to check business data quality;
wherein the step in which the data source ETL task generates data files from the data in the interface table, performs MD5 encryption on the information of the data files generated from the interface table, and generates a control file includes:
A1: The data source end starts transferring data;
A2: Data files are generated in the interface unit, and the data files are split into volumes according to their size;
A3: A control file is generated in the interface unit, MD5 encryption is performed on the control file, and the control file is marked with an identification code to ensure the integrity of the data files; the identification code includes the record count of the data files and ciphertext information;
A4: The generated data files and control file are uploaded over FTP; serial transfer or parallel transfer may be used during transfer;
A5: During transfer, the interface unit sends a mail notification to the warehousing module;
A6: The transfer is complete.
2. The data migration method based on data warehouse automation according to claim 1, characterized in that the step in which the warehousing operation is carried out after the data files pass verification, the ETL task loads the data files in sequence, and the check script is called after loading completes to check business data quality includes:
D1: The warehousing module starts to take over loading of the data files;
D2: The warehousing module checks whether the control file exists; if it exists, proceed to the next step; if it does not, continue waiting;
D3: The control file is compared in the warehousing module to verify whether the files meet the requirements; if they do not, loading is abandoned; if they do, proceed to the next step;
D4: The warehousing module loads the data files and generates a warehousing log;
D5: The data file check script produces a data quality report;
D6: A mail notification is sent to the interface module;
D7: Data loading is complete.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410832607.6A CN104462562B (en) | 2014-12-29 | 2014-12-29 | Data migration system and method based on data warehouse automation |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410832607.6A CN104462562B (en) | 2014-12-29 | 2014-12-29 | Data migration system and method based on data warehouse automation |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104462562A CN104462562A (en) | 2015-03-25 |
CN104462562B true CN104462562B (en) | 2018-05-18 |
Family
ID=52908597
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410832607.6A Active CN104462562B (en) | 2014-12-29 | 2014-12-29 | Data migration system and method based on data warehouse automation |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104462562B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105068805B (en) * | 2015-08-07 | 2018-09-11 | 北京思特奇信息技术股份有限公司 | The method and system of data auditing during a kind of data migration |
CN107203564B (en) * | 2016-03-18 | 2020-11-24 | 北京京东尚科信息技术有限公司 | Data transmission method, device and system |
CN107818106B (en) * | 2016-09-13 | 2021-11-16 | 腾讯科技(深圳)有限公司 | Big data offline calculation data quality verification method and device |
CN107045610B (en) * | 2017-05-08 | 2020-06-12 | Oppo广东移动通信有限公司 | Data migration method, terminal device and computer readable storage medium |
CN108874825B (en) * | 2017-05-12 | 2021-11-02 | 北京京东尚科信息技术有限公司 | Abnormal data verification method and device |
CN109871410A (en) * | 2019-02-14 | 2019-06-11 | 深圳市盟天科技有限公司 | A kind of method, apparatus of data assembling storage, server, storage medium |
CN111831755B (en) * | 2020-07-23 | 2024-01-16 | 北京思特奇信息技术股份有限公司 | Cross-database data synchronization method, system, medium and device |
CN114049214A (en) * | 2021-11-15 | 2022-02-15 | 深圳前海鸿泰源兴科技发展有限公司 | Big data information acquisition and processing system and operation method |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101621529A (en) * | 2008-06-30 | 2010-01-06 | 上海全成通信技术有限公司 | High-efficient and low-cost loading method for heterogeneous mass data |
CN102375891A (en) * | 2011-11-15 | 2012-03-14 | 山东浪潮金融信息系统有限公司 | Implementation tool for unloading and loading incremental data |
CN102999537A (en) * | 2011-09-19 | 2013-03-27 | 阿里巴巴集团控股有限公司 | System and method for data migration |
CN103455526A (en) * | 2012-06-05 | 2013-12-18 | 杭州勒卡斯广告策划有限公司 | ETL (extract-transform-load) data processing method, device and system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9519695B2 (en) * | 2013-04-16 | 2016-12-13 | Cognizant Technology Solutions India Pvt. Ltd. | System and method for automating data warehousing processes |
Non-Patent Citations (1)
Title |
---|
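Research on Extensible ETL Technology and Tool Design; Huang Jueming; China Master's Theses Full-text Database, Information Science and Technology; 2011-03-15; Vol. 2011 (No. 03); main text pp. 18, 36-39, 53, Fig. 4-1 * |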
Also Published As
Publication number | Publication date |
---|---|
CN104462562A (en) | 2015-03-25 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | Effective date of registration: 20200604. Address after: 250100, No. 2877, fairway, Sun Town, Ji'nan hi tech Zone, Shandong. Patentee after: Inspur Software Technology Co.,Ltd. Address before: 250100 Ji'nan science and Technology Development Zone, Shandong Branch Road No. 2877. Patentee before: INSPUR GROUP Co.,Ltd. |