CN104407946A - Method for conveniently backing up mail to HDFS - Google Patents

Method for conveniently backing up mail to HDFS

Info

Publication number
CN104407946A
Authority
CN
China
Prior art keywords
hdfs
content
vfs
timer
thread
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410845600.8A
Other languages
Chinese (zh)
Inventor
李占强
辛国茂
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Electronic Information Industry Co Ltd
Original Assignee
Inspur Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Electronic Information Industry Co Ltd
Priority to CN201410845600.8A
Publication of CN104407946A

Landscapes

  • Information Transfer Between Computers (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for conveniently backing up mail to HDFS, belonging to the field of big data management. The method comprises the following steps: (1) a timer is set in the user space of the local system and periodically checks the user's mailbox; accessing the local mount point of the mailbox hands the request path to the VFS (Virtual File System) module, the VFS module hands the request path to the FUSE module, and the FUSE module reads the mail and returns the content read to the VFS module; (2) the VFS module returns the content to the read thread, which returns it to the timer; the timer then reads the local mount point directory of HDFS, the HDFS read thread accesses the VFS, and the VFS accesses FUSE to read the HDFS content; and (3) the VFS returns the content read to the HDFS read thread, which returns it to the timer; the timer compares the mail content with the HDFS content to determine which mail needs to be synchronized, and the HDFS write thread stores that mail in HDFS. The method simplifies the previously complicated operation of backing up mail and improves working efficiency.

Description

A method for conveniently backing up mail to HDFS
Technical field
The present invention discloses a method for conveniently backing up mail to HDFS and belongs to the field of big data management.
Background
Hadoop stores data in HDFS (Hadoop Distributed File System). The NameNode stores the file system metadata, and the DataNodes store the data itself. To avoid a single point of failure (SPOF), the NameNode, as the metadata node, supports high availability (HA), which protects the metadata. Likewise, to protect the data and avoid data loss, HDFS provides a file replication mechanism: by default, each file is kept as three copies. The HA and replication mechanisms of HDFS can therefore be used to preserve important files.
After Google increased the storage space of Gmail, the major mail service providers followed suit. Mailbox capacity is now measured in tens or even hundreds of GB, which means that a mailbox can hold more mail and that a single message can be kept longer. Communication between people, and especially internal communication within large companies and enterprises, is now based mainly on mail. How to preserve mail quickly, and how to avoid losing user mail through factors such as the service provider's server being hacked or insiders at the service provider or the company, has become a problem that urgently needs to be solved.
To address these problems, the present invention proposes a method for conveniently backing up mail to HDFS. The method uses FUSE to mount the mailbox and HDFS each as a directory in the local file system, and uses a timer to periodically check the mailbox for updates. If the mail has changed, a file copy thread is started to copy the updated mail under the mail mount point to the corresponding directory under the HDFS mount point. The whole process is like operating on local files, which greatly simplifies the previously complicated operation of backing up mail and improves working efficiency. More importantly, the method backs mail up safely to HDFS and avoids the losses caused by losing important mail.
Summary of the invention
The present invention addresses the problem of how to preserve mail quickly without losing user mail through factors such as the service provider's server being hacked or insiders at the service provider or the company. It provides a method for conveniently backing up mail to HDFS that simplifies the previously complicated operation of backing up mail and improves working efficiency. More importantly, the method backs mail up safely to HDFS and avoids the losses caused by losing important mail.
The method is implemented mainly on FUSE. FUSE provides an interface for implementing file systems in user space. Linux distributions integrate the fuse module in the kernel, and the module can be enabled with the command modprobe fuse. Mounting the mailbox space on the local Linux file system requires implementing a FUSE-based file system; Python's IMAP support is adequate for this task, and this file system is referred to throughout as mail_fuse_fs. In addition, Hadoop distributions already use FUSE to mount HDFS into the local file system; the user only needs to compile the corresponding module to use it.
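As an illustration of what mail_fuse_fs has to provide, the following minimal Python sketch models the two callbacks a FUSE file system serves most often, directory listing and file read, over an in-memory mailbox. The class name `MailboxView`, the `<uid>.eml` naming scheme, and the in-memory dictionary are assumptions made for illustration; the patent does not specify them. A real implementation would fetch messages over IMAP (for example with Python's `imaplib`) and register equivalent callbacks with a FUSE binding, which this sketch omits.

```python
from email.message import EmailMessage


class MailboxView:
    """In-memory stand-in for the data a mail file system would expose."""

    def __init__(self):
        # Maps a hypothetical file name ("<uid>.eml") to raw message bytes.
        self._files = {}

    def add_message(self, uid, sender, subject, body):
        """Store one message as the bytes of an RFC 5322 style file."""
        msg = EmailMessage()
        msg["From"] = sender
        msg["Subject"] = subject
        msg.set_content(body)
        self._files[f"{uid}.eml"] = msg.as_bytes()

    def readdir(self):
        """Names a directory-listing callback would return for the mount root."""
        return sorted(self._files)

    def read(self, name, size, offset):
        """Bytes a read callback would return for one file."""
        data = self._files[name]
        return data[offset:offset + size]


view = MailboxView()
view.add_message(1, "alice@example.com", "Hello", "First message")
view.add_message(2, "bob@example.com", "Report", "Second message")
print(view.readdir())
```

With such callbacks in place, every message appears as an ordinary file under the mount point, which is what lets the rest of the method treat mail like local files.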
With the above techniques, both the mailbox and HDFS are mounted on the local file system, and the two mount directories can be synchronized in the same way as local files. The method uses a timer to synchronize mail with HDFS periodically. The method is not limited to mail backup; blogs and network disks can be backed up in the same way.
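The periodic check can be sketched with Python's standard library. The helper name `run_periodically`, the interval, and the bounded number of rounds are assumptions made for illustration; a production timer would typically run until it is told to stop.

```python
import threading


def run_periodically(task, interval_seconds, rounds, stop_event=None):
    """Call task() up to `rounds` times, waiting interval_seconds between calls.

    Returns the number of times task() was actually called.
    """
    stop_event = stop_event or threading.Event()
    calls = 0
    for _ in range(rounds):
        if stop_event.is_set():
            break
        task()
        calls += 1
        # wait() returns early if another thread sets the stop event.
        stop_event.wait(interval_seconds)
    return calls


checks = []
done = run_periodically(lambda: checks.append("checked mailbox"), 0.01, 3)
print(done, len(checks))
```

In the method described here, `task` would be the function that reads the mail mount point, reads the HDFS mount point, and starts the synchronization.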
The specific scheme proposed by the present invention is as follows.
A method for conveniently backing up mail to HDFS, comprising the following steps:
(1) A timer is set in the user space of the local system and periodically checks the user's mailbox. Accessing the local mount point of the mailbox hands the request path to the VFS module; the VFS hands the request path to the FUSE module; the mail_fuse_fs file system reads the mail and returns the content read to the FUSE module; and the FUSE module returns the mail content to the VFS module.
(2) The VFS module returns the content to the read thread, and the read thread returns the content read to the timer. The timer then reads the local mount point directory of HDFS: the HDFS read thread accesses the VFS, the VFS accesses FUSE, and FUSE calls the HDFS FUSE module to read the HDFS content.
(3) The VFS returns the HDFS content read to the HDFS read thread, and the HDFS read thread returns the HDFS content to the timer. The timer compares the mail content with the HDFS content and determines which mail needs to be synchronized. The timer then calls the HDFS write thread, which completes the synchronization of the mail with HDFS.
In step (3), the HDFS write thread archives multiple messages together before synchronizing them with HDFS.
The mail is Gmail mail.
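The comparison that decides which mail needs to be synchronized can be sketched as a comparison of two directories, since both the mailbox and HDFS appear as local mount points. The function name `find_unsynced`, the flat directory layout, and the byte-for-byte comparison are assumptions made for illustration; the patent does not state how the contents are compared. Temporary directories stand in for the two mount points.

```python
import filecmp
import os
import tempfile


def find_unsynced(mail_dir, hdfs_dir):
    """Return names under mail_dir that are missing or different in hdfs_dir."""
    pending = []
    for name in sorted(os.listdir(mail_dir)):
        source = os.path.join(mail_dir, name)
        target = os.path.join(hdfs_dir, name)
        if not os.path.isfile(source):
            continue  # this sketch only handles a flat directory of files
        if not os.path.exists(target) or not filecmp.cmp(source, target, shallow=False):
            pending.append(name)
    return pending


# Temporary directories stand in for the mail and HDFS mount points.
with tempfile.TemporaryDirectory() as mail_dir, tempfile.TemporaryDirectory() as hdfs_dir:
    for name, text in [("1.eml", "old mail"), ("2.eml", "new mail")]:
        with open(os.path.join(mail_dir, name), "w") as handle:
            handle.write(text)
    with open(os.path.join(hdfs_dir, "1.eml"), "w") as handle:
        handle.write("old mail")
    pending = find_unsynced(mail_dir, hdfs_dir)
    print(pending)
```

A byte-for-byte comparison reads every file on both sides; comparing sizes or modification times first would be cheaper, at the cost of relying on metadata that the two file systems may report differently.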
The method for conveniently backing up mail to HDFS may also be applied to backing up blogs to HDFS, with the following steps:
(1) A timer is set in the user space of the local system and periodically checks the user's blog. Accessing the local mount point of the blog hands the request path to the VFS module; the VFS hands the request path to the FUSE module; the blog_fuse_fs file system reads the blog and returns the content read to the FUSE module; and the FUSE module returns the blog content to the VFS module.
(2) The VFS module returns the content to the read thread, and the read thread returns the content read to the timer. The timer then reads the local mount point directory of HDFS: the HDFS read thread accesses the VFS, the VFS accesses FUSE, and FUSE calls the HDFS FUSE module to read the HDFS content.
(3) The VFS returns the HDFS content read to the HDFS read thread, and the HDFS read thread returns the HDFS content to the timer. The timer compares the blog content with the HDFS content and determines which blog entries need to be synchronized. The timer then calls the HDFS write thread, which completes the synchronization of the blog with HDFS.
The method for conveniently backing up mail to HDFS may also be applied to backing up network disk data to HDFS, with the following steps:
(1) A timer is set in the user space of the local system and periodically checks the user's mailbox. Accessing the local mount point of the network disk backup hands the request path to the VFS module; the VFS hands the request path to the FUSE module; the network disk_fuse_fs file system reads the network disk backup and returns the content read to the FUSE module; and the FUSE module returns the network disk backup content to the VFS module.
(2) The VFS module returns the content to the read thread, and the read thread returns the content read to the timer. The timer then reads the local mount point directory of HDFS: the HDFS read thread accesses the VFS, the VFS accesses FUSE, and FUSE calls the HDFS FUSE module to read the HDFS content.
(3) The VFS returns the HDFS content read to the HDFS read thread, and the HDFS read thread returns the HDFS content to the timer. The timer compares the network disk backup content with the HDFS content and determines which network disk backups need to be synchronized. The timer then calls the HDFS write thread, which completes the synchronization of the network disk backup with HDFS.
The beneficial effects of the present invention are as follows. The invention uses FUSE to mount the mailbox and HDFS each as a directory in the local file system, and uses a timer to periodically check the mailbox for updates. If the mail has changed, a file copy thread is started to copy the updated mail under the mail mount point to the corresponding directory under the HDFS mount point. The whole process is like operating on local files, which greatly simplifies the previously complicated operation of backing up mail and improves working efficiency. More importantly, the method backs mail up safely to HDFS and avoids the losses caused by losing important mail.
Description of the drawings
Fig. 1 is a schematic diagram of an embodiment of the present invention.
Detailed description
The method is implemented mainly on FUSE. FUSE provides an interface for implementing file systems in user space. Linux distributions integrate the fuse module in the kernel, and the module can be enabled with the command modprobe fuse. Mounting the mailbox space on the local Linux file system requires implementing a FUSE-based file system; Python's IMAP support is adequate for this task, and this file system is referred to throughout as mail_fuse_fs. In addition, Hadoop distributions already use FUSE to mount HDFS into the local file system; the user only needs to compile the corresponding module to use it.
With the above techniques, both the mailbox and HDFS are mounted on the local file system, and the two mount directories can be synchronized in the same way as local files. The method uses a timer to synchronize mail with HDFS periodically. The method is not limited to mail backup; blogs and network disks can be backed up in the same way. The invention is further described below, taking Gmail as an example. As shown in the figure, the flow of the method is as follows:
1. The timer periodically checks the Gmail mailbox;
2. The local mount point of Gmail is accessed, which ultimately hands the request path to the VFS module;
3. The VFS hands the request path to the FUSE module;
4. The Gmail_fuse_fs file system reads the Gmail mail;
5. The content read is returned to the FUSE module;
6. The FUSE module returns the mail content to the VFS module;
7. The VFS module returns the content to the read thread;
8. The read thread returns the content read to the timer;
9. The timer reads the local mount point directory of HDFS;
10. The HDFS read thread accesses the VFS;
11. The VFS accesses FUSE;
12. FUSE calls the HDFS FUSE module to read the HDFS content;
13. The VFS returns the HDFS content read to the HDFS read thread;
14. The HDFS read thread returns the HDFS content to the timer, and the timer compares the Gmail content with the HDFS content to determine which mail needs to be synchronized;
15. The timer calls the HDFS write thread;
16. The HDFS write thread completes the synchronization of Gmail with HDFS.
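The last two steps, in which the timer hands work to a write thread that copies mail to the HDFS mount point, can be sketched with a queue and a worker thread. The function names `hdfs_write_worker` and `sync_files`, the use of a `None` sentinel, and the temporary directories standing in for the two mount points are assumptions made for illustration.

```python
import os
import queue
import shutil
import tempfile
import threading


def hdfs_write_worker(jobs, mail_dir, hdfs_dir):
    """Copy each queued file name from mail_dir to hdfs_dir until None arrives."""
    while True:
        name = jobs.get()
        if name is None:  # sentinel: no more work
            break
        shutil.copyfile(os.path.join(mail_dir, name), os.path.join(hdfs_dir, name))


def sync_files(names, mail_dir, hdfs_dir):
    """Hand the names to a write thread and wait for it to finish."""
    jobs = queue.Queue()
    worker = threading.Thread(target=hdfs_write_worker, args=(jobs, mail_dir, hdfs_dir))
    worker.start()
    for name in names:
        jobs.put(name)
    jobs.put(None)
    worker.join()


with tempfile.TemporaryDirectory() as mail_dir, tempfile.TemporaryDirectory() as hdfs_dir:
    with open(os.path.join(mail_dir, "2.eml"), "w") as handle:
        handle.write("new mail")
    sync_files(["2.eml"], mail_dir, hdfs_dir)
    copied = sorted(os.listdir(hdfs_dir))
    print(copied)
```

Because both sides are ordinary directories once mounted, the write thread needs nothing beyond a plain file copy.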
Because individual Gmail messages may be small, step 16 may archive multiple Gmail messages together before storing them on HDFS.
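Archiving several small messages into one file before writing can be sketched with Python's `tarfile` module. The archive format (gzip-compressed tar), the function name `archive_mails`, and the file name `mail-batch.tar.gz` are assumptions made for illustration; the patent does not name an archive format.

```python
import os
import tarfile
import tempfile


def archive_mails(mail_dir, names, archive_path):
    """Bundle the named files from mail_dir into one gzip-compressed tar file."""
    with tarfile.open(archive_path, "w:gz") as archive:
        for name in names:
            archive.add(os.path.join(mail_dir, name), arcname=name)
    return archive_path


# Temporary directories stand in for the mail and HDFS mount points.
with tempfile.TemporaryDirectory() as mail_dir, tempfile.TemporaryDirectory() as hdfs_dir:
    for name in ("1.eml", "2.eml", "3.eml"):
        with open(os.path.join(mail_dir, name), "w") as handle:
            handle.write("mail " + name)
    path = archive_mails(mail_dir, ["1.eml", "2.eml", "3.eml"],
                         os.path.join(hdfs_dir, "mail-batch.tar.gz"))
    with tarfile.open(path) as archive:
        members = archive.getnames()
    print(members)
```

Batching matters on HDFS because every file, however small, adds an entry to the NameNode's metadata, so one archive of many messages is cheaper to track than many tiny files.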

Claims (5)

1. A method for conveniently backing up mail to HDFS, characterized in that the specific steps are:
(1) A timer is set in the user space of the local system and periodically checks the user's mailbox. Accessing the local mount point of the mailbox hands the request path to the VFS module; the VFS hands the request path to the FUSE module; the mail_fuse_fs file system reads the mail and returns the content read to the FUSE module; and the FUSE module returns the mail content to the VFS module.
(2) The VFS module returns the content to the read thread, and the read thread returns the content read to the timer. The timer then reads the local mount point directory of HDFS: the HDFS read thread accesses the VFS, the VFS accesses FUSE, and FUSE calls the HDFS FUSE module to read the HDFS content.
(3) The VFS returns the HDFS content read to the HDFS read thread, and the HDFS read thread returns the HDFS content to the timer. The timer compares the mail content with the HDFS content and determines which mail needs to be synchronized. The timer then calls the HDFS write thread, which completes the synchronization of the mail with HDFS.
2. The method for conveniently backing up mail to HDFS according to claim 1, characterized in that in step (3), the HDFS write thread archives multiple messages together before synchronizing them with HDFS.
3. The method for conveniently backing up mail to HDFS according to claim 1 or 2, characterized in that the mail is Gmail mail.
4. The method for conveniently backing up mail to HDFS according to claim 1, applied to backing up blogs to HDFS, characterized in that the specific steps are:
(1) A timer is set in the user space of the local system and periodically checks the user's blog. Accessing the local mount point of the blog hands the request path to the VFS module; the VFS hands the request path to the FUSE module; the blog_fuse_fs file system reads the blog and returns the content read to the FUSE module; and the FUSE module returns the blog content to the VFS module.
(2) The VFS module returns the content to the read thread, and the read thread returns the content read to the timer. The timer then reads the local mount point directory of HDFS: the HDFS read thread accesses the VFS, the VFS accesses FUSE, and FUSE calls the HDFS FUSE module to read the HDFS content.
(3) The VFS returns the HDFS content read to the HDFS read thread, and the HDFS read thread returns the HDFS content to the timer. The timer compares the blog content with the HDFS content and determines which blog entries need to be synchronized. The timer then calls the HDFS write thread, which completes the synchronization of the blog with HDFS.
5. The method for conveniently backing up mail to HDFS according to claim 1, applied to backing up network disk data to HDFS, characterized in that the specific steps are:
(1) A timer is set in the user space of the local system and periodically checks the user's mailbox. Accessing the local mount point of the network disk backup hands the request path to the VFS module; the VFS hands the request path to the FUSE module; the network disk_fuse_fs file system reads the network disk backup and returns the content read to the FUSE module; and the FUSE module returns the network disk backup content to the VFS module.
(2) The VFS module returns the content to the read thread, and the read thread returns the content read to the timer. The timer then reads the local mount point directory of HDFS: the HDFS read thread accesses the VFS, the VFS accesses FUSE, and FUSE calls the HDFS FUSE module to read the HDFS content.
(3) The VFS returns the HDFS content read to the HDFS read thread, and the HDFS read thread returns the HDFS content to the timer. The timer compares the network disk backup content with the HDFS content and determines which network disk backups need to be synchronized. The timer then calls the HDFS write thread, which completes the synchronization of the network disk backup with HDFS.
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410845600.8A CN104407946A (en) 2014-12-31 2014-12-31 Method for conveniently backing up mail to HDFS

Publications (1)

Publication Number Publication Date
CN104407946A true CN104407946A (en) 2015-03-11

Family

ID=52645579

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410845600.8A Pending CN104407946A (en) 2014-12-31 2014-12-31 Method for conveniently backing up mail to HDFS

Country Status (1)

Country Link
CN (1) CN104407946A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120158803A1 (en) * 2010-12-20 2012-06-21 International Business Machines Corporation Partition file system for virtual machine memory management
CN102571959A (en) * 2012-01-11 2012-07-11 北京奇虎科技有限公司 System and method for downloading data
CN103970794A (en) * 2013-02-01 2014-08-06 联想(北京)有限公司 Method and device for accessing data

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
XIANGYONG OUYANG et al.: "CRFS: A Lightweight User-Level Filesystem for Generic Checkpoint/Restart", 《2011 INTERNATIONAL CONFERENCE ON PARALLEL PROCESSING》 *
许维龙: "基于HDFS的数据备份系统的分析与设计", 《中国优秀硕士学位论文全文数据库信息科技辑》 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106302609A (en) * 2015-06-08 2017-01-04 阿里巴巴集团控股有限公司 A kind of access method and device
CN106302609B (en) * 2015-06-08 2020-02-28 阿里巴巴集团控股有限公司 Access method and device
US11221997B2 (en) 2015-06-08 2022-01-11 Advanced New Technologies Co., Ltd. On-demand creation and access of a virtual file system
CN105117176A (en) * 2015-09-09 2015-12-02 浪潮(北京)电子信息产业有限公司 Method and system for data reading-writing
CN105468476A (en) * 2015-11-18 2016-04-06 盛趣信息技术(上海)有限公司 Hadoop distributed file system (HDFS) based data disaster backup system
CN105468476B (en) * 2015-11-18 2019-03-08 盛趣信息技术(上海)有限公司 Data disaster recovery and backup systems based on HDFS

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150311
