CN101183323A - Data stand-by system based on finger print - Google Patents

Publication number
CN101183323A
Authority
CN
Grant status
Application
Patent type
Prior art keywords
job
backup
server
block
data
Prior art date
Application number
CN 200710168715
Other languages
Chinese (zh)
Other versions
CN100547555C (en)
Inventor
丹 冯
高 刘
刘景宁
可 周
航 张
杨天明
牛中盈
Original Assignee
华中科技大学
Priority date
Filing date
Publication date

Abstract

A fingerprint-based data backup system, belonging to the technical field of computer storage backup, which aims to reduce the management, storage, and network overhead of data backup and to improve backup performance. The invention comprises a backup server, backup agents, a storage server, and a Web server, which communicate with one another over a network to perform data backup and recovery. The invention uses anchor-based file chunking to identify redundant data in backup files; the technique is stable under modification and has low computational overhead. Data chunks are stored on the storage server's disk array indexed by their fingerprints, which eliminates the backup of redundant data and saves disk space. Once stored, a data chunk is never erased, so chunks can be appended to disk contiguously, which eliminates disk fragmentation. An effective backup buffering strategy reduces the network overhead of backup, increases backup speed, and reduces the impact of backup on application servers.

Description

A fingerprint-based data backup system

Technical Field

The present invention belongs to the field of computer storage backup, and in particular relates to a data backup system.

Background Art

In today's information age of exploding knowledge, data is a precious resource for enterprises and individuals alike. At best, data loss disrupts business continuity and costs an enterprise its competitive advantage for a time; at worst, it can drive the enterprise into bankruptcy. Data loss has many causes, including hardware and software failures, human error or sabotage, and force majeure (natural disasters, war). To protect data against such events, the traditional method is to periodically copy the data to removable media such as tape or optical discs and then ship the media offline to a relatively safe location so that the data can be restored when necessary. This traditional method has some obvious drawbacks: (1) Removable storage media such as tapes and optical discs wear out or become damaged over time, which reduces their storage reliability and makes them unsuitable for long-term data storage. (2) Tape, the medium commonly used for backing up large volumes of data, is often slow to read and write; because it is a sequential storage device, restoring data usually involves frequent mechanical rewinding, and if the backup data is spread over several tapes, time-consuming loading and unloading is also needed. This makes tape backup and recovery quite time-consuming. (3) Dedicated staff must be employed to transport the backup data to a remote site and to keep the data secure during transport and storage.

It can be seen that traditional data backup requires manual intervention for many tasks and is costly and tedious. To improve the efficiency of backup and recovery and to overcome the drawbacks of traditional data protection, well-known IT companies and research institutions around the world have developed a wide variety of data backup systems over the past two decades, including IBM's TotalStorage; HP's OpenView Storage Mirroring software, CASA, XPCA, and EVACA; EMC's SRDF and MirrorView; and VERITAS's NetBackup.

These commercial systems lack data deduplication. To store the large amount of redundant data generated by backups, they usually rely on disk-to-tape (D2T) technology: high-speed disks serve as a backup buffer to improve online backup efficiency, and the backup data in the disk buffer is then migrated in the background to slow, high-capacity media such as tape libraries or optical disc libraries. Their back-end storage devices therefore still require considerable manpower and resources for routine maintenance. Because disk storage is easier to manage and faster to access than tape, backup systems that store data on disk have attracted growing attention as disk technology has developed. Current disk technology makes it easy to build a disk storage system at the terabyte or even petabyte scale, and the falling price per bit makes permanent archiving on disk realistic. For a disk-based backup system, storing backup data permanently on disk without erasing it has many advantages. First, data can be written to disk contiguously, and no fragmentation arises from space reclamation. Second, the user's data history is preserved in full, so the user can easily browse any historical version of a file. Third, it helps protect the user's backup data, because important data cannot be deleted by user error.

However, for a disk-based backup system with permanent storage, the biggest challenge is the ever-growing volume of user backup data. Enterprise data is usually highly redundant: many duplicate data items and files are stored in the system, and multiple edited versions of a file also share a great deal of content. The file-based backup techniques in wide use today cannot identify redundant data across files, so more and more duplicate data is backed up into the system. This not only lowers the disk space utilization of the backup system but also needlessly transfers large amounts of redundant data over the network, increasing the network overhead of backup and lengthening backup time.

It is therefore worthwhile to develop a disk-based backup system with permanent storage that uses a new backup technique to eliminate redundant backup data and improve the system's storage efficiency.

Summary of the Invention

The present invention proposes a fingerprint-based data backup system. The system stores backup data permanently on disk and uses a fingerprint-based backup technique to remove redundant data from backups, with the aim of reducing the management, storage, and network overhead of data backup and improving backup performance.

The fingerprint-based data backup system of the present invention comprises a backup server, backup agents, a storage server, and a Web server, which communicate with one another over a network to perform data backup and recovery, and is characterized in that:

The backup server holds a configuration file and a catalog database. The configuration file records user-defined job objects; a job object contains attributes that specify how the system runs the job, and the backup server controls the entire backup and recovery process through job objects. The catalog database stores job records, and a job record holds the management information for a run of a job object;

The backup agent unit is installed on every host in the network whose data needs to be backed up. During backup, the backup agent unit reads the files to be backed up from the host's file system, divides each file into anchor-based chunks, computes the fingerprint of each chunk, and sends the fingerprints and the chunk data that is actually needed over the network to the storage server. During recovery, the backup agent unit receives file data from the storage server over the network and writes it to the specified directory in the host's file system;

The storage server is equipped with a large-capacity disk array, which is the destination of backed-up data. During backup, the storage server receives fingerprints or data chunks from the corresponding backup agent unit over the network, stores the data chunks on disk, and builds the file's index. During recovery, it reconstructs the file from the large-capacity disk array according to the file index and sends the file data over the network to the corresponding backup agent;

The Web server provides the system's browser/server (B/S) mode web management interface. By logging in to the Web server, the user can direct the system to perform interactive backup or recovery jobs, monitor the running of automatically scheduled jobs, modify the backup server's configuration file, define job objects, and manage devices.

In the fingerprint-based data backup system, the backup server comprises a backup server initialization module, a command listening module, a command processing module, a job processing module, and a network communication module;

The backup server initialization module performs initialization, including reading the configuration file, building the in-memory resource list, checking the state of the catalog database, ensuring the consistency and integrity of the data in the configuration file and the catalog database, opening the command listening port to accept user commands from the Web server, initializing the job queue and the user command queue, loading job objects into the job queue, and starting the job and network monitoring services;

The command listening module is a network listening thread created by the system. It authenticates connection requests from the Web server, so that only a Web server authorized by the system can connect, and listens for command requests sent by an authenticated Web server. When a command request arrives, it is added to the user command queue to await processing;

The command processing module comprises a user command queue and N command worker threads. When the user command queue overflows, the command listening module goes to sleep. The command worker threads continually read commands from the user command queue and execute them, performing different functions depending on the command. When the command listening module adds a command to the user command queue, a new command worker thread is created if no command worker thread is currently idle and the number of active command worker threads is below N. Each time a command worker thread reads a command from the user command queue, it checks the state of the command listening module and wakes it if it is asleep;
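The queueing behaviour described for the command processing module can be illustrated with a minimal Python sketch. It is not the patented implementation: the class name `CommandPool`, its fields, and the use of a blocking `queue.Queue` to model the listener going to sleep on overflow are all assumptions made for illustration.

```python
import queue
import threading


class CommandPool:
    """Bounded command queue served by at most max_workers lazily created threads."""

    def __init__(self, max_workers, queue_size):
        self.commands = queue.Queue(maxsize=queue_size)  # the user command queue
        self.max_workers = max_workers                   # N in the text
        self.active = 0    # worker threads created so far
        self.idle = 0      # workers currently waiting for a command
        self.lock = threading.Lock()
        self.results = []

    def submit(self, command):
        """Listener side: enqueue a callable, creating a worker if none is idle."""
        with self.lock:
            if self.idle == 0 and self.active < self.max_workers:
                self.active += 1
                threading.Thread(target=self._work, daemon=True).start()
        # put() blocks while the queue is full (the listener "sleeps");
        # a worker's get() frees a slot and lets it continue (the "wake-up").
        self.commands.put(command)

    def _work(self):
        """Worker side: keep taking commands from the queue and executing them."""
        while True:
            with self.lock:
                self.idle += 1
            command = self.commands.get()
            with self.lock:
                self.idle -= 1
            result = command()
            with self.lock:
                self.results.append(result)
            self.commands.task_done()
```

The same pattern would apply to the job queues described for the backup server, backup agent, and storage server, with the thread limits L, M, and W in place of N.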

The job processing module comprises a job queue, L job worker threads, and a job queue loading thread. When the job queue overflows, the job queue loading thread goes to sleep. The job worker threads continually take job objects from the job queue and execute them, invoking different resources and performing different functions depending on the attributes of the job object. The job queue loading thread performs job scheduling: it checks the scheduling policy attribute of each job object in the job resource list and adds the job objects that are due to run to the job queue; a new job worker thread is created if no job worker thread is currently idle and the number of active job worker threads is below L. Each time a job worker thread reads a job object from the job queue, it checks the state of the job queue loading thread and wakes it if it is asleep;

The network communication module wraps the standard network communication application programming interfaces and provides the command worker threads and job worker threads with a network communication interface, which implements the data transfer protocol among the backup server, the backup agents, and the storage server.

In the fingerprint-based data backup system, the backup agent comprises a backup agent initialization module, a request listening module, a job processing module, a file chunking module, and a network communication module;

The backup agent initialization module performs initialization, including reading the backup agent configuration file, building the in-memory resource list, initializing the job queue, and starting the request listening module for backup server requests;

The request listening module listens for connection requests from the backup server on the network and authenticates the connecting backup server. After authentication succeeds, it creates a network connection socket for communicating with that backup server and adds the socket to the job queue;

The job processing module comprises a job queue and M job worker threads. When the job queue overflows, the request listening module goes to sleep. After a job worker thread takes a network connection socket from the job queue, it first creates a job control record for the job and links the socket into a member variable of the job control record. It then interacts with the backup server through this socket, converting the relevant attributes of the backup server's job object and assigning them to the corresponding member variables of the job control record. Next, it uses the job ticket obtained from the backup server to connect to the corresponding storage server, creating a network connection socket for communicating with the storage server and linking it into a member variable of the job control record. When the request listening module adds a network connection socket to the job queue, a new job worker thread is created if no job worker thread is currently idle and the number of active job worker threads is below M. Each time a job worker thread takes a network connection socket from the job queue, it checks the state of the request listening module and wakes it if it is asleep;

The file chunking module accepts commands from the job worker threads of the job processing module and performs the file chunking task of a backup job: it opens each file of the file set on the client file system, divides the file into anchor-based chunks, computes the chunk fingerprints, and coordinates with the corresponding storage server to execute the backup algorithm of the first backup process;

The network communication module consists of the jobs' network connection sockets. Each job on the backup agent has two network connection sockets, used to communicate with the corresponding backup server job and the corresponding storage server job, respectively.

In the fingerprint-based data backup system, the storage server comprises a storage server initialization module, a connection monitoring module, a job ticket table, a job processing module, and a network communication module, as well as an index buffer, a chunk buffer, a chunk hash table, and a disk log;

The storage server initialization module performs initialization, including parsing the storage server configuration file, building the in-memory resource list, and starting the related service threads;

The connection monitoring module monitors connection requests from the backup server and the backup agents. A connecting backup server is authenticated, and after authentication succeeds a network connection socket for communicating with that backup server is created and added to the job queue. A connecting backup agent is authenticated by checking the job ticket it presents against the job ticket table, and after authentication succeeds a network connection socket for communicating with that backup agent is created and linked into a member variable of the corresponding job control record;

The job ticket table stores the tickets used to authenticate backup agent jobs;

The job processing module comprises a job queue and W job worker threads. When the job queue overflows, the connection monitoring module enters the "reject backup server connection requests" state. After a job worker thread takes a network connection socket from the job queue, it first creates a job control record for the job and links the socket into a member variable of the job control record. It then interacts with the backup server through this socket, converting the relevant attributes of the backup server's job object and assigning them to the corresponding member variables of the job control record, and randomly generates a job ticket, registers it in the job ticket table, and sends it to the backup server's job object. When the connection monitoring module adds a network connection socket to the job queue, a new job worker thread is created if no job worker thread is currently idle and the number of active job worker threads is below W. Each time a job worker thread takes a network connection socket from the job queue, it checks the state of the connection monitoring module; if the module is in the "reject backup server connection requests" state, the thread clears that state so that the module accepts backup server connection requests again;
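The job ticket handshake described above (the storage server issues a random ticket, the backup server relays it, and the backup agent presents it when connecting) can be sketched in a few lines of Python. The names `TicketTable`, `issue`, and `authenticate` are hypothetical, and the use of `secrets.token_hex` is an assumption; the text only says that the ticket is generated randomly.

```python
import secrets


class TicketTable:
    """Job ticket table kept by the storage server (in-memory sketch)."""

    def __init__(self):
        self._tickets = {}  # ticket -> job identifier

    def issue(self, job_id):
        """Generate a random ticket for a job and register it in the table."""
        ticket = secrets.token_hex(16)
        self._tickets[ticket] = job_id
        return ticket  # in the described system this is sent to the backup server

    def authenticate(self, ticket):
        """Return the job a presented ticket belongs to, or None if it is unknown."""
        return self._tickets.get(ticket)
```

In this sketch a backup agent that presents an unknown ticket simply gets `None` back; how the described system reacts to a failed check is not stated in the text.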

The network communication module consists of the jobs' network connection sockets. Each job on the storage server has two network connection sockets, used to communicate with the corresponding backup server job and the corresponding backup agent job, respectively;

The index buffer is infrastructure for the first and second backup processes executed by storage server jobs. It is implemented as an in-memory hash table and stores all fingerprints contained in Job_x(t_{n-1}), the job instance that precedes the current instance Job_x(t_n) in the job chain, together with the fingerprints newly generated while the current job runs;

The chunk buffer is infrastructure for the first and second backup processes executed by storage server jobs. It is implemented on a separate disk array and temporarily stores the data chunks whose fingerprints were not found in the index buffer during the first backup process;

The chunk hash table is infrastructure for the second backup process executed by storage server jobs. It is implemented on a separate disk array and maps a chunk's fingerprint to the chunk's storage address in the disk log;

The disk log is infrastructure for the second backup process executed by storage server jobs. It is implemented on a separate disk array and stores data chunks and the file indexes, which are themselves stored as chunks.

The advantages of the present invention are:

1. Anchor-based file chunking divides a file into variable-length chunks in order to identify redundant data within a file or across files. The technique is stable under modification: a change to a file affects only the chunks adjacent to the modified region, and the boundaries of the other chunks do not move. In an incremental backup of a file, only the few modified chunks need to be backed up, and the remaining chunks can be shared with the previous backup of the file. Boundaries are computed with a sliding window, so the computational overhead is small.
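A rough Python sketch of anchor-based (content-defined) chunking follows. It is only an illustration under stated assumptions: a simple polynomial rolling hash stands in for whatever window hash the described system uses, SHA-1 stands in for the chunk fingerprint function, and the constants `WINDOW`, `BASE`, `MOD`, and `MASK` are illustrative values that do not come from the text.

```python
import hashlib

WINDOW = 16                       # sliding-window width in bytes (assumed)
BASE = 257                        # polynomial base of the rolling hash (assumed)
MOD = (1 << 31) - 1               # modulus of the rolling hash (assumed)
MASK = (1 << 6) - 1               # anchor when the low 6 hash bits are all zero
OUT = pow(BASE, WINDOW - 1, MOD)  # weight of the byte that leaves the window


def chunk(data: bytes):
    """Split data into variable-length chunks that end at content-defined anchors."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i - start >= WINDOW:
            # Slide the window: drop the contribution of the oldest byte.
            h = (h - data[i - WINDOW] * OUT) % MOD
        h = (h * BASE + byte) % MOD
        # Declare an anchor only once a full window lies inside the chunk.
        if i - start + 1 >= WINDOW and (h & MASK) == 0:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])   # trailing chunk without an anchor
    return chunks


def fingerprint(piece: bytes) -> str:
    """Chunk fingerprint; SHA-1 is an assumption, not taken from the text."""
    return hashlib.sha1(piece).hexdigest()
```

Because an anchor depends only on the bytes inside the window, inserting a few bytes into a file typically changes the chunks around the edit while later boundaries fall at the same content as before, so most fingerprints of the edited file match those of the original.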

2. Data chunks are stored on the storage server's disk array indexed by their fingerprints. This ties a chunk's storage address to its content, departing from the traditional separation of storage address and content; it eliminates the backup of redundant data and saves disk space;
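The idea of addressing a chunk by its fingerprint can be sketched as a small append-only store in Python. This is a simplified in-memory model: the class name `DiskLog`, the `bytearray` standing in for the disk array, the `dict` standing in for the chunk hash table, and the choice of SHA-1 are all assumptions made for illustration.

```python
import hashlib


class DiskLog:
    """Append-only chunk log with a fingerprint -> (offset, length) index."""

    def __init__(self):
        self.log = bytearray()  # stands in for the disk array; never overwritten
        self.index = {}         # stands in for the chunk hash table

    def put(self, piece: bytes) -> str:
        """Store a chunk unless a chunk with the same fingerprint already exists."""
        fp = hashlib.sha1(piece).hexdigest()
        if fp not in self.index:
            self.index[fp] = (len(self.log), len(piece))
            self.log += piece   # appended contiguously, so no fragmentation
        return fp

    def get(self, fp: str) -> bytes:
        """Read a chunk back by its fingerprint."""
        offset, length = self.index[fp]
        return bytes(self.log[offset:offset + length])
```

Storing the same chunk twice leaves the log unchanged and returns the same fingerprint, which is the deduplication effect the paragraph describes.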

3. Once stored, a data chunk is never erased, so chunks can be appended to disk contiguously, which eliminates disk fragmentation; the user's data history is preserved in full, so the user can easily browse any historical version of a file; and important data cannot be deleted by user error.

4. An effective backup buffering strategy reduces the network overhead of backup, increases backup speed, and reduces the impact of backup on application servers.

Brief Description of the Drawings

Figure 1 is a schematic diagram of the structure of the present invention;

Figure 2 is a schematic diagram of the backup server structure;

Figure 3 is a schematic diagram of the backup agent structure;

Figure 4 is a schematic diagram of the storage server structure;

Figure 5 is a schematic diagram of how a file is stored on the disk log;

Figure 6 is a schematic diagram of multiple files sharing data chunks and index blocks on the disk log;

Figure 7 is a structural diagram of the index buffer of the present invention;

Figure 8 is a schematic diagram of file chunking in the anchor-based file chunking technique.

Detailed Description

The present invention is described in further detail below with reference to the drawings and embodiments.

1. Overall system structure

Figure 1 is a schematic diagram of the system architecture of the present invention. The invention comprises a backup server, backup agents, a storage server, and a Web server, which communicate with one another over a network to perform data backup and recovery. Figure 2 is a schematic diagram of the backup server structure. The backup server comprises a backup server initialization module, a command listening module, a command processing module, a job processing module, and a network communication module, and also holds a configuration file and a catalog database.

The backup server is the command center of the whole network backup system and controls the entire backup and recovery process through job objects. The backup server's job objects give the user a way to define customized backup and recovery jobs. A job object contains many attributes that specify how the system runs the job. For example, the backup agent attribute specifies the host from which the job backs up data or to which it restores data; the file set attribute specifies the directories the job backs up or restores; and the scheduling policy attribute specifies the policy by which the system schedules the job. Denote a job object by Job_x. When the job object is scheduled to run at time t, it produces a run instance Job_x(t). The time-ordered sequence of run instances Job_x(t_0), Job_x(t_1), ..., Job_x(t_n) (t_0 < t_1 < ... < t_n) of job object Job_x forms the job chain of that job object, denoted Job_x(t_0, t_1, ..., t_n). The backup server also maintains a catalog database that records the management information of Job_x(t). Specifically, the management information of Job_x(t) is stored in the job's record Job_x(t).Record in the catalog database.

Catalog database: stores the management information of job runs, namely Job_x(t).Record. Job_x(t).Record mainly stores the root blocks of the files included in the job, the job's fingerprint file Job_x(t).FF, and so on. Every completed job Job_x(t) keeps a fingerprint file Job_x(t).FF in the catalog database; Job_x(t).FF stores all fingerprints contained in job Job_x(t). Job_x(t_n).FF is used to initialize the index buffer of job Job_x(t_{n+1}).

Figure 3 is a schematic diagram of the backup agent structure. The backup agent comprises a backup agent initialization module, a request listening module, a job processing module, a file chunking module, and a network communication module.

Figure 4 is a schematic diagram of the storage server structure. The storage server comprises a storage server initialization module, a connection monitoring module, a job ticket table, a job processing module, and a network communication module, as well as an index buffer, a chunk buffer, a chunk hash table, and a disk log.

The storage server manages a large-capacity disk array (RAID) that stores data chunks, each indexed by its fingerprint. Once a data chunk has been written to disk it is never erased, so the whole disk array behaves like a log: data chunks are appended to the disk without gaps, which eliminates disk fragmentation. The disk used to store data chunks is called the disk log. The storage server uses a dedicated disk array to hold the chunk hash table, which maps a chunk's fingerprint to the chunk's storage address in the disk log. All data chunks of a backed-up file are indexed through index blocks, and all index blocks of a file form an index tree. Every file also has exactly one chunk called the root block, which stores the index of the root of the file's index tree together with the file's metadata and some management information. The root block and the index blocks of a file are stored on the disk log as data chunks as well.

The storage server uses a backup buffering strategy to increase backup speed, as follows. (1) An in-memory index buffer stores all fingerprints contained in Job_x(t_{n-1}), the job instance that precedes the current instance Job_x(t_n) in the job chain, together with the fingerprints newly generated while the current job runs. (2) A dedicated disk array serves as the chunk buffer, which temporarily stores the data chunks whose fingerprints were not found in the index buffer during backup. (3) The backup of a job is completed in two stages, called the first backup process and the second backup process. In the first backup process, the backup agent and the storage server interact to back up the file chunks: chunk fingerprints are looked up in the index buffer, and the data chunks whose fingerprints are not found there are stored in the chunk buffer. For the backup agent, the job's backup is finished once the first backup process completes. Because this process answers fingerprint queries from the in-memory index buffer and avoids time-consuming lookups in the chunk hash table, it is fast. The second backup process is run by the storage server when the system is relatively idle. It moves the data chunks held temporarily in the chunk buffer to the disk log, using the chunk hash table for fingerprint queries, and at the same time builds the files' index trees on the disk log. Because the second backup process is carried out in the background by the storage server alone, it has no effect on the application servers running the backup agents. When restoring a file, the storage server reconstructs the file according to the file index and sends the file data over the network to the corresponding backup agent.

Web server: the invention provides a web user interface in B/S (browser/server) mode. From anywhere, a user can log in to the system's management interface through a Web browser to direct the system to perform interactive backup or restore jobs, monitor the running of automatically scheduled jobs, customize jobs, configure the backup server, and perform device management.

2. Storage server disk log

The invention stores backup data chunks on the storage server's disk log, indexed by their fingerprints. This guarantees that no two identical chunks are ever stored on disk at the same time, which eliminates the backup of redundant data. Once stored, a chunk is never erased, so chunks can be appended contiguously to the disk log, which eliminates disk fragmentation. The data blocks of a backup file are indexed by index blocks, and a file's index blocks are also stored on the disk log.

2.1 Chunk header

For ease of management, a header is prepended to every data chunk. The header provides the information needed for system management, including integrity checking and the reconstruction of file indexes and of the chunk hash table. The header is 39 bytes in total and consists of the following fields:

magic: a 6-character header marker;

fingerprint: the fingerprint of this chunk, 20 bytes;

type: the type of this chunk. There are three types of chunk: data block, index block and file root block, denoted dc, ic and rc respectively;

size: the size of this chunk, not including the header. For index blocks, the system requires that the size not exceed 16 KB;

offset: the storage address of this chunk on the disk log.

2.2 File index

Figure 5 shows how a file is stored on the disk log. The data blocks of a file are indexed by index blocks, which are also stored on the disk log, and all index blocks of a file form an index tree. Each file has exactly one root block on the disk log; the root block stores the index entry of the root of the file's index tree, along with the file's metadata and some management information about the file. After a file has been backed up, its root block is also stored, as job management information, in the job record in the catalog database.

In Figure 5, F0 denotes a file, Di a data block and Ii an index block. An index block is made up of index entries. P(X) denotes an index entry, which is a triple <H(X), offset, type>, where X is the indexed chunk, H(X) is the fingerprint of X, offset is the storage address of X on the disk log, and type is the type of X. X may be an index block Ii or a data block Di. The arrows in the figure show which index entry corresponds to which indexed block. M(F0) denotes the metadata of file F0 together with some management information. Index blocks I0, I1 and I2 form the index tree of F0, with I0 as its root. R0 denotes the root block of F0; it consists of M(F0) and an index entry P(I0) that points to I0, the root of the file's index tree.

All data blocks and index blocks on the disk log can be shared by different files. Figure 6 shows different files sharing data blocks and index blocks; the symbols have the same meaning as in Figure 5.

3. Storage server chunk hash table

The storage server's chunk hash table maps a chunk's fingerprint to the chunk's storage address in the disk log. The table consists of buckets of equal size. The number of buckets is determined by the size of the disk log: the larger the disk log, the more buckets the table contains, which lowers the probability of bucket collisions. Based on the number of buckets, the system takes the first n bits of a fingerprint as the bucket number and maps the fingerprint to that bucket. Each fingerprint is stored in its bucket as a triple <fingerprint, offset, type>, where fingerprint is the chunk's fingerprint, offset is the storage address on the disk log of the chunk that this fingerprint identifies, and type is the type of that chunk. If a bucket collision occurs, the fingerprint's triple is stored in an adjacent bucket.
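The bucket scheme described above might be sketched in Python roughly as follows. This is only an illustration under stated assumptions: the class name, the bucket capacity and the in-memory lists are hypothetical (the patent keeps the table on a dedicated disk array), and the sketch keeps probing successive neighbouring buckets, whereas the text only says "an adjacent bucket".

```python
class ChunkHashTable:
    """Toy fixed-bucket table mapping fingerprint -> (offset, type)."""

    def __init__(self, prefix_bits=8, bucket_capacity=4):
        # Assumption: 2**prefix_bits buckets, each holding a few triples.
        self.prefix_bits = prefix_bits
        self.bucket_capacity = bucket_capacity
        self.buckets = [[] for _ in range(1 << prefix_bits)]

    def bucket_no(self, fingerprint: bytes) -> int:
        # The first n bits of the fingerprint select the bucket.
        total_bits = len(fingerprint) * 8
        return int.from_bytes(fingerprint, "big") >> (total_bits - self.prefix_bits)

    def _probe(self, fingerprint):
        # Home bucket first, then each neighbouring bucket in turn.
        start = self.bucket_no(fingerprint)
        for step in range(len(self.buckets)):
            yield self.buckets[(start + step) % len(self.buckets)]

    def insert(self, fingerprint, offset, chunk_type):
        for bucket in self._probe(fingerprint):
            if len(bucket) < self.bucket_capacity:
                bucket.append((fingerprint, offset, chunk_type))
                return
        raise RuntimeError("hash table is full")

    def lookup(self, fingerprint):
        for bucket in self._probe(fingerprint):
            for fp, offset, chunk_type in bucket:
                if fp == fingerprint:
                    return offset, chunk_type
            if len(bucket) < self.bucket_capacity:
                return None  # entries are never deleted, so a non-full bucket ends the search
        return None
```

Because chunks are never erased, entries are never removed from the table, which is what lets a lookup stop at the first bucket that still has free space.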

4. Storage server index buffer

Figure 7 shows the structure of the index buffer. The index buffer is an in-memory hash table made up of a bucket array and many data lists. The bucket array has 1024*1024 buckets, numbered from 00000H to FFFFFH. A bucket may be empty; a non-empty bucket contains a pointer to one data list, whose entries store the information of the fingerprints hashed to that bucket. To hash a fingerprint, its first 20 bits are taken as the bucket number, and the fingerprint is placed in the data list that the bucket points to.

An entry of a data list has the following structure:

tag: a 4-bit identifier that indicates the state of this fingerprint during the first and second backup processes;

fingerprintTail: the last 140 bits of the chunk's fingerprint. Because the first 20 bits are implied by the bucket number, only the last 140 bits need to be stored here;

offset: a 64-bit storage address. If this field is not empty, it is the storage address on the disk log of the data chunk that this fingerprint identifies;

next: a 32-bit pointer to the next entry.

The part of Figure 7 labelled "a fingerprint" shows the fingerprint 7E54F36A4EC62…3B being hashed into the index buffer. Step (1) uses the first 20 bits of the fingerprint, "7E54F", as the bucket number (bucketNo) and locates bucket 7E54FH. Step (2) searches the data list that this bucket points to for an entry whose fingerprintTail is "36A4EC62…3B". If such an entry is found, the fingerprint 7E54F36A4EC62…3B is already stored in the index buffer; if not, a new entry is created to store the fingerprint's information.

The tag of a data list entry takes one of three values, with the following meanings:

0000: the fingerprint comes from the fingerprint file of the previous job and has not been hit during the current backup;

1000: the fingerprint comes from the fingerprint file of the previous job and has been hit during the current backup;

1100: the fingerprint was newly generated during the current backup.
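The index buffer layout above can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the names are hypothetical, a dictionary stands in for the 2^20-entry bucket array, Python lists replace the linked data lists (so there is no `next` field), and the sample fingerprint in the usage note is made up apart from its "7E54F" prefix.

```python
BUCKET_BITS = 20          # first 20 bits of the fingerprint select the bucket
FINGERPRINT_BITS = 160    # 20-byte fingerprint
TAIL_BITS = FINGERPRINT_BITS - BUCKET_BITS  # the 140 bits stored in each entry

TAG_OLD_MISS = 0b0000  # from the previous job's fingerprint file, not hit yet
TAG_OLD_HIT = 0b1000   # from the previous job's fingerprint file, hit in this backup
TAG_NEW = 0b1100       # newly generated in this backup


class IndexBuffer:
    def __init__(self):
        # Sparse stand-in for the bucket array: bucket number -> list of entries.
        self.buckets = {}

    @staticmethod
    def split(fingerprint: bytes):
        """Return (bucket number, fingerprint tail) for a 20-byte fingerprint."""
        value = int.from_bytes(fingerprint, "big")
        return value >> TAIL_BITS, value & ((1 << TAIL_BITS) - 1)

    def find(self, fingerprint):
        bucket_no, tail = self.split(fingerprint)
        for entry in self.buckets.get(bucket_no, []):
            if entry["tail"] == tail:
                return entry
        return None

    def add(self, fingerprint, tag, offset=None):
        bucket_no, tail = self.split(fingerprint)
        entry = {"tag": tag, "tail": tail, "offset": offset}
        self.buckets.setdefault(bucket_no, []).append(entry)
        return entry
```

For example, `IndexBuffer.split(bytes.fromhex("7E54F" + "0" * 33 + "3B"))` yields bucket number `0x7E54F`, and only the remaining 140 bits are kept in the entry.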

After a backup job Jobx(tn-1) completes, all fingerprints contained in the job are saved as pairs <fingerprint, offset> (where fingerprint is a chunk's fingerprint and offset is the chunk's storage address on the disk log) in the file Jobx(tn-1).FF, which is stored in the job record Jobx(tn-1).Record in the catalog database. Jobx(tn-1).FF is used to initialize the index buffer of job Jobx(tn). Because adjacent jobs in the same job chain usually share a large number of files or a large amount of data, initializing the index buffer of Jobx(tn) with Jobx(tn-1).FF raises the buffer's fingerprint hit rate.

5. Backup process

For convenience, the following notation is used:

BS: backup server job worker thread;

BA: backup agent job worker thread;

SS: storage server job worker thread;

F: a file;

H: a fingerprint;

M(F): the metadata of file F;

R(F): the root block of file F;

H(D): the fingerprint of data chunk D;

D(H): the data block or index block that fingerprint H identifies;

F.Index: the memory buffer in which the index tree of file F is built;

index cache: the index buffer;

chunk cache: the chunk buffer;

hash table: the chunk hash table;

Jobx(tn).FileSet: the file set of job object Jobx(tn);

I(F, level): the set of index blocks at level `level` of the index tree F.Index. The leaves of the index tree are defined as level 0, the parents of the leaves as level 1, and so on;

Iw(F, level): the working node in I(F, level) that is currently used to store triples <H, offset, type>;

<H, offset, type>: a triple, where H is a fingerprint, offset is the storage address of chunk D(H) on the disk log, and type is the type of chunk D(H).

5.1 First backup process

The first backup process is carried out mainly by the backup agent job worker thread and the storage server job worker thread working together. Its steps are:

(1) SS: initialize the index cache from Jobx(tn-1).FF;

(2) BA: if (Jobx(tn).FileSet is empty) go to (19), else read a file Fi from Jobx(tn).FileSet;

(3) BA: send M(Fi) to SS;

(4) SS: cache M(Fi) in the chunk cache;

(5) BA: perform anchor-based file chunking on Fi;

(6) BA: compute the fingerprint of every chunk and send the resulting fingerprint set to SS;

(7) SS: if (the fingerprint set is empty) go to (17), else take a fingerprint Hj from the fingerprint set and look it up in the index cache;

(8) SS: if (Hj is found in the index cache) {

(9) SS: if (tag==0000) {tag=1000; cache <Hj, offset> in the chunk cache;}

(10) SS: else if (tag==1000) cache <Hj, offset> in the chunk cache;

(11) SS: else if (tag==1100) cache <Hj, null> in the chunk cache;}

(12) SS: else {cache Hj in the index cache with tag=1100, offset=null;

(13) SS: request BA to send D(Hj);

(14) BA: send D(Hj) to SS;

(15) SS: cache <Hj, D(Hj)> in the chunk cache;}

(16) SS: return to step (7);

(17) SS: notify BA to back up the next file;

(18) BA: return to step (2);

(19) BA: report the end status of job Jobx(tn) to BS and SS, then exit;

(20) SS: on receiving BA's job end signal, end the first backup process and move on to the second backup process;

(21) BS: on receiving BA's job end signal, disconnect from BA and wait for SS to run the second backup process.
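The storage server's side of steps (7) to (16) might look roughly like the following Python sketch. It is not the patent's implementation: the dictionaries, the `fetch_chunk` callback (standing in for the request to the backup agent in steps (13) and (14)) and the tuple labels `"ref"`, `"pending"` and `"data"` (used to tell <Hj, offset>, <Hj, null> and <Hj, D(Hj)> apart) are all hypothetical.

```python
TAG_OLD_MISS, TAG_OLD_HIT, TAG_NEW = "0000", "1000", "1100"


def first_backup_pass(fingerprints, fetch_chunk, index_cache, chunk_cache):
    """Handle one file's fingerprint list on the storage server.

    index_cache: dict fingerprint -> {"tag": ..., "offset": ...}
    chunk_cache: list that receives tuples in file order
    fetch_chunk: callable that returns the chunk data for a fingerprint
    Returns how many chunks had to be transferred from the backup agent.
    """
    transferred = 0
    for h in fingerprints:                          # step (7)
        entry = index_cache.get(h)
        if entry is not None:                       # step (8)
            if entry["tag"] == TAG_OLD_MISS:        # step (9)
                entry["tag"] = TAG_OLD_HIT
                chunk_cache.append(("ref", h, entry["offset"]))
            elif entry["tag"] == TAG_OLD_HIT:       # step (10)
                chunk_cache.append(("ref", h, entry["offset"]))
            else:                                   # step (11): tag is 1100
                chunk_cache.append(("pending", h))
        else:                                       # steps (12) to (15)
            index_cache[h] = {"tag": TAG_NEW, "offset": None}
            chunk_cache.append(("data", h, fetch_chunk(h)))
            transferred += 1
    return transferred
```

Only fingerprints that miss the index cache cause chunk data to cross the network; a fingerprint that repeats within the same job is recorded as pending and resolved later, once its chunk has been given an address on the disk log.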

5.1.1 Anchor-based file chunking

In step (5) of the first backup process, anchor-based file chunking is performed by the backup agent's file chunking module, called by the backup agent job worker thread. Its steps are:

(1) Take the first 48 bytes of the file, $b_1, b_2, \ldots, b_{48}$, as a window and compute the hash of the file's first window as $H_1 = (b_1 p^{47} + b_2 p^{46} + \cdots + b_{48}) \bmod M$, where $p$ is a prime (17 is a suitable value) and $M$ is a constant ($2^{32}$ is a suitable value). The hash is stored in the variable $H_1$.

(2) Slide the window forward by one byte and compute the hash of the file's second window $b_2, b_3, \ldots, b_{49}$ as $H_2 = (p H_1 + b_{49} - b_1 p^{48}) \bmod M$. The hash is stored in the variable $H_2$.

(3) Continue in the same way to compute the hash of every window in the file.

(4) For each window's hash, take the low 13 bits as a binary number. If this number equals a predetermined value (for example 61), the corresponding window is an anchor.

(5) Divide the file into data blocks of varying size, using the anchors as boundaries.

The anchor-based file chunking above observes three conventions: a) if the file is smaller than 48 bytes, the algorithm exits and the whole file is one data block; b) if a stretch of the byte stream contains too many anchors, some anchors are discarded so that no chunk is smaller than 2 KB (the chunk at the end of the file is the only one that may be smaller than 2 KB); c) if a contiguous 64 KB of the byte stream contains no anchor, those 64 KB are taken as one chunk.

The anchor-based file chunking of the invention has two properties. (1) It is modification-stable: a modification to a file affects only the data blocks adjacent to the modified region, and the boundaries of the other data blocks do not move. In an incremental backup of a file, therefore, only the few modified data blocks need to be backed up; the other data blocks can be shared with earlier backups of the file. Modification stability also ensures that similar data within a file and between files is not missed because its position has shifted, so duplicate data is detected to the greatest possible extent. (2) The sliding window is cheap to compute: the hash of the next window is easily derived from the hash of the previous window, so anchor-based file chunking has low computational overhead. The time complexity of the whole algorithm is O(n), where n is the number of bytes in the file.
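The chunking steps and the three conventions can be sketched in Python as below, using the example constants from the text (48-byte window, p = 17, M = 2^32, low 13 bits equal to 61, 2 KB minimum, 64 KB maximum). The function names are hypothetical, and placing each cut at the end of the anchor window is an assumption; the text only says that anchors serve as boundaries.

```python
WINDOW = 48
P = 17
M = 1 << 32
ANCHOR_MASK = (1 << 13) - 1   # low 13 bits of the window hash
ANCHOR_VALUE = 61
MIN_CHUNK = 2 * 1024
MAX_CHUNK = 64 * 1024
P_POW_WINDOW = pow(P, WINDOW, M)


def rolling_hashes(data):
    """Yield (end, hash) for every 48-byte window data[end-48:end]."""
    h = 0
    for b in data[:WINDOW]:
        h = (h * P + b) % M                      # H1 = b1*p^47 + ... + b48
    yield WINDOW, h
    for end in range(WINDOW, len(data)):
        # Next window: multiply by p, add the new byte, drop the oldest byte.
        h = (h * P + data[end] - data[end - WINDOW] * P_POW_WINDOW) % M
        yield end + 1, h


def chunk_boundaries(data):
    """Return the exclusive end offset of every chunk of `data`."""
    n = len(data)
    if n < WINDOW:                               # convention a)
        return [n] if n else []
    boundaries, start = [], 0
    for end, h in rolling_hashes(data):
        size = end - start
        is_anchor = (h & ANCHOR_MASK) == ANCHOR_VALUE
        # Convention b) ignores anchors that would give a chunk under 2 KB;
        # convention c) forces a cut after 64 KB without a usable anchor.
        if (is_anchor and size >= MIN_CHUNK) or size >= MAX_CHUNK:
            boundaries.append(end)
            start = end
    if start < n:
        boundaries.append(n)                     # final chunk may be short
    return boundaries
```

Each window hash is obtained from the previous one in constant time, which is where the O(n) running time comes from.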

Figure 8 shows how the chunks of a file change when the file is edited after it has been chunked. As the figure shows, anchor-based file chunking is modification-stable: a modification to a file affects only the data blocks adjacent to the modified region, and the boundaries of the other data blocks do not move. Row a shows a file divided by anchors into eight blocks of varying size, B1 to B8; the serrated part at the boundary of each block is the 48-byte anchor. Rows b, c and d show how the chunks change after the first, second and third modification of the file; the shaded parts are the modified regions.

Row b: the first modification occurs inside block B4. It creates no new block and merely turns block B4 into block B9; the other blocks are unchanged. The backup then only needs to send block B9 to replace the original block B4.

Row c: the second modification occurs inside block B5. It creates a new anchor that splits block B5 into two blocks, B10 and B11; the other blocks are unchanged. The backup then only needs to send blocks B10 and B11 to replace the original block B5.

Row d: the third modification occurs at the boundary between blocks B2 and B3. The anchor between B2 and B3 is lost, and the two blocks merge into one block, B12. The backup then only needs to send block B12 to replace the original blocks B2 and B3.

5.2 Second backup process

The second backup process is carried out mainly by the storage server job worker thread when the system is relatively idle. Its steps are:

(1) SS: if (Jobx(tn).FileSet is empty) go to (19), else take a file name Fi from Jobx(tn).FileSet;

(2) SS: create the memory buffer Fi.Index for file Fi, create R(Fi) in Fi.Index, then store M(Fi) from the chunk cache in R(Fi);

(3) SS: if (the chunk cache holds no tuple related to Fi) go to (14), else read a tuple related to Fi from the chunk cache;

(4) SS: if (the tuple is <Hj, offset>) go to step (12);

(5) SS: else if (the tuple is <Hj, D(Hj)>) {

(6) SS: look up Hj in the hash table;

(7) SS: if (found) write the "offset" value to the index cache entry for Hj and go to step (12);

(8) SS: else {append D(Hj) to the disk log and update the hash table;

(9) SS: write the "offset" value to the index cache entry for Hj and go to step (12);}}

(10) SS: else if (the tuple is <Hj, null>)

(11) SS: read the "offset" value from the index cache entry for Hj;

(12) SS: insert(<Hj, offset, dc>, 0, Fi.Index);

(13) SS: return to step (3);

(14) SS: storeRemain(Fi.Index, R(Fi));

(15) SS: append R(Fi) to the disk log and update the hash table;

(16) SS: send R(Fi) to BS;

(17) BS: send R(Fi) to the catalog database and store it in Jobx(tn).Record;

(18) SS: return to step (1);

(19) SS: create the file Jobx(tn).FF;

(20) SS: read the index cache and, for every entry that satisfies (tag==1000 or tag==1100), write <H, offset> to the file Jobx(tn).FF;

(21) SS: send the file Jobx(tn).FF to BS;

(22) BS: send the file Jobx(tn).FF to the catalog database and store it in Jobx(tn).Record;

(23) SS: report the end status of job Jobx(tn) to BS;

(24) BS: disconnect from SS, write the end status of job Jobx(tn) to Jobx(tn).Record in the catalog database, and end the run of job Jobx(tn).
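Steps (3) to (13) of the second backup process, which turn the tuples buffered for one file into level-0 index triples, might be sketched in Python as follows. The data structures are hypothetical stand-ins: a `bytearray` plays the disk log, a dictionary plays the chunk hash table, and the labels `"ref"`, `"data"` and `"pending"` mark the tuple forms <Hj, offset>, <Hj, D(Hj)> and <Hj, null>. The chunk header is left out for brevity.

```python
def second_backup_pass(chunk_cache, index_cache, hash_table, disk_log):
    """Resolve one file's buffered tuples to (fingerprint, offset, "dc") triples."""
    triples = []
    for item in chunk_cache:
        kind, h = item[0], item[1]
        if kind == "ref":                         # step (4): address already known
            offset = item[2]
        elif kind == "data":                      # steps (5) to (9)
            if h in hash_table:                   # chunk is already on the disk log
                offset = hash_table[h]
            else:                                 # append it and record its address
                offset = len(disk_log)
                disk_log += item[2]
                hash_table[h] = offset
            index_cache[h]["offset"] = offset
        else:                                     # steps (10) and (11): "pending"
            offset = index_cache[h]["offset"]
        triples.append((h, offset, "dc"))         # step (12): level-0 insert
    return triples
```

The triples are returned in file order, which is the order in which step (12) would feed them to the index tree builder.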

In the algorithm above, the two functions used in steps (12) and (14) are as follows.

Algorithm for step (12):

insert(<H, offset, type>, level, F.Index)

{ // Store the triple <H, offset, type> in F.Index.

  // level: the level, in the index tree F.Index, of the index node that stores the triple <H, offset, type>.

  if (I(F, level) is empty) {create Iw(F, level); store <H, offset, type> in Iw(F, level); return;}

  else if (Iw(F, level) is not full) {store <H, offset, type> in Iw(F, level); return;}

  else if (Iw(F, level) is full) {

    compute H(Iw(F, level));

    look up H(Iw(F, level)) in the hash table;

    if not found, append Iw(F, level) to the disk log and update the hash table;

    insert(<H(Iw(F, level)), offset, ic>, level+1, F.Index);

    create a new index node Iw(F, level);

    store <H, offset, type> in Iw(F, level); return;

  }

}

Algorithm for step (14):

storeRemain(F.Index, R(F))

{ // Store the working index node of every level of F.Index on the disk log.

  int level := 0;

  loop: compute H(Iw(F, level));

  look up H(Iw(F, level)) in the hash table;

  if not found, append Iw(F, level) to the disk log and update the hash table;

  if (|I(F, level)| == 1) {store <H(Iw(F, level)), offset, ic> in R(F); return;}

  else {insert(<H(Iw(F, level)), offset, ic>, level+1, F.Index); level := level+1; goto loop;}

}
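The way insert and storeRemain build an index tree bottom-up can be illustrated with the Python sketch below. It is a simplification under stated assumptions: the class and its fields are hypothetical, a node holds a fixed number of entries instead of being capped at 16 KB, nodes are serialized with `repr`, and SHA-1 is used only because it yields a 20-byte fingerprint (the excerpt does not name the hash function). At least one triple must be inserted before `store_remain` is called.

```python
import hashlib

ENTRIES_PER_NODE = 4  # demo value, not taken from the patent


class IndexTreeBuilder:
    def __init__(self, disk_log, hash_table):
        self.disk_log = disk_log      # bytearray standing in for the disk log
        self.hash_table = hash_table  # dict: fingerprint -> offset
        self.working = []             # working[level] plays Iw(F, level)
        self.count = []               # count[level] plays |I(F, level)|

    def _flush(self, level):
        """Store the working node of a level unless an identical node exists."""
        node = repr(self.working[level]).encode()
        fp = hashlib.sha1(node).digest()
        if fp not in self.hash_table:
            self.hash_table[fp] = len(self.disk_log)
            self.disk_log += node
        return fp, self.hash_table[fp]

    def insert(self, triple, level=0):
        if level == len(self.working):                      # level is still empty
            self.working.append([])
            self.count.append(1)
        elif len(self.working[level]) == ENTRIES_PER_NODE:  # working node is full
            fp, offset = self._flush(level)
            self.insert((fp, offset, "ic"), level + 1)      # index it one level up
            self.working[level] = []
            self.count[level] += 1
        self.working[level].append(triple)

    def store_remain(self):
        """Flush every level's working node and return the root's index entry."""
        level = 0
        while True:
            fp, offset = self._flush(level)
            if self.count[level] == 1:     # a single node at this level: the root
                return (fp, offset, "ic")
            self.insert((fp, offset, "ic"), level + 1)
            level += 1
```

Because index nodes are looked up by fingerprint before being appended, two files with identical content produce identical index nodes and share them on the disk log, which matches the sharing shown in Figure 6.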

Claims (4)

  1. A fingerprint-based data backup system comprising a backup server, backup agents, a storage server and a Web server, which communicate with one another over a network to perform data backup and recovery, characterized in that: the backup server holds a configuration file and a catalog database; the configuration file of the backup server records user-defined job objects, a job object contains the attributes that specify how the system runs the job, and the backup server controls the whole data backup and recovery process through job objects; the catalog database stores job records, which hold the management information of job object runs; the backup agent unit is installed on every host in the network whose data needs to be backed up; during backup the backup agent unit reads the files to be backed up from the file system of its host, performs anchor-based chunking on the files, computes the fingerprints of the chunks, and sends the fingerprints and those chunks that are needed over the network to the storage server; during recovery the backup agent unit receives file data from the storage server over the network and writes it to the specified directory in the file system of its host; the storage server is equipped with a large-capacity disk array, which is the destination of data backup; during backup it receives fingerprints or data chunks from the corresponding backup agent unit over the network, stores the data chunks on disk, and builds the file index; during recovery it reconstructs the file from the large-capacity disk array according to the file index and sends the file data over the network to the corresponding backup agent; the Web server is the B/S-mode web user management interface of the system; by logging in to the Web server, a user can direct the system to perform interactive backup or restore jobs, monitor the running of automatically scheduled jobs, modify the configuration file of the backup server, customize job objects, and perform device management.
  2. The fingerprint-based data backup system of claim 1, characterized in that the backup server comprises a backup server initialization module, a command listening module, a command processing module, a job processing module and a network communication module; the backup server initialization module performs initialization, which includes reading the configuration file, building the resource lists in memory, checking the state of the catalog database, ensuring the consistency and integrity of the data in the configuration file and the catalog database, starting the command listening port to accept user commands from the Web server, initializing the job queue and the user command queue, loading job objects into the job queue, and starting the job and network monitoring services; the command listening module is a network listening thread created by the system; it authenticates connection requests from Web servers so that only Web servers authorized by the system can connect, and listens for command requests sent by authenticated Web servers; when a command request is received, it is added to the user command queue to await processing; the command processing module comprises a user command queue and N command worker threads; when the user command queue overflows, the command listening module goes to sleep; the command worker threads continually read commands from the user command queue and execute them, performing different functions depending on the command executed; when the command listening module adds a command to the user command queue, if there is no idle command worker thread and the number of active command worker threads has not reached N, a new command worker thread is created; each time a command worker thread reads a command from the user command queue it checks the state of the command listening module and wakes it if it is asleep; the job processing module comprises a job queue, L job worker threads and a job queue loading thread; when the job queue overflows, the job queue loading thread goes to sleep; the job worker threads continually take job objects from the job queue and execute them, calling different resources and performing different functions according to the attributes of the job object; the job queue loading thread performs job scheduling: it checks the scheduling policy attribute of every job object in the job resource list and adds the job objects that are due to run to the job queue; if there is no idle job worker thread and the number of active job worker threads has not reached L, a new job worker thread is created; each time a job worker thread reads a job object from the job queue it checks the state of the job queue loading thread and wakes it if it is asleep; the network communication module wraps the standard network communication application programming interface and provides network communication interfaces to the command worker threads and job worker threads; these interfaces implement the data transfer protocols among the backup server, the backup agents and the storage server.
  3. The fingerprint-based data backup system of claim 1, characterized in that the backup agent comprises a backup agent initialization module, a request listening module, a job processing module, a file chunking module and a network communication module; the backup agent initialization module performs initialization, which includes reading the backup agent configuration file, building the resource lists in memory, initializing the job queue, and starting the request listening module for backup server requests; the request listening module listens on the network for connection requests from backup servers and authenticates the connecting backup server; after authentication succeeds it creates a network connection socket for communicating with that backup server and adds the socket to the job queue; the job processing module comprises a job queue and M job worker threads; when the job queue overflows, the request listening module goes to sleep; after a job worker thread takes a network connection socket from the job queue, it first creates a job control record for the job and links the socket into a member variable of the job control record; it then interacts with the backup server through this socket, transforms the relevant attributes of the backup server's job object and assigns them to the corresponding member variables of the job control record; next it uses the job ticket obtained from the backup server to connect to the corresponding storage server, which yields a network connection socket for communicating with the storage server, and links that socket into a member variable of the job control record; when the request listening module adds a network connection socket to the job queue, if there is no idle job worker thread and the number of active job worker threads has not reached M, a new job worker thread is created; each time a job worker thread takes a network connection socket from the job queue it checks the state of the request listening module and wakes it if it is asleep; the file chunking module accepts commands from the job worker threads of the job processing module and carries out the file chunking task of a backup job: it opens every file of the file set on the client's file system, performs anchor-based chunking on the file, computes the chunk fingerprints, and works with the corresponding storage server to execute the backup algorithm of the first backup process; the network communication module consists of the network connection sockets of the jobs; every job of the backup agent owns two network connection sockets, used to communicate with the backup server job and the storage server job that correspond to it.
  4. The fingerprint-based data backup system according to claim 1, characterized in that the storage server comprises a storage server initialization module, a connection monitoring module, a job ticket table, a job processing module and a network communication module, as well as an index buffer, a chunk buffer, a chunk hash table and a disk log;
     the storage server initialization module performs initialization, including parsing the storage server configuration file, building the memory resource linked list and starting the related service threads;
     the connection monitoring module monitors connection requests from backup servers and backup agents; it authenticates each connecting backup server and, once authentication succeeds, creates a network connection socket for communicating with that backup server and adds the socket to the job queue; for a connecting backup agent, it authenticates the agent by checking the job ticket the agent presents against the job ticket table and, once authentication succeeds, creates a network connection socket for communicating with that backup agent and links it into a member variable of the corresponding job control record;
     the job ticket table stores the tickets used to authenticate backup agent jobs;
     the job processing module comprises one job queue and W job worker threads; when the job queue overflows, the connection monitoring module enters the "reject backup server connection requests" state; after a job worker thread takes a network connection socket from the job queue, it first creates a job control record for the job and links the network connection socket into a member variable of the job control record, then interacts with the backup server over this socket, transforming the relevant attributes of the backup server's job object and assigning them to the corresponding member variables of the job control record, and randomly generates a job ticket, registers it in the job ticket table and sends it to the backup server's job object; when the connection monitoring module adds a network connection socket to the job queue, a new job worker thread is created if no job worker thread is currently idle and the number of active job worker threads has not reached W; each time a job worker thread takes a network connection socket from the job queue, it checks the state of the connection monitoring module and, if the module is in the "reject backup server connection requests" state, cancels that state so that the module accepts backup server connection requests again;
     the network communication module consists of the jobs' network connection sockets; every job of the storage server owns two network connection sockets, used respectively to communicate with the corresponding backup server job and the corresponding backup agent job;
     the index buffer is infrastructure for the first and second backup processes executed by a storage server job; it is implemented as an in-memory hash table and stores all fingerprints contained in the job instance that immediately precedes the current job instance Jobx(tn) in this job chain, together with the fingerprints newly generated while the current job runs;
     the chunk buffer is infrastructure for the first and second backup processes executed by a storage server job; it is implemented on a separate disk array and temporarily stores the data chunks whose fingerprints were not found in the index buffer during the first backup process;
     the chunk hash table is infrastructure for the second backup process executed by a storage server job; it is implemented on a separate disk array and maps each chunk fingerprint to the storage address of that chunk in the disk log;
     the disk log is infrastructure for the second backup process executed by a storage server job; it is implemented on a separate disk array and stores the data chunks and the file indexes, which are themselves stored in chunk form.
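The job queue and worker-thread behaviour recited in claim 3 (and mirrored in claim 4 with W threads) can be illustrated with a minimal Python sketch. It is not the patent's implementation: the class name `JobDispatcher`, the `handle_job` callback and the queue capacity are hypothetical. It also assumes that a blocking `queue.Queue.put()` is an acceptable stand-in for the listener going to sleep on overflow, since a worker's `get()` wakes the blocked producer internally, so no explicit state check is coded.

```python
import queue
import threading


class JobDispatcher:
    """Bounded job queue served by at most max_workers lazily started threads."""

    def __init__(self, max_workers, queue_capacity, handle_job):
        self.max_workers = max_workers        # "M" in claim 3
        self.jobs = queue.Queue(maxsize=queue_capacity)
        self.handle_job = handle_job          # hypothetical per-connection callback
        self.lock = threading.Lock()
        self.workers = []                     # worker threads started so far
        self.idle = 0                         # workers currently waiting for a job

    def submit(self, conn):
        """Called by the listener for each authenticated connection."""
        with self.lock:
            # Start a worker only if nobody is idle and the cap is not reached.
            if self.idle == 0 and len(self.workers) < self.max_workers:
                worker = threading.Thread(target=self._work, daemon=True)
                self.workers.append(worker)
                worker.start()
        # put() blocks while the queue is full: the listener "sleeps" until
        # a worker's get() frees a slot, which wakes it up again.
        self.jobs.put(conn)

    def _work(self):
        while True:
            with self.lock:
                self.idle += 1
            conn = self.jobs.get()
            with self.lock:
                self.idle -= 1
            try:
                self.handle_job(conn)
            finally:
                self.jobs.task_done()
```

In this sketch the listener would call `submit()` once per accepted socket, and `jobs.join()` can be used to wait until every queued connection has been handled. Worker threads are daemon threads and never exit on their own, which keeps the example short but would need a shutdown path in real use.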
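The storage-server structures named in claim 4 (job ticket table, index buffer, chunk buffer, chunk hash table and disk log) can be sketched in Python as follows. The claim does not spell out the two backup processes, so the two-pass split shown here is only one plausible reading. Every name (`StorageServerSketch`, `first_pass`, `second_pass`, `read_chunk`) is hypothetical, SHA-256 is an assumed fingerprint function, and in-memory containers stand in for the separate disk arrays.

```python
import hashlib
import secrets


class StorageServerSketch:
    def __init__(self):
        self.tickets = set()         # job ticket table
        self.index_buffer = set()    # in-memory hash table of known fingerprints
        self.chunk_buffer = {}       # fingerprint -> chunk staged by the first pass
        self.chunk_table = {}        # fingerprint -> (offset, length) in the disk log
        self.disk_log = bytearray()  # append-only chunk store

    def issue_ticket(self):
        """Randomly generate a job ticket and register it in the ticket table."""
        ticket = secrets.token_hex(16)
        self.tickets.add(ticket)
        return ticket

    def authenticate(self, ticket):
        """A backup agent is accepted only if its ticket is registered."""
        return ticket in self.tickets

    def first_pass(self, chunk):
        """Stage the chunk only if its fingerprint is not in the index buffer."""
        fingerprint = hashlib.sha256(chunk).hexdigest()  # assumed hash function
        if fingerprint not in self.index_buffer:
            self.index_buffer.add(fingerprint)
            self.chunk_buffer[fingerprint] = chunk
        return fingerprint

    def second_pass(self):
        """Append staged chunks that the chunk hash table does not know yet."""
        written = 0
        for fingerprint, chunk in self.chunk_buffer.items():
            if fingerprint not in self.chunk_table:
                self.chunk_table[fingerprint] = (len(self.disk_log), len(chunk))
                self.disk_log += chunk   # chunks are only ever appended
                written += 1
        self.chunk_buffer.clear()
        return written

    def read_chunk(self, fingerprint):
        """Look up a chunk's address by fingerprint and read it from the log."""
        offset, length = self.chunk_table[fingerprint]
        return bytes(self.disk_log[offset:offset + length])
```

Because chunks are only appended and addressed by fingerprint, a duplicate chunk costs one hash lookup and no additional log space, which is the storage saving the abstract describes.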
CN 200710168715 2007-12-10 2007-12-10 Data stand-by system based on finger print CN100547555C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200710168715 CN100547555C (en) 2007-12-10 2007-12-10 Data stand-by system based on finger print

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 200710168715 CN100547555C (en) 2007-12-10 2007-12-10 Data stand-by system based on finger print

Publications (2)

Publication Number Publication Date
CN101183323A (en) 2008-05-21
CN100547555C (en) 2009-10-07

Family

ID=39448610

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200710168715 CN100547555C (en) 2007-12-10 2007-12-10 Data stand-by system based on finger print

Country Status (1)

Country Link
CN (1) CN100547555C (en)

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009152716A1 (en) * 2008-06-19 2009-12-23 华为技术有限公司 Method, device for storing data fingerprint and method for synchronizing data of plurality of devices
CN101814045A (en) * 2010-04-22 2010-08-25 华中科技大学 Data organization method for backup services
CN101887388A (en) * 2010-06-18 2010-11-17 中兴通讯股份有限公司 Data backup system and method based on memory database
CN102169453A (en) * 2011-03-08 2011-08-31 杭州电子科技大学 File online backup method
CN101599079B (en) 2009-07-22 2011-08-31 中国科学院计算技术研究所 Backup data centralized storage management method
CN102436408A (en) * 2011-10-10 2012-05-02 上海交通大学 Data storage cloud and cloud backup method based on Map/Dedup
CN102456059A (en) * 2010-10-21 2012-05-16 英业达股份有限公司 Data deduplication processing system
CN102510340A (en) * 2011-10-11 2012-06-20 浪潮电子信息产业股份有限公司 Method for realizing remote rapid backup by utilizing common Internet network
CN102714789A (en) * 2011-04-19 2012-10-03 华为终端有限公司 Method for backuping and recovering data of mobile terminal and mobile terminal thereof
CN102915325A (en) * 2012-08-11 2013-02-06 深圳市极限网络科技有限公司 Md5 Hash list-based file decomposing and combining technique
CN103119590A (en) * 2010-09-24 2013-05-22 日立数据系统有限公司 System and method for managing integrity in a distributed database
CN103200169A (en) * 2013-01-30 2013-07-10 中国科学院自动化研究所 Method and system of user data protection based on proxy
WO2013114230A1 (en) * 2012-02-02 2013-08-08 International Business Machines Corporation Erasure correcting codes for storage arrays
CN103384270A (en) * 2013-06-28 2013-11-06 环境保护部华南环境科学研究所 Method and system for data backup of internal and external network penetrating remote data transmission
CN103500120A (en) * 2013-09-17 2014-01-08 北京思特奇信息技术股份有限公司 Distributed cache high-availability processing method and system based on multithreading asynchronous double writing
WO2014107845A1 (en) * 2013-01-09 2014-07-17 华为技术有限公司 Data processing method and device
US8918701B2 (en) 2011-02-28 2014-12-23 SK Hynix Inc. Nested multiple erasure correcting codes for storage arrays
CN104331525A (en) * 2014-12-01 2015-02-04 国家计算机网络与信息安全管理中心 Sharing method based on repeating data deletion
CN104408141A (en) * 2014-12-01 2015-03-11 国家计算机网络与信息安全管理中心 Redundancy removal file system and data deployment method thereof
CN104508666A (en) * 2012-10-31 2015-04-08 惠普发展公司,有限责任合伙企业 Cataloging backup data
US9058291B2 (en) 2011-02-28 2015-06-16 International Business Machines Corporation Multiple erasure correcting codes for storage arrays
US9552161B2 (en) 2012-12-12 2017-01-24 Shenzhen Airdrawing Technology Service Co., Ltd Repetitive data block deleting system and method
CN103870362B (en) * 2014-03-21 2017-08-04 华为技术有限公司 A data recovery method, apparatus and backup systems

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3148133B2 (en) 1996-10-30 2001-03-19 三菱電機株式会社 Information retrieval system

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2009152716A1 (en) * 2008-06-19 2009-12-23 华为技术有限公司 Method, device for storing data fingerprint and method for synchronizing data of plurality of devices
CN101610281B (en) 2008-06-19 2012-11-21 华为技术有限公司 Method and device for storing data fingerprints
CN101599079B (en) 2009-07-22 2011-08-31 中国科学院计算技术研究所 Backup data centralized storage management method
CN101814045A (en) * 2010-04-22 2010-08-25 华中科技大学 Data organization method for backup services
CN101887388A (en) * 2010-06-18 2010-11-17 中兴通讯股份有限公司 Data backup system and method based on memory database
CN103119590A (en) * 2010-09-24 2013-05-22 日立数据系统有限公司 System and method for managing integrity in a distributed database
CN103119590B (en) * 2010-09-24 2016-08-17 日立数据系统有限公司 System and method for managing integrity in a distributed database
CN102456059A (en) * 2010-10-21 2012-05-16 英业达股份有限公司 Data deduplication processing system
US9058291B2 (en) 2011-02-28 2015-06-16 International Business Machines Corporation Multiple erasure correcting codes for storage arrays
US8918701B2 (en) 2011-02-28 2014-12-23 SK Hynix Inc. Nested multiple erasure correcting codes for storage arrays
CN102169453A (en) * 2011-03-08 2011-08-31 杭州电子科技大学 File online backup method
CN102714789A (en) * 2011-04-19 2012-10-03 华为终端有限公司 Method for backuping and recovering data of mobile terminal and mobile terminal thereof
CN102714789B (en) 2011-04-19 2014-04-02 华为终端有限公司 Method for backuping and recovering data of mobile terminal and mobile terminal thereof
CN102436408A (en) * 2011-10-10 2012-05-02 上海交通大学 Data storage cloud and cloud backup method based on Map/Dedup
CN102436408B (en) 2011-10-10 2014-02-19 上海交通大学 Data storage cloud and cloud backup method based on Map/Dedup
CN102510340A (en) * 2011-10-11 2012-06-20 浪潮电子信息产业股份有限公司 Method for realizing remote rapid backup by utilizing common Internet network
US8869006B2 (en) 2012-02-02 2014-10-21 International Business Machines Corporation Partial-maximum distance separable (PMDS) erasure correcting codes for storage arrays
WO2013114230A1 (en) * 2012-02-02 2013-08-08 International Business Machines Corporation Erasure correcting codes for storage arrays
US8874995B2 (en) 2012-02-02 2014-10-28 International Business Machines Corporation Partial-maximum distance separable (PMDS) erasure correcting codes for storage arrays
CN102915325A (en) * 2012-08-11 2013-02-06 深圳市极限网络科技有限公司 Md5 Hash list-based file decomposing and combining technique
CN104508666A (en) * 2012-10-31 2015-04-08 惠普发展公司,有限责任合伙企业 Cataloging backup data
US9552161B2 (en) 2012-12-12 2017-01-24 Shenzhen Airdrawing Technology Service Co., Ltd Repetitive data block deleting system and method
WO2014107845A1 (en) * 2013-01-09 2014-07-17 华为技术有限公司 Data processing method and device
CN103200169A (en) * 2013-01-30 2013-07-10 中国科学院自动化研究所 Method and system of user data protection based on proxy
CN103384270A (en) * 2013-06-28 2013-11-06 环境保护部华南环境科学研究所 Method and system for data backup of internal and external network penetrating remote data transmission
CN103500120A (en) * 2013-09-17 2014-01-08 北京思特奇信息技术股份有限公司 Distributed cache high-availability processing method and system based on multithreading asynchronous double writing
CN103870362B (en) * 2014-03-21 2017-08-04 华为技术有限公司 A data recovery method, apparatus and backup systems
CN104408141A (en) * 2014-12-01 2015-03-11 国家计算机网络与信息安全管理中心 Redundancy removal file system and data deployment method thereof
CN104331525A (en) * 2014-12-01 2015-02-04 国家计算机网络与信息安全管理中心 Sharing method based on repeating data deletion
CN104331525B (en) * 2014-12-01 2018-01-16 国家计算机网络与信息安全管理中心 Sharing method based on deduplication
CN104408141B (en) * 2014-12-01 2018-04-17 国家计算机网络与信息安全管理中心 Redundancy removal file system and data deployment method thereof

Also Published As

Publication number Publication date Type
CN100547555C (en) 2009-10-07 grant

Similar Documents

Publication Publication Date Title
Hitz et al. File System Design for an NFS File Server Appliance.
Ghemawat et al. The Google file system
Patterson et al. SnapMirror®: file system based asynchronous mirroring for disaster recovery
US7756833B2 (en) Method and system for synthetic backup and restore
US6857053B2 (en) Method, system, and program for backing up objects by creating groups of objects
US8166263B2 (en) Continuous data protection over intermittent connections, such as continuous data backup for laptops or wireless devices
US7707184B1 (en) System and method for snapshot full backup and hard recovery of a database
US6732125B1 (en) Self archiving log structured volume with intrinsic data protection
Muniswamy-Reddy et al. Provenance for the Cloud.
US8560879B1 (en) Data recovery for failed memory device of memory device array
US7366859B2 (en) Fast incremental backup method and system
US7487228B1 (en) Metadata structures and related locking techniques to improve performance and scalability in a cluster file system
Rhea et al. Fast, Inexpensive Content-Addressed Storage in Foundation.
US8898388B1 (en) NVRAM caching and logging in a storage system
Hutchinson et al. Logical vs. physical file system backup
US7415488B1 (en) System and method for redundant storage consistency recovery
US6985995B2 (en) Data file migration from a mirrored RAID to a non-mirrored XOR-based RAID without rewriting the data
US7596713B2 (en) Fast backup storage and fast recovery of data (FBSRD)
US7457980B2 (en) Data replication method over a limited bandwidth network by mirroring parities
US7149858B1 (en) Synchronous replication for system and data security
US8190850B1 (en) Virtual block mapping for relocating compressed and/or encrypted file data block blocks
US8299944B2 (en) System and method for creating deduplicated copies of data storing non-lossy encodings of data directly in a content addressable store
US20120124046A1 (en) System and method for managing deduplicated copies of data using temporal relationships among copies
US20040267828A1 (en) Transaction consistent copy-on-write database
US7650341B1 (en) Data backup/recovery

Legal Events

Date Code Title Description
C06 Publication
C10 Request of examination as to substance
C14 Granted