CN116648693A

CN116648693A - Method and apparatus for backing up a file system

Info

Publication number: CN116648693A
Application number: CN202080108079.3A
Authority: CN
Inventors: 阿萨夫·耶格尔; 阿维夫·库温特; 阿萨夫·纳塔逊; 亚伦·莫
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2020-12-22
Filing date: 2020-12-22
Publication date: 2023-08-25
Also published as: WO2022135690A1

Abstract

The present disclosure relates to a computer-implemented method and apparatus for backing up a file system. The method comprises the following steps: a processor receives user input providing an early backup time window that precedes a scheduled daily backup time of the file system; the processor identifies one or more files in the file system that are suitable for early backup; the processor initiates an early backup of the one or more identified files within the early backup time window; and the processor initiates the scheduled daily backup of a plurality of remaining files in the file system at the scheduled daily backup time.

Description

Method and apparatus for backing up a file system

Technical Field

The present disclosure relates generally to the field of data protection and backup; and more particularly to a method and apparatus for backing up a file system.

Background

In today's data-driven world, large amounts of data are generated and stored continuously worldwide. For example, social networks, the Internet of Things, scientific experiments, business services, industrial services, banking services and business interactions all play a vital role in generating data. Data backup has therefore become essential for data protection, because data is easily lost or damaged by destructive events such as system failures, power outages, cyberattacks, natural disasters and communication failures. Various data protection techniques are currently used for data backup.

However, existing data protection techniques have several limitations. First, they typically do not allow a user to define when backups take place, for example during periods of low activity. Second, they provide only a limited number of points in time to which data (e.g., a file) can be backed up and restored. For example, in snapshot-based data protection techniques, snapshots (i.e., backup images of the data) are temporary and must eventually be deleted, yet because they are generated periodically, i.e., at fixed time intervals, they occupy considerable space. As a result, only a limited number of points in time are available for backing up the data associated with a file. Furthermore, generating snapshots is expensive, and creating and deleting them consumes significant computing resources. When the interval between two consecutive snapshots is large, for example from 15 minutes to several hours, snapshot-based techniques require an excessively long backup time, which also results in a fairly large recovery point objective (RPO). In addition, when a snapshot is loaded onto a storage array so that a backup server can read it, the snapshot reduces the bandwidth the array can provide to the production workload. Existing data protection techniques also require considerable bandwidth, are sensitive to bandwidth fluctuations, have difficulty applying advanced data reduction algorithms (e.g., compression and wide-area network (WAN) deduplication), and are prone to significant delays. Third, existing techniques require a significant amount of computing resources and time to parse the file system and discover the file changes that must be backed up, which lengthens the backup window.

In general, existing data utilization techniques aim to optimize the amount of data to be backed up within a backup window, for example by performing host-side deduplication to reduce network utilization and bandwidth. In addition, at the block level, some techniques back up "cold" blocks (i.e., infrequently modified blocks) before the scheduled backup window. However, because this is done at the block level, such schemes cannot use file-level attributes to determine which data changes should be backed up at a given point in time. Existing data utilization techniques also include data replication, i.e., the process of copying data to a secondary location using continuous or snapshot-based techniques. Continuous replication mirrors each input/output (I/O) operation to a remote server (e.g., a backup server) that maintains the latest state of the data, but it incurs higher network bandwidth costs.

Thus, in light of the above discussion, there is a need to overcome the above-described drawbacks associated with existing data protection or backup techniques.

Disclosure of Invention

The present disclosure seeks to provide an improved method, apparatus and system for backing up a file system. It seeks to solve the existing problems of the limited number of points in time available for data backup, the large amount of computing resources required for data backup, and the limitations of file-system-level data protection. An object of the present disclosure is to provide a solution that at least partly overcomes the problems encountered in the prior art, namely an improved method and system for continuous data protection that uses a continuous directory to provide the required number of backup points in time with minimal computing resources, while also supporting block-level data protection.

The object of the present disclosure is achieved by the solution provided in the attached independent claims. Advantageous implementations of the present disclosure are further defined in the dependent claims.

In one aspect, the present disclosure provides a computer-implemented method for backing up a file system. The method comprises the following steps: a processor receives user input providing an early backup time window that precedes a scheduled daily backup time of the file system; the processor identifies one or more files in the file system that are suitable for early backup; the processor initiates an early backup of the one or more identified files within the early backup time window; and the processor initiates the scheduled daily backup of a plurality of remaining files in the file system at the scheduled daily backup time.

The method enables a user to back up data (e.g., files), and to restore it to any desired point in time, by providing input that defines a backup window, thereby minimizing the number of backups performed during periods of high usage. This allows the user to back up data at multiple instances, such as at the end of the day and within a preferred time window. The processor may continuously record a log of the metadata operations performed on files in the file system, for example in the form of a continuous directory. In this case, the processor maintains up-to-date information about the state of every file at any particular point in time, i.e., whenever an operation or change is performed on a file. Typically, the processor can restore a file to a point in time by looking up the state of the file at that point in time in the continuous directory. In this regard, the method facilitates achieving a near-zero recovery point objective (RPO), so that data remains protected if it is lost due to a destructive event. The method uses minimal computing resources and time when analyzing the file system to learn which file changes must be backed up. The method is well suited to both file-system-level and block-level data protection, and can be implemented on a conventional computing system without changing the architecture of the computing system.
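One way to picture the continuous directory is as a per-file, time-ordered log of metadata states, from which the state of a file at any instant is found by binary search. The Python sketch below is illustrative only: the class `ContinuousDirectory`, the `FileState` fields and the method names are hypothetical and are not taken from the disclosure, and it assumes operations for each file are recorded in timestamp order.

```python
import bisect
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class FileState:
    """Hypothetical metadata state of one file after an operation."""
    size: int
    mtime: float
    deleted: bool = False


@dataclass
class ContinuousDirectory:
    """Per-file, time-ordered log of metadata operations (illustrative only)."""
    _times: Dict[str, List[float]] = field(default_factory=dict)
    _states: Dict[str, List[FileState]] = field(default_factory=dict)

    def record(self, path: str, t: float, state: FileState) -> None:
        """Append the state produced by an operation on `path` at time `t`."""
        times = self._times.setdefault(path, [])
        if times and t < times[-1]:
            raise ValueError("operations must be recorded in time order")
        times.append(t)
        self._states.setdefault(path, []).append(state)

    def state_at(self, path: str, t: float) -> Optional[FileState]:
        """Latest recorded state of `path` at or before time `t`, or None."""
        times = self._times.get(path, [])
        i = bisect.bisect_right(times, t)  # number of operations at or before t
        return self._states[path][i - 1] if i else None
```

Restoring a file to time `t` then amounts to calling `state_at(path, t)` and materializing the returned state; storing the timestamps in a separate sorted list keeps each lookup logarithmic in the number of recorded operations.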

In one implementation, identifying includes analyzing file metadata to identify one or more files that have been edited since a previous scheduled daily backup time.

By analyzing file metadata to identify the edited files, i.e., the files on which any operation has been performed, the information required for file backup can be accurately identified and maintained. Because the files to back up are selected from their metadata, the computational burden is reduced.
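A minimal Python sketch of this metadata-only selection, assuming the last modification time (`st_mtime` from `os.stat`) is the metadata consulted. `FileMeta`, `scan` and `edited_since` are hypothetical names used for illustration.

```python
import os
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class FileMeta:
    """Hypothetical per-file metadata record holding only the fields used here."""
    path: str
    mtime: float  # last modification time, seconds since the epoch


def scan(root: str) -> List[FileMeta]:
    """Collect metadata for every file entry under `root` without reading contents."""
    metas = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            metas.append(FileMeta(path, os.stat(path).st_mtime))
    return metas


def edited_since(files: Iterable[FileMeta], previous_backup: float) -> List[str]:
    """Paths modified after the previous scheduled daily backup time."""
    return sorted(f.path for f in files if f.mtime > previous_backup)
```

Relying on the modification time alone is a simplification: tools that preserve `mtime`, or metadata-only changes, would go unnoticed, which is one reason a richer metadata log may be preferable.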

In another implementation, the method further includes, in response to a file editing event, the processor receiving the file metadata from an agent installed on the file system.

This implementation supports backing up files based on file editing events tracked by an agent installed on the file system, and enables continuous protection at a finer granularity; however, inconsistent files must be handled, and such files are discarded.

In yet another implementation, the method further comprises: the processor sends a query to the file system at predetermined time intervals and receives the file metadata in response.

Sending queries to the file system at predetermined intervals facilitates data backup while keeping operation efficient and agentless, i.e., without installing an agent on the file system.
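The polling variant can be sketched as a loop that queries the file system every `interval_s` seconds and reports files whose metadata differs from the previous answer. In this hypothetical Python sketch the query is any callable returning a `{path: mtime}` mapping (how the file system answers is not specified by the disclosure), and `sleep` is injectable so the loop can be exercised without waiting.

```python
import time
from typing import Callable, Dict, Iterator

# Hypothetical query: returns {path: mtime} for every file in the file system.
QueryFn = Callable[[], Dict[str, float]]


def poll_changes(query: QueryFn, interval_s: float, rounds: int,
                 sleep: Callable[[float], None] = time.sleep) -> Iterator[Dict[str, float]]:
    """Query the file system `rounds` times, `interval_s` seconds apart, and yield
    the files that are new or whose metadata changed since the previous query.
    Deleted files are not reported by this simplified version."""
    previous: Dict[str, float] = {}
    for i in range(rounds):
        if i:
            sleep(interval_s)  # predetermined time interval between queries
        current = query()
        yield {p: m for p, m in current.items() if previous.get(p) != m}
        previous = current
```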

In one implementation, the identifying includes determining the suitability of each file for early backup based on file-native metadata, wherein the file-native metadata includes one or more of a file name, a file size, file user permissions, file group permissions, a creation time, a last access time, a last modification time, and a file type.

In operation, because the suitability of each file is determined from its native metadata, which also captures the file's state, the method can restore a file to that state. This enables the user to reliably recover the desired file in the particular state defined by its file-native metadata. In other words, the method improves the searchability of changed files and allows such files to be restored afterwards.
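The file-native metadata fields listed above map closely onto what `os.stat` returns. The following Python sketch gathers them for one file and applies a deliberately simple, hypothetical suitability rule (not modified for at least `quiet_s` seconds). The function names, the rule, and the use of the file extension as the "file type" are assumptions for illustration.

```python
import os
import stat
from pathlib import Path
from typing import Dict, Union


def native_metadata(path: Union[str, Path]) -> Dict[str, object]:
    """Collect file-native metadata for one file using os.stat."""
    p = Path(path)
    st = os.stat(p)
    mode = stat.S_IMODE(st.st_mode)
    return {
        "name": p.name,
        "size": st.st_size,
        "user_perms": (mode >> 6) & 0o7,   # owner rwx bits
        "group_perms": (mode >> 3) & 0o7,  # group rwx bits
        # st_ctime is the creation time on Windows but the inode change time on
        # Unix; st_birthtime is only available on some platforms (e.g. macOS).
        "creation_time": getattr(st, "st_birthtime", st.st_ctime),
        "last_access_time": st.st_atime,
        "last_modification_time": st.st_mtime,
        "type": p.suffix.lower() or None,  # extension as a stand-in for file type
    }


def is_suitable(meta: Dict[str, object], now: float, quiet_s: float = 3600.0) -> bool:
    """Hypothetical rule: suitable for early backup if unmodified for `quiet_s` seconds."""
    return now - float(meta["last_modification_time"]) >= quiet_s
```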

In another implementation, the file type indicates that edits to the file are append-only.
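One plausible reason the append-only file type matters, offered as an interpretation rather than a statement of the disclosure, is that bytes already backed up early cannot change afterwards, so the scheduled daily backup only needs the bytes appended since then. The Python sketch below assumes the backup side remembers how many bytes it has copied; `backup_append_only` is a hypothetical name.

```python
from pathlib import Path
from typing import Tuple


def backup_append_only(src: Path, dst: Path, backed_up_size: int) -> Tuple[int, int]:
    """Copy only the bytes appended to `src` after its first `backed_up_size`
    bytes were backed up. Returns (bytes_copied, new_backed_up_size).

    Assumes `src` is strictly append-only; if it shrank, that assumption is
    broken and the caller should fall back to a full copy.
    """
    size = src.stat().st_size
    if size < backed_up_size:
        raise ValueError("file shrank; it is not append-only")
    with src.open("rb") as fin, dst.open("ab") as fout:
        fin.seek(backed_up_size)                 # skip what the early backup copied
        tail = fin.read(size - backed_up_size)
        fout.write(tail)
    return len(tail), backed_up_size + len(tail)
```

For example, a log backed up early at 6 bytes that later grows to 12 bytes needs only 6 further bytes at the scheduled daily backup time.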

In one implementation, the identifying includes determining an expected last access time by analyzing file metadata collected over time.

By analyzing the file metadata collected over time to determine the expected last access time, the files suitable for early backup can be identified, reducing both the amount of data to be backed up and the time spent on backup at the scheduled daily backup time.
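As an illustration only, the expected last access time could be estimated from when a file was last accessed on previous days, and a file treated as a candidate if that expected time falls before the early backup time window opens. Using the mean plus `k` population standard deviations as a safety margin is an assumption of this sketch, not something the disclosure specifies. Times are measured in hours elapsed since the previous scheduled daily backup, which avoids wrapping around midnight.

```python
from statistics import mean, pstdev
from typing import Sequence


def expected_last_access(daily_last_access_h: Sequence[float], k: float = 1.0) -> float:
    """Estimate when a file normally stops being accessed each day.

    Inputs are hours elapsed since the previous scheduled daily backup at which the
    file was last accessed on past days; the estimate is mean + k * std deviation.
    """
    if not daily_last_access_h:
        raise ValueError("no access history")
    return mean(daily_last_access_h) + k * pstdev(daily_last_access_h)


def suitable_for_window(daily_last_access_h: Sequence[float], window_start_h: float,
                        k: float = 1.0) -> bool:
    """A file qualifies if it is expected to be idle before the window opens."""
    return expected_last_access(daily_last_access_h, k) <= window_start_h
```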

In one implementation, the plurality of remaining files includes files that were not backed up within the early backup window.

The method backs up all remaining files in the file system, i.e., the files not yet backed up, by examining the latest state of each file at the scheduled daily backup time. Because only the remaining files are backed up at that time, the time required to analyze the files is reduced, as is the likelihood of an incomplete backup caused by insufficient time.

In one implementation, the scheduled daily backup includes the processor checking whether files backed up within the early backup time window have since been edited further.

This check allows the method to exclude files that were backed up early and have not changed since, reducing the associated computational effort and time.
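The two implementations above (back up the remaining files, and re-check files already backed up early) can be combined into one selection step at the scheduled daily backup time. In this hypothetical Python sketch, a file counts as "further edited" if its modification time differs from the one recorded when it was backed up early; files deleted after the early backup are not handled.

```python
from typing import Dict, Set


def scheduled_backup_set(current_mtimes: Dict[str, float],
                         early_backed_up: Dict[str, float]) -> Set[str]:
    """Files to back up at the scheduled daily backup time.

    `current_mtimes` maps each file changed since the previous scheduled daily
    backup to its current modification time; `early_backed_up` maps each file
    backed up within the early backup window to its modification time then.
    """
    remaining = set(current_mtimes) - set(early_backed_up)          # not backed up early
    re_edited = {p for p, m in early_backed_up.items()
                 if p in current_mtimes and current_mtimes[p] != m}  # edited again since
    return remaining | re_edited
```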

In one implementation, the processor is part of a file system server and initiating the backup includes sending the file to the backup server.

This implementation enables remote replication of data for continuous data protection: the primary storage (i.e., the file system server) is located on-site, while the secondary storage (i.e., the backup server) is located off-site as remote storage.

In another implementation, the processor is part of a backup server, and initiating the backup includes requesting the file from the file system.

Unlike conventional approaches, this implementation allows the backup to be initiated by the processor of the backup server rather than by the processor of the file system server, thereby broadening the applicability of the method.
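The difference between the two placements of the processor is the direction of the transfer: pushed by the file system server, or pulled by the backup server. The Python sketch below abstracts the network with plain callables; `push_backup`, `pull_backup` and the transport signatures are hypothetical.

```python
from typing import Callable, Dict, Iterable

# Hypothetical transports; in practice these would be network calls.
ReadFn = Callable[[str], bytes]          # local read on the file system server
SendFn = Callable[[str, bytes], None]    # file system server -> backup server
FetchFn = Callable[[str], bytes]         # backup server requests a file


def push_backup(paths: Iterable[str], read: ReadFn, send: SendFn) -> int:
    """Processor on the file system server: read each file locally and send it."""
    count = 0
    for p in paths:
        send(p, read(p))
        count += 1
    return count


def pull_backup(paths: Iterable[str], fetch: FetchFn, store: Dict[str, bytes]) -> int:
    """Processor on the backup server: request each file and store the reply."""
    count = 0
    for p in paths:
        store[p] = fetch(p)
        count += 1
    return count
```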

In another aspect, the present disclosure provides an apparatus for controlling file system backup, the apparatus comprising: an interface for receiving user input providing an early backup time window, the early backup time window preceding a scheduled daily backup time of the file system; a file identification module for identifying one or more files in the file system that are suitable for early backup; and a file backup module for initiating the early backup of the one or more identified files within the early backup time window and initiating the scheduled daily backup of a plurality of remaining files in the file system at the scheduled daily backup time.

The apparatus enables a user to back up data (e.g., files), and to restore it to any desired point in time, by providing input that defines a backup window, thereby minimizing the number of backups performed during periods of high usage. This allows the user to back up data at multiple instances, such as at the end of the day and within a preferred time window. The apparatus may continuously record a log of the metadata operations performed on files in the file system, for example in the form of a continuous directory. In this case, the apparatus maintains up-to-date information about the state of every file at any particular point in time, i.e., whenever an operation or change is performed on a file. In general, the apparatus can restore a file to a point in time by looking up the state of the file at that point in time in the continuous directory. In this regard, the apparatus facilitates achieving a near-zero recovery point objective (RPO), so that data remains protected if it is lost due to a destructive event. The apparatus uses minimal computing resources and time when parsing the file system to learn which file changes must be backed up. The apparatus is well suited to both file-system-level and block-level data protection, and can be implemented in a conventional computing system without changing the architecture of the computing system.

The apparatus of this aspect achieves all the advantages and effects of the present method.
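The interface, file identification module and file backup module described above can be sketched as three cooperating Python objects. This is a hypothetical decomposition: the class names, the pluggable `is_suitable` rule and the `backup` callback are assumptions, and the re-edit check for early-backed-up files is omitted for brevity.

```python
from dataclasses import dataclass
from datetime import time
from typing import Callable, Iterable, List, Optional, Set, Tuple


class FileIdentificationModule:
    """Selects files suitable for early backup using a pluggable rule."""
    def __init__(self, is_suitable: Callable[[str], bool]) -> None:
        self.is_suitable = is_suitable

    def identify(self, files: Iterable[str]) -> List[str]:
        return [f for f in files if self.is_suitable(f)]


class FileBackupModule:
    """Initiates early and scheduled daily backups through a backup callback."""
    def __init__(self, backup: Callable[[str], None]) -> None:
        self.backup = backup
        self.backed_up_early: Set[str] = set()

    def early_backup(self, files: Iterable[str]) -> None:
        for f in files:
            self.backup(f)
            self.backed_up_early.add(f)

    def daily_backup(self, files: Iterable[str]) -> List[str]:
        remaining = [f for f in files if f not in self.backed_up_early]
        for f in remaining:
            self.backup(f)
        self.backed_up_early.clear()  # start a new daily cycle
        return remaining


@dataclass
class BackupController:
    """The apparatus: an interface for the window plus the two modules."""
    identifier: FileIdentificationModule
    backer: FileBackupModule
    window: Optional[Tuple[time, time]] = None

    def receive_window(self, start: time, end: time) -> None:  # the interface
        self.window = (start, end)

    def run_early(self, files: Iterable[str]) -> None:
        self.backer.early_backup(self.identifier.identify(files))

    def run_daily(self, files: Iterable[str]) -> List[str]:
        return self.backer.daily_backup(files)
```

A scheduler (not shown) would call `run_early` when the stored window opens and `run_daily` at the scheduled daily backup time.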

In one implementation, the file identification module is further configured to analyze the file metadata to identify one or more files that have been edited since a previous scheduled daily backup time.

In another implementation, the file identification module is further configured to receive the file metadata from an agent installed on the file system in response to a file editing event.

In yet another implementation, the file identification module is further configured to send a query to the file system at predetermined time intervals and receive the file metadata in response.

In one implementation, the file identification module is further configured to determine the suitability of each file for early backup based on file-native metadata, wherein the file-native metadata includes one or more of a file name, a file size, file user permissions, file group permissions, a creation time, a last access time, a last modification time, and a file type.

In another implementation, the file type indicates that the editing of the file is append only.

In one implementation, the file identification module is further configured to determine an expected last access time by analyzing file metadata collected over time.

In one implementation, the plurality of remaining files includes files that are not backed up within the early backup window.

In one implementation, the scheduled daily backup includes checking whether files backed up within the early backup time window have since been edited further.

In one implementation, the apparatus is part of a file system server, and initiating the backup includes sending the file to the backup server.

In another implementation, the apparatus is part of a backup server, and initiating the backup includes requesting the file from a file system server. The various implementations of the apparatus achieve all the advantages and effects of the corresponding implementations of the method.

It should be noted that all devices, elements, circuits, units and means described in the present application may be implemented in software or hardware elements or any type of combination thereof. All steps performed by the various entities described in this disclosure, as well as functions to be performed by the various entities described are intended to mean that the respective entities are adapted or configured to perform the respective steps and functions. Although in the following description of specific embodiments, specific functions or steps to be performed by external entities are not reflected in the description of specific detailed elements of the entity performing the specific steps or functions, it should be clear to a skilled person that these methods and functions may be implemented by corresponding hardware or software elements or any combination thereof. It should be understood that features of the present disclosure are susceptible to being combined in various combinations without departing from the scope of the present disclosure as defined by the appended claims.

Other aspects, advantages, features, and objects of the present disclosure will become apparent from the accompanying drawings and detailed description of illustrative implementations explained in conjunction with the following appended claims.

Drawings

The foregoing summary, as well as the following detailed description of illustrative embodiments, is better understood when read in conjunction with the appended drawings. For the purpose of illustrating the disclosure, there is shown in the drawings exemplary constructions of the disclosure. However, the present disclosure is not limited to the specific methods and instrumentalities disclosed herein. Moreover, those skilled in the art will appreciate that the drawings are not drawn to scale. Wherever possible, like elements are designated by like numerals.

Embodiments of the present disclosure will now be described, by way of example only, with reference to the following figures, in which:

FIG. 1 illustrates a flowchart of a computer-implemented method for backing up a file system according to an embodiment of the present disclosure;

FIGS. 2 through 6 illustrate exemplary implementation scenarios of a method for backing up a file system according to embodiments of the present disclosure;

FIG. 7 is a block diagram of an apparatus for controlling file system backup according to an embodiment of the present disclosure;

FIG. 8A illustrates a block diagram of a system for backing up a file system in accordance with an embodiment of the present disclosure;

FIG. 8B illustrates a block diagram of a system for backing up a file system according to another embodiment of the present disclosure;

FIG. 9A illustrates a block diagram of a system for controlling file system backup in accordance with an embodiment of the present disclosure;

FIG. 9B illustrates a block diagram of a system for controlling file system backups in accordance with another embodiment of the present disclosure.

In the drawings, the underlined numbers are used to denote items where the underlined numbers are located or items adjacent to the underlined numbers. The non-underlined numbers relate to items identified by lines associating the non-underlined numbers with the items. When a number is not underlined and has an associated arrow, the number without the underline is used to identify the general item to which the arrow points.

Detailed Description

The following detailed description describes embodiments of the present disclosure and the manner in which the embodiments may be implemented. While some modes of carrying out the disclosure have been disclosed, those skilled in the art will recognize that other embodiments for carrying out or practicing the disclosure may also exist.

Referring to FIG. 1, a flowchart of a computer-implemented method 100 for backing up a file system is shown, according to an embodiment of the present disclosure. As shown, method 100 includes steps 102, 104, 106, and 108.

In this disclosure, the term "file system" refers to a data structure (or process) for accessing, organizing and storing files (or data) on a computing device, specifically in a memory of the computing device. Examples of file systems include, but are not limited to, the File Allocation Table (FAT) file system, the New Technology File System (NTFS), the Hierarchical File System (HFS), the Unix File System (UFS), the Virtual Machine File System (VMFS) and the extended (EXT) file system. Furthermore, the term "file" refers to a resource used to store information (or data) in a computing system. It should be noted that the term "file" may be used interchangeably with the term "data" without limiting the scope of the present disclosure. Files may store images, text, video, executable programs and the like. Typically, a file system includes a plurality of files and resides on the primary storage of a computing system, such as a hard disk drive (HDD) of a computing device. In addition, the term "backup" or "data backup" refers to a retained copy of data or files that can be restored in the event of a failure of the primary storage. Typically, a primary storage failure results from a hardware or software failure, data corruption, or a human-caused event, such as a malicious attack (virus or malware) or accidental deletion of data. To guard against such failures, the files of the file system are backed up to secondary storage (e.g., a remote server), so that the file system can be accessed from the secondary storage when needed.

The term "primary storage" refers to directly accessible memory (volatile and/or non-volatile) associated with a computing system that processes real-time data (e.g., files in a file system on which one or more operations are performed). Primary storage may also be referred to as "main memory," "internal memory," or "main volume." Examples of primary storage include, but are not limited to, a hard disk or a storage array of a computing system. In general, the computing systems referred to herein as host computing devices include at least one of a storage array, a hard disk drive (HDD), a solid-state drive (SSD), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM), flash memory, cache memory and static random access memory (SRAM).

The term "secondary storage" refers to non-volatile memory that is accessed directly or indirectly by a computing system or processor. Secondary storage is a storage array used to store backup files and may be on-site or off-site relative to the computing system. In one embodiment, the secondary storage is implemented as remote storage. The term "remote storage" refers to off-site storage that is physically remote from the computing system. In one example, the remote storage may be a hard disk of a remote computing device, cloud-based storage, or a remote server. In another example, the primary storage and the secondary storage may be two different memory segments of the same hard disk (e.g., in an on-site scenario), where the primary storage typically processes real-time data and the secondary storage stores backup data. Examples of secondary storage include, but are not limited to, electronic, magnetic, optical, electromagnetic and semiconductor storage devices, portable computer diskettes, hard disks, memory sticks, and a set of servers (e.g., cloud servers).

Throughout this disclosure, the term "computing system" or "computing device" refers to hardware, software, firmware, or a combination thereof for performing at least one computing task based on input from a user. Examples of computing systems include, but are not limited to, computers and virtual machines (VMs). In general, a computing system or computing device includes computing elements such as memory, processors, data communication interfaces and network adapters for storing, processing and/or sharing files or information with other computing devices (e.g., another computing device or a server).

Throughout this disclosure, the term "processor" refers to a computing element operable to respond to and process instructions to perform data backup operations. In one example, the processor may be a microprocessor, a microcontroller, a complex instruction set computing (CISC) microprocessor, a reduced instruction set computing (RISC) microprocessor, a very long instruction word (VLIW) microprocessor, or any other type of processing circuit. It should be noted that the processor may operate alone or as part of a computing system. Here, the processor is configured to perform a plurality of operations to back up a file system according to the method 100.

Existing data utilization techniques optimize the amount of data to be backed up within the backup window, for example by performing host-side deduplication to reduce network utilization and bandwidth. In addition, at the block level, some techniques back up "cold" blocks (i.e., infrequently modified blocks) before the scheduled backup window. However, because this is performed at the block level, such schemes cannot use file-level attributes to determine which data changes should be backed up at a given point in time. Existing techniques also include data replication, i.e., copying data to a secondary location using continuous or snapshot-based techniques. Typically, continuous replication mirrors each input/output (I/O) operation to a remote server (e.g., a backup server) that maintains the latest state of the data, but it suffers from higher network bandwidth costs. Snapshot replication, in contrast, periodically transmits snapshots of the file system; because only the latest state of (or the updates to) the data is transmitted, duplicate and redundant transmissions are avoided and network bandwidth requirements are lower. Hybrid replication uses both continuous and snapshot-based replication, applied to particular time windows or data depending on the implementation. In one example, a hybrid replication technique protects data regions with a high probability of change using snapshots, and data regions with a lower probability of change using continuous replication. Finally, existing data protection techniques provide a limited number of backup points in time (i.e., recovery points), so when data is lost due to a disruptive event, it can only be recovered as of the last available recovery point.
It should be appreciated that, unlike existing data protection techniques that allow restoration to only a limited number of points in time, the method 100 allows a user to restore data to a desired or particular point in time (i.e., the point in time at which the last change, or the corresponding last operation, was made to the data). The method 100 also allows the user to perform backups more efficiently and quickly.

At step 102, the method 100 includes the processor receiving user input providing an early backup time window that precedes the scheduled daily backup time of the file system. The term "user input" refers to any information or data provided by a user for further processing. Typically, the user input includes text and numbers that indicate the user's requirements. In one example, the user provides input specifying an early backup time at which the user uses the computing device as little as possible or not at all. In another example, the user provides input specifying the frequency and/or duration of the early backup time window. The term "time window" refers to a time interval during which an operation or activity (such as a data backup) occurs. Throughout this disclosure, a time window may also be referred to as a "backup time window" or "backup window". Herein, the "early backup time window" refers to a user-defined time window within which data can be backed up early. The early data backup, which takes place in the early backup time window, occurs before the scheduled data backup performed at the scheduled daily backup time. The "scheduled daily backup time" refers to a predefined point in time at which a daily data backup is performed, and "daily data backup" refers to a backup of the file system as a whole at a specified time every day. Alternatively, the scheduled daily backup time is based on the user's activity settings, i.e., the scheduled daily backup is performed at the time of least activity. The user input includes information about the early backup time window for a data backup event, such as at least one of a start time of the time window, an end time of the time window, a duration of the time window, and the number and durations of one or more time windows.
Typically, the early backup time window is defined by user input received by the processor, which may define the time window in a linear or non-linear manner. In one example, a start time for the early backup time window is defined, such as 2 a.m. or 4 a.m. In another example, the duration of the early backup time window is defined, e.g., 30 seconds, 1 minute, 2 minutes, 3 minutes, 5 minutes, 10 minutes, 30 minutes, 1 hour or 2 hours. In an exemplary scenario, the scheduled daily backup time is set to time T1, e.g., 2 a.m., and the early backup time window is set to a time earlier than the scheduled daily backup time. It should be noted that the method 100 may use one or more early backup time windows. In one example, the time windows are defined in a linear, discrete manner, such as windows of equal duration at 30-minute intervals. In another example, they are defined in a non-linear manner, such as one early backup time window lasting 1 hour and another lasting 2 hours. It should be noted that the early backup time window is defined by the user, whereas the scheduled daily backup time may or may not be defined by the user.
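The user input for one or more early backup time windows can be represented as (start, duration) pairs and checked against the scheduled daily backup time T1. In this hypothetical Python sketch, the requirement that every window end no later than T1 follows the text; the additional rules that durations be positive and that windows not overlap are assumptions added for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass(frozen=True)
class EarlyWindow:
    """One user-defined early backup time window."""
    start: datetime
    duration: timedelta

    @property
    def end(self) -> datetime:
        return self.start + self.duration


def validate_windows(windows: List[EarlyWindow], scheduled: datetime) -> List[EarlyWindow]:
    """Check windows against the scheduled daily backup time T1 and return them
    sorted by start time. Each window must have a positive duration, end no later
    than T1, and (assumed here) not overlap another window."""
    ordered = sorted(windows, key=lambda w: w.start)
    for w in ordered:
        if w.duration <= timedelta(0):
            raise ValueError(f"window at {w.start} has no duration")
        if w.end > scheduled:
            raise ValueError(f"window at {w.start} ends after the scheduled backup")
    for a, b in zip(ordered, ordered[1:]):
        if b.start < a.end:
            raise ValueError(f"windows at {a.start} and {b.start} overlap")
    return ordered
```

With T1 at 2 a.m., a non-linear pair such as a 1-hour window at 22:00 and a 2-hour window at 23:30 is accepted, whereas a 1-hour window starting at 1:30 a.m. is rejected because it would run past T1.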

At step 104, the method 100 includes the processor identifying one or more files in the file system that are suitable for early backup. Typically, the processor determines whether the state of a file in the file system has changed and identifies files suitable for early backup on that basis. The file system includes a plurality of files that may or may not change before the scheduled backup time. Typically, the early backup covers the files of the file system that are less likely to change. Backing up such files early reduces the amount of data to be backed up at the scheduled daily backup time and helps the backup operation run smoothly. To identify files suitable for early backup, the processor determines, for each file, the probability that the file changes before the scheduled daily backup time. A machine learning algorithm may be used to determine this probability, taking into account parameters such as, but not limited to, the file's past changes or history, the file type, and scheduled tasks. A file is determined to be suitable for early backup if its probability of change is below a predetermined threshold, for example if the probability of a change is 0.25 and the threshold is 0.4. In another example, if the probability of a change is zero, the file is determined to be suitable for early backup.
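The disclosure leaves the probability model open (it mentions machine learning). As a simple stand-in, the Python sketch below estimates each file's probability of change as the fraction of past days on which it changed between the early backup window and the scheduled daily backup, then applies the threshold rule. The history format and function names are hypothetical; a file with no history is conservatively treated as likely to change.

```python
from typing import Dict, List, Sequence


def change_probability(changed_late: Sequence[bool]) -> float:
    """Fraction of past days on which the file changed after the early backup
    window but before the scheduled daily backup time."""
    if not changed_late:
        return 1.0  # no history: conservatively assume the file will change
    return sum(changed_late) / len(changed_late)


def select_for_early_backup(history: Dict[str, Sequence[bool]],
                            threshold: float) -> List[str]:
    """Files whose estimated probability of change is below `threshold`."""
    return sorted(path for path, days in history.items()
                  if change_probability(days) < threshold)
```

A frequency estimate like this ignores parameters such as file type or scheduled tasks; a learned model could replace `change_probability` without changing the threshold logic.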

In one embodiment, the identifying in the method 100 includes analyzing file metadata to identify one or more files that have been edited since a previous scheduled daily backup time. The processor is configured to analyze the file metadata to identify the files that have been altered or edited since the previous scheduled daily backup time. The term "file metadata" refers to information stored about any type of file in a file system. Typically, file metadata includes, but is not limited to, the author name, creation date, modification date, file size, company or organization name, identification of the computing system, identification of the network server or storage drive (primary or secondary storage) storing the file (e.g., an Internet Protocol (IP) address or media access control (MAC) address), personalized comments, and the names of previous authors together with revisions or versions and their times. The "previous scheduled daily backup time" refers to the time of the most recent scheduled daily backup; e.g., if the scheduled daily backup time is 2 a.m., then any file in the file system that was changed after 2 a.m. on the previous day is a candidate for backup. In one example, the processor analyzes the metadata by comparing snapshots taken at two or more different times; in particular, one snapshot is taken at the previous scheduled daily backup time and the other at the current time. In other words, the processor identifies changes in one or more files by comparing the metadata of two versions of each file, i.e., the version in the snapshot taken at the previous scheduled daily backup time and the version in the snapshot taken at the current time. Optionally, the processor is configured to identify any change made to a file after the previous scheduled daily backup time, such as an addition, a deletion, or any update operation.
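Comparing the metadata captured at the previous scheduled daily backup with the current metadata amounts to a snapshot diff. A minimal sketch, assuming each snapshot is a mapping from file path to a `(modification time, size)` pair; real file metadata carries many more fields, and the names used here are illustrative.

```python
from typing import Dict, List, Tuple

Meta = Tuple[float, int]  # assumed minimal record: (modification time, size in bytes)


def diff_snapshots(previous: Dict[str, Meta], current: Dict[str, Meta]) -> Dict[str, List[str]]:
    """Classify files as added, deleted, or modified between two metadata snapshots."""
    added = sorted(current.keys() - previous.keys())
    deleted = sorted(previous.keys() - current.keys())
    modified = sorted(p for p in current.keys() & previous.keys()
                      if current[p] != previous[p])
    return {"added": added, "deleted": deleted, "modified": modified}
```

Files in the `added` and `modified` lists have been edited since the previous scheduled daily backup time and are the candidates considered further; `deleted` entries need no data copy.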

In another embodiment, the method 100 further includes the processor receiving file metadata, in response to a file editing event, from an agent installed on the file system. An "agent" refers to a small program installed on a server (e.g., a backup server) or file system to perform a particular job or operation. Typically, each agent supports a particular function, so the method 100 may use multiple agents installed on the file system to perform one or more operations. Here, the agent installed on the file system transmits metadata about a file that may have changed, in response to a file editing event. File editing events include, but are not limited to, opening a file, closing a file, creating a file, deleting a file or a portion of a file, renaming a file, writing or appending information to a file, flushing a file, reading from a file, moving a file, and the like. In operation, the agent may be a filter driver that captures all input/output (I/O) operations on a file and reports them to the processor. This enables continuous protection of files in the file system at a finer granularity, because the agent can also track changes to a portion of a file; however, the agent must handle cases in which a file is in an inconsistent state, for example by discarding such inconsistent files.
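The agent's role can be modelled in user space as a producer that turns file editing events into metadata records for the processor. This is only an illustration: an agent as described would be a filter driver in the I/O path, and `FileAgent`, `MetadataRecord`, and the choice of which events count as modifying are assumptions.

```python
import queue
import time
from dataclasses import dataclass
from typing import Callable, List, Optional

# File editing events named in the description ("refreshing" rendered as flush).
EDIT_EVENTS = {"open", "close", "create", "delete", "rename", "write",
               "flush", "read", "move"}
# Assumption: only these events can change a file's content or identity.
MODIFYING_EVENTS = {"create", "delete", "rename", "write", "move"}


@dataclass
class MetadataRecord:
    path: str
    event: str
    timestamp: float
    offset: Optional[int] = None  # portion of the file touched, if the agent knows it
    length: Optional[int] = None


class FileAgent:
    """Hypothetical user-space stand-in for the agent installed on the file system."""

    def __init__(self, outbox: queue.Queue, clock: Callable[[], float] = time.time):
        self.outbox = outbox
        self.clock = clock

    def on_event(self, event: str, path: str,
                 offset: Optional[int] = None, length: Optional[int] = None) -> None:
        if event not in EDIT_EVENTS:
            raise ValueError(f"unknown file editing event: {event!r}")
        self.outbox.put(MetadataRecord(path, event, self.clock(), offset, length))


def drain(outbox: queue.Queue) -> List[MetadataRecord]:
    """Processor side: collect every pending record without blocking."""
    records = []
    while True:
        try:
            records.append(outbox.get_nowait())
        except queue.Empty:
            return records


def changed_paths(records: List[MetadataRecord]) -> List[str]:
    """Files touched by at least one modifying event."""
    return sorted({r.path for r in records if r.event in MODIFYING_EVENTS})
```

The queue decouples the agent from the processor, so bursts of I/O events do not block the file system while the processor is busy evaluating suitability.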

In yet another embodiment, the method 100 further includes the processor sending a query to the file system at predetermined time intervals and receiving the file metadata described above in response. A "predetermined time interval" refers to a predefined interval at which the processor communicates with the file system. For example, the file system may be an online analytical processing (OLAP) database or data source, which is optimized for business intelligence queries and reports rather than for processing transactions.
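Polling at a predetermined interval can be sketched as follows. The `query` callable is a hypothetical placeholder for however the file system answers a metadata query (here it returns a path-to-modification-time mapping), and `sleep` is injectable so the loop can be exercised without actually waiting.

```python
import time
from typing import Callable, Dict, List

Snapshot = Dict[str, float]  # assumed response format: path -> modification time


def poll_metadata(query: Callable[[], Snapshot], interval_s: float, rounds: int,
                  sleep: Callable[[float], None] = time.sleep) -> List[Snapshot]:
    """Send `rounds` queries, waiting `interval_s` seconds between consecutive ones."""
    responses = []
    for i in range(rounds):
        responses.append(query())
        if i < rounds - 1:
            sleep(interval_s)
    return responses


def changed_between(responses: List[Snapshot]) -> List[str]:
    """Paths that appeared, or whose modification time changed, between consecutive responses."""
    changed = set()
    for prev, cur in zip(responses, responses[1:]):
        changed |= {p for p, mtime in cur.items() if prev.get(p) != mtime}
    return sorted(changed)
```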

In one embodiment, the identifying in the method 100 includes determining the suitability of each file based on file native metadata comprising one or more of a file name, file size, file user permissions, file group permissions, creation time, last access time, last modification time, and file type. The processor is configured to analyze each file in the file system to determine its suitability. For example, a change or update in a file indicates that the file needs to be backed up. The term "file native metadata" refers to any object, such as a model, package, or query, that is based on a data source, such as an OLAP data source. The processor identifies changes based at least on the file native metadata and determines the suitability of the file accordingly. For example, a file is determined to be suitable for backup when it has changed after the previous backup window (if any). Alternatively, the processor identifies files suitable for backup by analyzing log files to infer a particular pattern of changes; for example, a file may only ever be appended to, never overwritten or updated, so that changes to portions of the file can be backed up as soon as the processor identifies them. Further, optionally, the processor uses prior knowledge of the past behavior of the application that generates a file to determine whether the file is suitable for backup. For example, if a file belongs to application X and application X only appends to the file, or the file is unlikely to be changed after a certain time, the processor may determine that the file is suitable for backup.

Further, optionally, the processor uses a machine learning algorithm to determine files suitable for backup: records of previously performed backups and snapshots are stored, accumulated, and analyzed to infer the time periods during which each file is or is not likely to change. Furthermore, the analysis may be performed at a finer granularity, i.e., to identify which portions of a file change at certain times. For example, a first portion of a file may typically change in the morning, e.g., at 10 a.m., while a second portion of the file may change in the evening, e.g., at 7 p.m. Optionally, the processor is further supplemented with additional computational methods, such as neural networks and layer sequence clustering of pseudo-simulated variable state machines, implementing machine learning and artificial intelligence models and algorithms.
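On a POSIX-style system, most of the file native metadata listed above can be read with Python's standard `os.stat`. A minimal sketch; the `NativeMetadata` record and the rule in `needs_backup` (modified after the previous backup) are illustrative. Note that `st_ctime` is the inode change time on Unix, so "creation time" is approximate unless the platform exposes `st_birthtime`.

```python
import os
import stat
from dataclasses import dataclass


@dataclass(frozen=True)
class NativeMetadata:
    name: str
    size: int
    user_perms: int    # owner rwx bits, 0-7
    group_perms: int   # group rwx bits, 0-7
    created: float     # birth time where available, otherwise st_ctime
    accessed: float
    modified: float
    file_type: str


def read_native_metadata(path: str) -> NativeMetadata:
    st = os.stat(path)
    mode = st.st_mode
    if stat.S_ISREG(mode):
        file_type = "regular"
    elif stat.S_ISDIR(mode):
        file_type = "directory"
    else:
        file_type = "other"
    created = st.st_birthtime if hasattr(st, "st_birthtime") else st.st_ctime
    return NativeMetadata(
        name=os.path.basename(os.path.abspath(path)),
        size=st.st_size,
        user_perms=(mode & stat.S_IRWXU) >> 6,
        group_perms=(mode & stat.S_IRWXG) >> 3,
        created=created,
        accessed=st.st_atime,
        modified=st.st_mtime,
        file_type=file_type,
    )


def needs_backup(meta: NativeMetadata, previous_backup_time: float) -> bool:
    """Candidate for backup if modified after the previous backup (POSIX timestamp)."""
    return meta.modified > previous_backup_time
```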

In another embodiment, the file type indicates that edits to the file are append-only. The processor is configured to determine the suitability of a file based at least on the file type, which indicates the type of edit or change made to the file. For example, changes or edits to the file are append-only: data is only ever added to the file, and updating or deleting existing content is not permitted. In general, each file is analyzed separately, and the files are grouped accordingly to achieve efficient operation. In one example, the files determined to be suitable for early backup are grouped into one group, while the remaining files are grouped into another group to be backed up in a later time window or at a later point in time.
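Grouping by an append-only file type, and copying only the newly appended bytes of such a file, can be sketched as follows. Treating certain file extensions as append-only is a hypothetical policy chosen for illustration; in practice the type would come from the file native metadata or from knowledge of the generating application.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical policy: file extensions whose edits are append-only.
APPEND_ONLY_SUFFIXES: Set[str] = {".log", ".journal"}


def is_append_only(path: str, suffixes: Set[str] = APPEND_ONLY_SUFFIXES) -> bool:
    return any(path.endswith(s) for s in suffixes)


def group_files(paths: Iterable[str],
                suffixes: Set[str] = APPEND_ONLY_SUFFIXES) -> Tuple[List[str], List[str]]:
    """Split files into (early backup group, remaining group)."""
    early: List[str] = []
    remaining: List[str] = []
    for p in paths:
        (early if is_append_only(p, suffixes) else remaining).append(p)
    return early, remaining


def appended_range(backed_up_size: int, current_size: int) -> Tuple[int, int]:
    """For an append-only file, only bytes [backed_up_size, current_size) are new."""
    if current_size < backed_up_size:
        raise ValueError("file shrank, so it is not append-only; copy it in full")
    return backed_up_size, current_size
```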

In one embodiment, in the method 100, the identifying includes determining an expected last access time by analyzing file metadata collected over time. The processor is configured to determine an expected last access time for each file in the file system, i.e., the time after which the file is not expected to be accessed, and hence changed, again before the scheduled daily backup. The processor then uses the expected last access time to determine whether the file is suitable for early backup. For example, if the expected last access time precedes the time at which the file would be backed up within the user-defined early backup time window (and hence precedes the scheduled daily backup time), the file may be backed up in advance within that window.
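One way to turn access history into an expected last access time is sketched below. It assumes a hypothetical history format (for each past day, the file's last access time expressed as an offset from that day's scheduled daily backup time) and takes the latest offset plus a safety margin as a conservative estimate; the 5-minute margin is an arbitrary illustration.

```python
from datetime import timedelta
from typing import List


def expected_last_access(offsets: List[timedelta]) -> timedelta:
    """Latest past daily last-access offset (negative = before the scheduled backup)."""
    if not offsets:
        raise ValueError("no access history for this file")
    return max(offsets)


def suitable_by_last_access(offsets: List[timedelta], backup_offset: timedelta,
                            margin: timedelta = timedelta(minutes=5)) -> bool:
    """True if the file is expected to be idle, with a margin, by the time
    (`backup_offset`, relative to the scheduled backup) it would be backed up early."""
    return expected_last_access(offsets) + margin <= backup_offset
```

For a 2 a.m. scheduled backup, `backup_offset=timedelta(hours=-2)` corresponds to backing the file up at midnight.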

At step 106, the method 100 includes the processor initiating, within the early backup time window, an early backup of the one or more files determined to be suitable for early backup. An "early backup" refers to a backup operation performed on one or more files within the early backup time window, before the scheduled daily backup time. In particular, copies or snapshots of the one or more files are transmitted to, and stored at, a remote location or secondary storage (e.g., backup storage) as a backup.
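Step 106 might look like the loop below: copy each selected file while the early backup time window is open and leave the rest for the scheduled daily backup. This is a sketch under assumptions: the backup target is a directory, files are stored under their base names (so names are assumed unique), and `now` and `copy` are injectable so a remote transfer or a test double can replace `shutil.copy2`.

```python
import os
import shutil
from datetime import datetime
from typing import Callable, Iterable, List


def run_early_backup(files: Iterable[str], backup_dir: str,
                     window_start: datetime, window_end: datetime,
                     now: Callable[[], datetime] = datetime.now,
                     copy: Callable[[str, str], object] = shutil.copy2) -> List[str]:
    """Back up files in order while window_start <= now() < window_end.
    Returns the files actually backed up; the rest are left for the scheduled backup."""
    done: List[str] = []
    for path in files:
        if not (window_start <= now() < window_end):
            break  # window not open or already closed: stop early backups
        copy(path, os.path.join(backup_dir, os.path.basename(path)))
        done.append(path)
    return done
```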

In one embodiment, the scheduled daily backup includes the processor checking whether files backed up within the early backup time window have been edited further. The processor is configured to check whether any file backed up in the early backup time window has been changed or updated before the scheduled daily backup time. Typically, the processor compares a snapshot of a file backed up within the early backup time window with a current snapshot of the file. If the comparison shows that the file has been changed or edited, the processor backs up the edited file and, optionally, replaces the early backup of the file.
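The check can be sketched as a comparison of fingerprints recorded at early backup time against current ones. The `(modification time, size)` fingerprint is an assumption (a content hash would be stricter), and files deleted since their early backup are skipped here on the assumption that the scheduled daily backup records deletions.

```python
from typing import Callable, Dict, List, Tuple

Fingerprint = Tuple[float, int]  # assumed: (modification time, size)


def recheck_early_backups(early: Dict[str, Fingerprint],
                          current: Dict[str, Fingerprint],
                          backup: Callable[[str], None]) -> List[str]:
    """Back up again every early-backed-up file edited since its early backup.
    Returns the re-backed-up paths; the caller decides whether the new copy
    replaces the early one or is kept as an additional version."""
    edited = sorted(p for p, fp in early.items() if p in current and current[p] != fp)
    for p in edited:
        backup(p)
    return edited
```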

At step 108, the method 100 includes the processor initiating, at the scheduled daily backup time, a scheduled daily backup of the plurality of remaining files in the file system. In other words, at step 108, after the early backup of one or more files in the file system has been initiated, the processor initiates a scheduled daily backup of the remaining files. Typically, the scheduled daily backup is performed at a fixed point in time, and all remaining files are stored at a remote location. Advantageously, because the early backup is performed in the early backup time window, the amount of data to be backed up at the scheduled daily backup time is reduced, which shortens the time the scheduled backup needs to complete. Further, optionally, if a particular version of a file is needed at the scheduled daily backup time, the early version of the file captured in the early backup time window is available for use.
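Step 108 then backs up only the remaining files. A minimal sketch, assuming file sizes are known and `backup` is a placeholder for the actual transfer; `bytes_saved` simply shows how much data the early backup removed from the scheduled run.

```python
from typing import Callable, Dict, Iterable, List


def scheduled_daily_backup(sizes: Dict[str, int], early_backed_up: Iterable[str],
                           backup: Callable[[str], None]) -> List[str]:
    """Back up every file in the file system that was not backed up early."""
    early = set(early_backed_up)
    remaining = sorted(p for p in sizes if p not in early)
    for p in remaining:
        backup(p)
    return remaining


def bytes_saved(sizes: Dict[str, int], early_backed_up: Iterable[str]) -> int:
    """Bytes that no longer need to be transferred at the scheduled time."""
    return sum(sizes[p] for p in set(early_backed_up) if p in sizes)
```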

In another embodiment, the plurality of remaining files includes files that were not backed up within the early backup time window. In other words, the plurality of remaining files refers to the files in the file system that were not backed up in the early backup time window. In one example, the plurality of remaining files includes files that the processor determined to be unsuitable for early backup because they may be changed or updated at a later point in time.

The present disclosure also provides a computer program product comprising a non-transitory computer-readable storage medium storing instructions or computer program code that are executable by a processor to perform the method 100. In general, the instructions are executed by a computing device that performs the data backup operations. Examples of implementations of the non-transitory computer-readable storage medium include, but are not limited to, Electrically Erasable Programmable Read-Only Memory (EEPROM), Random Access Memory (RAM), Read-Only Memory (ROM), a Hard Disk Drive (HDD), flash memory, a Secure Digital (SD) card, a Solid-State Drive (SSD), a computer-readable storage medium, and/or a CPU cache. Computer-readable storage media for providing non-transitory memory include, but are not limited to, electronic storage devices, magnetic storage devices, optical storage devices, electromagnetic storage devices, semiconductor storage devices, or any suitable combination of the foregoing.

Referring to FIGS. 2-6, exemplary implementation scenarios of the computer-implemented method 100 for backing up a file system 202 (similar to the file system described above) are shown, according to an embodiment of the present disclosure. In general, the method 100 involves backing up one or more files in the file system 202 to the backup server 204 within an early backup time window provided by user input. FIGS. 2-5 show snapshots taken before the scheduled daily backup time, from time T0 to time T8. The term "snapshot" refers to a set of reference markers for the data or files in a computing system or file system 202 at a particular point in time. Typically, a snapshot includes an inventory that gives users an accessible copy of the files, which they can access at a later point in time. Advantageously, snapshots are used to detect changes in the files of the file system 202 and to store copies of the files when certain criteria are met, such as, but not limited to, a threshold criterion on the probability that a file changes. FIGS. 2-6 show snapshots of the file system 202 and the backup server 204 at different points in time. The "backup server" 204 refers to a type of server used to facilitate the backup of data, files, applications, and/or databases. In general, the backup server 204 includes the hardware and software capabilities to manage backups and to restore files from them. The backup server 204 may be a local server or a remote backup server installed at a remote location, such as a cloud server.

Referring to FIG. 2, a first snapshot 210 of the file system 202 and the backup server 204 during the early backup time window T is shown, according to an embodiment of the present disclosure. In general, the file system 202 includes one or more files that are backed up during the early backup time window T. As shown in FIG. 2, the first snapshot 210 is taken at time T0, when the file system 202 includes three files, namely a first file X1, a second file X2, and a third file X3. It should be noted that at the beginning of the period no file has yet been backed up: the backup server 204 is empty at time T0, so the first snapshot 210 depicts only the files in which changes occurred. The early backup time window T defined by the user input may include a number of smaller time windows, such as T0, T1, T2, T3, and so on up to T8. In general, the time of a time window refers to its start time; e.g., time T4 refers to the start time of time window T4. However, depending on the implementation, time T4 may instead refer to the end time of time window T4.

Referring now to FIG. 3, a second snapshot 310 of the file system 202 and the backup server 204 during the early backup time window T is shown, in accordance with an embodiment of the present disclosure. As shown in FIG. 3, the second snapshot 310 is taken at time T0, and the backup server 204 includes a copy (or backup) of the first file X1. In general, the backup server 204 is operable to determine the suitability of the remaining files in the file system 202 based on attributes of the files, or on a machine learning algorithm or model that accounts for previous changes in the files of the file system 202. Specifically, the backup server 204 determines whether the second file X2 and the third file X3 in the file system 202 are likely to change again before the early backup time window ends at T8, or before the scheduled daily backup time T10. In this example, the backup server 204 determines that the first file X1 is unlikely to change before time T8 and is therefore suitable for early backup, whereas the second file X2 and the third file X3 may change before the early backup time window ends at T8. Accordingly, at time T0 the backup server 204 performs a first partial backup 320 of only the first file X1.

Referring now to FIG. 4, a third snapshot 410 of the file system 202 and the backup server 204 during the early backup time window T is shown, in accordance with an embodiment of the present disclosure. As shown in FIG. 4, the third snapshot 410 is taken at time T5, when the backup server 204 includes the copy (or backup) of the first file X1. Here, the backup server 204 continues to take a snapshot after each time window up to time window T5, and at time T5 it determines that the second file X2 and the third file X3 have changed to X2' and X3', respectively. The backup server 204 also determines that a fourth file X4 has changed at some point during the early backup time window T (i.e., between T0 and T5). The backup server 204 then determines the suitability of the remaining files in the file system at time T5. In this example, the backup server 204 determines that files X2' and X4 are unlikely to change again before time T8 and are therefore ready for early backup at time T5, so it performs a second partial backup 420 of the changed files X2' and X4 at time T5.

Referring now to FIG. 5, a fourth snapshot 510 of the file system 202 and the backup server 204 during the early backup time window T is shown, in accordance with an embodiment of the present disclosure. As shown in FIG. 5, the fourth snapshot 510 is taken at time T8, when the backup server 204 includes copies (or backups) of the first file X1, backed up by the first partial backup 320, and of the second file X2' and the fourth file X4, backed up by the second partial backup 420. Here, the backup server 204 continues to take a snapshot after each time window up to time window T8, and at time T8 it detects any changes in the files of the file system 202. In this example, the backup server 204 determines that the third file X3' has been changed again, to X3", and that the fourth file X4 has been changed to X4'. It should be noted that the earlier determination that the fourth file X4 was unlikely to change proved wrong, because a change was detected and X4 became X4'. In addition, the backup server 204 determines that a fifth file X5 has also changed during the early backup time window T. Since time window T8 is the last time window of the early backup time window T, the backup server 204 performs a third partial backup 520 of the remaining changed files, i.e., X3", X4', and X5, regardless of whether these files may change again. Advantageously, the backup server 204 therefore backs up fewer files at time T8 than a conventional backup process would, which improves backup efficiency: in the third partial backup 520, the backup server 204 needs to back up only files X3", X4', and X5, whereas otherwise it might need to back up files X1, X2', X3", X4', and X5 during time window T8.

Referring now to FIG. 6, a fifth snapshot 610 of the backup server 204 is shown, in accordance with an embodiment of the present disclosure. As shown in FIG. 6, the fifth snapshot 610 is taken at time T8, when the backup server 204 includes copies (or backups) of the first file X1, backed up by the first partial backup 320; the second file X2' and the fourth file X4, backed up by the second partial backup 420; and the third file X3", the updated fourth file X4', and the fifth file X5, backed up by the third partial backup 520. The backup server 204 is configured to combine the partial backups, i.e., the first partial backup 320, the second partial backup 420, and the third partial backup 520, to form a full backup 610 of the file system 202. Optionally, the backup server 204 is configured to discard the early version of a backed-up file whose suitability determination proved wrong, such as the fourth file X4, which is superseded by X4'.
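Combining the partial backups of FIGS. 3-5 into the full backup 610 can be sketched as an ordered merge in which later partial backups supersede earlier ones. Representing each partial backup as a mapping from file name to captured version is an assumption made for illustration; the data below reproduces the example scenario.

```python
from typing import Dict, List, Tuple


def merge_partial_backups(partials: List[Dict[str, str]]) -> Tuple[Dict[str, str], List[str]]:
    """Merge partial backups given in chronological order.
    Returns (full backup, superseded early versions that may be discarded)."""
    full: Dict[str, str] = {}
    discarded: List[str] = []
    for partial in partials:
        for name, version in partial.items():
            if name in full and full[name] != version:
                discarded.append(full[name])  # e.g. X4 backed up early, later changed
            full[name] = version
    return full, discarded


# The scenario of FIGS. 3-5: partial backups 320, 420 and 520.
backup_320 = {"X1": "X1"}
backup_420 = {"X2": "X2'", "X4": "X4"}
backup_520 = {"X3": 'X3"', "X4": "X4'", "X5": "X5"}
full_610, superseded = merge_partial_backups([backup_320, backup_420, backup_520])
```

Only three files are transferred at T8 (X3", X4', and X5) although the merged result covers five, and the early copy of X4 is the one version flagged for discarding.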

Snapshot techniques allow a user to back up the data of a file and to access the file at any point in time. Optionally, the data backup includes continuous backup, which automatically saves a copy (or backup) of the data or file upon each change made to it. Thus, substantially every version of the data that a user of the computing system saves or retains can be captured. Optionally, a file system tracker is configured to continuously record the metadata operations performed on files in a continuous directory. In this case, the continuous directory maintains up-to-date information about the state of any file at any point in time, which enables the backup server 204 to restore a file to a particular point in time by searching the continuous directory for the state of the file at that point in time. The snapshot technique of the method 100 is robust, can be implemented reliably in practice, and requires only minimal computing resources to parse the file system 202 for changes in its files.

By contrast, existing data protection techniques have some limitations. First, they provide only a limited number of points in time for data backups of any data (e.g., files). For example, in snapshot-based data protection techniques, snapshots (i.e., backup data images) are temporary, i.e., they eventually need to be deleted, and because they occupy considerable space they are generated only periodically, i.e., at intervals. As a result, only a limited number of points in time are available for creating a backup of the data of a file. Furthermore, generating such snapshots is expensive, and their generation and deletion consume significant computing resources. When the interval between two consecutive snapshots is long, for example, from 15 minutes to several hours, snapshot-based data protection techniques require an excessively long backup time. This also results in a fairly large recovery point objective (RPO). In addition, in snapshot-based data protection techniques, when a snapshot is loaded onto a storage array to be read by the backup server 204, it reduces the bandwidth that the array provides for the production workload.
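The continuous directory, which lets the backup server 204 restore a file as of a given point in time, can be sketched as a per-file, time-ordered log searched by binary search. The class name, the string "state" values, and the use of `None` to mark a deletion are assumptions for illustration only.

```python
import bisect
from collections import defaultdict
from typing import DefaultDict, List, Optional


class ContinuousDirectory:
    """Hypothetical continuous directory: per file, a time-ordered log of states."""

    def __init__(self) -> None:
        self._times: DefaultDict[str, List[float]] = defaultdict(list)
        self._states: DefaultDict[str, List[Optional[str]]] = defaultdict(list)

    def record(self, path: str, timestamp: float, state: Optional[str]) -> None:
        """Append a metadata operation for `path`; `None` marks a deletion."""
        times = self._times[path]
        if times and timestamp < times[-1]:
            raise ValueError("records for a file must arrive in time order")
        times.append(timestamp)
        self._states[path].append(state)

    def state_at(self, path: str, t: float) -> Optional[str]:
        """State of `path` at time t: the latest record at or before t, else None."""
        times = self._times.get(path, [])
        i = bisect.bisect_right(times, t)
        return self._states[path][i - 1] if i else None
```

Because each lookup is a binary search over one file's log, restoring a file to an arbitrary point in time costs O(log n) in the number of recorded operations for that file.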

FIG. 7 illustrates a block diagram of an apparatus 700 for controlling the backup of a file system 710, according to an embodiment of the disclosure. As shown, the apparatus 700 includes an interface 702, a file identification module 704, and a file backup module 706 for controlling the backup of the file system 710. The apparatus 700 of FIG. 7 should be read in conjunction with FIGS. 1-6. In general, the apparatus 700 is operable to perform the method 100 or 200 for controlling the backup of the file system 710. The apparatus 700 may be operatively coupled to other components, such as a processor, memory, and a backup server (as explained with reference to FIGS. 2-6), to enable the backup of the file system 710. The term "apparatus" refers to hardware, software, firmware, or a combination thereof for performing at least one computing task based on input from a user. Examples of the apparatus 700 include, but are not limited to, a computer and a virtual machine (VM). In general, the apparatus 700 includes computing elements such as a memory, a processor, a data communication interface, a network adapter, etc., to store, process, and/or share files or information with other apparatuses, such as another computing device or server.

The apparatus 700 includes the interface 702 for receiving user input providing an early backup time window that precedes the scheduled daily backup time of the file system 710. The term "interface" refers to a shared boundary across which two or more independent components of a computing system or device (e.g., the apparatus 700) exchange information. The exchange may take place between software, computer hardware, peripheral devices, humans, and combinations of these. The interface 702 is configured to receive the user input associated with the early backup time window. For example, the interface 702 may be at least one of a command line interface (CLI), a graphical user interface (GUI), a menu-driven interface (MDI), a form-based interface (FBI), or a natural language interface (NLI). It should be appreciated that the apparatus 700 may employ any type of interface depending on the implementation, without limiting the scope of the present disclosure.

The apparatus 700 further includes the file identification module 704 for identifying one or more files in the file system 710 that are suitable for early backup. The "file identification module" 704 refers to any hardware, software, firmware, or combination thereof for performing at least one computing task, depending on the implementation. In operation, the file identification module 704 analyzes each file in the file system 710 to identify the one or more files suitable for early backup. In general, the file system 710 includes a plurality of files that may or may not change before the scheduled daily backup time. Typically, the early backup is performed for files of the file system 710 that are determined to be suitable, i.e., files that are unlikely to change. To this end, the file identification module 704 determines, for each file, the probability that the file changes before the scheduled daily backup time. The file identification module 704 may determine this probability using a machine learning algorithm or model that takes into account a number of parameters, such as, but not limited to, the past changes or history of the file, the type of file, user input, scheduled tasks, and the like. Optionally, the file identification module 704 is also supplemented with additional computational methods, such as neural networks and layer sequence clustering of pseudo-simulated variable state machines, implementing machine learning and artificial intelligence models and algorithms.

The apparatus 700 further includes the file backup module 706 for initiating, within the early backup time window, an early backup of the one or more determined files, i.e., the files determined to be suitable for early backup. An "early backup" refers to a backup operation performed on one or more files within the early backup time window, before the scheduled daily backup time. In particular, copies or snapshots of the one or more files are transmitted to, and stored at, a remote location, such as a remote server or secondary storage (e.g., backup storage), as backups. The "file backup module" 706 refers to any hardware, software, firmware, or combination thereof for performing at least one computing task, depending on the implementation.

The file backup module 706 is also configured to initiate, at the scheduled daily backup time, a scheduled daily backup of the plurality of files remaining in the file system 710, i.e., after the early backup of one or more files in the file system 710 has been initiated.
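The division of the apparatus 700 into an interface, a file identification module, and a file backup module can be sketched as small collaborating classes. This is an illustrative structure only: the class names, the injected `change_probability` and `backup` callables, and the simple threshold rule are hypothetical, and a real apparatus would trigger the two backups at the window and scheduled times rather than back to back as here.

```python
from datetime import datetime
from typing import Callable, List, Tuple


class Interface:
    """Stands in for interface 702: accepts and checks the user's early window."""

    def read_window(self, start: datetime, end: datetime,
                    scheduled: datetime) -> Tuple[datetime, datetime]:
        if not (start < end <= scheduled):
            raise ValueError("the early window must end by the scheduled daily backup")
        return start, end


class FileIdentificationModule:
    """Stands in for module 704: selects files unlikely to change."""

    def __init__(self, change_probability: Callable[[str], float], threshold: float):
        self.change_probability = change_probability
        self.threshold = threshold

    def identify(self, files: List[str]) -> List[str]:
        return [f for f in files if self.change_probability(f) < self.threshold]


class FileBackupModule:
    """Stands in for module 706: runs the early and the scheduled backups."""

    def __init__(self, backup: Callable[[str], None]):
        self.backup = backup
        self.early_done: List[str] = []

    def early_backup(self, files: List[str]) -> None:
        for f in files:
            self.backup(f)
            self.early_done.append(f)

    def scheduled_backup(self, all_files: List[str]) -> List[str]:
        remaining = [f for f in all_files if f not in self.early_done]
        for f in remaining:
            self.backup(f)
        return remaining


class BackupApparatus:
    def __init__(self, interface: Interface, identifier: FileIdentificationModule,
                 backer: FileBackupModule):
        self.interface = interface
        self.identifier = identifier
        self.backer = backer

    def run(self, files: List[str], start: datetime, end: datetime,
            scheduled: datetime) -> Tuple[List[str], List[str]]:
        self.interface.read_window(start, end, scheduled)
        early = self.identifier.identify(files)
        self.backer.early_backup(early)                  # would run inside [start, end)
        remaining = self.backer.scheduled_backup(files)  # would run at `scheduled`
        return early, remaining
```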

In one embodiment, the file identification module 704 is also configured to analyze file metadata to identify one or more files that have been altered or edited since the previous scheduled daily backup time. The file identification module 704 analyzes the metadata by comparing snapshots taken at two or more different times; in particular, one snapshot is taken at the previous scheduled daily backup time and the other at the current time. In other words, the file identification module 704 identifies changes in one or more files by comparing the metadata of two versions of each file, i.e., the version in the snapshot taken at the previous scheduled daily backup time and the version in the snapshot taken at the current time. The file identification module 704 thereby identifies any change, such as an addition, a deletion, or any update operation, made to a file after the previous scheduled daily backup time.

In another embodiment, the file identification module 704 is further configured to receive file metadata from an agent installed on the file system 710 in response to a file editing event. In operation, the agent installed on the file system 710 transmits metadata about a file that may have changed, in response to a file editing event. File editing events include, but are not limited to, opening a file, closing a file, creating a file, deleting a file or a portion of a file, renaming a file, writing or appending information to a file, flushing a file, reading from a file, moving a file, and the like.

In yet another embodiment, the file identification module 704 is further configured to send queries to the file system 710 at predetermined time intervals and to receive the file metadata described above in response.

In one embodiment, the file identification module 704 is further configured to determine the suitability of each file based on file native metadata, where the file native metadata includes one or more of a file name, file size, file user permissions, file group permissions, creation time, last access time, last modification time, and file type. The file identification module 704 is configured to analyze each file in the file system 710 to determine its suitability. For example, a change or update in a file indicates that the file needs to be backed up. The file identification module 704 identifies changes based at least on the file native metadata and determines the suitability of the file accordingly. For example, when a file has changed (if at all) after the previous scheduled daily backup time, the file identification module 704 determines that the file is suitable for early backup.

In another embodiment, the file type indicates that edits to the file are append-only. The file identification module 704 is configured to determine the suitability of a file based at least on the file type, which indicates the type of edit or change performed on the file. For example, changes or edits to the file are append-only, i.e., data is only ever added to the file in the file system 710. In general, each file is analyzed separately, and the files are grouped accordingly to achieve efficient operation. In one example, the files determined to be suitable for early backup are grouped into one group, while the remaining files are grouped into another group to be backed up in a later time window or at a later point in time.

In one embodiment, the file identification module 704 is also configured to determine an expected last access time by analyzing file metadata collected over time. The file identification module 704 determines the expected last access time for each file in the file system 710. It then checks whether the file has changed by that time to determine whether the file is suitable for early backup. For example, if the expected last access time is before the scheduled daily backup time, the file may be backed up in advance within the user-defined early backup time window.

In another embodiment, the plurality of remaining files includes the files in the file system 710 that were not backed up within the early backup time window. In one example, the plurality of remaining files includes files that the file identification module 704 determined to be unsuitable for early backup because they are likely to be changed or updated at a later point in time.

In one embodiment, the scheduled daily backup includes checking whether files backed up within the early backup time window have been further edited. The file identification module 704 is configured to check whether these files have undergone any changes or updates before the scheduled daily backup time. In general, the file identification module 704 compares a snapshot of the file, taken when it was backed up within the early backup time window, with a current snapshot of the file. If the comparison reveals changes or edits, the file identification module 704 is operable to back up the edited file again and, optionally, replace its early backup.

The various embodiments, operations, and variations disclosed above for the method 100 or 200 apply, mutatis mutandis, to the apparatus 700.

Referring to FIGS. 8A and 8B, block diagrams of systems 800A and 800B, respectively, for backing up a file system 814 according to embodiments of the present disclosure are shown. In general, the system 800A or 800B is a computing system operable to perform a backup of the file system 814. It may be implemented as hardware, software, firmware, or a combination thereof. The systems 800A and 800B of FIGS. 8A and 8B should be read in conjunction with FIGS. 1 to 7. Each of these systems includes a processor 812 for performing backup operations.

Referring to FIG. 8A, a block diagram of a system 800A for backing up a file system 814 is shown, according to an embodiment of the present disclosure. Here, the processor 812 is part of a file system server 810, and initiating a backup includes sending the files to a backup server 820. In operation, the processor 812 of the file system server 810 initiates a backup of files on the file system 814 by sending or transmitting the files to the backup server 820. The file system server 810 and the backup server 820 are types of servers used to facilitate backup of data, files, applications, and/or databases. It should be noted that each of the file system server 810 and the backup server 820 includes hardware and software capabilities for managing files and restoring them from a backup.

Referring to FIG. 8B, a block diagram of a system 800B for backing up a file system 814 is shown, according to an embodiment of the present disclosure. Here, the processor 812 is part of a backup server 830, and initiating the backup includes requesting the files from the file system server 810. In operation, the processor 812 of the backup server 830 initiates a backup of files in the file system 814 by requesting to receive the files from the file system server 810. It should be noted that, in contrast to FIG. 8A, where the processor 812 resides in the file system server 810, the processor 812 that initiates the backup here resides in the backup server 830.
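The two topologies differ only in which side initiates the transfer. Both can be contrasted in a toy in-memory sketch. In the FIG. 8A arrangement, the file system server pushes files. In the FIG. 8B arrangement, the backup server pulls files by requesting them. The classes and method names are hypothetical stand-ins for real networked servers.

```python
from __future__ import annotations

from typing import Dict, List


class FileSystemServer:
    """Hypothetical file system server holding file contents in memory."""

    def __init__(self, files: Dict[str, bytes]) -> None:
        self.files = files

    def read(self, name: str) -> bytes:
        return self.files[name]

    def push_backup(self, names: List[str], target: BackupServer) -> None:
        """FIG. 8A style: the processor on the file system server sends the files."""
        for name in names:
            target.receive(name, self.read(name))


class BackupServer:
    """Hypothetical in-memory backup target."""

    def __init__(self) -> None:
        self.store: Dict[str, bytes] = {}

    def receive(self, name: str, data: bytes) -> None:
        self.store[name] = data

    def pull_backup(self, source: FileSystemServer, names: List[str]) -> None:
        """FIG. 8B style: the processor on the backup server requests the files."""
        for name in names:
            self.receive(name, source.read(name))
```

In either arrangement, the same early-backup and daily-backup selection logic can decide which names are passed in. Only the initiator of the transfer changes.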

Referring to FIGS. 9A and 9B, block diagrams of systems 900A and 900B, respectively, for controlling backup of a file system 914 according to embodiments of the present disclosure are shown. In general, the system 900A or 900B is a computing system operable to perform a backup of the file system 914. It may be implemented as hardware, software, firmware, or a combination thereof for controlling the backup of the file system 914. The systems 900A and 900B of FIGS. 9A and 9B should be read in conjunction with FIGS. 1 to 7, 8A, and 8B.

Referring to FIG. 9A, a block diagram of a system 900A for controlling backup of a file system 914 is shown, according to an embodiment of the disclosure. In general, the system 900A includes an apparatus 912 (similar to the apparatus 700 of FIG. 7) for controlling backup of the file system 914. Here, the apparatus 912 is part of a file system server 910, and initiating the backup includes sending the files to a backup server 920. In operation, the apparatus 912 of the file system server 910 initiates a backup of files on the file system 914 by sending or transmitting the files to the backup server 920. The file system server 910 and the backup server 920 are types of servers used to facilitate backup of data, files, applications, and/or databases. It should be noted that the file system server 910 includes hardware and software capabilities for managing files and recovering them from backups.

Referring to FIG. 9B, a block diagram of a system 900B for controlling backup of a file system 914 is shown, according to an embodiment of the disclosure. In general, the system 900B includes the apparatus 912 for controlling backup of the file system 914. Here, the apparatus 912 is part of a backup server 930, and initiating the backup includes requesting the files from the file system server 910. In operation, the apparatus 912 of the backup server 930 initiates a backup of files on the file system 914 by requesting to receive the files from the file system server 910. The file system server 910 and the backup server 930 are types of servers used to facilitate backup of data, files, applications, and/or databases. It should be noted that, in contrast to the file system server 910 shown in FIG. 9A, the backup server 930 here includes the hardware and software capabilities to manage, control, and recover files from the backup.

Modifications may be made to the embodiments of the disclosure described above without departing from the scope of the disclosure, which is defined by the appended claims. Expressions such as "comprising," "including," "incorporating," "having," "being," and the like, used to describe and claim the present disclosure, are intended to be interpreted in a non-exclusive manner. They allow for the presence of items, components, or elements that are not expressly described. A reference to the singular is also to be construed as relating to the plural. The word "exemplary" is used herein to mean "serving as an example, instance, or illustration." Any embodiment described as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments, nor as excluding the incorporation of features from other embodiments. The word "optionally" as used herein means "provided in some embodiments and not provided in other embodiments." It is appreciated that certain features of the disclosure, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the disclosure, which are, for brevity, described in the context of a single embodiment, may also be provided separately, in any suitable combination, or as in any other described embodiment of the disclosure.

Claims

1. A computer-implemented method (100) for backing up a file system (202, 710, 814, 914), the method (100) comprising:

a processor (812) receives user input providing an early backup time window that precedes a scheduled daily backup time of the file system (202, 710, 814, 914);

the processor (812) identifies one or more files in the file system (202, 710, 814, 914) suitable for early backup;

the processor (812) initiates an early backup of the one or more identified files within the early backup time window;

the processor (812) initiates a scheduled daily backup for a plurality of remaining files in the file system (202, 710, 814, 914) at the scheduled daily backup time.

2. The method (100) of claim 1, wherein identifying includes analyzing file metadata to identify one or more files that have been edited since a previous scheduled daily backup time.

3. The method (100) of claim 2, further comprising, in response to a file editing event, the processor (812) receiving the file metadata from an agent installed on the file system (202, 710, 814, 914).

4. The method (100) of claim 2, further comprising the processor (812) sending a query to the file system (202, 710, 814, 914) at predetermined time intervals and receiving the file metadata in response.

5. The method (100) of any of claims 2 to 4, wherein identifying comprises determining suitability of each file from file native metadata, wherein the file native metadata comprises one or more of a file name, a file size, a file user permission, a file group permission, a creation time, a last access time, a last modification time, and a file type.

6. The method (100) of claim 5, wherein the file type indicates that editing of the file is append only.

7. The method (100) of any of claims 2 to 6, wherein identifying comprises determining an expected last access time by analyzing file metadata collected over time.

8. The method (100) of any of the preceding claims, wherein the plurality of remaining files comprises files that are not backed up within the early backup window.

9. The method (100) of any of the preceding claims, wherein the scheduled daily backup includes the processor (812) checking whether files backed up within the early backup window have been further edited.

10. The method (100) of any of the preceding claims, wherein the processor (812) is part of a file system server (810), and initiating a backup comprises sending the file to a backup server (204, 820).

11. The method (100) of any of claims 1 to 9, wherein the processor (812) is part of a backup server (204, 830), and initiating a backup comprises requesting the file from the file system server (810).

12. A computer readable medium storing instructions that, when executed by a processor (812), cause the processor (812) to perform the method (100) of any of the preceding claims.

13. An apparatus (700, 912) for controlling backup of a file system (202, 710, 814, 914), comprising:

an interface (702) for receiving user input providing an early backup time window, the early backup time window preceding a scheduled daily backup time of the file system (202, 710, 814, 914);

a file identification module (704) for identifying one or more files in the file system (202, 710, 814, 914) suitable for backup in advance;

a file backup module (706) for:

initiating an early backup of the one or more identified files within the early backup time window; and

initiating a scheduled daily backup for a plurality of remaining files in the file system (202, 710, 814, 914) at the scheduled daily backup time.

14. The apparatus (700, 912) of claim 13, wherein the file identification module (704) is further configured to analyze file metadata to identify one or more files that have been edited since a previous scheduled daily backup time.

15. The apparatus (700, 912) of claim 14, wherein the file identification module (704) is further configured to receive the file metadata from an agent installed on the file system (202, 710, 814, 914) in response to a file editing event.

16. The apparatus (700, 912) of claim 14, wherein the file identification module (704) is further configured to send a query to the file system (202, 710, 814, 914) at predetermined time intervals and receive the file metadata in response.

17. The apparatus (700, 912) of any of claims 14-16, wherein the file identification module (704) is further configured to determine the suitability of each file based on file native metadata, wherein the file native metadata includes one or more of a file name, a file size, a file user permission, a file group permission, a creation time, a last access time, a last modification time, and a file type.

18. The apparatus (700, 912) of claim 17, wherein the file type indicates that editing of the file is append only.

19. The apparatus (700, 912) of any of claims 14-18, wherein the file identification module (704) is further configured to determine an expected last access time by analyzing file metadata collected over time.

20. The apparatus (700, 912) of any of claims 13-19, wherein the plurality of remaining files includes files that are not backed up within the early backup window.

21. The apparatus (700, 912) of any of claims 13-20, wherein the scheduled daily backup includes checking whether files backed up within the early backup window have been further edited.

22. The apparatus (700, 912) according to any of claims 13-21, wherein the apparatus (912) is part of a file system server (910), and initiating a backup comprises sending the file to a backup server (920).

23. The apparatus (700, 912) according to any of claims 13-21, wherein the apparatus (912) is part of a backup server (930), and initiating a backup comprises requesting the file from the file system server (910).