CN116209985A - Storage device and method for generating data backup by adopting backup strategy - Google Patents


Info

Publication number
CN116209985A
Authority
CN
China
Prior art keywords
backup
workload
data elements
controller
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202080105357.XA
Other languages
Chinese (zh)
Inventor
阿萨夫·纳塔逊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Publication of CN116209985A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14 Error detection or correction of the data by redundancy in operation
    • G06F11/1402 Saving, restoring, recovering or retrying
    • G06F11/1446 Point-in-time backing up or restoration of persistent data
    • G06F11/1448 Management of the data involved in backup or backup restore
    • G06F11/1451 Management of the data involved in backup or backup restore by selection of backup contents
    • G06F11/1453 Management of the data involved in backup or backup restore using de-duplication of the data
    • G06F11/1458 Management of the backup or restore process

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A storage device enables automatic backup policy management to create backups of a plurality of data elements using a backup policy with the necessary settings. The storage device includes a memory and a controller. The memory is configured to store a plurality of data elements included in a workload. The controller is configured to receive an indication of a workload to be backed up. The controller is further configured to receive an indication of a backup policy, wherein the backup policy is for the workload and is applied to all of the plurality of data elements to be backed up. The controller is further configured to generate a backup of each of the data elements according to the backup policy of the workload, wherein the backup policy includes settings regarding one or more of: lossy data reduction, anonymization requirements, leakage source detection, and compliance requirements.

Description

Storage device and method for generating data backup by adopting backup strategy
Technical Field
The present disclosure relates generally to the field of data storage; and more particularly to a storage device and method for generating a data backup using a backup policy.
Background
Typically, data backups are used to restore data in the event of a data loss in a storage system. For example, a separate backup system or secondary storage system is typically used to store a backup of data present in the primary storage system. In general, a storage system (i.e., a storage device) is not only used to store backups of data existing in a primary storage system, but also enables backups stored therein to be copied to a plurality of backup storage systems, such as a cloud storage system.
Often, prior to storing the backup copies to multiple backup storage systems, modifications to the backup are required to protect personal data and sensitive data. For example, replicas of backups for testing and development often require data modification, such as data anonymization, so that sensitive data in the backup (e.g., credit card data, social security numbers, and addresses) are not exposed. However, such backup modifications are typically performed manually and require anonymizing software. Conventional storage devices do not provide effective anonymization of the backup.
Furthermore, data compliance is also a consideration in scenarios where the backup replica is exported to a geographic location other than the original geographic location of the conventional storage device storing the backup. In particular, data in the backup replica needs to be modified according to the regulations of the destination geographic location to achieve data protection and privacy. For example, if the backup is stored in the European Union (EU), compliance with the General Data Protection Regulation (GDPR) is required to protect the personal data stored in the backup. However, conventional storage devices either fail to address compliance requirements or address them inefficiently.
In addition, data reduction techniques are typically applied when storing backup replicas in another backup storage system. The data reduction technique may be a lossless data reduction technique or a lossy data reduction technique, as desired. Different data reduction techniques are associated with different pricing considerations. For example, existing suppliers typically allow free storage of replicas for backups with lossy data reduction, but charge fees for storage of replicas for backups with lossless data reduction. Conventional storage devices do not provide cost-effective data reduction, compliance management, and data modification.
Thus, in light of the foregoing discussion, there is a need to overcome the previously mentioned drawbacks associated with conventional storage devices and methods for managing data backups in conventional storage devices.
Disclosure of Invention
The present disclosure seeks to provide a storage device and method for generating a data backup using a backup policy. The present disclosure seeks to provide a solution to one or more of the existing problems of inefficient and unreliable compliance, data modification, and data reduction in conventional storage devices. It is an object of the present disclosure to provide a solution that at least partially overcomes the problems encountered in the prior art and to provide a storage device and method for efficient and reliable backup generation using a backup policy with the required settings.
The object of the present disclosure is achieved by the solutions provided in the attached independent claims. Advantageous implementations of the present disclosure are further defined in the dependent claims.
In one aspect, the present disclosure provides a storage device. The storage device includes a memory and a controller. The memory is configured to store a plurality of data elements included in a workload. The controller is configured to: receive an indication of a workload to be backed up, the workload comprising a plurality of data elements; receive an indication of a backup policy, wherein the backup policy is for the workload and applies to all of the plurality of data elements to be backed up; and generate a backup of each of the data elements according to the backup policy of the workload, wherein the backup policy includes settings regarding one or more of: lossy data reduction, anonymization requirements, leakage source detection, and compliance requirements.
The storage device enables automatic backup policy management to create an optimized backup of a plurality of data elements. The storage device supports storing backups in a plurality of backup storage devices at a plurality of geographic locations. The storage device enables efficient backup of multiple data elements by automatically applying the necessary settings of the backup policy (i.e., without manual data reduction, without manual use of compliance tools, etc.). Thus, the storage device automatically generates an optimized backup having one or more of the following attributes: space efficiency, cost effectiveness, suitability for protecting sensitive content, and compliance with the regulations of a number of geographic locations. The storage device also enables a backup policy for leakage source detection to prevent data leakage in a backup or to track data leakage in a backup.
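By way of illustration only, the relationship between a workload, its backup policy, and the per-element backups described above may be sketched as follows. The class names, field names, and the `generate_backup` function are hypothetical and are not taken from the disclosure; the sketch only shows that a single policy is attached to the workload and applied to every data element.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BackupPolicy:
    """Hypothetical per-workload policy; the field names are illustrative."""
    lossy_reduction: bool = False
    anonymization: bool = False
    leak_source_detection: bool = False
    compliance: Optional[str] = None  # e.g. "GDPR"


@dataclass
class Workload:
    name: str
    data_elements: List[bytes]


def generate_backup(workload: Workload, policy: BackupPolicy) -> List[dict]:
    """Produce one backup record per data element, all under the same policy."""
    return [
        {"workload": workload.name, "index": i, "payload": bytes(element), "policy": policy}
        for i, element in enumerate(workload.data_elements)
    ]
```

In this sketch the policy is passed once per workload rather than once per data element, which mirrors the statement that the backup policy applies to all of the data elements to be backed up.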
In an implementation, the controller is further configured to generate a backup of each of the data elements according to a backup policy of the workload by one or more of: utilizing a compression algorithm based on the lossy data reduction; adjusting the content of the data element according to anonymization requirements; adding a tag indicating a source to the data element in accordance with the leakage source detection; and adjusting the content of the data element according to compliance requirements.
Compression algorithms for lossy data reduction reduce the size of multiple data elements when generating a backup and thus enable a space-efficient backup to be created. In addition, adjusting the content of the plurality of data elements according to anonymization requirements can protect private or sensitive information stored in the plurality of data elements. Furthermore, adding markers enables the source of data elements to be accurately and reliably tracked in the event of a data leak by using the markers to track the location of such a leak. In addition, the content of the plurality of data elements is modified according to compliance requirements to ensure that the backup is performed in accordance with laws, policies, and regulations regarding data privacy and security.
In another implementation, the controller is further configured to adjust the content of the data element according to anonymization requirements by blurring faces, changing voices, and/or altering personal information.
Characteristics such as a face, a voice, and personal information clearly indicate the identity of an entity. To prevent identification of the entity, these unique characteristics are anonymized by adjusting the content of the data element. Such anonymization requirements help to prevent identity theft, protect privacy, and so on.
In another implementation, the tag added to the data element based on the leakage source detection is a watermark.
Watermarks are often difficult to damage or detect. Thus, watermarking enables tracking of the source of data leakage in the event of data leakage by tracking the location of the data elements.
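As a minimal sketch of such a tag, assuming a plain-text data element, a source identifier may be appended as a sequence of invisible zero-width characters. This particular encoding, and the function names `embed_watermark` and `extract_watermark`, are assumptions made for illustration; the disclosure does not specify a watermarking technique, and this simple scheme is not robust against deliberate removal.

```python
ZERO, ONE = "\u200b", "\u200c"  # zero-width space / zero-width non-joiner


def embed_watermark(text: str, source_id: str) -> str:
    """Append source_id to text as an invisible run of zero-width characters."""
    bits = "".join(f"{byte:08b}" for byte in source_id.encode("utf-8"))
    return text + "".join(ONE if bit == "1" else ZERO for bit in bits)


def extract_watermark(text: str) -> str:
    """Recover the source identifier from the zero-width characters in text."""
    bits = "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")
```

If each backup storage system receives a replica carrying a different identifier, a leaked copy can be traced back to the system it was exported to by extracting that identifier.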
In another implementation, the controller is further configured to adjust the content of the data element according to compliance requirements by blurring the child's face and/or deleting addresses, names, and/or identification numbers.
Such a way of adjusting the content of the data element prevents the identification of the entity. Since certain geographic locations have certain compliance requirements in protecting the identity of an entity, such a manner of adjusting the content of a data element by blurring and/or deleting the uniquely identifiable characteristics of the entity enables the storage device to automatically comply with such compliance requirements.
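For a structured data element, the deletion of addresses, names, and identification numbers may be sketched as below. The field names and the `apply_compliance` helper are hypothetical, and the record layout is an assumption; blurring of faces would require image processing and is outside the scope of this sketch.

```python
from typing import Iterable

# Hypothetical field names for the identifying attributes named above.
SENSITIVE_FIELDS = ("name", "address", "id_number")


def apply_compliance(record: dict, fields: Iterable[str] = SENSITIVE_FIELDS) -> dict:
    """Return a copy of record without the listed identifying fields."""
    blocked = set(fields)
    return {key: value for key, value in record.items() if key not in blocked}
```

The function returns a new dictionary, so the data element held in the memory is left unchanged and only the backup replica is adjusted.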
In another implementation, the controller is further configured to generate a backup for each of the data elements according to the backup policy of the workload by generating a copy of the data elements and then applying a filter to the copy of the data elements according to the backup policy.
In this way, the required filters can be effectively applied to multiple copies of the data elements to ensure that the required settings on the data elements are properly achieved according to the backup policy.
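The copy-then-filter approach may be sketched as follows, assuming that each data element is a dictionary and that each filter is a function that takes and returns such a dictionary. The filter functions, the `source_tag` key, and its value are illustrative assumptions only.

```python
import copy
from typing import Callable, List

Filter = Callable[[dict], dict]


def remove_ssn(element: dict) -> dict:
    """Example anonymization filter: drop a hypothetical 'ssn' field."""
    element.pop("ssn", None)
    return element


def add_source_tag(element: dict) -> dict:
    """Example leakage-source filter: attach a hypothetical source tag."""
    element["source_tag"] = "storage-device-102"
    return element


def backup_with_filters(elements: List[dict], filters: List[Filter]) -> List[dict]:
    """Copy each data element first, then apply the policy's filters to the copy."""
    backups = []
    for element in elements:
        replica = copy.deepcopy(element)  # the original stays untouched
        for apply_filter in filters:
            replica = apply_filter(replica)
        backups.append(replica)
    return backups
```

Because the filters operate on a deep copy, the stored data elements remain intact even when a filter modifies its input in place.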
In another implementation, the controller is further configured to receive a backup policy from a user.
When a backup policy is received from a user, the controller generates the backup using the backup policy that best suits the user's requirements. Thus, the generated backup is optimized according to the user's requirements.
In another implementation, the controller is further configured to receive the backup policy from the user by receiving one or more of: a user indication of lossy data reduction, a user indication of anonymization requirements, a user indication of leakage source detection, and a user indication of compliance requirements.
By receiving one or more of the foregoing user indications, a user requirement regarding the backup policy is provided to the controller. In this way, the controller is able to generate an optimized backup of each of the data elements according to the user's requirements.
In another implementation, the controller is further configured to receive the backup policy from the user by receiving a selection of the backup policy.
In this way, the backup policy received by the controller is selected by the user and is therefore better suited to the user's data backup requirements than an automatically selected backup policy. The backup of data elements generated by employing a user-selected backup policy is performed according to the user's requirements.
In another implementation, the controller is further configured to receive a backup policy from the user by receiving an indication of the regulatory requirement, wherein the backup policy matches the regulatory requirement.
By receiving user indications of regulatory requirements, the controller can efficiently select and apply appropriate backup policies that adjust the content of the data elements in generating the backup in accordance with the regulatory requirements. This ensures that the backup meets compliance requirements.
In another aspect, the present disclosure provides a method for a storage device including a memory configured to store a plurality of data elements included in a workload. The method comprises the following steps: receiving an indication of a workload to be backed up, the workload comprising a plurality of data elements; receiving an indication of a backup policy, wherein the backup policy is for a workload and applies to all of a plurality of data elements to be backed up; and generating a backup of each of the data elements according to a backup policy of the workload, wherein the backup policy includes settings regarding one or more of: lossy data reduction, anonymization requirements, leakage source detection, and compliance requirements.
The method of the present aspect achieves all the advantages and effects of the storage device of the present disclosure.
In an implementation, a computer-readable medium carrying computer instructions that, when loaded into and executed by a controller of a storage device, enable the storage device to perform the method is provided.
Computer readable media carrying computer instructions implement all the advantages and effects of the storage device or the method.
In another aspect, the present disclosure provides a storage device comprising a memory configured to store a plurality of data elements included in a workload. The storage device further includes: a workload receiving software module for receiving an indication of a workload to be backed up, the workload comprising a plurality of data elements; a backup policy receiving software module for receiving an indication of a backup policy, wherein the backup policy is for a workload and applies to all of a plurality of data elements to be backed up; and a backup generation software module for generating a backup of each of the data elements according to a backup policy of the workload, wherein the backup policy includes settings for one or more of: lossy data reduction, anonymization requirements, leakage source detection, and compliance requirements.
The software modules are executed to enable the storage device to receive an appropriate backup policy and attach it to the workload in order to generate a backup of the plurality of data elements. Dedicated software modules are employed to perform dedicated processing tasks related to generating backups. The use of these software modules advantageously automates backup generation at the storage device in a centralized manner, minimizing manual involvement in applying these processes.
It is noted that all devices, elements, circuits, units and modules described in this application may be implemented in software or hardware elements or any type of combination thereof. All steps described herein as being performed by various entities and functions described as being performed by various entities are intended to mean that the respective entities are adapted or configured to perform the respective steps and functions. Although in the following description of the embodiments, specific functions or steps performed by external entities are not reflected in the description of specific detailed elements of the entity performing the specific steps or functions, it should be clear to a skilled person that these methods and functions may be implemented in the form of corresponding hardware or software elements or any type of combination thereof. It should be understood that features of the present disclosure are susceptible to being combined in various combinations without departing from the scope of the present disclosure as defined by the appended claims.
Additional aspects, advantages, features and objects of the present disclosure will become apparent from the accompanying drawings and detailed description of illustrative implementations which are explained in connection with the following appended claims.
Drawings
The foregoing summary, as well as the following detailed description of illustrative embodiments, is better understood when read in conjunction with the appended drawings. For the purpose of illustrating the disclosure, there is shown in the drawings exemplary constructions of the disclosure. However, the present disclosure is not limited to the specific methods and instrumentalities disclosed herein. Moreover, it will be appreciated by those skilled in the art that the drawings are not to scale. Identical elements are denoted by the same reference numerals, where possible.
Embodiments of the present disclosure will now be described, by way of example only, with reference to the following figures, in which:
FIG. 1 is a block diagram illustrating various exemplary components of a storage device according to embodiments of the present disclosure;
FIG. 2 is an exemplary illustration of backing up data elements to a backup storage system according to an embodiment of the present disclosure;
FIG. 3 is an exemplary illustration of a data element including a tag added for leakage source detection in accordance with an embodiment of the present disclosure;
FIG. 4 is an exemplary illustration of data elements whose content has been adjusted according to compliance requirements in accordance with an embodiment of the present disclosure; and
FIG. 5 is a flowchart of a method for a storage device according to an embodiment of the present disclosure.
In the drawings, an underlined reference numeral is used to denote an item where the underlined reference numeral is located or an item adjacent to the underlined reference numeral. The non-underlined reference numerals relate to items identified by lines associating the non-underlined reference numerals with the items. When reference numerals are not underlined and associated arrows are attached, the reference numerals without underline are used to identify general items to which the arrows point.
Detailed Description
The following detailed description illustrates embodiments of the disclosure and the manner in which the embodiments may be implemented. While several modes of carrying out the disclosure have been disclosed, those skilled in the art will recognize that other embodiments for carrying out or practicing the disclosure are also possible.
FIG. 1 is a block diagram illustrating various exemplary components of a storage device according to embodiments of the present disclosure. Referring to FIG. 1, a storage device 102 is shown. The storage device 102 includes a memory 104 and a controller 106. The storage device 102 is also shown to include a communication interface 108 and one or more software modules, such as software module 110.
The storage device 102 may comprise suitable logic, circuitry, devices, interfaces, and/or code that may be configured to support automatic backup policy management for a plurality of data elements to generate a backup for each of the data elements. The backup of each of the data elements may be stored in one or more backup storage systems (e.g., cloud storage system, local backup storage system, remote backup storage system, etc.). In other words, the storage device 102 enables intelligent data backup management and enables, for example, creating a space-efficient, lossy backup of multiple data elements that complies with multiple regulations without requiring manual application of specialized data processing tools.
In an embodiment, the storage device 102 is a secondary storage device. Examples of the storage device 102 may include, but are not limited to, a server, a production environment system, a computing device in a computer cluster (e.g., a massively parallel computer cluster), a portable or non-portable electronic device, a drone, or a supercomputer.
The memory 104 is configured to store a plurality of data elements included in a workload. The memory 104 may comprise suitable logic, circuitry, and/or interfaces that may be configured to store a plurality of data elements included in a workload. In an embodiment, the memory 104 of the storage device 102 is a secondary memory that stores a plurality of data elements received from a primary storage system (e.g., a host server). The memory 104 may also store instructions that may be executed to control the storage device 102. Memory 104 may additionally store an operating system and/or other program products to operate storage device 102. Examples of implementations of memory 104 may include, but are not limited to, a Hard Disk Drive (HDD), a flash Drive, a Secure Digital (SD) card, a Solid-State Drive (SSD), a network attached storage (Network Attached Storage, NAS), or another computer storage medium.
The controller 106 may comprise suitable logic, circuitry, and/or interfaces that may be configured to implement the processing steps associated with automatic backup policy management to generate backups of a plurality of data elements. The controller 106 is a computing element configured to execute processing instructions that drive the storage device 102. Examples of the controller 106 include, but are not limited to, a microprocessor, a microcontroller, a complex instruction set computing (Complex Instruction Set Computing, CISC) processor, a reduced instruction set computing (Reduced Instruction Set Computing, RISC) processor, a very long instruction word (Very Long Instruction Word, VLIW) processor, an application-specific integrated circuit (Application-Specific Integrated Circuit, ASIC) processor, a central processing unit (Central Processing Unit, CPU), a data processing unit, and other processors or control circuits.
It should be appreciated that the controller 106 may be employed to generate backups of an entire file system, a group of file systems, a single database, a collection of databases, and similar data structures.
In an embodiment, the storage device 102 further includes a communication interface 108. The communication interface 108 is an arrangement of interconnected programmable and/or non-programmable components configured to facilitate data communication between one or more electronic devices. The communication interface 108 supports various wired or wireless communication protocols with respect to one or more of the following: peer-to-peer, hybrid peer-to-peer, local area network (Local Area Network, LAN), radio access network (Radio Access Network, RAN), metropolitan area network (Metropolitan Area Network, MAN), wide area network (Wide Area Network, WAN), all or a portion of a public network (e.g., the global computer network known as the Internet), wireless fidelity (Wireless Fidelity, Wi-Fi) network, wireless personal area network (Wireless Personal Area Network, WPAN), private network, cellular network, and any other communication network or networks at one or more locations. Additionally, the communication interface 108 supports wired or wireless communications that may be implemented via any number of known protocols, including, but not limited to, transmission control protocol and internet protocol (Transmission Control Protocol and Internet Protocol, TCP/IP), user datagram protocol (User Datagram Protocol, UDP), hypertext transfer protocol (Hypertext Transfer Protocol, HTTP), file transfer protocol (File Transfer Protocol, FTP), IEEE 802.16, light fidelity (Light Fidelity, Li-Fi), wireless application protocol (Wireless Application Protocol, WAP), frame relay, asynchronous transfer mode (Asynchronous Transfer Mode, ATM), and/or other cellular communication protocols. In addition, the communication interface 108 may also employ and support any other suitable protocol that uses voice, video, data, or a combination thereof.
In an embodiment, the storage device 102 further includes software modules 110. In an exemplary implementation, the software modules 110 include one or more workload receiving software modules (e.g., workload receiving software module 110a), one or more backup policy receiving software modules (e.g., backup policy receiving software module 110b), and one or more backup generation software modules (e.g., backup generation software module 110c). In an implementation, the software modules 110 (including the software modules 110a to 110c) may be implemented as separate circuits in the storage device 102. Alternatively, in another implementation, the software modules 110 are implemented as circuitry that performs the various operations of the software modules 110a to 110c.
In operation, the controller 106 is configured to receive an indication of a workload to be backed up, the workload including a plurality of data elements. Upon receiving the indication, the controller 106 initiates a backup operation of the workload for creating a backup of the plurality of data elements on the at least one backup storage system. The controller 106 receives an indication of the workload via the communication interface 108. The indication is sent by the user via the user device. Examples of user devices may include, but are not limited to, personal computers (e.g., desktop computers, notebook computers, etc.), personal digital assistants (Personal Digital Assistant, PDAs), or smart phones.
A "data element" is a unit of data in a workload. Examples of a given data element may include, but are not limited to, a document (e.g., a text document, a spreadsheet, a form, a certificate, an email, a presentation, etc.), an image, video, audio, or other form of data element.
The controller 106 is configured to receive an indication of a backup policy that is workload-specific and that applies to all of the plurality of data elements to be backed up. The indicated backup policy enables the controller 106 to automatically manage and create backups of multiple data elements of the workload. The indicated backup policy will be applied to all of the plurality of data elements of the workload to generate a backup. For example, in the case where the workload includes four data elements, the indicated backup policy will apply to all four data elements. In some implementations, the indication of the backup policy is provided by the user using the user device. Such an implementation is described in detail below. In other implementations, the indication of the backup policy is provided by a backup policy selection software module (not shown). The backup policy selection software module automatically indicates the backup policy based on at least one of: characteristics of the plurality of data elements in the workload, or information about at least one backup storage system on which the backup is to be stored. By way of example, given the geographic location of a given backup storage system, a given backup policy that complies with the regulations for that geographic location will be automatically selected. Optionally, in such an implementation, multiple backup policies may be pre-stored at the memory 104 of the storage device 102.
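The automatic selection described above may be sketched, purely by way of example, as a lookup over pre-stored policies keyed by the geographic location of the backup storage system. The table contents, the region keys, and the `select_policy` function are hypothetical; only the association of the EU with the GDPR is taken from the background discussion, and the mapping is not legal guidance.

```python
from typing import Dict, Optional

# Hypothetical pre-stored policies, keyed by the region of the backup storage system.
PRESTORED_POLICIES: Dict[str, Dict[str, Optional[object]]] = {
    "EU": {"compliance": "GDPR", "anonymization": True},
    "DEFAULT": {"compliance": None, "anonymization": False},
}


def select_policy(backup_location: str) -> Dict[str, Optional[object]]:
    """Pick the pre-stored policy for a location, falling back to a default."""
    return PRESTORED_POLICIES.get(backup_location, PRESTORED_POLICIES["DEFAULT"])
```

A fuller selection module could also inspect characteristics of the data elements themselves, as the paragraph above notes; this sketch covers only the location-based case.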
In an embodiment, the controller 106 is further configured to receive a backup policy from a user. The backup policy is sent by the user to the controller 106 via the communication interface 108. The user typically sends a backup policy that suits the user's needs. When the backup policy of the workload is indicated by the user, the controller 106 generates an optimized backup that is well suited to the user's requirements. Upon receipt, the user-provided backup policy is appended to (i.e., associated with) the workload (specifically, the plurality of data elements) by the controller 106. The user may send the backup policy to the controller 106 using the user device. In an embodiment, the backup policy is stored at a memory module of the user device. When the backup policy is received at the storage device 102, the backup policy is stored in the memory 104.
In an embodiment, the controller 106 is further configured to receive the backup policy from the user by receiving one or more of: a user indication of lossy data reduction; a user indication of anonymization requirements; a user indication of leakage source detection; and a user indication of compliance requirements. Each of the foregoing user indications relates to one or more settings in the backup policy. A given user indication may indicate whether to apply a given setting during the backup and, optionally, how to apply it. The controller 106 then receives a given backup policy that matches the given user indication. By means of one or more user indications, the user provides the controller 106 with a backup policy to be employed to generate a backup of the workload. The user may provide a given user indication to the controller 106 using the user device. The controller 106 receives a given user indication via the communication interface 108.
In an embodiment, the given user indication is in the form of: a text indication, a visual indication (e.g., image, video, etc.), an audio indication (e.g., voice input), or a touch indication.
In an example, the controller 106 may receive a backup policy for the workload through user indication of lossy data reduction. Such user indications enable the controller 106 to create a backup of the plurality of data elements by employing techniques for lossy data reduction over the plurality of data elements. The backups generated in such cases have a size that is smaller than the original size of the plurality of data elements and thus may be stored in a space-efficient format on the at least one backup storage system. Advantageously, the user-indicated backup strategy corresponding to lossy data reduction enables the controller 106 to generate backups of multiple data elements using lossy data reduction without requiring the user to manually apply any data reduction tools. Further, optionally, the amount of lossy data reduction may be configured by the user in a user indication of lossy data reduction. The user may configure the amount of lossy data reduction based on one or more of: user preferences regarding backup, the manner in which the backup is to be used, the storage capacity of at least one backup storage system on which the backup is to be stored.
In another example, the controller 106 may receive a backup policy for the workload through a user indication of anonymization requirements. Such a user indication enables the controller 106 to protect private or sensitive information stored in the plurality of data elements when generating the backup. The plurality of data elements may contain information that allows an entity (e.g., a person, a group of persons, an organization, etc.) to be uniquely identified. Anonymization requirements may define conditions that need to be met when generating backups using the backup policy in order to prevent identification of entities from such information. Different users may provide user indications of different anonymization requirements according to their preferences or needs. As an example, a user indication of anonymization requirements may require the controller 106 to change (or delete) personal information, such as credit card numbers and social security numbers detected in the plurality of data elements, to hide the identity of the entity associated with such personal information. As another example, a user indication of anonymization requirements may require the controller 106 to change the speech in an audio file to avoid identifying the speaker in the backup.
In yet another example, controller 106 may receive a backup policy for the workload through a user indication of leak source detection. Such user indications enable the controller 106 to track the source of the data leak in the event of a data leak. Further, such user instructions may also instruct controller 106 regarding how to include provisions for leakage source detection when generating a backup. Herein, the term "data leakage" refers to the unauthorized transmission of a backup of a given data element from any storage device (e.g., at least one backup storage system) to an unauthorized recipient.
In yet another example, the controller 106 may receive a backup policy for the workload through a user indication of compliance requirements. Such a user indication enables the controller 106 to change (i.e., adjust) the content of the plurality of data elements according to the compliance requirements when generating the backup. The user indication may also specify the compliance requirements that serve as the basis for making the change. The compliance requirements are applied by the controller 106 in generating the backup to ensure that the backup complies with the laws, policies and regulations of the geographic location of the at least one backup storage system, or with generally followed regulations. This enables the controller 106 to ensure that sensitive information in the backup is stored in accordance with data protection regulations.
According to an embodiment, the controller 106 is further configured to receive the backup policy from the user by receiving a selection of the backup policy. In this regard, the backup policy is selected from a plurality of backup policies. Different backup policies may include different settings regarding backup generation. The user selects the backup policy that is most appropriate for the user's backup requirements so that the generated backup is optimized accordingly. The plurality of backup policies may be stored at the memory 104, at a memory module of the user device, etc.
According to an embodiment, the controller 106 is further configured to receive the backup policy from the user by receiving an indication of a regulatory requirement, wherein the backup policy matches the regulatory requirement. The user indication specifies the regulatory requirements (including both local regulatory requirements and general regulatory requirements) that need to be met when generating the backup. Thus, a backup policy that matches the regulatory requirements is received by the controller 106. This backup policy allows the plurality of data elements to be stored and modified in accordance with the regulatory requirements applicable to the at least one backup storage system. The controller 106 is thereby able to manage the storage of sensitive (or personal) data of the workload in the at least one backup storage system in accordance with the legal and government regulations of the geographic location of the at least one backup storage system. For example, when the at least one backup storage system is implemented in a European Union country, the controller 106 may automatically receive a backup policy that includes settings for adhering to the General Data Protection Regulation (GDPR) regarding data protection and privacy.
The controller 106 is further configured to generate a backup of each of the data elements according to the backup policy of the workload, wherein the backup policy includes settings regarding one or more of: lossy data reduction; anonymization requirements; leakage source detection; and compliance requirements. The settings of the backup policy include specifications and/or rules that allow the controller 106 to modify the plurality of data elements to obtain an optimized backup for each of the plurality of data elements of the workload.
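As a rough illustration of how the four kinds of settings might be held together in one workload-level policy, consider the following Python sketch. The field names and the convention that a setting left as `None` is switched off are assumptions made for this example only.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BackupPolicy:
    """Hypothetical policy record; a setting left as None is switched off."""
    lossy_reduction_ratio: Optional[float] = None     # e.g. 0.5 for half size
    anonymization_rules: Optional[List[str]] = None   # e.g. ["ssn"]
    leak_source_id: Optional[str] = None              # source encoded in the mark
    compliance_regime: Optional[str] = None           # illustrative label

    def active_settings(self) -> List[str]:
        """List the settings this policy enables, in a fixed order."""
        candidates = [
            ("lossy_data_reduction", self.lossy_reduction_ratio),
            ("anonymization_requirements", self.anonymization_rules),
            ("leakage_source_detection", self.leak_source_id),
            ("compliance_requirements", self.compliance_regime),
        ]
        return [name for name, value in candidates if value is not None]


policy = BackupPolicy(lossy_reduction_ratio=0.5, leak_source_id="storage-102")
print(policy.active_settings())
```

Because the policy is attached to the workload rather than to individual data elements, a single record of this kind would govern every data element in the workload.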
In an embodiment, the settings for lossy data reduction include specifications and/or rules that allow the controller 106 to perform lossy compression on the plurality of data elements of the workload of the storage device 102. Lossy compression removes unnecessary, less important and/or redundant data from the plurality of data elements when generating a backup. This results in a backup copy that requires less disk space than the plurality of data elements occupy in the storage device 102. In an advanced mode, the amount of loss in the plurality of data elements may be configured by the controller 106 according to the settings regarding lossy data reduction. Optionally, the amount of loss depends on the type of the plurality of data elements. For example, the controller 106 may run a lossy compression algorithm on a data element at a compression ratio of 50%, producing a compressed backup that is half the size of the original data element. The settings for lossy data reduction may further specify an algorithm that performs the lossy data reduction on the plurality of data elements. Algorithms for lossy data reduction may include, but are not limited to, discrete cosine transform (DCT), fractal compression, chroma sub-sampling, and color reduction. The controller 106 may use different compression algorithms depending on the type of data element. For example, the controller 106 may use different compression algorithms for image data elements and audio data elements. Such an algorithm is employed in generating the backup, before the backup is sent to the at least one backup storage system. Notably, the provision of lossy data reduction enables the controller 106 to store backups of the plurality of data elements in the at least one backup storage system in a space-efficient format.
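The 50% example can be made concrete with a small color-reduction sketch in Python. It is an illustrative stand-in, not the algorithm of the disclosure: each 8-bit sample is reduced to its top 4 bits and two samples are packed per byte, so the output is half the input size and the low 4 bits are lost. The function names are hypothetical.

```python
def reduce_colors(samples: bytes) -> bytes:
    """Lossy: keep the top 4 bits of each 8-bit sample, pack two per byte."""
    if len(samples) % 2:
        samples += b"\x00"  # pad to an even number of samples
    packed = bytearray()
    for i in range(0, len(samples), 2):
        packed.append((samples[i] & 0xF0) | (samples[i + 1] >> 4))
    return bytes(packed)


def restore_colors(packed: bytes) -> bytes:
    """Expand packed samples; the discarded low 4 bits come back as zeros."""
    restored = bytearray()
    for byte in packed:
        restored.append(byte & 0xF0)
        restored.append((byte & 0x0F) << 4)
    return bytes(restored)


original = bytes([200, 17, 255, 128])
packed = reduce_colors(original)
restored = restore_colors(packed)
print(len(original), len(packed), list(restored))
```

The restored samples approximate the originals (200 becomes 192, 17 becomes 16), which is the trade-off that a configurable amount of loss would control.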
In an embodiment, the setting of anonymization requirements includes specifications and/or rules that allow the controller 106 to modify the content of the plurality of data elements to anonymize identities of entities associated with such content. These settings enable the controller 106 to protect private or sensitive information stored in a plurality of data elements. For example, a rule in the setting of anonymization requirements may anonymize private information such as social security numbers and financial detailed information, while a specification in the setting may change all numbers of the social security numbers and financial detailed information to "1". The setting of anonymization requirements may further comprise an algorithm that performs anonymization on the plurality of data elements. Algorithms for anonymization requirements may include, but are not limited to, machine learning algorithms, incognito algorithms, samarati algorithms, and Datafly algorithms.
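The rule-plus-specification example above (anonymize social security numbers by changing every digit to "1") can be sketched in Python as follows. The pattern assumes the common `NNN-NN-NNNN` layout, and `anonymize_ssns` is a hypothetical name.

```python
import re

# Rule: what to anonymize (assumed layout NNN-NN-NNNN).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def anonymize_ssns(text: str) -> str:
    """Specification: replace every digit of each matched number with '1'."""
    return SSN_PATTERN.sub(lambda match: re.sub(r"\d", "1", match.group()), text)


sample = "Employee 42 has SSN 123-45-6789."
print(anonymize_ssns(sample))
```

Digits outside a matched number (such as the employee number) are left untouched, so only the sensitive field is altered.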
In an embodiment, the settings for leakage source detection include specifications and/or rules that allow the controller 106 to avoid data leakage or to track the source of a data leak if one occurs. For example, the settings for leakage source detection include a rule to add a mark to a given data element when its backup is generated. Further, the settings may include specific characteristics of the mark (e.g., type, size, color, location, etc.).
In an embodiment, the setting of compliance requirements includes specifications and/or rules that allow the controller 106 to change the content of the plurality of data elements according to laws, policies, and regulations applicable to the backup of the plurality of data elements. Since legal policies and regulations generally relate to data security and privacy, such settings of compliance requirements provide data protection to multiple data elements.
In an embodiment, the controller 106 is further configured to generate a backup of each of the data elements according to the backup policy of the workload by one or more of: utilizing a compression algorithm according to the lossy data reduction; adjusting the content of the data element according to the anonymization requirements; adding a mark indicating a source to the data element in accordance with the leakage source detection; and adjusting the content of the data element according to the compliance requirements. One or more of the foregoing processes are performed according to the settings in the backup policy to enable the controller 106 to generate the backup. Upon generating the backup, the controller 106 sends the backup to the at least one backup storage system for storage. The controller 106 performs the foregoing processes digitally and in a centralized manner, so that backup generation is efficient, accurate and systematic. In such a case, separate tools for the individual processes need not be employed manually to generate the backup.
In an embodiment, the controller 106 employs a compression algorithm according to settings in the backup policy regarding lossy data reduction. As an example, different compression algorithms may be employed to perform different degrees of compression as specified in the settings (in the backup policy) for lossy data reduction.
In an embodiment, the controller 106 adjusts the content of the data element according to the setting of anonymization requirements. As an example, different anonymization algorithms may be employed to perform anonymization as specified in the settings regarding anonymization requirements (in the backup policy). For example, where backup copies are used for testing purposes and development purposes, the controller 106 modifies the private information stored in the plurality of data elements when generating the backup.
According to an embodiment, the controller 106 is further configured to adjust the content of the data elements according to anonymization requirements by blurring faces, changing sounds, and/or altering personal information. The controller 106 modifies the data elements according to anonymization requirements in the backup policy. In an example, the faces represented in the images (data elements) are obscured (e.g., blurred, recoloured, etc.) by the controller 106 to prevent identification of the person. The controller 106 may identify and blur the faces represented in the data elements using algorithms such as machine learning, recoloring, square blurring, and the like. In another example, the speech is changed by the controller 106 (by altering the amplitude, altering the pitch and tone of the speech, etc.) to prevent recognition of the individual. The controller 106 may use algorithms such as pitch synchronous overlap and add (Pitch Synchronous Overlap and Add, PSOLA) algorithms, speech morphing algorithms, etc. to alter (i.e., modify) speech in the data elements. In yet another example, the intrinsically sensitive personal information in the plurality of data elements, such as credit card number, date of birth, financial details, address, social security number, etc., is altered by the controller 106 in accordance with anonymization requirements. The controller 106 may alter the sensitive data stored in the data elements using an algorithm for anonymization requirements.
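The face-obscuring step can be illustrated with a simple box ("square") blur in Python. The sketch assumes that a face detector has already returned a rectangle; detection itself is out of scope here. The image is a grayscale grid of integers, and `box_blur_region` is a hypothetical helper.

```python
from typing import List


def box_blur_region(image: List[List[int]], top: int, left: int,
                    height: int, width: int) -> List[List[int]]:
    """Return a copy of the image with the rectangle replaced by 3x3 means."""
    rows, cols = len(image), len(image[0])
    blurred = [row[:] for row in image]  # leave the input image untouched
    for r in range(top, top + height):
        for c in range(left, left + width):
            # Average over the 3x3 neighborhood, clipped at the image border.
            neighbors = [image[y][x]
                         for y in range(max(0, r - 1), min(rows, r + 2))
                         for x in range(max(0, c - 1), min(cols, c + 2))]
            blurred[r][c] = sum(neighbors) // len(neighbors)
    return blurred


image = [[0, 0, 0],
         [0, 90, 0],
         [0, 0, 0]]
result = box_blur_region(image, top=1, left=1, height=1, width=1)
print(result[1][1])
```

A single 3x3 pass is only a mild blur; a practical anonymization filter would likely use a larger kernel or repeated passes so that the face cannot be recovered.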
In an embodiment, the controller 106 adds a mark to the data element according to the settings for leakage source detection. The mark is a digital footprint that indicates the source of the data element or its backup and helps track the source of a data leak. The mark prevents unauthorized use of the plurality of data elements or their backups without the requisite permission (e.g., permission from an authorizer). In an embodiment, the mark is perceptible, thereby preventing data leakage. In another embodiment, the mark is imperceptible, making the mark difficult to corrupt and thus helping to track the source of the leak. Furthermore, the mark allows the leaker to be detected by decoding the mark on the data element. For example, in a scenario where an unauthorized user attempts to access a backup of a data element, the user's identification (e.g., username or IP address) may modify the mark present in the backup. This allows the user to be tracked by decoding the modified mark. Optionally, the mark is added by the controller 106 if at least one of the backup storage systems is not a trusted system. The controller 106 may add the mark to the data element using algorithms such as a self-embedding algorithm, a fragile watermarking algorithm, a block-based watermarking algorithm, a feature-based watermarking algorithm, a watermark embedding algorithm, a neural network training algorithm, a watermark extraction algorithm, and the like.
According to an embodiment, the mark added to the data element according to the leakage source detection is a watermark. Watermarks are digital footprints that are difficult to detect or corrupt and allow detection of data leakage of multiple data elements. The digital footprint may include, but is not limited to, bar codes, text codes, and the like. The watermark indicates the source of the plurality of data elements and/or backups of the data elements, and leakage source detection is achieved by tracking the indicated source on the watermark.
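One simple way to embed an imperceptible source mark is least-significant-bit (LSB) embedding, sketched below in Python. This technique is chosen here purely for illustration; the disclosure does not specify it, and LSB marks are fragile under re-encoding. The function names and the source identifier `b"S102"` are hypothetical.

```python
def embed_mark(data: bytes, source_id: bytes) -> bytes:
    """Write the bits of source_id into the lowest bit of successive bytes."""
    bits = [(byte >> shift) & 1
            for byte in source_id
            for shift in range(7, -1, -1)]
    if len(bits) > len(data):
        raise ValueError("data element too small to carry the mark")
    marked = bytearray(data)
    for i, bit in enumerate(bits):
        marked[i] = (marked[i] & 0xFE) | bit
    return bytes(marked)


def extract_mark(marked: bytes, id_length: int) -> bytes:
    """Read id_length bytes back from the lowest bits of the marked data."""
    source_id = bytearray()
    for i in range(id_length):
        value = 0
        for byte in marked[i * 8:(i + 1) * 8]:
            value = (value << 1) | (byte & 1)
        source_id.append(value)
    return bytes(source_id)


pixels = bytes(range(100, 132))       # 32 sample bytes standing in for an image
marked = embed_mark(pixels, b"S102")  # 4 bytes of source id need 32 carrier bytes
print(extract_mark(marked, 4))
```

Each carrier byte changes by at most one unit, which is why such a mark is hard to perceive; decoding it from a leaked copy reveals the indicated source.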
In an embodiment, the controller 106 adjusts (i.e., modifies) the content of the data element according to the settings of the compliance requirements. The content of the plurality of data elements is adjusted to ensure that the backup of the data elements complies with local and government regulations.
According to an embodiment, the controller 106 is further configured to adjust the content of the data elements according to compliance requirements by blurring the child's face and/or deleting addresses, names and/or identification numbers. The controller 106 modifies the data elements according to compliance requirements in the backup policy. In an example, in one scenario, storing images and/or video of a child is prohibited at a geographic location of a backup storage system to which the data elements are to be backed up. To adjust the content of the data elements according to the compliance requirements, the controller 106 may detect faces in the data elements and apply a filter (e.g., a blur filter) to the data elements to blur the child's face. In another example, in one scenario, storing private information such as an address, name, identification number, etc. is prohibited for the backup storage system to which the data element is to be backed up. To adjust the content of the data element according to the compliance requirements, the controller 106 may detect the private information and delete the private information (by using an algorithm such as machine learning).
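For structured data elements, deleting prohibited private information might look like the following Python sketch. It assumes the private fields have already been detected and are identified by key; the field names and `apply_compliance` are hypothetical.

```python
from typing import Dict, FrozenSet

# Assumption: fields that may not be stored at the backup location.
PROHIBITED_FIELDS: FrozenSet[str] = frozenset({"name", "address", "id_number"})


def apply_compliance(record: Dict[str, str],
                     prohibited: FrozenSet[str] = PROHIBITED_FIELDS) -> Dict[str, str]:
    """Return a copy of the record without the prohibited fields."""
    return {key: value for key, value in record.items() if key not in prohibited}


record = {"name": "A. Person", "address": "1 Example Road",
          "id_number": "X123", "order_total": "42.00"}
compliant = apply_compliance(record)
print(compliant)
```

The original record is not modified, which matches the idea that adjustments are applied when generating the backup rather than to the data held in the storage device.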
According to an embodiment, the controller 106 is further configured to generate a backup for each of the data elements according to the backup policy of the workload by generating a copy of the data element and then applying a filter to the copy according to the backup policy. The term "filter" herein refers to a function or software that processes data to perform certain operations on the data (e.g., data reduction, anonymization, leakage source detection, and/or compliance). Here, a copy of a data element refers to a duplicate of the data element. Further, in an embodiment, depending on the settings in the backup policy, the controller 106 may apply multiple filters to the copies of the plurality of data elements in a predefined order. For example, when the settings in the backup policy include lossy data reduction, anonymization requirements, and leakage source detection, the controller 106 may first apply a data reduction filter, then apply an anonymization filter, and then apply a leakage source detection filter (which adds a mark to the copy of the data element). The controller 106 then sends the copy of the data element to the given backup storage system.
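The copy-then-filter flow with a predefined order can be sketched in Python as below. The three filters are deliberately trivial stand-ins (whitespace collapsing for data reduction, digit replacement for anonymization, a metadata field for the mark), and all names are hypothetical.

```python
import copy
import re
from typing import Callable, List, Set, Tuple


def reduce_data(element: dict) -> dict:
    # Stand-in for lossy reduction: collapse runs of whitespace.
    element["content"] = re.sub(r"\s+", " ", element["content"])
    return element


def anonymize(element: dict) -> dict:
    # Stand-in for anonymization: replace every digit with "1".
    element["content"] = re.sub(r"\d", "1", element["content"])
    return element


def add_mark(element: dict) -> dict:
    # Stand-in for leakage source detection: record the source.
    element["mark"] = "source:storage-102"
    return element


# Predefined order: reduction, then anonymization, then marking.
FILTERS: List[Tuple[str, Callable[[dict], dict]]] = [
    ("lossy_data_reduction", reduce_data),
    ("anonymization", anonymize),
    ("leakage_source_detection", add_mark),
]


def generate_backup(element: dict, settings: Set[str]) -> dict:
    """Copy the data element, then apply the enabled filters in order."""
    backup = copy.deepcopy(element)  # filters never touch the original
    for name, apply_filter in FILTERS:
        if name in settings:
            backup = apply_filter(backup)
    return backup


original = {"content": "card   4929 1234 5678 9012", "mark": None}
backup = generate_backup(original, {name for name, _ in FILTERS})
print(backup)
```

Marking last means the mark is added to the final form of the copy and is not disturbed by the earlier filters, which is one plausible reason for fixing the order.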
FIG. 2 is an exemplary illustration of backing up data elements to a backup storage system according to an embodiment of the present disclosure. Referring to FIG. 2, the storage device 102 and a backup storage system 202 (e.g., depicted as a cloud storage system) are shown. The storage device 102 includes a file system 204A, the file system 204A including a copy 206A of a first data element and a copy 208A of a second data element. The controller 106 applies one or more filters to each of the copies 206A and 208A according to the backup policy. In an example, a data reduction filter is applied to the copy 206A (which may be an image, for example), and then a leakage source detection filter is applied to add a mark to the copy 206A. In another example, an anonymization filter is applied to the copy 208A (which may be, for example, a document). The backups of the first data element and the second data element are then transferred to the backup storage system 202. The backup storage system 202 stores a backup 204B of the file system 204A. The backup 204B includes a backup 206B of the first data element and a backup 208B of the second data element. The backup 206B is optimized with reduced data and leakage source detection capabilities, while the backup 208B is optimized for anonymization of sensitive data.
FIG. 3 is an exemplary illustration of a data element including a mark added for leakage source detection, according to an embodiment of the present disclosure. Referring to FIG. 3, a data element 302 is shown that includes a mark 304 added for leakage source detection. The data element 302 is, for example, an image. The portion of the data element 302 that includes the mark 304 is shown in enlarged form. The mark 304 is difficult to detect and corrupt, thus preventing use and distribution of the data element 302 without the requisite permission. If a leak does occur, the mark 304 also helps to detect the source of the leak of the data element 302.
FIG. 4 is an exemplary illustration of a data element whose content has been adjusted according to compliance requirements, in accordance with an embodiment of the present disclosure. Referring to FIG. 4, a data element 402 is shown whose content has been adjusted according to compliance requirements. The data element 402 is, for example, an image depicting a child 404 and an adult 406. In one scenario, storing images and/or video of a child is prohibited at the geographic location of the backup storage system to which the data element 402 is to be backed up. To adjust the content of the data element 402 according to compliance requirements that prohibit storing images and/or video of children, the controller 106 may detect faces in the data element 402 and apply a filter (e.g., a blur filter) to the data element 402 to hide the face of the child 404. Thus, the face of the child 404 is obscured by the controller 106.
Fig. 5 is a flow chart of a method 500 for a storage device according to an embodiment of the present disclosure. The method 500 is used in a storage device (e.g., storage device 102) that includes a memory (e.g., memory 104) configured to store a plurality of data elements included in a workload. The method 500 is performed by the controller 106 at the storage device 102, such as described in fig. 1. Method 500 includes steps 502, 504, and 506.
At step 502, the method 500 includes receiving an indication of a workload to backup, the workload including a plurality of data elements. The controller 106 receives an indication of the workload via the communication interface 108. The indication is sent by the user via the user device. Upon receiving the indication, the controller 106 initiates a backup operation of the workload for creating a backup of the plurality of data elements on the at least one backup storage system.
At step 504, the method 500 further includes receiving an indication of a backup policy, wherein the backup policy is for a workload and applies to all of the plurality of data elements to be backed up. In some implementations, the indication of the backup policy is received from a user using the user device. In other implementations, the indication of the backup policy is received from a backup policy selection software module. The indicated backup policy enables the controller 106 to automatically manage and create backups of multiple data elements of the workload.
At step 506, the method 500 further includes generating a backup of each of the data elements according to the backup policy of the workload, wherein the backup policy includes settings for one or more of: lossy data reduction; anonymization requirements; leakage source detection; and compliance requirements. The settings of the backup policy include specifications and/or rules that allow the controller 106 to modify the plurality of data elements to generate an optimized backup for each of the plurality of data elements of the workload.
Steps 502 through 506 are merely illustrative and other alternatives to adding one or more steps, deleting one or more steps, or providing one or more steps in a different order may also be provided without departing from the scope of the claims herein.
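Steps 502 to 506 can be summarized in a short Python sketch. It models the policy as a list of text filters that is applied to every data element of the workload; the `Controller` class, its method names, and the use of `str.upper` as a placeholder filter are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional

TextFilter = Callable[[str], str]


class Controller:
    """Hypothetical controller that walks through steps 502, 504 and 506."""

    def __init__(self) -> None:
        self.workload: Optional[Dict[str, str]] = None
        self.policy: Optional[List[TextFilter]] = None

    def receive_workload(self, data_elements: Dict[str, str]) -> None:
        """Step 502: receive an indication of the workload to back up."""
        self.workload = dict(data_elements)

    def receive_policy(self, filters: List[TextFilter]) -> None:
        """Step 504: receive the policy that applies to all data elements."""
        self.policy = list(filters)

    def generate_backups(self) -> Dict[str, str]:
        """Step 506: generate a backup of each data element under the policy."""
        if self.workload is None or self.policy is None:
            raise RuntimeError("workload and policy must be received first")
        backups = {}
        for name, content in self.workload.items():
            for apply_filter in self.policy:
                content = apply_filter(content)
            backups[name] = content
        return backups


controller = Controller()
controller.receive_workload({"a.txt": "Hello", "b.txt": "World"})
controller.receive_policy([str.upper])  # placeholder filter
backups = controller.generate_backups()
print(backups)
```

The same policy is applied to both data elements, reflecting that the policy is workload-specific rather than chosen per element.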
According to an embodiment, a computer-readable medium carries computer instructions that, when loaded into and executed by the controller 106 of the storage device 102, enable the storage device 102 to implement the method 500. The computer readable medium carrying computer instructions provides non-transitory memory and may include, but is not limited to, electronic storage, magnetic storage, optical storage, electromagnetic storage, semiconductor storage, or any suitable combination of the foregoing.
In an exemplary aspect, the storage device 102 includes a memory 104 configured to store a plurality of data elements included in a workload. The storage device 102 also includes a workload receiving software module 110a for receiving an indication of a workload to be backed up, the workload including a plurality of data elements. The storage device 102 further comprises a backup policy receiving software module 110b for receiving an indication of a backup policy, wherein the backup policy is for the workload and applies to all of the plurality of data elements to be backed up. The storage device 102 further comprises a backup generation software module 110c for generating a backup of each of the data elements according to the backup policy of the workload, wherein the backup policy comprises settings regarding one or more of: lossy data reduction; anonymization requirements; leakage source detection; and compliance requirements.
The software modules 110 are executed to enable the storage device 102 to receive and attach an appropriate backup policy to the workload in order to generate a backup of the plurality of data elements. Dedicated software modules 110 are employed to perform dedicated processing tasks with respect to backup generation. The use of the software modules 110 advantageously automates backup generation at the storage device 102 in a centralized manner, minimizing manual involvement in applying these processes. The software modules 110 are executed by the controller 106 of the storage device 102.
Modifications may be made to the embodiments of the disclosure described in the foregoing without departing from the scope of the disclosure as defined by the following claims. The terms "comprising," "including," "incorporating," "having," "is/are," and the like, are intended to be interpreted in a non-exclusive manner, i.e., allowing for the existence of items, components, or elements that are not explicitly described. Reference to the singular is also to be construed to relate to the plural. The word "exemplary" is used herein to mean "serving as an example, instance, or illustration." Any embodiment described as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments and/or to exclude the incorporation of features from other embodiments. The word "optionally" as used herein means "provided in some embodiments and not provided in other embodiments". It is appreciated that certain features of the disclosure, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the disclosure, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable combination or as suitable in any other described embodiment of the disclosure.

Claims (13)

1. A storage device (102) comprising a memory (104) and a controller (106), the memory (104) being configured to store a plurality of data elements included in a workload, and the controller (106) being configured to:
receiving an indication of a workload to be backed up, the workload comprising a plurality of data elements;
receiving an indication of a backup policy, wherein the backup policy is for the workload and applies to all of the plurality of data elements to be backed up; and
generating a backup of each of the data elements according to the backup policy of the workload, wherein the backup policy includes settings regarding one or more of:
lossy data reduction;
anonymization requirements;
leakage source detection; and
compliance requirements.
2. The storage device (102) of claim 1, wherein the controller (106) is further configured to generate a backup of each of the data elements according to the backup policy of the workload by one or more of:
utilizing a compression algorithm according to the lossy data reduction;
adjusting the content of the data element according to the anonymization requirement;
adding a mark indicating a source to the data element in accordance with the leakage source detection; and
adjusting the content of the data element according to the compliance requirement.
3. The storage device (102) of claim 2, wherein the controller (106) is further configured to adjust the content of the data element according to the anonymization requirement by blurring faces, changing sounds, and/or altering personal information.
4. The storage device (102) of claim 2 or 3, wherein the mark added to the data element according to the leakage source detection is a watermark.
5. The storage device (102) of claim 2, 3 or 4, wherein the controller (106) is further configured to adjust the content of the data element according to the compliance requirement by blurring the child's face and/or deleting addresses, names and/or identification numbers.
6. The storage device (102) of any preceding claim, wherein the controller (106) is further configured to generate a backup of each of the data elements according to the backup policy of the workload by generating a copy of the data elements and then applying a filter to the copy of the data elements according to the backup policy.
7. The storage device (102) of any preceding claim, wherein the controller (106) is further configured to receive the backup policy from a user.
8. The storage device (102) of claim 7, wherein the controller (106) is further configured to receive the backup policy from the user by receiving one or more of:
a user indication of lossy data reduction;
a user indication of anonymization requirements;
a user indication of leakage source detection; and
a user indication of compliance requirements.
9. The storage device (102) of claim 7, wherein the controller (106) is further configured to receive the backup policy from the user by receiving a selection of the backup policy.
10. The storage device (102) of claim 7, wherein the controller (106) is further configured to receive the backup policy from the user by receiving an indication of a regulatory requirement, wherein the backup policy matches the regulatory requirement.
11. A method (500) for a storage device (102) comprising a memory (104), the memory (104) configured to store a plurality of data elements included in a workload, the method (500) comprising:
receiving an indication of a workload to be backed up, the workload comprising a plurality of data elements;
receiving an indication of a backup policy, wherein the backup policy is for the workload and applies to all of the plurality of data elements to be backed up; and
generating a backup of each of the data elements according to the backup policy of the workload, wherein the backup policy includes settings regarding one or more of:
lossy data reduction;
anonymization requirements;
leakage source detection; and
compliance requirements.
12. A computer readable medium carrying computer instructions that, when loaded into a controller (106) of a storage device (102) and executed by the controller (106) of the storage device (102), enable the storage device to perform the method (500) of claim 11.
13. A storage device (102) comprising a memory (104), the memory (104) being configured to store a plurality of data elements included in a workload, and the storage device (102) further comprising:
a workload receiving software module (110a) for receiving an indication of a workload to be backed up, the workload comprising a plurality of data elements;
a backup policy receiving software module (110b) for receiving an indication of a backup policy, wherein the backup policy is for the workload and applies to all data elements of the plurality of data elements to be backed up; and
a backup generation software module (110c) for generating a backup of each of the data elements according to the backup policy of the workload, wherein the backup policy comprises settings for one or more of:
lossy data reduction;
anonymization requirements;
detecting a leakage source; and
compliance requirements.
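The method of claim 11 and the three modules of claim 13 can be sketched as follows. This is only an illustrative sketch, not the claimed implementation: the names `BackupPolicy`, `Workload`, `backup_element` and `generate_backup`, the field names, and the placeholder transformations (masking digits, keeping half of the bytes, hashing a copy identifier) are all assumptions introduced here. The one point taken from the claims is that a single policy is received for the workload and then applied to every data element in it.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class BackupPolicy:
    """Hypothetical per-workload policy; field names are illustrative only."""
    lossy_reduction: bool = False   # setting for lossy data reduction
    anonymize: bool = False         # setting for anonymization requirements
    leak_tracing: bool = False      # setting for detecting a leakage source
    compliance_tag: str = ""        # setting for compliance requirements


@dataclass
class Workload:
    """A named workload holding the data elements to be backed up."""
    name: str
    elements: List[bytes]


def backup_element(element: bytes, policy: BackupPolicy, copy_id: str) -> Dict[str, object]:
    """Back up one data element under the workload-level policy."""
    data = element
    if policy.anonymize:
        # Placeholder anonymization (assumption): mask ASCII digits with '#'.
        data = bytes(ord("#") if 48 <= b <= 57 else b for b in data)
    if policy.lossy_reduction:
        # Placeholder lossy reduction (assumption): keep the first half.
        data = data[: (len(data) + 1) // 2]
    record: Dict[str, object] = {"data": data, "compliance": policy.compliance_tag}
    if policy.leak_tracing:
        # Placeholder trace marker (assumption): ties this copy to copy_id.
        record["trace"] = hashlib.sha256(copy_id.encode() + data).hexdigest()[:16]
    return record


def generate_backup(workload: Workload, policy: BackupPolicy,
                    copy_id: str = "copy-0") -> List[Dict[str, object]]:
    """Apply the same policy to every data element of the workload."""
    return [backup_element(e, policy, copy_id) for e in workload.elements]
```

Calling `generate_backup(Workload("w", [b"id 1234"]), BackupPolicy(anonymize=True))` returns one record per data element, each produced under the same policy object, which mirrors the "applies to all of the plurality of data elements" limitation.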
CN202080105357.XA 2020-09-21 2020-09-21 Storage device and method for generating data backup by adopting backup strategy Pending CN116209985A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2020/076236 WO2022058030A1 (en) 2020-09-21 2020-09-21 Storage arrangements and method employing backup policies for generating data backup

Publications (1)

Publication Number Publication Date
CN116209985A (en) 2023-06-02

Family

ID=72644203

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080105357.XA Pending CN116209985A (en) 2020-09-21 2020-09-21 Storage device and method for generating data backup by adopting backup strategy

Country Status (3)

Country Link
EP (1) EP4204965A1 (en)
CN (1) CN116209985A (en)
WO (1) WO2022058030A1 (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8832044B1 (en) * 2009-03-04 2014-09-09 Symantec Corporation Techniques for managing data compression in a data protection system
US8626714B1 (en) * 2011-09-07 2014-01-07 Symantec Corporation Automated separation of corporate and private data for backup and archiving
US10949398B2 (en) * 2017-03-29 2021-03-16 Commvault Systems, Inc. Synchronization operations for network-accessible folders
US10705756B2 (en) * 2018-10-15 2020-07-07 EMC IP Holding Company LLC Agent aware selective backup of a virtual machine using virtual I/O filter snapshots

Also Published As

Publication number Publication date
EP4204965A1 (en) 2023-07-05
WO2022058030A1 (en) 2022-03-24

Similar Documents

Publication Publication Date Title
US11120013B2 (en) Real time visual validation of digital content using a distributed ledger
US10438000B1 (en) Using recognized backup images for recovery after a ransomware attack
US9165002B1 (en) Inexpensive deletion in a data storage system
US10592677B2 (en) Systems and methods for patching vulnerabilities
US8365243B1 (en) Image leak prevention using geotagging
US20170270293A1 (en) Systems and methods for generating tripwire files
EP2924566A2 (en) Constellation based device binding
US10951790B1 (en) Systems and methods for authenticating an image
US10133639B2 (en) Privacy protection of media files for automatic cloud backup systems
CN115396421A (en) Data transmission and filtering method and device, electronic equipment and storage medium
JPWO2006103752A1 (en) How to control document copying
CN112837202B (en) Watermark image generation and attack tracing method and device based on privacy protection
US20170228292A1 (en) Privacy Protection of Media Files For Automatic Cloud Backup Systems
CN108363727B (en) Data storage method and device based on ZFS file system
JP2012182737A (en) Secret data leakage preventing system, determining apparatus, secret data leakage preventing method and program
CN116209985A (en) Storage device and method for generating data backup by adopting backup strategy
KR101557031B1 (en) Method and system for performing image contents registration service
JP2019169143A (en) System, method, device, and program that track copy of printed material owned by rights holder
US10438011B2 (en) Information processing apparatus and non-transitory computer readable medium
US20200250340A1 (en) Security rules compliance for personally identifiable information
JP2017134825A (en) Method for selecting content comprising audiovisual data and corresponding electronic device, system, computer readable program and computer readable storage medium
CN114417397A (en) Behavior portrait construction method and device, storage medium and computer equipment
CN108038028B (en) File backup method and device and file restoration method and device
US10586055B2 (en) Electronically backing up files using steganography
CN112199731A (en) Data processing method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination