WO2022058030A1 - Storage arrangements and method employing backup policies for generating data backup - Google Patents

Storage arrangements and method employing backup policies for generating data backup

Info

Publication number
WO2022058030A1
Authority
WO
WIPO (PCT)
Prior art keywords
backup
workload
data
controller
backup policy
Prior art date
Application number
PCT/EP2020/076236
Other languages
French (fr)
Inventor
Assaf Natanzon
Original Assignee
Huawei Technologies Co., Ltd.
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to CN202080105357.XA (CN116209985A)
Priority to PCT/EP2020/076236 (WO2022058030A1)
Priority to EP20780122.6A (EP4204965A1)
Publication of WO2022058030A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00Error detection; Error correction; Monitoring
    • G06F11/07Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/14Error detection or correction of the data by redundancy in operation
    • G06F11/1402Saving, restoring, recovering or retrying
    • G06F11/1446Point-in-time backing up or restoration of persistent data
    • G06F11/1448Management of the data involved in backup or backup restore
    • G06F11/1451Management of the data involved in backup or backup restore by selection of backup contents
    • G06F11/1453Management of the data involved in backup or backup restore using de-duplication of the data
    • G06F11/1458Management of the backup or restore process

Definitions

  • the present disclosure relates generally to the field of data storage; and more specifically, to storage arrangements and a method employing backup policies for generating data backup.
  • data backup is used to recover data in an event of data loss in a storage system.
  • a separate backup system or a secondary storage system is used to store a backup of the data present in a primary storage system.
  • a storage system namely, a storage arrangement
  • a storage arrangement is not only used to store the backup of the data present in the primary storage system, but also allows replication of the backup stored therein to multiple backup storage systems, such as cloud storage systems.
  • modifications are required on the backup prior to storing a replica of the backup to the multiple backup storage systems, in order to protect personal and sensitive data.
  • the replica of the backup that is used for test and development often requires modification of the data, such as data anonymization, so that sensitive data (for example, credit card data, social security numbers, and addresses) in the backup is not exposed.
  • modifications of the backup are often performed manually and require anonymization software.
  • Conventional storage arrangements do not provide effective anonymization on the backup.
  • data compliance is also a consideration.
  • data in the replica of the backup is required to be modified according to legislative regulations of geographical locations for data protection and privacy.
  • GDPR General Data Protection Regulation
  • EU European Union
  • the conventional storage arrangements either do not handle compliance requirements, or they do so inefficiently.
  • the data reduction techniques may be lossless data reduction techniques or lossy data reduction techniques, according to requirements. Different data reduction techniques are associated with different pricing considerations. For example, existing vendors often allow free storage for the replica of the backup in case of lossy data reduction, but charge a fee for storage of the replica of the backup in case of lossless data reduction.
  • the conventional storage arrangements do not provide cost-effective data reduction, compliance management and data modification.
  • the present disclosure seeks to provide storage arrangements and methods employing backup policies for generating data backup.
  • the present disclosure seeks to provide a solution to one or more of the existing problems of inefficient and unreliable compliance adherence, data modification and data reduction in conventional storage arrangements.
  • An aim of the present disclosure is to provide a solution that overcomes at least partially the problems encountered in prior art, and provides storage arrangements and methods for efficient and reliable backup generation using backup policies with required settings.
  • the present disclosure provides a storage arrangement.
  • the storage arrangement comprises a memory and a controller.
  • the memory being configured to store a plurality of data elements comprised in a workload.
  • the controller is configured to: receive an indication of a workload to be backed-up, the workload comprising a plurality of data elements; receive an indication of a backup policy, wherein the backup policy is for the workload and applies to all of the plurality of data elements to be backed-up; and to generate a backup of each of the data elements according to the backup policy for the workload, wherein the backup policy comprises settings for one or more of: lossy data reduction, anonymization requirements, leak source detection, and compliance requirements.
  • the storage arrangement allows implementation of automatic backup policy management for creating an optimized backup of the plurality of data elements.
  • the storage arrangement supports storage of the backup in multiple backup storage arrangements at multiple geographical locations.
  • the storage arrangement allows efficient backup of the plurality of data elements by automatically applying (i.e. without applying manual data reduction, without manually using compliance tools, and the like) requisite settings for backup policies.
  • the storage arrangement automatically generates the optimized backup which is one or more of: space efficient, price-efficient, adapted to protect sensitive content, and compliant with multiple regulations of the multiple geographical locations.
  • the storage arrangement also enables implementation of the backup policy for leak source detection to prevent data leak or to track data leak in the backup.
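The three controller steps described above (receive the workload indication, receive the policy indication, generate a backup of every data element under that single policy) can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: `DataElement`, `BackupPolicy`, `generate_backup`, the field names, and the order in which settings are applied are all hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Callable, List, Optional

Transform = Callable[[bytes], bytes]


@dataclass(frozen=True)
class DataElement:
    name: str
    content: bytes


@dataclass(frozen=True)
class BackupPolicy:
    """Hypothetical policy record; any subset of the four settings may be set."""
    lossy_reduction: Optional[Transform] = None
    anonymize: Optional[Transform] = None
    comply: Optional[Transform] = None
    mark_source: Optional[Transform] = None

    def enabled_steps(self) -> List[Transform]:
        # Assumed order: the source mark goes last so earlier steps cannot alter it.
        steps = (self.lossy_reduction, self.anonymize, self.comply, self.mark_source)
        return [step for step in steps if step is not None]


def generate_backup(workload: List[DataElement], policy: BackupPolicy) -> List[DataElement]:
    """Apply the one workload-level policy to every data element in the workload."""
    backup = []
    for element in workload:
        content = element.content
        for step in policy.enabled_steps():
            content = step(content)
        # The original element is left untouched; the backup holds a new record.
        backup.append(replace(element, content=content))
    return backup
```

Modelling each setting as an optional callable keeps the "one or more of" wording of the policy explicit: a setting that is absent is simply skipped.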
  • the controller is further configured to generate the backup of each of the data elements according to the backup policy for the workload by one or more of: utilizing a compression algorithm according to the lossy data reduction, adapting the content of the data element according to the anonymization requirements, adding a mark indicating a source to the data element according to the leak source detection, and adapting the content of the data element according to the compliance requirements.
  • the compression algorithms for the lossy data reduction reduce a size of the plurality of data elements when generating the backup and hence, allow for creating a space-efficient backup.
  • adapting the content of the plurality of data elements according to the anonymization requirements enables protection of private or sensitive information stored in the plurality of data elements.
  • adding the mark enables accurate and reliable tracking of the source of a data leak, should such a leak occur, by using the mark to track the location of the source of the data element.
  • content of the plurality of data elements is modified according to compliance requirements to ensure that the backup is made in accordance with laws, policies and regulations regarding data privacy and security.
  • the controller is further configured to adapt the content of the data element according to the anonymization requirements by obscuring a face, altering a voice, and/or changing personal information.
  • Characteristics such as face, voice and personal information are distinctly indicative of identity of an entity.
  • In order to prevent identification of the entity, such a manner of adapting the content of the data element is employed to anonymize these distinctive characteristics.
  • anonymization requirements facilitate prevention of identity theft, protection of privacy, and the like.
  • the mark added to the data element according to the leak source detection is a watermark.
  • Watermarks are typically hard to damage or detect. Therefore, the watermark is added to enable tracking of the source of a data leak, should one occur, by tracking the location of the data element.
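As a rough illustration of adding a mark that indicates a source, the sketch below hides a source identifier in a text data element using zero-width characters. The technique and the names `add_mark` and `read_mark` are assumptions chosen for brevity, not the watermarking scheme of the disclosure. Unlike the watermarks described above, this mark is invisible when rendered but easy to strip, and the sketch assumes the original text contains no zero-width characters of its own.

```python
ZERO = "\u200b"  # zero-width space, encodes bit 0
ONE = "\u200c"   # zero-width non-joiner, encodes bit 1


def add_mark(text: str, source_id: str) -> str:
    """Append an invisible encoding of source_id to the text."""
    bits = "".join(f"{byte:08b}" for byte in source_id.encode("utf-8"))
    return text + "".join(ONE if bit == "1" else ZERO for bit in bits)


def read_mark(text: str) -> str:
    """Recover the source identifier from a marked text (empty string if unmarked)."""
    bits = "".join("1" if ch == ONE else "0" for ch in text if ch in (ZERO, ONE))
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")
```

If each replica of the backup is marked with a different identifier, a leaked copy can be traced back to the replica it came from by calling `read_mark` on it.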
  • the controller is further configured to adapt the content of the data element according to the compliance requirements by obscuring a face of a child and/or deleting an address, a name, and/or an identification number.
  • Such a manner of adapting the content of the data element prevents identification of an entity.
  • Because certain geographical locations have specific compliance requirements with respect to protection of identities of entities, adapting the content of the data element by obscuring and/or deleting uniquely identifiable characteristics of the entities enables the storage arrangement to automatically adhere to such compliance requirements.
  • the controller is further configured to generate the backup of each of the data elements according to the backup policy for the workload by generating a copy of the data element and then applying a filter according to the backup policy to the copy of the data element.
  • a required filter may be effectively applied to the copy of the plurality of data elements, in order to ensure correct implementation of a required setting on the data element according to the backup policy.
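The copy-then-filter approach can be sketched as below. The dictionary representation of a data element and the names `backup_with_filters` and `drop_address` are hypothetical; the point of the sketch is only that filters act on the copy, so the stored data element is never modified.

```python
import copy
from typing import Any, Callable, Dict, Iterable

Element = Dict[str, Any]
Filter = Callable[[Element], None]  # a filter mutates the element it is given


def backup_with_filters(element: Element, filters: Iterable[Filter]) -> Element:
    backup_copy = copy.deepcopy(element)  # step 1: generate a copy of the data element
    for apply_filter in filters:          # step 2: apply each policy filter to the copy only
        apply_filter(backup_copy)
    return backup_copy


def drop_address(element: Element) -> None:
    """Hypothetical compliance filter: delete an address field if one is present."""
    element.get("fields", {}).pop("address", None)
```

A deep copy is used so that nested fields of the original are not shared with the backup copy.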
  • the controller is further configured to receive the backup policy from a user.
  • a backup policy that is best suited according to the user's requirements is employed by the controller for generating the backup.
  • the backup thus generated is optimized according to the user's requirements.
  • the controller is further configured to receive the backup policy from the user by receiving one or more of: a user indication of lossy data reduction, a user indication of anonymization requirements, a user indication of leak source detection, and a user indication of compliance requirements.
  • By receiving one or more of the aforesaid user indications, the controller is provided with the user's requirements with respect to the backup policy. In this way, the controller is enabled to generate an optimized backup of each of the data elements, according to the user's requirements.
  • the controller is further configured to receive the backup policy from the user by receiving a selection of a backup policy.
  • the backup policy received by the controller is selected by the user, and is therefore better suited to the user's data backup requirements as compared to an automatically selected backup policy.
  • the backup of data elements generated by employing the user-selected backup policy is according to the user's requirements.
  • the controller is further configured to receive the backup policy from the user by receiving an indication of legislative requirements, wherein the backup policy matches the legislative requirements.
  • By receiving the user indication of legislative requirements, the controller is enabled to effectively select and apply a suitable backup policy that adapts the content of the data elements while generating the backup according to the legislative requirements. This ensures that the backup meets compliance requirements.
  • the present disclosure provides a method for a storage arrangement comprising a memory being configured to store a plurality of data elements comprised in a workload.
  • the method comprises: receiving an indication of a workload to be backed-up, the workload comprising a plurality of data elements; receiving an indication of a backup policy, wherein the backup policy is for the workload and applies to all of the plurality of data elements to be backed-up; and generating a backup of each of the data elements according to the backup policy for the workload, wherein the backup policy comprises settings for one or more of: lossy data reduction, anonymization requirements, leak source detection, and compliance requirements.
  • a computer-readable medium carrying computer instructions that, when loaded into and executed by a controller of a storage arrangement, enable the storage arrangement to implement the method.
  • the computer-readable medium carrying computer instructions achieves all the advantages and effects of the storage arrangement, or the method.
  • the present disclosure provides a storage arrangement comprising a memory being configured to store a plurality of data elements comprised in a workload.
  • the storage arrangement further comprises: a workload receiving software module for receiving an indication of a workload to be backed-up, the workload comprising a plurality of data elements; a backup policy receiving software module for receiving an indication of a backup policy, wherein the backup policy is for the workload and applies to all of the plurality of data elements to be backed-up; and a backup generating software module for generating a backup of each of the data elements according to the backup policy for the workload, wherein the backup policy comprises settings for one or more of: lossy data reduction, anonymization requirements, leak source detection, and compliance requirements.
  • the software modules are executed to enable the storage arrangement to receive and attach an appropriate backup policy to the workload, in order to generate the backup of the plurality of data elements.
  • Dedicated software modules are employed to perform dedicated processing tasks concerning the generation of the backup. Use of these software modules beneficially automates backup generation in a centralized manner at the storage arrangement, so that manual involvement in applying these processes is minimized.
  • FIG. 1 is a block diagram that illustrates various exemplary components of a storage arrangement, in accordance with an embodiment of the present disclosure
  • FIG. 2 is an exemplary illustration of backup of data elements to a backup storage system, in accordance with an embodiment of the present disclosure
  • FIG. 3 is an exemplary illustration of a data element including a mark added for leak source detection, in accordance with an embodiment of the present disclosure
  • FIG. 4 is an exemplary illustration of a data element whose content has been adapted according to a compliance requirement, in accordance with an embodiment of the present disclosure.
  • FIG. 5 is a flowchart of a method for a storage arrangement, in accordance with an embodiment of the present disclosure.
  • an underlined number is employed to represent an item over which the underlined number is positioned or an item to which the underlined number is adjacent.
  • a non-underlined number relates to an item identified by a line linking the non-underlined number to the item.
  • the non-underlined number is used to identify a general item at which the arrow is pointing.
  • FIG. 1 is a block diagram that illustrates various exemplary components of a storage arrangement, in accordance with an embodiment of the present disclosure.
  • a storage arrangement 102 comprises a memory 104, and a controller 106.
  • the storage arrangement 102 is also shown to comprise a communication interface 108 and one or more software modules, such as software modules 110.
  • the storage arrangement 102 includes suitable logic, circuitry, devices, interfaces and/or code that is configured to allow automatic backup policy management for a plurality of data elements to generate a backup of each of the data elements.
  • the backup of each of the data elements may be stored in one or more backup storage systems (for example, such as cloud storage systems, local backup storage systems, remote backup storage systems, and the like).
  • the storage arrangement 102 enables smart data backup management and enables, for example, creation of a lossy backup of the plurality of data elements which is space efficient and compliant with multiple regulations, without requiring manual application of specialized data processing tools.
  • the storage arrangement 102 is a secondary storage arrangement.
  • Examples of the storage arrangement 102 may include, but are not limited to, a server, a production environment system, a computing device, a computing device in a computer cluster (e.g. massively parallel computer clusters), a portable or non-portable electronic device, a drone, or a supercomputer.
  • the memory 104 is configured to store the plurality of data elements comprised in a workload.
  • the memory 104 includes suitable logic, circuitry, and/or interfaces that is configured to store the plurality of data elements comprised in the workload.
  • the memory 104 of the storage arrangement 102 is a secondary storage memory that stores the plurality of data elements received from a primary storage system (e.g. a host server).
  • the memory 104 may further store instructions executable to control the storage arrangement 102.
  • the memory 104 may additionally store an operating system and/or other program products to operate the storage arrangement 102. Examples of implementation of the memory 104 may include, but are not limited to, Hard Disk Drive (HDD), Flash drive, a Secure Digital (SD) card, Solid-State Drive (SSD), Network Attached Storage (NAS) or another computer storage medium.
  • HDD Hard Disk Drive
  • SD Secure Digital
  • SSD Solid-State Drive
  • NAS Network Attached Storage
  • the controller 106 includes suitable logic, circuitry, and/or interfaces that is configured to implement processing steps pertaining to the automatic backup policy management to generate the backup of the plurality of data elements.
  • the controller 106 is a computational element that is configured to execute process instructions that drive the storage arrangement 102. Examples of the controller 106 include, but are not limited to, a microprocessor, a microcontroller, a complex instruction set computing (CISC) processor, a reduced instruction set computing (RISC) processor, a very long instruction word (VLIW) processor, an application-specific integrated circuit (ASIC) processor, a central processing unit (CPU), a data processing unit, and other processors or control circuitry.
  • CISC complex instruction set computing
  • RISC reduced instruction set computing
  • VLIW very long instruction word
  • ASIC application-specific integrated circuit
  • CPU central processing unit
  • the controller 106 can be employed to generate backups of an entire file system, a group of file systems, a single database, a set of databases, and similar data structures.
  • the storage arrangement 102 further comprises the communication interface 108.
  • the communication interface 108 is an arrangement of interconnected programmable and/or non-programmable components that are configured to facilitate data communication between one or more electronic devices.
  • the communication interface 108 supports various wired or wireless communication protocols for one or more of: a peer-to-peer network, a hybrid peer-to-peer network, local area networks (LANs), radio access networks (RANs), metropolitan area networks (MANs), wide area networks (WANs), all or a portion of a public network such as the global computer network known as the Internet, a Wireless Fidelity (Wi-Fi) network, a wireless personal area network (WPAN), a private network, a cellular network and any other communication network or networks at one or more locations.
  • LANs local area networks
  • RANs radio access networks
  • MANs metropolitan area networks
  • WANs wide area networks
  • Wi-Fi Wireless Fidelity
  • WPAN wireless personal area network
  • the communication interface 108 supports wired or wireless communication that can be carried out via any number of known protocols, including, but not limited to, Transmission Control Protocol and Internet Protocol (TCP/IP), User Datagram Protocol (UDP), Hypertext Transfer Protocol (HTTP), File Transfer Protocol (FTP), IEEE 802.16, Light Fidelity (Li-Fi), Wireless Application Protocol (WAP), Frame Relay, or Asynchronous Transfer Mode (ATM) and/or other cellular communication protocols.
  • TCP/IP Transmission Control Protocol and Internet Protocol
  • UDP User Datagram Protocol
  • HTTP Hypertext Transfer Protocol
  • FTP File Transfer Protocol
  • WAP Wireless Application Protocol
  • ATM Asynchronous Transfer Mode
  • the storage arrangement 102 further comprises the software modules 110.
  • the software modules 110 include one or more workload receiving software modules (such as a workload receiving software module 110a), one or more backup policy receiving software modules (such as a backup policy receiving software module 110b) and one or more backup generating software modules (such as a backup generating software module 110c).
  • the software modules 110 (which include the software modules 110a to 110c) are potentially implemented as separate circuits in the storage arrangement 102.
  • the software modules 110 are implemented as circuitry to execute various operations of the software modules 110a to 110c.
  • the controller 106 is configured to receive an indication of the workload to be backed-up, the workload comprising the plurality of data elements. Upon receiving the indication, the controller 106 initiates a backup operation of the workload for creating the backup of the plurality of data elements on at least one backup storage system. The controller 106 receives the indication of the workload via the communication interface 108. The indication is sent by a user, via a user device.
  • the examples of the user device may include, but are not limited to a personal computer (such as a desktop computer, a laptop computer, and the like), a personal digital assistant (PDA), or a smart phone.
  • a data element is a unit of data in the workload.
  • Examples of a given data element may include, but are not limited to a document (such as a text document, a spreadsheet, a form, a certificate, electronic mail, a presentation, and the like), an image, a video, an audio, or other forms of data elements.
  • the controller 106 is configured to receive an indication of a backup policy, wherein the backup policy is for the workload and applies to all of the plurality of data elements to be backed-up.
  • the indicated backup policy enables the controller 106 to automatically manage and create the backup of the plurality of data elements of the workload.
  • the indicated backup policy is to be applied to all the plurality of data elements of the workload to generate the backup. For example, in a case where the workload includes four data elements, the indicated backup policy is to be applied to all four data elements.
  • the indication of the backup policy is provided by the user using the user device. Such implementations are described in detail below. In other implementations, the indication of the backup policy is provided by a backup policy selecting software module (not shown).
  • the backup policy selecting software module automatically indicates the backup policy based on at least one of: characteristics of the plurality of data elements in the workload, information pertaining to the at least one backup storage system on which the backup is to be stored. As an example, given a geographical location of a given backup storage system, a given backup policy compliant with regulations of said geographical location will be automatically selected.
  • a plurality of backup policies may be pre-stored at the memory 104 of the storage arrangement 102.
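Automatic selection of a policy from the geographical location of a backup storage system could look like the sketch below. The region codes, policy names, and `select_policy` function are hypothetical; a real mapping would depend on the regulations that actually apply at each location.

```python
from typing import Dict

# Hypothetical table mapping a backup storage region to a pre-stored policy name.
POLICY_BY_REGION: Dict[str, str] = {
    "EU": "gdpr-compliant",
}
DEFAULT_POLICY = "baseline"  # assumed fallback when no regional policy is registered


def select_policy(backup_region: str,
                  table: Dict[str, str] = POLICY_BY_REGION,
                  default: str = DEFAULT_POLICY) -> str:
    """Return the name of the pre-stored policy for the region of the backup storage system."""
    return table.get(backup_region.upper(), default)
```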
  • the controller 106 is further configured to receive the backup policy from the user.
  • the backup policy is sent, via the communication interface 108, to the controller 106 by the user.
  • the user typically sends the backup policy that suits his/her requirements, to the controller 106.
  • the controller 106 is facilitated in generating the optimized backup that is well-suited to the user's requirements.
  • said backup policy is attached to (namely, associated with) the workload (specifically, to the plurality of data elements) by the controller 106.
  • the user may send the backup policy to the controller 106 using the user device.
  • the backup policy is stored at a memory module of the user device.
  • Upon receiving the backup policy at the storage arrangement 102, the backup policy is stored at the memory 104.
  • the controller 106 is further configured to receive the backup policy from a user by receiving one or more of: a user indication of lossy data reduction; a user indication of anonymization requirements; a user indication of leak source detection; and a user indication of compliance requirements.
  • One or more of the aforesaid user indications pertain to one or more settings in the backup policy.
  • a given user indication may indicate whether or not to apply a given setting during backup, and optionally, how to apply the given setting.
  • the controller 106 then receives a given backup policy that matches the given user indication.
  • the user provides, to the controller 106, the backup policy to be employed for generating the backup of the workload.
  • the user may provide a given user indication to the controller 106 using the user device.
  • the controller 106 receives the given user indication via the communication interface 108.
  • a given user indication is in the form of: a text indication, a visual indication (for example, an image, a video, and the like), an audio indication (for example, a voice input), or a touch indication.
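One way to obtain a backup policy that matches the user indications is to look it up among pre-stored policies, as sketched below. The `StoredPolicy` record, its boolean fields, and the first-match rule are assumptions made for illustration; the source does not specify how the match is performed.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class StoredPolicy:
    name: str
    lossy_reduction: bool = False
    anonymization: bool = False
    leak_source_detection: bool = False
    compliance: bool = False


SETTINGS = ("lossy_reduction", "anonymization", "leak_source_detection", "compliance")


def match_policy(indications: Dict[str, bool],
                 stored: List[StoredPolicy]) -> Optional[StoredPolicy]:
    """Return the first stored policy that agrees with every setting the user indicated."""
    unknown = set(indications) - set(SETTINGS)
    if unknown:
        raise ValueError(f"unknown settings: {sorted(unknown)}")
    for policy in stored:
        if all(getattr(policy, key) == wanted for key, wanted in indications.items()):
            return policy
    return None  # no pre-stored policy matches the indications
```

Settings the user does not mention are treated as "don't care", so a partial set of indications can still select a policy.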
  • the controller 106 may receive the backup policy for the workload by the user indication of lossy data reduction.
  • a user indication enables the controller 106 to create the backup of the plurality of data elements by employing techniques for lossy data reduction on the plurality of data elements.
  • the backup generated in such a case has a size that is smaller than an original size of the plurality of data elements and hence, can be stored on the at least one backup storage system in a space efficient format.
  • the backup policy corresponding to the user indication of lossy data reduction enables the controller 106 to generate the backup of the plurality of data elements with lossy data reduction without requiring the user to apply any data reduction tool manually.
  • an amount of the lossy data reduction is configurable by the user in the user indication of the lossy data reduction.
  • the user may configure the amount of the lossy data reduction based on one or more of: the user's preferences with respect to the backup, a manner in which the backup is to be used, a storage capacity of the at least one backup storage system on which the backup is to be stored.
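A user-configurable amount of lossy data reduction can be illustrated with simple downsampling of a sequence of samples. This is a stand-in chosen for brevity, not a compression algorithm named in the source; the function name `downsample` and the `keep_every` parameter are hypothetical.

```python
from typing import List, Sequence


def downsample(samples: Sequence[float], keep_every: int) -> List[float]:
    """Lossy reduction: keep one sample out of every `keep_every` and discard the rest."""
    if keep_every < 1:
        raise ValueError("keep_every must be >= 1")
    return list(samples[::keep_every])
```

A larger `keep_every` gives a smaller backup at the cost of more discarded data, while `keep_every=1` keeps everything.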
  • the controller 106 may receive the backup policy for the workload by the user indication of anonymization requirements.
  • a user indication enables the controller 106 to protect private or sensitive information stored in the plurality of data elements when generating the backup.
  • the plurality of data elements may contain information that allows for uniquely identifying entities (for example, individuals, groups of individuals, organizations, and the like) using such information.
  • the anonymization requirements may define conditions that are required to be met while generating the backup using the backup policy, in order to prevent identification of entities using such information. Different users may provide user indications of different anonymization requirements, as per their preference or need.
  • the backup policy received by the user indication of the anonymization requirements may require the controller 106 to change (or delete) personal information such as credit card numbers and social security numbers detected in the plurality of data elements to conceal identities of entities associated with such personal information.
  • the backup policy received by the user indication of anonymization requirements may require the controller 106 to change a voice in an audio file to avoid identification of the speaker in the backup.
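For text data elements, changing personal information such as credit card numbers and social security numbers could be sketched with regular expressions, as below. The patterns are deliberately simplified assumptions (16-digit cards in groups of four, US-style `NNN-NN-NNNN` social security numbers) and the function name `anonymize_text` is hypothetical; a production detector would need validation such as checksum tests and would cover more formats.

```python
import re

# Simplified, illustrative patterns only.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b")


def anonymize_text(text: str) -> str:
    """Replace detected social security and credit card numbers with placeholders."""
    text = SSN_PATTERN.sub("[SSN REDACTED]", text)
    return CARD_PATTERN.sub("[CARD REDACTED]", text)
```

Replacing a value with a fixed placeholder removes the identifying digits while leaving the surrounding record readable.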
  • the controller 106 may receive the backup policy for the workload by the user indication of the leak source detection. Such a user indication enables the controller 106 to track a source of data leak in a case of occurrence of the data leak.
  • such a user indication may also direct the controller 106 as to how to include provision for the leak source detection when generating the backup.
  • The term 'data leak' refers to an unauthorized transmission of a backup of a given data element from any storage arrangement (for example, the at least one backup storage system) to an unauthorized recipient.
  • the controller 106 may receive the backup policy for the workload by the user indication of compliance requirements.
  • a user indication enables the controller 106 to change (namely, adapt) content of the plurality of data elements according to the compliance requirements when generating the backup.
  • the user indication may also specify the compliance requirements that serve as the basis for making the change.
  • the compliance requirements are applied by the controller 106 when generating the backup to ensure that the backup adheres to (namely, complies with) laws, policies and regulations of a geographical location of the at least one backup storage system or to generally followed regulations. This allows the controller 106 to secure storage of sensitive information in the backup according to regulations of data protection.
  • the controller 106 is further configured to receive the backup policy from the user by receiving a selection of a backup policy.
  • the backup policy is selected from amongst a plurality of backup policies. Different backup policies may comprise different settings for backup generation. The user selects the backup policy that is best-suited to the user's backup requirements, so that the backup generated is optimized accordingly.
  • the plurality of backup policies may be stored at the memory 104, at the memory module of the user device, or similar.
  • the controller 106 is further configured to receive the backup policy from the user by receiving an indication of legislative requirements, wherein the backup policy matches the legislative requirements.
  • the user indication of legislative requirements provides the legislative requirements (including both local legislative requirements and general legislative requirements) that are required to be met when generating the backup.
  • the backup policy matching the legislative requirements is received by the controller 106.
  • This backup policy allows storage and modification of the plurality of data elements according to legislative requirements applicable to the at least one backup storage system. Therefore, the controller 106 is allowed to manage storage of sensitive (or personal) data of the workload in the at least one backup storage system according to legal and governmental regulations of the geographical location of the at least one backup storage system.
  • the controller 106 may automatically receive a backup policy including settings for compliance with General Data Protection Regulation (GDPR) for data protection and privacy when the at least one backup storage system is implemented in countries of the European Union.
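The location-based choice of policy described above can be sketched as a simple lookup. This is only an illustration: the function name `select_policy_name` and the policy names "gdpr" and "default" are assumptions made for the example, not identifiers from the disclosure.

```python
# ISO 3166-1 alpha-2 codes of the 27 EU member states.
EU_MEMBER_STATES = frozenset({
    "AT", "BE", "BG", "HR", "CY", "CZ", "DK", "EE", "FI", "FR", "DE", "GR",
    "HU", "IE", "IT", "LV", "LT", "LU", "MT", "NL", "PL", "PT", "RO", "SK",
    "SI", "ES", "SE",
})


def select_policy_name(backup_country_code: str) -> str:
    """Return a hypothetical policy name for the backup system's location."""
    if backup_country_code.upper() in EU_MEMBER_STATES:
        return "gdpr"      # policy carrying GDPR compliance settings
    return "default"       # assumed fallback when no regional rule applies


print(select_policy_name("DE"))
print(select_policy_name("us"))
```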
  • the controller 106 is further configured to generate a backup of each of the data elements according to the backup policy for the workload, wherein the backup policy comprises settings for one or more of: lossy data reduction; anonymization requirements; leak source detection; and compliance requirements.
  • the settings of the backup policy comprise specifications and/or rules that allow the controller 106 to modify the plurality of data elements to obtain an optimized backup of each of the plurality of data elements of the workload.
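One possible way to represent a backup policy with the four kinds of settings is a small record type. The sketch below is an assumption for illustration only: the class name `BackupPolicy`, its field names, and the example setting keys are hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class BackupPolicy:
    """Hypothetical container; a setting left as None is not applied."""
    name: str
    lossy_reduction: Optional[dict] = None        # e.g. {"keep_bits": 4}
    anonymization: Optional[dict] = None          # e.g. {"mask_char": "X"}
    leak_source_detection: Optional[dict] = None  # e.g. {"source_id": "A7"}
    compliance: Optional[dict] = None             # e.g. {"delete": ["address"]}

    def enabled_settings(self) -> List[str]:
        """List the setting groups that this policy switches on."""
        names = ["lossy_reduction", "anonymization",
                 "leak_source_detection", "compliance"]
        return [n for n in names if getattr(self, n) is not None]


policy = BackupPolicy(name="test-and-dev",
                      anonymization={"mask_char": "X"},
                      leak_source_detection={"source_id": "A7"})
print(policy.enabled_settings())
```

A policy "comprising settings for one or more of" the four groups then corresponds to any instance with at least one non-None field.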
  • the settings for lossy data reduction comprise specifications and/or rules that allow the controller 106 to implement lossy compression on the plurality of data elements of the workload of the storage system arrangement 102.
  • the lossy compression removes unnecessary, less important and/or redundant data from the plurality of data elements when generating the backup. As a result, the backup copy has a reduced size and requires less disk space than the plurality of data elements in the storage arrangement 102.
  • an amount of loss in the plurality of data elements may be configurable by the controller 106 according to the settings for lossy data reduction.
  • the amount of loss may depend on a type of the plurality of data elements.
  • the controller 106 may implement a lossy compression algorithm on a data element with 50% compression ratio that results in a compressed backup of the data element having half the size of the original data element.
  • the settings for lossy data reduction may further comprise algorithms that implement lossy data reduction on the plurality of data elements.
  • the algorithms for lossy data reduction may include, but are not limited to Discrete Cosine Transform (DCT), fractal compression, Chroma subsampling and Colour reduction.
  • the controller 106 may implement different compression algorithms according to a type of the data element.
  • the controller 106 may implement different compression algorithms for an image data element and an audio data element. Such algorithms are employed at a time of generating the backup, prior to sending the backup to the at least one storage system.
  • the settings for lossy data reduction enable the controller 106 to store the backup of the plurality of data elements in the at least one backup storage system in a space-efficient format.
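As a minimal sketch of lossy data reduction, the example below applies a simple colour (bit-depth) reduction: each 8-bit sample keeps only its top 4 bits and two samples are packed per byte, which halves the size as in the 50% example above. The function names `pack_4bit` and `unpack_4bit` are assumptions for illustration; the disclosure does not prescribe this particular technique.

```python
def pack_4bit(samples: bytes) -> bytes:
    """Keep the top 4 bits of each 8-bit sample and pack two per byte."""
    out = bytearray()
    for i in range(0, len(samples), 2):
        hi = samples[i] >> 4
        # Pad with zero when the number of samples is odd.
        lo = samples[i + 1] >> 4 if i + 1 < len(samples) else 0
        out.append((hi << 4) | lo)
    return bytes(out)


def unpack_4bit(packed: bytes, count: int) -> bytes:
    """Approximate the original samples; the low 4 bits are lost."""
    out = bytearray()
    for b in packed:
        out.append((b >> 4) << 4)
        out.append((b & 0x0F) << 4)
    return bytes(out[:count])


original = bytes([0x12, 0xAB, 0xFF, 0x00])
backup = pack_4bit(original)
print(len(original), len(backup))
print(unpack_4bit(backup, len(original)).hex())
```

The restored samples differ from the originals by at most 15 per sample, which is the configurable "amount of loss" in this toy setting.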
  • the settings of anonymization requirements comprise specifications and/or rules that allow the controller 106 to modify content of the plurality of data elements for anonymizing identities of entities to which such content relates. These settings enable the controller 106 to protect private or sensitive information stored in the plurality of data elements.
  • a rule in the settings of anonymization requirements may be anonymizing private information such as social security number and financial details, and a specification in the settings may be changing all digits of social security number and financial details to T.
  • the settings of anonymization requirements may further comprise algorithms that implement anonymization on the plurality of data elements.
  • the algorithms for anonymization requirements may include, but are not limited to machine learning algorithms, incognito algorithm, Samarati algorithm, and Datafly algorithm.
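The rule-plus-specification pairing described above (a rule that private numbers are anonymized, a specification of what the digits become) might be sketched with regular expressions. The patterns below assume US-style social security numbers and 16-digit card numbers, and the replacement character "X" and the function name `mask_digits` are hypothetical choices for this example.

```python
import re

# Assumed formats: NNN-NN-NNNN for SSNs, four groups of four digits for cards.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b(?:\d{4}[- ]?){3}\d{4}\b")


def mask_digits(text: str, mask_char: str = "X") -> str:
    """Replace every digit inside a detected SSN or card number."""
    def _mask(match: "re.Match[str]") -> str:
        return re.sub(r"\d", mask_char, match.group(0))

    text = SSN_PATTERN.sub(_mask, text)
    return CARD_PATTERN.sub(_mask, text)


print(mask_digits("SSN 123-45-6789, card 4111 1111 1111 1111."))
```

Digits outside the two patterns, such as dates or quantities, are left untouched, so the backup stays usable for test and development purposes.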
  • the settings of leak source detection comprise specifications and/or rules that allow the controller 106 to avoid data leaks or track the source of a data leak in a case of occurrence of the data leak.
  • the settings of leak source detection comprise a rule to add a mark to a given data element while generating its backup.
  • the settings may comprise specific characteristics (such as a type, a size, a colour, a position, and the like) of the mark.
  • the settings of compliance requirements comprise specifications and/or rules that allow the controller 106 to change the content of the plurality of data elements according to laws, policies, and regulations applicable to the backup of the plurality of data elements. Since laws, policies, and regulations typically relate to data security and privacy, such settings of compliance requirements provide data protection to the plurality of data elements.
  • the controller 106 is further configured to generate the backup of each of the data elements according to the backup policy for the workload by one or more of: utilizing a compression algorithm according to the lossy data reduction; adapting the content of the data element according to the anonymization requirements; adding a mark indicating a source to the data element according to the leak source detection; and adapting the content of the data element according to the compliance requirements.
  • One or more of the aforesaid processes are performed in accordance with the settings in the backup policy, to enable the controller 106 in generating the backup.
  • Upon generation of the backup, the controller 106 sends the backup for storage to the at least one backup storage system.
  • the controller 106 digitally performs in a centralized manner, one or more of the aforesaid processes, to efficiently and accurately perform generation of the backup in a systematic manner. In such a case, separate tools for performing individual processes are not required to be manually employed for generating the backup.
  • the controller 106 employs the compression algorithm according to settings for the lossy data reduction in the backup policy.
  • different compression algorithms may be employed for implementing different extents of compression, as specified in the settings for the lossy data reduction (in the backup policy).
  • the controller 106 adapts the content of the data element according to the settings of the anonymization requirements.
  • different anonymization algorithms may be employed for implementing anonymization, as specified in the settings for anonymization requirements (in the backup policy). For example, in a case when the backup copy is to be used for test and development purposes, the controller 106 modifies private information stored in the plurality of data elements, when generating the backup.
  • the controller 106 is further configured to adapt the content of the data element according to the anonymization requirements by obscuring a face, altering a voice, and/or changing personal information.
  • the controller 106 modifies the data element according to the anonymization requirements in the backup policy.
  • the face represented in an image (data element) is obscured (for example, blurred, recoloured, and the like) by the controller 106 to prevent identification of an individual.
  • the controller 106 may identify and obscure the face represented in the data element using algorithms, such as machine learning, recolouring, box blurring, and the like.
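Box blurring, one of the obscuring options named above, can be sketched on a greyscale image held as a list of rows. The sketch assumes the face has already been located and is given as a bounding box; face detection itself is outside the example, and the function name `box_blur_region` is hypothetical.

```python
from typing import List


def box_blur_region(image: List[List[int]], top: int, left: int,
                    height: int, width: int, radius: int = 1) -> List[List[int]]:
    """Return a copy of image with the given rectangle box-blurred."""
    rows, cols = len(image), len(image[0])
    out = [row[:] for row in image]  # leave the original image untouched
    for r in range(top, min(top + height, rows)):
        for c in range(left, min(left + width, cols)):
            total = count = 0
            # Average over the (2*radius+1)^2 neighbourhood, clipped at edges.
            for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                    total += image[rr][cc]
                    count += 1
            out[r][c] = total // count
    return out


image = [[0, 0, 0],
         [0, 90, 0],
         [0, 0, 0]]
print(box_blur_region(image, top=1, left=1, height=1, width=1))
```

A larger radius, or repeated passes, obscures the region more strongly at the cost of more computation.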
  • the voice is altered (by changing an amplitude, changing a pitch and tone of the voice, and the like) by the controller 106 to prevent identification of an individual.
  • the controller 106 may alter (namely, modify) the voice in the data element using algorithms, such as pitch synchronous overlap and add (PSOLA) algorithm, voice morphing algorithms, and the like.
  • the personal information which is sensitive in nature, like credit card numbers, date of birth, financial details, addresses, social security numbers, and the like, in the plurality of data elements is changed by the controller 106 according to the anonymization requirements.
  • the controller 106 may change sensitive data stored in the data element using the algorithms for anonymization requirements.
  • the controller 106 adds the mark to the data element according to the settings of the leak source detection.
  • the mark is a digital footprint that indicates a source of the data element or its backup and facilitates in tracking sources of data leaks.
  • the mark discourages the unauthorized use of the plurality of data elements or their backup without requisite permission (for example, of an authorized person).
  • the mark is perceptible, thereby discouraging data leaks.
  • the mark is imperceptible, thereby making the mark hard to destroy and hence, facilitating tracking of the source of the leak. Further, the mark allows detection of the leaker by decoding the mark on the data element.
  • an identification of the user may modify a mark present in the backup. This enables tracking of the user by decoding of the modified mark.
  • the mark is added by the controller 106 if at least one backup storage system is not a trusted system.
  • the controller 106 may add the mark to the data element using algorithms, such as a self-embedding algorithm, a fragile watermark algorithm, a block-based watermarking algorithm, a feature-based watermark algorithm, a watermark-embedding algorithm, a neural network training algorithm, a watermark extraction algorithm, and the like.
  • the mark added to the data element according to the leak source detection is a watermark.
  • A watermark is a digital footprint that is hard to detect or damage and allows detection of data leaks of the plurality of data elements.
  • the digital footprint may include, but is not limited to, a barcode, a text code, and the like.
  • the watermark indicates the source of the plurality of data elements and/or the backup of the data elements, and enables leak source detection by tracking the sources indicated by the watermark.
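To make the idea of an imperceptible, decodable mark concrete, the sketch below hides a short source identifier in the least significant bits of 8-bit samples. This is a deliberately simple stand-in chosen for illustration, not one of the algorithms listed above: a least-significant-bit mark is easy to destroy, whereas the disclosure describes marks that are hard to damage. The names `embed_mark` and `extract_mark` are assumptions.

```python
def embed_mark(samples: bytes, source_id: bytes) -> bytes:
    """Write the bits of source_id into the lowest bit of each sample."""
    bits = [(byte >> shift) & 1
            for byte in source_id
            for shift in range(7, -1, -1)]  # most significant bit first
    if len(bits) > len(samples):
        raise ValueError("data element too small to carry the mark")
    out = bytearray(samples)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | bit
    return bytes(out)


def extract_mark(samples: bytes, id_length: int) -> bytes:
    """Decode id_length bytes from the lowest bits of the samples."""
    out = bytearray()
    for i in range(id_length):
        value = 0
        for sample in samples[i * 8:(i + 1) * 8]:
            value = (value << 1) | (sample & 1)
        out.append(value)
    return bytes(out)


pixels = bytes(range(100, 132))      # 32 illustrative samples
marked = embed_mark(pixels, b"A7")   # "A7" is a made-up source identifier
print(extract_mark(marked, 2))
```

Each sample changes by at most 1, so the mark is not visible, and decoding a leaked copy recovers the identifier of the source that produced it.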
  • the controller 106 adapts (namely, modifies) the content of the data element according to the settings of the compliance requirements.
  • the content of the plurality of data elements is adapted to ensure compliance of the backup of said data elements with local and governmental regulations.
  • the controller 106 is further configured to adapt the content of the data element according to the compliance requirements by obscuring a face of a child and/or deleting an address, a name, and/or an identification number.
  • the controller 106 modifies the data element according to the compliance requirements in the backup policy.
  • the controller 106 may detect faces in the data element and apply a filter (for example, a blur filter) to the data element to obscure a face of the child.
  • in some cases, storing private information such as an address, a name, an identification number, and the like is prohibited at the backup storage system to which a data element is to be backed up.
  • the controller 106 may detect the private information and delete the private information (by using an algorithm, such as machine learning).
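For a data element that is already structured as a record, the deletion step might look like the sketch below. It assumes the private fields are identified by key, which sidesteps the detection problem that the text assigns to algorithms such as machine learning; the field names and the function name `strip_prohibited` are hypothetical.

```python
from typing import AbstractSet, Dict

# Assumed keys for the kinds of private information named in the text.
PROHIBITED_FIELDS = frozenset({"address", "name", "identification_number"})


def strip_prohibited(record: Dict[str, object],
                     prohibited: AbstractSet[str] = PROHIBITED_FIELDS
                     ) -> Dict[str, object]:
    """Return a copy of record without the prohibited fields."""
    return {key: value for key, value in record.items()
            if key not in prohibited}


record = {"name": "Jane Doe", "address": "1 Main St", "order_total": 42}
print(strip_prohibited(record))
```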
  • the controller 106 is further configured to generate the backup of each of the data elements according to the backup policy for the workload by generating a copy of the data element and then applying a filter according to the backup policy to the copy of the data element.
  • the term "filter" refers to a function or software that processes data to perform a certain operation (for example, data reduction, anonymization, leak source detection and/or compliance) on the data.
  • the copy of the data element refers to a replica of the data element.
  • more than one filter may be applied in a predefined order by the controller 106 to the copy of the plurality of data elements, according to the settings in the backup policy.
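The copy-then-filter approach, with several filters applied in a predefined order, can be sketched as a small pipeline. The two filters shown are toy placeholders, and the names `generate_backup`, `anonymize`, and `add_mark`, as well as the mark string, are assumptions made for this example.

```python
import copy
from typing import Callable, Dict, List

Filter = Callable[[Dict], Dict]


def generate_backup(data_element: Dict, filters: List[Filter]) -> Dict:
    """Copy the data element, then apply each filter to the copy in order."""
    backup = copy.deepcopy(data_element)  # the original is never modified
    for apply_filter in filters:
        backup = apply_filter(backup)
    return backup


def anonymize(element: Dict) -> Dict:
    """Toy anonymization filter: hide the owner field."""
    element["owner"] = "REDACTED"
    return element


def add_mark(element: Dict) -> Dict:
    """Toy leak source detection filter: attach a source mark."""
    element["mark"] = "source:storage-arrangement-102"
    return element


original = {"owner": "Alice", "payload": [1, 2, 3]}
print(generate_backup(original, [anonymize, add_mark]))
print(original)
```

Because the filters operate on the copy, the data element held in the storage arrangement stays intact while the backup carries every modification required by the policy.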
  • FIG. 2 is an exemplary illustration of backup of data elements to a backup storage system, in accordance with an embodiment of the present disclosure. With reference to FIG. 2, there is shown the storage arrangement 102 and a backup storage system 202 (depicted, for example, as a cloud storage system).
  • the storage arrangement 102 comprises a file system 204A that comprises a copy 206A of a first data element and a copy 208A of a second data element.
  • the controller 106 applies one or more filters on each of the copies 206A and 208A according to the backup policy.
  • the data reduction filter is applied to the copy 206A (which may, for example, be an image) and then the leak source detection filter is applied to add a mark to the copy 206A.
  • the anonymization filter is applied to the copy 208A (which may, for example, be a document).
  • backups of the first data element and the second data element are transferred to the backup storage system 202.
  • the backup storage system 202 stores a backup 204B of the file system 204A.
  • the backup 204B comprises a backup 206B of the first data element and a backup 208B of the second data element.
  • the backup 206B is optimized to have reduced data and leak source detection capabilities, whereas the backup 208B is optimized for anonymization of sensitive
  • FIG. 3 is an exemplary illustration of a data element including a mark added for leak source detection, in accordance with an embodiment of the present disclosure.
  • a data element 302 including a mark 304 added for leak source detection is, for example, an image.
  • a portion of the data element 302 that includes the mark 304 is shown in an enlarged form.
  • the mark 304 is hard to detect and damage and thus, discourages unauthorized use and distribution of the data element 302 without requisite permission.
  • the mark 304 also facilitates in detecting a source of leak of the data element 302, if such a leak occurs.
  • FIG. 4 is an exemplary illustration of a data element whose content has been adapted according to a compliance requirement, in accordance with an embodiment of the present disclosure.
  • a data element 402 whose content has been adapted according to a compliance requirement.
  • the data element 402 is, for example, an image depicting a child 404 and an adult 406.
  • storing images and/or videos of children is forbidden.
  • FIG. 5 is a flowchart of a method 500 for a storage arrangement, in accordance with an embodiment of the present disclosure.
  • the method 500 is used in a storage arrangement (such as the storage arrangement 102) comprising a memory (such as the memory 104) being configured to store a plurality of data elements comprised in a workload.
  • the method 500 is executed by the controller 106 at the storage arrangement 102 described, for example, in FIG. 1.
  • the method 500 includes steps 502, 504, and 506.
  • the method 500 comprises receiving an indication of a workload to be backed-up, the workload comprising a plurality of data elements.
  • the controller 106 receives the indication of the workload via the communication interface 108.
  • the indication is sent by a user, via a user device.
  • the controller 106 initiates a backup operation of the workload for creating the backup of the plurality of data elements on at least one backup storage system.
  • the method 500 further comprises receiving an indication of a backup policy, wherein the backup policy is for the workload and applies to all of the plurality of data elements to be backed-up.
  • the indication of the backup policy is received from the user using the user device.
  • the indication of the backup policy is received from a backup policy selecting software module. The indicated backup policy enables the controller 106 to automatically manage and create the backup of the plurality of data elements of the workload.
  • the method 500 further comprises generating a backup of each of the data elements according to the backup policy for the workload, wherein the backup policy comprises settings for one or more of: lossy data reduction; anonymization requirements; leak source detection; and compliance requirements.
  • the settings of the backup policy comprise specifications and/or rules that allow the controller 106 to modify the plurality of data elements to generate an optimized backup of each of the plurality of data elements of the workload.
  • steps 502 to 506 are only illustrative and other alternatives can also be provided where one or more steps are added, one or more steps are removed, or one or more steps are provided in a different sequence without departing from the scope of the claims herein.
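The three steps of the method 500 can be read as a short sequence of calls. The sketch below is one possible arrangement under simplifying assumptions: data elements are plain strings, the backup policy is reduced to a single function applied to every element, and the class and method names are hypothetical.

```python
from typing import Callable, Dict, List, Optional


class StorageArrangementSketch:
    """Toy stand-in for a controller that runs steps 502, 504 and 506."""

    def __init__(self, memory: Dict[str, List[str]]):
        self.memory = memory  # workload name -> data elements
        self.workload: Optional[str] = None
        self.policy: Optional[Callable[[str], str]] = None

    def receive_workload_indication(self, workload: str) -> None:
        """Step 502: receive an indication of the workload to back up."""
        if workload not in self.memory:
            raise KeyError(workload)
        self.workload = workload

    def receive_policy_indication(self, policy: Callable[[str], str]) -> None:
        """Step 504: receive the policy that applies to all data elements."""
        self.policy = policy

    def generate_backups(self) -> List[str]:
        """Step 506: generate a backup of each data element per the policy."""
        if self.workload is None or self.policy is None:
            raise RuntimeError("workload and policy must be indicated first")
        return [self.policy(element) for element in self.memory[self.workload]]


arrangement = StorageArrangementSketch({"hr": ["alice", "bob"]})
arrangement.receive_workload_indication("hr")
arrangement.receive_policy_indication(str.upper)  # placeholder policy
print(arrangement.generate_backups())
```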
  • a computer-readable medium carrying computer instructions that when loaded into and executed by a controller 106 of a storage arrangement 102 enables the storage arrangement 102 to implement the method 500.
  • the computer-readable medium carrying computer instructions provides a non-transient memory and may include, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • a storage arrangement 102 comprises a memory 104 being configured to store a plurality of data elements comprised in a workload.
  • the storage arrangement 102 further comprises a workload receiving software module 110a for receiving an indication of a workload to be backed-up, the workload comprising a plurality of data elements.
  • the storage arrangement 102 further comprises a backup policy receiving software module 110b for receiving an indication of a backup policy, wherein the backup policy is for the workload and applies to all of the plurality of data elements to be backed-up.
  • the storage arrangement 102 further comprises a backup generating software module 110c for generating a backup of each of the data elements according to the backup policy for the workload, wherein the backup policy comprises settings for one or more of: lossy data reduction; anonymization requirements; leak source detection; and compliance requirements.
  • the software modules 110 are executed to enable the storage arrangement 102 to receive and attach an appropriate backup policy to the workload, in order to generate the backup of the plurality of data elements.
  • Dedicated software modules 110 are employed to perform dedicated processing tasks concerning the generation of the backup. Use of the software modules 110 beneficially automates backup generation in a centralized manner at the storage arrangement 102, so that manual involvement in applying these processes is minimized.
  • the software modules 110 are executed by the controller 106 of the storage arrangement 102.

Abstract

A storage arrangement that allows implementation of automatic backup policy management for creating a backup of the plurality of data elements using a backup policy with requisite settings. The storage arrangement includes a memory and a controller. The memory is configured to store a plurality of data elements comprised in a workload. The controller is configured to receive an indication of a workload to be backed-up. The controller is further configured to receive an indication of a backup policy, where the backup policy is for the workload and applies to all of the plurality of data elements to be backed-up. The controller is further configured to generate a backup of each of the data elements according to the backup policy for the workload, where the backup policy includes settings for one or more of: lossy data reduction, anonymization requirements, leak source detection, and compliance requirements.

Description

STORAGE ARRANGEMENTS AND METHOD EMPLOYING BACKUP POLICIES FOR GENERATING DATA BACKUP
TECHNICAL FIELD
The present disclosure relates generally to the field of data storage; and more specifically, to storage arrangements and a method employing backup policies for generating data backup.
BACKGROUND
Generally, data backup is used to recover data in an event of data loss in a storage system. For example, often a separate backup system or a secondary storage system is used to store a backup of the data present in a primary storage system. Typically, a storage system (namely, a storage arrangement) is not only used to store the backup of the data present in the primary storage system, but also allows replication of the backup stored therein to multiple backup storage systems such as cloud storage systems.
Typically, modifications are required on the backup prior to storing a replica of the backup to the multiple backup storage systems, in order to protect personal and sensitive data. For example, the replica of the backup that is used for test and development often requires modification of the data, such as data anonymization, so that sensitive data (for example, credit card data, social security numbers, and addresses) in the backup is not exposed. However, such modifications of the backup are often performed manually and require anonymization software. Conventional storage arrangements do not provide effective anonymization on the backup.
Furthermore, in a scenario wherein the replica of the backup is exported to a geographical location other than an original geographical location of the conventional storage arrangement at which the backup is stored, data compliance is also a consideration. In particular, data in the replica of the backup is required to be modified according to legislative regulations of geographical locations for data protection and privacy. For example, General Data Protection Regulation (GDPR) compliance is required if the backup is stored in a European Union (EU) area to protect personal data stored in the backup. However, the conventional storage arrangements either do not handle compliance requirements, or they do so inefficiently.
Furthermore, when storing the replica of the backup in another backup storage system, data reduction techniques are generally applied. The data reduction techniques may be lossless data reduction techniques or lossy data reduction techniques, according to requirements. Different data reduction techniques are associated with different pricing considerations. For example, existing vendors often allow free storage for the replica of the backup in case of lossy data reduction, but charge a fee for storage of the replica of the backup in case of lossless data reduction. The conventional storage arrangements do not provide cost-effective data reduction, compliance management and data modification.
Therefore, in light of the foregoing discussion, there exists a need to overcome the aforementioned drawbacks associated with the conventional storage arrangements and methods for managing data backup in the conventional storage arrangements.
SUMMARY
The present disclosure seeks to provide storage arrangements and methods employing backup policies for generating data backup. The present disclosure seeks to provide a solution to one or more of the existing problems of inefficient and unreliable compliance adherence, data modification and data reduction in conventional storage arrangements. An aim of the present disclosure is to provide a solution that overcomes at least partially the problems encountered in prior art, and provides storage arrangements and methods for efficient and reliable backup generation using backup policies with required settings.
The object of the present disclosure is achieved by the solutions provided in the enclosed independent claims. Advantageous implementations of the present disclosure are further defined in the dependent claims.
In an aspect, the present disclosure provides a storage arrangement. The storage arrangement comprises a memory and a controller. The memory is configured to store a plurality of data elements comprised in a workload. The controller is configured to: receive an indication of a workload to be backed-up, the workload comprising a plurality of data elements; receive an indication of a backup policy, wherein the backup policy is for the workload and applies to all of the plurality of data elements to be backed-up; and generate a backup of each of the data elements according to the backup policy for the workload, wherein the backup policy comprises settings for one or more of: lossy data reduction, anonymization requirements, leak source detection, and compliance requirements.
The storage arrangement allows implementation of automatic backup policy management for creating an optimized backup of the plurality of data elements. The storage arrangement supports storage of the backup in multiple backup storage arrangements at multiple geographical locations. The storage arrangement allows efficient backup of the plurality of data elements by automatically applying (i.e. without applying manual data reduction, without manually using compliance tools, and the like) requisite settings for backup policies. Hence, the storage arrangement automatically generates the optimized backup which is one or more of: space efficient, price-efficient, adapted to protect sensitive content, and compliant with multiple regulations of the multiple geographical locations. The storage arrangement also enables implementation of the backup policy for leak source detection to prevent data leak or to track data leak in the backup.
In an implementation form, the controller is further configured to generate the backup of each of the data elements according to the backup policy for the workload by one or more of: utilizing a compression algorithm according to the lossy data reduction, adapting the content of the data element according to the anonymization requirements, adding a mark indicating a source to the data element according to the leak source detection, and adapting the content of the data element according to the compliance requirements.
The compression algorithms for the lossy data reduction reduce a size of the plurality of data elements when generating the backup and hence, allow for creating a space-efficient backup. Moreover, adapting the content of the plurality of data elements according to the anonymization requirements enables protection of private or sensitive information stored in the plurality of data elements. Furthermore, adding the mark enables accurate and reliable tracking of the source of a data leak, in a case such a leak occurs, by tracking the location of the source of the data element using the mark. Moreover, content of the plurality of data elements is modified according to compliance requirements to ensure that the backup is made in accordance with laws, policies and regulations regarding data privacy and security.
In a further implementation form, the controller is further configured to adapt the content of the data element according to the anonymization requirements by obscuring a face, altering a voice, and/or changing personal information.
Characteristics such as face, voice and personal information are distinctly indicative of identity of an entity. In order to prevent identification of the entity, such a manner of adapting the content of the data element is employed to anonymize these distinctive characteristics. These anonymization requirements facilitate prevention of identity theft, protection of privacy, and the like.
In a further implementation form, the mark added to the data element according to the leak source detection is a watermark.
Watermarks are typically hard to damage or detect. Therefore, the watermark is added to enable tracking of the source of a data leak, in a case of a data leak, by tracking the location of the data element.
In a further implementation form, the controller is further configured to adapt the content of the data element according to the compliance requirements by obscuring a face of a child and/or deleting an address, a name, and/or an identification number.
Such a manner of adapting the content of the data element prevents identification of an entity. As certain geographical locations have certain compliance requirements with respect to protection of identities of entities, such a manner of adapting the content of the data element by obscuring and/or deleting uniquely identifiable characteristics of the entities enables the storage arrangement to automatically adhere to such compliance requirements.
In a further implementation form, the controller is further configured to generate the backup of each of the data elements according to the backup policy for the workload by generating a copy of the data element and then applying a filter according to the backup policy to the copy of the data element.
In this way, a required filter may be effectively applied to the copy of the plurality of data elements, in order to ensure correct implementation of a required setting on the data element according to the backup policy.
In a further implementation form, the controller is further configured to receive the backup policy from a user. When the backup policy is received from the user, a backup policy that is best suited according to the user's requirements is employed by the controller for generating the backup. The backup thus generated is optimized according to the user's requirements.
In a further implementation form, the controller is further configured to receive the backup policy from the user by receiving one or more of: a user indication of lossy data reduction, a user indication of anonymization requirements, a user indication of leak source detection, and a user indication of compliance requirements.
By receiving one or more of the aforesaid user indications, the controller is provided with the user's requirements with respect to the backup policy. In this way, the controller is enabled to generate an optimized backup of each of the data elements, according to the user's requirements.
In a further implementation form, the controller is further configured to receive the backup policy from the user by receiving a selection of a backup policy.
In this way, the backup policy received by the controller is selected by the user, and is therefore better suited to the user's data backup requirements as compared to an automatically selected backup policy. The backup of data elements generated by employing the user-selected backup policy, is according to the user's requirements.
In a further implementation form, the controller is further configured to receive the backup policy from the user by receiving an indication of legislative requirements, wherein the backup policy matches the legislative requirements.
By receiving the user indication of legislative requirements, the controller is enabled to effectively select and apply a suitable backup policy that adapts the content of the data elements while generating the backup according to the legislative requirements. This ensures that the backup meets compliance requirements.
In another aspect, the present disclosure provides a method for a storage arrangement comprising a memory being configured to store a plurality of data elements comprised in a workload. The method comprises: receiving an indication of a workload to be backed-up, the workload comprising a plurality of data elements; receiving an indication of a backup policy, wherein the backup policy is for the workload and applies to all of the plurality of data elements to be backed-up; and generating a backup of each of the data elements according to the backup policy for the workload, wherein the backup policy comprises settings for one or more of: lossy data reduction, anonymization requirements, leak source detection, and compliance requirements.
The method of this aspect achieves all the advantages and effects of the storage arrangement of the present disclosure.
In an implementation form, there is provided a computer-readable medium carrying computer instructions that, when loaded into and executed by a controller of a storage arrangement, enable the storage arrangement to implement the method.
The computer-readable medium carrying computer instructions achieves all the advantages and effects of the storage arrangement, or the method.
In another aspect, the present disclosure provides a storage arrangement comprising a memory being configured to store a plurality of data elements comprised in a workload. The storage arrangement further comprises: a workload receiving software module for receiving an indication of a workload to be backed-up, the workload comprising a plurality of data elements; a backup policy receiving software module for receiving an indication of a backup policy, wherein the backup policy is for the workload and applies to all of the plurality of data elements to be backed-up; and a backup generating software module for generating a backup of each of the data elements according to the backup policy for the workload, wherein the backup policy comprises settings for one or more of: lossy data reduction, anonymization requirements, leak source detection, and compliance requirements.
The software modules are executed to enable the storage arrangement to receive and attach an appropriate backup policy to the workload, in order to generate the backup of the plurality of data elements. Dedicated software modules are employed to perform dedicated processing tasks concerning the generation of the backup. Use of these software modules beneficially automates backup generation in a centralized manner at the storage arrangement, so that manual involvement in applying these processes is minimized.
It has to be noted that all devices, elements, circuitry, units and means described in the present application could be implemented in the software or hardware elements or any kind of combination thereof. All steps which are performed by the various entities described in the present application as well as the functionalities described to be performed by the various entities are intended to mean that the respective entity is adapted to or configured to perform the respective steps and functionalities. Even if, in the following description of specific embodiments, a specific functionality or step to be performed by external entities is not reflected in the description of a specific detailed element of that entity which performs that specific step or functionality, it should be clear for a skilled person that these methods and functionalities can be implemented in respective software or hardware elements, or any kind of combination thereof. It will be appreciated that features of the present disclosure are susceptible to being combined in various combinations without departing from the scope of the present disclosure as defined by the appended claims.
Additional aspects, advantages, features and objects of the present disclosure would be made apparent from the drawings and the detailed description of the illustrative implementations construed in conjunction with the appended claims that follow.
BRIEF DESCRIPTION OF THE DRAWINGS
The summary above, as well as the following detailed description of illustrative embodiments, is better understood when read in conjunction with the appended drawings. For the purpose of illustrating the present disclosure, exemplary constructions of the disclosure are shown in the drawings. However, the present disclosure is not limited to specific methods and instrumentalities disclosed herein. Moreover, those in the art will understand that the drawings are not to scale. Wherever possible, like elements have been indicated by identical numbers.
Embodiments of the present disclosure will now be described, by way of example only, with reference to the following diagrams wherein:
FIG. 1 is a block diagram that illustrates various exemplary components of a storage arrangement, in accordance with an embodiment of the present disclosure;
FIG. 2 is an exemplary illustration of backup of data elements to a backup storage system, in accordance with an embodiment of the present disclosure;
FIG. 3 is an exemplary illustration of a data element including a mark added for leak source detection, in accordance with an embodiment of the present disclosure;
FIG. 4 is an exemplary illustration of a data element whose content has been adapted according to a compliance requirement, in accordance with an embodiment of the present disclosure; and
FIG. 5 is a flowchart of a method for a storage arrangement, in accordance with an embodiment of the present disclosure.
In the accompanying drawings, an underlined number is employed to represent an item over which the underlined number is positioned or an item to which the underlined number is adjacent. A non-underlined number relates to an item identified by a line linking the non-underlined number to the item. When a number is non-underlined and accompanied by an associated arrow, the non-underlined number is used to identify a general item at which the arrow is pointing.
DETAILED DESCRIPTION OF EMBODIMENTS
The following detailed description illustrates embodiments of the present disclosure and ways in which they can be implemented. Although some modes of carrying out the present disclosure have been disclosed, those skilled in the art would recognize that other embodiments for carrying out or practicing the present disclosure are also possible.
FIG. 1 is a block diagram that illustrates various exemplary components of a storage arrangement, in accordance with an embodiment of the present disclosure. With reference to FIG. 1, there is shown a storage arrangement 102. The storage arrangement 102 comprises a memory 104, and a controller 106. The storage arrangement 102 is also shown to comprise a communication interface 108 and one or more software modules, such as software modules 110.
The storage arrangement 102 includes suitable logic, circuitry, devices, interfaces and/or code that is configured to allow automatic backup policy management for a plurality of data elements to generate a backup of each of the data elements. The backup of each of the data elements may be stored in one or more backup storage systems (for example, such as cloud storage systems, local backup storage systems, remote backup storage systems, and the like). In other words, the storage arrangement 102 enables smart data backup management and enables, for example, creation of a lossy backup of the plurality of data elements which is space efficient and compliant with multiple regulations, without requiring manual application of specialized data processing tools.
In an embodiment, the storage arrangement 102 is a secondary storage arrangement. Examples of the storage arrangement 102 may include, but are not limited to, a server, a production environment system, a computing device, a computing device in a computer cluster (e.g. massively parallel computer clusters), a portable or non-portable electronic device, a drone, or a supercomputer.
The memory 104 is configured to store the plurality of data elements comprised in a workload. The memory 104 includes suitable logic, circuitry, and/or interfaces that is configured to store the plurality of data elements comprised in the workload. In an embodiment, the memory 104 of the storage arrangement 102 is a secondary storage memory that stores the plurality of data elements received from a primary storage system (e.g. a host server). The memory 104 may further store instructions executable to control the storage arrangement 102. The memory 104 may additionally store an operating system and/or other program products to operate the storage arrangement 102. Examples of implementation of the memory 104 may include, but are not limited to, Hard Disk Drive (HDD), Flash drive, a Secure Digital (SD) card, Solid-State Drive (SSD), Network Attached Storage (NAS) or another computer storage medium.
The controller 106 includes suitable logic, circuitry, and/or interfaces that is configured to implement processing steps pertaining to the automatic backup policy management to generate the backup of the plurality of data elements. The controller 106 is a computational element that is configured to execute process instructions that drive the storage arrangement 102. Examples of the controller 106 include, but are not limited to a microprocessor, a microcontroller, a complex instruction set computing (CISC) processor, a reduced instruction set (RISC) processor or a very long instruction word (VLIW) processor, an application-specific integrated circuit (ASIC) processor, a central processing unit (CPU), a data processing unit, and other processors or control circuitry.
It will be appreciated that the controller 106 can be employed to generate backups of an entire file system, a group of file systems, a single database, a set of databases, and similar data structures.
In an embodiment, the storage arrangement 102 further comprises the communication interface 108. The communication interface 108 is an arrangement of interconnected programmable and/or non-programmable components that are configured to facilitate data communication between one or more electronic devices. The communication interface 108 supports various wired or wireless communication protocols for one or more of: a peer-to-peer network, a hybrid peer-to-peer network, local area networks (LANs), radio access networks (RANs), metropolitan area networks (MANs), wide area networks (WANs), all or a portion of a public network such as the global computer network known as the Internet, a Wireless Fidelity (Wi-Fi) network, a wireless personal area network (WPAN), a private network, a cellular network and any other communication network or networks at one or more locations. Additionally, the communication interface 108 supports wired or wireless communication that can be carried out via any number of known protocols, including, but not limited to, Transmission Control Protocol and Internet Protocol (TCP/IP), User Datagram Protocol (UDP), Hypertext Transfer Protocol (HTTP), File Transfer Protocol (FTP), IEEE 802.16, Light Fidelity (Li-Fi), Wireless Access Protocol (WAP), Frame Relay, or Asynchronous Transfer Mode (ATM) and/or other cellular communication protocols. Moreover, any other suitable protocols using voice, video, data, or combinations thereof, can also be employed and supported by the communication interface 108.
In an embodiment, the storage arrangement 102 further comprises the software modules 110. In an exemplary implementation, the software modules 110 include one or more workload receiver software modules (such as a workload receiving software module 110a), one or more backup policy receiver software modules (such as a backup policy receiving software module 110b) and one or more backup generating software modules (such as a backup generating software module 110c). In an implementation, the software modules 110 (which include the software modules 110a to 110c) are potentially implemented as separate circuits in the storage arrangement 102. Alternatively, in another implementation, the software modules 110 are implemented as circuitry to execute various operations of the software modules 110a to 110c.
In operation, the controller 106 is configured to receive an indication of the workload to be backed-up, the workload comprising the plurality of data elements. Upon receiving the indication, the controller 106 initiates a backup operation of the workload for creating the backup of the plurality of data elements on at least one backup storage system. The controller 106 receives the indication of the workload via the communication interface 108. The indication is sent by a user, via a user device. The examples of the user device may include, but are not limited to a personal computer (such as a desktop computer, a laptop computer, and the like), a personal digital assistant (PDA), or a smart phone. A data element is a unit of data in the workload. Examples of a given data element may include, but are not limited to a document (such as a text document, a spreadsheet, a form, a certificate, electronic mail, a presentation, and the like), an image, a video, an audio, or other forms of data elements.
The controller 106 is configured to receive an indication of a backup policy, wherein the backup policy is for the workload and applies to all of the plurality of data elements to be backed-up. The indicated backup policy enables the controller 106 to automatically manage and create the backup of the plurality of data elements of the workload. The indicated backup policy is to be applied to all the plurality of data elements of the workload to generate the backup. For example, in a case where the workload includes four data elements, the indicated backup policy is to be applied to all four data elements. In some implementations, the indication of the backup policy is provided by the user using the user device. Such implementations are described in detail below. In other implementations, the indication of the backup policy is provided by a backup policy selecting software module (not shown). The backup policy selecting software module automatically indicates the backup policy based on at least one of: characteristics of the plurality of data elements in the workload, information pertaining to the at least one backup storage system on which the backup is to be stored. As an example, given a geographical location of a given backup storage system, a given backup policy compliant with regulations of said geographical location will be automatically selected. Optionally, in such implementations, a plurality of backup policies may be pre-stored at the memory 104 of the storage arrangement 102.
In an embodiment, the controller 106 is further configured to receive the backup policy from the user. The backup policy is sent, via the communication interface 108, to the controller 106 by the user. The user typically sends the backup policy that suits his/her requirements, to the controller 106. When the backup policy for the workload is indicated by the user, the controller 106 is facilitated in generating the optimized backup that is well-suited to the user's requirements. Upon receiving the user-provided backup policy, said backup policy is attached to (namely, associated with) the workload (specifically, to the plurality of data elements) by the controller 106. The user may send the backup policy to the controller 106 using the user device. In an embodiment, the backup policy is stored at a memory module of the user device. Upon receiving the backup policy at the storage arrangement 102, the backup policy is stored at the memory 104.
In an embodiment, the controller 106 is further configured to receive the backup policy from a user by receiving one or more of: a user indication of lossy data reduction; a user indication of anonymization requirements; a user indication of leak source detection; and a user indication of compliance requirements. One or more of the aforesaid user indications pertain to one or more settings in the backup policy. A given user indication may indicate whether or not to apply a given setting during backup, and optionally, how to apply the given setting. The controller 106 then receives a given backup policy that matches the given user indication. By virtue of the one or more user indications, the user provides, to the controller 106, the backup policy to be employed for generating the backup of the workload. The user may provide a given user indication to the controller 106 using the user device. The controller 106 receives the given user indication via the communication interface 108.
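The relationship between user indications and policy settings described above can be pictured with a minimal Python sketch. It assumes that each of the four settings is either absent (do not apply) or a dictionary of options (apply, and how); the names `BackupPolicy`, `policy_from_indications` and `attach_policy`, and the option key `ratio`, are hypothetical and do not appear in the disclosure.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class BackupPolicy:
    # None means "do not apply this setting"; a dict carries the
    # optional "how to apply" details supplied by the user.
    lossy_data_reduction: Optional[dict] = None
    anonymization: Optional[dict] = None
    leak_source_detection: Optional[dict] = None
    compliance: Optional[dict] = None


SETTING_NAMES = {f.name for f in fields(BackupPolicy)}


def policy_from_indications(indications: dict) -> BackupPolicy:
    """Build a policy whose settings match the received user indications."""
    unknown = set(indications) - SETTING_NAMES
    if unknown:
        raise ValueError(f"unsupported indications: {sorted(unknown)}")
    return BackupPolicy(**indications)


def attach_policy(workload: list, policy: BackupPolicy) -> dict:
    """Associate the same policy with every data element of the workload."""
    return {element_id: policy for element_id in workload}
```

In this sketch a single policy object is shared by all data elements, which mirrors the statement that the backup policy is for the workload and applies to all of its data elements.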
In an embodiment, a given user indication is in the form of: a text indication, a visual indication (for example, an image, a video, and the like), an audio indication (for example, a voice input), or a touch indication.
In an example, the controller 106 may receive the backup policy for the workload by the user indication of lossy data reduction. Such a user indication enables the controller 106 to create the backup of the plurality of data elements by employing techniques for lossy data reduction on the plurality of data elements. The backup generated in such a case has a size that is smaller than an original size of the plurality of data elements and hence, can be stored on the at least one backup storage system in a space efficient format. Beneficially, the backup policy corresponding to the user indication of lossy data reduction enables the controller 106 to generate the backup of the plurality of data elements with lossy data reduction without requiring the user to apply any data reduction tool manually. Furthermore, optionally, an amount of the lossy data reduction is configurable by the user in the user indication of the lossy data reduction. The user may configure the amount of the lossy data reduction based on one or more of: the user's preferences with respect to the backup, a manner in which the backup is to be used, a storage capacity of the at least one backup storage system on which the backup is to be stored.
In another example, the controller 106 may receive the backup policy for the workload by the user indication of anonymization requirements. Such a user indication enables the controller 106 to protect private or sensitive information stored in the plurality of data elements when generating the backup. The plurality of data elements may contain information that allows for uniquely identifying entities (for example, individuals, groups of individuals, organizations, and the like) using such information. The anonymization requirements may define conditions that are required to be met while generating the backup using the backup policy, in order to prevent identification of entities using such information. Different users may provide user indications of different anonymization requirements, as per their preference or need. As an example, the backup policy received by the user indication of the anonymization requirements may require the controller 106 to change (or delete) personal information such as credit card numbers and social security numbers detected in the plurality of data elements to conceal identities of entities associated with such personal information. As another example, the backup policy received by the user indication of anonymization requirements may require the controller 106 to change a voice in an audio file to avoid identification of the speaker in the backup.
In yet another example, the controller 106 may receive the backup policy for the workload by the user indication of the leak source detection. Such a user indication enables the controller 106 to track a source of data leak in a case of occurrence of the data leak. Moreover, such a user indication may also direct the controller 106 as to how to include provision for the leak source detection when generating the backup. Here, the term "data leak" refers to an unauthorized transmission of a backup of a given data element from any storage arrangement (for example, the at least one backup storage system) to an unauthorized recipient.
In still another example, the controller 106 may receive the backup policy for the workload by the user indication of compliance requirements. Such a user indication enables the controller 106 to change (namely, adapt) content of the plurality of data elements according to the compliance requirements when generating the backup. The user indication may also specify the compliance requirements that serve as the basis for making the change. The compliance requirements are applied by the controller 106 when generating the backup to ensure that the backup adheres to (namely, complies with) laws, policies and regulations of a geographical location of the at least one backup storage system or to generally followed regulations. This allows the controller 106 to secure storage of sensitive information in the backup according to regulations of data protection.
In accordance with an embodiment, the controller 106 is further configured to receive the backup policy from the user by receiving a selection of a backup policy. In this regard, the backup policy is selected from amongst a plurality of backup policies. Different backup policies may comprise different settings for backup generation. The user selects the backup policy that is best-suited to the user's backup requirements, so that the backup generated is optimized accordingly. The plurality of backup policies may be stored at the memory 104, at the memory module of the user device, or similar.
In accordance with an embodiment, the controller 106 is further configured to receive the backup policy from the user by receiving an indication of legislative requirements, wherein the backup policy matches the legislative requirements. The user indication of legislative requirements provides the legislative requirements (including both local legislative requirements and general legislative requirements) that are required to be met when generating the backup. Accordingly, the backup policy matching the legislative requirements is received by the controller 106. This backup policy allows storage and modification of the plurality of data elements according to legislative requirements applicable to the at least one backup storage system. Therefore, the controller 106 is allowed to manage storage of sensitive (or personal) data of the workload in the at least one backup storage system according to legal and governmental regulations of the geographical location of the at least one backup storage system. For example, the controller 106 may automatically receive a backup policy including settings for compliance with General Data Protection Regulation (GDPR) for data protection and privacy when the at least one backup storage system is implemented in countries of the European Union.
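One way to read "the backup policy matches the legislative requirements" is as a lookup over pre-stored policies. The following Python sketch assumes that each pre-stored policy lists the regulations its compliance settings cover, and returns the first policy covering everything that was indicated; the function name `match_policy` and the dictionary layout are hypothetical.

```python
def match_policy(required_regulations, policies):
    """Return the first policy whose compliance settings cover every
    indicated legislative requirement, or None if no policy matches."""
    required = set(required_regulations)
    for policy in policies:
        if required <= set(policy["compliance"]):
            return policy
    return None


# Hypothetical pre-stored policies, for illustration only.
PRE_STORED_POLICIES = [
    {"name": "basic", "compliance": []},
    {"name": "eu", "compliance": ["GDPR"]},
]
```

Under these assumptions, an indication of GDPR for a backup storage system located in the European Union would resolve to the policy named "eu", while an indication that no stored policy covers yields no match and would need to be handled separately.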
The controller 106 is further configured to generate a backup of each of the data elements according to the backup policy for the workload, wherein the backup policy comprises settings for one or more of: lossy data reduction; anonymization requirements; leak source detection; and compliance requirements. The settings of the backup policy comprise specifications and/or rules that allow the controller 106 to modify the plurality of data elements to obtain an optimized backup of each of the plurality of data elements of the workload.
In an embodiment, the settings for lossy data reduction comprise specifications and/or rules that allow the controller 106 to implement lossy compression on the plurality of data elements of the workload of the storage arrangement 102. The lossy compression removes unnecessary, less important and/or redundant data from the plurality of data elements when generating the backup. This results in the backup copy having a reduced size that requires less disk space than the plurality of data elements in the storage arrangement 102. In an advanced mode, an amount of loss in the plurality of data elements may be configurable by the controller 106 according to the settings for lossy data reduction. Optionally, the amount of loss may depend on a type of the plurality of data elements. For example, the controller 106 may implement a lossy compression algorithm on a data element with 50% compression ratio that results in a compressed backup of the data element having half the size of the original data element. The settings for lossy data reduction may further comprise algorithms that implement lossy data reduction on the plurality of data elements. The algorithms for lossy data reduction may include, but are not limited to, Discrete Cosine Transform (DCT), fractal compression, chroma subsampling and colour reduction. The controller 106 may implement different compression algorithms according to a type of the data element. For example, the controller 106 may implement different compression algorithms for an image data element and an audio data element. Such algorithms are employed at a time of generating the backup, prior to sending the backup to the at least one backup storage system. Notably, the settings for lossy data reduction enable the controller 106 to store the backup of the plurality of data elements in the at least one backup storage system in a space efficient format.
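As a rough illustration of how a lossy reduction setting could halve the size of a data element, the following Python sketch keeps only the upper four bits of each 8-bit sample and packs two samples into one byte. This simple bit-depth reduction is a stand-in chosen for brevity and is not one of the algorithms named above; the function names `pack_4bit` and `unpack_4bit` are hypothetical.

```python
def pack_4bit(samples: bytes) -> bytes:
    """Lossy reduction: keep the top 4 bits of each 8-bit sample and
    store two samples per byte, so the output is about half the size."""
    out = bytearray()
    for i in range(0, len(samples), 2):
        high = samples[i] >> 4
        # Pad with zero when the input has an odd number of samples.
        low = samples[i + 1] >> 4 if i + 1 < len(samples) else 0
        out.append((high << 4) | low)
    return bytes(out)


def unpack_4bit(packed: bytes, count: int) -> bytes:
    """Restore `count` samples; the discarded low bits come back as zero."""
    out = bytearray()
    for byte in packed:
        out.append((byte >> 4) << 4)
        out.append((byte & 0x0F) << 4)
    return bytes(out[:count])
```

The restored samples differ from the originals by at most 15 per sample, which is the information the backup gives up in exchange for the smaller size.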
In an embodiment, the settings of anonymization requirements comprise specifications and/or rules that allow the controller 106 to modify content of the plurality of data elements for anonymizing identities of entities to which such content relates. These settings enable the controller 106 to protect private or sensitive information stored in the plurality of data elements. For example, a rule in the settings of anonymization requirements may be anonymizing private information such as social security number and financial details, and a specification in the settings may be changing all digits of social security number and financial details to T. The settings of anonymization requirements may further comprise algorithms that implement anonymization on the plurality of data elements. The algorithms for anonymization requirements may include, but are not limited to machine learning algorithms, incognito algorithm, Samarati algorithm, and Datafly algorithm.
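The rule-plus-specification pairing described above can be sketched in Python as a pattern (the rule: which information is private) and a replacement (the specification: how it is changed). The sketch assumes social security numbers written in the `NNN-NN-NNNN` form and uses `X` as a placeholder replacement character; both choices, and the name `mask_ssn`, are assumptions made for illustration.

```python
import re

# Assumed textual form of a social security number: 3-2-4 digits.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def mask_ssn(text: str, replacement: str = "X") -> str:
    """Replace every digit of each detected number with `replacement`,
    leaving the separators and the surrounding text untouched."""
    return SSN_PATTERN.sub(
        lambda match: re.sub(r"\d", replacement, match.group()), text
    )
```

Keeping the separators preserves the shape of the field, which can be useful when the anonymized backup is later used for test and development purposes.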
In an embodiment, the settings of leak source detection comprise specifications and/or rules that allow the controller 106 to avoid data leaks or track the source of a data leak in a case of occurrence of the data leak. For example, the settings of leak source detection comprise a rule to add a mark to a given data element while generating its backup. Moreover, the settings may comprise specific characteristics (such as a type, a size, a colour, a position, and the like) of the mark.
In an embodiment, the settings of compliance requirements comprise specifications and/or rules that allow the controller 106 to change the content of the plurality of data elements according to laws, policies, and regulations applicable to the backup of the plurality of data elements. Since laws, policies, and regulations typically relate to data security and privacy, such settings of compliance requirements provide data protection to the plurality of data elements.
In an embodiment, the controller 106 is further configured to generate the backup of each of the data elements according to the backup policy for the workload by one or more of: utilizing a compression algorithm according to the lossy data reduction; adapting the content of the data element according to the anonymization requirements; adding a mark indicating a source to the data element according to the leak source detection; and adapting the content of the data element according to the compliance requirements. One or more of the aforesaid processes are performed in accordance with the settings in the backup policy, to enable the controller 106 to generate the backup. Upon generation of the backup, the controller 106 sends the backup for storage to the at least one backup storage system. The controller 106 digitally performs, in a centralized manner, one or more of the aforesaid processes, to efficiently and accurately perform generation of the backup in a systematic manner. In such a case, separate tools for performing individual processes are not required to be manually employed for generating the backup.
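The centralized processing described above, in which a copy of each data element is generated and the policy's processes are then applied to that copy, can be sketched in Python as a small pipeline. The sketch assumes data elements are represented as dictionaries and that each process is a function taking and returning a data element; `generate_backup`, `add_source_mark` and the mark value are hypothetical names used only for illustration.

```python
import copy


def generate_backup(workload, filters):
    """Copy every data element, then apply each filter to the copy in order.
    The stored data elements themselves are never modified."""
    backups = []
    for element in workload:
        backup = copy.deepcopy(element)
        for apply_filter in filters:
            backup = apply_filter(backup)
        backups.append(backup)
    return backups


def add_source_mark(element):
    """Example filter: tag the copy with a (hypothetical) source identifier."""
    element["mark"] = "storage-arrangement-102"
    return element
```

Because the same ordered list of filters is applied to every data element, the policy is enforced uniformly across the workload without a separate tool per process.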
In an embodiment, the controller 106 employs the compression algorithm according to settings for the lossy data reduction in the backup policy. As an example, different compression algorithms may be employed for implementing different extents of compression, as specified in the settings for the lossy data reduction (in the backup policy).
In an embodiment, the controller 106 adapts the content of the data element according to the settings of the anonymization requirements. As an example, different anonymization algorithms may be employed for implementing anonymization, as specified in the settings for anonymization requirements (in the backup policy). For example, in a case when the backup copy is to be used for test and development purposes, the controller 106 modifies private information stored in the plurality of data elements, when generating the backup.
In accordance with an embodiment, the controller 106 is further configured to adapt the content of the data element according to the anonymization requirements by obscuring a face, altering a voice, and/or changing personal information. The controller 106 modifies the data element according to the anonymization requirements in the backup policy. In an example, the face represented in an image (data element) is obscured (for example, blurred, recoloured, and the like) by the controller 106 to prevent identification of an individual. The controller 106 may identify and obscure the face represented in the data element using algorithms, such as machine learning, recolouring, box blurring, and the like. In another example, the voice is altered (by changing an amplitude, changing a pitch and tone of the voice, and the like) by the controller 106 to prevent identification of an individual. The controller 106 may alter (namely, modify) the voice in the data element using algorithms, such as pitch synchronous overlap and add (PSOLA) algorithm, voice morphing algorithms, and the like. In yet another example, the personal information, which is sensitive in nature, like credit card numbers, date of birth, financial details, addresses, social security numbers, and the like, in the plurality of data elements is changed by the controller 106 according to the anonymization requirements. The controller 106 may change sensitive data stored in the data element using the algorithms for anonymization requirements.
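Box blurring, mentioned above as one option for obscuring a face, can be sketched in a few lines of Python. The sketch assumes a grayscale image stored as a list of rows of integers and assumes the face has already been located as a rectangle by some detector that is outside the scope of the example; the name `box_blur_region` and its parameters are hypothetical.

```python
def box_blur_region(image, top, left, height, width, radius=1):
    """Return a copy of `image` in which every pixel inside the given
    rectangle is replaced by the mean of its (2*radius+1)^2 neighbourhood,
    clipped at the image borders."""
    rows, cols = len(image), len(image[0])
    result = [row[:] for row in image]
    for r in range(top, min(top + height, rows)):
        for c in range(left, min(left + width, cols)):
            total = 0
            count = 0
            for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                    total += image[rr][cc]  # read from the unblurred source
                    count += 1
            result[r][c] = total // count
    return result
```

A larger `radius` removes more detail from the region, so it could be chosen according to how strongly the policy requires the face to be obscured.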
In an embodiment, the controller 106 adds the mark to the data element according to the settings of the leak source detection. The mark is a digital footprint that indicates a source of the data element or its backup and facilitates tracking sources of data leaks. The mark discourages the unauthorized use of the plurality of data elements or their backup without requisite permission (for example, of an authorized person). In an embodiment, the mark is perceptible, thereby discouraging data leaks. In another embodiment, the mark is imperceptible, thereby making the mark hard to destroy and hence, facilitating tracking of the source of the leak. Further, the mark allows detection of the leaker by decoding the mark on the data element. For example, in a scenario when an unauthorised user attempts to access a backup of a data element, an identification of the user (such as a username or an IP address) may modify the mark present in the backup. This enables tracking of the user by decoding of the modified mark. Optionally, the mark is added by the controller 106 if at least one backup storage system is not a trusted system. The controller 106 may add the mark to the data element using algorithms, such as self-embedding algorithm, fragile watermarks algorithm, block-based watermarking algorithm and feature-based watermark algorithm, watermark-embedding algorithm, neural network training algorithm, watermark extraction algorithm and the like.
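To make the idea of an imperceptible mark that can later be decoded more concrete, the following Python sketch hides a source identifier in the least significant bit of successive 8-bit samples. Least-significant-bit embedding is a simple, well-known technique used here purely as an assumption for illustration; it is not one of the algorithms listed above and is far less robust than them. The names `embed_mark` and `extract_mark` are hypothetical.

```python
def embed_mark(samples: bytes, source_id: bytes) -> bytes:
    """Write the bits of `source_id` into the lowest bit of the first
    8 * len(source_id) samples; each sample changes by at most 1."""
    bits = [(byte >> (7 - i)) & 1 for byte in source_id for i in range(8)]
    if len(bits) > len(samples):
        raise ValueError("data element too small to carry the mark")
    marked = bytearray(samples)
    for index, bit in enumerate(bits):
        marked[index] = (marked[index] & 0xFE) | bit
    return bytes(marked)


def extract_mark(samples: bytes, length: int) -> bytes:
    """Decode a `length`-byte identifier from the lowest bits."""
    out = bytearray()
    for n in range(length):
        value = 0
        for i in range(8):
            value = (value << 1) | (samples[n * 8 + i] & 1)
        out.append(value)
    return bytes(out)
```

If a different identifier were embedded for each backup storage system, decoding the mark from a leaked copy would point to the system the copy came from.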
In accordance with an embodiment, the mark added to the data element according to the leak source detection is a watermark. A watermark is a digital footprint that is hard to detect or damage and allows detection of data leaks of the plurality of data elements. The digital footprint may include, but is not limited to, a barcode, a text code, and the like. The watermark indicates the source of the plurality of data elements and/or the backup of the data elements, and enables leak source detection by tracking the sources indicated by the watermark.
In an embodiment, the controller 106 is configured to adapt (namely, modify) the content of the data element according to the settings of the compliance requirements. The content of the plurality of data elements is adapted to ensure compliance of the backup of said data elements with local and governmental regulations.
In accordance with an embodiment, the controller 106 is further configured to adapt the content of the data element according to the compliance requirements by obscuring a face of a child and/or deleting an address, a name, and/or an identification number. The controller 106 modifies the data element according to the compliance requirements in the backup policy. In an example scenario, storing images and/or videos of children is forbidden at the geographical location of a backup storage system to which a data element is to be backed up. In order to adapt the content of the data element according to this compliance requirement, the controller 106 may detect faces in the data element and apply a filter (for example, a blur filter) to the data element to obscure the face of a child. In another example scenario, storing private information such as an address, a name, an identification number, and the like is prohibited at a backup storage system to which a data element is to be backed up. In order to adapt the content of the data element according to this compliance requirement, the controller 106 may detect the private information and delete it (by using an algorithm, such as machine learning).
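The blur filter mentioned above can be sketched as a box blur restricted to a rectangular region. The example below assumes that face detection has already produced a bounding box; the grayscale list-of-lists image representation, the function name `blur_region`, and the `box` and `radius` parameters are hypothetical choices made for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]  # grayscale pixel rows, values 0-255


def blur_region(image: Image, box: Tuple[int, int, int, int], radius: int = 1) -> Image:
    """Return a copy of image with the given box replaced by a box blur.

    box is (top, left, bottom, right); bottom and right are exclusive.
    Pixels outside the box are copied unchanged.
    """
    top, left, bottom, right = box
    height, width = len(image), len(image[0])
    result = [row[:] for row in image]
    for y in range(max(top, 0), min(bottom, height)):
        for x in range(max(left, 0), min(right, width)):
            # Average over the neighbourhood, clipped at the image borders.
            window = [
                image[ny][nx]
                for ny in range(max(y - radius, 0), min(y + radius + 1, height))
                for nx in range(max(x - radius, 0), min(x + radius + 1, width))
            ]
            result[y][x] = sum(window) // len(window)
    return result


if __name__ == "__main__":
    picture = [
        [0, 0, 0, 0],
        [0, 90, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
    ]
    for row in blur_region(picture, (0, 0, 2, 2)):
        print(row)
```

A small radius only softens the region; a practical filter would likely use a larger radius or repeated passes so that the face cannot be recognised.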
In accordance with an embodiment, the controller 106 is further configured to generate the backup of each of the data elements according to the backup policy for the workload by generating a copy of the data element and then applying a filter according to the backup policy to the copy of the data element. Here, the term "filter" refers to a function or software that processes data to perform a certain operation (for example, data reduction, anonymization, leak source detection and/or compliance) on the data. Here, the copy of the data element refers to a replica of the data element. Further, in an embodiment, more than one filter may be applied in a predefined order by the controller 106 to the copy of each of the plurality of data elements, according to the settings in the backup policy. For example, when the settings in the backup policy comprise lossy data reduction, anonymization requirements, and leak source detection, the controller 106 may first apply a data reduction filter, then an anonymization filter and then a leak source detection filter (that adds the mark to the copy of the data element). Subsequently, the controller 106 sends the copy of the data element to a given backup storage system.
FIG. 2 is an exemplary illustration of backup of data elements to a backup storage system, in accordance with an embodiment of the present disclosure. With reference to FIG. 2, there is shown the storage arrangement 102 and a backup storage system 202 (depicted, for example, as a cloud storage system). The storage arrangement 102 comprises a file system 204A that comprises a copy 206A of a first data element and a copy 208A of a second data element. The controller 106 applies one or more filters on each of the copies 206A and 208A according to the backup policy. In an example, the data reduction filter is applied to the copy 206A (which may, for example, be an image) and then the leak source detection filter is applied to add a mark to the copy 206A. In another example, the anonymization filter is applied to the copy 208A (which may, for example, be a document). Then, backups of the first data element and the second data element are transferred to the backup storage system 202. The backup storage system 202 stores a backup 204B of the file system 204A. The backup 204B comprises a backup 206B of the first data element and a backup 208B of the second data element. The backup 206B is optimized to have reduced data and leak source detection capabilities, whereas the backup 208B is optimized for anonymization of sensitive data.
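The application of filters to a copy of a data element in a predefined order (data reduction, then anonymization, then leak source detection) can be sketched as follows. The three filters are deliberately trivial stand-ins, and the policy representation as a dictionary of boolean settings, the setting names, and the tag `storage-102` are assumptions made for illustration only.

```python
from typing import Callable, Dict, List

Filter = Callable[[bytes], bytes]


def reduce_filter(data: bytes) -> bytes:
    # Lossy stand-in: collapse every run of whitespace into a single space.
    return b" ".join(data.split())


def anonymize_filter(data: bytes) -> bytes:
    # Stand-in: replace every ASCII digit with '#'.
    return bytes(0x23 if 0x30 <= b <= 0x39 else b for b in data)


def mark_filter(data: bytes) -> bytes:
    # Stand-in: append a perceptible source tag.
    return data + b" [src:storage-102]"


# Predefined order in which enabled filters are applied.
FILTER_ORDER: List[str] = [
    "lossy_data_reduction",
    "anonymization",
    "leak_source_detection",
]
FILTERS: Dict[str, Filter] = {
    "lossy_data_reduction": reduce_filter,
    "anonymization": anonymize_filter,
    "leak_source_detection": mark_filter,
}


def generate_backup(data_element: bytes, policy: Dict[str, bool]) -> bytes:
    """Apply the filters enabled in policy to a copy of data_element."""
    # bytes objects are immutable, so the stored data element is never modified;
    # each filter returns a new object that becomes the working copy.
    copy = data_element
    for name in FILTER_ORDER:
        if policy.get(name, False):
            copy = FILTERS[name](copy)
    return copy


if __name__ == "__main__":
    policy = {name: True for name in FILTER_ORDER}
    print(generate_backup(b"id  1234   ok", policy))
```

Applying the mark last matters in this sketch: if the anonymization filter ran after the mark filter, it would also overwrite the digits in the source tag.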
FIG. 3 is an exemplary illustration of a data element including a mark added for leak source detection, in accordance with an embodiment of the present disclosure. With reference to FIG. 3, there is shown a data element 302 including a mark 304 added for leak source detection. The data element 302 is, for example, an image. A portion of the data element 302 that includes the mark 304 is shown in an enlarged form. The mark 304 is hard to detect and damage and thus discourages unauthorized use and distribution of the data element 302 without requisite permission. The mark 304 also facilitates detecting the source of a leak of the data element 302, if such a leak occurs.
FIG. 4 is an exemplary illustration of a data element whose content has been adapted according to a compliance requirement, in accordance with an embodiment of the present disclosure. With reference to FIG. 4, there is shown a data element 402 whose content has been adapted according to a compliance requirement. The data element 402 is, for example, an image depicting a child 404 and an adult 406. In an example scenario, storing images and/or videos of children is forbidden at the geographical location of a backup storage system to which the data element 402 is to be backed up. In order to adapt the content of the data element 402 according to this compliance requirement, the controller 106 may detect faces in the data element 402 and apply a filter (for example, a blur filter) to the data element 402 to hide the face of the child 404. As a result, the face of the child 404 is obscured by the controller 106.
FIG. 5 is a flowchart of a method 500 for a storage arrangement, in accordance with an embodiment of the present disclosure. The method 500 is used in a storage arrangement (such as the storage arrangement 102) comprising a memory (such as the memory 104) being configured to store a plurality of data elements comprised in a workload. The method 500 is executed by the controller 106 at the storage arrangement 102 described, for example, in FIG. 1. The method 500 includes steps 502, 504, and 506.
At step 502, the method 500 comprises receiving an indication of a workload to be backed-up, the workload comprising a plurality of data elements. The controller 106 receives the indication of the workload via the communication interface 108. The indication is sent by a user, via a user device. Upon receiving the indication, the controller 106 initiates a backup operation of the workload for creating the backup of the plurality of data elements on at least one backup storage system.
At step 504, the method 500 further comprises receiving an indication of a backup policy, wherein the backup policy is for the workload and applies to all of the plurality of data elements to be backed-up. In some implementations, the indication of the backup policy is received from the user using the user device. In other implementations, the indication of the backup policy is received from a backup policy selecting software module. The indicated backup policy enables the controller 106 to automatically manage and create the backup of the plurality of data elements of the workload.
At step 506, the method 500 further comprises generating a backup of each of the data elements according to the backup policy for the workload, wherein the backup policy comprises settings for one or more of: lossy data reduction; anonymization requirements; leak source detection; and compliance requirements. The settings of the backup policy comprise specifications and/or rules that allow the controller 106 to modify the plurality of data elements to generate an optimized backup of each of the plurality of data elements of the workload.
The steps 502 to 506 are only illustrative and other alternatives can also be provided where one or more steps are added, one or more steps are removed, or one or more steps are provided in a different sequence without departing from the scope of the claims herein.
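Steps 502 to 506 can be sketched as a small controller class. The class and method names, the representation of the workload as a dictionary of byte strings, and the representation of the backup policy as an ordered list of filter callables are all hypothetical; the disclosure does not prescribe any particular data structures.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Workload:
    name: str
    data_elements: Dict[str, bytes]


@dataclass
class BackupPolicy:
    # One callable per enabled setting, in the order they are to be applied.
    filters: List[Callable[[bytes], bytes]] = field(default_factory=list)


class Controller:
    def __init__(self) -> None:
        self.workload: Optional[Workload] = None
        self.policy: Optional[BackupPolicy] = None

    def receive_workload(self, workload: Workload) -> None:
        """Step 502: receive an indication of the workload to be backed up."""
        self.workload = workload

    def receive_policy(self, policy: BackupPolicy) -> None:
        """Step 504: receive the backup policy that applies to the whole workload."""
        self.policy = policy

    def generate_backup(self) -> Dict[str, bytes]:
        """Step 506: back up every data element according to the one policy."""
        if self.workload is None or self.policy is None:
            raise RuntimeError("workload and backup policy must be received first")
        backup: Dict[str, bytes] = {}
        for key, element in self.workload.data_elements.items():
            copy = element
            for apply_filter in self.policy.filters:
                copy = apply_filter(copy)
            backup[key] = copy
        return backup


if __name__ == "__main__":
    controller = Controller()
    controller.receive_workload(Workload("demo", {"a": b"Hello", "b": b"World"}))
    controller.receive_policy(BackupPolicy(filters=[bytes.upper]))
    print(controller.generate_backup())
```

The point of the sketch is that the policy is attached once to the workload and then applied uniformly to every data element, rather than being chosen per element.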
In accordance with an embodiment, there is provided a computer-readable medium carrying computer instructions that, when loaded into and executed by a controller 106 of a storage arrangement 102, enable the storage arrangement 102 to implement the method 500. The computer-readable medium carrying computer instructions provides a non-transient memory and may include, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
In an exemplary aspect, a storage arrangement 102 comprises a memory 104 being configured to store a plurality of data elements comprised in a workload. The storage arrangement 102 further comprises a workload receiving software module 110a for receiving an indication of a workload to be backed-up, the workload comprising a plurality of data elements. The storage arrangement 102 further comprises a backup policy receiving software module 110b for receiving an indication of a backup policy, wherein the backup policy is for the workload and applies to all of the plurality of data elements to be backed-up. The storage arrangement 102 further comprises a backup generating software module 110c for generating a backup of each of the data elements according to the backup policy for the workload, wherein the backup policy comprises settings for one or more of: lossy data reduction; anonymization requirements; leak source detection; and compliance requirements.
The software modules 110 are executed to enable the storage arrangement 102 to receive and attach an appropriate backup policy to the workload, in order to generate the backup of the plurality of data elements. Dedicated software modules 110 are employed to perform dedicated processing tasks concerning the generation of the backup. Use of the software modules 110 beneficially automates backup generation in a centralized manner at the storage arrangement 102, so that manual involvement in applying these processes is minimized. The software modules 110 are executed by the controller 106 of the storage arrangement 102.
Modifications to embodiments of the present disclosure described in the foregoing are possible without departing from the scope of the present disclosure as defined by the accompanying claims. Expressions such as "including", "comprising", "incorporating", "have", "is" used to describe and claim the present disclosure are intended to be construed in a non-exclusive manner, namely allowing for items, components or elements not explicitly described also to be present. Reference to the singular is also to be construed to relate to the plural. The word "exemplary" is used herein to mean "serving as an example, instance or illustration". Any embodiment described as “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments and/or to exclude the incorporation of features from other embodiments. The word "optionally" is used herein to mean "is provided in some embodiments and not provided in other embodiments". It is appreciated that certain features of the present disclosure, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable combination or as suitable in any other described embodiment of the disclosure.

Claims

1. A storage arrangement (102) comprising a memory (104) and a controller (106), the memory (104) being configured to store a plurality of data elements comprised in a workload, and the controller (106) being configured to: receive an indication of a workload to be backed-up, the workload comprising a plurality of data elements; receive an indication of a backup policy, characterized in that the backup policy is for the workload and applies to all of the plurality of data elements to be backed-up; and generate a backup of each of the data elements according to the backup policy for the workload, wherein the backup policy comprises settings for one or more of: lossy data reduction; anonymization requirements; leak source detection; and compliance requirements.
2. The storage arrangement (102) according to claim 1, wherein the controller (106) is further configured to generate the backup of each of the data elements according to the backup policy for the workload by one or more of: utilizing a compression algorithm according to the lossy data reduction; adapting the content of the data element according to the anonymization requirements; adding a mark indicating a source to the data element according to the leak source detection; and adapting the content of the data element according to the compliance requirements.
3. The storage arrangement (102) according to claim 2, wherein the controller (106) is further configured to adapt the content of the data element according to the anonymization requirements by obscuring a face, altering a voice, and/or changing personal information.
4. The storage arrangement (102) according to claim 2 or 3, wherein the mark added to the data element according to the leak source detection is a watermark.
5. The storage arrangement (102) according to claim 2, 3 or 4, wherein the controller (106) is further configured to adapt the content of the data element according to the compliance requirements by obscuring a face of a child and/or deleting an address, a name, and/or an identification number.
6. The storage arrangement (102) according to any previous claim, wherein the controller (106) is further configured to generate the backup of each of the data elements according to the backup policy for the workload by generating a copy of the data element and then applying a filter according to the backup policy to the copy of the data element.
7. The storage arrangement (102) according to any previous claim, wherein the controller (106) is further configured to receive the backup policy from a user.
8. The storage arrangement (102) according to claim 7, wherein the controller (106) is further configured to receive the backup policy from the user by receiving one or more of: a user indication of lossy data reduction; a user indication of anonymization requirements; a user indication of leak source detection; and a user indication of compliance requirements.
9. The storage arrangement (102) according to claim 7, wherein the controller (106) is further configured to receive the backup policy from the user by receiving a selection of a backup policy.
10. The storage arrangement (102) according to claim 7, wherein the controller (106) is further configured to receive the backup policy from the user by receiving an indication of legislative requirements, wherein the backup policy matches the legislative requirements.
11. A method (500) for a storage arrangement (102) comprising a memory (104) being configured to store a plurality of data elements comprised in a workload, the method (500) comprising: receiving an indication of a workload to be backed-up, the workload comprising a plurality of data elements; receiving an indication of a backup policy, characterized in that the backup policy is for the workload and applies to all of the plurality of data elements to be backed-up; and generating a backup of each of the data elements according to the backup policy for the workload, wherein the backup policy comprises settings for one or more of: lossy data reduction; anonymization requirements; leak source detection; and compliance requirements.
12. A computer-readable medium carrying computer instructions that, when loaded into and executed by a controller (106) of a storage arrangement (102), enable the storage arrangement to implement the method (500) according to claim 11.
13. A storage arrangement (102) comprising a memory (104) being configured to store a plurality of data elements comprised in a workload, and the storage arrangement (102) further comprising: a workload receiving software module (110a) for receiving an indication of a workload to be backed-up, the workload comprising a plurality of data elements; a backup policy receiving software module (110b) for receiving an indication of a backup policy, characterized in that the backup policy is for the workload and applies to all of the plurality of data elements to be backed-up; and a backup generating software module (110c) for generating a backup of each of the data elements according to the backup policy for the workload, wherein the backup policy comprises settings for one or more of: lossy data reduction; anonymization requirements; leak source detection; and compliance requirements.
PCT/EP2020/076236 2020-09-21 2020-09-21 Storage arrangements and method employing backup policies for generating data backup WO2022058030A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN202080105357.XA CN116209985A (en) 2020-09-21 2020-09-21 Storage device and method for generating data backup by adopting backup strategy
PCT/EP2020/076236 WO2022058030A1 (en) 2020-09-21 2020-09-21 Storage arrangements and method employing backup policies for generating data backup
EP20780122.6A EP4204965A1 (en) 2020-09-21 2020-09-21 Storage arrangements and method employing backup policies for generating data backup

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/EP2020/076236 WO2022058030A1 (en) 2020-09-21 2020-09-21 Storage arrangements and method employing backup policies for generating data backup

Publications (1)

Publication Number Publication Date
WO2022058030A1 true WO2022058030A1 (en) 2022-03-24

Family

ID=72644203

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2020/076236 WO2022058030A1 (en) 2020-09-21 2020-09-21 Storage arrangements and method employing backup policies for generating data backup

Country Status (3)

Country Link
EP (1) EP4204965A1 (en)
CN (1) CN116209985A (en)
WO (1) WO2022058030A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013036537A1 (en) * 2011-09-07 2013-03-14 Symantec Corporation Automated separation of corporate and private data for backup and archiving
US8832044B1 (en) * 2009-03-04 2014-09-09 Symantec Corporation Techniques for managing data compression in a data protection system
US20180285382A1 (en) * 2017-03-29 2018-10-04 Commvault Systems, Inc. Synchronization operations for network-accessible folders
US20200117365A1 (en) * 2018-10-15 2020-04-16 EMC IP Holding Company LLC Agent aware selective backup of a virtual machine using virtual i/o filter snapshots

Also Published As

Publication number Publication date
EP4204965A1 (en) 2023-07-05
CN116209985A (en) 2023-06-02

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20780122

Country of ref document: EP

Kind code of ref document: A1

ENP Entry into the national phase

Ref document number: 2020780122

Country of ref document: EP

Effective date: 20230327

NENP Non-entry into the national phase

Ref country code: DE