WO2023169656A1 - Data management device and method for data management for a distributed data storage - Google Patents

Data management device and method for data management for a distributed data storage

Info

Publication number
WO2023169656A1
Authority
WO
WIPO (PCT)
Prior art keywords
pii
file
data management
management device
metadata
Prior art date
Application number
PCT/EP2022/055872
Other languages
French (fr)
Inventor
Asaf Yeger
Michael Gutman
Shahar SALZMAN
Shmoolik Yosub
David Segal
Assaf Natanzon
Dror PERETZ
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. filed Critical Huawei Technologies Co., Ltd.
Priority to PCT/EP2022/055872 priority Critical patent/WO2023169656A1/en
Publication of WO2023169656A1 publication Critical patent/WO2023169656A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes

Definitions

  • the disclosure relates generally to a distributed data storage including a data management device for a distributed data storage, and more particularly, the disclosure relates to a computer-implemented method of data management for the distributed data storage.
  • FIG. 1 illustrates an exemplary view of a distributed data storage 100 of an organization for data management in accordance with the prior art.
  • the distributed data storage 100 includes a personal server 102, a laptop 104, a database 106, a datacenter half-storage rack 108, and a data center storage rack 110.
  • the distributed data storage 100 stores Personally Identifiable Information, PII in at least one of the personal server 102, the laptop 104, the database 106, the datacenter half-storage rack 108, or the data center storage rack 110.
  • PII Personally Identifiable Information
  • Tracking down the Personally Identifiable Information, PII, in the distributed data storage 100 is easy, yet a portion of the Personally Identifiable Information, PII, might be unstructured information that exists in at least one of the personal server 102, the laptop 104, the database 106, the datacenter half-storage rack 108 or the data center storage rack 110. Tracking down the Personally Identifiable Information, PII, in unstructured information requires knowing where to look and looking through devices that look past metadata and into actual information. The process of tracking down the Personally Identifiable Information, PII, is difficult when there are numerous laptops and servers, both on-premise and in clouds.
  • the organizations must comply with a user's request and reveal any information acquired on the user under the General Data Protection Regulation, or GDPR. As these files may contain additional Personally Identifiable Information, PII, of other users, a security officer must have access to all of them in order to evaluate them before delivering the information to the user.
  • the data containing the Personally Identifiable Information, PII, about the user are dispersed across the corporate storage, such as NAS and object storage.
  • the security officer must have credentials for all main storage where user PII files are stored, as well as for object storage in the cloud or on-premise, in order to have access to all DSAR information. When information is stored in many databases, credential management and access become difficult.
  • information may be stored in many databases of different types, adding further complexity to the process.
  • To supply the user with the user information, the security officer must copy the user's PII files from a target or production storage. This places a burden on the production storage when the files are large. The security officer must perform the Data Subject Access Request, DSAR, again to get all updated file paths and sources since the user information may have been transferred to various places or may no longer exist in the production storage.
  • DSAR Data Subject Access Request
  • the disclosure provides a distributed data storage including a data management device for a distributed data storage, and a computer-implemented method of data management for the distributed data storage.
  • a data management device for a distributed data storage.
  • the data management device includes one or more collector modules and a reporting module.
  • the one or more collector modules are configured to receive metadata relating to each file in the distributed data storage.
  • the metadata includes information on any personally identifiable information, PII, elements detected in each file.
  • the reporting module is configured, in response to receiving a request relating to a user, to: (i) search the metadata for each PII element relating to the user and (ii) generate a list including each searched PII element.
  • the data management device provides immediate access to Data Subject Access Request, DSAR, relevant files to package and send all of the DSAR relevant files to a data security officer.
  • the data management device reduces a load on a primary storage and network traffic by avoiding a process of copying a large amount of data from production servers to a target node.
  • the data management device reduces security risks by restricting credentials of all organization servers to the data security officer.
  • the data management device reduces security risks by restricting access to a production storage and reducing human errors.
  • the data management device fulfills service legal agreements and eliminates penalties by providing faster results.
  • the generated list includes one or more of a data source, a file path, a file offset, a PII type and a value.
  • the reporting module is further configured to generate a soft link to each file containing at least one of the PII elements included in the generated list.
  • the reporting module is configured to generate the soft link by mounting the data source containing each file to be linked.
  • the metadata further includes a PII relation between at least one pair of PII elements.
  • the collector modules are configured to receive the metadata continuously or periodically at a predetermined time interval.
  • the predetermined time interval is determined for each file based on an access frequency of the file.
  • a distributed data storage including the data management device of the first aspect.
  • the distributed data storage provides immediate access to Data Subject Access Request, DSAR, relevant files to package and send all of the DSAR relevant files to a Data security officer.
  • the distributed data storage reduces a load on a primary storage and network traffic by avoiding a process of copying a large amount of data from production servers to a target node.
  • the distributed data storage reduces security risks by restricting credentials of all organization servers to the data security officer.
  • the distributed data storage reduces security risks by restricting access to production storage and reducing human errors.
  • the distributed data storage fulfills service legal agreements and eliminates penalties by providing faster results.
  • a computer-implemented method of data management for a distributed data storage includes receiving metadata relating to each file in the distributed data storage.
  • the metadata includes information on any personally identifiable information, PII, elements detected in each file.
  • the method includes, in response to receiving a request relating to a user, searching the metadata for each PII element relating to the user and generating a list including each searched PII element.
  • the method provides immediate access to Data Subject Access Request, DSAR, relevant files to package and send all of the DSAR relevant files to a Data security officer.
  • the method reduces a load on a primary storage and network traffic by avoiding a process of copying a large amount of data from production servers to a target node.
  • the method reduces security risks by restricting credentials of all organization servers to the data security officer.
  • the method reduces security risks by restricting access to a production storage and reducing human errors.
  • the method fulfills service legal agreements and eliminates penalties by providing faster results.
  • the generated list includes one or more of a data source, a file path, a file offset, a PII type, and a value.
  • the reporting module is further configured to generate a soft link to each file containing at least one of the PII elements included in the generated list.
  • the reporting module is configured to generate the soft link by mounting the data source containing each file to be linked.
  • the metadata further includes a PII relation between at least one pair of PII elements.
  • the collector modules are configured to receive the metadata continuously or periodically at a predetermined time interval.
  • the predetermined time interval is determined for each file based on an access frequency of the file.
  • the data management device provides immediate access to the Data Subject Access Request, DSAR, relevant files to package and send all of the DSAR relevant files to a data security officer.
  • the data management device reduces a load on primary storage and network traffic by avoiding a process of copying a large amount of data from production servers to a target node.
  • the data management device reduces security risks by restricting credentials of all organization servers to the data security officer.
  • the data management device reduces security risks by restricting access to a production storage and reducing human errors.
  • the data management device fulfills service legal agreements and eliminates penalties by providing faster results.
  • FIG. 1 illustrates an exemplary view of a distributed data storage of an organization for data management in accordance with the prior art
  • FIG. 2 is a block diagram of a data management device for a distributed data storage in accordance with an implementation of the disclosure
  • FIG. 3 is a block diagram of a distributed data storage in accordance with an implementation of the disclosure.
  • FIG. 4 is a flow diagram that illustrates a computer-implemented method of data management for a distributed data storage in accordance with an implementation of the disclosure.
  • FIG. 5 is an illustration of an exemplary data management device, a distributed data storage, or a computer system in which the various architectures and functionalities of the various previous implementations may be implemented.
  • Implementations of the disclosure provide a distributed data storage including a data management device for a distributed data storage and a computer-implemented method of data management for the distributed data storage.
  • a process, a method, a system, a product, or a device that includes a series of steps or units is not necessarily limited to expressly listed steps or units but may include other steps or units that are not expressly listed or that are inherent to such process, method, product, or device.
  • FIG. 2 is a block diagram of a data management device 200 for a distributed data storage in accordance with an implementation of the disclosure.
  • the data management device 200 includes one or more collector modules 202A-N and a reporting module 204.
  • the one or more collector modules 202A-N are configured to receive metadata relating to each file in the distributed data storage.
  • the metadata includes information on any personally identifiable information, PII, elements detected in each file.
  • the reporting module 204 is configured, in response to receiving a request relating to a user, to: (i) search the metadata for each PII element relating to the user and (ii) generate a list including each searched PII element.
  • the data management device 200 provides immediate access to Data Subject Access Request, DSAR, relevant files to package and send all of the DSAR relevant files to a data security officer.
  • the data management device 200 reduces a load on a primary storage and network traffic by avoiding a process of copying a large amount of data from production servers to a target node.
  • the data management device 200 reduces security risks by restricting credentials of all organization servers to the data security officer.
  • the data management device 200 reduces security risks by restricting access to a production storage and reducing human errors.
  • the data management device 200 fulfills service legal agreements and eliminates penalties by providing faster results.
  • the data management device 200 uses backup storage for the distributed data storage.
  • the backups are centralized and include the latest personally identifiable information.
  • the data management device 200 provides a service of unstructured data management.
  • the data management device 200 is a centralized system that has a single pane of glass of an entire enterprise storage.
  • the data management device 200 may run different types of queries, perform analytics, and supply results on a customer's enterprise storage, for example, finding hot and cold data according to customer policy, DSAR, etc.
  • the data management device 200 may scan data from at least one primary storage or backup storage.
  • the data management device 200, as part of a backup system, provides a complete solution for a data security officer.
  • the data management device 200 provides a list of sources and files with all user personally identifiable information, PII.
  • for example, the personally identifiable information, PII, of user “John” may be found on NAS1 at file path /home/users/john/banking.txt (backupID: “123”) and on NAS7 at file path /tmp/newusers/list.txt (backupID: “456”), and the file systems that contain the DSAR file results are exposed from the backup storage.
  • the generated list includes one or more of a data source, a file path, a file offset, a PII type, and a value.
  • the reporting module 204 is further configured to generate a soft link to each file containing at least one of the PII elements included in the generated list.
  • the reporting module 204 is configured to generate the soft link by mounting the data source containing each file to be linked.
  • the metadata further includes a PII relation between at least one pair of PII elements.
  • the data management device 200 may support a user based on a PII collection and data modeling.
  • the one or more collector modules 202A-N are configured to receive the metadata continuously or periodically at a predetermined time interval.
  • the predetermined time interval may be determined for each file based on an access frequency of the file.
  • the one or more collector modules 202A-N collect information on unstructured data from all types of sources in enterprise storage.
  • the enterprise storage may include NAS, S3, and environment (VM) storage.
  • the collection of information may be made periodically or “live update”, using the one or more collector modules 202A-N that may run in-host or outside of the host.
  • the one or more collector modules 202A-N collect native metadata and additional synthetic data, for example, the I/O temperature of a file, obtained by calculating the number of reads of the file in the last 24 hours, or any other synthetic information.
  • the one or more collector modules 202A-N update the data management device 200 with file system metadata and detect personally identifiable information, PII, and the relation between the personally identifiable information, PII.
  • the information may be stored in the data management device backend, for example, ElasticSearch.
  • the one or more collector modules 202A-N are detectors that read file data and send sensitive data to the data management device 200, which is responsible for modeling private information (PI) and personally identifiable information (PII) found in files and objects.
  • the data management device 200 stores the private information, PI, and the personally identifiable information, PII, in a backend of the data management device 200.
  • FIG. 3 is a block diagram of a distributed data storage 300 in accordance with an implementation of the disclosure.
  • the distributed data storage 300 includes a data management device 302.
  • the data management device 302 includes one or more collector modules 304A-N and a reporting module 306.
  • the one or more collector modules 304A-N are configured to receive metadata relating to each file in the distributed data storage 300.
  • the metadata includes information on any personally identifiable information, PII, elements detected in each file.
  • the reporting module 306 is configured, in response to receiving a request relating to a user, to: (i) search the metadata for each PII element relating to the user and (ii) generate a list including each searched PII element.
  • FIG. 4 is a flow diagram that illustrates a computer-implemented method of data management for a distributed data storage in accordance with an implementation of the disclosure.
  • at a step 402, metadata relating to each file in the distributed data storage is received.
  • the metadata is searched for each PII element relating to the user and a list including each searched PII element is generated.
  • the generated list includes one or more of a data source, a file path, a file offset, a PII type and a value.
  • the reporting module is further configured to generate a soft link to each file containing at least one of the PII elements included in the generated list.
  • the reporting module is configured to generate the soft link by mounting the data source containing each file to be linked.
  • the metadata further includes a PII relation between at least one pair of PII elements.
  • the collector modules are configured to receive the metadata continuously or periodically at a predetermined time interval.
  • the predetermined time interval is determined for each file based on an access frequency of the file.
  • the method may provide soft links to a data security officer for relevant files and provide only the links by performing (i) scanning the enterprise storage from the backup storage, (ii) sending a Data Subject Access Request, DSAR, of some user to the data management device, (iii) searching in the backend for all the personally identifiable information, PII related to the user and responding with results that contain the list of sources, files, PII types, values & location in the file (offset), (iv) exposing from the backup the file systems that contain (a) home/john/tmp/bank.txt (backup of VM231-ID, backupID: “111”), (b) /home/john/mydocs/tmp.txt (backup of VM231-ID, backupID: “111”) and (c) /data/ToDo.txt (backup of VM562-ID, backupID: “777”), (v) exposing file systems associated with backupID “111” & “777” from the backup repository and mounts them on
  • the data security officer may obtain access only to the “dsar-123” directory and view all the files of the user “john” without any copy from production storage or special credentials for each server.
  • exposing the files from the backup storage and creating the soft link is only one embodiment of the solution to expose the Data Subject Access Request, DSAR, files.
  • the Data Subject Access Request, DSAR, files may be exposed as “flat” all files per backupID in the same storage.
  • files with the same name from the same backupID need to be placed in different storages before the soft links are created. For example:
  • soft links are created for at least one of bank.txt, tmp.txt, or ToDo.txt.
  • a computer-readable medium is configured to store instructions which, when executed by a processor, cause the processor to execute the above method.
  • FIG. 5 is an illustration of an exemplary data management device, a distributed data storage, or a computer system 500 in which the various architectures and functionalities of the various previous implementations may be implemented.
  • the computer system 500 includes at least one processor 504 that is connected to a bus 502, wherein the computer system 500 may be implemented using any suitable protocol, such as PCI (Peripheral Component Interconnect), PCI-Express, AGP (Accelerated Graphics Port), Hyper Transport, or any other bus or point-to-point communication protocol(s).
  • the computer system 500 also includes a memory 506.
  • Control logic (software) and data are stored in the memory 506 which may take a form of random-access memory (RAM).
  • RAM random-access memory
  • a single semiconductor platform may refer to a sole unitary semiconductor-based integrated circuit or chip. It should be noted that the term single semiconductor platform may also refer to multi-chip modules with increased connectivity which simulate on-chip operation, and make substantial improvements over utilizing a conventional central processing unit (CPU) and bus implementation. Of course, the various modules may also be situated separately or in various combinations of semiconductor platforms per the desires of the user.
  • the computer system 500 may also include a secondary storage 510.
  • the secondary storage 510 includes, for example, a hard disk drive and a removable storage drive, representing a floppy disk drive, a magnetic tape drive, a compact disk drive, digital versatile disk (DVD) drive, recording device, universal serial bus (USB) flash memory.
  • the removable storage drive at least one of reads from and writes to a removable storage unit in a well-known manner.
  • Computer programs, or computer control logic algorithms may be stored in at least one of the memory 506 and the secondary storage 510. Such computer programs, when executed, enable the computer system 500 to perform various functions as described in the foregoing.
  • the memory 506, the secondary storage 510, and any other storage are possible examples of computer-readable media.
  • the architectures and functionalities depicted in the various previous figures may be implemented in the context of the processor 504, a graphics processor coupled to a communication interface 512, an integrated circuit (not shown) that is capable of at least a portion of the capabilities of both the processor 504 and a graphics processor, a chipset (namely, a group of integrated circuits designed to work and be sold as a unit for performing related functions), and so forth.
  • the architectures and functionalities depicted in the various previously described figures may be implemented in the context of a general computer system, a circuit board system, a game console system dedicated for entertainment purposes, or an application-specific system.
  • the computer system 500 may take the form of a desktop computer, a laptop computer, a server, a workstation, a game console, or an embedded system.
  • the computer system 500 may take the form of various other devices including, but not limited to a personal digital assistant (PDA) device, a mobile phone device, a smart phone, a television, and so forth. Additionally, although not shown, the computer system 500 may be coupled to a network (for example, a telecommunications network, a local area network (LAN), a wireless network, a wide area network (WAN) such as the Internet, a peer-to-peer network, a cable network, or the like) for communication purposes through an I/O interface 508.
  • a network for example, a telecommunications network, a local area network (LAN), a wireless network, a wide area network (WAN) such as the Internet, a peer-to-peer network, a cable network, or the like.
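The definitions above mention that the collector modules may derive synthetic metadata such as the I/O temperature of a file, calculated as the number of reads per file in the last 24 hours. A minimal Python sketch of that calculation follows. The function name, the shape of the read-event records, and the sample paths are hypothetical; the disclosure does not specify how read events are captured.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, Iterable, Tuple


def io_temperature(
    read_events: Iterable[Tuple[str, datetime]],
    now: datetime,
    window: timedelta = timedelta(hours=24),
) -> Dict[str, int]:
    """Count reads per file inside the trailing time window.

    read_events is an assumed input format: (file_path, read_time) pairs.
    """
    cutoff = now - window
    counts: Counter = Counter()
    for file_path, read_time in read_events:
        # Only reads inside the window contribute to the temperature.
        if cutoff < read_time <= now:
            counts[file_path] += 1
    return dict(counts)


# Hypothetical sample data for illustration only.
now = datetime(2022, 3, 8, 12, 0, 0)
events = [
    ("/home/users/john/banking.txt", now - timedelta(hours=1)),
    ("/home/users/john/banking.txt", now - timedelta(hours=5)),
    ("/tmp/newusers/list.txt", now - timedelta(hours=30)),  # outside the window
]
temperatures = io_temperature(events, now)
```

Files that do not appear in the result had no reads in the window and could be treated as cold data under a customer policy.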

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Medical Informatics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Provided is a data management device (200, 302) for a distributed data storage (300). The data management device (200, 302) includes one or more collector modules (202A-N, 304A-N) and a reporting module (204, 306). The one or more collector modules (202A-N, 304A-N) are configured to receive metadata relating to each file in the distributed data storage (300). The metadata includes information on any personally identifiable information, PII, elements detected in each file. The reporting module (204, 306) is configured, in response to receiving a request relating to a user, to: (i) search the metadata for each PII element relating to the user and (ii) generate a list including each searched PII element.

Description

DATA MANAGEMENT DEVICE AND METHOD FOR DATA MANAGEMENT FOR A DISTRIBUTED DATA STORAGE
TECHNICAL FIELD
The disclosure relates generally to a distributed data storage including a data management device for a distributed data storage, and more particularly, the disclosure relates to a computer-implemented method of data management for the distributed data storage.
BACKGROUND
Individuals are permitted access to information that organizations collect and process on them. The individuals hold intellectual property rights, or ownership rights, over their information even after they provide it to an organization. Information technology organizations must, upon demand, provide a complete account of the personally identifiable data that they hold on any given user. This requires the organization to know what data it has and where it is located, and to be able to retrieve a copy of it within 30 days or less.
FIG. 1 illustrates an exemplary view of a distributed data storage 100 of an organization for data management in accordance with the prior art. The distributed data storage 100 includes a personal server 102, a laptop 104, a database 106, a datacenter half-storage rack 108, and a data center storage rack 110. The distributed data storage 100 stores Personally Identifiable Information, PII, in at least one of the personal server 102, the laptop 104, the database 106, the datacenter half-storage rack 108, or the data center storage rack 110. Tracking down the Personally Identifiable Information, PII, in the distributed data storage 100 is easy, yet a portion of the Personally Identifiable Information, PII, might be unstructured information that exists in at least one of the personal server 102, the laptop 104, the database 106, the datacenter half-storage rack 108 or the data center storage rack 110. Tracking down the Personally Identifiable Information, PII, in unstructured information requires knowing where to look and looking through devices that look past metadata and into actual information. The process of tracking down the Personally Identifiable Information, PII, is difficult when there are numerous laptops and servers, both on-premise and in clouds.
The organizations must comply with a user's request and reveal any information acquired on the user under the General Data Protection Regulation, or GDPR. As the relevant files may contain additional Personally Identifiable Information, PII, of other users, a security officer must have access to all of them in order to evaluate them before delivering the information to the user. The data containing the Personally Identifiable Information, PII, about the user is dispersed across the corporate storage, such as NAS and object storage.
The security officer must have credentials for all main storage where user PII files are stored, as well as for object storage in the cloud or on-premise, in order to have access to all DSAR information. When information is stored in many databases, credential management and access become difficult. In particular, information may be stored in many databases of different types, adding further complexity to the process. To supply the user with the user information, the security officer must copy the user's PII files from a target or production storage. This places a burden on the production storage when the files are large. The security officer must perform the Data Subject Access Request, DSAR, again to get all updated file paths and sources since the user information may have been transferred to various places or may no longer exist in the production storage.
In existing solutions, organizations were able to fall back on manually finding a person’s data when asked. The requests were rare, there was no set procedure for completing the task, and there were no penalties for errors. As a result, a request was forwarded to someone who would ask numerous application owners and data storage owners to report on what each system contains. This procedure is time-consuming, inaccurate, and dependent on search, which fails to locate contextual personal information. With the implementation of GDPR, businesses are turning to technology to help automate the DSAR request and fulfillment process, as well as to provide a report that includes a list of files and their sources (e.g. NAS1). The security officer must still get credentials from all sources, verify that the material is still available, and copy it to local storage.
Therefore, there arises a need to address the aforementioned technical problems/drawbacks in data management across various storage units of the organizations.
SUMMARY
It is an object of the disclosure to provide a data management device for a distributed data storage and a computer-implemented method of data management for the distributed data storage while avoiding one or more disadvantages of prior art approaches.
This object is achieved by the features of the independent claims. Further, implementation forms are apparent from the dependent claims, the description, and the figures.
The disclosure provides a distributed data storage including a data management device for a distributed data storage, and a computer-implemented method of data management for the distributed data storage.
According to a first aspect, there is provided a data management device for a distributed data storage. The data management device includes one or more collector modules and a reporting module. The one or more collector modules are configured to receive metadata relating to each file in the distributed data storage. The metadata includes information on any personally identifiable information, PII, elements detected in each file. The reporting module is configured, in response to receiving a request relating to a user, to: (i) search the metadata for each PII element relating to the user and (ii) generate a list including each searched PII element.
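The interaction between the collector modules and the reporting module in the first aspect can be illustrated with a minimal Python sketch. This is not the claimed implementation: the class names, the in-memory dictionary that stands in for the metadata store, the choice to key the metadata by user, and all sample values are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class PiiElement:
    """One PII element detected in a file (field names are hypothetical)."""
    user: str          # user the element relates to
    data_source: str   # for example a NAS share name
    file_path: str
    file_offset: int   # byte offset of the element inside the file
    pii_type: str      # for example "email" or "iban"
    value: str


class CollectorModule:
    """Receives per-file metadata and records it in a shared index."""

    def __init__(self, index: Dict[str, List[PiiElement]]) -> None:
        self._index = index

    def receive(self, elements: List[PiiElement]) -> None:
        for element in elements:
            self._index.setdefault(element.user, []).append(element)


class ReportingModule:
    """Searches the collected metadata when a request about a user arrives."""

    def __init__(self, index: Dict[str, List[PiiElement]]) -> None:
        self._index = index

    def handle_request(self, user: str) -> List[PiiElement]:
        # (i) search the metadata, (ii) return the list of matching elements.
        matches = self._index.get(user, [])
        return sorted(
            matches, key=lambda e: (e.data_source, e.file_path, e.file_offset)
        )


# Hypothetical sample metadata; values are invented for illustration.
index: Dict[str, List[PiiElement]] = {}
collector = CollectorModule(index)
collector.receive([
    PiiElement("john", "NAS7", "/tmp/newusers/list.txt", 120, "email", "john@example.com"),
    PiiElement("john", "NAS1", "/home/users/john/banking.txt", 0, "iban", "XX00-0000"),
    PiiElement("jane", "NAS7", "/tmp/newusers/list.txt", 180, "email", "jane@example.com"),
])
report = ReportingModule(index).handle_request("john")
```

Because the search runs only against collected metadata, answering the request in this sketch does not touch the files themselves.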
The data management device provides immediate access to the files relevant to a Data Subject Access Request, DSAR, so that all of the DSAR-relevant files can be packaged and sent to a data security officer. The data management device reduces the load on primary storage and network traffic by avoiding copying a large amount of data from production servers to a target node. The data management device reduces security risks by restricting credentials of all organization servers to the data security officer. The data management device reduces security risks by restricting access to production storage and reducing human errors. The data management device fulfills service legal agreements and eliminates penalties by providing faster results.
Optionally, for each searched PII element, the generated list includes one or more of a data source, a file path, a file offset, a PII type and a value.
Optionally, the reporting module is further configured to generate a soft link to each file containing at least one of the PII elements included in the generated list. Optionally, the reporting module is configured to generate the soft link by mounting the data source containing each file to be linked.
Optionally, the metadata further includes a PII relation between at least one pair of PII elements.
Optionally, the collector modules are configured to receive the metadata continuously or periodically at a predetermined time interval.
Optionally, the predetermined time interval is determined for each file based on an access frequency of the file.
According to a second aspect, there is provided a distributed data storage including the data management device of the first aspect.
The distributed data storage provides immediate access to the files relevant to a Data Subject Access Request, DSAR, so that all of the DSAR-relevant files can be packaged and sent to a data security officer. The distributed data storage reduces the load on primary storage and network traffic by avoiding copying a large amount of data from production servers to a target node. The distributed data storage reduces security risks by restricting credentials of all organization servers to the data security officer. The distributed data storage reduces security risks by restricting access to production storage and reducing human errors. The distributed data storage fulfills service legal agreements and eliminates penalties by providing faster results.
According to a third aspect, there is provided a computer-implemented method of data management for a distributed data storage. The method includes receiving metadata relating to each file in the distributed data storage. The metadata includes information on any personally identifiable information, PII, elements detected in each file. The method includes, in response to receiving a request relating to a user, searching the metadata for each PII element relating to the user and generating a list including each searched PII element.
The method provides immediate access to the files relevant to a Data Subject Access Request, DSAR, so that all of the DSAR-relevant files can be packaged and sent to a data security officer. The method reduces the load on primary storage and network traffic by avoiding copying a large amount of data from production servers to a target node. The method reduces security risks by restricting credentials of all organization servers to the data security officer. The method reduces security risks by restricting access to production storage and reducing human errors. The method fulfills service legal agreements and eliminates penalties by providing faster results.
Optionally, for each searched PII element, the generated list includes one or more of a data source, a file path, a file offset, a PII type, and a value.
Optionally, the reporting module is further configured to generate a soft link to each file containing at least one of the PII elements included in the generated list.
Optionally, the reporting module is configured to generate the soft link by mounting the data source containing each file to be linked.
Optionally, the metadata further includes a PII relation between at least one pair of PII elements.
Optionally, the collector modules are configured to receive the metadata continuously or periodically at a predetermined time interval.
Optionally, the predetermined time interval is determined for each file based on an access frequency of the file.
According to a fourth aspect, there is provided a computer-readable medium that includes instructions which, when executed by a processor, cause the processor to perform the above method.
Therefore, in contradistinction to the existing solutions, the data management device provides immediate access to the files relevant to a Data Subject Access Request, DSAR, so that all of the DSAR-relevant files can be packaged and sent to a data security officer. The data management device reduces the load on primary storage and network traffic by avoiding copying a large amount of data from production servers to a target node. The data management device reduces security risks by restricting credentials of all organization servers to the data security officer. The data management device reduces security risks by restricting access to production storage and reducing human errors. The data management device fulfills service legal agreements and eliminates penalties by providing faster results. These and other aspects of the disclosure will be apparent from the implementation(s) described below.
BRIEF DESCRIPTION OF DRAWINGS
Implementations of the disclosure will now be described, by way of example only, with reference to the accompanying drawings, in which:
FIG. 1 illustrates an exemplary view of a distributed data storage of an organization for data management in accordance with the prior art;
FIG. 2 is a block diagram of a data management device for a distributed data storage in accordance with an implementation of the disclosure;
FIG. 3 is a block diagram of a distributed data storage in accordance with an implementation of the disclosure;
FIG. 4 is a flow diagram that illustrates a computer-implemented method of data management for a distributed data storage in accordance with an implementation of the disclosure; and
FIG. 5 is an illustration of an exemplary data management device, a distributed data storage, or a computer system in which the various architectures and functionalities of the various previous implementations may be implemented.
DETAILED DESCRIPTION OF THE DRAWINGS
Implementations of the disclosure provide a distributed data storage including a data management device for a distributed data storage and a computer-implemented method of data management for the distributed data storage.
To make solutions of the disclosure more comprehensible for a person skilled in the art, the following implementations of the disclosure are described with reference to the accompanying drawings.
Terms such as "a first", "a second", "a third", and "a fourth" (if any) in the summary, claims, and foregoing accompanying drawings of the disclosure are used to distinguish between similar objects and are not necessarily used to describe a specific sequence or order. It should be understood that the terms so used are interchangeable under appropriate circumstances, so that the implementations of the disclosure described herein are, for example, capable of being implemented in sequences other than the sequences illustrated or described herein. Furthermore, the terms "include" and "have" and any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, a method, a system, a product, or a device that includes a series of steps or units, is not necessarily limited to expressly listed steps or units but may include other steps or units that are not expressly listed or that are inherent to such process, method, product, or device.
FIG. 2 is a block diagram of a data management device 200 for a distributed data storage in accordance with an implementation of the disclosure. The data management device 200 includes one or more collector modules 202A-N and a reporting module 204. The one or more collector modules 202A-N are configured to receive metadata relating to each file in the distributed data storage. The metadata includes information on any personally identifiable information, PII, elements detected in each file. The reporting module 204 is configured, in response to receiving a request relating to a user, to: (i) search the metadata for each PII element relating to the user and (ii) generate a list including each searched PII element.
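The division of work between the collector modules and the reporting module can be illustrated with a minimal Python sketch. It is not the disclosed implementation: the class and field names (PiiElement, MetadataStore, the user key) and the second sample record are assumptions introduced for illustration, and an in-memory list stands in for the backend.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PiiElement:
    """One PII element detected in a file (field names are illustrative)."""
    user: str      # hypothetical key linking the element to a data subject
    source: str    # data source, e.g. a NAS share or a VM identifier
    path: str      # file path within the source
    offset: int    # offset of the element inside the file
    pii_type: str  # e.g. "SSN"
    value: str     # detected value


@dataclass
class MetadataStore:
    """In-memory stand-in for the backend of the data management device."""
    elements: list = field(default_factory=list)

    def receive(self, element: PiiElement) -> None:
        # Collector side: accept metadata about a detected PII element.
        self.elements.append(element)

    def report(self, user: str) -> list:
        # Reporting side: search the metadata and list the matching elements.
        return [e for e in self.elements if e.user == user]


store = MetadataStore()
store.receive(PiiElement("john", "VM231-ID", "/home/john/tmp/bank.txt",
                         37, "credit card type", "4580... 5782"))
# Made-up record for a different user, to show that the search filters.
store.receive(PiiElement("jane", "VM562-ID", "/data/other.txt", 0, "SSN", "00000"))
john_report = store.report("john")
```

Each entry of john_report carries the data source, file path, file offset, PII type, and value, which are the fields the generated list may optionally include.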
The data management device 200 provides immediate access to the files relevant to a Data Subject Access Request, DSAR, so that all of the DSAR-relevant files can be packaged and sent to a data security officer. The data management device 200 reduces the load on primary storage and network traffic by avoiding copying a large amount of data from production servers to a target node. The data management device 200 reduces security risks by restricting credentials of all organization servers to the data security officer. The data management device 200 reduces security risks by restricting access to production storage and reducing human errors. The data management device 200 fulfills service legal agreements and eliminates penalties by providing faster results.
Optionally, the data management device 200 uses backup storage for the distributed data storage. The backups are centralized and include the latest personally identifiable information. The data management device 200 provides a service of unstructured data management. Optionally, the data management device 200 is a centralized system that provides a single pane of glass over the entire enterprise storage. The data management device 200 may run different types of queries, perform analytics, and supply results on the customer's enterprise storage, for example, finding hot and cold data according to a customer policy, or fulfilling a DSAR.
The data management device 200 may scan data from at least one primary storage or backup storage. When the data management device 200 is part of a backup system, the combination provides a complete solution for a data security officer. The data management device 200 provides a list of sources and files with all of a user's personally identifiable information, PII. For example, the PII of user “John” may be found on NAS1 at file path /home/users/john/banking.txt (backupID: “123”) and on NAS7 at file path /tmp/newusers/list.txt (backupID: “456”). The data management device 200 may then expose, from the backup storage, the file systems that contain the DSAR file results.
Optionally, for each searched PII element, the generated list includes one or more of a data source, a file path, a file offset, a PII type, and a value.
Optionally, the reporting module 204 is further configured to generate a soft link to each file containing at least one of the PII elements included in the generated list. Optionally, the reporting module 204 is configured to generate the soft link by mounting the data source containing each file to be linked.
Optionally, the metadata further includes a PII relation between at least one pair of PII elements.
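One way to picture a PII relation between pairs of PII elements is as an undirected graph whose edges are the recorded pairs. The sketch below is an assumption made for illustration only: the disclosure does not specify how relations are stored or whether they are followed transitively, and the helper names relate and related are hypothetical.

```python
# Relations are kept as an adjacency map: element -> set of related elements.
# Each element is represented here as a (PII type, value) tuple.
relations = {}


def relate(a, b):
    """Record an undirected relation between two PII elements."""
    relations.setdefault(a, set()).add(b)
    relations.setdefault(b, set()).add(a)


def related(start):
    """Return every element reachable from start through recorded relations."""
    seen = {start}
    stack = [start]
    while stack:
        current = stack.pop()
        for neighbour in relations.get(current, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    seen.discard(start)
    return seen


name = ("name", "John")
ssn = ("SSN", "12345")
card = ("credit card type", "4580... 5782")
relate(name, ssn)
relate(ssn, card)
linked_to_name = related(name)
```

With such a structure, a request that identifies a user by name could also surface elements, such as a card number, that never appear next to the name in any single file.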
The data management device 200 may support a user based on a PII collection and data modeling.
Optionally, the one or more collector modules 202A-N are configured to receive the metadata continuously or periodically at a predetermined time interval. The predetermined time interval may be determined for each file based on an access frequency of the file.
The one or more collector modules 202A-N collect information on unstructured data from all types of sources in enterprise storage. The enterprise storage may include NAS, S3, and virtual environment (VM) storage. The collection of information may be performed periodically or as a “live update”, using the one or more collector modules 202A-N, which may run in-host or outside of the host. The one or more collector modules 202A-N collect native metadata and additional synthetic data. For example, a collector module may supply the I/O temperature of a file by calculating the number of reads of the file in the last 24 hours, or any other synthetic information.
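The I/O temperature and the per-file collection interval can be sketched as follows. The read count over the last 24 hours follows the description; the mapping from read count to interval (the hot_threshold parameter and the one-hour and 24-hour values) is a hypothetical policy chosen only to make the example concrete.

```python
from datetime import datetime, timedelta


def reads_last_24h(read_times, now):
    """Count the reads of a file that fall within the 24 hours before now."""
    cutoff = now - timedelta(hours=24)
    return sum(1 for t in read_times if cutoff < t <= now)


def scan_interval(read_count, hot_threshold=100):
    """Hypothetical policy: collect metadata for hot files more often."""
    if read_count >= hot_threshold:
        return timedelta(hours=1)
    return timedelta(hours=24)


now = datetime(2022, 3, 8, 12, 0)
read_times = [
    now - timedelta(hours=1),
    now - timedelta(hours=23),
    now - timedelta(hours=25),  # outside the 24-hour window
]
temperature = reads_last_24h(read_times, now)
interval = scan_interval(temperature, hot_threshold=2)
```

Here two of the three reads fall inside the window, so with a threshold of two the file is treated as hot and collected hourly.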
The one or more collector modules 202A-N update the data management device 200 with file system metadata, detected personally identifiable information, PII, and the relations between the PII elements. The information may be stored in the backend of the data management device 200, for example, ElasticSearch.
Optionally, the one or more collector modules 202A-N are detectors that read file data and send sensitive data to the data management device 200, which is responsible for modeling private information (PI) and personally identifiable information (PII) found in files and objects. The data management device 200 stores the private information, PI, and the personally identifiable information, PII, in a backend of the data management device 200.
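A detector of this kind can be sketched with regular expressions. The disclosure does not state how PII is detected, so pattern matching is an assumption here, as are the PATTERNS table, the SSN pattern, and the detect function; the output keys mirror the id, offset, and value fields used elsewhere in the description.

```python
import re

# Hypothetical pattern table: PII type -> compiled regular expression.
PATTERNS = {
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def detect(text):
    """Return one record per match, with the PII type, offset, and value."""
    hits = []
    for pii_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append({
                "id": pii_type,
                "offset": match.start(),  # character offset within the text
                "value": match.group(),
            })
    return hits


hits = detect("ssn: 123-45-6789")
```

The offset in this sketch is a character offset into the decoded text; a detector working on raw bytes would report a byte offset instead.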
FIG. 3 is a block diagram of a distributed data storage 300 in accordance with an implementation of the disclosure. The distributed data storage 300 includes a data management device 302. The data management device 302 includes one or more collector modules 304A-N and a reporting module 306. The one or more collector modules 304A-N are configured to receive metadata relating to each file in the distributed data storage 300. The metadata includes information on any personally identifiable information, PII, elements detected in each file. The reporting module 306 is configured, in response to receiving a request relating to a user, to: (i) search the metadata for each PII element relating to the user and (ii) generate a list including each searched PII element.
FIG. 4 is a flow diagram that illustrates a computer-implemented method of data management for a distributed data storage in accordance with an implementation of the disclosure. At a step 402, metadata relating to each file in the distributed data storage is received. At a step 404, in response to receiving a request relating to a user, the metadata is searched for each PII element relating to the user and a list including each searched PII element is generated.
Optionally, for each searched PII element, the generated list includes one or more of a data source, a file path, a file offset, a PII type and a value.
Optionally, the reporting module is further configured to generate a soft link to each file containing at least one of the PII elements included in the generated list. Optionally, the reporting module is configured to generate the soft link by mounting the data source containing each file to be linked.
Optionally, the metadata further includes a PII relation between at least one pair of PII elements.
Optionally, the collector modules are configured to receive the metadata continuously or periodically at a predetermined time interval.
Optionally, the predetermined time interval is determined for each file based on an access frequency of the file.
The method may provide a data security officer with soft links to the relevant files, and only the links, by: (i) scanning the enterprise storage from the backup storage; (ii) sending a Data Subject Access Request, DSAR, for a user to the data management device; (iii) searching the backend for all the personally identifiable information, PII, related to the user and responding with results that contain the list of sources, files, PII types, values, and locations in the file (offsets); (iv) identifying the backups whose file systems contain the results, for example (a) /home/john/tmp/bank.txt (backup of VM231-ID, backupID: “111”), (b) /home/john/mydocs/tmp.txt (backup of VM231-ID, backupID: “111”), and (c) /data/ToDo.txt (backup of VM562-ID, backupID: “777”); (v) exposing the file systems associated with backupIDs “111” and “777” from the backup repository and mounting them on a target node; (vi) creating a directory named with the DSAR requestID; (vii) creating directories named with the backup IDs under the DSAR requestID directory (for example, dsar-123/backupID-111 and dsar-123/backupID-777); and (viii) creating soft links to all files that were found for the DSAR and are accessible on the target node.
The data security officer may obtain access only to the “dsar-123” directory and view all the files of user “john” without any copying from production storage or special credentials for each server. Exposing the files from the backup storage and creating the soft links is only one embodiment of the solution for exposing the DSAR files. In the above example, the DSAR files may be exposed “flat”, with all files per backupID in the same storage location. Optionally, files with the same name from the same backupID need to be placed in different storage locations before the soft links are created. For example:
{
  "sources": [{
    "source": {
      "id": <UUID4 value>
    },
    "objects": [{
      "fullpath": <string value>,
      "piis": [{
        "id": <UUID4 value>,
        "offset": <unsigned integer value>,
        "value": <string value>
      }]
    }]
  }]
}

{
  "sources": [{
    "source": {
      "id": "VM231-ID"
    },
    "objects": [{
      "fullpath": "/home/john/tmp/bank.txt",
      "piis": [{
        "id": "credit card type",
        "offset": 37,
        "value": "4580... 5782"
      }, {
        "id": "credit card type",
        "offset": 231,
        "value": "4580... 6887"
      }]
    }, {
      "fullpath": "/home/john/mydocs/tmp.txt",
      "piis": [{
        "id": "SSN", "offset": 15, "value": "12345"
      }]
    }]
  }, {
    "source": {
      "id": "VM562-ID"
    },
    "objects": [{
      "fullpath": "/data/ToDo.txt",
      "piis": [{
        "id": "IsraelHealthID", "offset": 0, "value": "0998..221"
      }]
    }]
  }]
}
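A result of the shape shown in the example above can be turned into the flat list of sources, files, PII types, offsets, and values with a short traversal. The following Python sketch assumes the result is delivered as JSON with the sources, objects, and piis keys of the example; the function name flatten is hypothetical.

```python
import json

# The example result from the description, written out as valid JSON.
result_json = """
{"sources": [
  {"source": {"id": "VM231-ID"},
   "objects": [
     {"fullpath": "/home/john/tmp/bank.txt",
      "piis": [{"id": "credit card type", "offset": 37, "value": "4580... 5782"},
               {"id": "credit card type", "offset": 231, "value": "4580... 6887"}]},
     {"fullpath": "/home/john/mydocs/tmp.txt",
      "piis": [{"id": "SSN", "offset": 15, "value": "12345"}]}]},
  {"source": {"id": "VM562-ID"},
   "objects": [
     {"fullpath": "/data/ToDo.txt",
      "piis": [{"id": "IsraelHealthID", "offset": 0, "value": "0998..221"}]}]}
]}
"""


def flatten(result):
    """Yield one (source, path, PII type, offset, value) row per PII element."""
    rows = []
    for entry in result["sources"]:
        source_id = entry["source"]["id"]
        for obj in entry["objects"]:
            for pii in obj["piis"]:
                rows.append((source_id, obj["fullpath"],
                             pii["id"], pii["offset"], pii["value"]))
    return rows


rows = flatten(json.loads(result_json))
```

The example yields four rows: two credit card entries in bank.txt, one SSN entry in tmp.txt, and one health ID entry in ToDo.txt.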
Optionally, soft links are created for at least one of bank.txt, tmp.txt, or ToDo.txt. For example:
asaf@developer:/tmp/dsar-123$ tree
backupid-111
    bank.txt -> /home/asaf/tmp/backupid-111/home/john/tmp/bank.txt
    tmp.txt -> /home/asaf/tmp/backupid-111/home/john/mydocs/tmp.txt
backupid-777
    ToDo.txt -> /home/asaf/tmp/backupid-777/data/ToDo.txt
2 directories, 3 files
asaf@developer:/tmp/dsar-123$ cat backupid-111/bank.txt
I'm bank.txt
asaf@developer:/tmp/dsar-123$
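The directory layout in the example can be produced with a few standard-library calls. The sketch below assumes the backup file systems are already mounted on the target node and that the search results have been grouped by backup ID; the function name build_dsar_view and the temporary directory standing in for the mount point are illustrative only.

```python
import os
import tempfile


def build_dsar_view(root, request_id, found):
    """Create dsar-<request_id>/backupid-<id>/ directories holding soft links.

    found maps a backup ID to the paths of the matching files on the
    mounted backup file system.
    """
    request_dir = os.path.join(root, f"dsar-{request_id}")
    links = []
    for backup_id, paths in found.items():
        backup_dir = os.path.join(request_dir, f"backupid-{backup_id}")
        os.makedirs(backup_dir, exist_ok=True)
        for target in paths:
            link = os.path.join(backup_dir, os.path.basename(target))
            os.symlink(target, link)  # the soft link points at the mounted file
            links.append(link)
    return links


# Stand-in for a mounted backup: a temporary directory with one file.
workdir = tempfile.mkdtemp()
mounted = os.path.join(workdir, "mnt", "backupid-111", "home", "john", "tmp")
os.makedirs(mounted)
target_file = os.path.join(mounted, "bank.txt")
with open(target_file, "w") as handle:
    handle.write("I'm bank.txt")

links = build_dsar_view(workdir, "123", {"111": [target_file]})
```

Because the links are named after the base name of each file, two files with the same name under one backup ID would collide and os.symlink would raise FileExistsError; this sketch does not handle that case, which is the situation the description addresses by placing such files in different locations.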
In an implementation, a computer-readable medium is configured to store instructions which, when executed by a processor, cause the processor to execute the above method.
FIG. 5 is an illustration of an exemplary data management device, a distributed data storage, or a computer system 500 in which the various architectures and functionalities of the various previous implementations may be implemented. As shown, the computer system 500 includes at least one processor 504 that is connected to a bus 502, wherein the bus 502 may be implemented using any suitable protocol, such as PCI (Peripheral Component Interconnect), PCI-Express, AGP (Accelerated Graphics Port), Hyper Transport, or any other bus or point-to-point communication protocol(s). The computer system 500 also includes a memory 506.
Control logic (software) and data are stored in the memory 506, which may take the form of random-access memory (RAM). In the disclosure, a single semiconductor platform may refer to a sole unitary semiconductor-based integrated circuit or chip. It should be noted that the term single semiconductor platform may also refer to multi-chip modules with increased connectivity which simulate on-chip operation and make substantial improvements over utilizing a conventional central processing unit (CPU) and bus implementation. Of course, the various modules may also be situated separately or in various combinations of semiconductor platforms per the desires of the user.
The computer system 500 may also include a secondary storage 510. The secondary storage 510 includes, for example, a hard disk drive and a removable storage drive, representing a floppy disk drive, a magnetic tape drive, a compact disk drive, a digital versatile disk (DVD) drive, a recording device, or a universal serial bus (USB) flash memory. The removable storage drive reads from and/or writes to a removable storage unit in a well-known manner.
Computer programs, or computer control logic algorithms, may be stored in at least one of the memory 506 and the secondary storage 510. Such computer programs, when executed, enable the computer system 500 to perform various functions as described in the foregoing. The memory 506, the secondary storage 510, and any other storage are possible examples of computer-readable media.
In an implementation, the architectures and functionalities depicted in the various previous figures may be implemented in the context of the processor 504, a graphics processor coupled to a communication interface 512, an integrated circuit (not shown) that is capable of at least a portion of the capabilities of both the processor 504 and a graphics processor, a chipset (namely, a group of integrated circuits designed to work and be sold as a unit for performing related functions), and so forth.
Furthermore, the architectures and functionalities depicted in the various previously described figures may be implemented in the context of a general computer system, a circuit board system, a game console system dedicated for entertainment purposes, or an application-specific system. For example, the computer system 500 may take the form of a desktop computer, a laptop computer, a server, a workstation, a game console, or an embedded system.
Furthermore, the computer system 500 may take the form of various other devices including, but not limited to a personal digital assistant (PDA) device, a mobile phone device, a smart phone, a television, and so forth. Additionally, although not shown, the computer system 500 may be coupled to a network (for example, a telecommunications network, a local area network (LAN), a wireless network, a wide area network (WAN) such as the Internet, a peer-to-peer network, a cable network, or the like) for communication purposes through an I/O interface 508.
It should be understood that the arrangement of components illustrated in the described figures is exemplary and that other arrangements may be possible. It should also be understood that the various system components (and means) defined by the claims, described below, and illustrated in the various block diagrams represent components in some systems configured according to the subject matter disclosed herein. For example, one or more of these system components (and means) may be realized, in whole or in part, by at least some of the components illustrated in the arrangements illustrated in the described figures.
In addition, while at least one of these components is implemented at least partially as an electronic hardware component, and therefore constitutes a machine, the other components may be implemented in software that when included in an execution environment constitutes a machine, hardware, or a combination of software and hardware.
Although the disclosure and its advantages have been described in detail, it should be understood that various changes, substitutions, and alterations can be made herein without departing from the spirit and scope of the disclosure as defined by the appended claims.

Claims

1. A data management device (200, 302) for a distributed data storage (300), comprising: one or more collector modules (202A-N, 304A-N) configured to receive metadata relating to each file in the distributed data storage (300), wherein the metadata includes information on any personally identifiable information, PII, elements detected in each file; and a reporting module (204, 306) configured, in response to receiving a request relating to a user, to: search the metadata for each PII element relating to the user; and generate a list including each searched PII element.
2. The data management device (200, 302) of claim 1, wherein for each searched PII element, the generated list includes one or more of a data source, a file path, a file offset, a PII type and a value.
3. The data management device (200, 302) of claim 1 or claim 2, wherein the reporting module (204, 306) is further configured to generate a soft link to each file containing at least one of the PII elements included in the generated list.
4. The data management device (200, 302) of claim 3, wherein the reporting module (204, 306) is configured to generate the soft link by mounting the data source containing each file to be linked.
5. The data management device (200, 302) of any preceding claim, wherein the metadata further includes a PII relation between at least one pair of PII elements.
6. The data management device (200, 302) of any preceding claim, wherein the collector modules (202A-N, 304A-N) are configured to receive the metadata continuously or periodically at a predetermined time interval.
7. The data management device (200, 302) of claim 6, wherein the predetermined time interval is determined for each file based on an access frequency of the file.
8. A distributed data storage (300) comprising the data management device (200, 302) of any preceding claim.
9. A computer-implemented method of data management for a distributed data storage (300), comprising: receiving metadata relating to each file in the distributed data storage (300), wherein the metadata includes information on any personally identifiable information, PII, elements detected in each file; and in response to receiving a request relating to a user: searching the metadata for each PII element relating to the user; and generating a list including each searched PII element.
10. The computer-implemented method of claim 9, wherein for each searched PII element, the generated list includes one or more of a data source, a file path, a file offset, a PII type and a value.
11. The computer-implemented method of claim 9 or claim 10, wherein the reporting module (204, 306) is further configured to generate a soft link to each file containing at least one of the PII elements included in the generated list.
12. The computer-implemented method of claim 11, wherein the reporting module (204, 306) is configured to generate the soft link by mounting the data source containing each file to be linked.
13. The computer-implemented method of any one of claims 9 to 12, wherein the metadata further includes a PII relation between at least one pair of PII elements.
14. The computer-implemented method of any one of claims 9 to 13, wherein the collector modules (202A-N, 304A-N) are configured to receive the metadata continuously or periodically at a predetermined time interval.
15. The computer-implemented method of claim 14, wherein the predetermined time interval is determined for each file based on an access frequency of the file.
16. A computer-readable medium comprising instructions which, when executed by a processor, cause the processor to perform the method of any one of claims 9 to 15.
PCT/EP2022/055872 2022-03-08 2022-03-08 Data management device and method for data management for a distributed data storage WO2023169656A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/EP2022/055872 WO2023169656A1 (en) 2022-03-08 2022-03-08 Data management device and method for data management for a distributed data storage

Publications (1)

Publication Number Publication Date
WO2023169656A1 true WO2023169656A1 (en) 2023-09-14

Family

ID=80978793

Country Status (1)

Country Link
WO (1) WO2023169656A1 (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019028405A1 (en) * 2017-08-04 2019-02-07 OneTrust, LLC Data processing systems for the identification and deletion of personal data in computer systems
US20190286839A1 (en) * 2018-03-13 2019-09-19 Commvault Systems, Inc. Graphical representation of an information management system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22713603

Country of ref document: EP

Kind code of ref document: A1