WO2023193908A1 - Data processing device and method of data processing - Google Patents

Data processing device and method of data processing

Publication number
WO2023193908A1
Authority
WO
WIPO (PCT)
Prior art keywords
pii
subject
identified
document
data processing
Prior art date
Application number
PCT/EP2022/059214
Other languages
French (fr)
Inventor
Assaf Natanzon
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. filed Critical Huawei Technologies Co., Ltd.
Priority to PCT/EP2022/059214 priority Critical patent/WO2023193908A1/en
Publication of WO2023193908A1 publication Critical patent/WO2023193908A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6254Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/93Document management systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6227Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database where protection concerns the structure of data, e.g. records, types, queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/166Editing, e.g. inserting or deleting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2209/00Additional information or applications relating to cryptographic mechanisms or cryptographic arrangements for secret or secure communication H04L9/00
    • H04L2209/42Anonymization, e.g. involving pseudonyms

Definitions

  • the present disclosure relates generally to the field of compliance and data management systems, and more specifically, to a data processing device and a computer-implemented method of data processing.
  • Metadata is stored by an organization to save the data related to multiple subjects.
  • the metadata allows retrieval of information about subjects, which can be used to answer regulatory queries. For example, a request from an individual subject may provide access to all personal information of the individual subject that is stored by the organization and may also provide access for forcing erasure of all personal information about the individual subject by the organization.
  • Data constantly flows into different storage systems, so information needs to be constantly indexed. This means that establishing a correlation between pieces of information that relate to the same subject but reside in different storage systems is a challenging task.
  • the present disclosure provides a data processing device and a computer-implemented method of data processing.
  • the present disclosure provides a solution to the existing problem of how to improve efficiency and accuracy in identifying the personally identifiable information related to a specific person in one or more documents while obfuscating data that is not related to the specific person.
  • An aim of the present disclosure is to provide a solution that overcomes at least partially the problems encountered in the prior art and provide an improved data processing device and an improved computer-implemented method of data processing such as by providing data subject access request (DSAR), personally identifiable information (PII) aware data obfuscation.
  • the present disclosure provides a data processing device that includes an input unit configured to receive a request for a document.
  • the request specifies at least one personally identifiable information (PII) element associated with a first subject.
  • the data processing device further includes an identification unit configured to identify one or more PII elements in the requested document.
  • the data processing device further includes a lookup unit configured to search each identified PII element in a database to link identified PII elements associated with the same subject and a redaction unit configured to edit the document to obfuscate one or more identified PII elements not linked with the specified PII element.
  • the data processing device further includes an output unit configured to output the edited document.
  • the data processing device efficiently and accurately identifies the relevant information related to the first subject due to the identification of one or more PII elements related to the first subject. Moreover, the data processing device identifies the one or more PII elements related to the first subject not only from a single document but from the multiple documents. Additionally, the data processing device obfuscates the PII elements that are not linked with the first subject due to which the data processing device ensures data privacy and further enables the data processing device to comply with the data privacy regulations and compliance.
  • the specified PII element is associated with a first subject.
  • the redaction unit is further configured to determine, for each identified PII element not linked with the specified PII element, whether the first subject is authorized to view the identified PII element, and to obfuscate any identified PII element linked with the specified PII element which the subject associated with the specified PII element is not authorized to view.
  • the request for a document includes a request for a data subject access report (DSAR) for a subject associated with the specified PII element.
  • the data processing device further includes a search unit configured to search for one or more documents based on the subject and provide each document to the identification unit.
  • the search unit enables the data processing device to search one or more documents to include all the PII elements related to the first subject.
  • the lookup unit is configured to search a relation graph stored in the database. Each node of the graph represents an identified PII element, and each edge of the graph represents a link between pairs of identified PII elements.
  • the lookup unit enables the data processing device to discover the link between one or more PII elements.
  • the relations graph database allows the data processing device to understand which PII element belongs to the first subject.
  • each node of the relation graph further includes an accuracy score of each identified PII element based on an accuracy of the identification and a uniqueness score of each identified PII element based on a uniqueness of a type of the identified PII element.
  • Each edge further includes a relation accuracy score of each link based on an accuracy of the link.
  • the relation graph enables the data processing device to identify the PII element based on the accuracy score for each node in order to efficiently and accurately link the one or more PII elements related to the first subject with more accuracy and reliability.
  • the lookup unit is configured to traverse the graph starting from the specified PII element and generate a list including each traversed PII element, wherein the traversal is limited by a weighting factor based on the assigned scores.
  • the present disclosure provides a computer-implemented method of data processing that includes receiving, by an input unit, a request for a document, the request specifying at least one Personally Identifiable Information (PII) element associated with a first subject.
  • the computer-implemented method further includes identifying, by an identification unit, one or more PII elements in the requested document and searching, by a lookup unit, each identified PII element in a database to link identified PII elements associated with the same subject.
  • the method further includes editing, by a redaction unit, the document to obfuscate one or more PII elements not linked with the specified PII element, and outputting, by an output unit, the edited document.
  • the computer-implemented method achieves all the advantages and technical effects of the data processing device.
  • the present disclosure provides a computer-readable medium comprising instructions which, when executed by a processor, cause the processor to perform the method.
  • FIG. 1 is a block diagram that illustrates various exemplary components of a data processing device, in accordance with an embodiment of the present disclosure.
  • FIG. 2 is a flowchart of a computer-implemented method of data processing, in accordance with an embodiment of the present disclosure.
  • an underlined number is employed to represent an item over which the underlined number is positioned or an item to which the underlined number is adjacent.
  • a non-underlined number relates to an item identified by a line linking the non-underlined number to the item.
  • the non-underlined number is used to identify a general item at which the arrow is pointing.
  • FIG. 1 is a block diagram that illustrates various exemplary components of a data processing device, in accordance with an embodiment of the present disclosure.
  • With reference to FIG. 1, there is shown a block diagram 100 of a data processing device 102 that includes an input unit 104, an identification unit 106, a lookup unit 108, a redaction unit 110, an output unit 112, a search unit 114, a memory 116, and a processor 118.
  • the data processing device 102 may include suitable logic, circuitry, interfaces, or code that is configured to identify the personally identifiable information (PII) elements related to the first subject.
  • the PII elements are present in one or more documents received from the first subject, and data is obfuscated so that only the information that belongs to the first subject can be viewed.
  • the data processing device 102 is further configured to obfuscate one or more identified PII elements that are not linked with the specified PII element.
  • the first subject may be a potential customer of an organization.
  • the first subject may be either a user of a product, or a visitor of a website, or a customer of a company, or an employee of an organization without limiting the scope of the disclosure.
  • the input unit 104 may include suitable logic, circuitry, interfaces, or code that is configured to receive a request for a document.
  • Examples of the input unit 104 may include, but are not limited to, a data terminal, a receiver, a receiving unit, a transceiver, a facsimile machine, a virtual server, and the like.
  • the identification unit 106 may include suitable logic, circuitry, interfaces, or code that is configured to identify one or more personally identifiable information (PII) elements in the requested document.
  • the lookup unit 108 may include suitable logic, circuitry, interfaces, or code that is configured to search each identified PII element in a database to link identified PII elements associated with the same subject.
  • the redaction unit 110 may include suitable logic, circuitry, interfaces, or code that is configured to edit the document to obfuscate one or more identified PII elements that are not linked with the specified PII element.
  • the output unit 112 may include suitable logic, circuitry, interfaces, or code that is configured to output the edited document.
  • the search unit 114 may include suitable logic, circuitry, interfaces, or code that is configured to search for one or more documents based on the subject.
  • the search unit 114 is configured to provide each document to the identification unit 106.
  • the memory 116 may include suitable logic, circuitry, interfaces, or code that is configured to store data and instructions executable by the processor 118. Examples of implementation of the memory 116 may include, but are not limited to, an Electrically Erasable Programmable Read-Only Memory (EEPROM), Random Access Memory (RAM), Read-Only Memory (ROM), Hard Disk Drive (HDD), Flash memory, Solid-State Drive (SSD), or CPU cache memory.
  • the processor 118 may include suitable logic, circuitry, interfaces, or code that is configured to execute the instructions stored in the memory 116.
  • the processor 118 may be a general-purpose processor.
  • Other examples of the processor 118 may include, but are not limited to, a control unit, a central processing unit (CPU), a digital signal processor (DSP), a microprocessor, a microcontroller, a complex instruction set computing (CISC) processor, an application-specific integrated circuit (ASIC) processor, a reduced instruction set computing (RISC) processor, a very long instruction word (VLIW) processor, a state machine, a data processing unit, a graphics processing unit (GPU), and other processors or control circuitry.
  • the processor 118 may refer to one or more individual processors, processing devices, or a processing unit that is part of a machine, such as the data processing device 102.
  • the data processing device 102 includes the input unit 104 configured to receive a request for a document.
  • the request specifies at least one personally identifiable information (PII) element associated with the first subject.
  • the input unit 104 is configured to receive a request for the document related to the first subject.
  • the received request includes at least one personally identifiable information about the first subject.
  • the one or more PII elements are provided by the first subject to an organization.
  • the request for the document includes a request for a data subject access report (DSAR) for a subject associated with the specified PII element.
  • the data processing device 102 further includes the search unit 114 configured to search for one or more documents based on the first subject.
  • the search unit 114 is configured to provide each document to the identification unit 106.
  • the request for the document that includes the request for the data subject access report (DSAR) for the first subject related to the specified PII element or more than one PII element is sufficient to identify the first subject.
  • the request corresponds to identifying all the relevant information about the first subject, no matter which regulation this request is used to fulfil, for example, a data subject access request (DSAR).
  • the search may include searching a relation graph that includes the PIIs of every document added to the database and related to the first subject. The relation graph is built offline as each document is added to the database.
  • the search unit 114 is configured to search the one or more documents about the first subject. Furthermore, the searched one or more documents are provided to the identification unit 106 by the search unit 114 to identify one or more PII elements associated with the first subject.
  • the data processing device 102 includes the input unit 104 that receives the request for the document, and the search unit 114 searches the one or more documents related to the first subject and provides the documents to the identification unit 106. This enables the data processing device 102 to identify all the relevant PII elements related to the first subject in multiple documents.
  • the data processing device 102 further includes the identification unit 106 configured to identify one or more PII elements in the requested document.
  • the PII elements may include the name of the first subject, the social security number (SSN), the address of the first subject, the phone number of the first subject, credit card number of the first subject, and the like.
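The identification step can be illustrated with a minimal sketch. The source does not specify how the identification unit 106 detects PII, so the regular expressions, the `PIIElement` type, and the `identify_pii` function below are hypothetical and cover only a few US-style formats.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns for illustration only; the source does not define
# a detection technique, and real detectors need far more robust rules.
PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[ -]){3}\d{4}\b"),
}


@dataclass(frozen=True)
class PIIElement:
    kind: str   # type of PII, e.g. "ssn"
    value: str  # matched text
    start: int  # offset of the first character in the document
    end: int    # offset one past the last character


def identify_pii(text: str) -> list[PIIElement]:
    """Return every pattern match in the text, ordered by position."""
    found = []
    for kind, pattern in PII_PATTERNS.items():
        for match in pattern.finditer(text):
            found.append(PIIElement(kind, match.group(), match.start(), match.end()))
    return sorted(found, key=lambda element: element.start)
```

Keeping the character offsets with each element lets a later redaction step overwrite exactly the matched span.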
  • the identification of one or more PII elements in the requested document enables the data processing device 102 to identify all the personal information related to the first subject.
  • the data processing device 102 further includes the lookup unit 108 configured to search each identified PII element in a database to link identified PII elements associated with the same subject.
  • the lookup unit 108 is configured to search each identified PII element in the database to link identified PII elements associated with the first subject, such as the name, SSN, phone number, and credit card number of the first subject.
  • the lookup unit 108 may be configured to group together the subject information (i.e., the PII elements related to the first subject) such as the name, SSN, phone number, and credit card found in the database to get all the PII elements associated with the first subject.
  • the lookup unit 108 is configured to search a relation graph stored in the database.
  • Each node of the graph represents an identified PII element, and each edge of the graph represents a link between pairs of identified PII elements.
  • the lookup unit 108 is configured to search the relation graph (or a weighted graph) by searching each identified PII element through the nodes of the graph and searching each identified link between the pairs of the PII elements through the edges of the graph.
  • the lookup unit 108 may be configured to search the various PII elements, such as name of the first subject, SSN of the first subject, phone number of the first subject, credit card number of the first subject as the node of the graph.
  • the lookup unit 108 may be further configured to search the link between each pair of PII element as the edge of the graph.
  • the lookup unit 108 enables the data processing device 102 to discover the link between one or more PII elements.
  • the relations graph database allows the data processing device to understand which PII element belongs to the first subject.
  • each node of the relation graph further includes an accuracy score of each identified PII element based on an accuracy of the identification, and a uniqueness score of each identified PII element based on a uniqueness of a type of the identified PII element.
  • Each edge further includes a relation accuracy score of each link based on the accuracy of the link.
  • the accuracy score describes the accuracy of the identification performed by the identification unit 106.
  • the accuracy score lies in a range of 0 to 1.
  • the uniqueness score describes how unique the PII element is.
  • the value of the uniqueness score lies between 0 and 1.
  • the unique PII element may be defined as a PII element that is unique by law with uniqueness equal to 1, such as social security number (SSN) or passport number (PPN).
  • the other PII elements, such as the home address, the phone number, the credit card number, and the like, are assigned a value less than 1, where the higher the value, the more unique the PII element is.
  • the relation accuracy score (which may also be named the PII relation accuracy score) describes the accuracy of the link identified by the lookup unit 108. The value of the relation accuracy score lies between 0 and 1.
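One possible in-memory representation of the scored relation graph is sketched below. The class and method names (`PIINode`, `RelationGraph`, `add_node`, `add_link`) are assumptions made for illustration; the source only states that nodes carry an accuracy score and a uniqueness score, that edges carry a relation accuracy score, and that all three lie between 0 and 1.

```python
from dataclasses import dataclass, field


def _check_unit(name: str, score: float) -> None:
    """Reject scores outside the 0-to-1 range stated in the description."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"{name} must lie in [0, 1], got {score}")


@dataclass
class PIINode:
    value: str         # the PII element itself, used as the node key
    kind: str          # type of PII, e.g. "ssn" or "name"
    accuracy: float    # accuracy of the identification
    uniqueness: float  # 1.0 for types that are unique by law (e.g. SSN)

    def __post_init__(self) -> None:
        _check_unit("accuracy", self.accuracy)
        _check_unit("uniqueness", self.uniqueness)


@dataclass
class RelationGraph:
    nodes: dict[str, PIINode] = field(default_factory=dict)
    # Adjacency map: node value -> {neighbour value: relation accuracy}
    edges: dict[str, dict[str, float]] = field(default_factory=dict)

    def add_node(self, node: PIINode) -> None:
        self.nodes[node.value] = node
        self.edges.setdefault(node.value, {})

    def add_link(self, a: str, b: str, relation_accuracy: float) -> None:
        """Store an undirected link between two existing PII elements."""
        _check_unit("relation_accuracy", relation_accuracy)
        if a not in self.nodes or b not in self.nodes:
            raise KeyError("both PII elements must be added before linking")
        self.edges[a][b] = relation_accuracy
        self.edges[b][a] = relation_accuracy
```

Treating links as undirected is a design assumption of this sketch; the source describes an edge only as a link between a pair of identified PII elements.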
  • the relations graph database allows the data processing device 102 to understand which PII entities belong to a specific individual, for example, the passport number, the credit card number, the name, and the like.
  • the relation graph enables the data processing device 102 to identify the one or more PII elements based on the accuracy score for each node in order to efficiently and accurately link the one or more PII elements related to the first subject with more accuracy and reliability.
  • the lookup unit 108 is configured to traverse the graph starting from the specified PII element and generate a list including each traversed PII element. The traversal is limited by a weighting factor based on the assigned scores.
  • the input unit 104 is configured to receive the request which includes either one PII element or more than one PII element that is sufficient to identify the first subject. After receiving the request by the input unit 104, the lookup unit 108 is configured to search the relation graph stored in the database to gather the relevant information about the first subject by use of the graph traversal. The graph traversal starts from the PII element or the group of PII elements specified by the request.
  • the lookup unit 108 is further configured to generate the list including each traversed PII element.
  • the graph traversal is limited by the weighting factor which is computed based on the accuracy and uniqueness score assigned to each node and the relation accuracy score assigned to each edge. Alternatively stated, by use of the weighting factor, the graph traversal is limited only to those PII elements that are closely related to one PII element or more than one PII element specified in the received request.
  • the weighting factor is calculated for each node by multiplying the accuracy score of the node with a path weight, where the path weight is the product of the path weight of the preceding node, the uniqueness score of the preceding node, and the accuracy score of the relation between the two nodes.
  • the weighting factor is a diminishing product because all the scores, that is, the accuracy score, the uniqueness score, and the relation accuracy score, lie between 0 and 1. Thus, this is advantageous in terms of knowing which PII element is traversed in the document.
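The weighted traversal described above can be sketched as follows. Several details are assumptions not stated in the source: the path weight of the starting node is taken as 1.0, a node is kept when its weighting factor reaches a hypothetical `threshold`, and a best-first search is used so that each node is reached along its highest-weight path. The function name `traverse` and the plain-dictionary graph layout are likewise illustrative.

```python
import heapq


def traverse(nodes, edges, start, threshold=0.5):
    """Return {PII value: weighting factor} for elements reachable from start.

    nodes: value -> (accuracy score, uniqueness score)
    edges: value -> {neighbour value: relation accuracy score}
    """
    result = {}
    visited = set()
    # Max-heap emulated with negated path weights; the start weight is assumed 1.0.
    heap = [(-1.0, start)]
    while heap:
        negated_weight, current = heapq.heappop(heap)
        if current in visited:
            continue
        path_weight = -negated_weight
        accuracy, uniqueness = nodes[current]
        # weighting factor = accuracy score of the node * path weight
        weighting = accuracy * path_weight
        if current != start and weighting < threshold:
            continue  # the traversal is limited here
        visited.add(current)
        result[current] = weighting
        for neighbour, relation_accuracy in edges.get(current, {}).items():
            if neighbour not in visited:
                # path weight = preceding path weight * preceding uniqueness
                #               * relation accuracy of the connecting edge
                next_weight = path_weight * uniqueness * relation_accuracy
                heapq.heappush(heap, (-next_weight, neighbour))
    return result
```

Because every factor is at most 1, the path weight can only shrink along a path, so elements far from the specified PII element, or reached through non-unique elements such as a shared phone number, fall below the threshold and are left out of the list.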
  • the data processing device 102 further includes the redaction unit 110 configured to edit the document to obfuscate one or more identified PII elements not linked with the specified PII element.
  • the data processing device 102 finds the one or more PII elements that are associated with the first subject. However, some identified PII elements are not linked with the specified PII element and therefore, it is important to obfuscate that identified PII element.
  • the application programming interface allows the first subject to hide all the personally identifiable information, except personally identifiable information that the first subject is allowed to see.
  • the specified PII element is associated with a first subject.
  • the redaction unit 110 is configured to determine, for each identified PII element not linked with the specified PII element, whether the first subject is authorized to view the identified PII element.
  • the redaction unit 110 is configured to obfuscate any identified PII element linked with the specified PII element which the subject associated with the specified PII element is not authorized to view.
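The redaction decision can be sketched as a small function. The function name `redact`, its parameters, and the fixed-character masking are hypothetical; the source does not say how authorization is stored or how obfuscation is rendered. In this sketch, `linked` holds the elements linked with the specified PII element, `extra_authorized` holds unlinked elements the subject may nevertheless view, and `denied` holds linked elements the subject may not view.

```python
def redact(text, spans, linked, extra_authorized=frozenset(),
           denied=frozenset(), mask="*"):
    """Mask every identified PII span that the subject may not view.

    spans: (start, end) offsets of the identified PII elements in text
    """
    # Work from the end of the text so earlier offsets stay valid.
    for start, end in sorted(spans, reverse=True):
        value = text[start:end]
        visible = value not in denied and (
            value in linked or value in extra_authorized
        )
        if not visible:
            text = text[:start] + mask * (end - start) + text[end:]
    return text
```

For instance, when a parent requests a report, the parent's own elements would go in `linked` and a minor child's elements in `extra_authorized`, so both remain readable while all other subjects are masked.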
  • the one or more PII elements include name, social security number (SSN), address, and another name.
  • Assaf, Benny, Gil, Mirit, and Zelda are considered as different data subjects (i.e., the subjects other than the first subject).
  • the exemplary scenario in the table mentioned above may be an example of PII elements saved by an organization, with Assaf, Benny, Gil, Mirit, and Zelda as different data subjects (or users). Further, if Assaf requests his data subject access report, the PII elements of all other data subjects are obfuscated.
  • Gil is under eighteen years of age, so Assaf, his father, can view Gil's personally identifiable information. Therefore, the data obfuscation allows Assaf to view the following information: -
  • the data processing device 102 further includes the output unit 112 configured to output the edited document. After the document is edited by the redaction unit 110, the output unit 112 provides the edited document to the first subject to view.
  • the final edited document includes the specified one or more than one PII element associated with the first subject. Additionally, the edited document further includes data obfuscation of the sensitive data that is not linked with the first subject.
  • the data processing device 102 efficiently and accurately identifies the relevant information related to the first subject due to the identification of one or more PII elements related to the first subject. Moreover, the data processing device 102 identifies the one or more PII elements related to the first subject not only from a single document but from the multiple requested documents. Additionally, the data processing device 102 obfuscates the PII elements that are not linked with the first subject due to which the data processing device 102 ensures data privacy and further enables the data processing device 102 to comply with the data privacy regulations and compliance.
  • FIG. 2 is a flowchart of a computer-implemented method of data processing, in accordance with an embodiment of the present disclosure.
  • FIG. 2 is described in conjunction with elements from FIG. 1.
  • With reference to FIG. 2, there is shown a computer-implemented method 200 of data processing.
  • the computer-implemented method 200 includes steps 202 to 210.
  • the computer-implemented method 200 is executed by the data processing device 102 (of FIG. 1).
  • the computer-implemented method 200 of data processing is used to provide all the relevant sensitive information related to the first subject and obfuscate the sensitive information that is not related to the first subject.
  • the computer-implemented method 200 comprises receiving, by an input unit (e.g., the input unit 104 of FIG. 1), a request for a document.
  • the received request includes at least one personally identifiable information about the first subject.
  • the one or more PII elements are provided by the first subject to an organization.
  • the computer-implemented method 200 further comprises identifying, by an identification unit (e.g., the identification unit 106), one or more personally identifiable information, PII, elements in the requested document.
  • the one or more PII elements are identified by the identification unit 106.
  • the identified one or more PII elements may include name, identity, address, phone number, and the like.
  • the identification unit 106 identifies all the PIIs, including the PIIs not related to the first subject.
  • the data processing device 102 obfuscates the PIIs that are not related to the first subject.
  • the identification of one or more PII elements in the requested document enables the data processing device 102 to identify all the PIIs related to the first subject as well as the PIIs not related to the first subject.
  • the computer-implemented method 200 further comprises searching, by a lookup unit (e.g., the lookup unit 108), each identified PII element in a database to link identified PII elements associated with the same subject.
  • the lookup unit 108 may be configured to group together the subject information (i.e., the PII elements related to the first subject) such as the name, SSN, phone number, and credit card found in the database to get all the PII elements associated with the first subject.
  • the computer-implemented method 200 further comprises editing, by a redaction unit (e.g., the redaction unit 110), the document to obfuscate one or more PII elements not linked with the specified PII element.
  • the computer-implemented method 200 finds the one or more PII elements that are associated with the first subject. However, some identified PII elements are not linked with the specified PII element, and therefore, it is important to obfuscate that identified PII element.
  • the application programming interface allows the first subject to hide all the personally identifiable information, except personally identifiable information that the first subject is allowed to see.
  • the computer-implemented method 200 further comprises outputting, by an output unit (e.g., the output unit 112), the edited document.
  • After the document is edited by the redaction unit 110, the output unit 112 provides the edited document to the first subject to view.
  • the final edited document includes the specified one or more than one PII element associated with the first subject. Additionally, the edited document further includes data obfuscation of the sensitive data that is not linked with the first subject.
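The sequence of receiving, identifying, looking up, editing, and outputting can be sketched end to end. This is a toy illustration under stated assumptions: the `database` argument is a hypothetical mapping from each known PII value to a subject identifier, identification is reduced to scanning for known values, and `handle_request` is an illustrative name rather than anything defined in the source.

```python
def handle_request(document: str, specified_pii: str,
                   database: dict[str, str], mask: str = "*") -> str:
    """Return the document with PII of other subjects obfuscated."""
    # Receive: the request names one PII element of the first subject.
    subject = database[specified_pii]
    # Identify: find every known PII value present in the document.
    identified = [value for value in database if value in document]
    # Look up: link the identified elements that share the same subject.
    linked = {value for value in identified if database[value] == subject}
    # Edit: obfuscate each identified element that is not linked.
    edited = document
    for value in identified:
        if value not in linked:
            edited = edited.replace(value, mask * len(value))
    # Output: hand back the edited document.
    return edited
```

A request whose specified PII element is absent from the database raises `KeyError` in this sketch; how such requests are handled is not covered by the source.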
  • the request for a document includes a request for a data subject access report (DSAR) for a subject associated with the specified PII element.
  • the computer-implemented method 200 further includes searching, by a search unit, for one or more documents based on the subject and providing each document to the identification unit 106.
  • the request for the document, which includes the request for the data subject access report (DSAR) for the subject, specifies one PII element or more than one PII element sufficient to identify the first subject.
  • the request corresponds to identifying all the relevant information about the first subject, no matter which regulation this request is used to fulfil, for example, a data subject access request (DSAR).
  • the search may include searching from a relation graph that includes the PIIs of every document added to the database and the relation graph is built offline when every document is added to the database.
  • the search unit 114 is configured to search for the one or more documents about the first subject. Furthermore, the searched one or more documents are provided to the identification unit 106 by the search unit 114 to search and identify one or more PII elements associated with the first subject.
  • the data processing device 102 includes the input unit 104 that receives the request for the document, and the search unit 114 searches for the one or more documents related to the first subject and provides the documents to the identification unit 106. This enables the data processing device 102 to identify all the relevant PII elements related to the first subject in multiple documents.
  • searching each PII element includes searching a relation graph stored in the database.
  • Each node of the graph represents an identified PII element, and each edge of the graph represents a link between pairs of identified PII elements.
  • the computer-implemented method 200 is configured to search the relation graph (or a weighted graph) by searching each identified PII element through the nodes of the graph and searching each identified link between the pairs of the PII elements through the edges of the graph.
  • the computer-implemented method 200 enables the data processing device 102 to discover the link between one or more PII elements.
  • the relations graph database allows the data processing device to understand which PII element belongs to the first subject.
  • each node of the relation graph further includes an accuracy score of each identified PII element based on an accuracy of the identification and a uniqueness score of each identified PII element based on a uniqueness of a type of the identified PII element.
  • Each edge further includes a relation accuracy score of each link based on the accuracy of the link.
  • the accuracy score describes the accuracy of the identification performed by the computer-implemented method 200.
  • the accuracy score lies in a range of 0 to 1.
  • the uniqueness score describes how unique the PII element is. The value of the uniqueness score lies between 0 and 1.
  • the unique PII element may be defined as a PII element that is unique by law with uniqueness equal to 1, such as social security number (SSN) or passport number (PPN).
  • the other PII elements, such as the home address, the phone number, the credit card number, and the like, are assigned a value less than 1, where a higher value indicates a more unique PII element.
  • the relation accuracy score (which may also be named the PII relation accuracy score) describes the accuracy of the link identified by the lookup unit 108. The value of the relation accuracy score lies between 0 and 1. This enables the data processing device 102 to discover relations between PII elements.
  • the relations graph database allows the data processing device 102 to understand which PII entities belong to a specific individual, for example, the passport number, the credit card number, the name, and the like. Beneficially, the relation graph enables the data processing device 102 to identify the one or more PII elements based on the accuracy score for each node in order to efficiently and accurately link the one or more PII elements related to the first subject with more accuracy and reliability.
  • editing the document includes traversing the graph starting from the specified PII element and generating a list including each traversed PII element.
  • the traversal is limited by a weighting factor based on the assigned scores.
  • the input unit 104 is configured to receive the request, which includes either one PII element or more than one PII element that is sufficient to identify the first subject.
  • the lookup unit 108 is configured to search the relation graph stored in the database to gather the relevant information about the first subject by use of the graph traversal.
  • the graph traversal starts from the PII element or the group of PII elements specified by the request.
  • the lookup unit 108 is further configured to generate the list, including each traversed PII element.
  • the graph traversal is limited by the weighting factor, which is computed based on the accuracy and uniqueness score assigned to each node and the relation accuracy score assigned to each edge. Alternatively stated, by use of the weighting factor, the graph traversal is limited only to those PII elements that are closely related to one PII element or more than one PII element specified in the received request.
  • the weighting factor is calculated for each node by multiplying the accuracy score of the node with a path weight, where the path weight is the product of the path weight of the preceding node, the uniqueness score of the preceding node, and the accuracy score of the relation between the two nodes.
  • the weighting factor is a diminishing product because all the scores, that is, the accuracy score, the uniqueness score, and the relation accuracy score, lie between 0 and 1, so the factor cannot grow as the path from the specified PII element gets longer. This is advantageous in terms of knowing which PII element is traversed in the document.
  • editing the document further includes determining, for each PII element not linked with the specified PII element, whether a subject associated with the specified PII element is authorized to view the PII element.
  • the editing of the document includes obfuscating any identified PII element not linked with the specified PII element, which the subject associated with the specified PII element is not authorized to view.
  • the one or more PII elements include name, social security number (SSN), address, and another name.
  • the computer-implemented method 200 efficiently and accurately identifies the relevant information related to the first subject due to the identification of one or more PII elements related to the first subject. Moreover, the computer-implemented method 200 identifies the one or more PII elements related to the first subject not only from a single document but from multiple requested documents. Additionally, the computer-implemented method 200 obfuscates the PII elements that are not linked with the first subject, which ensures data privacy and supports compliance with data privacy regulations.
  • steps 202 to 210 are only illustrative, and other alternatives can also be provided where one or more steps are added, one or more steps are removed, or one or more steps are provided in a different sequence without departing from the scope of the claims herein.
  • a computer-readable medium comprising instructions which, when executed by a processor (e.g., the processor 118 of the data processing device 102), cause the processor to perform the computer-implemented method 200.
  • the instructions may be implemented on computer-readable media, which include, but are not limited to, Electrically Erasable Programmable Read-Only Memory (EEPROM), Random Access Memory (RAM), Read-Only Memory (ROM), Hard Disk Drive (HDD), Flash memory, a Secure Digital (SD) card, Solid-State Drive (SSD), a computer-readable storage medium, and/or CPU cache memory.
  • the instructions are generated by a computer program, which is implemented in view of the computer-implemented method 200, and for use in implementing the computer-implemented method 200 on one or more processors, such as the processor 118 of the data processing device 102.

Abstract

A data processing device includes an input unit configured to receive a request for a document. The request specifies at least one PII element associated with a first subject. Further, the data processing device includes an identification unit configured to identify one or more PII elements in the requested document and a lookup unit configured to search each identified PII element in a database to link identified PII elements associated with the same subject. Further, the data processing device includes a redaction unit configured to edit the document to obfuscate one or more identified PII elements not linked with the specified PII element and an output unit configured to output the edited document. The data processing device ensures data privacy by obfuscating the data that is not linked with the first subject and further enables the data processing device to comply with the data privacy regulations and compliance.

Description

DATA PROCESSING DEVICE AND METHOD OF DATA PROCESSING
TECHNICAL FIELD
The present disclosure relates generally to the field of compliance and data management systems, and more specifically, to a data processing device and a computer-implemented method of data processing.
BACKGROUND
Generally, different organizations need to maintain data related to, for example, multiple subjects, customers, and potential customers. Moreover, data may be distributed among different storage systems and storage tiers, due to which extraction of information about a specific subject is complicated, time-consuming, and sometimes requires manual effort. Conventionally, metadata is stored by an organization to save the data related to multiple subjects. The metadata allows retrieval of information about subjects, which can be used to answer regulatory queries. For example, a request from an individual subject may provide access to all personal information of the individual subject that is stored by the organization and may also provide access for forcing erasure of all personal information about the individual subject by the organization. However, in such case, the data is constantly flowing into the different storage system, so information needs to be constantly indexed. This means that establishing a correlation between different information in different storage systems related to the same subject is a challenging task.
In certain scenarios, during the registration of a service by the subject, several documents are sent from the subject to the organization, which need to be linked to the same subject. Moreover, such documents do not reside in the same file or even in the same storage system and are updated with subject information as time passes. For example, contact information of the subject is stored in a customer database, payment details are stored in a finance database, and the like. However, in such scenarios, the data is not always structured, and even in structured data, correlation between data items is not a trivial task. For example, a car insurance claim may contain the details of both drivers, with the driver license details given in the form of a copy of the driving license of each driver. Thus, the insurance claim needs to be processed in such a way that gathers driver license information from images of the driving licenses and aggregates personally identifiable information (PII) of each driver without correlating between one driver and another.
Currently, certain methods have been proposed in order to identify personally identifiable information (PII) in multiple documents. For example, various searching tools may be used for searching and identifying personally identifiable information (PII) in a document. However, the existing methods are based on automatic, data-focused solutions, or graphs (e.g., identity graphs) to be used in a data model, but they are bound to a single document and generally fail to handle PII dispersed across multiple documents. Thus, there exists a technical problem of how to improve efficiency and accuracy in identifying the personally identifiable information related to a specific person in one or more documents while obfuscating data that is not related to the specific person.
Therefore, in light of the foregoing discussion, there exists a need to overcome the aforementioned drawbacks associated with the conventional methods of data processing related to the identification of the PII elements of a specified person in a typical data processing device.
SUMMARY
The present disclosure provides a data processing device and a computer-implemented method of data processing. The present disclosure provides a solution to the existing problem of how to improve efficiency and accuracy in identifying the personally identifiable information related to a specific person in one or more documents while obfuscating data that is not related to the specific person. An aim of the present disclosure is to provide a solution that overcomes at least partially the problems encountered in the prior art and provide an improved data processing device and an improved computer-implemented method of data processing such as by providing data subject access request (DSAR), personally identifiable information (PII) aware data obfuscation.
The object of the present disclosure is achieved by the solutions provided in the enclosed independent claims. Advantageous implementations of the present disclosure are further defined in the dependent claims. In one aspect, the present disclosure provides a data processing device that includes an input unit configured to receive a request for a document. The request specifies at least one personally identifiable information (PII) element associated with a first subject. The data processing device further includes an identification unit configured to identify one or more PII elements in the requested document. The data processing device further includes a lookup unit configured to search each identified PII element in a database to link identified PII elements associated with the same subject and a redaction unit configured to edit the document to obfuscate one or more identified PII elements not linked with the specified PII element. The data processing device further includes an output unit configured to output the edited document.
The data processing device efficiently and accurately identifies the relevant information related to the first subject due to the identification of one or more PII elements related to the first subject. Moreover, the data processing device identifies the one or more PII elements related to the first subject not only from a single document but from the multiple documents. Additionally, the data processing device obfuscates the PII elements that are not linked with the first subject due to which the data processing device ensures data privacy and further enables the data processing device to comply with the data privacy regulations and compliance.
In an implementation form, the specified PII element is associated with a first subject. The redaction unit is further configured to determine, for each identified PII element not linked with the specified PII element, whether the first subject is authorized to view the identified PII element, and to obfuscate any identified PII element not linked with the specified PII element which the first subject is not authorized to view.
Determining, for each identified PII element which is not linked with the specified PII element, whether the first subject is authorized to view it, and obfuscating the identified PII elements which the first subject is not authorized to view, enables the data processing device to hide the sensitive data that does not belong to the first subject.
In a further implementation form, the request for a document includes a request for a data subject access report (DSAR) for a subject associated with the specified PII element. The data processing device further includes a search unit configured to search for one or more documents based on the subject and provide each document to the identification unit.
In this implementation, the search unit enables the data processing device to search one or more documents to include all the PII elements related to the first subject. In a further implementation form, the lookup unit is configured to search a relation graph stored in the database. Each node of the graph represents an identified PII element, and each edge of the graph represents a link between pairs of identified PII elements.
In this implementation, the lookup unit enables the data processing device to discover the link between one or more PII elements. The relations graph database allows the data processing device to understand which PII element belongs to the first subject.
In a further implementation form, each node of the relation graph further includes an accuracy score of each identified PII element based on an accuracy of the identification and a uniqueness score of each identified PII element based on a uniqueness of a type of the identified PII element. Each edge further includes a relation accuracy score of each link based on an accuracy of the link.
Beneficially, the relation graph enables the data processing device to identify the PII element based on the accuracy score for each node in order to efficiently and accurately link the one or more PII elements related to the first subject with more accuracy and reliability.
In a further implementation form, the lookup unit is configured to traverse the graph starting from the specified PII element and generate a list including each traversed PII element, wherein the traversal is limited by a weighting factor based on the assigned scores.
This is advantageous in terms of knowing which PII element is traversed in the document.
In another aspect, the present disclosure provides a computer-implemented method of data processing that includes receiving, by an input unit, a request for a document, the request specifying at least one Personally Identifiable Information (PII) element associated with a first subject. The computer-implemented method further includes identifying, by an identification unit, one or more PII elements in the requested document and searching, by a lookup unit, each identified PII element in a database to link identified PII elements associated with the same subject. The method further includes editing, by a redaction unit, the document to obfuscate one or more PII elements not linked with the specified PII element, and outputting, by an output unit, the edited document.
The computer-implemented method achieves all the advantages and technical effects of the data processing device of the present disclosure.
In yet another aspect, the present disclosure provides a computer-readable medium comprising instructions which, when executed by a processor, cause the processor to perform the method.
The processor (e.g., processor of a device or a system) achieves all the advantages and effects of the method after execution of the method.
It is to be appreciated that all the aforementioned implementation forms can be combined.
It has to be noted that all devices, elements, circuitry, units and means described in the present application could be implemented in the software or hardware elements or any kind of combination thereof. All steps which are performed by the various entities described in the present application as well as the functionalities described to be performed by the various entities are intended to mean that the respective entity is adapted to or configured to perform the respective steps and functionalities. Even if, in the following description of specific embodiments, a specific functionality or step to be performed by external entities is not reflected in the description of a specific detailed element of that entity which performs that specific step or functionality, it should be clear for a skilled person that these methods and functionalities can be implemented in respective software or hardware elements, or any kind of combination thereof. It will be appreciated that features of the present disclosure are susceptible to being combined in various combinations without departing from the scope of the present disclosure as defined by the appended claims. Additional aspects, advantages, features and objects of the present disclosure would be made apparent from the drawings and the detailed description of the illustrative implementations construed in conjunction with the appended claims that follow.
BRIEF DESCRIPTION OF THE DRAWINGS
The summary above, as well as the following detailed description of illustrative embodiments, is better understood when read in conjunction with the appended drawings. For the purpose of illustrating the present disclosure, exemplary constructions of the disclosure are shown in the drawings. However, the present disclosure is not limited to specific methods and instrumentalities disclosed herein. Moreover, those in the art will understand that the drawings are not to scale. Wherever possible, like elements have been indicated by identical numbers.
Embodiments of the present disclosure will now be described, by way of example only, with reference to the following diagrams wherein:
FIG. 1 is a block diagram that illustrates various exemplary components of a data processing device, in accordance with an embodiment of the present disclosure; and
FIG. 2 is a flowchart of a computer-implemented method of data processing, in accordance with an embodiment of the present disclosure.
In the accompanying drawings, an underlined number is employed to represent an item over which the underlined number is positioned or an item to which the underlined number is adjacent. A non-underlined number relates to an item identified by a line linking the non-underlined number to the item. When a number is non-underlined and accompanied by an associated arrow, the non-underlined number is used to identify a general item at which the arrow is pointing.
DETAILED DESCRIPTION OF EMBODIMENTS
The following detailed description illustrates embodiments of the present disclosure and ways in which they can be implemented. Although some modes of carrying out the present disclosure have been disclosed, those skilled in the art would recognize that other embodiments for carrying out or practicing the present disclosure are also possible. FIG. 1 is a block diagram that illustrates various exemplary components of a data processing device, in accordance with an embodiment of the present disclosure. With reference to FIG. 1, there is shown a block diagram 100 of a data processing device 102 that includes an input unit 104, an identification unit 106, a lookup unit 108, a redaction unit 110, an output unit 112, a search unit 114, a memory 116, and a processor 118.
The data processing device 102 may include suitable logic, circuitry, interfaces, or code that is configured to identify the personally identifiable information (PII) elements related to the first subject. The PII elements are present in one or more documents received from the first subject, and the remaining data is obfuscated so that only the information that belongs to the first subject can be viewed. The data processing device 102 is further configured to obfuscate one or more identified PII elements that are not linked with the specified PII element. In an implementation, the first subject may be a potential customer of an organization. In another implementation, the first subject may be a user of a product, a visitor of a website, a customer of a company, or an employee of an organization, without limiting the scope of the disclosure.
The input unit 104 may include suitable logic, circuitry, interfaces, or code that is configured to receive a request for a document. Examples of the input unit 104 may include, but are not limited to, a data terminal, a receiver, a receiving unit, a transceiver, a facsimile machine, a virtual server, and the like.
The identification unit 106 may include suitable logic, circuitry, interfaces, or code that is configured to identify one or more personally identifiable information (PII) elements in the requested document.
The lookup unit 108 may include suitable logic, circuitry, interfaces, or code that is configured to search each identified PII element in a database to link identified PII elements associated with the same subject.
The redaction unit 110 may include suitable logic, circuitry, interfaces, or code that is configured to edit the document to obfuscate one or more identified PII elements that are not linked with the specified PII element.
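The redaction step described above can be illustrated with a minimal Python sketch. It is an assumption-laden illustration, not the implementation of the redaction unit 110: the `PiiSpan` structure, the `redact` and `span_of` helpers, the asterisk mask, and the sample text are all hypothetical, and the spans are assumed not to overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PiiSpan:
    """A PII element located in the document text (hypothetical structure)."""
    start: int   # index of the first character of the element
    end: int     # index one past the last character
    value: str   # value used to look the element up


def redact(text, spans, linked_values, authorized_values=frozenset(), mask="*"):
    """Mask every span whose value is neither linked to the specified PII
    element nor explicitly authorized for the requesting subject.
    Assumes the spans do not overlap."""
    out = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s.start):
        out.append(text[cursor:span.start])
        original = text[span.start:span.end]
        visible = span.value in linked_values or span.value in authorized_values
        out.append(original if visible else mask * len(original))
        cursor = span.end
    out.append(text[cursor:])
    return "".join(out)


def span_of(text, value):
    """Helper for the example: locate the first occurrence of a value."""
    start = text.index(value)
    return PiiSpan(start, start + len(value), value)


doc = "Driver A: Ann Lee, SSN 123-45-6789. Driver B: Bob Ray, SSN 987-65-4321."
spans = [span_of(doc, v) for v in ("Ann Lee", "123-45-6789", "Bob Ray", "987-65-4321")]
redacted = redact(doc, spans, linked_values={"Ann Lee", "123-45-6789"})
print(redacted)
```

In this sketch the elements of the second driver are masked character for character, so the edited document keeps its original length and layout.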
The output unit 112 may include suitable logic, circuitry, interfaces, or code that is configured to output the edited document. The search unit 114 may include suitable logic, circuitry, interfaces, or code that is configured to search for one or more documents based on the subject. The search unit 114 is configured to provide each document to the identification unit 106.
The memory 116 may include suitable logic, circuitry, interfaces, or code that is configured to store data and instructions executable by the processor 118. Examples of implementation of the memory 116 may include, but are not limited to, an Electrically Erasable Programmable Read-Only Memory (EEPROM), Random Access Memory (RAM), Read-Only Memory (ROM), Hard Disk Drive (HDD), Flash memory, Solid-State Drive (SSD), or CPU cache memory. The memory 116 may store an operating system or other program products (including one or more operation algorithms) to operate the data processing device 102.
The processor 118 may include suitable logic, circuitry, interfaces, or code that is configured to execute the instructions stored in the memory 116. In an example, the processor 118 may be a general-purpose processor. Other examples of the processor 118 may include, but are not limited to, a control unit, a central processing unit (CPU), a digital signal processor (DSP), a microprocessor, a microcontroller, a complex instruction set computing (CISC) processor, an application-specific integrated circuit (ASIC) processor, a reduced instruction set computing (RISC) processor, a very long instruction word (VLIW) processor, a state machine, a data processing unit, a graphics processing unit (GPU), and other processors or control circuitry. Moreover, the processor 118 may refer to one or more individual processors, processing devices, or a processing unit that is part of a machine, such as the data processing device 102.
In operation, the data processing device 102 includes the input unit 104 configured to receive a request for a document related to the first subject. The request specifies at least one personally identifiable information (PII) element associated with the first subject. In an example, during registration for a service, the first subject provides the one or more PII elements to an organization.
In accordance with an embodiment, the request for the document includes a request for a data subject access report (DSAR) for a subject associated with the specified PII element. The data processing device 102 further includes the search unit 114 configured to search for one or more documents based on the first subject. The search unit 114 is configured to provide each document to the identification unit 106. The request for the document, which includes the request for the DSAR for the first subject, specifies one PII element or more than one PII element sufficient to identify the first subject. The request corresponds to identifying all the relevant information about the first subject, no matter which regulation this request is used to fulfil, for example, a data subject access request (DSAR). In an implementation, the search may include searching a relation graph that includes the PIIs of every document added to the database and related to the first subject. The relation graph is built offline as each document is added to the database. After the input unit 104 receives the request for the document, the search unit 114 searches for the one or more documents about the first subject. Furthermore, the search unit 114 provides the found documents to the identification unit 106 to identify one or more PII elements associated with the first subject. Alternatively stated, the input unit 104 receives the request for the document, and the search unit 114 searches for the one or more documents related to the first subject and provides the documents to the identification unit 106. This enables the data processing device 102 to identify all the relevant PII elements related to the first subject in multiple documents.
The data processing device 102 further includes the identification unit 106 configured to identify one or more PII elements in the requested document. In the aforementioned example of the registration process, the PII elements may include the name of the first subject, the social security number (SSN), the address of the first subject, the phone number of the first subject, the credit card number of the first subject, and the like. Beneficially, the identification of one or more PII elements in the requested document enables the data processing device 102 to identify all the personal information related to the first subject.
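The disclosure does not state how the identification unit 106 recognizes PII elements. As one possible illustration only, the following Python sketch uses regular expressions; the pattern table, the per-pattern accuracy values, and the `identify_pii` function are hypothetical, and a practical detector would cover far more types and formats.

```python
import re

# Hypothetical patterns and accuracy values, chosen only for illustration.
PATTERNS = {
    "ssn": (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), 0.9),
    "phone": (re.compile(r"\b\d{3}-\d{3}-\d{4}\b"), 0.7),
}


def identify_pii(text):
    """Return the PII elements found in the text, ordered by position.
    Each element carries its type, value, span, and an accuracy score."""
    found = []
    for pii_type, (pattern, accuracy) in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append({
                "type": pii_type,
                "value": match.group(),
                "start": match.start(),
                "end": match.end(),
                "accuracy": accuracy,
            })
    return sorted(found, key=lambda element: element["start"])


elements = identify_pii("SSN 123-45-6789, phone 555-867-5309")
for element in elements:
    print(element["type"], element["value"], element["accuracy"])
```

Attaching an accuracy value to each match mirrors the accuracy score that the relation graph stores per identified PII element.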
The data processing device 102 further includes the lookup unit 108 configured to search each identified PII element in a database to link identified PII elements associated with the same subject. In the example of the registration process, the lookup unit 108 is configured to search each identified PII element in the database to link identified PII elements associated with the first subject, such as the name, SSN, phone number, and credit card number of the first subject. The lookup unit 108 may be configured to group together the subject information (i.e., the PII elements related to the first subject) such as the name, SSN, phone number, and credit card found in the database to get all the PII elements associated with the first subject. In accordance with an embodiment, the lookup unit 108 is configured to search a relation graph stored in the database. Each node of the graph represents an identified PII element, and each edge of the graph represents a link between pairs of identified PII elements. The lookup unit 108 is configured to search the relation graph (or a weighted graph) by searching each identified PII element through the nodes of the graph and searching each identified link between the pairs of the PII elements through the edges of the graph. In the example of the registration process, the lookup unit 108 may be configured to search the various PII elements, such as name of the first subject, SSN of the first subject, phone number of the first subject, credit card number of the first subject as the node of the graph. Moreover, the lookup unit 108 may be further configured to search the link between each pair of PII element as the edge of the graph. Beneficially, the lookup unit 108 enables the data processing device 102 to discover the link between one or more PII elements. The relations graph database allows the data processing device to understand which PII element belongs to the first subject.
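The node-and-edge structure described above can be sketched in Python as an undirected adjacency map. This is a minimal sketch under stated assumptions: nodes are represented as hypothetical `(type, value)` tuples, the `build_graph` and `linked_elements` names are invented for the example, and scores are left out so that only the linking of PII elements is shown.

```python
from collections import defaultdict, deque


def build_graph(links):
    """Build an undirected adjacency map from pairs of linked PII elements."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def linked_elements(graph, start):
    """Collect every PII element reachable from the starting element
    using a breadth-first search over the edges."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


ann_name, ann_ssn, ann_phone = ("name", "Ann Lee"), ("ssn", "123-45-6789"), ("phone", "555-867-5309")
bob_name, bob_ssn = ("name", "Bob Ray"), ("ssn", "987-65-4321")

graph = build_graph([(ann_name, ann_ssn), (ann_ssn, ann_phone), (bob_name, bob_ssn)])
print(linked_elements(graph, ann_ssn))
```

Starting from the SSN of the first subject, the search returns the name and phone number linked to it and none of the elements that belong to the other subject.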
In accordance with an embodiment, each node of the relation graph further includes an accuracy score of each identified PII element based on an accuracy of the identification, and a uniqueness score of each identified PII element based on a uniqueness of a type of the identified PII element. Each edge further includes a relation accuracy score of each link based on the accuracy of the link. The accuracy score describes the accuracy of the identification performed by the identification unit 106 and lies in a range of 0 to 1. The uniqueness score describes how unique the PII element is and also lies between 0 and 1. A PII element that is unique by law, such as a social security number (SSN) or passport number (PPN), has a uniqueness score equal to 1. Other PII elements, such as the home address, the phone number, the credit card number, and the like, are assigned a value less than 1, where a higher value indicates a more unique PII element. Furthermore, the relation accuracy score (which may also be named the PII relation accuracy score) describes the accuracy of the link identified by the lookup unit 108 and lies between 0 and 1. These scores enable the data processing device 102 to discover relations between PII elements. The relation graph allows the data processing device 102 to determine which PII entities, for example, the passport number, the credit card number, the name, and the like, belong to a specific individual. Beneficially, the accuracy score of each node enables the data processing device 102 to link the one or more PII elements related to the first subject efficiently and reliably.
In accordance with an embodiment, the lookup unit 108 is configured to traverse the graph starting from the specified PII element and generate a list including each traversed PII element. The traversal is limited by a weighting factor based on the assigned scores. The input unit 104 is configured to receive the request, which includes one PII element, or more than one PII element, sufficient to identify the first subject. After the input unit 104 receives the request, the lookup unit 108 searches the relation graph stored in the database to gather the relevant information about the first subject by means of a graph traversal. The graph traversal starts from the PII element or the group of PII elements specified by the request. To gather the relevant information about the first subject, the lookup unit 108 generates the list including each traversed PII element. The graph traversal is limited by the weighting factor, which is computed from the accuracy score and the uniqueness score assigned to each node and the relation accuracy score assigned to each edge. Alternatively stated, the weighting factor limits the graph traversal to those PII elements that are closely related to the PII element or PII elements specified in the received request. The weighting factor is calculated for each node by multiplying the accuracy score of the node by a path weight, where the path weight is the product of the path weight of the preceding node, the uniqueness score of the preceding node, and the accuracy score of the relation between the two nodes. The weighting factor is a diminishing product because all the scores, that is, the accuracy score, the uniqueness score, and the relation accuracy score, lie between 0 and 1. This is advantageous because the generated list records which PII elements were traversed.
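The weighting-factor computation described above can be sketched as follows. This is only an illustration under stated assumptions: the disclosure says the traversal is "limited by" the weighting factor but does not give a cut-off or a visiting order, so the `threshold` parameter, the starting path weight of 1, and the best-first order are assumptions, and all names and scores are hypothetical.

```python
import heapq
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    accuracy: float    # accuracy of the identification, in [0, 1]
    uniqueness: float  # 1.0 for identifiers unique by law, lower otherwise


def traverse(nodes, edges, start, threshold):
    """Best-first traversal bounded by the weighting factor.

    nodes: dict mapping a node name to its Node scores
    edges: dict mapping a node name to a list of (neighbour, relation_accuracy)
    threshold: assumed cut-off; nodes whose factor falls below it are
               neither listed nor expanded
    """
    listed = []
    settled = set()
    # heapq is a min-heap, so path weights are stored negated.
    heap = [(-1.0, start)]  # assumption: the start node has path weight 1
    while heap:
        neg_weight, name = heapq.heappop(heap)
        if name in settled:
            continue
        settled.add(name)
        path_weight = -neg_weight
        # weighting factor = accuracy score of the node * path weight
        factor = nodes[name].accuracy * path_weight
        if factor < threshold:
            continue
        listed.append((name, factor))
        for neighbour, relation_accuracy in edges.get(name, []):
            if neighbour not in settled:
                # path weight = preceding path weight * preceding uniqueness
                #               * accuracy of the relation between the nodes
                next_weight = path_weight * nodes[name].uniqueness * relation_accuracy
                heapq.heappush(heap, (-next_weight, neighbour))
    return listed


# Invented scores, chosen as powers of two so the products are exact.
nodes = {
    "ssn": Node(accuracy=1.0, uniqueness=1.0),
    "name": Node(accuracy=1.0, uniqueness=0.5),
    "address": Node(accuracy=0.5, uniqueness=0.5),
    "other_name": Node(accuracy=1.0, uniqueness=0.5),
}
edges = {
    "ssn": [("name", 1.0)],
    "name": [("ssn", 1.0), ("address", 0.5)],
    "address": [("name", 0.5), ("other_name", 1.0)],
    "other_name": [("address", 1.0)],
}

listed = traverse(nodes, edges, "ssn", threshold=0.2)
```

With these scores the address reaches a factor of only 0.125, so a threshold of 0.2 stops the traversal before it reaches the address or the second name behind it. Because every score lies between 0 and 1, the path weight can only shrink along a path, which is why visiting the highest path weight first is sufficient in this sketch.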
The data processing device 102 further includes the redaction unit 110 configured to edit the document to obfuscate one or more identified PII elements not linked with the specified PII element. The data processing device 102 finds the one or more PII elements that are associated with the first subject. However, some identified PII elements are not linked with the specified PII element, and it is therefore important to obfuscate those identified PII elements. An application programming interface (API) allows all the personally identifiable information to be hidden from the first subject, except the personally identifiable information that the first subject is allowed to see. In accordance with an embodiment, the specified PII element is associated with a first subject. The redaction unit 110 is configured to determine, for each identified PII element not linked with the specified PII element, whether the first subject is authorized to view the identified PII element. The redaction unit 110 is configured to obfuscate any identified PII element linked with the specified PII element which the subject associated with the specified PII element is not authorized to view. The one or more PII elements include name, social security number (SSN), address, and another name. For example, the table given below represents the one or more PII elements, such as the address, the SSN, the passport ID, the last name, and the name.
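[Table: PII elements (address, SSN, passport ID, last name, and name) of the data subjects; the table image is not reproduced here.]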
In the exemplary scenario shown in the above table, Assaf, Benny, Gil, Mirit, and Zelda are different data subjects (or users) whose PII elements are saved by an organization. If Assaf requests his data subject access report, the PII elements of all other data subjects are obfuscated. However, in the exemplary scenario, Gil is under eighteen years of age, and therefore Assaf, his father, can view Gil's personally identifiable information. The data obfuscation thus allows Assaf to view the following information:
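[Table, continued across two images: the information that Assaf is allowed to view, with the PII elements of the other data subjects obfuscated; the table images are not reproduced here.]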
Determining, for each identified PII element that is not linked with the specified PII element, whether the first subject is authorized to view it, and obfuscating those elements that the first subject is not authorized to view, enables the data processing device 102 to hide the sensitive data that does not belong to the first subject.
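The authorization check and obfuscation can be sketched as below. This is a minimal illustration, assuming the linked and authorized PII values have already been determined; the function `redact`, its parameters, the mask text, and the sample sentence are hypothetical, while the names Assaf, Gil, and Benny follow the exemplary scenario.

```python
import re


def redact(document, identified, linked, authorized, mask="[REDACTED]"):
    """Obfuscate every identified PII value the requester may not view.

    document:   text of the requested document
    identified: all PII values found in the document
    linked:     PII values linked with the specified PII element
    authorized: further PII values the requester is authorized to view
    """
    hidden = set(identified) - set(linked) - set(authorized)
    if not hidden:
        return document
    # Longer values come first so a short value never splits a longer one;
    # a single pass ensures the mask text itself is never re-matched.
    pattern = "|".join(
        re.escape(value) for value in sorted(hidden, key=len, reverse=True)
    )
    return re.sub(pattern, lambda _match: mask, document)


text = "Assaf lives with Gil. Benny lives next door."
redacted = redact(
    text,
    identified=["Assaf", "Gil", "Benny"],
    linked=["Assaf"],     # the requester's own PII
    authorized=["Gil"],   # a minor whose father is the requester
)
```

In this sketch `redacted` keeps the names of Assaf and Gil and replaces only Benny's name with the mask.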
The data processing device 102 further includes the output unit 112 configured to output the edited document. After the redaction unit 110 has edited the document, the output unit 112 provides the edited document to the first subject to view. The final edited document includes the specified PII element or PII elements associated with the first subject. Additionally, the sensitive data that is not linked with the first subject is obfuscated in the edited document.
Thus, the data processing device 102 efficiently and accurately identifies the relevant information related to the first subject by identifying the one or more PII elements related to the first subject. Moreover, the data processing device 102 identifies the one or more PII elements related to the first subject not only in a single document but across multiple requested documents. Additionally, the data processing device 102 obfuscates the PII elements that are not linked with the first subject, which ensures data privacy and enables the data processing device 102 to comply with data privacy regulations.
FIG. 2 is a flowchart of a computer-implemented method of data processing, in accordance with an embodiment of the present disclosure. FIG. 2 is described in conjunction with elements from FIG. 1. With reference to FIG. 2, there is shown a computer-implemented method 200 of data processing. The computer-implemented method 200 includes steps 202 to 210. The computer-implemented method 200 is executed by the data processing device 102 (of FIG. 1).
There is provided the computer-implemented method 200 of data processing. The computer-implemented method 200 is used to provide all the relevant sensitive information related to the first subject and to obfuscate the sensitive information that is not related to the first subject. At step 202, the computer-implemented method 200 comprises receiving, by an input unit (e.g., the input unit 104 of FIG. 1), a request for a document. The received request includes at least one personally identifiable information element about the first subject. In an example, when registering for a service, the first subject provides the one or more PII elements to an organization.
At step 204, the computer-implemented method 200 further comprises identifying, by an identification unit (e.g., the identification unit 106), one or more personally identifiable information, PII, elements in the requested document. After the request is received, the one or more PII elements are identified by the identification unit 106. The identified one or more PII elements may include name, identity, address, phone number, and the like. The identification unit 106 identifies all the PIIs in the document, including the PIIs that are not related to the first subject, so that the data processing device 102 can obfuscate the PIIs that are not related to the first subject. Beneficially, the identification of one or more PII elements in the requested document enables the data processing device 102 to identify the PIIs related to the first subject as well as the PIIs not related to the first subject.
At step 206, the computer-implemented method 200 further comprises searching, by a lookup unit (e.g., the lookup unit 108), each identified PII element in a database to link identified PII elements associated with the same subject. The lookup unit 108 may be configured to group together the subject information (i.e., the PII elements related to the first subject) such as the name, SSN, phone number, and credit card found in the database to get all the PII elements associated with the first subject.
At step 208, the computer-implemented method 200 further comprises editing, by a redaction unit (e.g., the redaction unit 110), the document to obfuscate one or more PII elements not linked with the specified PII element. The computer-implemented method 200 finds the one or more PII elements that are associated with the first subject. However, some identified PII elements are not linked with the specified PII element, and it is therefore important to obfuscate those identified PII elements. An application programming interface (API) allows all the personally identifiable information to be hidden from the first subject, except the personally identifiable information that the first subject is allowed to see.
At step 210, the computer-implemented method 200 further comprises outputting, by an output unit (e.g., the output unit 112), the edited document. After the redaction unit 110 has edited the document, the output unit 112 provides the edited document to the first subject to view. The final edited document includes the specified PII element or PII elements associated with the first subject. Additionally, the sensitive data that is not linked with the first subject is obfuscated in the edited document.
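Steps 202 to 210 can be pictured end to end with the following simplified sketch. Everything in it is an assumption made for illustration: the detector recognises only SSN-shaped strings, a plain dictionary stands in for the database, the document text is passed in directly rather than fetched, and the names `SSN_PATTERN`, `DATABASE`, and `process_request` are hypothetical.

```python
import re

# Hypothetical detector: matches only SSN-shaped strings (ddd-dd-dddd).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

# Hypothetical stand-in for the database: PII value -> subject identifier.
DATABASE = {
    "000-00-0001": "subject-1",
    "000-00-0002": "subject-2",
}


def process_request(document, specified_pii):
    # Step 202: receive the request, which specifies a PII element.
    requester = DATABASE.get(specified_pii)
    # Step 204: identify the PII elements in the requested document.
    identified = SSN_PATTERN.findall(document)
    # Step 206: look up each identified element to link it with a subject.
    owners = {value: DATABASE.get(value) for value in identified}
    # Step 208: obfuscate elements not linked with the requester's subject.
    for value, owner in owners.items():
        if requester is None or owner != requester:
            document = document.replace(value, "***-**-****")
    # Step 210: output the edited document.
    return document


original = "Payer 000-00-0001 sent funds to 000-00-0002."
edited = process_request(original, "000-00-0001")
```

The requester's own SSN stays readable in `edited`, while the other subject's SSN is masked. If the specified PII element is unknown to the database, this sketch masks every identified element.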
In accordance with an embodiment, the request for a document includes a request for a data subject access report (DSAR) for a subject associated with the specified PII element. The computer-implemented method 200 further includes searching, by a search unit, for one or more documents based on the subject and providing each document to the identification unit 106. The request for the DSAR specifies one PII element, or more than one PII element, sufficient to identify the first subject. The request corresponds to identifying all the relevant information about the first subject, regardless of which regulation the request is used to fulfil, for example, a data subject access request (DSAR). In an implementation, the search may include searching a relation graph that includes the PIIs of every document added to the database, where the relation graph is built offline as each document is added to the database. After the input unit 104 receives the request for the document, the search unit 114 searches for the one or more documents about the first subject and provides them to the identification unit 106, which identifies the one or more PII elements associated with the first subject. This enables the data processing device 102 to identify all the relevant PII elements related to the first subject across multiple documents.
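One way to picture the document search for a DSAR is an index that is updated whenever a document is added, in the spirit of the offline graph construction mentioned above. The sketch below is an assumption, not the disclosed implementation; `DocumentIndex`, its methods, and the document identifiers are hypothetical.

```python
from collections import defaultdict


class DocumentIndex:
    """Hypothetical index from PII values to the documents containing them."""

    def __init__(self):
        self._docs_by_pii = defaultdict(set)

    def add_document(self, doc_id, pii_values):
        # Performed once, when the document is added to the database.
        for value in pii_values:
            self._docs_by_pii[value].add(doc_id)

    def search(self, subject_pii):
        """Return sorted ids of documents mentioning any PII of the subject."""
        found = set()
        for value in subject_pii:
            found |= self._docs_by_pii.get(value, set())
        return sorted(found)


index = DocumentIndex()
index.add_document("invoice-1", ["000-00-0001", "555-0100"])
index.add_document("letter-7", ["555-0100"])
index.add_document("memo-3", ["000-00-0002"])

# PII elements already linked with the first subject (invented values).
documents = index.search(["000-00-0001", "555-0100"])
```

Each document returned in `documents` would then be handed to the identification step, so that PII elements of other subjects in those documents can be found and obfuscated.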
In accordance with an embodiment, searching each PII element includes searching a relation graph stored in the database. Each node of the graph represents an identified PII element, and each edge of the graph represents a link between a pair of identified PII elements. The computer-implemented method 200 searches the relation graph (or a weighted graph) by looking up each identified PII element among the nodes of the graph and each identified link between pairs of PII elements among the edges of the graph. Beneficially, the computer-implemented method 200 enables the data processing device 102 to discover the links between PII elements. The relation graph allows the data processing device 102 to determine which PII elements belong to the first subject. In accordance with an embodiment, each node of the relation graph further includes an accuracy score of each identified PII element based on an accuracy of the identification and a uniqueness score of each identified PII element based on a uniqueness of a type of the identified PII element. Each edge further includes a relation accuracy score of each link based on the accuracy of the link. The accuracy score describes the accuracy of the identification performed by the computer-implemented method 200 and lies in a range of 0 to 1. The uniqueness score describes how unique the PII element is and also lies between 0 and 1. A PII element that is unique by law, such as a social security number (SSN) or passport number (PPN), has a uniqueness score equal to 1. Other PII elements, such as the home address, the phone number, the credit card number, and the like, are assigned a value less than 1, where a higher value indicates a more unique PII element.
Furthermore, the relation accuracy score (which may also be named the PII relation accuracy score) describes the accuracy of the link identified by the lookup unit 108 and lies between 0 and 1. These scores enable the data processing device 102 to discover relations between PII elements. The relation graph allows the data processing device 102 to determine which PII entities, for example, the passport number, the credit card number, the name, and the like, belong to a specific individual. Beneficially, the accuracy score of each node enables the data processing device 102 to link the one or more PII elements related to the first subject efficiently and reliably.
In accordance with an embodiment, editing the document includes traversing the graph starting from the specified PII element and generating a list including each traversed PII element. The traversal is limited by a weighting factor based on the assigned scores. The input unit 104 is configured to receive the request, which includes one PII element, or more than one PII element, sufficient to identify the first subject. After the input unit 104 receives the request, the lookup unit 108 searches the relation graph stored in the database to gather the relevant information about the first subject by means of a graph traversal. The graph traversal starts from the PII element or the group of PII elements specified by the request. To gather the relevant information about the first subject, the lookup unit 108 generates the list including each traversed PII element. The graph traversal is limited by the weighting factor, which is computed from the accuracy score and the uniqueness score assigned to each node and the relation accuracy score assigned to each edge. Alternatively stated, the weighting factor limits the graph traversal to those PII elements that are closely related to the PII element or PII elements specified in the received request. The weighting factor is calculated for each node by multiplying the accuracy score of the node by a path weight, where the path weight is the product of the path weight of the preceding node, the uniqueness score of the preceding node, and the accuracy score of the relation between the two nodes. The weighting factor is a diminishing product because all the scores, that is, the accuracy score, the uniqueness score, and the relation accuracy score, lie between 0 and 1. This is advantageous because the generated list records which PII elements were traversed.
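The weighting factor described in words above can be written compactly as follows. The symbols are introduced here only for illustration and do not appear in the disclosure: a(n) is the accuracy score of node n, u(n) its uniqueness score, r(m, n) the relation accuracy score of the edge between m and n, P(n) the path weight, and W(n) the weighting factor. The starting value P(n_0) = 1 is an assumption, since the text does not state the path weight of the starting node, and the numbers in the last line are invented.

```latex
\begin{align*}
  W(n)   &= a(n)\, P(n), \\
  P(n)   &= P(m)\, u(m)\, r(m, n), \qquad \text{where } m \text{ is the node preceding } n, \\
  P(n_0) &= 1 \quad \text{(assumed for the starting node } n_0\text{)}. \\[4pt]
  \intertext{Unrolled along a path $n_0 \to n_1 \to n_2$:}
  W(n_2) &= a(n_2)\, u(n_0)\, r(n_0, n_1)\, u(n_1)\, r(n_1, n_2) \\
         &= 0.9 \times 1 \times 0.9 \times 0.5 \times 0.8 = 0.324 .
\end{align*}
```

Since every factor lies between 0 and 1, each additional hop can only lower W, which is what confines the traversal to PII elements closely related to the starting element.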
In accordance with an embodiment, editing the document further includes determining, for each PII element not linked with the specified PII element, whether a subject associated with the specified PII element is authorized to view the PII element. The editing of the document includes obfuscating any identified PII element not linked with the specified PII element, which the subject associated with the specified PII element is not authorized to view. The one or more PII elements include name, social security number (SSN), address, and another name.
Thus, the computer-implemented method 200 efficiently and accurately identifies the relevant information related to the first subject by identifying the one or more PII elements related to the first subject. Moreover, the computer-implemented method 200 identifies the one or more PII elements related to the first subject not only in a single document but across multiple requested documents. Additionally, the computer-implemented method 200 obfuscates the PII elements that are not linked with the first subject, which ensures data privacy and enables compliance with data privacy regulations.
The steps 202 to 210 are only illustrative, and other alternatives can also be provided where one or more steps are added, one or more steps are removed, or one or more steps are provided in a different sequence without departing from the scope of the claims herein.
In one aspect, there is provided a computer-readable medium comprising instructions which, when executed by a processor (e.g., the processor 118 of the data processing device 102), cause the processor to perform the computer-implemented method 200. In an example, the instructions may be implemented on computer-readable media, which include, but are not limited to, Electrically Erasable Programmable Read-Only Memory (EEPROM), Random Access Memory (RAM), Read-Only Memory (ROM), Hard Disk Drive (HDD), Flash memory, a Secure Digital (SD) card, Solid-State Drive (SSD), a computer-readable storage medium, and/or CPU cache memory. In an example, the instructions are generated by a computer program, which is implemented in view of the computer-implemented method 200, for use in implementing the computer-implemented method 200 on one or more processors, such as the processor 118 of the data processing device 102.
Modifications to embodiments of the present disclosure described in the foregoing are possible without departing from the scope of the present disclosure as defined by the accompanying claims. Expressions such as "including", "comprising", "incorporating", "have", "is" used to describe and claim the present disclosure are intended to be construed in a non-exclusive manner, namely allowing for items, components or elements not explicitly described also to be present. Reference to the singular is also to be construed to relate to the plural. The word "exemplary" is used herein to mean "serving as an example, instance or illustration". Any embodiment described as “exemplary” is not necessarily to be construed as preferred or advantageous over other embodiments and/or to exclude the incorporation of features from other embodiments. The word "optionally" is used herein to mean "is provided in some embodiments and not provided in other embodiments". It is appreciated that certain features of the present disclosure, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the present disclosure, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable combination or as suitable in any other described embodiment of the disclosure.

Claims

1. A data processing device (102), comprising: an input unit (104) configured to receive a request for a document, the request specifying at least one Personally Identifiable Information, PII, element associated with a first subject; an identification unit (106) configured to identify one or more PII elements in the requested document; a lookup unit (108) configured to search each identified PII element in a database to link identified PII elements associated with the same subject; a redaction unit (110) configured to edit the document to obfuscate one or more identified PII elements not linked with the specified PII element; and an output unit (112) configured to output the edited document.
2. The data processing device (102) of claim 1, wherein the specified PII element is associated with a first subject and the redaction unit is further configured to: determine, for each identified PII element not linked with the specified PII element, whether the first subject is authorized to view the identified PII element, and obfuscate any identified PII element linked with the specified PII element which the subject associated with the specified PII element is not authorized to view.
3. The data processing device (102) of claim 1 or claim 2, wherein the request for a document includes a request for a data subject access report, DSAR, for a subject associated with the specified PII element, and the data processing device (102) further comprises: a search unit (114) configured to search for one or more documents based on the subject and provide each document to the identification unit (106).
4. The data processing device (102) of any preceding claim, wherein the lookup unit (108) is configured to search a relation graph stored in the database, wherein each node of the graph represents an identified PII element, and each edge of the graph represents a link between pairs of identified PII elements.
5. The data processing device (102) of claim 4, wherein each node of the relation graph further includes an accuracy score of each identified PII element based on an accuracy of the identification and a uniqueness score of each identified PII element based on a uniqueness of a type of the identified PII element, and each edge further includes a relation accuracy score of each link based on an accuracy of the link.
6. The data processing device (102) of claim 5, wherein the lookup unit (108) is configured to traverse the graph starting from the specified PII element and generate a list including each traversed PII element, wherein the traversal is limited by a weighting factor based on the assigned scores.
7. A computer-implemented method (200) of data processing, comprising: receiving, by an input unit (104), a request for a document, the request specifying at least one Personally Identifiable Information, PII, element associated with a first subject; identifying, by an identification unit (106), one or more PII elements in the requested document; searching, by a lookup unit (108), each identified PII element in a database to link identified PII elements associated with the same subject; editing, by a redaction unit (110), the document to obfuscate one or more PII elements not linked with the specified PII element; and outputting, by an output unit (112), the edited document.
8. The computer-implemented method (200) of claim 7, wherein editing the document further comprises: determining, for each PII element not linked with the specified PII element, whether a subject associated with the specified PII element is authorised to view the PII element, and obfuscating any identified PII element not linked with the specified PII element which the subject associated with the specified PII element is not authorised to view.
9. The computer-implemented method (200) of claim 7 or claim 8, wherein the request for a document includes a request for a data subject access report, DSAR, for a subject associated with the specified PII element, and the method further comprises: searching, by a search unit (114), for one or more documents based on the subject, and providing each document to the identification unit (106).
10. The computer-implemented method (200) of any one of claims 7 to 9, wherein searching each PII element comprises searching a relation graph stored in the database, wherein each node of the graph represents an identified PII element and each edge of the graph represents a link between pairs of identified PII elements.
11. The computer-implemented method (200) of claim 10, wherein each node of the relation graph further includes an accuracy score of each identified PII element based on an accuracy of the identification and a uniqueness score of each identified PII element based on a uniqueness of a type of the identified PII element, and each edge further includes a relation accuracy score of each link based on an accuracy of the link.
12. The computer-implemented method (200) of claim 11, wherein editing the document comprises traversing the graph starting from the specified PII element and generating a list including each traversed PII element, wherein the traversal is limited by a weighting factor based on the assigned scores.
13. A computer-readable medium comprising instructions which, when executed by a processor, cause the processor to perform the method of any one of claims 7 to 12.
PCT/EP2022/059214 2022-04-07 2022-04-07 Data processing device and method of data processing WO2023193908A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/EP2022/059214 WO2023193908A1 (en) 2022-04-07 2022-04-07 Data processing device and method of data processing

Publications (1)

Publication Number Publication Date
WO2023193908A1 true WO2023193908A1 (en) 2023-10-12

Family

ID=81580695

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2022/059214 WO2023193908A1 (en) 2022-04-07 2022-04-07 Data processing device and method of data processing

Country Status (1)

Country Link
WO (1) WO2023193908A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160321462A1 (en) * 2015-05-01 2016-11-03 International Business Machines Corporation Audience-based sensitive information handling for shared collaborative documents
US20190213354A1 (en) * 2018-01-09 2019-07-11 Accenture Global Solutions Limited Automated secure identification of personal information
US20220043935A1 (en) * 2020-08-06 2022-02-10 OneTrust, LLC Data processing systems and methods for automatically redacting unstructured data from a data subject access request

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22721323

Country of ref document: EP

Kind code of ref document: A1