US20230261856A1 - Deterministic cryptography deidentification with granular data destruction - Google Patents

Info

Publication number
US20230261856A1
Authority
US
United States
Prior art keywords
token
name
data
deidentification
token pair
Legal status
Granted
Application number
US17/674,118
Other versions
US11757626B1
Inventor
Ofer Rivlin
Current Assignee
Cyberark Software Ltd
Original Assignee
Cyberark Software Ltd
Application filed by Cyberark Software Ltd
Priority to US17/674,118
Assigned to CYBERARK SOFTWARE LTD. (assignment of assignors interest; see document for details). Assignor: RIVLIN, OFER
Publication of US20230261856A1
Application granted
Publication of US11757626B1
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/08 Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
    • H04L9/0816 Key establishment, i.e. cryptographic processes or cryptographic protocols whereby a shared secret becomes available to two or more parties, for subsequent use
    • H04L9/0819 Key transport or distribution, i.e. key establishment techniques where one party creates or otherwise obtains a secret value, and securely transfers it to the other(s)
    • H04L9/0825 Key transport or distribution, i.e. key establishment techniques where one party creates or otherwise obtains a secret value, and securely transfers it to the other(s) using asymmetric-key encryption or public key infrastructure [PKI], e.g. key signature or public key certificates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6254 Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6263 Protecting personal data, e.g. for financial or medical purposes during internet communication, e.g. revealing personal data from cookies
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/08 Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
    • H04L9/0861 Generation of secret information including derivation or calculation of cryptographic keys or passwords
    • H04L9/0869 Generation of secret information including derivation or calculation of cryptographic keys or passwords involving random numbers or seeds
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/08 Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
    • H04L9/0894 Escrow, recovery or storing of secret information, e.g. secret key escrow or cryptographic key storage
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/32 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials
    • H04L9/321 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials involving a third party or a trusted authority
    • H04L9/3213 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials involving a third party or a trusted authority using tickets or tokens, e.g. Kerberos
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/32 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials
    • H04L9/3234 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials involving additional secure or trusted devices, e.g. TPM, smartcard, USB or software token
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2209/00 Additional information or applications relating to cryptographic mechanisms or cryptographic arrangements for secret or secure communication H04L9/00
    • H04L2209/42 Anonymization, e.g. involving pseudonyms

Definitions

  • Lookup-table-based tokenization schemes for de-identification can achieve destruction of data.
  • The granularity that is possible with lookup tables is limited to item-level granularity, rather than group-level granularity.
  • Each item gets its own table row, so memory consumption grows as O(n) in the number of items.
  • Lookup tables are thus not a preferred cloud solution due to scaling issues at large volumes.
  • Personal identifiable information (PII) data should be protected. This may include, for example, biological, social, economic, or other data that is sensitive to individuals. Likewise, other sensitive types of data (e.g., business data, server logs, testing data, communication data, etc.) may also need to be protected. Sensitive data of these types can be masked in many ways to ensure protection. If data is analyzed in the cloud and the results are returned to a customer, any specific identifying information should be deidentified before the data is processed in the cloud. However, the information should be re-identified when the results are returned to the customer or other owner. Re-identification (the reversal of de-identification) is possible when de-identification is performed by cryptographically encrypting the data; re-identification is then performed by decrypting the encrypted data.
  • Crypto-shredding is a process that destroys data by destroying the cryptographic keys that protect the data. Data sets that are protected entirely by one cryptographic key will in turn be destroyed in their entirety when the cryptographic key is destroyed.
  • Having a separate cryptographic key for each data item results in large and burdensome overhead, which can be complex and expensive.
  • Granularity may be desired, such as by separating data into groups (e.g., by month and year). Each data item must also be unique so that items can be differentiated from one another. This differentiation can be achieved through deterministic encryption, which, under a given key, always maps the same input to the same output and different inputs to different outputs.
  • Technological solutions should therefore be able to precisely perform de-identification and re-identification of data without destroying an entire data set, as happens when the single cryptographic key protecting that set is destroyed. Further, solutions should combine deterministic encryption with the separation of data into groups to enable granular destruction. Additional technological problems and corresponding solutions are addressed in the following detailed description.
  • A de-identification process may involve encrypting sensitive data, such as a person’s name, and all sensitive data associated with that name.
  • The person’s information may all be associated with, for example, a specific username.
  • The encryption process may use the same encryption key for the person’s specific username, and upon completion of the analysis the results may be returned to the person, where only the person can reidentify the data based on the specific username.
  • Disclosed embodiments include non-transitory computer readable media, systems, and methods for deterministic cryptography deidentification enabling granular destruction. For example, there may be a non-transitory computer readable medium including instructions that, when executed by at least one processor, cause the at least one processor to perform operations for deterministic cryptography deidentification enabling granular destruction.
  • The operations may comprise preparing a table of name-token pair groupings with unique tokens, wherein the name-token pair groupings are configured to be used in a deidentification process; storing data deidentified in association with the deidentification process in a centralized repository; identifying a token from the table of name-token pair groupings; and enabling reidentification of a specific data item of the deidentified data based on the token provided from the table.
  • The operations may further comprise disposal of the token from the table of name-token pair groupings.
  • The deterministic cryptography is performed via authenticated encryption with associated data (AEAD) cryptography.
  • The data includes personal identifiable information.
  • The table of name-token pair groupings is prepared by a cryptography random generator.
  • The table of name-token pair groupings includes {group-name: group-token} pairs.
  • The deidentification process further comprises use of an encryption key and a tag.
  • The token is smaller in size than the encryption key used in the deidentification process.
  • The deidentification of data uses the table of name-token pair groupings.
  • There may also be a non-transitory computer readable medium including instructions that, when executed by at least one processor, cause the at least one processor to perform operations for granular destruction of data deidentified by deterministic cryptography.
  • The operations may comprise preparing a table of name-token pair groupings with unique tokens, wherein the name-token pair groupings are configured to be used in a deidentification process; deidentifying data using the table of name-token pair groupings; storing data deidentified in association with the deidentification process in a centralized repository; and disposing of a token of the table of name-token pair groupings.
  • The operations may further comprise identifying the token from the table of name-token pair groupings; and enabling reidentification of a specific data item of deidentified data associated with the deidentification process based on the token provided from the table.
  • The data includes personal identifiable information.
  • The deterministic cryptography is performed via authenticated encryption with associated data (AEAD) cryptography.
  • The table of name-token pair groupings is prepared by a cryptography random generator.
  • The table of name-token pair groupings includes {group-name: group-token} pairs.
  • The deidentification process further comprises use of an encryption key and a tag.
  • The token is smaller in size than the encryption key used in the deidentification process.
  • A method may be implemented for deterministic cryptography deidentification enabling granular destruction.
  • The method may comprise preparing a table of name-token pair groupings with unique tokens, wherein the name-token pair groupings are configured to be used in a deidentification process; storing data deidentified in association with the deidentification process in a centralized repository; identifying a token from the table of name-token pair groupings; and enabling reidentification of a specific data item of the deidentified data based on the token provided from the table.
  • The method may further comprise disposing of the token from the table of name-token pair groupings.
  • The data includes personal identifiable information.
  • FIG. 1 is a block diagram of an exemplary system for performing operations for deterministic cryptographic deidentification enabling granular destruction of data, consistent with disclosed embodiments.
  • FIG. 2 is a block diagram of an example grouping of name-token pairs, consistent with disclosed embodiments.
  • FIG. 3 illustrates an exemplary flowchart of a method for deterministic cryptographic deidentification enabling granular destruction of data, consistent with disclosed embodiments.
  • FIG. 4 illustrates an exemplary flowchart of another method for deterministic cryptographic deidentification enabling granular destruction of data, consistent with disclosed embodiments.
  • FIG. 1 is a block diagram of an exemplary system 100 for deterministic, cryptographic deidentification enabling granular destruction of data.
  • System 100 may include a memory or storage device (e.g., one or more non-transitory computer readable media).
  • The processor(s) 102 may take the form of, but are not limited to, a microprocessor, embedded processor, or the like. According to some embodiments, processor(s) 102 may be from the family of processors manufactured by Intel®, AMD®, Qualcomm®, Apple®, NVIDIA®, or the like.
  • The processor(s) 102 may also be based on the ARM architecture, a mobile processor, or a graphics processing unit, etc. The disclosed embodiments are not limited to any particular type of processor(s) 102 in system 100. As discussed herein, the processor(s) 102 may perform operations for deterministic cryptographic deidentification enabling granular destruction. These operations are discussed in more detail below.
  • Memory 101 may include one or more storage devices configured to store instructions used by the processor(s) 102 to perform functions related to deterministic cryptographic deidentification described herein.
  • The disclosed embodiments are not limited to particular software programs or devices configured to perform dedicated tasks.
  • The memory 101 may store a single program, such as a user-level application, that performs the functions associated with the disclosed embodiments, or may comprise multiple software programs.
  • The processor(s) 102 may, in some embodiments, execute one or more programs (or portions thereof).
  • Memory 101 may include one or more storage devices configured to store data for use by the programs.
  • Memory 101 may include, but is not limited to, a hard drive, a solid state drive, a CD-ROM drive, a peripheral storage device (e.g., an external hard drive, a USB drive, etc.), a network drive, a cloud storage device, or any other storage device.
  • These operations performed by processor(s) 102 may include preparing a table of name-token pair groupings 103.
  • FIG. 2 further depicts an example of a table of name-token pair groupings 201.
  • The table of name-token pair groupings 201 may configure the groupings for use in a deidentification process.
  • The data deidentified in association with the deidentification process may be stored in centralized repository 104.
  • The processor(s) 102 may also perform an operation for token identification 105 from the table of name-token pair groupings.
  • The processor(s) 102 may also perform an operation for reidentification of specific data 106 based on the token identification 105.
  • Aspects of this disclosure may include preparing a table of name-token pair groupings 103 with unique tokens.
  • The table may include a set of data arranged in rows and columns. Other formats of data organization are possible as well.
  • The table of name-token pair groupings 103 and centralized repository 104 may be included on one or more volatile or non-volatile, magnetic, semiconductor, tape, optical, removable, non-removable, or other type of storage device or tangible or non-transitory computer-readable medium.
  • Table of name-token pair groupings 103 and centralized repository 104 may also be part of the same server or cluster of servers, or of disparate servers.
  • Table of name-token pair groupings 103 and centralized repository 104 may include one or more memory devices that store data and instructions used to perform one or more features of the disclosed embodiments.
  • Table of name-token pair groupings 103 and centralized repository 104 may include any suitable databases, ranging from small databases hosted on a work station to large databases distributed among data centers.
  • Table of name-token pair groupings 103 and centralized repository 104 may also include any combination of one or more databases controlled by memory controller devices (e.g., server(s), etc.) or software.
  • Table of name-token pair groupings 103 and centralized repository 104 may include document management systems, Microsoft SQL® databases, SharePoint® databases, Oracle® databases, Sybase® databases, other relational databases, or non-relational databases, such as MongoDB and others.
  • A token may include an object that represents the right to perform an operation, including but not limited to security, access, and control.
  • The right to perform an operation may also identify an identity that is able to perform the operation. For example, an identity may be referenced according to a security policy or access-control policy to determine whether the identity can perform an operation.
  • A unique token may include an exclusive or particular object that represents the right to perform an operation, including but not limited to security, access, and control.
  • A name-token pair grouping may include a character string (name) paired with a randomly generated string (token).
  • Each name-token pair grouping may be unique.
  • A name-token pair grouping may use the name of a project, person, company, date-stamp, time-stamp, etc.
  • System 100 may prepare a table of names and tokens that are grouped together, and each token may be unique. Table 200 in FIG. 2 shows an exemplary format of name-token groupings. Other formats of name-token groupings are possible as well.
  • The name-token pair groupings 103 may be configured to be used in a deidentification process, as described further below (e.g., in the processes of FIG. 3 and FIG. 4).
  • A deidentification process may include detecting identifiers that directly or indirectly point to a person, entity, or object, and deleting those identifiers from the data. By deleting the identifiers, the underlying data may be effectively and granularly deidentified.
  • The configuration may include several steps. For example, it may include creating the name-token pair grouping 103, as shown in FIG. 2. It may further include retrieving a token from the name-token pair grouping 103 (e.g., based on the name value) and marking the data with the selected token. The marking may be done cryptographically or otherwise.
  • The table of name-token pair groupings 103 may be prepared by a cryptography random generator.
  • A cryptography random generator may include a process for creating cryptographically strong random values. This may be performed using, for example, a cryptographically secure pseudorandom number generator (CSPRNG) or cryptographic pseudorandom number generator (CPRNG).
  • The values produced by the cryptography random generator should exhibit properties including, but not limited to, appearing random, being unpredictable in advance, and not being reliably reproducible after generation.
  • The table of name-token pair groupings 103 may be created by randomly assigning each name-token pair grouping based on the output of a cryptographic random generator (e.g., CSPRNG, CPRNG, or the like).
  • The table of name-token pair groupings may include {group-name: group-token} pairs.
  • The group name may include the name of a month.
  • The group token may be a randomly generated string, as discussed above.
  • Centralized repository 104 may include a collection of stored data from existing databases, deployed by consolidating data from multiple sources.
  • Centralized repository 104 may include a data lake, a data warehouse, or other types of data storage.
  • Centralized repository 104 may thus be based on architectures such as AWS Data Lake®, Google Data Lake®, Azure Data Lake®, Cloudera Data Platform®, Databricks Unified Analytics Platform®, or others.
  • The data may include personal identifiable information.
  • The data may also include other types of sensitive business, biological, social, technical, or economic data. Further, the data may include sets of values of qualitative or quantitative variables about one or more persons, entities, or objects.
  • Personal identifiable information may include any representation of information that permits the identity of an individual, entity, or object to whom the information applies to be reasonably inferred by either direct or indirect means. For example, personal identifiable information may include, but is not limited to, a passport number, financial account number, or driver’s license number, among many other types.
  • The deidentification process may involve deidentification of sensitive data.
  • Sensitive data may include data that contains personal identifiable information or other confidential data, as discussed above.
  • This process may include encrypting the personal identifiable information before the data is fetched.
  • The encryption may be done symmetrically (e.g., using ciphers such as AES, Blowfish, CAST5, RC4, DES, 3DES, etc.) or asymmetrically (e.g., using public-key encryption such as RSA, together with related public-key schemes such as Diffie-Hellman or YAK key agreement and DSS signatures).
  • The deidentification process further comprises use of an encryption key (symmetric or asymmetric) and a tag.
  • An encryption key may include a piece of information, usually a string of numbers or letters stored in a file, which, when processed through a cryptographic algorithm, can encode or decode cryptographic data.
  • A tag may include a keyword or term assigned to a piece of information.
  • Other examples of encryption keys and tags are possible as well.
  • The encryption key is randomly generated by a cryptographically secure pseudorandom number generator (CSPRNG) or cryptographic pseudorandom number generator (CPRNG), as discussed above.
  • The tag may also be, for example, randomly generated by a CSPRNG or CPRNG.
  • The tag may also be secured using hashing. Hashing may include changing a plain text or a key value into a hashed value by applying a hash function.
  • The hash function may be, for example, based on CRC (16/32/64), Adler-32, BSD checksum, sum (8/16/24/32), Fletcher (4/8/16/32), or various other techniques. Hashing helps detect changes to the tag; note, however, that these checksums are unkeyed and detect only accidental corruption, so detecting deliberate tampering requires a keyed message authentication code such as HMAC.
  • The encryption key may be attached to the tag. This attachment can be useful in the decryption process.
  • The token may be smaller in size than the encryption key used in the deidentification process.
  • Aspects of this disclosure may include identifying a token from the table of name-token pair groupings 103.
  • The token may be identified based on the corresponding name or another identifier.
  • Aspects may also include enabling reidentification of a specific data item of the deidentified data based on the token provided from the table of name-token pair groupings 103.
  • A data item may include a single unit of data in a storage record, such as the smallest possible unit of information or a single entry or field of data.
  • A data item may include personal identifiable information or other sensitive information, as discussed above. For example, “JSmith” and “JDoe” are data items that might be associated with a person’s name. As described herein, deidentified data may include data from which all personally identifiable information has been removed.
  • The token can consist of random or pseudo-random characters, as discussed above.
  • A table may include a set of data arranged in rows and columns. When data is deidentified, as discussed below, the token from the table of name-token pair groupings 103 may be disposed of (e.g., deleted or moved).
  • Disposing of the token may include transferring the token to the control of another entity (e.g., an application, server, or third party), getting rid of it, or placing, distributing, or arranging it in an orderly way.
  • A token may include an object that represents the right to perform an operation, including but not limited to security, access, and control. This may be based on, for example, a security policy, security group memberships, a least-privilege security framework, etc. In some situations, the rights to perform operations may be based on an Active Directory® framework, CyberArk Privileged Access Management® framework, AWS Identity and Access Management® framework, or others.
  • A name-token pair grouping may use the name of a project.
  • A table of name-token pair groupings 201 may include a group token 202, cipher text 203, and a tag 204, as discussed above.
  • A deterministic cryptography technique may be performed via authenticated encryption with associated data (AEAD) cryptography.
  • Deterministic cryptography may include a type of encryption that always produces the same ciphertext given the same plaintext and key. Examples include textbook (unpadded) RSA and block ciphers applied without a random IV or nonce, among others as noted above.
  • More generally, cryptography includes the practice and study of techniques for secure communication.
  • Authenticated encryption with associated data may include a type of encryption that allows a recipient to check the integrity of both the encrypted and the unencrypted (associated) information in a message.
  • Stored data may be deidentified according to various techniques. Data may be deidentified using the table of name-token pair groupings 103 discussed above. In some embodiments, deidentifying data may include detecting identifiers that directly or indirectly point to a person, entity, or object, and deleting those identifiers from the data. As discussed herein, data may include sets of values of qualitative or quantitative variables about one or more persons, entities, or objects.
  • FIG. 3 is a block diagram of an exemplary method 300 performed by a processor of a computer or computer-based system, consistent with disclosed embodiments.
  • Process 300 may be carried out at centralized repository 104, memory 101, processor(s) 102, or a separate computing system.
  • Process 300 may be performed by a data security application running at processor(s) 102.
  • Operation 301 may include preparing a table of name-token pair groupings 103 with unique tokens, for use in a deidentification process.
  • the table of name-token pair groupings 201 may include a group token 202 , cipher text 203 , and a tag 204 .
  • the table of name-token pair groupings 201 may configure the groupings for use in a deidentification process, as discussed above.
  • Process 300 may also include an operation 302 of storing data deidentified in association with the deidentification process in a centralized repository 104 .
  • the data includes personal identifiable information or other sensitive information.
  • centralized repository 104 may take the form of a data lake, data warehouse, or other storage, and may be based on architectures such as AWS Data Lake®, Google Data Lake®, Azure Data Lake®, Cloudera Data Platform, Databricks Unified Analytics Platform®, or others.
  • Process 300 may further include an operation 303 of identifying a token from the table of name-token pair groupings 103 . This identification may be based on a name value of the name-token pair groupings 103 or another identifiable attribute of the name-token pair groupings 103 . In some embodiments, operation 303 may further comprise disposing of the token from the table of name-token pair groupings. Disposing of the token may include deleting the token, scrambling the token, moving the token, etc.
  • process 300 may further include enabling reidentification of a specific data item of the deidentified data based on the token provided from the table 103 . Consistent with the above discussion, the reidentification may be based on a name attribute or other unique identifier.
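The operations of process 300 can be sketched end to end as below. This is an illustrative sketch, not the disclosed implementation: the master key, the derivation of a per-group key from the master key and the group token, the HMAC-SHA256 synthetic-IV cipher, the month-based group names, and the in-memory list standing in for centralized repository 104 are all assumptions made for the example.

```python
import hashlib
import hmac
import secrets

# Hypothetical long-term key held by the data owner; never stored with the repository.
MASTER_KEY = secrets.token_bytes(32)


def _prf(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()


def group_key(group_token: bytes) -> bytes:
    # Assumption for this sketch: each group's key is derived from the master key and token.
    return _prf(MASTER_KEY, b"group-key" + group_token)


def _stream(key: bytes, iv: bytes, n: int) -> bytes:
    # The 0x01 prefix keeps keystream inputs distinct from the 0x00-prefixed IV input.
    blocks = (_prf(key, b"\x01" + iv + i.to_bytes(4, "big")) for i in range(n // 32 + 1))
    return b"".join(blocks)[:n]


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    iv = _prf(key, b"\x00" + plaintext)[:16]  # synthetic IV, so encryption is deterministic
    return iv + bytes(a ^ b for a, b in zip(plaintext, _stream(key, iv, len(plaintext))))


def decrypt(key: bytes, ciphertext: bytes) -> bytes:
    iv, body = ciphertext[:16], ciphertext[16:]
    plaintext = bytes(a ^ b for a, b in zip(body, _stream(key, iv, len(body))))
    if not hmac.compare_digest(iv, _prf(key, b"\x00" + plaintext)[:16]):
        raise ValueError("authentication failed")
    return plaintext


# Operation 301: a table of {group-name: group-token} pairs with unique random tokens.
table = {name: secrets.token_bytes(16) for name in ("2022-01", "2022-02", "2022-03")}

# Operation 302: a plain list stands in for the centralized repository (e.g., a data lake).
repository: list[dict] = []


def deidentify_and_store(group: str, value: str) -> None:
    token = table[group]
    repository.append({"group": group, "cipher_text": encrypt(group_key(token), value.encode())})


def reidentify(item: dict) -> str:
    # Operation 303: identify the token by the item's group name, then decrypt that item only.
    token = table[item["group"]]
    return decrypt(group_key(token), item["cipher_text"]).decode()
```

Calling `deidentify_and_store("2022-01", "JSmith")` twice stores identical ciphertexts, and `reidentify` decrypts a single stored item using only its own group's token.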
  • FIG. 4 is a flowchart of an exemplary method 400 performed by a processor of a computer or computer-based system, consistent with disclosed embodiments.
  • method 400 may be carried out at centralized repository 104 , memory 101 , processor(s) 102 , or a separate computing system.
  • process 400 may be performed by a data security application running at processor(s) 102 .
  • Process 400 may include an operation 401 of preparing a table of name-token pair groupings 103 with unique tokens.
  • a table of name-token pair groupings 201 may include a group token 202 , cipher text 203 , and a tag 204 .
  • the table of name-token pair groupings 201 may configure the groupings for use in a deidentification process, as discussed above.
  • Process 400 may also include an operation 402 of deidentifying data using table of name-token pair groupings 103 . This may be performed according to the techniques discussed above. For example, deidentifying data may be based on deleting, scrambling, or moving a token associated with certain data in the table of name-token pair groupings 103 .
  • in an operation 403, process 400 may store data deidentified in association with the deidentification process in a centralized repository 104 .
  • the centralized repository 104 may be a data lake, data warehouse, or other storage, and may be based on architectures such as AWS Data Lake®, Google Data Lake®, Azure Data Lake®, Cloudera Data Platform®, Databricks Unified Analytics Platform®, or others.
  • Process 400 may also include an operation 404 of disposing of a token of table of name-token pair groupings 103 .
  • the token can be deleted, scrambled, moved, etc.
  • the deidentification of data can be highly granular.
  • Data may be deidentified based on the token, and thus other data need not be deidentified as well. This approach thus offers significantly more precision and granularity than other available techniques.
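To illustrate the granular destruction of process 400, the following hypothetical sketch derives each group's key from a master key and that group's token, so deleting one token from the table renders only that group's stored ciphertexts undecryptable. The keyed-BLAKE2b synthetic-IV cipher, the key derivation, and all names are assumptions for illustration; the 16-byte tokens are, consistent with the description, smaller than the 32-byte keys.

```python
import hashlib
import hmac
import secrets

MASTER_KEY = secrets.token_bytes(32)  # hypothetical long-term key held by the data owner


def _prf(key: bytes, data: bytes) -> bytes:
    # Keyed BLAKE2b serves as the pseudorandom function throughout this sketch.
    return hashlib.blake2b(data, key=key, digest_size=32).digest()


def _xor_stream(key: bytes, iv: bytes, data: bytes) -> bytes:
    blocks = (_prf(key, b"\x01" + iv + i.to_bytes(4, "big")) for i in range(len(data) // 32 + 1))
    return bytes(a ^ b for a, b in zip(data, b"".join(blocks)))


def _seal(key: bytes, plaintext: bytes) -> bytes:
    iv = _prf(key, b"\x00" + plaintext)[:16]  # deterministic synthetic IV
    return iv + _xor_stream(key, iv, plaintext)


def _open(key: bytes, ciphertext: bytes) -> bytes:
    iv, body = ciphertext[:16], ciphertext[16:]
    plaintext = _xor_stream(key, iv, body)
    if not hmac.compare_digest(iv, _prf(key, b"\x00" + plaintext)[:16]):
        raise ValueError("authentication failed")
    return plaintext


def _group_key(token: bytes) -> bytes:
    # Assumption: a group's key can be rebuilt only while both the master key and its token exist.
    return _prf(MASTER_KEY, b"group-key" + token)


table = {name: secrets.token_bytes(16) for name in ("2022-01", "2022-02")}  # operation 401
repository: list[tuple[str, bytes]] = []  # operation 403: stand-in for the repository


def deidentify(group: str, value: str) -> None:  # operations 402 and 403
    repository.append((group, _seal(_group_key(table[group]), value.encode())))


def dispose(group: str) -> None:  # operation 404: delete the group token
    del table[group]


def reidentify(group: str, cipher_text: bytes) -> str:
    if group not in table:
        raise KeyError(f"token for {group!r} was disposed of; its data cannot be decrypted")
    return _open(_group_key(table[group]), cipher_text).decode()
```

Note that deleting a dictionary entry does not guarantee that the token's bytes are erased from memory, backups, or replicas; in practice, disposal has to reach every copy of the token for the destruction to hold.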
  • the disclosed embodiments may be implemented in a system, a method, and/or a computer program product.
  • the computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.
  • the computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device.
  • the computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing.
  • a non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing.
  • a computer readable storage medium is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
  • Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network.
  • the network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers.
  • a network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
  • Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages.
  • the computer readable program instructions may execute entirely on the user’s computer, partly on the user’s computer, as a stand-alone software package, partly on the user’s computer and partly on a remote computer or entirely on the remote computer or server.
  • the remote computer may be connected to the user’s computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
  • electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.
  • These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
  • the computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • each block in the flowcharts or block diagrams may represent a software program, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s).
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Storage Device Security (AREA)

Abstract

Disclosed embodiments relate to systems and methods for deterministic cryptography deidentification enabling granular destruction. Techniques include preparing a table of name-token pair groupings with unique tokens, storing data deidentified in association with the deidentification process in a centralized repository, identifying a token from the table of name-token pair groupings, and enabling reidentification of a specific data item of the deidentified data based on the token provided from the table.

Description

    BACKGROUND
  • Lookup-table-based tokenization schemes for de-identification (used instead of, e.g., cryptographic encryption and decryption) can achieve destruction of data. However, to also keep data items unique, the granularity possible with lookup tables is limited to item-level granularity rather than group-level granularity. In a lookup table, each item gets its own table row, so memory consumption grows as O(n) in the number of items n. Lookup tables therefore scale poorly at large volumes and are not a preferred cloud solution.
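The scaling difference can be seen by counting table rows. In this hypothetical sketch, an item-level lookup table needs one random token per distinct value, while a group-level table needs one token per group (for example, per month) regardless of how many items each group covers. The function names and the 16-byte hexadecimal tokens are assumptions.

```python
import secrets


def lookup_table_tokenize(values: list[str]) -> dict[str, str]:
    """Item-level lookup table: one row per distinct value, so the table grows as O(n)."""
    table: dict[str, str] = {}
    for value in values:
        if value not in table:
            table[value] = secrets.token_hex(16)
    return table


def group_token_table(group_names: list[str]) -> dict[str, str]:
    """Group-level table: one row per group, however many items each group covers."""
    return {name: secrets.token_hex(16) for name in group_names}


if __name__ == "__main__":
    users = [f"user{i}" for i in range(100_000)]
    print(len(lookup_table_tokenize(users)))  # one row per user
    print(len(group_token_table([f"2022-{m:02d}" for m in range(1, 13)])))  # one row per month
```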
  • Personal identifiable information (PII) data, as well as any other type of sensitive data, should be protected. This may include, for example, biological, social, economic, or other data that is sensitive to individuals. Likewise, other sensitive types of data (e.g., business data, server logs, testing data, communication data, etc.) may also need to be protected. Sensitive data of these types can be masked in many ways to ensure protection. If the data is analyzed in the cloud and the results are then returned to a customer, any specific identifying information should be deidentified before the data is processed in the cloud. However, the information should be re-identified when the results are returned to the customer or other owner. Re-identification (a reversal of the de-identification) is possible when the de-identification is performed by cryptographically encrypting the data; re-identification is then performed by decrypting the encrypted data.
  • Crypto-shredding is a process that destroys data by destroying the cryptographic keys that protect the data. A data set that is protected entirely by one cryptographic key is destroyed in its entirety when that key is destroyed. However, maintaining a separate cryptographic key for each data item results in large and burdensome overhead, which can be complex and expensive. In view of this overhead, an intermediate granularity may be desired, such as separating data into groups, e.g., by month and year. Each data item must nevertheless remain unique so that items can be distinguished from one another. This differentiation can be achieved through deterministic encryption.
  • In view of these issues, there are technological needs for systems and methods that perform operations for deterministic cryptographic deidentification enabling granular data destruction. Advantageously, such solutions should be able to precisely perform de-identification and re-identification of data without destroying an entire data set and without destroying the single cryptographic key that protects it. Further, solutions should combine deterministic encryption with the separation of data into groups to enable granular destruction. Additional technological problems and corresponding solutions are addressed in the following detailed description.
  • For example, a de-identification process may involve encrypting sensitive data such as a person’s name and all sensitive data associated with that name. The person’s information may all be associated with, for example, a specific username. The encryption process may use the same encryption key for the person’s specific username, and upon completion of the analysis, the results may be returned to the person, who alone can reidentify the data based on the specific username.
  • SUMMARY
  • The disclosed embodiments describe non-transitory computer readable media, systems, and methods for deterministic cryptography deidentification enabling granular destruction. For example, in an exemplary embodiment, there may be a non-transitory computer readable medium including instructions that, when executed by at least one processor, cause the at least one processor to perform operations for deterministic cryptography deidentification enabling granular destruction. The operations may comprise preparing a table of name-token pair groupings with unique tokens, wherein the name-token pair groupings are configured to be used in a deidentification process; storing data deidentified in association with the deidentification process in a centralized repository; identifying a token from the table of name-token pair groupings; and enabling reidentification of a specific data item of the deidentified data based on the token provided from the table.
  • According to a disclosed embodiment, the operations further comprise disposal of the token from the table of name-token pair groupings.
  • According to a disclosed embodiment, the deterministic cryptography is performed via authenticated encryption with associated data cryptography.
  • According to a disclosed embodiment, the data includes personal identifiable information.
  • According to a disclosed embodiment, the table of name-token pair groupings is prepared by a cryptography random generator.
  • According to a disclosed embodiment, the table of name-token pair groupings includes {group-name: group-token} pairs.
  • According to a disclosed embodiment, the deidentification process further comprises use of an encryption key and a tag.
  • According to a disclosed embodiment, the token is smaller in size than the encryption key used in the deidentification process.
  • According to a disclosed embodiment, the deidentification of data uses the table of name-token pair groupings.
  • According to another disclosed embodiment, there may be a non-transitory computer readable medium including instructions that, when executed by at least one processor, cause the at least one processor to perform operations for granular destruction of data deidentified by deterministic cryptography. The operations may comprise preparing a table of name-token pair groupings with unique tokens, wherein the name-token pair groupings are configured to be used in a deidentification process; deidentifying data using the table of name-token pair groupings; storing data deidentified in association with the deidentification process in a centralized repository; and disposing of a token of the table of name-token pair groupings.
  • According to a disclosed embodiment, the operations further comprise identifying the token from the table of name-token pair groupings; and enabling reidentification of a specific data item of deidentified data associated with the deidentification process based on the token provided from the table.
  • According to a disclosed embodiment, the data includes personal identifiable information.
  • According to a disclosed embodiment, the deterministic cryptography is performed via authenticated encryption with associated data cryptography.
  • According to a disclosed embodiment, the table of name-token pair groupings is prepared by a cryptography random generator.
  • According to a disclosed embodiment, the table of name-token pair groupings includes {group-name: group-token} pairs.
  • According to a disclosed embodiment, the deidentification process further comprises use of an encryption key and a tag.
  • According to a disclosed embodiment, the token is smaller in size than the encryption key used in the deidentification process.
  • According to another disclosed embodiment, a method may be implemented for deterministic cryptography deidentification enabling granular destruction. The method may comprise preparing a table of name-token pair groupings with unique tokens, wherein the name-token pair groupings are configured to be used in a deidentification process; storing data deidentified in association with the deidentification process in a centralized repository; identifying a token from the table of name-token pair groupings; and enabling reidentification of a specific data item of the deidentified data based on the token provided from the table.
  • According to a disclosed embodiment, the method further comprises disposing of the token from the table of name-token pair groupings.
  • According to a disclosed embodiment, the data includes personal identifiable information.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate disclosed embodiments and, together with the description, serve to explain the disclosed embodiments. In the drawings:
  • FIG. 1 is a block diagram of an exemplary system for performing operations for deterministic cryptographic deidentification enabling granular destruction of data, consistent with disclosed embodiments.
  • FIG. 2 is a block diagram of an example grouping of name-token pairs, consistent with disclosed embodiments.
  • FIG. 3 illustrates an exemplary flowchart of a method for deterministic cryptographic deidentification enabling granular destruction of data, consistent with disclosed embodiments.
  • FIG. 4 illustrates an exemplary flowchart of another method for deterministic cryptographic deidentification enabling granular destruction of data, consistent with disclosed embodiments.
  • DETAILED DESCRIPTION
  • In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the disclosed example embodiments. However, it will be understood by those skilled in the art that the principles of the example embodiments may be practiced without every specific detail. Well-known methods, procedures, and components have not been described in detail so as not to obscure the principles of the example embodiments. Unless explicitly stated, the example methods and processes described herein are not constrained to a particular order or sequence, or constrained to a particular system configuration. Additionally, some of the described embodiments or elements thereof can occur or be performed simultaneously, at the same point in time, or concurrently.
  • The techniques discussed herein address several technological needs for systems that perform operations for deterministic cryptographic deidentification enabling granular destruction of data. In prior techniques, destruction of a cryptographic key may result in destruction of the entire data set protected via encryption with that key. Such destruction of data may be desired for many reasons, including compliance with privacy regulations such as the General Data Protection Regulation (GDPR). Data destruction may also be desired to comply with corporate privacy policies or data retention policies, during server maintenance, or for various other reasons. According to prior techniques, however, using one cryptographic key for each data item often results in a large overhead, making those approaches inefficient and inflexible. Prior systems also often offer granularity only at the group level, and thus are not truly or adequately granular at all. In contrast to such inadequate approaches, there are needs for deterministic encryption techniques that distinguish each data item. These and related security and efficiency problems are addressed by the disclosed embodiments herein.
  • Reference will now be made in detail to the disclosed embodiments, examples of which are illustrated in the accompanying drawings.
  • FIG. 1 is a block diagram of an exemplary system 100 for deterministic, cryptographic deidentification enabling granular destruction of data. In accordance with system 100, a memory or storage device (e.g., including one or more non-transitory computer readable media) 101 includes instructions that are to be executed by at least one processor (e.g., processor(s) 102). The processor(s) 102 may take the form of, but are not limited to, a microprocessor, embedded processor, or the like. According to some embodiments, processor(s) 102 may be from the family of processors manufactured by Intel®, AMD®, Qualcomm®, Apple®, NVIDIA®, or the like. The processor(s) 102 may also be based on the ARM architecture, a mobile processor, or a graphics processing unit, etc. The disclosed embodiments are not limited to any particular type of processor(s) 102 in system 100. As discussed herein, the processor(s) 102 may perform operations for deterministic cryptographic deidentification enabling granular destruction. These operations are discussed in more detail below.
  • Memory 101 may include one or more storage devices configured to store instructions used by the processor(s) 102 to perform functions related to deterministic cryptographic deidentification described herein. The disclosed embodiments are not limited to particular software programs or devices configured to perform dedicated tasks. For example, the memory 101 may store a single program, such as a user-level application, that performs the functions associated with the disclosed embodiments, or may comprise multiple software programs. Additionally, the processor(s) 102 may, in some embodiments, execute one or more programs (or portions thereof). Furthermore, memory 101 may include one or more storage devices configured to store data for use by the programs. Memory 101 may include, but is not limited to, a hard drive, a solid state drive, a CD-ROM drive, a peripheral storage device (e.g., an external hard drive, a USB drive, etc.), a network drive, a cloud storage device, or any other storage device.
  • In accordance with disclosed embodiments, these operations performed by processor(s) 102 may include preparing a table of name-token pair groupings 103. FIG. 2 further depicts an example of a table of name-token pair groupings 201. Within the table of name-token pair groupings 201 are examples of potential groupings, including a group token 202, cipher text 203, and a tag 204. The table of name-token pair groupings 103 may configure the groupings for use in a deidentification process. The data deidentified in association with the deidentification process may be stored in centralized repository 104. The processor(s) 102 may also perform an operation for token identification 105 from the table of name-token pair groupings. The processor(s) 102 may also perform an operation for reidentification of specific data 106 based on the token identification 105.
  • Aspects of this disclosure may include preparing a table of name-token pair groupings 103 with unique tokens. For example, the table may include a set of data arranged in rows and columns. Other formats of data organization are possible as well. The table of name-token pair groupings 103 and centralized repository 104 may be included on one or more volatile or non-volatile, magnetic, semiconductor, tape, optical, removable, non-removable, or other types of storage devices or tangible or non-transitory computer-readable media. Table of name-token pair groupings 103 and centralized repository 104 may also be part of the same server or cluster of servers, or of disparate servers. Table of name-token pair groupings 103 and centralized repository 104 may include one or more memory devices that store data and instructions used to perform one or more features of the disclosed embodiments. Table of name-token pair groupings 103 and centralized repository 104 may include any suitable databases, ranging from small databases hosted on a workstation to large databases distributed among data centers. Table of name-token pair groupings 103 and centralized repository 104 may also include any combination of one or more databases controlled by memory controller devices (e.g., server(s), etc.) or software. For example, table of name-token pair groupings 103 and centralized repository 104 may include document management systems, Microsoft SQL® databases, SharePoint® databases, Oracle® databases, Sybase® databases, other relational databases, or non-relational databases, such as MongoDB and others.
  • In some embodiments, a token may include an object that represents the right to perform an operation, including but not limited to security, access, and control. In some embodiments, the right to perform an operation may also identify an identity that is able to perform the operation. For example, an identity may be referenced according to a security policy or access-control policy to determine whether the identity can perform an operation. According to some embodiments, a unique token may include an exclusive or particular object that represents the right to perform an operation, including but not limited to security, access, and control.
  • In some embodiments, a name-token pair grouping may include a character string (name) paired with a randomly generated string (token). Each name-token pair grouping may be unique. For example, a name-token pair grouping may use the name of a project, person, company, date-stamp, time-stamp, etc.
  • In some embodiments, system 100 may prepare a table of names and tokens that are grouped together, and each token may be unique. As shown in FIG. 2 , table 200 shows an exemplary format of name-token groupings. Other formats of name-token groupings are possible as well.
  • The name-token pair groupings 103 may be configured to be used in a deidentification process, as described further below (e.g., in the processes of FIG. 3 and FIG. 4 ). For example, a deidentification process may include detecting identifiers that directly or indirectly point to a person, entity, or object, and deleting those identifiers from the data. By deleting the identifiers, the underlying data may be effectively and granularly deidentified.
  • In some embodiments, the configuration may include several steps. For example, this may include creating the name-token pair grouping 103, as shown in FIG. 2 . Further, this may include retrieving a token from the name-token pair grouping 103 (e.g., based on the name value). This may also include marking the data with the selected token. The marking may be done cryptographically or otherwise.
  • In some embodiments, the table of name-token pair groupings 103 may be prepared by a cryptography random generator. For example, a cryptography random generator may include a process for creating cryptographically strong random values. This may be performed using, for example, a cryptographically secure pseudorandom number generator (CSPRNG) or cryptographic pseudorandom number generator (CPRNG). The values produced by the cryptography random generator should exhibit properties including, but not limited to, appearing random, being unpredictable in advance, and not being reliably reproduced after generation.
  • In some embodiments, the table of name-token pair groupings 103 may be created by randomly assigning each name-token pair grouping based on the output of a cryptographic random generator (e.g., CSPRNG, CPRNG, or the like). As an illustration, the table of name-token pair groupings may include {group-name: group-token} pairs. For example, the group name may include the name of a month, and the group token may be a randomly generated string, as discussed above.
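The preparation steps described above (creating the table from a cryptographic random generator, retrieving a token by its name value, and marking data with the selected token) might look like this in Python, whose `secrets` module draws from the operating system's CSPRNG. The 16-byte token size, the month-style group names, and marking by an HMAC-SHA256 keyed with the group token are assumptions; the description leaves the marking method open.

```python
import hashlib
import hmac
import secrets

TOKEN_BYTES = 16  # hypothetical token size; the text only says tokens may be smaller than the key

MONTHS = ["2022-01", "2022-02", "2022-03"]  # example group names, one group per month


def create_name_token_table(group_names: list[str]) -> dict[str, bytes]:
    """Build {group-name: group-token} pairs from the operating system's CSPRNG."""
    table: dict[str, bytes] = {}
    used: set[bytes] = set()
    for name in group_names:
        token = secrets.token_bytes(TOKEN_BYTES)
        while token in used:  # astronomically unlikely, but enforces the uniqueness requirement
            token = secrets.token_bytes(TOKEN_BYTES)
        used.add(token)
        table[name] = token
    return table


def retrieve_token(table: dict[str, bytes], name: str) -> bytes:
    """Look up a group's token by its name value."""
    return table[name]


def mark(value: str, token: bytes) -> str:
    """One hypothetical cryptographic marking: an HMAC of the value keyed by the group token."""
    return hmac.new(token, value.encode(), hashlib.sha256).hexdigest()
```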
  • Aspects of this disclosure may include storing data deidentified in association with the deidentification process in a centralized repository 104. In some embodiments, centralized repository 104 may include a collection of stored data from existing databases that is deployed by consolidating data from multiple sources. For example, a centralized repository 104 may include a data lake, a data warehouse, or other types of data storage. The centralized repository 104 may thus be based on architectures such as AWS Data Lake®, Google Data Lake®, Azure Data Lake®, Cloudera Data Platform®, Databricks Unified Analytics Platform®, or others.
  • In some embodiments, the data may include personal identifiable information. The data may also include other types of sensitive business, biological, social, technical, or economic data. Further, the data may include sets of values of qualitative or quantitative variables about one or more persons, entities, or objects. In some embodiments, personal identifiable information may include any representation of information that permits the identity of an individual, entity, or object to whom the information applies to be reasonably inferred by either direct or indirect means. For example, personal identifiable information may include, but is not limited to, a passport number, financial account number, or a driver’s license number, among many other types.
  • In some embodiments, the deidentification process may involve deidentification of sensitive data. Sensitive data may include data that contains personal identifiable information or other confidential data, as discussed above. This process may include encrypting the personal identifiable information data before fetching the data. For example, the encryption may be done symmetrically (e.g., using techniques such as AES, Blowfish, CAST5, RC4, DES, 3DES, etc.) or asymmetrically (e.g., using techniques such as Diffie-Hellman, DSS, RSA, YAK, etc.).
  • In some embodiments, the deidentification process further comprises use of an encryption key (symmetric or asymmetric) and a tag. For example, a deidentification process may include detecting identifiers that directly or indirectly point to a person, entity, or object, and deleting those identifiers from the data. An encryption key may include a piece of information, usually a string of numbers or letters that are stored in a file, which, when processed through a cryptographic algorithm can encode or decode cryptographic data. A tag may include a keyword or term assigned to a piece of information. Of course, other examples of encryption keys and tags are possible as well.
  • In some embodiments, the encryption key is randomly generated by a cryptographically secure pseudorandom number generator (CSPRNG), also called a cryptographic pseudorandom number generator (CPRNG), as discussed above. The tag may likewise be randomly generated by a CSPRNG. The tag may also be secured using hashing, that is, converting a plain text or key value to a hashed value by applying a hash function. The hash function may be based on, for example, CRC (16/32/64), Adler-32, the BSD checksum, sum (8/16/24/32), Fletcher (4/8/16/32), or various other techniques. Hashing helps detect tampering with the tag; however, unkeyed checksums such as those listed detect only accidental changes, because anyone can recompute them, so protection against deliberate tampering calls for a keyed cryptographic hash such as an HMAC. In some embodiments, the encryption key may be attached to the tag, which can be useful in the decryption process. The token may be smaller in size than the encryption key used in the deidentification process.
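  • The key, token, and tag generation described above can be sketched with Python's standard `secrets` module, which draws from the operating system's CSPRNG. This is a minimal illustration under stated assumptions: the 32-byte key and 16-byte token sizes, the helper names, and the use of HMAC-SHA256 to seal the tag are hypothetical choices rather than details from this disclosure. The `zlib.adler32` helper is included only to contrast an unkeyed checksum with a keyed digest.

```python
import hashlib
import hmac
import secrets
import zlib

KEY_BYTES = 32    # hypothetical: 256-bit encryption key
TOKEN_BYTES = 16  # hypothetical: 128-bit group token, smaller than the key


def generate_key() -> bytes:
    """Encryption key drawn from the operating system's CSPRNG."""
    return secrets.token_bytes(KEY_BYTES)


def generate_token() -> bytes:
    """Group token drawn from the CSPRNG (assumed size, see TOKEN_BYTES)."""
    return secrets.token_bytes(TOKEN_BYTES)


def generate_tag() -> str:
    """Random, URL-safe printable tag drawn from the CSPRNG."""
    return secrets.token_urlsafe(12)


def seal_tag(tag: str, integrity_key: bytes) -> bytes:
    """Assumed protection: HMAC-SHA256 over the tag; forging it requires the key."""
    return hmac.new(integrity_key, tag.encode("utf-8"), hashlib.sha256).digest()


def tag_is_intact(tag: str, seal: bytes, integrity_key: bytes) -> bool:
    """Constant-time comparison of the recomputed seal with the stored one."""
    return hmac.compare_digest(seal_tag(tag, integrity_key), seal)


def checksum_only(tag: str) -> int:
    """Adler-32 catches accidental corruption, but anyone can recompute it for a forged tag."""
    return zlib.adler32(tag.encode("utf-8"))


if __name__ == "__main__":
    integrity_key = generate_key()
    tag = generate_tag()
    seal = seal_tag(tag, integrity_key)
    print(tag_is_intact(tag, seal, integrity_key))        # True
    print(tag_is_intact(tag + "x", seal, integrity_key))  # False
```

  • The keyed seal is the design choice that matters here: an attacker who alters the tag can also recompute its Adler-32 value, but cannot produce a matching HMAC without the integrity key.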
  • Aspects of this disclosure may include identifying a token from the table of name-token pair groupings 103. For example, the token may be identified based on the corresponding name or another identifier. In accordance with the techniques discussed below, aspects may also include enabling reidentification of a specific data item of the deidentified data based on the token provided from the table of name-token pair groupings 103.
  • In some embodiments, a data item may include a single unit of data in a storage record and can include the smallest possible unit of information or a single entry or field of data. A data item may include personal identifiable information or other sensitive information, as discussed above. For example, “JSmith” and “JDoe” are data items that might be associated with a person’s name. As described herein, deidentified data may include data from which all personally identifiable information has been removed.
  • In some embodiments, a token may include an object that represents the right to perform an operation, including but not limited to security, access, and control. The token can consist of random or pseudorandom characters, as discussed above. In some embodiments, a table (e.g., table 103) may include a set of data arranged in rows and columns. As discussed below, deidentifying data may involve disposing of the token from the table of name-token pair groupings 103 (e.g., deleting or moving it). In some embodiments, disposing of the token may include transferring the token to the control of another entity (e.g., an application, server, or third party), discarding it, placing or distributing it elsewhere, or arranging it in an orderly way.
  • The rights that a token represents may be based on, for example, a security policy, security group memberships, a least-privilege security framework, etc. In some situations, these rights may be based on an Active Directory® framework, CyberArk Privileged Access Management® framework, AWS Identity and Access Management® framework, or others.
  • In some embodiments, a name-token pair grouping may include a character string (name) paired with a randomly generated string (token). Each name-token pair grouping may be unique. For example, the name in a name-token pair grouping may be the name of a project. Consistent with FIG. 2, a table of name-token pair groupings 201 may include a group token 202, cipher text 203, and a tag 204, as discussed above.
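  • A table of name-token pair groupings with unique tokens might be prepared roughly as follows. In this sketch the group names, the `GroupEntry` type, and the 16-byte hex token length are illustrative assumptions; each group carries a list of tags, loosely following the claims' statement that a group is associated with one or more tags, while the cipher text field of FIG. 2 is omitted.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class GroupEntry:
    """One row of the (hypothetical) name-token table."""
    token: str
    tags: list[str] = field(default_factory=list)


def prepare_name_token_table(groups: dict[str, list[str]],
                             token_bytes: int = 16) -> dict[str, GroupEntry]:
    """Map each group name to a fresh CSPRNG token, guaranteeing token uniqueness."""
    table: dict[str, GroupEntry] = {}
    seen_tokens: set[str] = set()
    for name, tags in groups.items():
        token = secrets.token_hex(token_bytes)
        # Collisions are astronomically unlikely, but uniqueness is enforced anyway.
        while token in seen_tokens:
            token = secrets.token_hex(token_bytes)
        seen_tokens.add(token)
        table[name] = GroupEntry(token=token, tags=list(tags))
    return table


if __name__ == "__main__":
    table = prepare_name_token_table({"project-alpha": ["pii"], "project-beta": []})
    for name, entry in table.items():
        print(name, entry.token, entry.tags)
```

  • Because group names are dictionary keys, each name appears once, and the loop separately ensures that no two groups share a token.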
  • Consistent with the embodiments herein, a deterministic cryptography technique may be performed via authenticated encryption with associated data (AEAD) cryptography. Deterministic cryptography may include a type of encryption that always produces the same ciphertext given the same plaintext and key. Examples include textbook (unpadded) RSA and block ciphers operated in a deterministic mode, among others as noted above. In some embodiments, deterministic cryptography may include the practice and study of techniques for secure communication. Further, as discussed herein, authenticated encryption with associated data may include a type of encryption that allows a recipient to check the integrity of both the encrypted and unencrypted information in a message.
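  • To make the deterministic AEAD property concrete, the sketch below follows the synthetic-IV (SIV) pattern standardized as AES-SIV in RFC 5297, but substitutes HMAC-SHA256 for both the PRF and the keystream so that it runs on Python's standard library alone. It is a teaching sketch under that assumption, not the method of this disclosure and not a vetted implementation; production systems would use an audited AES-SIV library. The function names are hypothetical.

```python
import hashlib
import hmac
import secrets

IV_LEN = 32  # HMAC-SHA256 output size, used as the synthetic IV


def _prf(mac_key: bytes, associated_data: bytes, plaintext: bytes) -> bytes:
    # Length-prefix the associated data so (AD, plaintext) is encoded unambiguously.
    msg = len(associated_data).to_bytes(8, "big") + associated_data + plaintext
    return hmac.new(mac_key, msg, hashlib.sha256).digest()


def _keystream(enc_key: bytes, iv: bytes, length: int) -> bytes:
    # HMAC-SHA256 in counter mode, seeded by the synthetic IV.
    blocks = [
        hmac.new(enc_key, iv + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        for counter in range((length + 31) // 32)
    ]
    return b"".join(blocks)[:length]


def det_encrypt(mac_key: bytes, enc_key: bytes, plaintext: bytes,
                associated_data: bytes = b"") -> bytes:
    # The IV depends only on key, AD, and plaintext, so encryption is deterministic.
    iv = _prf(mac_key, associated_data, plaintext)
    stream = _keystream(enc_key, iv, len(plaintext))
    return iv + bytes(p ^ s for p, s in zip(plaintext, stream))


def det_decrypt(mac_key: bytes, enc_key: bytes, ciphertext: bytes,
                associated_data: bytes = b"") -> bytes:
    if len(ciphertext) < IV_LEN:
        raise ValueError("ciphertext too short")
    iv, body = ciphertext[:IV_LEN], ciphertext[IV_LEN:]
    stream = _keystream(enc_key, iv, len(body))
    plaintext = bytes(c ^ s for c, s in zip(body, stream))
    # Recompute the synthetic IV; a mismatch means the ciphertext or AD was altered.
    if not hmac.compare_digest(_prf(mac_key, associated_data, plaintext), iv):
        raise ValueError("authentication failed")
    return plaintext


if __name__ == "__main__":
    mac_key, enc_key = secrets.token_bytes(32), secrets.token_bytes(32)
    a = det_encrypt(mac_key, enc_key, b"JSmith", b"project-alpha")
    b = det_encrypt(mac_key, enc_key, b"JSmith", b"project-alpha")
    print(a == b)                                              # True
    print(det_decrypt(mac_key, enc_key, a, b"project-alpha"))  # b'JSmith'
```

  • Encrypting the same value twice under the same keys and associated data yields identical ciphertext, which is what lets deterministically deidentified records still be matched or joined; the same property reveals when two ciphertexts hide equal values.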
  • As discussed herein, stored data may be deidentified according to various techniques. Data may be deidentified using the table of name-token pair groupings 103 discussed above. In some embodiments, deidentifying data may include detecting identifiers that directly or indirectly point to a person, entity, or object, and deleting those identifiers from the data. As discussed herein, data may include sets of values of qualitative or quantitative variables about one or more persons, entities, or objects.
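  • The detect-and-delete step can be illustrated on a record-oriented dataset. The field names, the set of direct identifiers, and the e-mail pattern below are hypothetical assumptions; a real detector would need much broader coverage.

```python
import re

# Hypothetical field names treated as direct identifiers.
DIRECT_IDENTIFIER_FIELDS = {"name", "email", "passport_number", "drivers_license", "account_number"}
# Hypothetical value pattern: anything that looks like an e-mail address.
EMAIL_PATTERN = re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}")


def detect_identifiers(record: dict) -> set[str]:
    """Return the keys of fields that look like direct identifiers."""
    hits: set[str] = set()
    for field_name, value in record.items():
        if field_name.lower() in DIRECT_IDENTIFIER_FIELDS:
            hits.add(field_name)
        elif isinstance(value, str) and EMAIL_PATTERN.search(value):
            hits.add(field_name)
    return hits


def delete_identifiers(record: dict) -> dict:
    """Return a copy of the record with detected identifier fields removed."""
    hits = detect_identifiers(record)
    return {k: v for k, v in record.items() if k not in hits}


if __name__ == "__main__":
    record = {"name": "JSmith", "contact": "jsmith@example.com",
              "diagnosis_code": "E11", "visit_year": 2021}
    print(delete_identifiers(record))  # {'diagnosis_code': 'E11', 'visit_year': 2021}
```

  • This sketch handles only direct identifiers; indirect identifiers, such as a ZIP code combined with a birth date, typically need generalization rather than simple deletion.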
  • FIG. 3 is a block diagram of an exemplary method 300 performed by a processor of a computer or computer-based system, consistent with disclosed embodiments. In accordance with the discussion above, process 300 may be carried out at centralized repository 104, memory 101, processor(s) 102, or a separate computing system. For example, in some embodiments process 300 may be performed by a data security application running at processor(s) 102.
  • Operation 301 may include preparing a table of name-token pair groupings 103 with unique tokens, for use in a deidentification process. For example, as illustrated in FIG. 2, the table of name-token pair groupings 201 may include a group token 202, cipher text 203, and a tag 204. The groupings in table 201 may be configured for use in a deidentification process, as discussed above.
  • Process 300 may also include an operation 302 of storing data deidentified in association with the deidentification process in a centralized repository 104. In some embodiments, the data includes personal identifiable information or other sensitive information. Consistent with above embodiments, centralized repository 104 may take the form of a data lake, data warehouse, or other storage, and may be based on architectures such as AWS Data Lake®, Google Data Lake®, Azure Data Lake®, Cloudera Data Platform, Databricks Unified Analytics Platform®, or others.
  • Process 300 may further include an operation 303 of identifying a token from the table of name-token pair groupings 103. This identification may be based on a name value of the name-token pair groupings 103 or another identifiable attribute of the name-token pair groupings 103. In some embodiments, operation 303 may further comprise disposing of the token from the table of name-token pair groupings. Disposing of the token may include deleting the token, scrambling the token, moving the token, etc.
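  • The lookup in operation 303 and the three disposal options named above (deleting, scrambling, or moving the token) might look like this over an in-memory `{group-name: group-token}` dictionary. The function names, the `mode` strings, and the separate archive store are illustrative assumptions, not an interface defined by this disclosure.

```python
import secrets
from typing import Optional


def identify_token(table: dict[str, str], name: str) -> str:
    """Operation 303: look up a group token by its name value."""
    if name not in table:
        raise KeyError(f"no group named {name!r}")
    return table[name]


def dispose_token(table: dict[str, str], name: str, mode: str = "delete",
                  archive: Optional[dict[str, str]] = None) -> None:
    """Dispose of a group token by deleting, scrambling, or moving it."""
    token = identify_token(table, name)
    if mode == "delete":
        del table[name]
    elif mode == "scramble":
        # Overwrite with fresh random hex of the same length; the table no longer holds the old value.
        table[name] = secrets.token_hex(len(token) // 2)
    elif mode == "move":
        # Transfer the token to another store (e.g., under a different party's control).
        if archive is None:
            raise ValueError("mode 'move' requires a destination store")
        archive[name] = table.pop(name)
    else:
        raise ValueError(f"unknown disposal mode: {mode!r}")


if __name__ == "__main__":
    table = {"project-alpha": "aa" * 16, "project-beta": "bb" * 16}
    archive: dict[str, str] = {}
    dispose_token(table, "project-beta", mode="move", archive=archive)
    print(sorted(table), sorted(archive))  # ['project-alpha'] ['project-beta']
```

  • Overwriting a Python string does not securely erase the old value from process memory; a real deployment would rely on the destruction guarantees of its key store.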
  • In an operation 304, process 300 may further include enabling reidentification of a specific data item of the deidentified data based on the token provided from the table 103. Consistent with the above discussion, the reidentification may be based on a name attribute or other unique identifier.
  • FIG. 4 is a block diagram of an exemplary method 400 performed by a processor of a computer or computer-based system, consistent with disclosed embodiments. In some embodiments, method 400 may be carried out at centralized repository 104, memory 101, processor(s) 102, or a separate computing system. For example, in some embodiments process 400 may be performed by a data security application running at processor(s) 102.
  • Process 400 may include an operation 401 of preparing a table of name-token pair groupings 103 with unique tokens. As discussed above and as illustrated in FIG. 2, a table of name-token pair groupings 201 may include a group token 202, cipher text 203, and a tag 204. The groupings in table 201 may be configured for use in a deidentification process, as discussed above.
  • Process 400 may also include an operation 402 of deidentifying data using the table of name-token pair groupings 103. This may be performed according to the techniques discussed above. For example, deidentifying data may be based on deleting, scrambling, or moving a token associated with certain data in the table of name-token pair groupings 103.
  • In accordance with operation 403, process 400 may store data deidentified in association with the deidentification process in a centralized repository 104. Consistent with above embodiments, the centralized repository 104 may be a data lake, data warehouse, or other storage, and may be based on architectures such as AWS Data Lake®, Google Data Lake®, Azure Data Lake®, Cloudera Data Platform®, Databricks Unified Analytics Platform®, or others.
  • Process 400 may also include an operation 404 of disposing of a token of the table of name-token pair groupings 103. For example, as discussed above, the token can be deleted, scrambled, moved, etc. In this manner, the deidentification of data can be highly granular: data may be deidentified based on a single token, so other data need not be deidentified as well. This approach thus offers significantly more precision and granularity than other available techniques.
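  • Process 400 as a whole, including the granular disposal in operation 404, can be sketched as crypto-shredding: each group's token is the only secret from which that group's keys can be derived, so disposing of one token makes only that group's deidentified records unrecoverable. This is a hedged sketch under assumptions not stated in the disclosure: keys are derived from the token with HMAC-SHA256, values are encrypted with an HMAC-based SIV-style construction standing in for a vetted AES-SIV, the group name serves as associated data, and a Python list stands in for centralized repository 104. All names are hypothetical, and the cipher text and tag fields of FIG. 2 are omitted.

```python
import hashlib
import hmac
import secrets


def _derive_keys(group_token: bytes) -> tuple[bytes, bytes]:
    # Assumed scheme: two independent 32-byte keys derived from the group token.
    mac_key = hmac.new(group_token, b"mac", hashlib.sha256).digest()
    enc_key = hmac.new(group_token, b"enc", hashlib.sha256).digest()
    return mac_key, enc_key


def _siv(mac_key: bytes, aad: bytes, plaintext: bytes) -> bytes:
    # Synthetic IV over length-prefixed associated data followed by the plaintext.
    return hmac.new(mac_key, len(aad).to_bytes(8, "big") + aad + plaintext, hashlib.sha256).digest()


def _xor_stream(enc_key: bytes, iv: bytes, data: bytes) -> bytes:
    # HMAC-SHA256 in counter mode as a keystream; XOR is its own inverse.
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hmac.new(enc_key, iv + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


def prepare_table(names: list[str]) -> dict[str, bytes]:
    # Operation 401: one unique 16-byte CSPRNG token per group name.
    table: dict[str, bytes] = {}
    for name in names:
        token = secrets.token_bytes(16)
        while token in table.values():
            token = secrets.token_bytes(16)
        table[name] = token
    return table


def deidentify(table: dict[str, bytes], group: str, value: str) -> dict:
    # Operation 402: deterministic, authenticated encryption under the group's token.
    mac_key, enc_key = _derive_keys(table[group])
    plaintext = value.encode("utf-8")
    iv = _siv(mac_key, group.encode("utf-8"), plaintext)
    return {"group": group, "subject": iv + _xor_stream(enc_key, iv, plaintext)}


def reidentify(table: dict[str, bytes], record: dict) -> str:
    # Reidentification works only while the group's token remains in the table.
    token = table.get(record["group"])
    if token is None:
        raise LookupError(f"token for {record['group']!r} was disposed; data is unrecoverable")
    mac_key, enc_key = _derive_keys(token)
    iv, body = record["subject"][:32], record["subject"][32:]
    plaintext = _xor_stream(enc_key, iv, body)
    if not hmac.compare_digest(_siv(mac_key, record["group"].encode("utf-8"), plaintext), iv):
        raise ValueError("authentication failed")
    return plaintext.decode("utf-8")


def dispose(table: dict[str, bytes], group: str) -> None:
    # Operation 404: delete one group's token; other groups are unaffected.
    del table[group]


if __name__ == "__main__":
    table = prepare_table(["project-alpha", "project-beta"])
    repository = [  # Operation 403: stand-in for centralized repository 104.
        deidentify(table, "project-alpha", "JSmith"),
        deidentify(table, "project-beta", "JDoe"),
    ]
    dispose(table, "project-alpha")
    print(reidentify(table, repository[1]))  # JDoe
```

  • In this sketch, 16-byte tokens yield 32-byte derived keys, so the token is also smaller than the encryption keys, in line with the claims' statement that the token may be smaller than the encryption key.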
  • It is to be understood that the disclosed embodiments are not necessarily limited in their application to the details of construction and the arrangement of the components and/or methods set forth in the following description and/or illustrated in the drawings and/or the examples. The disclosed embodiments are capable of variations, or of being practiced or carried out in various ways.
  • The disclosed embodiments may be implemented in a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention.
  • The computer readable storage medium can be a tangible device that can retain and store instructions for use by an instruction execution device. The computer readable storage medium may be, for example, but is not limited to, an electronic storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. A non-exhaustive list of more specific examples of the computer readable storage medium includes the following: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), a static random access memory (SRAM), a portable compact disc read-only memory (CD-ROM), a digital versatile disk (DVD), a memory stick, a floppy disk, a mechanically encoded device such as punch-cards or raised structures in a groove having instructions recorded thereon, and any suitable combination of the foregoing. A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber-optic cable), or electrical signals transmitted through a wire.
  • Computer readable program instructions described herein can be downloaded to respective computing/processing devices from a computer readable storage medium or to an external computer or external storage device via a network, for example, the Internet, a local area network, a wide area network and/or a wireless network. The network may comprise copper transmission cables, optical transmission fibers, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. A network adapter card or network interface in each computing/processing device receives computer readable program instructions from the network and forwards the computer readable program instructions for storage in a computer readable storage medium within the respective computing/processing device.
  • Computer readable program instructions for carrying out operations of the present invention may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++ or the like, and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The computer readable program instructions may execute entirely on the user’s computer, partly on the user’s computer, as a stand-alone software package, partly on the user’s computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user’s computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider). In some embodiments, electronic circuitry including, for example, programmable logic circuitry, field-programmable gate arrays (FPGA), or programmable logic arrays (PLA) may execute the computer readable program instructions by utilizing state information of the computer readable program instructions to personalize the electronic circuitry, in order to perform aspects of the present invention.
  • Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer readable program instructions.
  • These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer readable program instructions may also be stored in a computer readable storage medium that can direct a computer, a programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer readable storage medium having instructions stored therein comprises an article of manufacture including instructions which implement aspects of the function/act specified in the flowchart and/or block diagram block or blocks.
  • The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other device to cause a series of operational steps to be performed on the computer, other programmable apparatus or other device to produce a computer implemented process, such that the instructions which execute on the computer, other programmable apparatus, or other device implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
  • The flowcharts and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowcharts or block diagrams may represent a software program, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
  • The descriptions of the various embodiments of the present invention have been presented for purposes of illustration but are not intended to be exhaustive or limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein was chosen to best explain the principles of the embodiments, the practical application or technical improvement over technologies found in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.
  • It is expected that during the life of a patent maturing from this application many relevant virtualization platforms, virtualization platform environments, trusted cloud platform resources, cloud-based assets, protocols, communication networks, security tokens and authentication credentials will be developed and the scope of these terms is intended to include all such new technologies a priori.
  • It is appreciated that certain features of the invention, which are, for clarity, described in the context of separate embodiments, may also be provided in combination in a single embodiment. Conversely, various features of the invention, which are, for brevity, described in the context of a single embodiment, may also be provided separately or in any suitable subcombination or as suitable in any other described embodiment of the invention. Certain features described in the context of various embodiments are not to be considered essential features of those embodiments, unless the embodiment is inoperative without those elements. Although the invention has been described in conjunction with specific embodiments thereof, it is evident that many alternatives, modifications and variations will be apparent to those skilled in the art. Accordingly, it is intended to embrace all such alternatives, modifications and variations that fall within the spirit and broad scope of the appended claims.

Claims (20)

What is claimed is:
1. A non-transitory computer readable medium including instructions that, when executed by at least one processor, cause the at least one processor to perform operations for deterministic cryptography deidentification enabling granular destruction, comprising:
prepare a table of name-token pair groupings with unique tokens, wherein the table of name-token pair groupings is created by randomly assigning each name-token pair grouping based on an output of a cryptography random generator, and wherein the name-token pair groupings are configured to be used in a deidentification process and the unique tokens are assigned to a group, wherein the group is associated with one or more tags and the group creates an entry in the table of name-token pair groupings;
store data deidentified in association with the deidentification process in a centralized repository;
identify a token from the table of name-token pair groupings; and
enable reidentifying of a specific data item of the deidentified data based on the token provided from the table.
2. The non-transitory computer-readable medium of claim 1, wherein the operations further comprise dispose of the token from the table of name-token pair groupings.
3. The non-transitory computer-readable medium of claim 1, wherein the deterministic cryptography is performed via authenticated encryption with associated data cryptography.
4. The non-transitory computer-readable medium of claim 1, wherein the data includes personal identifiable information.
5. (canceled)
6. The non-transitory computer-readable medium of claim 1, wherein the table of name-token pair groupings includes {group-name: group-token} pairs.
7. The non-transitory computer-readable medium of claim 1, wherein the deidentification process further comprises use of an encryption key and a tag.
8. The non-transitory computer-readable medium of claim 7, wherein the token is smaller in size than the encryption key used in the deidentification process.
9. The non-transitory computer-readable medium of claim 1, wherein the operations further comprise deidentify data using the table of name-token pair groupings.
10. A system for granular destruction of data deidentified by deterministic cryptography, comprising:
one or more processors; and
a memory storing instructions to cause the one or more processors to execute operations of:
prepare a table of name-token pair groupings with unique tokens, wherein the table of name-token pair groupings is created by randomly assigning each name-token pair grouping based on an output of a cryptography random generator, and wherein the name-token pair groupings are configured to be used in a deidentification process and the unique tokens are assigned to a group, wherein the group is associated with one or more tags and the group creates an entry in the table of name-token pair groupings;
store data deidentified in association with the deidentification process in a centralized repository;
identify a token from the table of name-token pair groupings; and
enable reidentifying of a specific data item of the deidentified data based on the token provided from the table.
11. The system of claim 10, wherein the operations further comprise:
identify the token from the table of name-token pair groupings; and
enable reidentifying of a specific data item of deidentified data associated with the deidentification process based on the token provided from the table.
12. The system of claim 10, wherein the data includes personal identifiable information.
13. The system of claim 10, wherein the deterministic cryptography is performed via authenticated encryption with associated data cryptography.
14. (canceled)
15. The system of claim 10, wherein the table of name-token pair groupings includes {group-name: group-token} pairs.
16. The system of claim 10, wherein the deidentification process further comprises use of an encryption key and a tag.
17. The system of claim 16, wherein the token is smaller in size than the encryption key used in the deidentification process.
18. A computer-implemented method for deterministic cryptography deidentification enabling granular destruction, comprising:
prepare a table of name-token pair groupings with unique tokens, wherein the table of name-token pair groupings is created by randomly assigning each name-token pair grouping based on an output of a cryptography random generator, and wherein the name-token pair groupings are configured to be used in a deidentification process and the unique tokens are assigned to a group, wherein the group is associated with one or more tags and the group creates an entry in the table of name-token pair groupings;
store data deidentified in association with the deidentification process in a centralized repository;
identify a token from the table of name-token pair groupings; and
enable reidentifying of a specific data item of the deidentified data based on the token provided from the table.
19. The computer-implemented method of claim 18, further comprising disposing of the token from the table of name-token pair groupings.
20. The computer-implemented method of claim 18, wherein the data includes personal identifiable information.
US17/674,118 2022-02-17 2022-02-17 Deterministic cryptography deidentification with granular data destruction Active US11757626B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/674,118 US11757626B1 (en) 2022-02-17 2022-02-17 Deterministic cryptography deidentification with granular data destruction

Publications (2)

Publication Number Publication Date
US20230261856A1 US20230261856A1 (en) 2023-08-17
US11757626B1 US11757626B1 (en) 2023-09-12

Family

ID=87558184

Family Applications (1)

Application Number Title Priority Date Filing Date
US17/674,118 Active US11757626B1 (en) 2022-02-17 2022-02-17 Deterministic cryptography deidentification with granular data destruction

Country Status (1)

Country Link
US (1) US11757626B1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6546385B1 (en) * 1999-08-13 2003-04-08 International Business Machines Corporation Method and apparatus for indexing and searching content in hardcopy documents
US20070276845A1 (en) * 2006-05-12 2007-11-29 Tele Atlas North America, Inc. Locality indexes and method for indexing localities
US20200334639A1 (en) * 2019-04-18 2020-10-22 Microsoft Technology Licensing, Llc Email content modification system

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109716345B (en) * 2016-04-29 2023-09-15 普威达有限公司 Computer-implemented privacy engineering system and method
US10742646B2 (en) * 2018-05-10 2020-08-11 Visa International Service Association Provisioning transferable access tokens
US11405365B2 (en) * 2019-03-13 2022-08-02 Springcoin, Inc. Method and apparatus for effecting a data-based activity
US11101987B2 (en) * 2019-06-10 2021-08-24 International Business Machines Corporation Adaptive encryption for entity resolution
US11227068B2 (en) * 2019-10-17 2022-01-18 Mentis Inc System and method for sensitive data retirement
US20210390546A1 (en) * 2020-06-15 2021-12-16 Magtek, Inc. Systems and Methods for Secure Transaction Processing
US11947706B2 (en) * 2020-08-28 2024-04-02 Open Text Holdings, Inc. Token-based data security systems and methods with embeddable markers in unstructured data

Legal Events

Date Code Title Description
FEPP Fee payment procedure

Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY

AS Assignment

Owner name: CYBERARK SOFTWARE LTD., ISRAEL

Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:RIVLIN, OFER;REEL/FRAME:059061/0412

Effective date: 20220213

STCF Information on status: patent grant

Free format text: PATENTED CASE