NL1043913A - Data anonymization - Google Patents


Info

Publication number
NL1043913A
Authority
NL
Netherlands
Prior art keywords
anonymization
data
key
mapping
applying
Prior art date
Application number
NL1043913A
Other languages
Dutch (nl)
Other versions
NL1043913B1 (en
Inventor
Ir Niek Johannes Bouman Dr
Original Assignee
Ir Niek Johannes Bouman Dr
Roseman Group Bv
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ir Niek Johannes Bouman Dr, Roseman Group Bv filed Critical Ir Niek Johannes Bouman Dr
Publication of NL1043913A publication Critical patent/NL1043913A/en
Application granted granted Critical
Publication of NL1043913B1 publication Critical patent/NL1043913B1/en


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/32Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials
    • H04L9/3236Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials using cryptographic hash functions
    • H04L9/3242Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials using cryptographic hash functions involving keyed hash functions, e.g. message authentication codes [MACs], CBC-MAC or HMAC
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/62Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6254Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/08Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
    • H04L9/0861Generation of secret information including derivation or calculation of cryptographic keys or passwords
    • H04L9/0877Generation of secret information including derivation or calculation of cryptographic keys or passwords using additional device, e.g. trusted platform module [TPM], smartcard, USB or hardware security module [HSM]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04LTRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/08Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
    • H04L9/0894Escrow, recovery or storing of secret information, e.g. secret key escrow or cryptographic key storage
    • H04L9/0897Escrow, recovery or storing of secret information, e.g. secret key escrow or cryptographic key storage involving additional devices, e.g. trusted platform module [TPM], smartcard or USB
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2143Clearing memory, e.g. to prevent the data from being stolen

Abstract

The present invention discloses a method, device and system for anonymizing data, where said system comprises a cascade of at least two data anonymization devices, where each device applies a keyed anonymization mapping to the data, and each device is operated by an independent party who uses an independent anonymization key. Such system offers unprecedented privacy guarantees: as long as at least one party behaves honestly, in that she will keep her anonymization key secret and, in case of an ordered key-erasure will indeed delete it, the overall anonymization mapping will remain private, regardless of the behavior of the other parties. Fig. 1.

Description

DATA ANONYMIZATION
FIELD OF THE INVENTION The invention relates to systems, devices, and methods for data anonymization.
TECHNICAL BACKGROUND In many situations there is a desire or even a legal necessity to anonymize data, for example personal data. The goal of data anonymization is to transform data that is related to some entity E into a form in which the relation to E is removed or adequately suppressed, while preserving certain other properties of the data and/or the ability to apply certain functions to that data. For example, if one wants to test whether two data items are equal, one might anonymize those two data items first by applying an injective pseudorandom mapping to both items, and test for equality in the space that coincides with the co-domain of the mapping.
The foremost requirement of any data anonymization method is that anyone who does not know the anonymization mapping cannot re-identify the anonymized data, i.e. re-establish the relation between the data and entity E. An unsound method for anonymizing data items is by applying some fixed (i.e., non-random) hash function to each of these items, because when an attacker observes an anonymized data item, for example an anonymized name of a person, the attacker could try to re-identify the anonymized data item by guessing a name and learn whether this guess is correct by comparing the hash of the guess to the anonymized data item.
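The guessing attack described above can be illustrated with a minimal Python sketch. SHA-256 stands in for "some fixed hash function", and the names `fixed_hash`, `guess_attack` and the candidate list are hypothetical choices made for illustration only.

```python
import hashlib
from typing import List, Optional


def fixed_hash(item: str) -> str:
    """Unkeyed (fixed) mapping: anyone can recompute it. SHA-256 is an assumed example."""
    return hashlib.sha256(item.encode("utf-8")).hexdigest()


def guess_attack(anonymized: str, candidates: List[str]) -> Optional[str]:
    """Return the candidate whose hash equals the observed value, or None."""
    for guess in candidates:
        if fixed_hash(guess) == anonymized:
            return guess
    return None


# The attacker only observes the anonymized value ...
observed = fixed_hash("Alice Jansen")
# ... but can test guesses, because the mapping involves no secret.
recovered = guess_attack(observed, ["Bob de Vries", "Alice Jansen", "Carol Smit"])
```

With a small candidate space (names, postal codes, phone numbers) such an exhaustive search is cheap, which is why a fixed mapping is considered unsound here.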
Keyed anonymization is a form of anonymization that involves a cryptographic key that, when given the public information about the mapping family, fully determines the mapping between inputs and outputs, and rules out the above guessing attack, under the assumption that the attacker does not know the cryptographic key. For example, let the HMAC-SHA256 algorithm be used as the mapping family, which is public information. The cryptographic key is then used to select one specific mapping from this mapping family. The cryptographic key is viewed as a random variable, hence the choice of the specific element from the mapping family is also viewed as a random variable, which sets it apart from a fixed choice of a mapping; it has already been argued above that anonymizing data using a fixed mapping would be unsound.
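As a minimal sketch of the HMAC-SHA256 example above, the following Python fragment selects one mapping from the family by means of a randomly generated key. The function name `keyed_anonymize` and the 32-byte key length are assumptions for illustration.

```python
import hashlib
import hmac
import secrets


def keyed_anonymize(key: bytes, item: bytes) -> bytes:
    """Apply the member of the HMAC-SHA256 family selected by `key` to `item`."""
    return hmac.new(key, item, hashlib.sha256).digest()


key = secrets.token_bytes(32)        # secret anonymization key (assumed length)
other_key = secrets.token_bytes(32)  # a different, independently chosen key

a = keyed_anonymize(key, b"Alice Jansen")
b = keyed_anonymize(key, b"Alice Jansen")        # same key, same input
c = keyed_anonymize(other_key, b"Alice Jansen")  # different key, same input
```

Equal inputs under the same key yield equal outputs, so equality tests remain possible on anonymized data, while an attacker without the key cannot recompute the mapping for a guessed input.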
Optionally, it is required that the anonymized data has the same format as the source (non-anonymized) data, for example, such that an anonymized credit card number is again a valid credit card number. This kind of anonymization is referred to as format-preserving anonymization.
US patent no. US 8,202,078 discloses a method for anonymizing data from multiple data sources using a keyed hash function.
US patent no. US 10,103,888 discloses a method for computing HMAC using multiparty computation on at least two servers. The keys required for the servers are generated from a common key by a third party.
US patent no. US 10,140,474 discloses a method for improved contact information management, where as part of said application an anonymization component runs on a trusted execution environment (TEE).
SUMMARY OF THE INVENTION In a practical scenario, it might be difficult to ensure that the cryptographic key of a keyed anonymization system remains private. Note that from hereon, the term anonymization key represents a cryptographic key that is used as the key in a keyed anonymization method or device. Leakage of key material could occur, for example, due to accidental loss of storage media (like USB sticks), negligence or intentional misconduct of an employee involved in the data anonymization process, unintended automated backup of the anonymization key, or a malware infection or other security incident on the computer on which the anonymization is performed.
In some cases, it is required that the key of a keyed anonymization method can be ordered to be erased, with the aim of rendering re-identification permanently impossible. The main issue with such an erasure feature, however, is that it is impossible to verify whether secret copies have been made prior to the “erasure” of the key. Further, the anonymization method should be applicable to large data sets; this implies that the throughput (the number of data item anonymizations per second) of the anonymization method should be sufficient to handle such workloads.
An object of the invention is to provide a high-speed, optionally format-preserving keyed anonymization method that has better protection against intentional or unintentional key-leakage and, in case of a key-erasure order, provides more confidence in the belief that the key has truly been erased, compared to the state of the art.
The present invention discloses a method, device and system for applying a keyed anonymization mapping to data, where said system comprises at least two anonymization devices. An anonymization device comprises an input interface for receiving data, an anonymization key, a processor configured to anonymize the received data using a keyed anonymization mapping under said anonymization key, and an output interface for transferring the anonymized data. Said anonymization device either receives its input data from yet another anonymization device, or sends its output data to yet another anonymization device, or both. Each anonymization device is typically operated by an independent party. The benefit of using a system comprising multiple anonymization devices operated by independent parties is that such system offers unprecedented privacy guarantees: as long as at least one party behaves honestly, in that she will keep her anonymization key secret and, in case of an ordered key-erasure (typically issued after the anonymization of a data set has completed) will indeed delete it, the overall anonymization mapping will remain private, regardless of the behavior of the other parties. The multi-party aspect may thus provide mitigation against the risks of intentional and unintentional data leakage.
From the perspective of secure multiparty computation (MPC), the disclosed method combines the strong privacy properties offered by MPC with the performance characteristics of cleartext-computation. Further embodiments of the invention and the advantages thereof will be explained in more detail further in the description.
BRIEF DESCRIPTION OF THE DRAWINGS Fig. 1 shows a first example of the anonymization system in which multiple anonymization devices are placed in cascade, where N denotes the number of such devices in the cascade. The immediate benefit is that key leakage occurring at no more than k anonymization devices does not expose the overall anonymization mapping, for all non-negative integers k < N.
Fig. 2 shows an anonymization device with its input and output interface, with which said device is connected to a data source and data sink, where said source and sink can represent any type of data source and sink respectively, like a file, network socket, or another anonymization device.
Fig. 3 shows how an anonymization device can be implemented; in the example illustrated in Fig. 3, the data source is a character stream (which can be, for example, a TCP/IP socket stream or a file stream), and data from this source stream is split into separate data items by a tokenizer, after which the data items are anonymized by a keyed mapping, then optionally re-formatted, and finally encoded to make said data items suitable for transmission to the data sink.
Fig. 4 shows the principle of a second embodiment of the invention in which storage of the anonymization key and application of the keyed anonymization mapping are performed inside a trusted execution environment (TEE), with the benefit of a strong protection of the anonymization key against intentional or unintentional leakage.
Fig. 5 shows more details of this second embodiment with respect to key management inside the trusted execution environment, like performing the generation of the anonymization key also inside the TEE, and encryption of the anonymization key using a suitable encryption scheme under an enclave key, which has the benefit that the anonymization key can be securely stored outside the TEE in encrypted form.
Fig. 6 shows an embodiment in which the trusted execution environment is attested by a third party attestation service, and where the secure connections (e.g., making use of TLS) for incoming and outgoing data are attested channels. The benefit of this embodiment is that the party that runs this anonymization device cannot observe any data in the clear, hence said party does not have to be trusted.
Fig. 7 shows a computer program product according to an embodiment of the invention.
DETAILED DESCRIPTION OF THE INVENTION The invention disclosed in this application provides a method, system and device for data anonymization, optionally equipped with data-format preservation. Embodiments may provide protection against unintentional or intentional leakage of anonymization key material. In various embodiments, a throughput may be achieved which is on par with state-of-the-art anonymization solutions.
Fig. 1 shows a system 100 in which data items originating from a data source 110 are anonymized by N anonymization devices 121, 122, 123 that are placed in cascade, after which the anonymized data items are transferred to a data sink 150. For example, the system could employ two, three, or at least four anonymization devices. As an illustrative example, three anonymization devices are shown. Each anonymization device 121, 122, 123 is typically operated by an independent party.
Accordingly, an anonymization device such as anonymization device 121 may obtain original, non-anonymized data directly and provide it to a further anonymization device. An anonymization device such as anonymization device 122 can also both obtain data to be anonymized from a further anonymization device, and provide the anonymized data to another further anonymization device. An anonymization device such as anonymization device 123 may also obtain data from a further anonymization device and determine final anonymized data that is ready for use. Typically, an anonymization system comprises at least a pair of anonymization devices, wherein one party anonymizes data and provides it to another party for further anonymization.
A party may receive the data to be anonymized as part of a message, wherein the message may further identify the anonymization device to send the anonymized data to, e.g. in a list of subsequent anonymization devices by which the data should be anonymized. The identification of the anonymization device(s) may be cryptographically signed to protect their integrity, for example. Information about the topology of the anonymization system 100, i.e., how the anonymization devices should be interconnected, may also be provided via, e.g., command-line parameters or configuration files.
An anonymization device such as anonymization device 121 may be integrated with the data source 110. For example, if the data source is a relational database, the anonymization device 121 may be embodied as a database plugin. An anonymization device such as anonymization device 122 or 123 may perform checks on the data, e.g. a statistical test, to verify that the data has already been anonymized by at least one other party.
The data source 110 can be any source of digital data: a file, a relational database, a network stream of JSON objects, et cetera. Likewise, examples of the data sink 150 include: a network socket, a file, and a database.
The anonymization devices 121, 122, 123 could be interconnected by TCP sockets with Transport Layer Security (TLS), or by means of a different interface. It is noted that each such anonymization device is typically operated by an independent entity and uses an independently chosen anonymization key 330 (see Fig. 3). The benefit is that no coalition of size strictly less than N parties has knowledge of the overall keyed anonymization mapping, where the overall keyed anonymization mapping is defined as the composition of all cascaded keyed mappings. Note that said anonymization devices do not have to be identical; said devices could employ a variety of keyed mappings, as well as a variety of interconnects. In the system as shown in Fig. 1, only the last anonymization device 123 typically performs data re-formatting (for the sake of format-preserving anonymization), while the other anonymization devices 121, 122 typically omit data re-formatting.
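The composition of cascaded keyed mappings can be sketched in Python as follows. This is an in-process illustration only: real devices would be run by independent parties and connected over, e.g., TLS sockets. The class `AnonymizationDevice`, the function `run_cascade`, and the use of HMAC-SHA256 with 32-byte keys in every device are assumptions, not details fixed by the description.

```python
import hashlib
import hmac
import secrets
from typing import Iterable, Optional


class AnonymizationDevice:
    """One device in the cascade, holding its own independently chosen key."""

    def __init__(self, key: Optional[bytes] = None) -> None:
        # Each party draws its key independently (assumed: 32 random bytes).
        self._key = key if key is not None else secrets.token_bytes(32)

    def anonymize(self, item: bytes) -> bytes:
        # Assumed keyed mapping: HMAC-SHA256 under this device's key.
        return hmac.new(self._key, item, hashlib.sha256).digest()


def run_cascade(devices: Iterable[AnonymizationDevice], item: bytes) -> bytes:
    """Overall mapping: the composition of all cascaded keyed mappings."""
    for device in devices:
        item = device.anonymize(item)
    return item


devices = [AnonymizationDevice() for _ in range(3)]  # N = 3
out_1 = run_cascade(devices, b"alice@example.com")
out_2 = run_cascade(devices, b"alice@example.com")

# A coalition that knows the keys of the first and last device but has to
# guess the middle key computes a different overall mapping.
coalition = [devices[0], AnonymizationDevice(), devices[2]]
out_guess = run_cascade(coalition, b"alice@example.com")
```

In this sketch the overall mapping stays deterministic for equal inputs, while reproducing it requires all N keys.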
Figs. 2 and 3 illustrate an embodiment of an anonymization device 120, e.g., for use in system 100. An anonymization device 120 comprises an input interface 130 for receiving data, an anonymization key 330, a processor 370 configured to anonymize the received data using a keyed anonymization mapping under the anonymization key 330, and an output interface 140 for transferring the processed data.
Because the processor 370 might be used for other tasks than anonymizing data, such as storing the anonymization key 330 or executing a pseudorandom generator 340, the data anonymization task that runs on the processor 370 is indicated in Fig. 3 as the anonymization mapper 320.
Anonymization device 120 either receives its input data from yet another anonymization device, or sends its processed data to yet another anonymization device, or both. Hence, the data source 210 and data sink 250 are like source 110 and sink 150 respectively, except that 210 and 250 could also represent other anonymization devices, which might also be part of system 100. Examples of the input interface 130 and output interface 140 include: an internal/external storage interface for accessing internal/external storage; and a network interface for accessing a local/wide area network.
Optionally, the anonymization device 120 also re-formats each data item, such that the data items output to the data sink 250 have the same format as the data items before those underwent any anonymization operations.
The anonymization key 330 can be viewed in two ways:
1. It can be regarded as a volatile storage element, like a CPU register or computer memory, for storing a binary string of a given length; and
2. It can be regarded as a value, i.e. the contents of said storage element.
When viewed as a value, the anonymization key 330 may be a cryptographic key.
For example, the anonymization key may be a random variable with a close-to-uniform probability distribution, and may be kept secret.
The key 330 may have a key length, e.g., of at least 80 or at least 120 bits, and/or may provide, e.g., at least 40 or at least 80 bits of entropy.
For example, the key 330 may be an AES key or similar.
Said key is typically generated using a random number generator.
For example, said key is generated using the pseudorandom generator (PRG) 340. Another example of a random number generator is a quantum random number generator. The anonymization key 330 can be exported from the anonymization device, e.g. for the sake of persistent storage.
Also, a key can be loaded from an external source into the storage register 330. The interface used for this anonymization-key import/export feature is not explicitly shown in the figure.
In case the data input is a character stream, a tokenizer 310 can be used to separate said stream into separate data items. For example, if the input stream is a sequence of ASCII characters, where data items are separated from each other by a delimiter symbol, e.g. the newline character, the tokenizer may split the input based on this delimiter symbol, remove the delimiter symbols from the data items, and forward the data items as separate units to the anonymization mapper 320. In another example, the input arrives as a sequence of frames where the length of each frame is encoded in the frame itself, typically at the beginning of the frame.
In such case, the tokenizer may decode the framing format, and split the input into separate data items, where the additional framing information is removed. In yet another example, the input is already split in separate parts, and the tokenizer 310 is either absent or is a no-op.
The anonymization mapper 320 is a mapping that maps an input bit string to an output bit string, where the mapping depends on the anonymization key 330. While the lengths of said key and the output of the mapping are typically fixed, the input bit string may either have fixed length or have arbitrary length, depending on the particular choice of the mapping. In an embodiment, the mapping is a one-way mapping. In an embodiment, the mapping is a collision-resistant mapping.
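The two behaviours described for the tokenizer 310 (delimiter-based splitting and length-prefixed frames) can be sketched in Python as follows. The function names and the choice of a 4-byte big-endian length prefix are assumptions; the description does not fix a framing format.

```python
import struct
from typing import List


def tokenize_delimited(stream: bytes, delimiter: bytes = b"\n") -> List[bytes]:
    """Split on the delimiter and drop the delimiter symbols themselves."""
    items = stream.split(delimiter)
    # A trailing delimiter produces one empty trailing piece; discard it.
    if items and items[-1] == b"":
        items.pop()
    return items


def tokenize_frames(stream: bytes) -> List[bytes]:
    """Decode frames of the assumed form: 4-byte big-endian length, then payload."""
    items = []
    offset = 0
    while offset < len(stream):
        (length,) = struct.unpack_from(">I", stream, offset)
        offset += 4
        items.append(stream[offset:offset + length])  # framing info removed
        offset += length
    return items


delimited_items = tokenize_delimited(b"alice\nbob\n")
framed_items = tokenize_frames(
    struct.pack(">I", 5) + b"alice" + struct.pack(">I", 3) + b"bob"
)
```

Both calls yield the same two data items, which would then be forwarded one by one to the anonymization mapper 320.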
The actual choice of the mapping, and which cryptographic properties it has (e.g. one-wayness, collision-resistance), may be made depending on the application and on the implementation platform used.
Examples of keyed mappings include:
    • a pseudorandom permutation;
    • the block cipher AES;
    • a one-way compression function instantiated with a block cipher, like the Davies-Meyer or the Matyas-Meyer-Oseas construction instantiated with AES, where the anonymization key is used as the initialization vector (IV) (note that an IV is typically inherent to these constructions);
    • the HMAC construction, which securely combines a key with a collision-resistant hash function, like HMAC-SHA256 and HMAC-SHA512;
    • the SHA-3 hash of the-input-concatenated-with-the-key; and
    • the BLAKE2b algorithm.
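Three of the keyed mappings listed above are available in the Python standard library and can be sketched as interchangeable functions. The function names, the 32-byte output length, and the dictionary `KEYED_MAPPINGS` are assumptions made for illustration. The AES-based options are omitted because they would require a third-party library.

```python
import hashlib
import hmac
from typing import Callable, Dict


def map_hmac_sha256(key: bytes, item: bytes) -> bytes:
    """HMAC construction over SHA-256."""
    return hmac.new(key, item, hashlib.sha256).digest()


def map_sha3_input_then_key(key: bytes, item: bytes) -> bytes:
    """SHA-3 hash of the input concatenated with the key (assumed: SHA3-256)."""
    return hashlib.sha3_256(item + key).digest()


def map_blake2b(key: bytes, item: bytes) -> bytes:
    """BLAKE2b in its built-in keyed mode (key of at most 64 bytes)."""
    return hashlib.blake2b(item, key=key, digest_size=32).digest()


# Hypothetical registry from which a device could select its keyed mapping.
KEYED_MAPPINGS: Dict[str, Callable[[bytes, bytes], bytes]] = {
    "hmac-sha256": map_hmac_sha256,
    "sha3-input-key": map_sha3_input_then_key,
    "blake2b": map_blake2b,
}

key = bytes(range(32))  # fixed demo key; a real key would be random and secret
outputs = {name: f(key, b"Alice Jansen") for name, f in KEYED_MAPPINGS.items()}
```

Because every function has the same signature, devices in a cascade could each pick a different mapping, in line with the remark that the devices do not have to be identical.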
The re-formatter 360, whose presence is optional, re-formats each data item, such that the data items that emanate from the re-formatter have the same format as the data before it underwent any anonymization operations.
There might be additional requirements on the data re-formatting operation. One such requirement is a capability of mitigating frequency analyses. For example, when performing a format-preserving anonymization of US street names (e.g., originating from a person's postal address), a typical re-identification attempt might be to perform a frequency analysis on the anonymized data set. Such frequency analysis could be informative (hence, successful) when, e.g., a very uncommon US street name occurs very frequently in the anonymized data set.
To mitigate such frequency attacks, the re-formatter 360 could be equipped with a method that has knowledge of the frequency distribution of the data item to be re-formatted, where this distribution is induced by the original, non-anonymized data set. The said re-formatting method works as follows; the explanation is in terms of the street name example that was introduced above. The re-formatting method has a list of all possible street names and an estimate of the probability mass function (PMF) corresponding to the street names occurring in the input data set, or, more generally, all street names that could possibly have occurred in the input data set. If only a subset of empirical probabilities is known, an estimate of the PMF might be found using an “add-constant” estimator (like Laplace’s add-one rule) or the Good-Turing estimator [Orlitsky]. From the PMF the cumulative distribution function can then be computed by summation. Next, a close-to-uniformly-random number is extracted from the output of the anonymization mapper 320 (where the probability is over the randomness of the anonymization key). If necessary, this number is converted to a floating point number and scaled to an appropriate range. Finally, the number is converted to a street name that has approximately the desired distribution by means of the inverse transform sampling method.
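The re-formatting steps above (PMF, cumulative distribution by summation, extraction of a close-to-uniform number, inverse transform sampling) can be sketched in Python as follows. The function names, the example PMF, and the choice to use the top 53 bits of the mapper output as the uniform number are assumptions for illustration.

```python
import bisect
from itertools import accumulate
from typing import Dict, List, Tuple


def build_cdf(pmf: Dict[str, float]) -> Tuple[List[str], List[float]]:
    """Compute the cumulative distribution function from the PMF by summation."""
    names = sorted(pmf)
    total = sum(pmf.values())
    cdf = list(accumulate(pmf[name] / total for name in names))
    cdf[-1] = 1.0  # guard against floating-point rounding in the last entry
    return names, cdf


def reformat_item(mapper_output: bytes, names: List[str], cdf: List[float]) -> str:
    """Map the output of the keyed mapping to a name by inverse transform sampling."""
    # Use the top 53 bits so that u is an exactly representable float in [0, 1).
    u = (int.from_bytes(mapper_output[:8], "big") >> 11) / 2**53
    return names[bisect.bisect_right(cdf, u)]


# Hypothetical PMF estimate for three street names.
names, cdf = build_cdf({"Main Street": 0.7, "Oak Avenue": 0.2, "Zzyzx Road": 0.1})
lowest = reformat_item(b"\x00" * 32, names, cdf)   # u = 0
highest = reformat_item(b"\xff" * 32, names, cdf)  # u just below 1
```

In this sketch a rare street name in the input no longer stands out in the output, at the cost that distinct inputs may be mapped to the same output name.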
The encoder 350 optionally transforms the output of the anonymization mapper 320 (or the output of the re-formatter 360, if present) to a format suitable for transmission to the data sink 250, and optionally adds framing information. For example, the encoder 350 could transform binary data to a subset of the ASCII character set by means of the Base64 encoding, and re-introduce newline delimiters to separate data items. In another example, the encoder 350 transmits prior to each data item the byte-length of that data item; the data item itself is then transmitted as is.
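Both encoder examples can be sketched in Python with the standard library. The function names and the 4-byte big-endian length prefix are assumptions; the description only states that the byte-length precedes each data item.

```python
import base64
import struct
from typing import Iterable


def encode_base64_lines(items: Iterable[bytes]) -> bytes:
    """Base64-encode each item and re-introduce a newline delimiter after it."""
    return b"".join(base64.b64encode(item) + b"\n" for item in items)


def encode_length_prefixed(items: Iterable[bytes]) -> bytes:
    """Send each item as is, preceded by its byte-length (assumed 4 bytes, big-endian)."""
    return b"".join(struct.pack(">I", len(item)) + item for item in items)


as_lines = encode_base64_lines([b"\x00\x01", b"hi"])
as_frames = encode_length_prefixed([b"abc"])
```

The standard Base64 alphabet does not contain the newline character, so the delimiter cannot collide with encoded data.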
In another embodiment of the invention, described in Fig. 4, an anonymization device 400 performs certain critical tasks inside a trusted execution environment (TEE) 410, also called “secure enclave”. Examples of a TEE include Intel® SGX, AMD® Secure Technology, Apple® Secure Enclave, and Arm® TrustZone™. The anonymization mapper 320, which applies a keyed anonymization mapping to data that is typically already anonymized, is run inside the TEE, and also the anonymization key 330 is kept inside the TEE and never leaves it in unencrypted form. This example is based on that of Fig. 3 and various options described with respect to that figure also apply here.
It might be desirable, however, to be able to store the anonymization key 330, at least temporarily, in non-volatile memory, e.g., in order to anonymize a data set in several phases, or to continue anonymization after an unexpected interruption like a power outage. Fig. 5 describes possible operations related to the anonymization key 330 in more detail. The anonymization key 330 can be generated inside the TEE 410 using a pseudorandom generator (PRG) 340. To store the anonymization key 330 in encrypted form outside the TEE 410, it is encrypted using an encryption scheme 510 under an enclave key 540 that is typically present in a TEE platform. The encryption scheme 510 can be an authenticated encryption scheme, for example.
Another embodiment of the invention is shown in Fig. 6. This embodiment may re-use the various options described with respect to the earlier figures. This embodiment shows an anonymization device that is implemented almost entirely inside a TEE. That is, not only the anonymization mapper 320, the anonymization key 330, and optionally key management operations as shown in Fig. 5 are placed inside the TEE; also decryption and encryption of incoming and/or outgoing network traffic 620, 630 is performed inside the TEE. The TEE may be capable of attesting to another entity, for example to another anonymization device, that the TEE platform is authentic 610 and that the TLS endpoints for secure socket connections 620, 630 are inside the TEE. The attestation is typically performed via a third-party attestation service 650.
The benefit of this embodiment is that the party that runs this anonymization device cannot observe any data in the clear, hence said party does not have to be trusted.
As will be appreciated by one skilled in the art, aspects of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, microcode, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as an “apparatus”, “device” or “system.” Furthermore, aspects of the present invention may take the form of a computer program product embodied in one or more computer-readable medium(s) having computer-readable program code embodied thereon. Any combination of one or more computer-readable medium(s) may be utilized. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection having one or more wires, a portable USB stick, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM), Flash memory, an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. A computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer-readable signal medium may be any computer-readable medium that is not a computer-readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Python, C++ or the like, a functional programming language such as Lisp, Haskell, Clojure or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).
Aspects of the present invention are described in this application with reference to flowchart illustrations and/or block diagrams of methods, apparatus {systems} and com puter program products according fo embodiments of the invention. It will be understood that each block of the flowchart Hustrations and/or block diagrams, and combinations of blocks in the flow chart illustrations and/or Block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other pro- grammabie dala processing apparatus to produce a machine, such that the instructions, which execute with the processor of the computer or other programmable data processing apparatus, create means for implementing the funclions/acts specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
Referring now to Fig. 7, a representative hardware environment for practicing at least one embodiment of the invention is depicted.
This schematic drawing illustrates a hardware configuration of an information handling/computer system in accordance with at least one embodiment of the invention.
The system comprises at least one processor or central processing unit (CPU) 10. The CPUs 10 are interconnected via system bus 12 to various devices such as a random access memory (RAM) 14, read-only memory (ROM) 16, and an input/output (I/O) adapter 18. The I/O adapter 18 can connect to peripheral devices, such as disk units 11 and tape drives 13, or other program storage devices that are readable by the system. The system can read the inventive instructions on the program storage devices and follow these instructions to execute the methodology of at least one embodiment of the invention.
The system further includes a user interface adapter 19 that connects a keyboard 15, trackpad 17, and/or other user interface devices such as a mouse (not shown) to the bus 12 to gather user input. Additionally, a communication adapter 20 connects the bus 12 to a data processing network 25, and a display adapter 21 connects the bus 12 to a display device 23 which may be embodied as an output device such as a monitor, printer, or transmitter, for example.
The flowchart and block diagrams in the Figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures.
For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The following is a list of clauses showing envisioned embodiments of the invention.
1. A computer-implemented method for anonymizing data, characterized in that the method comprises:
- accessing data via an input interface;
- applying a keyed anonymization mapping to said data using an anonymization key; and
- outputting the result of applying said mapping via an output interface for further processing;
wherein:
- accessing the data via the input interface comprises obtaining said data from an anonymization device comprising a processor configured to perform the method of the present clause, and/or
- outputting the result of applying said mapping via the output interface comprises providing said result to an anonymization device comprising a processor configured to perform the method of the present clause.
2. Method as described in clause 1, comprising performing the further processing on the result of applying said mapping, comprising at least one of: applying a further keyed anonymization mapping to said result, storing said result to a file, storing said result in a database, sending said result over a network.
3. Method as described in clause 1 or 2, comprising, after applying the keyed anonymization mapping, applying a data reformatting method to make the result of applying said mapping conform to a format of the original data.
4. Method as described in clause 3, wherein applying the data reformatting method comprises extracting a close-to-uniformly-distributed number from the anonymized data item and obtaining a reformatted data item by inverse sampling according to the close-to-uniformly-distributed number.
5. Method as described in clause 3 or 4, characterized in that said reformatting method comprises:
- obtaining a cumulative probability distribution of possible data values according to the format of the original data;
- extracting a close-to-uniformly-distributed number from the result of applying the keyed anonymization mapping; and
- converting said number to an element in the support of said estimated probability mass function, using an inverse transform sampling method.
6. Method as described in any of clauses 1-5, characterized in that a method selected from the group consisting of:
- a keyed permutation;
- a one-way compression function instantiated with a block cipher, such as:
  - the Davies-Meyer construction instantiated with AES, or
  - the Matyas-Meyer-Oseas construction instantiated with AES, where the anonymization key is used as initialization vector;
- the HMAC construction, like HMAC-SHA256 and HMAC-SHA512;
- the BLAKE2b algorithm;
- the SHA3 hash of the input concatenated with the anonymization key;
is used as said keyed anonymization mapping.
7. A computer program product comprising instructions which, when the program is executed by a computer, cause the computer to carry out the steps of the method as described in any of clauses 1-6.
8. A computer-readable storage medium comprising instructions which, when executed by a computer, cause the computer to carry out the steps of the method as described in any of clauses 1-6.
9. A device for anonymizing data, comprising:
- an input interface for accessing the data;
- an output interface for outputting anonymized data;
- a memory unit for storing an anonymization key;
- a processor configured to:
  - access the data via the input interface;
  - apply a keyed anonymization mapping to said data using an anonymization key, obtaining the anonymized data;
  - output the anonymized data via an output interface for further processing,
wherein:
- accessing the data comprises obtaining said data via the input interface from a further anonymization device according to the present clause, and/or
- outputting the anonymized data comprises providing said data to a further anonymization device according to the present clause.
10. Device as described in clause 9, wherein the processor is configured to perform a tokenization of the data prior to applying the keyed anonymization mapping.
11. Device as described in one of clauses 9 or 10, wherein the processor is further configured to:
- perform a data reformatting to make the anonymized data conform to the format of the original data, and/or
- transform the anonymized data into a format suitable for transmission.
12. Device as described in any of clauses 9-11, wherein the processor is further configured to generate an anonymization key using a random number generator, such as a pseudorandom generator or a quantum random number generator; and store the generated anonymization key in said memory unit.
13. Device as described in any of clauses 9-12, comprising a trusted execution environment (TEE) in which at least part of said processor, memory unit and/or random number generator is implemented.
14. Device as described in clause 13, comprising:
- a memory unit inside the TEE for storing an enclave key;
- a processor inside the TEE, configured to encrypt the anonymization key under the enclave key, and to decrypt an encrypted anonymization key using the enclave key; and
- a non-volatile storage medium outside the TEE, for storing the encrypted anonymization key.
15. Device as described in any of clauses 13-14, wherein the processor inside the TEE is configured to obtain the data to be anonymized via an attested secure channel from a further anonymization device and/or to provide the anonymized data via an attested secure channel to a further anonymization device.
16. A system comprising at least two anonymization devices, each according to one of clauses 9-15.
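As an illustration of the clauses on the keyed mapping, tokenization, key generation and chained devices, the following Python sketch passes a data item through two anonymization devices, each applying its own keyed mapping to the output of the previous one. It is a minimal sketch under stated assumptions and not the implementation of the invention: the names `AnonymizationDevice`, `generate_key`, `run_chain` and the `map_*` helpers are hypothetical, whitespace splitting stands in for the unspecified tokenization, the operating system's random source stands in for the random number generator, and the block-cipher constructions are left out because the Python standard library does not provide AES.

```python
import hashlib
import hmac
import secrets
from typing import Callable, List, Optional, Sequence

KeyedMapping = Callable[[bytes, bytes], bytes]


def generate_key(num_bytes: int = 32) -> bytes:
    """Draw a fresh anonymization key from the OS random source (assumption)."""
    return secrets.token_bytes(num_bytes)


def map_hmac_sha256(key: bytes, item: bytes) -> bytes:
    """HMAC construction instantiated with SHA-256."""
    return hmac.new(key, item, hashlib.sha256).digest()


def map_blake2b(key: bytes, item: bytes) -> bytes:
    """BLAKE2b in its built-in keyed mode (hashlib accepts keys up to 64 bytes)."""
    return hashlib.blake2b(item, key=key, digest_size=32).digest()


def map_sha3(key: bytes, item: bytes) -> bytes:
    """SHA3 hash of the input concatenated with the anonymization key."""
    return hashlib.sha3_256(item + key).digest()


class AnonymizationDevice:
    """Hypothetical device holding one key and one keyed mapping."""

    def __init__(self, key: Optional[bytes] = None,
                 mapping: KeyedMapping = map_hmac_sha256) -> None:
        # Generate a key when none is supplied, then keep it in memory.
        self._key = key if key is not None else generate_key()
        self._mapping = mapping

    def anonymize(self, item: bytes) -> bytes:
        """Apply the keyed mapping to a single data item."""
        return self._mapping(self._key, item)

    def anonymize_text(self, text: str) -> List[bytes]:
        """Tokenize on whitespace (an assumption), then map every token."""
        return [self.anonymize(tok.encode("utf-8")) for tok in text.split()]


def run_chain(devices: Sequence[AnonymizationDevice], item: bytes) -> bytes:
    """Feed the output of each device into the next one."""
    for device in devices:
        item = device.anonymize(item)
    return item


if __name__ == "__main__":
    first = AnonymizationDevice()                      # HMAC-SHA256, fresh key
    second = AnonymizationDevice(mapping=map_blake2b)  # BLAKE2b, fresh key
    print(run_chain([first, second], b"alice").hex())
```

Because every device holds its own key, the same input always yields the same output across the chain, while no single device in this sketch can reproduce the final mapping on its own.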
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the root terms “include” and/or “have”, when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
The corresponding structures, materials, acts, and equivalents of all means plus function elements in the claims below are intended to include any structure, or material, for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.
REFERENCES
[Orlitsky] Orlitsky & Suresh: "Competitive Distribution Estimation: Why is Good-Turing Good", NIPS 2015
LEGAL NOTICE
AMD is a trademark of Advanced Micro Devices, Inc. Apple is a trademark of Apple Inc. Arm and TrustZone are registered trademarks of Arm Limited (or its subsidiaries) in the US and/or elsewhere. Intel is a trademark of Intel Corporation or its subsidiaries.

Claims (15)

1. A computer-implemented method for anonymizing data, characterized in that the method comprises:
- accessing data via an input interface;
- applying a keyed anonymization mapping to said data using an anonymization key; and
- outputting the result of applying said mapping via an output interface for further processing;
wherein:
- accessing the data via the input interface comprises obtaining the data from an anonymization device comprising a processor configured to perform the method of the present claim, and/or
- outputting the result of applying said mapping via the output interface comprises providing said result to an anonymization device comprising a processor configured to perform the method of the present claim.
2. Method as described in claim 1, comprising performing further processing on the result of applying said mapping, comprising at least one of: applying a further keyed anonymization to said result, storing said result in a file, storing said result in a database, sending said result over a network.
3. Method as described in claim 1 or 2, comprising, after applying the keyed anonymization mapping, applying a data reformatting method to make the result of applying said mapping conform to the format of the original data.
4. Method as described in claim 3, wherein applying the data reformatting method comprises: extracting a close-to-uniformly-distributed number from the anonymized data item, and obtaining a reformatted data item by inverse sampling according to that extracted number.
5. Method as described in claim 3 or 4, characterized in that said reformatting method comprises:
- obtaining a cumulative probability distribution over the possible data items, based on original data;
- extracting a close-to-uniformly-distributed number from the result of applying the keyed anonymization mapping; and
- converting this number to an element in the support of the estimated probability mass function, using an inverse transform sampling method.
6. Method according to any of claims 1-5, characterized in that a method selected from the group consisting of:
- a keyed permutation;
- a one-way compression function instantiated with a block cipher, such as:
  - the Davies-Meyer construction instantiated with AES, or
  - the Matyas-Meyer-Oseas construction instantiated with AES, where the anonymization key is used as initialization vector;
- the HMAC construction, such as HMAC-SHA256 and HMAC-SHA512;
- the BLAKE2b algorithm;
- the SHA3 hash of the input concatenated with the anonymization key;
is used as said keyed anonymization mapping.
7. A computer program product comprising instructions which, when the program is executed by a computer, cause the computer to carry out the steps of the method as described in any of claims 1-6.
8. A computer-readable storage medium comprising instructions which, when executed by a computer, cause the computer to carry out the steps of the method as described in any of claims 1-6.
9. A device for anonymizing data, comprising:
- an input interface for accessing the data;
- an output interface for outputting anonymized data;
- a memory unit for storing an anonymization key;
- a processor configured to:
  - read in the data via the input interface;
  - apply a keyed anonymization mapping to said data using said anonymization key, obtaining the anonymized data;
  - output the anonymized data via an output interface for further processing;
wherein:
- reading in the data comprises obtaining the data via the input interface from a further anonymization device according to the present claim, and/or
- outputting the anonymized data comprises providing said data to a further anonymization device according to the present claim.
10. Device as described in claim 9, wherein the processor is configured to perform a tokenization of the data prior to applying the keyed anonymization mapping.
11. Device according to any of claims 9 or 10, wherein the processor is further configured to:
- perform a data reformatting to make the anonymized data conform to the format of the original data, and/or
- convert the anonymized data into a format suitable for transmission.
12. Device as described in any of claims 9-11, wherein the processor is further configured to generate an anonymization key using a random number generator, such as a pseudorandom generator or a quantum random number generator; and to store the generated anonymization key in the memory unit.
13. Device as described in any of claims 9-12, comprising a trusted execution environment (TEE) in which at least part of the processor, memory unit and/or random number generator is implemented.
14. Device as described in claim 13, comprising:
- a memory unit inside the TEE for storing an enclave key;
- a processor inside the TEE, configured to encrypt the anonymization key under the enclave key, and to decrypt an encrypted anonymization key using the enclave key; and
- a non-volatile storage medium outside the TEE, for storing the encrypted anonymization key.
15. Device according to any of claims 13-14, wherein the processor inside the TEE is configured to obtain the data to be anonymized via an attested secure channel from a further anonymization device and/or to provide the anonymized data via an attested secure channel to a further anonymization device.
16. A system comprising at least two anonymization devices, each as described in any of claims 9-15.
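The reformatting of claims 3 to 5 can be sketched in Python as follows. This is one possible reading, not the patented implementation. It assumes that the possible data values and their observed counts in the original data are available as two parallel lists, that HMAC-SHA256 is the keyed mapping, and that the first 64 bits of the digest serve as the close-to-uniformly-distributed number. The function names `reformat` and `pseudonymize` and the example name table are hypothetical.

```python
import hashlib
import hmac
from bisect import bisect_right
from itertools import accumulate
from typing import List, Sequence


def reformat(digest: bytes, values: Sequence[str], counts: Sequence[int]) -> str:
    """Pick a value by inverting the cumulative distribution given by `counts`."""
    cumulative: List[int] = list(accumulate(counts))
    total = cumulative[-1]
    # Scale the first 64 bits of the digest to an integer in [0, total).
    # Integer arithmetic avoids floating-point rounding at the upper edge.
    r = (int.from_bytes(digest[:8], "big") * total) >> 64
    # Inverse transform sampling: first index whose cumulative count exceeds r.
    return values[bisect_right(cumulative, r)]


def pseudonymize(key: bytes, item: str,
                 values: Sequence[str], counts: Sequence[int]) -> str:
    """Apply the keyed mapping, then reformat the digest into a plausible value."""
    digest = hmac.new(key, item.encode("utf-8"), hashlib.sha256).digest()
    return reformat(digest, values, counts)


if __name__ == "__main__":
    names = ["Anna", "Bram", "Chris"]   # hypothetical value set
    freq = [5, 3, 2]                    # hypothetical counts in the original data
    key = bytes(32)                     # demo key only, never use a fixed key
    print(pseudonymize(key, "alice", names, freq))
```

Values with a higher count cover a larger share of the range and are therefore selected more often, so the reformatted output roughly follows the frequency profile of the original data while remaining a deterministic function of the key and the input.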
NL1043913A 2020-01-23 2021-01-22 Data anonymization NL1043913B1 (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
EP20075002 2020-01-23

Publications (2)

Publication Number Publication Date
NL1043913A true NL1043913A (en) 2021-09-01
NL1043913B1 NL1043913B1 (en) 2021-12-14

Family

ID=69326346

Family Applications (1)

Application Number Title Priority Date Filing Date
NL1043913A NL1043913B1 (en) 2020-01-23 2021-01-22 Data anonymization

Country Status (1)

Country Link
NL (1) NL1043913B1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120303616A1 (en) * 2011-05-27 2012-11-29 International Business Machines Corporation Data Perturbation and Anonymization Using One Way Hash
US20170272251A1 (en) * 2015-11-22 2017-09-21 Dyadic Security Ltd. Method of performing keyed-hash message authentication code (hmac) using multi-party computation without boolean gates
US10140474B2 (en) 2013-12-23 2018-11-27 Intel Corporation Techniques for context information management

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120303616A1 (en) * 2011-05-27 2012-11-29 International Business Machines Corporation Data Perturbation and Anonymization Using One Way Hash
US9202078B2 (en) 2011-05-27 2015-12-01 International Business Machines Corporation Data perturbation and anonymization using one way hash
US10140474B2 (en) 2013-12-23 2018-11-27 Intel Corporation Techniques for context information management
US20170272251A1 (en) * 2015-11-22 2017-09-21 Dyadic Security Ltd. Method of performing keyed-hash message authentication code (hmac) using multi-party computation without boolean gates
US10103888B2 (en) 2015-11-22 2018-10-16 Dyadic Security Ltd. Method of performing keyed-hash message authentication code (HMAC) using multi-party computation without Boolean gates

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ORLITSKY; SURESH: "Competitive Distribution Estimation: Why is Good-Turing Good", NIPS, 2015

Also Published As

Publication number Publication date
NL1043913B1 (en) 2021-12-14

Similar Documents

Publication Publication Date Title
Qiu et al. All-Or-Nothing data protection for ubiquitous communication: Challenges and perspectives
CN111371549B (en) Message data transmission method, device and system
Ram et al. Security as a service (sass): securing user data by coprocessor and distributing the data
WO2021166787A1 (en) Information processing system, information processing device, information processing method, and information processing program
Kuznetsov et al. Performance analysis of cryptographic hash functions suitable for use in blockchain
Khakim et al. Security system design for cloud computing by using the combination of AES256 and MD5 algorithm
Widiasari Combining advanced encryption standard (AES) and one time pad (OTP) encryption for data security
US10673627B2 (en) Encryption device, search device, computer readable medium, encryption method, and search method
Hussain et al. Security of cloud storage system using various cryptographic techniques
Qiu et al. Privacy-preserving health data sharing for medical cyber-physical systems
NL1043913B1 (en) Data anonymization
Veeraragavan et al. Enhanced encryption algorithm (EEA) for protecting users' credentials in public cloud
Santos et al. Enhancing medical data security on public cloud
Khaleel et al. A study of graph theory applications in it security
Oduor et al. Application of cryptography in enhancing privacy of personal data in medical services
Priyanka et al. A hybrid encryption method handling big data vulnerabilities
CN112395629A (en) File encryption method and system based on TCM chip
Narmatha et al. Text File Encryption and Decryption by FFT and IFFT Algorithm Using Lab view
US20030138099A1 (en) Method for computer-based encryption and decryption of data
Scientific ENHANCING CLOUD SECURITY BASED ON THE KYBER KEY ENCAPSULATION MECHANISM
Shoukat Secure cloud based IoT data storage
Ahmed Energetic data security management scheme using hybrid encryption algorithm over cloud environment
Wang Application of AES and DES Algorithms in File Management
CN115242389B (en) Data confusion transmission method and system based on multi-level node network
Mukhammadovich EMBEDDING DATA HASHING ALGORITHMS INTO A CRYPTOGRAPHIC INFORMATION SECURITY TOOL