WO2001018631A1 - Method for anonymizing data - Google Patents

Publication number
WO2001018631A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
hash
owner
code
patient
Application number
PCT/EP2000/008301
Other languages
French (fr)
Inventor
Ingo Elfering
Original Assignee
Medical Data Services Gmbh
Application filed by Medical Data Services Gmbh
Publication of WO2001018631A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/12 Applying verification of the received information
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6254 Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records

Definitions

  • DHTML: Refers to Web content that changes each time it is viewed. For example, the same URL could result in a different page depending on any number of parameters, such as:
  • CGI scripts http://webopedia.internet.com/TERM/C/CGI.html
  • Server-Side Includes (SSI)
  • cookies http://webopedia.internet.com/TERM/c/cookie.html
  • Java http://webopedia.internet.com/TERM/J/Java.html
  • JavaScripts http://webopedia.internet.com/TERM/C/CGI.html
  • Dynamic HTML refers to new HTML extensions that will enable a Web page to react to user input without sending requests to the Web server. Microsoft and Netscape have submitted competing Dynamic HTML proposals to W3C, which is producing the final specification.
  • W3C is short for World Wide Web Consortium, an international consortium of companies involved with the Internet and the Web.
  • The W3C was founded in 1994 by Tim Berners-Lee, the original architect of the World Wide Web. The organization's purpose is to develop open standards so that the Web evolves in a single direction rather than being splintered among competing factions.
  • The W3C is the chief standards body for HTTP (hyper text transfer protocol) and HTML.
  • Smartcard: A small electronic device about the size of a credit card that contains electronic memory, and possibly an embedded integrated circuit. Smart cards containing an IC are sometimes called Integrated Circuit Cards (ICCs). Smart cards are used for a variety of purposes, including: 1. Storing a patient's medical records 2. Storing digital cash 3. Generating network IDs (similar to a token)
  • COM Object: A model for binary code developed by Microsoft.
  • The Component Object Model (COM) enables programmers to develop objects that can be accessed by any COM-compliant application.
  • An object is, generally, any item that can be individually selected and manipulated. This can include shapes and pictures that appear on a display screen as well as less tangible software entities.
  • In object-oriented programming, for example, an object is a self-contained entity that consists of both data and procedures to manipulate the data. Both OLE (object linking and embedding) and ActiveX are based on COM.
  • COM specifications can be found at http://www.microsoft.com/com/fnf.asp.
  • Hashing: Producing hash values for accessing data or for security.
  • A hash value (or simply hash) is a number generated from a string of text.
  • The hash is substantially smaller than the text itself, and is generated by a formula in such a way that it is extremely unlikely that some other text will produce the same hash value. Hashes play a role in security systems where they're used to ensure that transmitted messages have not been tampered with.
  • The sender generates a hash of the message, encrypts it, and sends it with the message itself.
  • The recipient then decrypts both the message and the hash, produces another hash from the received message, and compares the two hashes. If they're the same, there is a very high probability that the message was transmitted intact.
  • Hashing is also a method of accessing data records. Take for example a list of names:
  • Hash functions with just this property have a variety of general computational uses, but when employed in cryptography, the hash functions are usually chosen to have some additional properties.
  • The basic requirements for a cryptographic hash function are: the input can be of any length, the output has a fixed length,
  • H(x) is collision-free.
  • H is said to be a weakly collision-free hash function
  • hash function is in the provision of message integrity checks and digital signatures. Since hash functions are generally faster than encryption or digital signature algorithms, it is typical to compute the digital signature or integrity check to some document by applying cryptographic processing to the document's hash value, which is small compared to the document itself. Additionally, a digest can be made public without revealing the contents of the document from which it is derived. This is important in digital timestamping where, using hash functions, one can get a document timestamped without revealing its contents to the timestamping service.
  • A compression function takes a fixed-length input and returns a shorter, fixed-length output.
  • A hash function can be defined by repeated applications of the compression function until the entire message has been processed.
  • A message of arbitrary length is broken into blocks whose length depends on the compression function, and "padded" (for security reasons) so the size of the message is a multiple of the block size.
  • The blocks are then processed sequentially, taking as input the result of the hash so far and the current message block, with the final output being the hash value for the message.
  • Hash function techniques are often divided into three classes: those built around block ciphers,
  • By building a hash function around a block cipher, a designer aims to leverage the security of a well-trusted block cipher such as DES to obtain a well-trusted hash function.
  • The so-called Davies-Meyer hash function is an example of a hash function built around the use of DES.
  • A hash function is generally used in conjunction with a digital signature algorithm which itself makes use of modular arithmetic.
  • The track record of such hash functions is not good from a security perspective and there are no hash functions in this second class that can be recommended for use today.
  • MD4 is an early example of a popular hash function with such a design. Although MD4 is no longer considered secure for most cryptographic applications, most new dedicated hash functions make use of the same design principles as MD4 in a strengthened version. Their strength varies depending on the techniques, or combinations of techniques, employed in their design.
  • Dedicated hash functions in current use include MD5 and SHA-1 as well as RIPE-MD (H. Dobbertin, A. Bosselaers, and B. Preneel. RIPEMD-160: A strengthened version of RIPEMD. In Proceedings of 3rd International Workshop on Fast Software Encryption, pages 71-82, Springer-Verlag, 1996) and HAVAL (Y. Zheng, J. Pieprzyk and J. Seberry. HAVAL - a one-way hashing algorithm with variable length output. In Advances in Cryptology Auscrypt '92, pages 83-104, Springer-Verlag, 1993).
  • SHA and SHA-1 stand for the Secure Hash Algorithm. It is the algorithm specified in the Secure Hash Standard (SHS, FIPS PUB 180). It was developed by the National Institute of Standards and Technology, a division of the U.S. Department of Commerce. SHA-1 (National Institute of Standards and Technology (NIST). Announcement of Weakness in the Secure Hash Standard. May 1994) is a revision to SHA that was published in 1994; the revision corrected an unpublished flaw in SHA. The design of SHA-1 is very similar to the MD4 family of hash functions developed by Rivest. SHA-1 is also described in the ANSI X9.30 (part 2) standard.
  • The algorithm takes a message of less than 2^64 bits in length and produces a 160-bit message digest.
  • The algorithm is slightly slower than MD5, but the larger message digest makes it more secure against brute-force collision and inversion attacks.
  • SHA is part of the Capstone project.
  • MD2, MD4 and MD5 are message-digest algorithms developed by Rivest (http://www.rsa.com/rsalabs/faq/html/3-6-6.html). They are meant for digital signature applications where a large message has to be "compressed" in a secure manner before being signed with the private key. All three algorithms take a message of arbitrary length and produce a 128-bit message digest. While the structures of these algorithms are somewhat similar, the design of MD2 is quite different from that of MD4 and MD5. MD2 was optimized for 8-bit machines, whereas MD4 and MD5 were aimed at 32-bit machines.
  • Description and source code for the three algorithms can be found as Internet RFCs 1319 - 1321 (http://www.rsa.com/rsalabs/faq/html/references.html - Kal92), (http://www.rsa.com/rsalabs/faq/html/references.html - Riv92b), and (http://www.rsa.com/rsalabs/faq/html/references.html - Riv92c).
  • MD2 was developed by Rivest in 1989. The message is first padded so its length in bytes is divisible by 16. A 16-byte checksum is then appended to the message, and the hash value is computed on the resulting message. Rogier and Chauvaud have found that collisions for MD2 can be constructed if the calculation of the checksum is omitted (http://www.rsa.com/rsalabs/faq/html/references.html - RC95). This is the only cryptanalytic result known for MD2.
  • MD4 was developed by Rivest in 1990. The message is padded to ensure that its length in bits plus 64 is divisible by 512 (that is, its length is congruent to 448 modulo 512).
  • A 64-bit binary representation of the original length of the message is then concatenated to the message.
  • The message is processed in 512-bit blocks in the Damgard/Merkle iterative structure, and each block is processed in three distinct rounds.
  • Attacks on versions of MD4 with either the first or the last rounds missing were developed very quickly by Den Boer, Bosselaers (B. den Boer and A. Bosselaers. An attack on the last two rounds of MD4. In Advances in Cryptology Crypto '91, pages 194-203, Springer-Verlag, 1992) and others.
  • Dobbertin H. Dobbertin. Alf Swindles Ann.
  • MD5 was developed by Rivest in 1991. It is basically MD4 with "safety-belts" and, while it is slightly slower than MD4, it is more secure.
  • The algorithm consists of four distinct rounds, which have a slightly different design from that of MD4. Message-digest size, as well as padding requirements, remain the same.
  • Den Boer and Bosselaers have found pseudo-collisions for MD5. More recent work by Dobbertin has extended the techniques used so effectively in the analysis of MD4 to find collisions for the compression function of MD5. While stopping short of providing collisions for the hash function in its entirety, this is clearly a significant step. For a comparison of these different techniques and their impact the reader is referred to M.J.B. Robshaw. On Recent Results for MD2, MD4 and MD5. RSA Laboratories Bulletin No. 4, November 12, 1996.
  • Van Oorschot and Wiener (P. van Oorschot and M. Wiener. Parallel collision search with application to hash functions and discrete logarithms. In Proceedings of 2nd ACM Conference on Computer and Communication Security, 1994) have considered a brute-force search for collisions in hash functions, and they estimate a collision search machine designed specifically for MD5 (costing $10 million in 1994) could find a collision for MD5 in 24 days on average.
  • The general techniques can be applied to other hash functions. More details on MD2, MD4, and MD5 can be found in B. Preneel's 1993 paper (http://www.rsa.com/rsalabs/faq/html/references.html - Pre93) and M.J.B.
  • MIME: Short for Multipurpose Internet Mail Extensions, a specification for formatting non-ASCII messages so that they can be sent over the Internet. Many e-mail clients now support MIME, which enables them to send and receive graphics, audio, and video files via the Internet mail system. In addition, MIME supports messages in character sets other than ASCII. There are many predefined MIME types, such as GIF graphics files and PostScript files. It is also possible to define your own MIME types.
  • In addition to e-mail applications, Web browsers also support various MIME types.
  • MIME supports encrypted messages.
  • The hashcode is generated from significant information. In Germany the Insurance
  • The hashcode itself could be encrypted with, for example, the public key of the server if sent to the server and with the public key of the user if used by a user.
  • The system would still have the capability to track patients on the server (since each encrypted hash would still be the same binary blob, because everybody would encrypt with the public key of the server), and the server can re-identify a patient for a user (since the server is the only party that can decrypt with its private key and then re-encrypt for a given user, who would then be the only person who can decrypt again).
  • Any other such unique information can be used. Which information items are used for generating this identifier, or how long this information is, poses no problem for the invention. Anything unique can be chosen, e.g. SSNs, Master-Patient-Index ID numbers, names, etc. The information just has to uniquely describe any given patient, and each provider (or user of the system) has to be able to have this information when he wants to encode or decode information for a patient.
  • Hash Encoding the Data: The unique identifier string is now taken and the hashcode is calculated. Any hashcode can be used, provided it ensures proper functionality and robustness. One example is SHA-1.
  • The hashcode transforms the string, which can be of any length and can contain any characters, numbers or other binary values.
  • The transformed value as output by the hashcode is always of a fixed length.
  • The hashing algorithm takes the data and always generates a fixed-length bit stream representing the data. Even a one-bit change in the data will generate a different hash. In practice it is computationally infeasible to find two different pieces of data that produce the same hash value.
  • This output value of the SHA-1 operation is then encoded via the base64 algorithm so that it contains only normal characters and therefore is easier to transport and handle.
  • Identification Processes
  • Patient data can then be stored in a database where this hash-coded value now identifies the patient. All other patient identifier data can be deleted.
  • Each provider can send anonymous patient data to a central server with this hash-coded value. Therefore each patient item is identifiable, but also all the data about one patient can be merged on the server, since each item from each provider still has the same hash-coded item. This is the preferred approach.
  • This technique makes it possible to have patient data residing on more than one server and be accessible to the patient or her authorised agent via a networked client, i.e., a PC. If anybody has to re-identify data for a given patient, he has to a) know what kind of unique identifier string elements have been used, and b) have these information items from the patient.
  • This search can be carried out using a PC which has software capable of interacting with the data base engine where the data is stored.
  • One example is a client/server configuration where the data base engine and the data reside on the server and the client, for example a PC, has access to the server over a public or limited-access network.
  • A server running under UNIX or Microsoft's NT could have on it a data base such as Oracle's data base engine or be configured as a Web site.
  • The server would be accessible by PC or Macintosh (the client) over a virtual private network or an intranet, extranet or the Internet.
  • The client software could be a proprietary software package, or one could use an Internet browser such as Netscape or Microsoft's Internet Explorer if the server was set up as a Web site.
  • This invention could also be used in the context of a mainframe operation where access is through terminal emulation.
  • This can also be used in a website where the physician views analytics, e.g. risks in a patient population.
  • The database holds anonymous information about all patients. If she selects any patient in the analysis, the system queries her local databases and tries to find the hashcode value from that patient in the local database. If found, it means that this physician knows the patient, and the site can retrieve the patient's name etc. from the local database and display this instead of the hash value. Therefore the physician would see which patient this is and can act. (not clear on this point) Example:
  • Example 1: A server is set up which separates patient data and medical patient data but links both data sources on the server through a unique ID generated on the server. Both data bases have different access/user rights and are only accessible through a COM-object layer that controls access to the database.
  • The patient database contains a unique patient ID that is generated from unique information associated solely with that patient.
  • The Insurance Company Code Number and the patient's member number in the company are taken together and are encoded into a fixed binary number by applying a hash-code algorithm (SHA-1) to the data. A base64 algorithm is then applied on top of that. If a unique identifier is not readily available, one can create one for patients, e.g., a master patient index.
  • This hash-code algorithm generates a fixed bit number, e.g. a 160-bit number, that uniquely identifies the patient.
  • Each healthcare provider can transmit anonymous patient data to the central server with just this hash-code identifier number.
  • Each provider can calculate this number. All providers will calculate the same number independently without sharing information beforehand.
  • The database on the server can link all data elements across all providers using this number, since all providers will generate the same number for a given patient.
  • The database on the server will not have any of the patient data from which the hash-code number was generated. Therefore one or more providers can anonymously send lab results, prescriptions, treatment protocols, etc., to one or more databases where the data is linked up, in a given database, with the patient and held anonymously.
  • Analyses can be performed on the patient's data in the database and can be resent to the healthcare provider via a Web page and still remain anonymous. This is accomplished as follows:
  • This Web page replaces a piece of HTML (a <DIV> section) by DHTML script code during the load process of the Web page with the patient's real name and other identifying information.
  • Example 2: In scenarios where one already has some large pools of anonymous data in this format, one can ask a patient if he is willing to give his ID elements; one can then generate the hashed value and access the data from this patient. For example, one gets all Rx information and offers the patient access to the database over the Web to get refills, etc.
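
The flow described in Examples 1 and 2 can be sketched in Python as follows. This is an illustration under stated assumptions, not the patent's implementation: the `|` separator used to join the insurance company code and the member number, the function and field names, and all sample values are hypothetical. Only the use of SHA-1 followed by base64 comes from the text.

```python
import base64
import hashlib

def patient_code(insurer_code: str, member_number: str) -> str:
    """SHA-1 over the combined identifier, then base64 for easy transport."""
    # The "|" separator is an assumption; the text only says the two
    # numbers are "taken together".
    raw = f"{insurer_code}|{member_number}".encode("utf-8")
    digest = hashlib.sha1(raw).digest()  # 20 bytes = 160 bits
    return base64.b64encode(digest).decode("ascii")

# Central server: anonymous data keyed by the code only (hypothetical rows).
server_db = [
    {"code": patient_code("INS01", "M-1001"), "item": "Rx: metformin"},
    {"code": patient_code("INS01", "M-2002"), "item": "Lab: HbA1c 6.1%"},
]

# Physician's local database: holds names for her own patients only.
local_db = {patient_code("INS01", "M-1001"): "Jane Doe"}

def display_label(code: str) -> str:
    """Show the patient's name if known locally, otherwise the bare code."""
    return local_db.get(code, code)

def find_items(insurer_code: str, member_number: str) -> list:
    """Example 2: recompute the code from the ID elements and search."""
    code = patient_code(insurer_code, member_number)
    return [row["item"] for row in server_db if row["code"] == code]

for row in server_db:
    print(display_label(row["code"]), "->", row["item"])
print(find_items("INS01", "M-2002"))
```

In this sketch the server never stores the insurer code, member number or name; the name appears only where a local database already holds it.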

Abstract

This invention relates to a method for identifying the source or origin of data in databases containing anonymized data. It is a particularly useful means for providing a patient or his or her medical provider with access to patient data over a network where the data is stored in an anonymized data file.

Description

Method for Anonymizing Data
Field of the Invention
This invention relates to a method for identifying the source or origin of data in databases containing anonymized data. It is a particularly useful means for providing a patient or his or her medical provider with access to patient data over a network where the data is stored in an anonymized data file.
Background of the Invention
Patient data can normally be stored in a centralised data file such as a central server only if it is adequately secured and anonymized. This normally leads to trusted-third-party environments or encryption of the data, or both.
While storing the data in a central archive and protecting it by encryption is one means of ensuring patient privacy, encryption of the data is not a viable solution if the data is to be used for any task on the server (e.g. running epidemiological analyses).
Trusted Third Party (TTP) service is a current way of anonymizing patient data. The data is sent to a TTP, which takes the data and replaces all patient identifiers with a new code. The TTP matches codes against the patients; it therefore knows all the codes and patients. This may make it a vulnerable target for hacking. The services are also complex and therefore expensive. In addition, all data has to be routed through the TTP, so it cannot be used or accessed directly.
This invention provides a method for anonymizing patient data without having to use a TTP. In addition, the instant algorithm can be used to anonymize any data source and the data associated with that source in a data base, patient medical data being but one example of the application of this invention.
Summary of the Invention
This invention comprises, in a networked computer system, a method for anonymizing data unique to a data owner in a data file by: identifying at least one unique alpha-numeric identifier for a data owner having owner-associated data in the data file; generating a hashcode using a computational device for said unique identifier; optionally encrypting the hashcode; linking the hashcode with owner's data; communicating the code and data to a database; and providing the data owner or authorized party with computational means for accessing and querying the database over a network; and on said computational means, a computational method for deciphering the hash-code and associating it with the owner and data transmitted to the computational means from the query made to the database.
Also, this invention represents a secure networked computational system for storing and accessing data by a data owner or an authorized party, the system comprising
a) a server capable of receiving and storing a hash-code, data associated with that hash-code and linking said hash-code with its associated data,
b) a client networked with or capable of linking up with the server and having on it
i) a means for generating a hash-code based on a unique identifier associated with the data owner,
ii) a means for communicating said hash-code and associated data to the server,
iii) a means for querying the data on the server for instances of a hash-code and data associated with it and receiving the results of said query, and
iv) a means for decoding the hash-code to re-identify the data owner and associated data.
In yet another aspect, this invention relates to a method for preventing theft and use by a third party of a data owner's data residing on a networked computer-readable medium, which method comprises generating a hashcode for an identifier unique to the data owner using a computational device, optionally encrypting the hashcode, linking the hashcode with owner's data; communicating the code and data to a networked database, providing the data owner or authorized party with computational means for accessing and querying the database over a network; and on said computational means, a computational method for deciphering the hash-code and associating it with the owner and data transmitted to the computational means from the query made to the database.
Detailed Description of the Invention
In its broadest iteration, this invention provides a means for anonymizing data without the need for a TTP or encryption technologies, particularly in the context of a client/server computational system illustrated by the likes of the Internet. The upshot is that one or many parties can generate data for a single data owner, and that data owner can search through one or more data files where that data resides, get direct access to his or her data, and be assured of privacy. This allows one to store data for many data owners in one or several data bases while permitting secure and private access to the data for each data owner, to the exclusion of the other owners with data in that data base. Hence the data can be made available in one or several data bases using a client/server configuration and running over a public network such as the Internet without loss of privacy and without the need to collect the data into one single data base. This method also provides a means for carrying out analyses of all of the data in the data base without compromising the privacy of any data owner, thus avoiding the problem created if encryption is used as the means for creating a secure link between data owner and data in a networked data file. For purposes of convenience, this invention is set forth hereinafter in terms of its application in the context of health care activities and patient privacy. What is set forth herein with regard to the anonymization of patient data applies equally well to any situation where an individual, company, or group owns or has rights to data which it or they wish to hold in anonymity but still have access to over non-restricted computational systems like the Internet.
The initial step, or action, is to identify or create a significant data owner identifier, like a social security number, and generate a hashcode from this data. Hashcodes are mathematical trap-door functions used by digital signatures. The hashcode calculates from the unique identifier a fixed-length "number". The number is always the same if the same input is given. The security of the hashcodes lies in the fact that there is no way to re-identify the original data from the hashcode value.
This approach can therefore be used to replace the patient identifiers in the data with the hashcode. The data can then be used directly. Since every party in the system will always generate the same hashcode for the same patient, data from one patient across several providers still has the same hashcode.
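The property relied on here, that independent parties derive the same fixed-length value from the same identifier, can be illustrated with a minimal Python sketch. It assumes SHA-1 (which the description names later as one example) and a made-up identifier string; the function name `hashcode` is hypothetical.

```python
import hashlib

def hashcode(identifier: str) -> str:
    """Return the SHA-1 digest of an identifier as a hex string (assumed algorithm)."""
    return hashlib.sha1(identifier.encode("utf-8")).hexdigest()

# Two providers hash the same (made-up) identifier independently.
provider_a = hashcode("1234567.987654321")
provider_b = hashcode("1234567.987654321")
other_patient = hashcode("1234567.987654322")

print(provider_a == provider_b)     # same input gives the same hashcode
print(provider_a == other_patient)  # a different input gives a different hashcode
print(len(provider_a))              # 40 hex characters, i.e. 160 bits
```

No information is exchanged between the two providers in this sketch; agreement follows only from using the same algorithm on the same input.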
If data in a given database needs to be re-identified as to which data belongs to a given patient, then the hashcode is calculated for the patient again and all databases are searched for that hashcode. Data carrying it therefore belongs to the patient and can be accessed only by the patient or someone acting with the authorisation of the patient. This system can also be used to authorise activities. For example, it can be used to permit a patient to generate a refill for a prescription by accessing an electronic form residing in the server's database or linked to it. This form can be authenticated with regard to the medication(s) prescribed by the doctor, which will already be in the database, the time to refill, and the identity of the requestor, for example. This is but one additional example of how this invention could be used in the context of providing healthcare over a networked computer system while maintaining patient confidentiality.
Glossary of terms:
The following terms and definitions are used hereinafter in describing this invention. These definitions are provided for clarity and certainty in defining the invention at the time it was created. While believed to be accurate at the time this invention was made, nomenclature and usage may change with time. These definitions are to be read as representative of their usage in the art at the time the invention was made. Most of these definitions were obtained from two sources on the World Wide Web: the PC Webopaedia by internet.com Corp., copyright 1999, which had a URL of http://webopedia.internet.com/ at the time these definitions were obtained, and RSA Corporation's FAQ pages at http://www.rsa.com.
HTML: Short for HyperText Markup Language, the authoring language used to create documents on the World Wide Web. HTML is similar to SGML (http://webopedia.internet.com/TERM/S/SGML.html) and XML (http://www.oasis-open.org/cover/xmlIntro.html), although it is not a strict subset.
DHTML: Refers to Web content that changes each time it is viewed. For example, the same URL could result in a different page depending on any number of parameters, such as:
1. Geographic location of the reader
2. [Item present only as an image (imgf000005_0001) in the source]
3. Previous pages viewed by the reader
4. Profile of the reader
There are many technologies for producing dynamic HTML, including CGI scripts (http://webopedia.internet.com/TERM/C/CGI.html), Server-Side Includes (SSI) (http://webopedia.internet.com/TERM/S/SSI.html), cookies (http://webopedia.internet.com/TERM/c/cookie.html), Java (http://webopedia.internet.com/TERM/J/Java.html), JavaScript (http://webopedia.internet.com/TERM/J/JavaScript.html), and ActiveX (http://webopedia.internet.com/TERM/A/ActiveX.html).
When capitalized, Dynamic HTML refers to new HTML extensions that will enable a Web page to react to user input without sending requests to the Web server. Microsoft and Netscape have submitted competing Dynamic HTML proposals to W3C, which is producing the final specification. W3C is short for World Wide Web Consortium, an international consortium of companies involved with the Internet and the Web. The W3C was founded in 1994 by Tim Berners-Lee, the original architect of the World Wide Web. The organization's purpose is to develop open standards so that the Web evolves in a single direction rather than being splintered among competing factions. The W3C is the chief standards body for HTTP (hypertext transfer protocol) and HTML.
Smartcard: A small electronic device about the size of a credit card that contains electronic memory, and possibly an embedded integrated circuit (IC). Smart cards containing an IC are sometimes called Integrated Circuit Cards (ICCs). Smart cards are used for a variety of purposes, including:
1. Storing a patient's medical records
2. Storing digital cash
3. Generating network IDs (similar to a token)
COM Object: A model for binary code developed by Microsoft. The Component Object Model (COM) enables programmers to develop objects that can be accessed by any COM-compliant application. An object is, generally, any item that can be individually selected and manipulated. This can include shapes and pictures that appear on a display screen as well as less tangible software entities. In object-oriented programming, for example, an object is a self-contained entity that consists of both data and procedures to manipulate the data. Both OLE (object linking and embedding) and ActiveX are based on COM. COM specifications can be found at http://www.microsoft.com/com/fnf.asp.
hashing: Producing hash values for accessing data or for security. A hash value (or simply hash) is a number generated from a string of text. The hash is substantially smaller than the text itself, and is generated by a formula in such a way that it is extremely unlikely that some other text will produce the same hash value. Hashes play a role in security systems where they are used to ensure that transmitted messages have not been tampered with. The sender generates a hash of the message, encrypts it, and sends it with the message itself. The recipient then decrypts both the message and the hash, produces another hash from the received message, and compares the two hashes. If they are the same, there is a very high probability that the message was transmitted intact.
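The comparison step of this integrity check can be sketched in Python as follows. The sketch assumes SHA-1 as the hash and omits the encryption and decryption of the message and hash, which the definition above also calls for; the function names are hypothetical.

```python
import hashlib

def digest(message: bytes) -> str:
    """Hash of a message as a hex string (SHA-1 assumed for illustration)."""
    return hashlib.sha1(message).hexdigest()

def is_intact(received_message: bytes, sent_digest: str) -> bool:
    """Recipient side: re-hash the received message and compare the two hashes."""
    return digest(received_message) == sent_digest

sent = b"refill request for prescription 42"
sent_digest = digest(sent)              # computed by the sender

print(is_intact(sent, sent_digest))                                    # unmodified message
print(is_intact(b"refill request for prescription 43", sent_digest))  # altered message
```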
Hashing is also a method of accessing data records. Take for example a list of names:
• John Smith
• Sarah Jones
• Roger Adams
To create an index, called a hash table, for these records, you would apply a formula to each name to produce a unique numeric value. So you might get something like:
• 1345873 John Smith
• 3097905 Sarah Jones
• 4060964 Roger Adams
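A minimal Python sketch of such an index follows. The formula used here (the SHA-1 digest read as an integer) is an assumption chosen for illustration, so the keys differ from the sample values listed above; `index_key` is a hypothetical name.

```python
import hashlib

def index_key(name: str) -> int:
    """Apply a formula to a name to get a numeric key (SHA-1 digest as an integer)."""
    return int.from_bytes(hashlib.sha1(name.encode("utf-8")).digest(), "big")

records = ["John Smith", "Sarah Jones", "Roger Adams"]

# Build the index: numeric key -> record.
hash_table = {index_key(name): name for name in records}

# Looking up a record re-applies the formula instead of scanning the list.
found = hash_table.get(index_key("Sarah Jones"))
print(found)
```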
Then to search for the record containing Sarah Jones, you just need to reapply the formula, which directly yields the index key to the record.
hash function: A hash function H is a transformation that takes an input m and returns a fixed-size string, which is called the hash value h (that is, h = H(m)). Hash functions with just this property have a variety of general computational uses, but when employed in cryptography, the hash functions are usually chosen to have some additional properties.
The basic requirements for a cryptographic hash function are:
• the input can be of any length,
• the output has a fixed length,
• H(x) is relatively easy to compute for any given x,
• H(x) is one-way,
• H(x) is collision-free.
A hash function H is said to be one-way if it is hard to invert, where "hard to invert" means that given a hash value h, it is computationally infeasible to find some input x such that H(x) = h.
If, given a message x, it is computationally infeasible to find a message y not equal to x such that H(x) = H(y), then H is said to be a weakly collision-free hash function. A strongly collision-free hash function H is one for which it is computationally infeasible to find any two messages x and y such that H(x) = H(y). For more information and a particularly thorough study of hash functions, see B. Preneel. Analysis and Design of Cryptographic Hash Functions. Ph.D. Thesis, Katholieke Universiteit Leuven, 1993. The hash value represents concisely the longer message or document from which it was computed; this value is called the message digest. One can think of a message digest as a "digital fingerprint" of the larger document. Examples of well-known hash functions are MD2 and MD5.
Perhaps the main role of a cryptographic hash function is in the provision of message integrity checks and digital signatures. Since hash functions are generally faster than encryption or digital signature algorithms, it is typical to compute the digital signature or integrity check for a document by applying cryptographic processing to the document's hash value, which is small compared to the document itself. Additionally, a digest can be made public without revealing the contents of the document from which it is derived. This is important in digital timestamping where, using hash functions, one can get a document timestamped without revealing its contents to the timestamping service.
Damgard and Merkle greatly influenced cryptographic hash function design by defining a hash function in terms of what is called a compression function. A compression function takes a fixed-length input and returns a shorter, fixed-length output. Given a compression function, a hash function can be defined by repeated applications of the compression function until the entire message has been processed. In this process, a message of arbitrary length is broken into blocks whose length depends on the compression function, and "padded" (for security reasons) so the size of the message is a multiple of the block size. The blocks are then processed sequentially, taking as input the result of the hash so far and the current message block, with the final output being the hash value for the message. The best review of hash function techniques is provided by Preneel (B. Preneel. Analysis and Design of Cryptographic Hash Functions. Ph.D. Thesis, Katholieke Universiteit Leuven, 1993). For a brief overview it will be noted that hash functions are often divided into three classes:
• those built around block ciphers,
• those which use modular arithmetic, and
• those which have what is termed a "dedicated" design.
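The iterative construction attributed above to Damgard and Merkle (pad, split into blocks, feed each block and the running result into a compression function) can be sketched in Python. This is a toy under stated assumptions, not a secure hash: the block and state sizes are arbitrary, and the compression function is a stand-in built from a truncated SHA-1 call purely so the sketch runs.

```python
import hashlib

BLOCK_SIZE = 8   # bytes per message block (toy value)
STATE_SIZE = 4   # bytes of running state and of the final hash (toy value)

def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function: 12 bytes in, 4 bytes out (not secure)."""
    return hashlib.sha1(state + block).digest()[:STATE_SIZE]

def pad(message: bytes) -> bytes:
    """Append 0x80, zero bytes up to a block boundary, then the 8-byte message length."""
    padded = message + b"\x80"
    padded += b"\x00" * (-len(padded) % BLOCK_SIZE)
    return padded + len(message).to_bytes(8, "big")

def toy_hash(message: bytes) -> bytes:
    """Process the padded message block by block, chaining the state."""
    state = b"\x00" * STATE_SIZE                      # fixed initial value
    padded = pad(message)
    for i in range(0, len(padded), BLOCK_SIZE):
        state = compress(state, padded[i:i + BLOCK_SIZE])
    return state                                      # final state is the hash value

print(toy_hash(b"patient record").hex())
```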
By building a hash function around a block cipher, a designer aims to leverage the security of a well-trusted block cipher such as DES to obtain a well-trusted hash function. The so-called Davies-Meyer hash function is an example of a hash function built around the use of DES.
The purpose of employing modular arithmetic in the second class of hash functions is to save on implementation costs. A hash function is generally used in conjunction with a digital signature algorithm which itself makes use of modular arithmetic. Unfortunately, the track record of such hash functions is not good from a security perspective and there are no hash functions in this second class that can be recommended for use today.
The hash functions in the third class, with their so-called "dedicated" design, tend to be fast, achieving a considerable advantage over algorithms that are based around the use of a block cipher. MD4 is an early example of a popular hash function with such a design. Although MD4 is no longer considered secure for most cryptographic applications, most new dedicated hash functions make use of the same design principles as MD4 in a strengthened version. Their strength varies depending on the techniques, or combinations of techniques, employed in their design. Dedicated hash functions in current use include MD5 and SHA-1 as well as RIPE-MD (H. Dobbertin, A. Bosselaers, and B. Preneel. RIPEMD-160: A strengthened version of RIPEMD. In Proceedings of 3rd International Workshop on Fast Software Encryption, pages 71-82, Springer-Verlag, 1996) and HAVAL (Y. Zheng, J. Pieprzyk and J. Seberry. HAVAL - a one-way hashing algorithm with variable length output. In Advances in Cryptology Auscrypt '92, pages 83-104, Springer-Verlag, 1993).
SHA and SHA-1 stand for the Secure Hash Algorithm. It is the algorithm specified in the Secure Hash Standard (SHS, FIPS PUB 180). It was developed by the National Institute of Standards and Technology, a division of the U.S. Department of Commerce. SHA-1 (National Institute of Standards and Technology (NIST). Announcement of Weakness in the Secure Hash Standard. May 1994) is a revision to SHA that was published in 1994; the revision corrected an unpublished flaw in SHA. The design of SHA-1 is very similar to the MD4 family of hash functions developed by Rivest. SHA-1 is also described in the ANSI X9.30 (part 2) standard. The algorithm takes a message of less than 2^64 bits in length and produces a 160-bit message digest. The algorithm is slightly slower than MD5, but the larger message digest makes it more secure against brute-force collision and inversion attacks. SHA is part of the Capstone project.
MD2, MD4 and MD5 are message-digest algorithms developed by Rivest (http://www.rsa.com/rsalabs/faq/html/3-6-6.html). They are meant for digital signature applications where a large message has to be "compressed" in a secure manner before being signed with the private key. All three algorithms take a message of arbitrary length and produce a 128-bit message digest. While the structures of these algorithms are somewhat similar, the design of MD2 is quite different from that of MD4 and MD5. MD2 was optimized for 8-bit machines, whereas MD4 and MD5 were aimed at 32-bit machines. Description and source code for the three algorithms can be found as Internet RFCs 1319 - 1321 (http://www.rsa.com/rsalabs/faq/html/references.html - Kal92), (http://www.rsa.com/rsalabs/faq/html/references.html - Riv92b), and (http://www.rsa.com/rsalabs/faq/html/references.html - Riv92c).
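The digest lengths quoted in this glossary (128 bits for the MD family, 160 bits for SHA-1) can be checked with Python's standard `hashlib` module. The sketch covers only MD5 and SHA-1, since MD2 and MD4 are not guaranteed to be available in `hashlib`; `digest_bits` is a hypothetical helper name.

```python
import hashlib

def digest_bits(algorithm: str, message: bytes = b"example message") -> int:
    """Return the digest length in bits for a hashlib algorithm name."""
    return len(hashlib.new(algorithm, message).digest()) * 8

for name in ("md5", "sha1"):
    print(name, digest_bits(name))   # md5 128, sha1 160
```

The digest length does not depend on the message length, which is the fixed-output property listed among the basic requirements.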
MD2 was developed by Rivest in 1989. The message is first padded so its length in bytes is divisible by 16. A 16-byte checksum is then appended to the message, and the hash value is computed on the resulting message. Rogier and Chauvaud have found that collisions for MD2 can be constructed if the calculation of the checksum is omitted (http://www.rsa.com/rsalabs/faq/html/references.html - RC95). This is the only cryptanalytic result known for MD2. MD4 was developed by Rivest in 1990. The message is padded to ensure that its length in bits plus 64 is divisible by 512 (that is, the padded length is congruent to 448 modulo 512). A 64-bit binary representation of the original length of the message is then concatenated to the message. The message is processed in 512-bit blocks in the Damgard/Merkle iterative structure, and each block is processed in three distinct rounds. Attacks on versions of MD4 with either the first or the last rounds missing were developed very quickly by Den Boer, Bosselaers (B. den Boer and A. Bosselaers. An attack on the last two rounds of MD4. In Advances in Cryptology Crypto '91, pages 194-203, Springer-Verlag, 1992) and others. Dobbertin (H. Dobbertin. Alf Swindles Ann. CryptoBytes, 1(3): 5, 1995) has shown how collisions for the full version of MD4 can be found in under a minute on a typical PC. In recent work, Dobbertin (Fast Software Encryption, 1998) has shown that a reduced version of MD4, in which the third round of the compression function is not executed but everything else remains the same, is not one-way. Clearly, MD4 should now be considered broken.
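The MD4 padding arithmetic can be made concrete with a short Python sketch. It assumes the rule as given in RFC 1320: at least one padding bit is always added, padding continues until the length is congruent to 448 modulo 512, and a 64-bit length field follows. The function name is hypothetical.

```python
def md4_padded_bit_length(message_bits: int) -> int:
    """Total length in bits after padding and appending the 64-bit length field."""
    # Between 1 and 512 padding bits bring the length to 448 modulo 512.
    pad_bits = (448 - message_bits - 1) % 512 + 1
    return message_bits + pad_bits + 64

for bits in (0, 440, 448, 1000):
    print(bits, md4_padded_bit_length(bits))
```

The result is always a whole number of 512-bit blocks, which is what allows the block-by-block processing described above.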
MD5 was developed by Rivest in 1991. It is basically MD4 with "safety-belts", and while it is slightly slower than MD4, it is more secure. The algorithm consists of four distinct rounds, which have a slightly different design from that of MD4. Message-digest size, as well as padding requirements, remain the same. Den Boer and Bosselaers have found pseudo-collisions for MD5. More recent work by Dobbertin has extended the techniques used so effectively in the analysis of MD4 to find collisions for the compression function of MD5. While stopping short of providing collisions for the hash function in its entirety, this is clearly a significant step. For a comparison of these different techniques and their impact the reader is referred to M.J.B. Robshaw. On Recent Results for MD2, MD4 and MD5. RSA Laboratories Bulletin No. 4, November 12, 1996.
Van Oorschot and Wiener (P. van Oorschot and M. Wiener. Parallel collision search with application to hash functions and discrete logarithms. In Proceedings of 2nd ACM Conference on Computer and Communication Security, 1994) have considered a brute-force search for collisions in hash functions, and they estimate a collision search machine designed specifically for MD5 (costing $10 million in 1994) could find a collision for MD5 in 24 days on average. The general techniques can be applied to other hash functions. More details on MD2, MD4, and MD5 can be found in B. Preneel's 1993 paper (http://www.rsa.com/rsalabs/faq/html/references.html - Pre93) and M.J.B. Robshaw's 1995 article referenced at (http://www.rsa.com/rsalabs/faq/html/references.html#Rob95c).
MIME: Short for Multipurpose Internet Mail Extensions, a specification for formatting non-ASCII messages so that they can be sent over the Internet. Many e-mail clients now support MIME, which enables them to send and receive graphics, audio, and video files via the Internet mail system. In addition, MIME supports messages in character sets other than ASCII. There are many predefined MIME types, such as GIF graphics files and PostScript files. It is also possible to define your own MIME types.
In addition to e-mail applications, Web browsers also support various MIME types. This enables the browser to display or output files that are not in HTML format.
MIME was defined in 1992 by the Internet Engineering Task Force (IETF) (http://webopedia.internet.com/TERM/I/IETF.html). A new version, called S/MIME (Secure MIME), supports encrypted messages.
Unique Identity Data for a Data Owner/Patient
The hashcode is generated from significant information. In Germany the Insurance Company Code Number and the patient's member number in that company are taken together to build a string that joins both numbers, separated by a dot character. This identifier is then unique for any given patient. Both items can be found on the German KVK Smartcard and are therefore always available. The patient also knows these numbers and can therefore give them to any party.
The value in this approach is this: if a hacker computed the hashes for all possible values (e.g. for every digit permutation of the full item string), he could reproduce any hashcode, but this would not help him to identify the patient, since he would still need access to a system that would allow him to resolve the patient's identity from these numbers. Member ID and Insurance ID are both numbers with up to 9 digits, so a hacker has to calculate on the order of 10^18 hashcodes to recover the digits from which a hashcode was built.
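The size of this search space follows from simple arithmetic, sketched below in Python. The sketch treats both numbers as 9-digit fields; the hashing rate used for the time estimate is a purely hypothetical figure chosen for illustration and does not come from the source.

```python
DIGITS_PER_NUMBER = 9     # Member ID and Insurance ID: up to 9 digits each
NUMBER_COUNT = 2

# Every combination of two 9-digit fields: 10^9 * 10^9 candidates.
search_space = 10 ** (DIGITS_PER_NUMBER * NUMBER_COUNT)

ASSUMED_HASHES_PER_SECOND = 1_000_000   # hypothetical attacker speed
SECONDS_PER_YEAR = 365 * 24 * 3600

years_to_enumerate = search_space / ASSUMED_HASHES_PER_SECOND / SECONDS_PER_YEAR

print(search_space)                 # 1000000000000000000
print(round(years_to_enumerate))    # years needed at the assumed rate
```

A faster attacker shortens the estimate proportionally, which is why the description goes on to offer encryption of the hashcode as an additional safeguard.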
If this possibility is perceived as a potential problem, the hashcode itself could be encrypted with, for example, the public key of the server if sent to the server, and with the public key of the user if used by a user. For a fuller explanation of encryption technology see Bruce Schneier, "Applied Cryptography", ISBN 0-471-11709-9, John Wiley & Sons. A hacker could then not break the encryption and would therefore not even be able to get to the hashcode. But the system would still have the capability to track patients on the server (since each encrypted hash would still be the same binary blob, because everybody would encrypt with the public key of the server), and the server can re-identify a patient for a user (since the server is the only one which can decrypt with its private key and then encrypt again for a given user, who would then be the only person who can decrypt the result).
In other healthcare systems any other such unique information can be used. Which information items are used for generating this identifier, and how long this information is, poses no problem for the invention. Anything unique can be chosen, e.g. SSNs, Master Patient Index ID numbers, names, etc. The information just has to uniquely describe any given patient, and each provider (or user of the system) has to be able to have this information when he wants to encode or decode information for a patient.
Hash Encoding the Data
The unique identifier string is now taken and the hashcode is calculated. Any hash algorithm can be used, provided it ensures proper functionality and robustness. One example is SHA-1. The hash algorithm transforms the string, which can be of any length and can contain any characters, numbers or other binary values. The transformed value output by the hash algorithm is always of a fixed length. The hashing algorithm takes the data and always generates a fixed-length bit stream uniquely representing the data. Even a one-bit change in the data will generate a different hash. With a good algorithm it is computationally infeasible to find two different pieces of data that produce the same hash value. A bad hashing algorithm would be the sum of all digits of a given number, like 44: 4+4 = 8. This is bad because it generates duplicates (53 and 62 also give 8, for example). But a good algorithm like SHA-1 fulfils the need for uniqueness of the outcome.
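The encoding described in this section (build the identifier string, hash it with SHA-1, then base64-encode the result for transport) can be sketched in Python. The function name, the ASCII encoding of the string and the sample numbers are assumptions for illustration; the source specifies only the dot-separated string, SHA-1 and base64.

```python
import base64
import hashlib

def patient_hashcode(insurance_code: str, member_number: str) -> str:
    """Join both numbers with a dot, hash with SHA-1, and base64-encode the digest."""
    identifier = f"{insurance_code}.{member_number}"
    digest = hashlib.sha1(identifier.encode("ascii")).digest()   # 20 bytes (160 bits)
    return base64.b64encode(digest).decode("ascii")              # 28 printable characters

code = patient_hashcode("1234567", "987654321")   # made-up numbers
print(code)
print(len(code))
```

Because the 160-bit digest is 20 bytes long, the base64 form is always 28 characters, which keeps the identifier a fixed-width text field in any database.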
This output value of the SHA-1 operation is then encoded via the base64 algorithm so that it contains only normal characters and is therefore easier to transport and handle.
Identification Processes
Patient data can then be stored in a database where this hash-coded value now identifies the patient. All other patient identifier data can be deleted.
Each provider can send anonymous patient data to a central server with this hash-coded value. Each patient item therefore remains identifiable, and all the data about one patient can be merged on the server, since each item from each provider still carries the same hash-coded value. This is the preferred approach. This technique makes it possible to have patient data residing on more than one server and accessible to the patient or her authorised agent via a networked client, i.e., a PC. Anybody who has to re-identify data for a given patient a) has to know what kind of unique identifier string elements have been used, and b) has to obtain these information items from the patient. For example, in Germany one could, and probably would, use the Insurance and Member Codes, and one would either need to have access to the Smartcard or the patient would need to give the medical provider this information. This information is also available on medical claim forms, prescriptions, Smartcards, etc.
One can now re-calculate the hashcode for these items and search the database(s) for all items equalling this value. One will find information about that given patient, and only that patient, and will not be able to view or access data of another patient via that hashcode. This search can be carried out using a PC which has software capable of interacting with the database engine where the data is stored. One example is a client/server configuration where the database engine and the data reside on the server and the client, for example a PC, has access to the server over a public or limited-access network. For example, a server running under UNIX or Microsoft's NT could have on it a database such as Oracle's database engine or be configured as a Web site. The server would be accessible by PC or Macintosh (the client) over a virtual private network or an intranet, extranet or the Internet. The client software could be a proprietary software package, or one could use an Internet browser such as Netscape or Microsoft's Internet Explorer if the server was set up as a Web site. This invention could also be used in the context of a mainframe operation where access is through terminal emulation. It can also be used in a website where the physician views analytics, e.g. risks in a patient population. The database holds anonymous information about all patients. If she selects any patient in the analysis, the system queries her local databases and tries to find the hashcode value of that patient in the local database. If it is found, this physician knows the patient, and the site can retrieve the patient's name etc. from the local database and display this instead of the hash value. The physician would therefore see which patient this is and can act.
Example:
[Figures imgf000012_0001 and imgf000013_0001: example shown only as images in the source]
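The re-calculation and search described under Identification Processes can be sketched in Python. The in-memory lists stand in for real databases, and the identifiers, field names and function names are hypothetical; SHA-1 is assumed as the hash.

```python
import hashlib

def hashcode(identifier: str) -> str:
    """SHA-1 hex digest of the identifier string (assumed algorithm)."""
    return hashlib.sha1(identifier.encode("utf-8")).hexdigest()

def find_patient_data(identifier: str, databases: list) -> list:
    """Recompute the hashcode and collect matching rows from every database."""
    code = hashcode(identifier)
    return [row for db in databases for row in db if row["hashcode"] == code]

# Two stand-in databases holding anonymous rows for made-up patients.
lab_db = [{"hashcode": hashcode("111.222"), "item": "lab result"},
          {"hashcode": hashcode("333.444"), "item": "lab result"}]
rx_db = [{"hashcode": hashcode("111.222"), "item": "prescription"}]

matches = find_patient_data("111.222", [lab_db, rx_db])
print([row["item"] for row in matches])   # ['lab result', 'prescription']
```

Only rows carrying the recomputed hashcode are returned, so the search reveals nothing about the second patient in the lab database.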
Related Standards
Base64 Encoding: RFC 2045, Multipurpose Internet Mail Extensions (MIME), Part One: Format of Internet Message Bodies
SHA-1 Hash Encoding Standard: http://csrc.nist.gov/fips/fip180-1.txt and http://www.rsa.com/rsalabs/faq/
Example 1
A server is set up which separates patient data and medical patient data but links both data sources on the server through a unique ID generated on the server. Both databases have different access/user rights and are only accessible through a COM-object layer that controls access to the database.
The patient database contains a unique patient ID that is generated from unique information associated solely with that patient. In Germany the Insurance Company Code Number and the patient's member number in the company are taken together and are encoded into a fixed binary number by applying a hash-code algorithm (SHA-1) to the data. A base64 algorithm is then applied on top of that. If a unique identifier is not readily available, one can create one for patients, e.g., a master patient index.
This hash-code algorithm generates a fixed-length number, e.g. a 160-bit number, that uniquely identifies the patient. Each healthcare provider can transmit anonymous patient data to the central server with just this hash-code identifier number. Each provider can calculate this number. All providers will calculate the same number independently, without sharing information beforehand. The database on the server can link all data elements across all providers using this number, since all providers will generate the same number for a given patient. The database on the server will not have any of the patient data from which the hash-code number was generated. Therefore one or more providers can anonymously send lab results, prescriptions, treatment protocols, etc., to one or more databases where the data is linked up, in a given database, with the patient and held anonymously.
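The server-side linking described here amounts to grouping incoming items by their hash-code identifier. A minimal Python sketch follows; the tuple layout, provider names and placeholder hashcodes are hypothetical, and a real server would use a database rather than an in-memory dictionary.

```python
from collections import defaultdict

def link_by_hashcode(submissions):
    """Group anonymous submissions from all providers under their hash-code."""
    merged = defaultdict(list)
    for hashcode, provider, item in submissions:
        merged[hashcode].append((provider, item))
    return dict(merged)

# Each provider sends only the hash-code and the medical item, never a name.
submissions = [
    ("hashA", "lab", "blood panel"),
    ("hashB", "pharmacy", "prescription"),
    ("hashA", "clinic", "treatment protocol"),
]

linked = link_by_hashcode(submissions)
print(linked["hashA"])   # [('lab', 'blood panel'), ('clinic', 'treatment protocol')]
```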
In addition, analyses can be performed on the patient's data in the database and can be sent back to the healthcare provider via a Web page and still remain anonymous. This is accomplished as follows.
On the client, DHTML script code replaces a piece of HTML in this Web page (a <DIV> section) with the patient's real name and other identifying information during the load process of the Web page. This can be done since the client's Web page (browser) queries a file on the client PC or Macintosh (XXX IDT/UMOD object) within the script for the ID (hash-code number) that has also been stored in the local database. If the script finds the ID, it replaces the DIV section with the actual patient name. If not, it leaves a marker stating that the patient cannot be re-identified. Therefore the user sees all her patients and their data without any additional steps.
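The lookup logic of this client-side step can be sketched in Python (the source describes it as DHTML script code in the browser; Python is used here only to show the logic). The local database is modelled as a dictionary, and the function name, marker text and sample entries are hypothetical.

```python
def reidentify(hashcodes, local_patients,
               marker="[patient cannot be re-identified]"):
    """Replace each hash-code with the locally known name, or with a marker."""
    return [local_patients.get(code, marker) for code in hashcodes]

# Local database on the client: hash-code -> patient name (made-up entries).
local_patients = {"hashA": "Erika Mustermann"}

# Hash-codes as they arrive from the server in the anonymous analysis.
page_entries = reidentify(["hashA", "hashB"], local_patients)
print(page_entries)
```

Names never leave the client in this arrangement: the server supplies only hash-codes, and the substitution happens against data the user already holds.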
Example 2
In scenarios where one already has some large pools of anonymous data in this format, one can ask a patient if he is willing to give his ID elements; one can then generate the hashed value and access the data of this patient. For example, one obtains all Rx information and offers the patient access to the database over the Web to get refills etc.
For example, one can use an e-forms document to see how such a service is useful for exchanging electronic prescriptions and how to later offer per-patient services from the back-end data.

Claims

What is claimed is:
1. In a networked computer system, a method for anonymizing data unique to a data owner in a data file by: identifying at least one unique alpha-numeric identifier for a data owner having owner-associated data in the data file; generating a hashcode using a computational device for said unique identifier; optionally encrypting the hashcode; linking the hashcode with the owner's data; communicating the code and data to a database; and providing the data owner or authorized party with computational means for accessing and querying the database over a network; and on said computational means, a computational method for deciphering the hash-code and associating it with the owner and data transmitted to the computational means from the query made to the database.
2. A secure networked computational system for storing and accessing data solely by a data owner or an authorized party, the system comprising: a) a server capable of receiving and storing a hash-code, data associated with that hash-code and linking said hash-code with its associated data; b) a client networked with or capable of linking up with the server and having on it: i) a means for generating a hash-code based on a unique identifier associated with the data owner; ii) a means for communicating said hash-code and associated data to the server; iii) a means for querying the data on the server for instances of a hash-code and data associated with it and receiving the results of said query; and iv) a means for decoding the hash-code to re-identify the data owner and associated data.
3. A method for preventing theft and use by a third party of a data owner's data residing on a networked computer-readable medium, which method comprises: generating a hashcode for an identifier unique to the data owner using a computational device; optionally encrypting the hashcode; linking the hashcode with the owner's data; communicating the code and data to a networked database; providing the data owner or authorized party with computational means for accessing and querying the database over a network; and on said computational means, a computational method for deciphering the hash-code and associating it with the owner and data transmitted to the computational means from the query made to the database.
PCT/EP2000/008301 1999-09-02 2000-08-24 Method for anonymizing data WO2001018631A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB9920644.3 1999-09-02
GBGB9920644.3A GB9920644D0 (en) 1999-09-02 1999-09-02 Novel method

Publications (1)

Publication Number Publication Date
WO2001018631A1 (en) 2001-03-15

Family

ID=10860150

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2000/008301 WO2001018631A1 (en) 1999-09-02 2000-08-24 Method for anonymizing data

Country Status (2)

Country Link
GB (1) GB9920644D0 (en)
WO (1) WO2001018631A1 (en)

Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE10126138A1 (en) * 2001-05-29 2002-12-12 Siemens Ag Tamper-proof and censorship-resistant personal electronic health record
WO2003093956A1 (en) * 2002-04-29 2003-11-13 Mediweb Oy Storing sensitive information
EP1394680A1 (en) * 2002-08-29 2004-03-03 Mobile Management GmbH Procedure for providing data
WO2004031922A2 (en) * 2002-10-03 2004-04-15 Avoca Systems Limited Method and apparatus for secure data storage
WO2004084050A1 (en) * 2003-03-21 2004-09-30 Koninklijke Philips Electronics N.V. User identity privacy in authorization certificates
WO2004090697A1 (en) 2003-04-11 2004-10-21 Jouko Kronholm A method in data transmission, a data transmission system, and a device
WO2004068820A3 (en) * 2003-01-23 2004-11-11 Unspam Llc Method and apparatus for a non-revealing do-not-contact list system
WO2005022414A2 (en) * 2003-08-22 2005-03-10 Oracle International Corporation Method and apparatus for protecting private information within a database
FR2881248A1 (en) * 2005-01-26 2006-07-28 France Telecom Personal medical data management system for insured patient, has computing subsystem with units to generate common key from identification data of person, and another subsystem with database associating sensitive personal data to key
WO2006111205A1 (en) * 2005-04-22 2006-10-26 Daon Holdings Limited A system and method for protecting the privacy and security of stored biometric data
WO2007110035A1 (en) * 2006-03-17 2007-10-04 Deutsche Telekom Ag Method and device for the pseudonymization of digital data
WO2008027247A2 (en) * 2006-08-28 2008-03-06 National Biometric Security Project A method and system for authenticating and validating identities based on multi-modal biometric templates and special codes in a substantially anonymous process
WO2008034841A2 (en) * 2006-09-20 2008-03-27 SIEMENS AKTIENGESELLSCHAFT ÖSTERREICH Method for controlling access and access control system for digital contents
EP1939785A2 (en) * 2006-12-18 2008-07-02 Surveillance Data, Inc. System and method for the protection of de-identification of health care data
US7522751B2 (en) 2005-04-22 2009-04-21 Daon Holdings Limited System and method for protecting the privacy and security of stored biometric data
EP1763834A4 (en) * 2004-05-05 2009-08-26 Ims Software Services Ltd Mediated data encryption for longitudinal patient level databases
EP2098976A1 (en) * 2007-12-04 2009-09-09 Orbis Patents Limited Secure method and system for the upload of data
US20100204973A1 (en) * 2009-01-15 2010-08-12 Nodality, Inc., A Delaware Corporation Methods For Diagnosis, Prognosis And Treatment
DE102009016419A1 (en) * 2009-04-04 2010-10-07 Az Direct Gmbh Method for storage of data sets, involves converting identification data into record label, where record label with confidential data is transmitted to trust office
US7865376B2 (en) 1999-09-20 2011-01-04 Sdi Health Llc System and method for generating de-identified health care data
US7925704B2 (en) 2004-04-29 2011-04-12 Unspam, Llc Method and system for a reliable distributed category-specific do-not-contact list
EP2426617A1 (en) * 2010-09-03 2012-03-07 Wolfgang Hüffer Method for anonymous compiling of confidential data and accompanying identification data
WO2013064730A1 (en) * 2011-10-31 2013-05-10 Nokia Corporation Method and apparatus for providing authentication using hashed personally identifiable information
DE102012202701A1 (en) * 2012-02-22 2013-08-22 Siemens Aktiengesellschaft Method for processing patient-related data records
EP2752821A2 (en) 2013-01-02 2014-07-09 Albert Kuiper Enhancement of enforcing road user charging
US8930404B2 (en) 1999-09-20 2015-01-06 Ims Health Incorporated System and method for analyzing de-identified health care data
EP2843585A1 (en) * 2013-09-03 2015-03-04 Kabel Deutschland Vertrieb und Service GmbH Method and system for providing anonymised data from a database
JP2015526757A (en) * 2012-06-29 2015-09-10 ペンタ・セキュリティ・システムズ・インコーポレーテッド Generation and verification of alternative data with a specific format
US9141758B2 (en) 2009-02-20 2015-09-22 Ims Health Incorporated System and method for encrypting provider identifiers on medical service claim transactions
WO2017141065A1 (en) * 2016-02-18 2017-08-24 MAGYAR, Gábor Data management method and registration method for an anonymous data sharing system, as well as data manager and anonymous data sharing system
WO2017161464A1 (en) 2016-03-21 2017-09-28 Thomas Krech Software having control logic for secure transmission of personal data via the internet from computers to the server, with secure storage of the data on servers
WO2018009979A1 (en) * 2016-07-15 2018-01-18 E-Nome Pty Ltd A computer implemented method for secure management of data generated in an ehr during an episode of care and a system therefor
US11688015B2 (en) 2009-07-01 2023-06-27 Vigilytics LLC Using de-identified healthcare data to evaluate post-healthcare facility encounter treatment outcomes

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5606610A (en) * 1993-11-30 1997-02-25 Anonymity Protection In Sweden Ab Apparatus and method for storing data
EP0884670A1 (en) * 1997-06-14 1998-12-16 International Computers Limited Secure database

Cited By (60)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8930404B2 (en) 1999-09-20 2015-01-06 Ims Health Incorporated System and method for analyzing de-identified health care data
US7865376B2 (en) 1999-09-20 2011-01-04 Sdi Health Llc System and method for generating de-identified health care data
US9886558B2 (en) 1999-09-20 2018-02-06 Quintiles Ims Incorporated System and method for analyzing de-identified health care data
EP1262855A3 (en) * 2001-05-29 2003-08-20 Siemens Aktiengesellschaft Personal electronic medical record secured against sabotage and which is censorship-resistant
DE10126138A1 (en) * 2001-05-29 2002-12-12 Siemens Ag Tamper-proof and censorship-resistant personal electronic health record
WO2003093956A1 (en) * 2002-04-29 2003-11-13 Mediweb Oy Storing sensitive information
EP1394680A1 (en) * 2002-08-29 2004-03-03 Mobile Management GmbH Procedure for providing data
WO2004031922A2 (en) * 2002-10-03 2004-04-15 Avoca Systems Limited Method and apparatus for secure data storage
WO2004031922A3 (en) * 2002-10-03 2004-09-16 Avoca Systems Ltd Method and apparatus for secure data storage
WO2004068820A3 (en) * 2003-01-23 2004-11-11 Unspam Llc Method and apparatus for a non-revealing do-not-contact list system
US7461263B2 (en) 2003-01-23 2008-12-02 Unspam, Llc. Method and apparatus for a non-revealing do-not-contact list system
US20150281154A1 (en) * 2003-01-23 2015-10-01 Matthew B. Prince Method and apparatus for a non-revealing do-not-contact list system
US9699125B2 (en) 2003-01-23 2017-07-04 Unspam, Llc Method and apparatus for a non-revealing do-not-contact list system
US7941842B2 (en) 2003-01-23 2011-05-10 Unspam, Llc. Method and apparatus for a non-revealing do-not-contact list system
US8904490B2 (en) 2003-01-23 2014-12-02 Unspam, Llc Method and apparatus for a non-revealing do-not-contact list system
US20110289321A1 (en) * 2003-01-23 2011-11-24 Prince Matthew B Method and apparatus for a non-revealing do-not-contact list system
WO2004084050A1 (en) * 2003-03-21 2004-09-30 Koninklijke Philips Electronics N.V. User identity privacy in authorization certificates
WO2004090697A1 (en) 2003-04-11 2004-10-21 Jouko Kronholm A method in data transmission, a data transmission system, and a device
WO2005022414A3 (en) * 2003-08-22 2005-06-16 Oracle Int Corp Method and apparatus for protecting private information within a database
WO2005022414A2 (en) * 2003-08-22 2005-03-10 Oracle International Corporation Method and apparatus for protecting private information within a database
US7606788B2 (en) * 2003-08-22 2009-10-20 Oracle International Corporation Method and apparatus for protecting private information within a database
US7925704B2 (en) 2004-04-29 2011-04-12 Unspam, Llc Method and system for a reliable distributed category-specific do-not-contact list
EP1763834A4 (en) * 2004-05-05 2009-08-26 Ims Software Services Ltd Mediated data encryption for longitudinal patient level databases
WO2006079752A1 (en) * 2005-01-26 2006-08-03 France Telecom System and method for the anonymisation of sensitive personal data and method of obtaining such data
FR2881248A1 (en) * 2005-01-26 2006-07-28 France Telecom Personal medical data management system for insured patient, has computing subsystem with units to generate common key from identification data of person, and another subsystem with database associating sensitive personal data to key
US8607332B2 (en) 2005-01-26 2013-12-10 France Telecom System and method for the anonymisation of sensitive personal data and method of obtaining such data
US7522751B2 (en) 2005-04-22 2009-04-21 Daon Holdings Limited System and method for protecting the privacy and security of stored biometric data
WO2006111205A1 (en) * 2005-04-22 2006-10-26 Daon Holdings Limited A system and method for protecting the privacy and security of stored biometric data
WO2007110035A1 (en) * 2006-03-17 2007-10-04 Deutsche Telekom Ag Method and device for the pseudonymization of digital data
WO2008027247A2 (en) * 2006-08-28 2008-03-06 National Biometric Security Project A method and system for authenticating and validating identities based on multi-modal biometric templates and special codes in a substantially anonymous process
WO2008027247A3 (en) * 2006-08-28 2008-06-19 Nat Biometric Security Project A method and system for authenticating and validating identities based on multi-modal biometric templates and special codes in a substantially anonymous process
WO2008034841A3 (en) * 2006-09-20 2008-05-15 Siemens Ag Oesterreich Method for controlling access and access control system for digital contents
WO2008034841A2 (en) * 2006-09-20 2008-03-27 SIEMENS AKTIENGESELLSCHAFT ÖSTERREICH Method for controlling access and access control system for digital contents
EP1939785A3 (en) * 2006-12-18 2011-12-28 SDI Health LLC System and method for the protection of de-identification of health care data
US9355273B2 (en) 2006-12-18 2016-05-31 Bank Of America, N.A., As Collateral Agent System and method for the protection and de-identification of health care data
EP2953053A1 (en) * 2006-12-18 2015-12-09 SDI Health LLC System and method for the protection of de-identification of health care data
EP1939785A2 (en) * 2006-12-18 2008-07-02 Surveillance Data, Inc. System and method for the protection of de-identification of health care data
EP2098976A1 (en) * 2007-12-04 2009-09-09 Orbis Patents Limited Secure method and system for the upload of data
US20100204973A1 (en) * 2009-01-15 2010-08-12 Nodality, Inc., A Delaware Corporation Methods For Diagnosis, Prognosis And Treatment
US9141758B2 (en) 2009-02-20 2015-09-22 Ims Health Incorporated System and method for encrypting provider identifiers on medical service claim transactions
DE102009016419B4 (en) * 2009-04-04 2011-03-31 Az Direct Gmbh A method for securely storing records containing confidential data and associated identification data
DE102009016419A1 (en) * 2009-04-04 2010-10-07 Az Direct Gmbh Method for storage of data sets, involves converting identification data into record label, where record label with confidential data is transmitted to trust office
US11688015B2 (en) 2009-07-01 2023-06-27 Vigilytics LLC Using de-identified healthcare data to evaluate post-healthcare facility encounter treatment outcomes
EP2426617A1 (en) * 2010-09-03 2012-03-07 Wolfgang Hüffer Method for anonymous compiling of confidential data and accompanying identification data
US9847982B2 (en) 2011-10-31 2017-12-19 Nokia Technologies Oy Method and apparatus for providing authentication using hashed personally identifiable information
WO2013064730A1 (en) * 2011-10-31 2013-05-10 Nokia Corporation Method and apparatus for providing authentication using hashed personally identifiable information
CN104137129A (en) * 2012-02-22 2014-11-05 西门子公司 Method for processing patient-based data sets
EP2766863A1 (en) * 2012-02-22 2014-08-20 Siemens Aktiengesellschaft Method for processing patient-based data sets
DE102012202701A1 (en) * 2012-02-22 2013-08-22 Siemens Aktiengesellschaft Method for processing patient-related data records
JP2015526757A (en) * 2012-06-29 2015-09-10 ペンタ・セキュリティ・システムズ・インコーポレーテッド Generation and verification of alternative data with a specific format
EP2752821A2 (en) 2013-01-02 2014-07-09 Albert Kuiper Enhancement of enforcing road user charging
WO2015032791A1 (en) * 2013-09-03 2015-03-12 Kabel Deutschland Vertrieb Und Service Gmbh Method and system for providing anonymised data from a database
US9971898B2 (en) 2013-09-03 2018-05-15 Kabel Deutschland Vertrieb Und Service Gmbh Method and system for providing anonymized data from a database
EP2843585A1 (en) * 2013-09-03 2015-03-04 Kabel Deutschland Vertrieb und Service GmbH Method and system for providing anonymised data from a database
WO2017141065A1 (en) * 2016-02-18 2017-08-24 MAGYAR, Gábor Data management method and registration method for an anonymous data sharing system, as well as data manager and anonymous data sharing system
US11263344B2 (en) 2016-02-18 2022-03-01 Xtendr Zrt. Data management method and registration method for an anonymous data sharing system, as well as data manager and anonymous data sharing system
WO2017161464A1 (en) 2016-03-21 2017-09-28 Thomas Krech Software having control logic for secure transmission of personal data via the internet from computers to the server, with secure storage of the data on servers
CH712285A1 (en) * 2016-03-21 2017-09-29 Krech Thomas Software with control logic for converting personalized personal data into de-personalized personal data and transmitting the de-personalized data to a server.
WO2018009979A1 (en) * 2016-07-15 2018-01-18 E-Nome Pty Ltd A computer implemented method for secure management of data generated in an ehr during an episode of care and a system therefor
US11562812B2 (en) 2016-07-15 2023-01-24 E-Nome Pty Ltd Computer implemented method for secure management of data generated in an EHR during an episode of care and a system therefor

Also Published As

Publication number Publication date
GB9920644D0 (en) 1999-11-03

Similar Documents

Publication Publication Date Title
WO2001018631A1 (en) Method for anonymizing data
Li et al. Blockchain-based data preservation system for medical data
CN106790250B (en) Data processing, encryption, integrity verification method and identity authentication method and system
US9208491B2 (en) Format-preserving cryptographic systems
US8208627B2 (en) Format-preserving cryptographic systems
US11488134B2 (en) Format-preserving cryptographic systems
US8639947B2 (en) Structure preserving database encryption method and system
US8077870B2 (en) Cryptographic key split binder for use with tagged data elements
US20040208316A1 (en) Cryptographic key split binder for use with tagged data elements
JP2005522775A (en) Information storage system
Hacigümüş et al. Ensuring the integrity of encrypted databases in the database-as-a-service model
Giereth On partial encryption of rdf-graphs
Simmons Secure communications and asymmetric cryptosystems
JP4594078B2 (en) Personal information management system and personal information management program
GB2479074A (en) A key server selects policy rules to apply to a key request based on an identifier included in the request
Silva An overview of cryptographic hash functions and their uses
WO2022137668A1 (en) Data file encoding transmission/reception system, and data file encoding transmission/reception method
Brincat et al. New CBC-MAC forgery attacks
Hassinen et al. Client controlled security for web applications
Trombetta et al. Private updates to anonymous databases
Anyapu et al. Message Security Through Digital Signature Generation and Message Digest Algorithm
Yeh et al. Integrity coded databases-protecting data integrity for outsourced databases
Jarvis Protecting sensitive credential content during trust negotiation
WO2023126491A1 (en) Method and system for generating digital signatures using universal composition
Wang et al. Method to Implement Hash-Linking Based Content Integrity Service

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A1

Designated state(s): US

AL Designated countries for regional patents

Kind code of ref document: A1

Designated state(s): AT BE CH CY DE DK ES FI FR GB GR IE IT LU MC NL PT SE

121 EP: the EPO has been informed by WIPO that EP was designated in this application
DFPE Request for preliminary examination filed prior to expiration of 19th month from priority date (PCT application filed before 20040101)
122 EP: PCT application non-entry in European phase
DPE2 Request for preliminary examination filed before expiration of 19th month from priority date (PCT application filed from 20040101)