WO2005093582A2 - Method and system for the storage of data

Info

Publication number
WO2005093582A2
Authority
WO
WIPO (PCT)
Prior art keywords
data
tag
genetic fingerprint
repository
cipher
Application number
PCT/EP2005/003309
Other languages
French (fr)
Other versions
WO2005093582A3 (en)
Inventor
Matthias Pfeiffer
Uwe Bicker
Deborah J. Allen
Original Assignee
Genonyme Gmbh
Priority claimed from GB0406852A
Application filed by Genonyme Gmbh
Publication of WO2005093582A2
Publication of WO2005093582A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31 User authentication
    • G06F21/32 User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints

Definitions

  • The object of the invention is also solved by providing a system for the storage of data which has a data repository connected to data entry means, a biological sample analyser and an encryption cipher generator.
  • The biological sample analyser is independent of the data repository. As a result, an unauthorised person cannot gain access to both the biological sample analyser and the data repository, and therefore cannot gain access to the encrypted data within the data repository.
  • The encryption cipher generator uses items of data derived from the genetic fingerprint and based upon structural polymorphisms.
  • The biological sample analyser is used to generate the genetic fingerprint from a biological sample provided by an individual or test subject whose sensitive data is to be encrypted.
  • Where the system includes an encryption cipher generator, it further includes a decryption cipher generator which generates a different cipher to allow decryption of the data.
  • This object of the invention is solved by providing a method for the anonymous storage of sensitive data from a test subject.
  • The method has a first step of generating one or more tags from a genetic fingerprint and a second step of annotating and storing the sensitive data with the one or more tags.
  • Sensitive data includes, but is not limited to, data such as the medical history or medical results of a test subject.
  • The sensitive data could also include purchasing details, the criminal record or the credit record of the test subject.
  • Any data which needs to be kept confidential and restricted to a certain group of people can be stored using this method.
  • The genetic fingerprint of every individual is unique, and therefore using the genetic fingerprint allows a unique tag to be generated. It is not necessary to use all of the genetic data in the genetic fingerprint to generate the tag. Only a selected portion of the genetic data needs to be used, as long as that portion is chosen so that a unique identification can be made.
  • The advantage of using the genetic fingerprint to generate the tag is that, if the tag is lost, it can be regenerated by re-analysing the genetic fingerprint.
  • PCR: polymerase chain reaction.
  • A trust centre is supplied with a biological sample in order to produce a genetic fingerprint and then generate the tag.
  • Both a private tag and a public tag are generated.
  • The private tag and the public tag are distinct from each other, but each comprises part of the unique genetic fingerprint. This stipulation applies equally where more than one private tag and/or more than one public tag is generated.
  • The public tag can be disclosed generally and can initially be attached to the sensitive data to identify it uniquely. For long-term storage and public analysis of the sensitive data, the private tag is attached to the sensitive data to identify it uniquely. The private tag is mapped to the public tag, but it does not allow identification of the test subject from whom the data was obtained.
  • The sensitive data can also be encrypted using an encryption key, which can likewise be generated from the genetic fingerprint.
  • The objects of the invention can also be solved by providing a system for the storage of sensitive data which has a data repository connected to data entry means and a reference table, such as a look-up table, holding a private tag and a public tag.
  • The sensitive data is supplied by the data entry means with the public tag and stored in the data repository with the private tag.
  • Fig. 1 shows an overview of the system of the invention.
  • Fig. 2 shows a flow diagram for the generation of a tag.
  • Fig. 1 shows an overview of a system 10 for the storage of sensitive data from a test subject 70 in accordance with this invention.
  • The sensitive data can include, but is not limited to, address data, purchasing data, medical data and any other types of data which are personal to the test subject 70 and whose release could be detrimental to the test subject 70.
  • The system 10 comprises a database 20 having a plurality of records 30 stored therein.
  • Each of the records 30 has an identifier 30a, an item of information 30b, such as medical information, and a tag 30c.
  • The record 30 is only one example of a record that can be stored in the database 20; other types of records can also be stored.
  • The database 20 could be a database such as the UK Biobank (see, for example, www.ukbiobank.co.uk, accessed on 23 March 2004) or one of the databases of the US National Institutes of Health.
  • The database 20 could also be a database of other confidential data, access to which has to be limited because of data protection laws or similar requirements.
  • The identifier 30a is a public identifier which is given to the item of information 30b.
  • The identifier 30a could refer to the test subject 70 (such as a particular patient) or it could be an entirely random number.
  • The identifier 30a can comprise the name of the patient and further identifiers such as the date of birth of the patient. It is, however, not unknown for two patients in the same hospital or surgery to have identical names and dates of birth, and therefore further identifiers must be added to the identifier 30a to distinguish the two patients.
  • The item of information 30b could be an item of sensitive data, such as medical data or other confidential data.
  • The item of information 30b could be, for example, digitised data from an X-ray examination, a blood test, a tissue probe or genetic information.
  • The item of information 30b could furthermore be the name and address of a client, or it could relate to the purchases made by the test subject.
  • The tag 30c refers to the test subject 70; its generation is explained later.
  • The tag 30c is a so-called "public" tag.
  • The public tag 30c is provided by the test subject 70 to the hospital, doctor, etc. to allow a single unique identification of the items of information 30b.
  • The number of possible public tags 30c is many times the population of the world, and thus the possibility of confusion between any two test subjects 70 should be considered negligible.
  • The database 20 is connected to a further data repository 40.
  • The data repository 40 contains data records with items of information 30b and a private tag 30d.
  • The private tag 30d is generated as described below. Mapping between the public tag 30c and the private tag 30d is carried out in a tag repository 85.
  • The tag repository 85 can be implemented as a look-up table 85.
  • The look-up table 85 is generally not on-line, to prevent hackers from accessing the information stored in it.
  • When a mapping is required, the look-up table 85 is accessed temporarily and the result of the mapping operation is returned. This could be done by sending a message to the look-up table 85 and receiving an answer, or by temporarily establishing a secure connection to the look-up table 85 and receiving the results of the mapping operation.
  • The items of information 30b stored in the data repository 40 are stored completely anonymously. There is no possibility of correlating the items of information 30b with, for example, the test subject 70 from whom they were obtained. Transfer of items of information 30b from the database 20 to the data repository 40 can be carried out automatically by removing the identifier 30a, or the test subject 70 can review the item of information 30b before authorising its storage in the data repository 40.
  • An interface 50 is connected to the data repository 40 and can in turn be connected, for example, to a computer, a data server or the Internet to allow access to the items of information 30b in the data repository 40. Since the items of information 30b are stored anonymously, there are few restrictions under data protection laws to prevent access to them. The only identifier attached to the items of information 30b is the private tag 30d. There is no reference either to the public tag 30c or to the identifier 30a, and thus the items of information 30b are not traceable to the test subject 70.
  • The trust centre 80 stores personal data, such as details of the identity of the test subject 70, and generates the public tag 30c and the private tag 30d. However, the trust centre 80 is completely isolated from the data repository 40. In this context, "completely isolated" means that there is no permanent direct network connection between the data repository 40 and the trust centre 80. There is no possibility of relating the items of information 30b stored in the data repository 40 to the personal data in the trust centre 80.
  • The trust centre stores both the public tag 30c and the private tag 30d in the look-up table 85.
  • The look-up table 85 can either be part of the trust centre 80 or be separate from it. In either case, access to the look-up table 85 is restricted to authorised users only, and security measures are in place to prevent hacking into the look-up table 85.
  • In a first step 200, biological material is obtained or extracted from the test subject 70.
  • This biological material could be a mucus sample, a blood sample or any other sample containing genetic material.
  • DNA is extracted from the biological material in step 210.
  • In step 220, defined regions of the extracted DNA are amplified using standard PCR-based methods and primers which are complementary to the conserved regions of the DNA of the test subject 70. If sufficient DNA is available from the biological material, PCR amplification does not need to be carried out.
  • In step 230, the amplified DNA is fractionated using one or more standard biochemical separation techniques. In step 240, the information on the resulting genetic profile of the test subject 70 is stored in either digitised or non-digitised form.
  • In step 250, the public tag 30c and the private tag 30d are generated using algorithms. Although the public tag 30c and the private tag 30d are generated from the full genetic profile, it is not possible to use them to trace back and identify the test subject 70. Nor is it possible to derive the private tag 30d from the public tag 30c. This is achieved by choosing appropriate algorithms.
  • Polymorphisms are short variations in the DNA sequence between individuals which occur even between related members of the same family. As a result, polymorphisms are commonly used in paternity testing and forensic cases.
  • STR segments are short tandem repeat (STR) segments in the DNA.
  • The use of STRs is known in the art, and commercial kits are available to carry out an analysis, such as the Profiler Plus kit supplied by Applied Biosystems.
  • STRs are short sequences of DNA, normally of 2-5 base pairs, which are repeated numerous times in a head-to-tail, or tandem, manner.
  • The STR segments are amplified using PCR primers that bind in the conserved regions of DNA flanking each of the repeat sections. As the number of repeats within an STR locus is highly variable, the amplified STRs vary in length.
  • STRs have been studied extensively and are well recognised as a system for the structural analysis of DNA.
  • There are a number of known and documented STR loci.
  • The US Federal Bureau of Investigation uses the CODIS system to identify the perpetrators of crime.
  • The CODIS system uses thirteen different STR loci.
  • One of these loci is the D7S280 locus, which is found on human chromosome 7.
  • The tetrameric repeat sequence of D7S280 is "gata". Different alleles of this locus have from 6 to 15 tandem repeats of the "gata" sequence.
  • Other loci include vWA and FGA. The analysis of the STRs yields a numerical result.
  • For example, the CODIS system supplies the genotype of the test subject for the D3S1358 STR as a pair of numbers, e.g. 15 and 18. A pair of numbers is generated because one number relates to the paternal allele and the other to the maternal allele. Similarly, for the vWA locus the genotype is a second pair of numbers, e.g. 16 and 16. The number of possible combinations is substantially greater than the population of the planet, and as a result a reliable identification system can be established based on the CODIS STR system.
  • Generation of the public tag 30c and the private tag 30d can be carried out using two separate and unrelated mathematical operations. This ensures that the private tag 30d cannot be obtained from the public tag 30c.
  • The CODIS STR analysis is divided into maternal and paternal components.
  • The paternal component is kept within the trust centre 80 and is not used to generate the private tag 30d.
  • The maternal component is used to generate the public tag 30d.
  • The biological material is submitted to the trust centre 80 under a unique random identification number that identifies the test subject 70.
  • The biological material is not associated with any personal details of the test subject; in particular, the name of the test subject 70 is not submitted with the biological material. Only the test subject 70 submitting the biological sample knows the random identification number. If the random identification number is lost, a new random identification number is generated.
  • The results of the genetic analysis are stored in the trust centre 80 and are related to the random identification number.
  • The test subject 70 is sent the public tag 30c after it has been calculated as described above.
  • The item of information 30b can be stored directly with the public tag 30c if the public tag 30c is known to the test subject 70; for example, it might be stored on the test subject's health card.
  • Alternatively, the public tag 30c can be calculated when the item of information 30b is obtained. This would, however, require the algorithm from which the public tag 30c is generated to be publicly known, which may not be desirable.
  • The invention can also be used to generate an encryption cipher based on the genetic fingerprint.
  • This encryption cipher can be used to store the data in the data repository 40 in encrypted form.
  • The relevant encryption cipher is stored, together with the public tag 30c, in a further database (either incorporated into the look-up table 85 or, for security reasons, kept as a separate database).
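The tag scheme described above can be sketched in Python. The scheme has four parts: the genotype from steps 240 and 250, the public tag 30c and the private tag 30d produced by two separate and unrelated operations, and the look-up table 85 held by the trust centre 80. The patent does not disclose its algorithms, so everything in this sketch is an assumption. HMAC-SHA256, keyed with two independent trust-centre secrets, stands in for the "two separate and unrelated mathematical operations". The genotype uses the pair-of-numbers format described for the CODIS loci. The names `canonical_profile`, `derive_tag` and `TrustCentre` are hypothetical. For simplicity the tags are computed from the whole profile, so the embodiment that uses only the maternal component is not modelled.

```python
import hashlib
import hmac

# Hypothetical genotype format: STR locus name -> pair of repeat counts,
# e.g. {"D3S1358": (15, 18), "vWA": (16, 16)} as in the CODIS example above.
Genotype = dict[str, tuple[int, int]]


def canonical_profile(genotype: Genotype) -> bytes:
    """Serialise a genotype deterministically, so that re-analysing the same
    sample yields identical bytes regardless of locus or allele order."""
    parts = []
    for locus in sorted(genotype):
        low, high = sorted(genotype[locus])
        parts.append(f"{locus}:{low},{high}")
    return ";".join(parts).encode("ascii")


def derive_tag(profile: bytes, secret: bytes, hex_chars: int = 32) -> str:
    """Keyed one-way tag (HMAC-SHA256, an assumed choice): it cannot be
    recomputed without `secret` and does not reveal the profile."""
    return hmac.new(secret, profile, hashlib.sha256).hexdigest()[:hex_chars]


class TrustCentre:
    """Hypothetical trust centre holding two independent secrets and the
    look-up table that maps public tags to private tags."""

    def __init__(self, public_secret: bytes, private_secret: bytes) -> None:
        if public_secret == private_secret:
            raise ValueError("the two tag operations must use unrelated secrets")
        self._public_secret = public_secret
        self._private_secret = private_secret
        self._lookup: dict[str, str] = {}  # stands in for look-up table 85

    def register(self, genotype: Genotype) -> str:
        """Generate both tags, record the mapping, return only the public tag."""
        profile = canonical_profile(genotype)
        public_tag = derive_tag(profile, self._public_secret)
        private_tag = derive_tag(profile, self._private_secret)
        self._lookup[public_tag] = private_tag
        return public_tag

    def map_to_private(self, public_tag: str) -> str:
        """Temporary, authorised mapping request (public tag -> private tag)."""
        return self._lookup[public_tag]


# Usage: the data repository 40 only ever sees the private tag.
centre = TrustCentre(b"hypothetical-secret-A", b"hypothetical-secret-B")
public_tag = centre.register({"D3S1358": (15, 18), "vWA": (16, 16)})
data_repository = {centre.map_to_private(public_tag): ["blood test result"]}

# A lost tag is regenerated by re-analysing the same genetic profile.
assert centre.register({"vWA": (16, 16), "D3S1358": (18, 15)}) == public_tag
```

Because the two secrets are independent, knowing a public tag does not allow the private tag to be computed without the second secret. Only the table inside the trust centre links the two. A real profile would cover all thirteen CODIS loci rather than the two used here.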

Abstract

A method and system for the encryption of data and the anonymous storage of sensitive data from a test subject together with a private tag are disclosed. The method comprises a) a first step of generating one or more private tags from a genetic fingerprint; and b) a second step of storing the sensitive data with the private tag. The encryption is carried out by generating a cipher from the genetic fingerprint and based upon structural polymorphisms.

Description

Title
Method and System for the Storage of Data
Field of the Invention
The invention relates to a method and system for the storage of data, particularly of sensitive data.
Background to the Invention
The storage of sensitive data, such as medical data, is subject to a number of rules and restrictions. For example in Europe, the release and access to such data is governed by data protection laws which often require that access to the sensitive data be restricted to those having a need to know and in some cases require that the data be encrypted. Access is generally given by the input of a password into the computer. If the correct password is entered then access is granted. More recently smart cards with or without password input have been used to grant access to data.
The requirements for the storage of sensitive data (or private information) are discussed in Canadian Patent Application CA-A-2 303 724 in connection with property insurance policies and claims verification. The teachings of this patent application are said also to be relevant to financial and medical records, to online photo documentation for sales and auctions, and to the online storage and retrieval of personal photos, wills and probates. All of these can be regarded as sensitive data.
A number of encryption systems for data are available on the marketplace. One of the most commonly used systems is the so-called PGP system. This is described in more detail on the website http://www.pgpi.org/doc/pqpintro/ (accessed on 26 March 2004). This system uses both a public key and a private key. The public key can be freely transmitted. The private key is kept secure and only used by the sender who generates and sends an encrypted document. The receiver of an encrypted document uses his or her private key, together with the public key of the sender, to decrypt the document.

A number of patent documents are known in which sensitive data, such as a patient's personal health-related information, are stored and/or transmitted using encryption methods. For example, PCT Application No. WO-A-03/025798 (Califano, assigned to First Genetic Trust Inc.) teaches a system and method for maintaining an individual's privacy such that only he or she could authorise the use of his or her genotype data. The system includes a safe in which the individual's medical information is stored. The safe's encryption mechanisms and certificates only allow designated parties to access the data. The encryption mechanisms and certificates restrict the use of the data in studies through software that is certified to be able to analyse the data without releasing it in any form that would violate the individual's identity.
A related international patent application No. WO-A-03/019159 (Califano et al, assigned to First Genetic Trust) teaches the concept of a virtual private identity (VPI) which comprises a random number or some other type of identifier used in a database to store genetic data. The other type of identifier lacks any information that can be used to determine identity information. The data stored in association with a respective VPI may be encrypted with an encryption key generated from the VPI. The VPI is not generated from the genetic data.
Similarly US-A-2003/074564 (Peterson) teaches another method of storing personal medical information records without jeopardizing the privacy of an individual. The medical information records are portable and accessible throughout the world via the web. A "key" known only to the individual allows access only to the individual for authorised use. The medical information is stored in an encrypted format. There is no linkage at server level between the individual identifying data and the medical information, except by access by the individual. The system allows real-time altering and updating of information using a personal identifier plus a password selected by the individual. The identifier may be printed on a card or otherwise carried on the individual's person. The individual further chooses a second unique identifier for use when the card is not available.
In a similar vein, US-A-2002/124177 (Harper and Stout) teaches a method and system for encrypting and decrypting electronic files using an essentially symmetric cipher or key system. This system is described as being adapted to electronically store medical records. US Patent No. 6,463,417 (Schoenberg, assigned to CareKey.com, Inc.) teaches a method and system for distributing health information which is categorised into a variety of privacy levels. A requestor is assigned an access security code to allow access to the health information. The degree of access, i.e. the degree of privacy, depends on the access security code.
US Patent Application US-A-2001/051881 (Filler) also describes a system and method for managing a medical services network. This patent application includes diagnostic data which is obtained by a diagnostic service and which is sent over a network to a diagnostic interpreter. Subsequently, the interpretation and/or the diagnostic data may be transmitted to a display via a network.
Transmission of confidential medical data between a programming device and a remote data system is known in French Patent Application FR-A-2800481. The teachings of this patent include the use of a source of keys to deliver an encryption key to the programming device and a decryption key to the remote data system.
Another method for transmitting medical information over a network is taught in WO-A-02/082347 (Copper, assigned to Inner Vision Imaging LLC). The network of this patent application preferably includes two channels. Encrypted patient-identifiable data is transmitted over one channel, whilst unencrypted patient medical condition or treatment data is transmitted over the other channel.
Problems occur when an unauthorised person obtains the password or smart card. The unauthorised person can then access the data. If the identity of a subject is stored, the unauthorised person can misuse the data. For example, the unauthorised person might ascertain that a certain person is suffering from a disease and pass this information on to an employer. It would therefore be desirable to store the data anonymously.
These issues have been addressed in US Patent Application No. US-A-2003/0217037 (Pfeiffer and Bicker), in which data can be stored anonymously and no link is made with the person who supplied the data.
In the '037 patent application, the data is stored together with an identification number. If new data needs to be entered either the user must supply the identification number - which means that he or she can be identified - or the data must be stored under a new identification number. In the latter case there is no possibility of correlation between the various items of data.
It is known that the so-called genetic fingerprint of a person is unique to that person. This has been exploited in the past to allow access to data. For example, US Patent Application No. US-A-2002/0059521 (Tasler, assigned to Siemens) describes a method and system in which a user is identified using biometric data. This biometric data is stored in a central server and if a user is to be identified, the biometric data is captured from the user and compared with the stored data. The biometric data includes a genetic fingerprint. Whilst this allows identification of a user, it does not provide for anonymous storage of any data.
Similarly PCT Patent Application No. WO 01/11577 (Precise Biometrics) also teaches a method and system for allowing access to sensitive data using biometric data. In this case the sensitive data is provided on a portable data carrier, such as a smart card. The biometric data is fingerprint data.
Japanese Patent Publication No. 2002-175280 (Dai Nippon) shows a gene information utilisation system which uses gene information to generate a cipher. This publication fails, however, to disclose the method by which the cipher is generated.
Summary of the Invention
It is therefore an object of the invention to improve the encryption methods for data storage.
This is achieved by providing a method for the encryption of data which has a first step of generating an encryption cipher from items of data derived from a genetic fingerprint based upon structural polymorphisms, and a second step of encrypting the data using the encryption cipher. The method has several advantages. It is based upon internationally recognised technology for the analysis and processing of genetic information. It allows complex genetic information to be analysed and processed in a simplified form. It uses structural genetic elements which are unique to an individual, thereby allowing unique encryption ciphers to be generated. If the cipher is lost, the DNA can be re-analysed and, as long as the original method for generating the encryption cipher is known, the encryption cipher can be regenerated. To ensure confidentiality, the individual (or test subject) sends a biological sample to a trust centre, which carries out the DNA analysis and generates the cipher. Only the trust centre knows the algorithm by which the encryption cipher is generated, and only the trust centre can regenerate the encryption cipher. The encryption cipher is stored in a cipher repository, such as a database, for use by authorised users.
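The generation algorithm itself is not disclosed. The following Python sketch shows one way the first step could work, under assumptions that are not part of the text: the genetic fingerprint is reduced to STR allele pairs per locus, and the trust centre's secret algorithm is modelled as HMAC-SHA256 keyed with a hypothetical secret that only the trust centre holds. Sorting loci and alleles makes the cipher reproducible after the DNA is re-analysed. The secret key matters because an STR profile on its own has low entropy and is not secret, so a key computed from the genotype alone could be recomputed by anyone who obtained that genotype.

```python
import hashlib
import hmac
from typing import Dict, Tuple

# Hypothetical representation: locus name -> pair of allele repeat counts.
Genotype = Dict[str, Tuple[int, int]]


def canonical_encoding(genotype: Genotype) -> bytes:
    """Encode the genotype so that re-analysis yields identical bytes.

    Loci are sorted by name and each allele pair is sorted, so the order in
    which a laboratory reports them does not change the result.
    """
    parts = []
    for locus in sorted(genotype):
        low, high = sorted(genotype[locus])
        parts.append(f"{locus}:{low},{high}")
    return ";".join(parts).encode("utf-8")


def derive_encryption_cipher(genotype: Genotype, trust_centre_secret: bytes) -> bytes:
    """Derive a 256-bit key from the genotype and a trust-centre secret.

    HMAC-SHA256 keyed with the secret stands in for the undisclosed
    generation algorithm: without the secret, the genotype alone is not
    enough to recompute the key.
    """
    return hmac.new(trust_centre_secret, canonical_encoding(genotype), hashlib.sha256).digest()
```

The resulting 32-byte value could then serve as the key for any standard symmetric cipher in the second step.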
Similarly, the trust centre can generate a decryption cipher from the genetic fingerprint. The decryption cipher will be needed by another group of users to decrypt the encrypted data.
The object of the invention is also solved by providing a system for the storage of data which has a data repository connected to data entry means, a biological sample analyser and an encryption cipher generator. In this embodiment of the invention, the biological sample analyser is independent of the data repository, so that an unauthorised person cannot gain access to both the biological sample analyser and the data repository, and therefore to the encrypted data within the data repository. The encryption cipher generator uses items of data derived from the genetic fingerprint and based upon structural polymorphisms.
The biological sample analyser is used to generate the genetic fingerprint from a biological sample provided by an individual or test subject whose sensitive data is to be encrypted.
Besides the encryption cipher generator, the system further includes a decryption cipher generator which generates a different cipher to allow decryption of the data.
It is also an object of the invention to store sensitive data, such as medical data, in an anonymous and secure manner.
This object of the invention is solved by providing a method for the anonymous storage of sensitive data from a test subject. The method has a first step of generating one or more tags from a genetic fingerprint and a second step of annotating and storing the sensitive data with the one or more tags. In this context, "sensitive data" includes but is not limited to data such as the medical history or medical results of a test subject. The sensitive data could also include purchasing details or the criminal or credit record of the test subject. In brief, any data which needs to be kept confidential and restricted to a certain group of people can be stored using this method.
The genetic fingerprint of every individual is unique, and therefore using the genetic fingerprint allows a unique tag to be generated. It is not necessary to use all of the genetic data in the genetic fingerprint to generate the tag. Only a selected portion of the genetic data needs to be used, as long as that portion is sufficient to allow a unique identification to be made. The advantage of using the genetic fingerprint to generate the tag is that, if the tag is lost, it can be regenerated by re-analysing the genetic fingerprint.
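A minimal Python sketch of generating a tag from a selected portion of the genetic fingerprint, assuming the fingerprint is available as STR allele pairs per locus. The loci in the hypothetical `SELECTED_LOCI`, the text encoding and the use of SHA-256 are illustrative assumptions; two loci are far too few for a unique identification and are used only to keep the example short. Because loci and alleles are normalised before hashing, re-analysing the same sample gives the same tag even if extra loci are reported or alleles appear in a different order.

```python
import hashlib
from typing import Dict, Sequence, Tuple

Genotype = Dict[str, Tuple[int, int]]

# Hypothetical selection of loci; a real selection must be large enough
# to identify an individual uniquely.
SELECTED_LOCI: Sequence[str] = ("D3S1358", "vWA")


def generate_tag(genotype: Genotype, loci: Sequence[str] = SELECTED_LOCI) -> str:
    """Return a SHA-256 hex tag built only from the selected loci.

    Loci are taken in sorted order and each allele pair is sorted, so a
    re-analysis reporting the same alleles in another order, or reporting
    additional loci, produces the same tag.
    """
    parts = []
    for locus in sorted(loci):
        low, high = sorted(genotype[locus])  # KeyError if a selected locus is missing
        parts.append(f"{locus}:{low},{high}")
    return hashlib.sha256(";".join(parts).encode("utf-8")).hexdigest()


# Hypothetical values; FGA is reported by the re-analysis but not selected.
original = {"D3S1358": (15, 18), "vWA": (16, 16)}
reanalysis = {"vWA": (16, 16), "D3S1358": (18, 15), "FGA": (21, 24)}
```

Calling `generate_tag` on `original` and on `reanalysis` returns the same 64-character tag, which is the regeneration property described above.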
The best way to obtain the genetic fingerprint is by DNA analysis, as this is highly repeatable and easily carried out. Even small amounts of DNA can be used, as the material can be amplified using polymerase chain reaction (PCR) methods. PCR-based methods form the basis of many advances in genetic fingerprinting and the subsequent analysis of data. One such analytical process demonstrates the presence of short tandem repeats (STRs) in an individual's DNA, and such technologies can easily be applied to generate the tag.
A trust centre is supplied with a biological sample in order to produce a genetic fingerprint and then generate the tag.
In a preferred embodiment of the invention, both a private tag and a public tag are generated. The private tag and the public tag are distinct from each other, but each is based on part of the unique genetic fingerprint. This stipulation applies equally where more than one private tag and/or more than one public tag are generated. The public tag can be generally disclosed and can initially be attached to the sensitive data to identify it uniquely. For long-term storage of the sensitive data, and for public analysis of it, the private tag is attached to the sensitive data to identify it uniquely. The private tag is mapped to the public tag, but does not allow identification of the test subject from whom the data is obtained.
Finally, in a further embodiment of the invention, the sensitive data can also be encrypted using an encryption key. The encryption key can also be generated from the genetic fingerprint.
The objects of the invention can also be solved by providing a system for the storage of sensitive data which has a data repository connected to data entry means and a reference table, such as a look-up table, holding a private tag and a public tag. As explained above, the sensitive data is supplied by the data entry means with the public tag and stored in the data repository with the private tag.
Description of the Drawings
Fig. 1 shows an overview of the system of the invention.
Fig. 2 shows a flow diagram for the generation of a tag.
Detailed Description of the Invention
Fig. 1 shows an overview of a system 10 for the storage of sensitive data from a test subject 70 in accordance with this invention. The sensitive data can include, but is not limited to, address data, purchasing data, medical data, and any other types of data which are personal to the test subject 70 and whose release could be detrimental to the test subject 70.
The system 10 comprises a database 20 having a plurality of records 30 stored therein. Each of the records 30 has an identifier 30a, an item of information 30b, such as medical information, and a tag 30c. The record 30 is only one example of a record that can be stored in the database 20, and other types of record can be stored. The database 20 could be a database such as the UK Biobank (see for example www.ukbiobank.co.uk - accessed on 23 March 2004) or one of the databases of the US National Institutes of Health. The database 20 could also be a database of other confidential data, access to which has to be limited because of data protection laws or similar requirements.
The identifier 30a is a public identifier which is given to the item of information 30b. The identifier 30a could refer to the test subject 70 (such as a particular patient) or it could be an entirely random number. Typically, the identifier 30a comprises the name of the patient and further identifiers such as the patient's date of birth. It is, however, not unknown for two patients in the same hospital or surgery to have identical names and dates of birth, and therefore further identifiers must be added to the identifier 30a to distinguish the two patients. The item of information 30b could be an item of sensitive data, such as medical data or other confidential data. The item of information 30b could be, for example, digitised data from an X-ray examination, a blood test, a tissue sample or genetic information. The item of information 30b could furthermore be the name and address of a client, or it could relate to the purchases made by the test subject.
The tag 30c refers to the test subject 70 and its generation will be explained later. The tag 30c is a so-called "public" tag. The public tag 30c is provided by the test subject 70 to the hospital, doctor, etc. to allow a single unique identification of the items of information 30b. The number of possible public tags 30c is many times the population of the world and thus any possible confusion between any two test subjects 70 should be considered to be negligible.
The database 20 is connected to a further data repository 40. The data repository 40 contains data records with items of information 30b and a private tag 30d. The private tag 30d is generated as described below. Mapping between the public tag 30c and the private tag 30d is carried out in a tag repository 85. The tag repository 85 can be implemented as a look-up table 85. The look-up table 85 is generally not on-line to avoid possible access of the information stored therein by hackers. When data is transferred from the database 20 to the data repository 40, the look-up table is temporarily accessed and the result of the mapping operation returned. This could be done by sending a message to the look-up table 85 and receiving an answer or it could be done by temporarily establishing a secure connection to the look-up table 85 and receiving the results of the mapping operation.
The items of information 30b stored in the data repository 40 are stored completely anonymously. There is no possibility of correlating the items of information 30b with, for example, the test subject 70 from whom the items of information 30b were obtained. Transfer of items of information 30b from the database 20 to the data repository 40 can be carried out automatically by removing the identifier 30a or the test subject 70 can review the item of information 30b before authorising its storage in the data repository 40.
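The transfer described above can be sketched in Python as follows. The class and function names are hypothetical; the look-up table 85 is modelled as a plain dictionary from public tag to private tag that is passed in only for the duration of the transfer, mirroring the temporary access to the offline table. The optional `approve` callback stands in for the test subject reviewing an item before authorising its storage.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class DatabaseRecord:
    """Record in database 20: identifier 30a, information 30b, public tag 30c."""
    identifier: str
    information: str
    public_tag: str


@dataclass(frozen=True)
class RepositoryRecord:
    """Record in data repository 40: information 30b and private tag 30d only."""
    information: str
    private_tag: str


def transfer(
    records: List[DatabaseRecord],
    look_up_table: Dict[str, str],
    approve: Optional[Callable[[DatabaseRecord], bool]] = None,
) -> List[RepositoryRecord]:
    """Copy records to the repository, dropping the identifier and public tag.

    `look_up_table` maps public tag 30c -> private tag 30d.
    Records the test subject does not approve are skipped.
    """
    result = []
    for record in records:
        if approve is not None and not approve(record):
            continue
        private_tag = look_up_table[record.public_tag]
        result.append(RepositoryRecord(record.information, private_tag))
    return result
```

Since `RepositoryRecord` has no field for the identifier 30a or the public tag 30c, nothing in the stored record points back to the test subject 70.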
An interface 50 is connected to the data repository 40 and, for example, to a computer, data server or the Internet to allow access to the items of information 30b in the data repository 40. Since the items of information 30b are stored anonymously, data protection laws place few restrictions on access to them. The only identifier attached to the items of information 30b is the private tag 30d. There is no reference either to the public tag 30c or to the identifier 30a, and thus the items of information 30b are not traceable to the test subject 70.
Generation of the public tag 30c and the private tag 30d is carried out by a trust centre 80 from a genetic fingerprint supplied by the test subject 70, as will be described later. The trust centre 80 stores personal data, such as details of the identity of the test subject 70, and generates the public tag 30c and the private tag 30d. However, the trust centre 80 is completely isolated from the data repository 40. In this context, "completely isolated" means that there is no permanent direct network connection between the data repository 40 and the trust centre 80. There is no possibility of relating the items of information 30b stored in the data repository 40 to the personal data in the trust centre 80. The trust centre 80 stores both the public tag 30c and the private tag 30d in the look-up table 85. The look-up table 85 can be either part of the trust centre 80 or separate from it. In either case, access to the look-up table 85 is restricted to authorised users, and security measures are in place to protect the look-up table 85 against hacking.
The generation of the public tag 30c and the private tag 30d will now be described with reference to Fig. 2. In a first step 200, biological material is obtained or extracted from the test subject 70. This biological material could be a mucus sample, a blood sample, or any other sample containing genetic material. Using methods known to the person skilled in the art, DNA is extracted from the biological material in step 210. In step 220, defined regions of the extracted DNA are amplified using standard PCR-based methods with primers which are complementary to conserved regions of the DNA of the test subject 70. Of course, if sufficient DNA is available from the biological material, PCR amplification does not need to be carried out.
In step 230, the amplified DNA is fractionated using one or more standard biochemical separation techniques and in step 240 the information on a resulting genetic profile of the test subject 70 is stored in either digitised or non-digitised form. Finally in step 250 the public tag 30c and the private tag 30d are generated using algorithms. Although the public tag 30c and the private tag 30d are generated from the full genetic profile, it is not possible to use the public tag 30c and the private tag 30d to trace back and subsequently identify the test subject 70. It is also not possible to derive the private tag 30d from the public tag 30c. This is achieved by choosing appropriate algorithms.
One example of the output of a genetic profiling technique is the detection of polymorphisms in the DNA. Polymorphisms are short variations in the DNA sequence between individuals which occur even between related members of the same family. As a result, polymorphisms are commonly used for paternity testing and forensic cases.
One class of polymorphisms are short tandem repeat (STR) segments in the DNA. The use of STRs is known in the art, and commercial kits are available to carry out the analysis, such as the Profiler Plus kit supplied by Applied Biosystems. STRs are short sequences of DNA, normally of 2-5 base pairs, which are repeated numerous times in a head-to-tail, or tandem, manner. The STR segments are amplified using PCR primers that bind in the conserved regions of DNA flanking each of the repeat sections. As the number of repeats within an STR locus is highly variable, the amplified STRs vary in length.
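The link between repeat number and amplicon length can be illustrated with a small Python calculation. It assumes a hypothetical fixed combined flanking length for the locus and whole repeats only; in practice laboratories call alleles by comparison with allelic ladders, and some alleles carry partial repeats, so this is a simplification rather than an allele-calling method.

```python
def repeat_count(fragment_length_bp: int, flank_length_bp: int, repeat_unit_bp: int) -> int:
    """Estimate the number of tandem repeats in an amplified STR fragment.

    Assumes: fragment length = combined flanking sequence (both sides)
    + number of repeats * repeat unit length, with no partial repeats.
    """
    repeat_region = fragment_length_bp - flank_length_bp
    if repeat_region < 0 or repeat_region % repeat_unit_bp != 0:
        raise ValueError("fragment length does not fit a whole number of repeats")
    return repeat_region // repeat_unit_bp
```

For a tetrameric repeat such as "gata" with a hypothetical 200 bp of flanking sequence, a 240 bp fragment corresponds to 10 repeats and a 224 bp fragment to 6.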
STRs have been studied extensively and are well recognised as a system for the structural analysis of DNA. As a result, a number of STR loci are known and documented. For example, the US Federal Bureau of Investigation uses the CODIS system to identify the perpetrators of crime. The CODIS system uses thirteen different STR loci. One of these loci is the D7S280 locus, which is found on human chromosome 7. The tetrameric repeat sequence of D7S280 is "gata", and different alleles of this locus have from 6 to 15 tandem repeats of this sequence. Other loci include vWA and FGA. The analysis of the STRs yields a numerical result. To take one example, the CODIS system supplies the genotype of the test subject for the D3S1358 STR as a pair of numbers, e.g. 15 and 18. A pair of numbers is generated because one number relates to the paternal allele and the other to the maternal allele. Similarly, for the vWA locus the genotype is a second pair of numbers, 16 and 16. The number of possible variations is substantially greater than the population of the planet and, as a result, a reliable identification system can be established based on the CODIS STR system.
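The statement that the number of possible variations exceeds the world population can be checked with a rough count. At a single locus with n alleles there are n(n+1)/2 unordered genotypes (n homozygous plus n(n-1)/2 heterozygous). The D7S280 range of 6 to 15 repeats gives 10 alleles; assuming that every one of the thirteen loci also has exactly 10 alleles is a simplification made only for this sketch. Counting combinations is also not the same as the probability of a random match, which depends on allele frequencies.

```python
from math import prod


def genotypes_per_locus(n_alleles: int) -> int:
    """Unordered allele pairs (homozygous and heterozygous) at one locus."""
    return n_alleles * (n_alleles + 1) // 2


# D7S280: alleles with 6 to 15 repeats, i.e. 10 alleles, as stated in the text.
d7s280 = genotypes_per_locus(len(range(6, 16)))

# Hypothetical simplification: each of the 13 CODIS loci has 10 alleles.
total = prod(genotypes_per_locus(10) for _ in range(13))
print(d7s280, total)
```

This gives 55 genotypes at D7S280 and 55^13, roughly 4.2 × 10^22 combinations, across thirteen loci, many orders of magnitude above the world population.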
These numerical results can be combined by a mathematical method, which is used to generate both the public tag 30c and the private tag 30d. In the simplest method, all of the digits could be concatenated to give, in this example, one of the public tag 30c or the private tag 30d having the value 15181616 (i.e. 15, 18, 16 and 16 written one after another).
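The concatenation described above can be reproduced in a few lines of Python. This only illustrates the combining step: a plain concatenation of two-digit allele numbers can be read straight back into the genotype, so on its own it would not meet the requirement that the tags cannot be traced back to the test subject.

```python
from typing import List, Tuple


def concatenate_genotype(allele_pairs: List[Tuple[int, int]]) -> int:
    """Join the allele numbers digit-wise in the order given."""
    return int("".join(str(allele) for pair in allele_pairs for allele in pair))


# D3S1358 = (15, 18) and vWA = (16, 16), as in the example above.
print(concatenate_genotype([(15, 18), (16, 16)]))  # 15181616
```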
Generation of the public tag 30c and the private tag 30d can be carried out using two separate and unrelated mathematical operations. This ensures that the private tag 30d cannot be obtained from the public tag 30c.
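One way to realise two separate and unrelated operations is to apply a keyed function twice with independent secret keys. The Python sketch below uses HMAC-SHA256; the key names, their values and the choice of HMAC are assumptions, not the method of the text. As long as both keys stay inside the trust centre, knowing the public tag gives no practical way of computing the private tag, since doing so requires the private-tag key.

```python
import hashlib
import hmac

# Two independent secrets held by the trust centre (hypothetical values).
PUBLIC_TAG_KEY = b"hypothetical-public-tag-secret"
PRIVATE_TAG_KEY = b"hypothetical-private-tag-secret"


def derive_tag(key: bytes, combined_value: str) -> str:
    """Keyed, one-way mapping from the combined genotype value to a tag."""
    return hmac.new(key, combined_value.encode("utf-8"), hashlib.sha256).hexdigest()


combined = "15181616"  # combined value from the example above
public_tag = derive_tag(PUBLIC_TAG_KEY, combined)
private_tag = derive_tag(PRIVATE_TAG_KEY, combined)
```

Both tags are deterministic, so the trust centre can regenerate them from a fresh analysis, yet they share no computable relationship for anyone without the keys.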
In another embodiment of the invention, the CODIS STR analysis is divided into maternal and paternal components. The paternal component is kept within the trust centre 80 and is not used to generate the private tag 30d. The maternal component is used to generate the private tag 30d.
Submission of the biological material to the trust centre 80 is carried out in accordance with the methods disclosed broadly in the afore-mentioned US Patent Application No. US-A-2003/0217037, the details of which are incorporated into this application. The biological material is submitted to the trust centre 80 using a unique random identification number to identify the test subject 70. The biological material is not identified with any personal details of the test subject 70; in particular, the name of the test subject 70 is not submitted with the biological material. Only the test subject 70 submitting the biological sample knows the random identification number. In the event that the random identification number is lost, a new random identification number is generated. The results of the genetic analysis are stored in the trust centre 80 and are related to the random identification number. The test subject 70 is sent the public tag 30c after it has been calculated as described above.
Input of any items of information 30b into the database 20 can be carried out in the following manner. The item of information 30b can be directly stored with the public tag 30c if the public tag 30c is known to the test subject 70 - for example it might be stored on the test subject's health card. Alternatively, the public tag 30c can be calculated when the item of information 30b is obtained. This would of course mean that it is necessary for the algorithm from which the public tag 30c is generated to be publicly known, which may not be desirable.
The invention can also be used to generate an encryption cipher based on the genetic fingerprint. This encryption cipher can be used to store the data in the data repository 40 in an encrypted manner. The relevant encryption cipher is stored together with the public tag 30c in a further database, either incorporated into the look-up table 85 or, for security reasons, as a separate database. When the item of information 30b is transferred from the database 20 to the data repository 40, it is encrypted by fetching the encryption cipher from the further database and encrypting the item of information 30b with it.
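The storage step described above can be sketched in Python using only the standard library. Everything here is an assumption for illustration: the further database is a dictionary from public tag to a 32-byte key (filled with random bytes in use rather than a key derived from a genetic fingerprint), and because the standard library has no block cipher, encryption is HMAC-SHA256 in counter mode as a keystream, with an HMAC tag over nonce and ciphertext (encrypt-then-MAC). A production system would use a vetted authenticated cipher such as AES-GCM instead. The same key serves for encryption and decryption here; the separate decryption cipher mentioned in the summary would call for an asymmetric scheme, which is not shown.

```python
import hashlib
import hmac
import secrets
from typing import Dict

# Hypothetical "further database": public tag 30c -> 32-byte encryption cipher.
cipher_store: Dict[str, bytes] = {}


def _subkeys(key: bytes):
    """Derive separate encryption and MAC keys from one master key."""
    enc_key = hmac.new(key, b"enc", hashlib.sha256).digest()
    mac_key = hmac.new(key, b"mac", hashlib.sha256).digest()
    return enc_key, mac_key


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """HMAC-SHA256 in counter mode used as a pseudorandom keystream."""
    blocks = [
        hmac.new(key, nonce + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        for counter in range((length + 31) // 32)
    ]
    return b"".join(blocks)[:length]


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Return nonce (16 bytes) || ciphertext || MAC tag (32 bytes)."""
    enc_key, mac_key = _subkeys(key)
    nonce = secrets.token_bytes(16)
    stream = _keystream(enc_key, nonce, len(plaintext))
    ciphertext = bytes(p ^ k for p, k in zip(plaintext, stream))
    tag = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag


def decrypt(key: bytes, blob: bytes) -> bytes:
    """Verify the MAC, then strip the keystream; raises ValueError on tampering."""
    if len(blob) < 48:
        raise ValueError("ciphertext too short")
    enc_key, mac_key = _subkeys(key)
    nonce, ciphertext, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    stream = _keystream(enc_key, nonce, len(ciphertext))
    return bytes(c ^ k for c, k in zip(ciphertext, stream))


def transfer_encrypted(public_tag: str, information: bytes) -> bytes:
    """Fetch the cipher stored under the public tag and encrypt the item."""
    return encrypt(cipher_store[public_tag], information)
```

A fresh random nonce per item keeps the keystream from repeating when several items are stored under the same cipher.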
As is described in the introduction, numerous encryption methods are known which can be used for this purpose.
The foregoing is considered illustrative of the principles of the invention and, since numerous modifications will occur to those skilled in the art, it is not intended to limit the invention to the exact construction and operation described. All suitable modifications and equivalents fall within the scope of the claims.

Claims

1. Method for the encryption of data (30b) comprising: - a first step of generating an encryption cipher from items of data derived from a genetic fingerprint and based upon structural polymorphisms; - a second step of encrypting the data (30b) using the encryption cipher.
2. The method of claim 1, wherein the structural polymorphisms identified within the genetic fingerprint are short tandem repeats.
3. The method of any of the above claims, wherein the data (30b) is related to a test subject (70) and the test subject (70) sends a biological sample from which the genetic fingerprint is generated to a trust centre (80).
4. The method of claim 3, wherein the trust centre (80) generates the encryption cipher.
5. The method of any of the above claims, wherein the encryption cipher is stored in a cipher repository (80).
6. The method of any of the above claims wherein the data (30b) is stored in an encrypted form in a data repository (40).
7. The method of any of the above claims further comprising the step of generating a decryption cipher from the items of data derived from the genetic fingerprint and based upon structural polymorphisms.
8. A system for the storage of data (30b) comprising: a data repository (40) connected to data entry means (20); a biological sample analyser (80); and an encryption cipher generator (80) wherein the encryption cipher generator (80) uses items of data derived from a genetic fingerprint and based upon structural polymorphisms.
9. The system of claim 8, wherein the biological sample analyser is independent of the data repository (40).
10. The system of claim 8 or 9 wherein the biological sample analyser generates the genetic fingerprint from a biological sample.
11. The system of any one of claims 8 to 10, further including a cipher repository.
12. The system of any one of claims 8 to 11 further including a decryption cipher generator for generating a decryption cipher from items of data derived from the genetic fingerprint and based upon structural polymorphisms.
13. Method for the anonymous storage of sensitive data from a test subject comprising: a) a first step of generating one or more tags (30c, 30d) from a genetic fingerprint; and b) a second step of annotating and storing the sensitive data (30b) with the one or more tags (30c, 30d).
14. The method of claim 13, wherein the first step of generating the one or more tags (30c, 30d) comprises a step of adapting data generated from the genetic fingerprint.
15. The method of claim 13 or 14, wherein the genetic fingerprint is obtained from DNA analysis.
16. The method of any one of claims 13 to 15, wherein the genetic fingerprint is based upon structural polymorphisms.
17. The method of claim 16, wherein the polymorphisms are contained within short tandem repeats.
18. The method of any one of claims 13 to 17, wherein the test subject (70) sends a biological sample and an identification number to a trust centre (80).
19. The method of claim 18, wherein the trust centre (80) carries out the DNA analysis.
20. The method of any one of claims 13 to 19 wherein a private tag (30d) is stored in a tag repository (85).
21. The method of any one of claims 13 to 20 wherein the sensitive data is stored in a data repository (40).
22. The method of any one of claims 13 to 21 further comprising a further database (80), operatively isolated from the tag repository (85) and the data repository (40).
23. The method of any one of claims 13 to 22 further comprising a step of generating a public tag (30c) from the genetic fingerprint.
24. The method of claim 23, wherein the public tag (30c) is derived from the genetic fingerprint.
25. The method of claim 23 or 24, wherein the private tag (30d) is not identifiable from the public tag (30c).
26. The method of any one of claims 23 to 25, wherein the step of inputting the sensitive data (30b) is carried out using the public tag (30c).
27. The method of claim 26, wherein the step of storing the sensitive data (30b) from the inputted sensitive data (30b) is carried out by attaching the private tag (30d) to the sensitive data (30b).
28. The method of any one of claims 13 to 27 for retrieving the sensitive data (30b) using the private tag (30d).
29. The method of any of claims 13 to 28 further comprising a step of analyzing groups of the sensitive data (30b) without identifying the test subjects (70).
30. The method of any of claims 13 to 29, wherein the sensitive data (30b) is encrypted using an encryption key.
31. A system for the storage of sensitive data comprising: a data repository (40) connected to data entry means (20) and a look-up table (85) having a private tag (30d) and a public tag (30c), wherein the sensitive data (30b) is supplied by the data entry means (20) with the public tag (30c) and stored in the data repository (40) with the private tag (30d).
32. The system of claim 31, wherein the private tag (30d) is obtained from a genetic fingerprint.
33. The system of claim 31 or 32, wherein the public tag (30c) is obtained from the genetic fingerprint.
34. The system of any one of claims 31 to 33, further comprising a trust centre (80) for generating either or both of the private tag (30d) and the public tag (30c).
35. The system of any one of claims 31 to 34, further comprising an interface (50) to the data repository for accessing the information in the data repository (40).
PCT/EP2005/003309 2004-03-26 2005-03-24 Method and system for the storage of data WO2005093582A2 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
GB0406852A GB0406852D0 (en) 2004-03-26 2004-03-26 Generation of personalised code from a genetic fingerprint
GB0406852.4 2004-03-26
GB0407122A GB0407122D0 (en) 2004-03-26 2004-03-30 Method and system for the encrypted storage of data
GB0407122.1 2004-03-30

Publications (2)

Publication Number Publication Date
WO2005093582A2 true WO2005093582A2 (en) 2005-10-06
WO2005093582A3 WO2005093582A3 (en) 2006-03-30

Family

ID=32870963

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2005/003309 WO2005093582A2 (en) 2004-03-26 2005-03-24 Method and system for the storage of data

Country Status (2)

Country Link
GB (1) GB0415130D0 (en)
WO (1) WO2005093582A2 (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020124177A1 (en) * 2001-01-17 2002-09-05 Harper Travis Kelly Methods for encrypting and decrypting electronically stored medical records and other digital documents for secure storage, retrieval and sharing of such documents

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002175280A (en) * 2000-12-08 2002-06-21 Dainippon Printing Co Ltd Gene information utilization system, and id card

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
PATENT ABSTRACTS OF JAPAN vol. 2002, no. 10, 10 October 2002 (2002-10-10) & JP 2002 175280 A (DAINIPPON PRINTING CO LTD), 21 June 2002 (2002-06-21) *
YUKIO ITAKURA, MASAKI HASHIYADA, TOSHIO NAGASHIMA, SHIGEO TSUJII: "Proposal on personal identifiers generated from STR information of DNA" INTERNATIONAL JOURNAL ON INFORMATION SECURITY, vol. 1, 9 April 2002 (2002-04-09), pages 149-160, XP002340004 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008065341A2 (en) 2006-12-01 2008-06-05 David Irvine Distributed network system
EP2472430A1 (en) 2006-12-01 2012-07-04 David Irvine Self encryption

Also Published As

Publication number Publication date
GB0415130D0 (en) 2004-08-11
WO2005093582A3 (en) 2006-03-30

Similar Documents

Publication Publication Date Title
US20220129892A1 (en) System and methods for validating and performing operations on homomorphically encrypted data
TWI254233B (en) Data processing system for patient data
EP1939785B1 (en) System and method for the protection of de-identification of health care data
US6874085B1 (en) Medical records data security system
EP2365458B1 (en) A computer implemented method for determining the presence of a disease in a patient
US7865735B2 (en) Method and apparatus for managing personal medical information in a secure manner
US7519591B2 (en) Systems and methods for encryption-based de-identification of protected health information
US20070192139A1 (en) Systems and methods for patient re-identification
WO2013177297A2 (en) Encrypting and storing biometric information on a storage device
US20030055824A1 (en) Distributed personalized genetic safe
De Moor et al. Privacy enhancing techniques
US20100172495A1 (en) Semiotic system and method with privacy protection
US20220208315A1 (en) Method and system for obtaining, controlling, accessing and/or displaying personal genetic identification information
JP4822842B2 (en) Anonymized identification information generation system and program.
WO2005093582A2 (en) Method and system for the storage of data
US7689829B2 (en) Method for the encryption and decryption of data by various users
JP2019532446A (en) Health care monitoring method and system for secure communication of patient data
JP2000293603A (en) Area medical information system and electronic patient card
WO2005093670A1 (en) Method, system and object for the identification of an individual
Quantin et al. Epidemiological and statistical secured matching in France
US20220117692A1 (en) Healthcare monitoring method and system for secure communication of patient data
US20080320035A1 (en) Data processing system for the processing of object data
Claerhout et al. Secure communication and management of clinical and genomic data: the use of pseudonymisation as privacy enhancing technique
WO2020212604A1 (en) Method and system for selectively transmitting data

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BW BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NA NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SM SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IS IT LT LU MC NL PL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
NENP Non-entry into the national phase in:

Ref country code: DE

WWW Wipo information: withdrawn in national office

Country of ref document: DE

122 Ep: pct application non-entry in european phase