EP2816496A1 - Method to manage raw genomic data in a privacy preserving manner in a biobank - Google Patents

Method to manage raw genomic data in a privacy preserving manner in a biobank

Info

Publication number
EP2816496A1
EP2816496A1 EP13172607.7A EP13172607A EP2816496A1 EP 2816496 A1 EP2816496 A1 EP 2816496A1 EP 13172607 A EP13172607 A EP 13172607A EP 2816496 A1 EP2816496 A1 EP 2816496A1
Authority
EP
European Patent Office
Prior art keywords
encrypted
biobank
nucleotides
lower bound
genomic data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
EP13172607.7A
Other languages
German (de)
English (en)
Inventor
Jean-Pierre Hubaux
Erman Ayday
Jean-Louis Raisaro
Urs Hengartner
Adam Molyneaux
Zhenyu Xu
Jurgi Camblong
Pierre Hutter
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sophia Genetics SA
Original Assignee
Sophia Genetics SA
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sophia Genetics SA
Priority to EP13172607.7A (EP2816496A1)
Priority to US14/899,999 (US10013575B2)
Priority to EP14731256.5A (EP3011492B1)
Priority to PCT/EP2014/062736 (WO2014202615A2)
Publication of EP2816496A1
Priority to US16/000,234 (US10402588B2)

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/40 Encryption of genetic data
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/06 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols the encryption apparatus using shift registers or memories for block-wise or stream coding, e.g. DES systems or RC4; Hash functions; Pseudorandom sequence generators
    • H04L9/065 Encryption by serially and continuously modifying data stream elements, e.g. stream cipher systems, RC4, SEAL or A5/3
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L2209/00 Additional information or applications relating to cryptographic mechanisms or cryptographic arrangements for secret or secure communication H04L9/00
    • H04L2209/04 Masking or blinding

Definitions

  • Genomics holds great promise for better predictive medicine and improved diagnoses.
  • genomics also comes with a risk to privacy.
  • the main threats to genomic data are (i) the revelation of an individual's genetic properties due to the leakage of his genomic data and (ii) the identification of an individual from his own genome sequence.
  • the genetic information of a patient, once leaked, could be linked to the disease under study (or to other diseases), which can have serious consequences such as denial of access to life insurance or to employment.
  • Sequence alignment/map (SAM and its binary version BAM) files are the de facto standards used for all DNA sequence analyses produced by next-generation DNA sequencers. There are hundreds of millions of short reads (each including between 100 and 400 nucleotides) in the SAM file of a patient. Each nucleotide is present in several short reads in order to have high coverage of each patient's DNA.
  • SAM Sequence alignment/map
  • geneticists prefer storing aligned, raw genomic data of the patients (i.e., their SAM files), in addition to their variant calls (which include each nucleotide on the DNA sequence once, and hence are much more compact). This is due to (i) the immaturity of bioinformatic algorithms and sequencing platforms, (ii) diseases that change the DNA sequence, and (iii) the rapid evolution of genomic research.
  • Bioinformatic algorithms for variant calling are currently not yet mature.
  • the bioinformatic tools that geneticists require to assess the reliability of a variant call essentially necessitate keeping the read-level information available (e.g., in the SAM files).
  • DNA sequencing platforms are not error-free.
  • error rates per nucleotide in a short read, for the commercially available DNA sequencing platforms, are around 0.4% for the Illumina platforms, 1.78% for Ion Torrent, and 13% for PacBio sequencing.
  • geneticists prefer to observe each nucleotide in several short reads and to make conclusions based on the different values of a particular nucleotide in different short reads.
  • a method to manage raw genomic data in a privacy preserving manner in a biobank said raw genomic data comprising a plurality of aligned short reads, each aligned short read comprising a plurality of nucleotides and other fields comprising at least a position and a cigar string, said method comprising an encryption and storage stage comprising the steps of:
  • the proposed scheme privately stores the SAM files of the patients at a biobank.
  • it also provides the requested range of nucleotides (on the DNA sequence) to a medical unit, without revealing the locations of the short reads (which include the requested nucleotides) to the biobank.
  • it prevents the leakage of extra information in the short reads to the medical unit by masking the encrypted short reads at the biobank.
  • Alignment is with respect to the reference genome, which is assembled by the scientists as a representative example of the set of genes.
  • Troncoso-Pastoriza et al. (J. R. Troncoso-Pastoriza, S. Katzenbeisser, and M. Celik, "Privacy preserving error resilient DNA searching through oblivious automata," CCS '07: Proceedings of the 14th ACM Conference on Computer and Communications Security, 2007) propose a protocol for string searching (using a finite state machine), which is then re-visited by Blanton and Aliasgari (M. Blanton and M. Aliasgari, "Secure outsourcing of DNA searching via finite automata," DBSec'10: Proceedings of the 24th Annual IFIP WG 11.3 Working Conference on Data and Applications Security and Privacy, pp. 49-64, 2010).
  • Jha et al. (S. Jha, L. Kruger, and V. Shmatikov, "Towards practical privacy for genomic computation," Proceedings of the 2008 IEEE Symposium on Security and Privacy, pp. 216-230, 2008) propose techniques for privately computing the edit distance of two strings by using garbled circuits.
  • Bruekers et al. (F. Bruekers, S. Katzenbeisser, K. Kursawe, and P. Tuyls, "Privacy-preserving matching of DNA profiles," tech. rep., 2008) propose a privacy-enhanced comparison of DNA profiles by using homomorphic encryption.
  • Kantarcioglu et al. (M. Kantarcioglu, W. Jiang, Y. Liu, and B. Malin, "A cryptographic approach to securely share and query genomic sequences," IEEE Transactions on Information Technology in Biomedicine, vol. 12, no. 5, pp. 606-617, 2008) propose using homomorphic encryption to perform scientific investigations on integrated genomic data.
  • Baldi et al. P. Baldi, R. Baronio, E. De Cristofaro, P. Gasti, and G.
  • the DNA sequence data produced by next-generation DNA sequencing consists of millions of short reads, each typically including between 100 and 400 nucleotides (A, C, G, T), depending on the type of sequencer. These reads are randomly sampled from a human genome. Each read is then bioinformatically treated and positioned (aligned) to its genetic location to produce a so-called SAM file. There are hundreds of millions of aligned short reads in the SAM file of one patient.
  • In Fig. 1, we illustrate the format of a short read in a SAM file. The numbers and letters after the content in Fig. 1 represent the sequencing quality of the nucleotides in the content.
  • the privacy-sensitive fields of a short read are (i) its position with respect to the reference genome (a digital nucleic acid sequence database, assembled by scientists as a representative example of a species' set of genes), (ii) its cigar string (CS), and (iii) its content (including the nucleotides from {A, T, G, C}).
  • CS cigar string
  • the rest of the short read does not contain privacy sensitive information about the patient, hence the rest of the short read can be encrypted as a vector and provided to the medical unit, along with the aforementioned privacy-sensitive fields.
  • the position of a short read denotes the position of the first aligned nucleotide in its content, with respect to the reference genome.
  • the short read might have additional nucleotides that are not in the reference or it might be missing nucleotides that are in the reference.
  • the cigar string (CS) of a short read expresses these variations in the content of the short read.
  • the CS includes pairs of nucleotide lengths and the associated operations.
  • the operations in the CS indicate some properties about content of the short read such as which nucleotides align with the reference, which are deleted from the reference, and which are insertions that are not in the reference.
  • the content of a short read includes the nucleotides. In Fig.
  • SNP single nucleotide polymorphism
  • SNP positions might carry a different nucleotide than the reference genome.
  • position 22 can be a SNP position, because, even though there is an alignment match between the short read and the reference genome, the nucleotide in the short read is different from the reference.
  • the position of the short read corresponds to the first aligned nucleotide in its content and it is 12 in this example.
  • the CS of the short read includes 7 pairs, each indicating an operation from Table 1 and the number of nucleotides involved in the corresponding operation.
  • the non-aligned nucleotides (the nucleotides represented with the operation "S" in the CS) are represented in lowercase letters (i.e., a).
  • the dots (at positions 18 - 20) and star (at position 15) represent a skipped region and a deletion in the SR, respectively, and they are not present in the actual content.
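The way a position and a CS together locate each nucleotide of the content can be sketched as follows. This is a minimal illustration, assuming the standard SAM CIGAR operations (M, I, D, N, S, H, P, =, X) rather than the exact Table 1 of the application; the function names and the example CIGAR string are hypothetical and only loosely modeled on the example above (start at position 12, a deletion at 15, a skipped region at 18 - 20).

```python
import re

# Per the SAM specification: which operations advance the reference
# coordinate and which consume nucleotides of the read content.
CONSUMES_REFERENCE = set("MDN=X")
CONSUMES_READ = set("MIS=X")


def parse_cigar(cigar):
    """Split a CIGAR string into (length, operation) pairs."""
    return [(int(n), op) for n, op in re.findall(r"(\d+)([MIDNSHP=X])", cigar)]


def reference_positions(position, cigar):
    """Map each nucleotide of the content to a reference position.

    Inserted and soft-clipped nucleotides have no reference position
    and are reported as None.
    """
    ref = position
    mapped = []
    for length, op in parse_cigar(cigar):
        if op in CONSUMES_READ:
            for _ in range(length):
                if op in CONSUMES_REFERENCE:
                    mapped.append(ref)
                    ref += 1
                else:
                    mapped.append(None)
        elif op in CONSUMES_REFERENCE:
            ref += length  # deletion or skipped region: nothing in the content
    return mapped


# Hypothetical read: starts at position 12, with a deletion at 15
# and a skipped region at 18-20.
example = reference_positions(12, "2S3M1D2M3N2M")
```

A party that knows the position and the CS, but not the content, can compute this mapping, which is what makes position-based masking possible without seeing any nucleotide.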
  • Each part (location, CS, and content) of each short read (in the SAM file) is encrypted (via a different encryption scheme) after the sequencing, and encrypted SAM files of the patients are stored at a biobank.
  • the cryptographic keys of the patients are stored using one of the following approaches: (i) The patient's cryptographic keys are stored on a patient's device (e.g., smart card or a smart phone), or (ii) the patient's cryptographic keys are stored on a key manager by using the patient's identification.
  • in the former approach, operations involving the patient are done on the MU's computer (e.g., that of the pharmaceutical company or the physician) via the patient's device; hence this approach requires the involvement of the patient in the operation (e.g., physical presence at the physician).
  • the latter approach does not require the participation of the patient in the protocol.
  • MK masking and key manager
  • the MK can also be embodied in the government or a private company.
  • the proposed scheme can be formulated similarly for the patient's device. In the following, we briefly discuss
  • When the MU requests a specific range of nucleotides (on the DNA sequence of one or multiple patients), the biobank provides all the short reads that include at least one nucleotide from the requested range through the MK. During this process, the patient does not want to reveal his complete genome to the MU, to the biobank, or to the MK. Furthermore, it is not desirable for the biobank to learn the requested range of nucleotides (as the biobank can infer the nature of the genetic test from this requested range). Thus, we develop a privacy-preserving system for the retrieval of the short reads by the MU. The proposed scheme provides the short reads that include the requested range of nucleotides to the MU without revealing the positions of these short reads to the biobank.
  • OPE order preserving encryption
  • each short read includes between 100 and 400 nucleotides
  • some provided short reads might include information out of the MU's requested (or authorized) range of genomic data, as in Fig. 3 .
  • some provided short reads might contain privacy-sensitive SNPs of the patient (which would reveal the patient's susceptibilities to privacy-sensitive diseases such as Alzheimer's), hence the patient might not give consent to reveal such parts, as in Fig. 4 .
  • the nucleotides that the patient does not consent to reveal will be referred to as the non-consented nucleotides.
  • the MK marks particular parts of the requested short reads (which are retrieved by the biobank as discussed before) for masking, based on the patient's consent (the patient provides his consent to the MU for the genetic test and his consent is provided to the MK by the MU in a pseudonymized form) and the boundaries of the requested range of nucleotides.
  • the MK creates masking vectors and passes them to the biobank.
  • the biobank executes the masking on the previously retrieved (encrypted) short reads by using these masking vectors and sends them to the MU, where the short reads are decrypted and used for genetic tests. It is important to note that after the short reads are decrypted at the MU, the MU is not able to determine the nucleotides at the masked positions.
  • by storing the SAM files at one biobank, multiple MUs can access the patients' genomic data from it (instead of each MU individually storing that same large amount of data).
  • the genomic data of the patient should be available any time to any MU (e.g., for emergencies), thus it should be stored at a reliable centralized storage.
  • genomic data can be stored on a patient's computer or mobile device, instead of the biobank.
  • genomic data of the patient should be available any time, thus it should be stored at a reliable source such as the biobank.
  • leaving the patient's genomic data in his own hands and letting him store it on his computer or mobile device is risky, because his mobile device can be stolen or his computer can be hacked. It is true that the patient's cryptographic keys (or his authentication material) to access his genomic data at the biobank can also be stolen.
  • biobank authenticates the patient's access to his genomic data by using biometric authentication tools; the use of such tools would make it even harder for an attacker to compromise the genomic data of the patient.
  • SAM files are encrypted and stored at the biobank to prevent the biobank from inferring the genomic data of the patients.
  • the biobank can de-anonymize a victim using other sources (e.g., by associating the time of the test and the location of the MU with the location patterns of the victim), and hence associate the conducted genetic test with the victim.
  • the MK only learns the requested range of nucleotides and the pseudonyms of the patient and the MU.
  • the MK can infer the type of the conducted genetic test from the requested range of nucleotides, but the aforementioned de-anonymization attack is not possible, as the MK does not know the real identities of the MUs.
  • the MK cannot infer the genomic data of the patients by using the information it receives from the biobank and the cryptographic keys it stores.
  • a potential attacker at the MU can learn about a patient's susceptibilities to privacy-sensitive diseases if he obtains specific SNPs of the patient. As we mentioned in Section 4, by masking the encrypted short reads before providing them to the MU, we prevent the MU from acquiring more genomic data than it is authorized to access.
  • Figure 5 is an illustrative example of the encryption, masking, and decryption of the content of the short read (SR) that was first introduced in Fig. 2.
  • the arrows on the right show the inputs of the corresponding XOR operation.
  • (a) Content of the SR (the 2 stars between positions 17 and 21 represent the positions at which the SR has insertions, G and C), its binary representation (following the encoding in (b)), the key stream to encrypt the corresponding content, and the format of the encrypted content (after the binary plaintext content is XOR-ed with the key stream).
  • each short read is encrypted as follows: (i) The positions of the short reads are encrypted using order preserving encryption (OPE), (ii) the cigar string (CS) of each short read is encrypted using a semantically secure symmetric encryption function (SE), and (iii) the content of each short read, i.e. the nucleotides, is encrypted using a stream cipher (SC).
  • SC also provides semantic security, and although we really need an SC for the encryption of the content, one can also use an SC for the encryption of the CS (we use an SC both for the encryption of the content and the CS in our implementation).
  • the symmetric OPE key that is used to encrypt the positions of the short reads of patient P is represented as $K_P^O$.
  • the master key of patient P, which is used to generate the keys of the SC, is represented as $M_P$.
  • $K_P^{C_i}$ is the SC key used to encrypt the content of the short read whose position is $L_i$:
  • $K_P^{C_i} = H(M_P, F(L_i, S_i))$
  • $L_i$ is the (starting) position of the corresponding short read (on the DNA sequence)
  • $S_i$ is a random salt to provide different keys for the short reads with the same positions
  • $H$ is a pseudorandom function.
  • $F(L_i, S_i)$ is a function that generates a nonce from the position and the random salt of the corresponding short read.
  • the random salts of the short reads are stored in plaintext.
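The per-read key derivation described above can be sketched as follows. This is only an illustration under stated assumptions: HMAC-SHA256 stands in for the pseudorandom function H, and F is assumed to be a simple concatenation of the position and the salt; the application does not fix either choice, and the function names are hypothetical.

```python
import hashlib
import hmac
import os


def nonce_from(position, salt):
    """F(L_i, S_i): assumed here to be the position followed by the salt."""
    return position.to_bytes(8, "big") + salt


def derive_read_key(master_key, position, salt):
    """K = H(M_P, F(L_i, S_i)), with HMAC-SHA256 standing in for H."""
    return hmac.new(master_key, nonce_from(position, salt), hashlib.sha256).digest()


master_key = os.urandom(32)                      # M_P, held by the MK or the patient's device
salt_a, salt_b = os.urandom(8), os.urandom(8)    # S_i, stored in plaintext

# Two short reads with the same position still get different keys.
key_a = derive_read_key(master_key, 12, salt_a)
key_b = derive_read_key(master_key, 12, salt_b)
```

Because the derivation is deterministic, whoever holds the master key can regenerate any read's key later from the position and the stored salt, without keeping one key per read.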
  • $K_{P,CI}$ is used to encrypt the CSs of the short reads for the initial encryption of the patient's genomic data.
  • These keys are then deleted from the CL after the sequencing, alignment, and encryption.
  • the patient's cryptographic keys for symmetric encryption, OPE, and SC are stored at the MK, and the patient does not participate in the protocol (except for giving his consent). If the patient participates in the protocols, his keys are stored on his device (e.g., smart phone) and operations are done via that device, instead of the MK.
  • the MK stores $K_P^O$, $M_P$, and $K_{P,CI}$.
  • the MK uses $M_P$ to generate the decryption keys required by the SC and sends them to the MU (through the biobank).
  • the MU only stores the public key of the MK.
  • SE semantically secure symmetric encryption function
  • the MU requests a range of nucleotides (on the DNA sequence of one or more patients) from the biobank (either for a personal genetic test or for clinical research). For simplicity of the presentation, we assume that the request is for a specific range of nucleotides of patient P. We note that when the MU is embodied in a pharmaceutical company, the MU does not know the real identities of the patients (i.e. participants of the clinical trial).
  • the MU asks for a certain range of nucleotides of several pseudonymized patients from the biobank, who consented to participate in the corresponding clinical trial (the pseudonyms of these patients are known by the MU or by the biobank, and the general consent for the corresponding clinical trial is forwarded to the MK for masking).
  • we illustrate the connections between the parties that are involved in the protocol in Fig. 7(a).
  • we describe the steps of the proposed protocol (these steps are also illustrated in Fig. 7(b)).
  • the MK can determine the exact positions of the nucleotides in the content of a short read (but not the contents of the nucleotides, because the contents are encrypted and stored at the biobank). Using this information, the MK can determine the parts in the content of the short read that are out of the requested range $[R_L, R_U]$. Furthermore, the MK can also determine whether the short read includes any nucleotide positions for which the patient P does not give consent to the MU (the patient's pseudonymized consent is provided to the MK in Step 5) or for which the MU is not authorized due to a lack of access rights. Therefore, the MK constructs binary masking vectors indicating the positions in the contents of the short reads that need to be masked by the biobank before it sends the retrieved short reads to the MU.
  • the masking vector for a short read (with position $L_i$) is constructed following Algorithm 1. In Fig. 5(a), we illustrate how the masking vector is constructed for the corresponding short read, when the requested range of nucleotides is [10, 20] and for a given patient consent (as in Fig. 5(c)).
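Algorithm 1 is not reproduced in this excerpt, so the following is only a hedged sketch of the idea: a nucleotide is marked for masking if its reference position falls outside the requested range or belongs to the set of non-consented positions. The function name, the input representation (one reference position per nucleotide), and the example values are assumptions made for illustration.

```python
def masking_vector(ref_positions, range_low, range_high, non_consented):
    """Return 1 for each nucleotide to mask and 0 for each one to reveal."""
    vector = []
    for pos in ref_positions:
        out_of_range = pos < range_low or pos > range_high
        vector.append(1 if out_of_range or pos in non_consented else 0)
    return vector


# Hypothetical read covering positions 8..23, requested range [10, 20],
# and a single non-consented position (15).
positions = list(range(8, 24))
vec = masking_vector(positions, 10, 20, {15})
```

The vector depends only on positions and consent, never on nucleotide values, which is consistent with the MK working without access to the decrypted content.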
  • the MK also modifies the CS of each short read (if it is marked for masking) according to the nucleotides to be masked. That is, the MK modifies the CS such that the masked nucleotides are represented with a new operation "O" in the CS.
  • the consent of the patient can be used by the MU instead of modifying the CS.
  • the MU determines the masked nucleotides from the consent. By doing so, when the MU receives the short reads (which include the requested nucleotides), it can see which parts of them are masked (hence which parts of them it needs to discard for its research purposes). In Fig.
  • the MK generates the decryption keys for each short read (whose position is in ⁇) by using the master key of the patient ($M_P$), the positions of the short reads, and the random salt values.
  • the generation of the decryption keys for the SC is the same as the generation of the encryption keys, as discussed in Section 7.1.
  • the MU requests a specific range of nucleotides of patient P (e.g., for a genetic test) from the biobank.
  • the biobank provides the MU with all the short reads of the patient, which include at least one nucleotide from the requested range.
  • some provided short reads can include out-of-range nucleotides (for which the MU is not authorized), consequently causing leakage of the patient's genomic data (unless there is the proposed masking technique in place).
  • unauthorized genomic data i.e., number of nucleotides provided to the MU that are out of the requested range
  • authorized data i.e., number of nucleotides within the requested range
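The distinction between authorized and unauthorized data can be made concrete with a small count over the retrieved short reads. This sketch assumes, for simplicity, that every read aligns contiguously to the reference (no insertions, deletions, or skipped regions); the read list and function names are hypothetical and do not reproduce the evaluation data of the application.

```python
def overlaps(start, length, low, high):
    """True if the read [start, start + length - 1] shares a position with [low, high]."""
    end = start + length - 1
    return start <= high and end >= low


def count_leakage(reads, low, high):
    """Return (authorized, leaked) nucleotide counts when no masking is applied."""
    authorized = leaked = 0
    for start, length in reads:
        if not overlaps(start, length, low, high):
            continue  # the biobank does not return this read
        end = start + length - 1
        inside = min(end, high) - max(start, low) + 1
        authorized += inside
        leaked += length - inside
    return authorized, leaked


# Hypothetical reads given as (position, length); requested range [100, 119].
reads = [(90, 20), (100, 20), (115, 20), (200, 20)]
authorized, leaked = count_leakage(reads, 100, 119)
```

In this toy case the reads that straddle the boundaries of the range contribute 25 out-of-range nucleotides alongside 35 authorized ones, which is the kind of leakage the masking step is meant to remove.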
  • In Fig. 9, we illustrate the amount of genomic data (i.e., number of nucleotides) that is leaked to the MU in 100 time-slots.
  • the jumps in the number of leaked nucleotides (at some time-slots) are due to the fact that some requests might retrieve more short reads comprised of more out-of-range nucleotides.
  • leakage becomes 0 when masking is in place, which shows the crucial role of the proposed scheme.
  • single nucleotide polymorphisms can reveal a patient's susceptibility to privacy-sensitive diseases. Consequently, leakage of the nucleotides at the SNP positions poses more risk for the genomic privacy of the patient. Therefore, we also study the information leakage, focusing on the leaked SNPs of the patient as a result of different sizes of requests (from random parts of the patient's genome).
  • In Fig. 10, we illustrate the number of SNPs leaked to the MU in 100 time-slots. We observe that the number of leaked SNPs is more than twice the number of authorized SNPs (which are within the requested range of nucleotides). We also observe that the leaked SNPs (in Fig.
  • the size of the requested range of nucleotides (by the MU) for a single SNP is typically 1, but the SNPs are from several parts of the patient's genome.
  • In Fig. 11, we illustrate the genomic data leakage of the patient as a result of various disease susceptibility tests, each requiring a different number of SNPs from different parts of the patient's genome (on the x-axis we illustrate the number of SNPs required for each test).
  • the leaked SNPs reveal privacy-sensitive data about the patient.
  • leaked SNPs of the patient as a result of a test for Alzheimer's disease could leak information about the patient's susceptibility to "smoking behavior" or "cholesterol" (in Appendix B, we list the nature of some important leaked SNPs due to each susceptibility test in Fig. 11). Similar to the previous cases, the number of leaked nucleotides and SNPs is 0 when masking is in place.
  • Order-preserving symmetric encryption is a deterministic encryption scheme whose encryption function preserves numerical ordering of the plaintexts.
  • OPE was initially proposed by Agrawal et al. (R. Agrawal, J. Kiernan, R. Srikant, and Y. Xu, "Order preserving encryption for numeric data," Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data, pp. 563-574, 2004) and recently re-visited by Boldyreva et al. (A. Boldyreva, N. Chenette, Y. Lee, and A. O'Neill, "Order-preserving symmetric encryption," Proceedings of the 28th Annual International Conference on Advances in Cryptology: the Theory and Applications of Cryptographic Techniques, 2009). Following this document, we briefly introduce OPE next.
  • a function $f: A \to B$ is order-preserving if for all $i, j \in A$, $f(i) > f(j)$ iff $i > j$.
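The order-preserving property is what lets the biobank select short reads by range while seeing only ciphertexts. The sketch below uses a toy keyed table (a running sum of pseudorandom positive gaps) purely to illustrate the property; it is not the scheme of Agrawal et al. or Boldyreva et al., it is not secure, and all names and values are hypothetical.

```python
import hashlib
import hmac
from itertools import accumulate


def toy_ope_table(key, domain_size):
    """Build a keyed, strictly increasing map from positions to ciphertexts.

    Each gap is a pseudorandom integer in [1, 256], so the running sum
    is strictly increasing and therefore order-preserving.
    """
    gaps = [
        1 + hmac.new(key, i.to_bytes(8, "big"), hashlib.sha256).digest()[0]
        for i in range(domain_size)
    ]
    return list(accumulate(gaps))


def select_in_range(encrypted_positions, enc_low, enc_high):
    """Biobank side: compare ciphertexts only, never plaintext positions."""
    return [i for i, c in enumerate(encrypted_positions) if enc_low <= c <= enc_high]


table = toy_ope_table(b"hypothetical OPE key", 100)
read_positions = [3, 12, 18, 47, 90]               # known to the key holder only
encrypted = [table[p] for p in read_positions]     # what the biobank stores
hits = select_in_range(encrypted, table[10], table[50])
```

Here `hits` contains the indices of the reads whose plaintext positions lie in [10, 50], even though the selection ran on encrypted values alone.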
  • a stream cipher is a symmetric key cipher, where plaintext digits are combined with a pseudorandom cipher digit stream (key stream).
  • each plaintext digit is encrypted one at a time with the corresponding digit of the key stream, to give a digit of the ciphertext stream.
  • a digit is typically a bit and the encryption operation is an XOR.
  • the message $m$ is encrypted as $H(\text{key}, \text{nonce}) \oplus m$, where $H$ is a pseudorandom function.
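The XOR-based encryption, and the reason masking can be applied directly to ciphertexts, can be sketched as follows. Several details are assumptions made for illustration: the 2-bit nucleotide encoding (A=00, C=01, G=10, T=11), HMAC-SHA256 in counter mode as the key-stream generator standing in for H, and masking by XOR-ing a fresh random 2-bit value into each marked position. None of these is confirmed as the exact construction of the application.

```python
import hashlib
import hmac
import secrets

ENCODE = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed 2-bit encoding
DECODE = "ACGT"


def keystream(key, nonce, n):
    """Return n two-bit key-stream values from HMAC-SHA256 in counter mode."""
    values = []
    counter = 0
    while len(values) < n:
        block = hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        for byte in block:
            values.extend((byte >> shift) & 3 for shift in (6, 4, 2, 0))
        counter += 1
    return values[:n]


def xor_content(symbols, key, nonce):
    """Encrypt or decrypt: XOR each 2-bit symbol with the key stream."""
    return [s ^ k for s, k in zip(symbols, keystream(key, nonce, len(symbols)))]


def mask(ciphertext, masking_vector):
    """Biobank side: XOR a random 2-bit value into every masked position."""
    return [
        c ^ secrets.randbelow(4) if m else c
        for c, m in zip(ciphertext, masking_vector)
    ]


key, nonce = b"k" * 32, b"n" * 8          # hypothetical key and nonce
plain = [ENCODE[c] for c in "ACGTTGCA"]
cipher = xor_content(plain, key, nonce)
masked = mask(cipher, [1, 1, 0, 0, 0, 0, 0, 1])
recovered = "".join(DECODE[s] for s in xor_content(masked, key, nonce))
```

After decryption, the unmasked positions of `recovered` match the original content, while each masked position holds a uniformly random nucleotide, so the decrypting party learns nothing about the original value there.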
EP13172607.7A 2013-06-19 2013-06-19 Method to manage raw genomic data in a privacy preserving manner in a biobank Withdrawn EP2816496A1 (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
EP13172607.7A EP2816496A1 (en) 2013-06-19 2013-06-19 Method to manage raw genomic data in a privacy preserving manner in a biobank
US14/899,999 US10013575B2 (en) 2013-06-19 2014-06-17 Method to manage raw genomic data in a privacy preserving manner in a biobank
EP14731256.5A EP3011492B1 (en) 2013-06-19 2014-06-17 Method to manage raw genomic data in a privacy preserving manner in a biobank
PCT/EP2014/062736 WO2014202615A2 (en) 2013-06-19 2014-06-17 Method to manage raw genomic data in a privacy preserving manner in a biobank
US16/000,234 US10402588B2 (en) 2013-06-19 2018-06-05 Method to manage raw genomic data in a privacy preserving manner in a biobank

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
EP13172607.7A EP2816496A1 (en) 2013-06-19 2013-06-19 Method to manage raw genomic data in a privacy preserving manner in a biobank

Publications (1)

Publication Number Publication Date
EP2816496A1 2014-12-24

Family

ID=48628362

Family Applications (2)

Application Number Title Priority Date Filing Date
EP13172607.7A Withdrawn EP2816496A1 (en) 2013-06-19 2013-06-19 Method to manage raw genomic data in a privacy preserving manner in a biobank
EP14731256.5A Active EP3011492B1 (en) 2013-06-19 2014-06-17 Method to manage raw genomic data in a privacy preserving manner in a biobank

Family Applications After (1)

Application Number Title Priority Date Filing Date
EP14731256.5A Active EP3011492B1 (en) 2013-06-19 2014-06-17 Method to manage raw genomic data in a privacy preserving manner in a biobank

Country Status (3)

Country Link
US (2) US10013575B2 (fr)
EP (2) EP2816496A1 (fr)
WO (1) WO2014202615A2 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018008547A1 (en) * 2016-07-06 2018-01-11 日本電信電話株式会社 Secret computation system, secret computation device, secret computation method, and program

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
SG11201608601TA (en) * 2014-04-23 2016-11-29 Agency Science Tech & Res Method and system for generating / decrypting ciphertext, and method and system for searching ciphertexts in a database
CN107533586A (en) 2015-03-23 2018-01-02 私有通道公司 System, method and apparatus for enhancing bioinformatics data privacy and enabling broad sharing of bioinformatics data
US11393559B2 (en) 2016-03-09 2022-07-19 Sophia Genetics S.A. Methods to compress, encrypt and retrieve genomic alignment data
WO2018081113A1 (en) 2016-10-24 2018-05-03 Sawaya Sterling Concealing information present within nucleic acids
US10447661B1 (en) 2016-12-23 2019-10-15 Iqvia Inc. System and method for privacy-preserving genomic data analysis
US20180314842A1 (en) * 2017-04-27 2018-11-01 Awakens, Inc. Computing system with genomic information access mechanism and method of operation thereof
US11631477B2 (en) 2017-09-07 2023-04-18 Dmitry Shvartsman System and method for authenticated exchange of biosamples
LU100449B1 (en) * 2017-09-26 2019-03-29 Univ Luxembourg Improved Computing Device
US11862297B1 (en) 2017-11-07 2024-01-02 Iqvia Inc. System and method for genomic data analysis
WO2019104140A1 (en) * 2017-11-21 2019-05-31 Kobliner Yaacov Nissim Efficiently querying databases while providing differential privacy
DE102018113475A1 (en) * 2018-06-06 2019-12-12 Infineon Technologies Ag Arithmetic unit for computing with masked data
CN110245515B (en) * 2019-05-08 2021-06-01 北京大学 Protection method and system for HDFS access patterns
CA3141078A1 (en) * 2019-06-10 2020-12-17 Xiaowu Gai Dynamic encryption/decryption of genomic information
RU2747625C1 (en) * 2020-04-28 2021-05-11 Федеральное государственное бюджетное учреждение высшего образования «Тамбовский государственный технический университет» (ФГБОУ ВО «ТГТУ») Method for joint compression and encryption of data in genomic alignment
US20220209934A1 (en) * 2020-12-30 2022-06-30 Elimu Informatics, Inc. System for encoding genomics data for secure storage and processing
CN113344593B (en) * 2021-05-31 2022-04-26 优合集团有限公司 Meat product traceability management system based on DNA detection technology

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2013049420A1 (en) * 2011-09-27 2013-04-04 Maltbie Dan System and method for facilitating network-based transactions involving sequence data

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2002343798A1 (en) * 2001-11-22 2003-06-17 Hitachi, Ltd. Information processing system using information on base sequence
US20130246460A1 (en) * 2011-03-09 2013-09-19 Annai Systems, Inc. System and method for facilitating network-based transactions involving sequence data
WO2012122549A2 (en) * 2011-03-09 2012-09-13 Lawrence Ganeshalingam Biological data networks and methods therefor
US20140121990A1 (en) * 2012-09-12 2014-05-01 The Regents Of The University Of California Secure Informatics Infrastructure for Genomic-Enabled Medicine, Social, and Other Applications
EP2709028A1 (en) * 2012-09-14 2014-03-19 Ecole Polytechnique Fédérale de Lausanne (EPFL) Privacy-enhancing technologies for medical tests using genomic data
WO2014197377A2 (en) * 2013-06-03 2014-12-11 Good Start Genetics, Inc. Methods and systems for storing sequence read data
US9524392B2 (en) * 2013-11-30 2016-12-20 Microsoft Technology Licensing, Llc Encrypting genomic data for storage and genomic computations

Non-Patent Citations (22)

* Cited by examiner, † Cited by third party
Title
A. BOLDYREVA; N. CHENETTE; A. O'NEILL: "Order-preserving encryption revisited: Improved security analysis and alternative solutions", PROCEEDINGS OF THE 31ST ANNUAL CONFERENCE ON ADVANCES IN CRYPTOLOGY, 2011, pages 578 - 595
A. BOLDYREVA; N. CHENETTE; Y. LEE; A. O'NEILL: "Order-preserving symmetric encryption", PROCEEDINGS OF THE 28TH ANNUAL INTERNATIONAL CONFERENCE ON ADVANCES IN CRYPTOLOGY: THE THEORY AND APPLICATIONS OF CRYPTOGRAPHIC TECHNIQUES, 2009
ADA RALUCA ET AL: "CryptDB: A Practical Encrypted Relational DBMS", 26 January 2011 (2011-01-26), XP055086907, Retrieved from the Internet <URL:http://18.7.29.232/bitstream/handle/1721.1/60876/MIT-CSAIL-TR-2011-005.pdf?sequence=1> [retrieved on 20131106] *
E. DE CRISTOFARO; S. FABER; P. GASTI; G. TSUDIK: "Genodroid: Are privacy-preserving genomic tests ready for primetime?", PROCEEDINGS OF THE ACM WORKSHOP ON PRIVACY IN THE ELECTRONIC SOCIETY - WPES, 2012, pages 97 - 108
ERMAN AYDAY ET AL: "Privacy-Enhancing Technologies for Medical Tests Using Genomic Data, Technical Report", 28 December 2012 (2012-12-28), XP055086484, Retrieved from the Internet <URL:http://infoscience.epfl.ch/record/182897/files/CS_version_technical_report.pdf> [retrieved on 20131104] *
ERMAN AYDAY ET AL: "Privacy-Preserving Processing of Raw Genomic Data", 21 July 2013 (2013-07-21), XP055087337, Retrieved from the Internet <URL:http://infoscience.epfl.ch/record/187573/files/DPM_13_tech_report.pdf> [retrieved on 20131108] *
F. BRUEKERS; S. KATZENBEISSER; K. KURSAWE; P. TUYLS: "Privacy-preserving matching of DNA profiles", TECH. REP., 2008
J. R. TRONCOSO-PASTORIZA; S. KATZENBEISSER; M. CELIK: "Privacy preserving error resilient DNA searching through oblivious automata", CCS '07: PROCEEDINGS OF THE 14TH ACM CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY, 2007
KANTARCIOGLU M ET AL: "A Cryptographic Approach to Securely Share and Query Genomic Sequences", IEEE TRANSACTIONS ON INFORMATION TECHNOLOGY IN BIOMEDICINE, IEEE SERVICE CENTER, LOS ALAMITOS, CA, US, vol. 12, no. 5, 1 September 2008 (2008-09-01), pages 606 - 617, XP011345491, ISSN: 1089-7771, DOI: 10.1109/TITB.2007.908465 *
M. BLANTON; M. ALIASGARI: "Secure outsourcing of DNA searching via finite automata", DBSEC'10: PROCEEDINGS OF THE 24TH ANNUAL IFIP WG 11.3 WORKING CONFERENCE ON DATA AND APPLICATIONS SECURITY AND PRIVACY, 2010, pages 49 - 64
M. CANIM; M. KANTARCIOGLU; B. MALIN: "Secure management of biomedical data with cryptographic hardware", IEEE TRANSACTIONS ON INFORMATION TECHNOLOGY IN BIOMEDICINE, vol. 16, no. 1, 2012
M. GYMREK; A. L. MCGUIRE; D. GOLAN; E. HALPERIN; Y. ERLICH: "Identifying personal genomes by surname inference", SCIENCE, vol. 339, no. 6117, January 2013 (2013-01-01)
M. KANTARCIOGLU; W. JIANG; Y. LIU; B. MALIN: "A cryptographic approach to securely share and query genomic sequences", IEEE TRANSACTIONS ON INFORMATION TECHNOLOGY IN BIOMEDICINE, vol. 12, no. 5, 2008, pages 606 - 617
N. HOMER; S. SZELINGER; M. REDMAN; D. DUGGAN; W. TEMBE: "Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays", PLOS GENETICS, vol. 4, August 2008 (2008-08-01)
O. GOLDREICH; R. OSTROVSKY: "Software protection and simulation on oblivious RAMs", J. ACM, vol. 43, May 1996 (1996-05-01), pages 431 - 473
P. BALDI; R. BARONIO; E. DE CRISTOFARO; P. GASTI; G. TSUDIK: "Countering GATTACA: Efficient and secure testing of fully-sequenced human genomes", CCS '11: PROCEEDINGS OF THE 18TH ACM CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY, 2011, pages 691 - 702
R. AGRAWAL; J. KIERNAN; R. SRIKANT; Y. XU: "Order preserving encryption for numeric data", PROCEEDINGS OF THE 2004 ACM SIGMOD INTERNATIONAL CONFERENCE ON MANAGEMENT OF DATA, 2004, pages 563 - 574
R. WANG; X. WANG; Z. LI; H. TANG; M. K. REITER; Z. DONG: "Privacy-preserving genomic computation through program specialization", PROCEEDINGS OF THE 16TH ACM CONFERENCE ON COMPUTER AND COMMUNICATIONS SECURITY, 2009, pages 338 - 347
S. E. FIENBERG; A. SLAVKOVIC; C. UHLER: "Privacy preserving GWAS data sharing", PROCEEDINGS OF THE IEEE 11TH INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOPS (ICDMW), December 2011 (2011-12-01)
S. JHA; L. KRUGER; V. SHMATIKOV: "Towards practical privacy for genomic computation", PROCEEDINGS OF THE 2008 IEEE SYMPOSIUM ON SECURITY AND PRIVACY, 2008, pages 216 - 230
X. ZHOU; B. PENG; Y. F. LI; Y. CHEN; H. TANG; X. WANG: "To release or not to release: Evaluating information leaks in aggregate human-genome data", ESORICS'11: PROCEEDINGS OF THE 16TH EUROPEAN CONFERENCE ON RESEARCH IN COMPUTER SECURITY, 2011, pages 607 - 627
Y. CHEN; B. PENG; X. WANG; H. TANG: "Large-scale privacy-preserving mapping of human genomic sequences on hybrid clouds", NDSS'12: PROCEEDINGS OF THE 19TH NETWORK AND DISTRIBUTED SYSTEM SECURITY SYMPOSIUM, 2012

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018008547A1 (en) * 2016-07-06 2018-01-11 日本電信電話株式会社 Secret computation system, secret computation device, secret computation method, and program
JPWO2018008547A1 (en) * 2016-07-06 2019-04-04 日本電信電話株式会社 Secret computation system, secret computation device, secret computation method, and program

Also Published As

Publication number Publication date
US20160275308A1 (en) 2016-09-22
WO2014202615A2 (fr) 2014-12-24
US10013575B2 (en) 2018-07-03
EP3011492C0 (fr) 2023-06-07
US20180276409A1 (en) 2018-09-27
WO2014202615A3 (fr) 2015-02-19
EP3011492A2 (fr) 2016-04-27
EP3011492B1 (fr) 2023-06-07
US10402588B2 (en) 2019-09-03

Similar Documents

Publication Publication Date Title
US10402588B2 (en) Method to manage raw genomic data in a privacy preserving manner in a biobank
Ayday et al. Privacy-preserving processing of raw genomic data
US20230385437A1 (en) System and method for fast and efficient searching of encrypted ciphertexts
EP2895980B1 (en) Privacy-enhancing technologies for medical tests using genomic data
US20160224735A1 (en) Privacy-enhancing technologies for medical tests using genomic data
EP3016011A1 (en) Method for privacy-preserving medical risk tests
Sousa et al. Efficient and secure outsourcing of genomic data storage
Ying et al. A lightweight policy preserving EHR sharing scheme in the cloud
Hasan et al. Secure count query on encrypted genomic data
CN112751670B (en) Multi-center ciphertext-policy attribute-based searchable encryption and corresponding method for searching and obtaining data
Karvelas et al. Privacy-preserving whole genome sequence processing through proxy-aided ORAM
JP6619401B2 (en) Data search system, data search method, and data search program
Ayday et al. Personal use of the genomic data: Privacy vs. storage cost
Sun et al. When gene meets cloud: Enabling scalable and efficient range query on encrypted genomic data
Hasan et al. Secure count query on encrypted genomic data: a survey
Zhou et al. Secure scheme for locating disease-causing genes based on multi-key homomorphic encryption
Raisaro et al. Medco: Enabling privacy-conscious exploration of distributed clinical and genomic data
CN110660450A (en) Secure count query and integrity verification device and method based on encrypted genomic data
Singh et al. Practical personalized genomics in the encrypted domain
Huang et al. P2GT: Fine-grained genomic data access control with privacy-preserving testing in cloud computing
Zhu et al. A privacy-preserving framework for conducting genome-wide association studies over outsourced patient data
Khan et al. Towards preserving privacy of outsourced genomic data over the cloud
WO2020259847A1 (en) Computer-implemented method for privacy-preserving storage of raw genome data
Zhu et al. Privacy-Preserving Identification of Target Patients from Outsourced Patient Data
Choi et al. Privacy-preserving exploration of genetic cohorts with i2b2 at lausanne university hospital

Legal Events

Date Code Title Description
PUAI Public reference made under Article 153(3) EPC to a published international application that has entered the European phase

Free format text: ORIGINAL CODE: 0009012

17P Request for examination filed

Effective date: 20130619

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION IS DEEMED TO BE WITHDRAWN

18D Application deemed to be withdrawn

Effective date: 20150625