WO2020019039A1 - A method for secure handling of gene sequences - Google Patents

Publication number
WO2020019039A1
Authority
WO
WIPO (PCT)
Prior art keywords
gene
analysis
pieces
gene sequence
owners
Prior art date
Application number
PCT/AU2019/050787
Other languages
French (fr)
Inventor
Xue LI
Yanjun Zhang
Mingyang ZHONG
Yu Li
Original Assignee
The University Of Queensland
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from AU2018902719A
Application filed by The University Of Queensland
Publication of WO2020019039A1

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/40 Encryption of genetic data
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/30 Data warehousing; Computing architectures

Landscapes

  • Health & Medical Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioethics (AREA)
  • Theoretical Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Databases & Information Systems (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Genetics & Genomics (AREA)
  • Storage Device Security (AREA)

Abstract

A method for processing gene sequences of gene owners to provide analysis reports thereof to information consumers, comprising: providing pieces of the gene sequences by use of an electronic data network to each of a plurality of network connected analysis providers; operating a task manager computer to transmit assigned tasks across the data network to each of the analysis providers in respect of the gene sequence pieces of each of the gene owners, the assigned tasks being produced in response to analysis specifications received from computers of the information consumers; receiving analysis results from the analysis providers for the assigned tasks in respect of the gene sequence pieces of each of the gene owners and compiling respective reports therefrom; and transmitting the reports across said network to the network connected computers of the information consumers; wherein the information consumer computers receive neither the gene sequences nor the gene sequence pieces.

Description

A METHOD FOR SECURE HANDLING OF GENE SEQUENCES
TECHNICAL FIELD
The present invention concerns a method for secure distribution and processing of gene sequences.
RELATED APPLICATIONS
The present application claims priority from Australian provisional patent application No. 2018902719 filed on 26 July 2018, the disclosure of which is incorporated herein by reference in its entirety.
BACKGROUND
Any references to methods, apparatus or documents of the prior art are not to be taken as constituting any evidence or admission that they formed, or form part of the common general knowledge.
In recent decades great advances have been made in the analysis of genes. The cost for sequencing and analysing genetic material has fallen by orders of magnitude. Pharmaceutical companies for example have embarked on programs for the creation of drugs that are specific to individuals depending on their genetic makeup. However, the distribution and analysis of an individual’s genetic material poses a number of challenges which raise privacy and security concerns for the wide scale sharing of gene sequence data.
Data, including genetic data, unlike traditional non-digital property, can be easily copied and spread. Genomic data, in particular a gene sequence, is mostly comprised of text files, which can be readily redistributed without such distribution being detected. Though there are anti-piracy technologies such as watermarks (e.g. steganography) to detect unauthorised copying, simply being able to detect an unauthorised copy after it has been made and distributed is insufficient to protect sensitive material such as genomic data.
Once genomic data has been redistributed without authorisation, there is no way to ensure comprehensive retrieval of all of the unauthorised copies. Therefore methods for preventing unauthorised redistribution from the outset are required.
In addition to the problem of unauthorised redistribution of genomic data occurring, there is also a re-identification problem, that is the problem of the individual who is the gene owner being identified from the genomic data without authorisation.
The human genome can be processed to retrieve sensitive information including sufficient information to re-identify the gene owner. Though sharing sequencing data sets without identities has become a common practice in genomics, there have been studies published in the literature that have revealed that it will be possible to discover peoples’ identities from an anonymous database of gene sequences, and even to produce images of peoples’ faces using only whole-genome sequencing data. In addition to identifying the person, sensitive information contained in an individual’s gene sequence data, such as health problems, predispositions to diseases, familial relationships and life expectancy, may also be exposed.
There are some existing systems that aim to provide confidential transmission of gene sequence data. Some examples of commercial providers of such confidential transmission are: Encrypgen: https://encrypgen.com; Luna DNA: https://www.lunadna.com; and Zenome: https://zenome.io. The methodology followed by these providers is basically to encrypt and then transfer the genetic sequence data from providers (“Gene Owners”) to recipients (“Information Consumers”, such as pharmaceutical companies). The Information Consumer then frequently outsources analysis of the gene sequence for certain characteristics, e.g. the presence of a gene or genetic marker, to Analysis Providers. A problem with such a method is that the Gene Owner loses control of the genetic information. The Information Consumer has access to the entirety of the original data, which carries high risks of re-distribution and re-identification, to the disadvantage of the Gene Owner. Once re-distribution and re-identification happen, Gene Owners lose exclusive ownership of the data and lose control over it. An effect of this situation is that Gene Owners are becoming increasingly reluctant to make their genetic information available, which in turn makes it more difficult and expensive for Information Consumers to conduct large scale genetic studies.
An associated problem for the Gene Owner is that it is difficult to securely store the genetic information in a fashion that allows the Gene Owner to retrieve it when necessary and at the same time is invulnerable to hardware failure of storage devices and to interception by criminal third parties.
SUMMARY OF THE INVENTION
According to a first aspect of the present invention there is provided a method for processing gene sequences of gene owners to provide analysis reports thereof to information consumers, comprising:
providing gene sequence pieces, being pieces of the gene sequences, by use of an electronic data network to each of a plurality of network connected analysis providers;
operating a task manager computer to transmit assigned tasks across the data network to each of the analysis providers in respect of the gene sequence pieces of each of the gene owners, the assigned tasks being produced in response to analysis specifications received from computers of the information consumers;
receiving analysis results from the analysis providers for the assigned tasks in respect of the gene sequence pieces of each of the gene owners and compiling respective reports therefrom; and
transmitting the reports across said network to the network connected computers of the information consumers;
wherein the information consumer computers receive neither the gene sequences nor the gene sequence pieces.
In a preferred embodiment of the invention the gene sequence pieces are each encrypted with keys of their respective owners.
Preferably the method includes providing a gene sequence fragmentation software product that is installed on computers of each of the gene owners for producing the gene sequence pieces.
The gene sequence fragmentation software product preferably includes instructions to fragment the gene sequence at positions of the gene sequence that minimise re-identification from the resulting pieces.
In a preferred embodiment of the invention the method includes facilitating storage of respective gene sequence pieces comprising gene sequences of the owners from gene owner computers into data storage hosts with storage redundancy.
Preferably the data storage hosts are configured to allow retrieval of assigned gene sequence pieces by assignee analysis providers upon a verification criteria being met.
The method may include operating the task management server to compile the reports. Alternatively, reports may be compiled on a computer distinct from the task management server.
Preferably the network includes at least one network attached device that is configured as a smart contract controller, wherein the smart contract controller is arranged to facilitate the exchange of smart contracts between at least the information consumers’ computers and the gene owners’ computers.
It is preferred that the method includes maintaining a distributed electronic ledger throughout nodes of the network for recording distribution and analysis transactions in respect of the gene sequence pieces.
Preferably the distributed electronic ledger comprises a blockchain. The method preferably includes establishing at least one validator which is connected to the data network and which has authority for the adding of new transactions into the blockchain.
The method may include receiving the respective gene sequence pieces with the data storage hosts wherein said pieces are encrypted with respective public keys of the Gene Owners.
The method may include providing for the gene owner computers to obtain proof of storage from the data storage hosts by arranging for said hosts to return hashes of at least part of the stored data of respective gene owners to the gene owner computers upon request wherein the hashes are determined by the respective gene owner computers prior to storage of the gene sequence pieces in the data storage hosts.
Preferably the pieces provided to any one of the analysis providers are insufficient for identification of the corresponding gene owner by the analysis provider.
It is preferred that the assigned tasks are insufficient for reconstructing the analysis specifications therefrom.
According to another aspect of the present invention there is provided a method for processing gene sequences of gene owners to provide analysis reports thereof to information consumers, comprising providing pieces of gene sequences of the gene owners to analysis providers; assigning analysis tasks to the analysis providers; and providing analysis reports to information consumers based on the results of the analysis tasks from the analysis providers.
According to a further aspect of the present invention there is provided a method of operating a gene owner’s computer to fragment a gene sequence stored thereon, the method including:
producing a plurality of pieces of the gene sequence; encrypting the pieces of the gene sequence;
transmitting the encrypted pieces of the gene sequence across a data network to a number of remote storage hosts.
Preferably the method includes:
operating the gene owner’s computer to produce and store a hash of one or more of the encrypted gene sequences; and
requesting the hosts to provide a hash of the one or more of the encrypted gene sequences for verification with reference to said stored hash.
According to a further aspect of the present invention there is provided a system including a data network and a plurality of network devices connected thereto for performing the previously described method.
According to another aspect of the present invention there is provided a method for operating a data network including data storage hosts and gene sequence analysis providers to provide analysis reports of gene sequences of gene owners to information consumers, comprising:
provision of a gene sequence fragmentation software product installed on computers of each of the gene owners;
facilitating storage of respective gene sequence fragments from the gene owner computers in the data storage hosts;
operating a task management server to assign analysis jobs to each of a plurality of gene sequence analysis providers;
configuring the data storage hosts to allow retrieval of assigned gene sequence fragments by assignee analysis providers upon a verification criteria being met;
operating the task management server to compile reports in respect of each of the gene owners based on analysis of corresponding gene sequence fragments from the analysis providers;
forwarding the compiled reports to computers of the information consumers; maintaining a distributed electronic ledger throughout nodes of the network for transactions occurring thereon leading to the forwarding of the compiled reports; and
validating updates to the electronic ledger;
wherein the gene sequence fragments are encrypted with keys controlled by the gene owners.
BRIEF DESCRIPTION OF THE DRAWINGS
Preferred features, embodiments and variations of the invention may be discerned from the following Detailed Description which provides sufficient information for those skilled in the art to perform the invention. The Detailed Description is not to be regarded as limiting the scope of the preceding Summary of the Invention in any way. The Detailed Description will make reference to a number of drawings as follows:
Figure 1 is a block diagram of a system according to an embodiment of the present invention.
Figures 2-5 illustrate a method according to an embodiment of the present invention for storing pieces of a gene sequence from a Gene Owner and providing reports thereon to an Information Consumer.
Figure 6 illustrates a method for securely storing pieces of a gene sequence of a Gene Owner according to an embodiment of the present invention.
Figure 7 illustrates a method for a Gene Owner to check for Proof of Storage of genetic sequence pieces from a number of network Storage Hosts according to an embodiment of the present invention.
Figure 8 illustrates a method for a Gene Owner to retrieve genetic sequence pieces from the network Storage Hosts according to an embodiment of the present invention.
Figure 9 illustrates a first step in an authorization procedure according to an embodiment of the present invention.
Figure 10 illustrates a second step in an authorization procedure according to an embodiment of the present invention.
Figure 11 illustrates transactions occurring during performance of a method according to an embodiment of the present invention that are recorded to a blockchain for maintaining an immutable record of the transactions.
DETAILED DESCRIPTION OF PREFERRED EMBODIMENTS
Figure 1 depicts a system 100 according to a preferred embodiment of the invention for carrying out a method for storage/retrieval, distribution and analysis of genetic data that addresses the previously described re-identification and re-distribution problems that beset systems of the prior art.
The system 100 includes a Data Network 9 such as the Internet to which a number of network nodes are connected as follows:
• Information Consumer Computer 5 which sends out an electronic offer request to Gene Owner Computer 11 from Information Consumer 4 and which ultimately receives an analysis report for the Gene Owner’s genetic information from the Task Management Server 33.
• Smart contract Controller 3: A network device that issues software that is run by at least some nodes of the networked system 100 for providing functions to implement a smart contract (signing, execution, payment, etc.).
• Gene Owner Computer 11: This is a device that is privately controlled by the Gene Owner 13. The Gene Owner Computer stores a gene sequence of the Gene Owner 13, executes gene sequence fragmentation and encryption software 27 to form gene sequence pieces, and transmits encrypted gene sequence pieces to the Hosts 25-11, ..., 25-mn. The software 27 may be provided as non-transitory, tangible, machine readable instructions, for example borne on an optical or magnetic disk. The Gene Owner Computer 11 stores a private key of the Gene Owner so that the Gene Owner 13 is able to use the private key to give authorizations.
• Router(s) 21: Provides routing and workload balancing to improve network efficiency. The router(s) is/are programmed to efficiently find Analysis Providers 39-11, ..., 39-mn (sometimes generally referred to simply with item “39” herein).
• Hosts 25: Provide data storage and retrieval services. They store encrypted gene sequence data pieces (sometimes simply called “data pieces” herein) from the Gene Owners’ Computers 11. The data pieces are generated by a data fragmentation program running on the Gene Owners’ Computers and encrypted by symmetric keys. It is preferred that the gene sequence fragmentation software includes instructions to fragment the gene sequence at positions of the gene sequence that minimise re-identification from the resulting pieces. The Hosts store the encrypted symmetric keys, which are encrypted by the Gene Owners’ public keys.
• Task Management Server 33: A network device that is programmed to allocate analysis tasks to Analysis Providers 39, and to aggregate the analysis results.
• Analysis Providers 39: Network connected analysis installations that separately process a small piece of genomic data, preferably by parallel in-memory computing, and which report their own single-point results back to the Task Management Server 33. As will be discussed, an Analysis Provider may also act as a Validator for purposes of adding a block to a blockchain.
The blockchain comprises a distributed electronic ledger that is distributed throughout the previously described nodes of networked system 100 for tracking various transactions that occur throughout the networked system, which will be described. More particularly, if a gene sequence of one individual is fragmented into m pieces, then the Task Management Server 33 will assign jobs to m*n Analysis Providers, and every set of n Analysis Providers will get the same piece of data and run the same analysis script, so they should achieve the same single-point result. The system then compares their results and regards the majority as correct, and the fastest Analysis Provider that produces the correct result wins the opportunity to add a block. Accordingly, an Analysis Provider also acts as a “Validator” (validators are akin to miners in the prior art Bitcoin network). In this way, a preferred embodiment of the present invention replaces the process of solving meaningless puzzles, in ‘proof of work’ as used in the prior art Bitcoin system, with the process of doing meaningful jobs. The advantage of a public blockchain is that it is unnecessary to trust anyone in the networked system, so that the system has the potential for worldwide participation and more openness.
However, a private chain also has its advantages, for example it may be more efficient than a public chain. Accordingly embodiments of the present invention encompass both public and private blockchain approaches.
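The redundancy check described above, in which each set of n Analysis Providers processes the same piece and the fastest provider reporting the majority result earns the right to add a block, can be sketched as follows. This is a minimal illustration only: the names SingleNodeResult and select_validator, the result strings, and the requirement of a strict majority are assumptions made for the example and are not taken from the specification.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SingleNodeResult:
    provider_id: str        # hypothetical identifier, e.g. "39-11"
    result: str             # single-point result reported for one piece
    elapsed_seconds: float  # time the provider took to report


def select_validator(results):
    """Return (majority_result, winning_provider_id) for one piece.

    The majority result is treated as correct; the fastest provider that
    reported it is selected as the Validator. Requiring a strict majority
    is an assumption of this sketch.
    """
    counts = Counter(r.result for r in results)
    majority_result, votes = counts.most_common(1)[0]
    if votes * 2 <= len(results):
        raise ValueError("no strict majority among providers")
    correct = [r for r in results if r.result == majority_result]
    winner = min(correct, key=lambda r: r.elapsed_seconds)
    return majority_result, winner.provider_id


# One set of n = 3 providers working on the same piece.
results_for_piece_1 = [
    SingleNodeResult("39-11", "marker-present", 2.4),
    SingleNodeResult("39-12", "marker-present", 1.9),
    SingleNodeResult("39-13", "marker-absent", 0.7),
]
majority, validator = select_validator(results_for_piece_1)
```

In this example the fastest provider overall reported a minority result, so it is passed over in favour of the fastest provider whose result agrees with the majority.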
Referring now to Figure 2, in a first phase “A” of operation, Smart Contract Controller (SMCC) 3 receives an Analysis Offer Request 2 from an Information Consumer computer 5 via data network 9. The SMCC 3 logs the Analysis Offer Request 2 and forwards logged Analysis Offer Request 6 to Gene Owner Computer 11. The Gene Owner 13 may then decide to agree to the Analysis Offer Request and if so sends an Accept message 15 back to the SMCC 3. The SMCC 3 logs the Accept message 15 and forwards a logged Accept message 17 to the Information Consumer computer 5.
This completes the transaction between the Information Consumer 4 and the Gene Owner 13 for the acceptance by Gene Owner 13 of Information Consumer 4’s offer.
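The specification does not state how the SMCC 3 records the messages it logs. One possibility, offered here purely as an assumption, is an append-only log in which each entry commits to the hash of the previous entry, in the spirit of the blockchain described elsewhere in this document. The class ContractLog, its field names and the party labels below are hypothetical.

```python
import hashlib
import json


def _digest(body):
    # Canonical JSON so the same body always hashes to the same value.
    return hashlib.sha256(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()


class ContractLog:
    """Append-only log; each entry commits to the previous entry's hash."""

    GENESIS = "0" * 64

    def __init__(self):
        self.entries = []

    def append(self, message_type, sender, recipient):
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        body = {
            "index": len(self.entries),
            "type": message_type,
            "from": sender,
            "to": recipient,
            "prev_hash": prev_hash,
        }
        entry = dict(body, hash=_digest(body))
        self.entries.append(entry)
        return entry

    def verify(self):
        """Recompute every hash and check that the chain links are intact."""
        prev_hash = self.GENESIS
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if entry["prev_hash"] != prev_hash or entry["hash"] != _digest(body):
                return False
            prev_hash = entry["hash"]
        return True


# Phase A: the offer is logged and forwarded, then the acceptance likewise.
log = ContractLog()
log.append("AnalysisOfferRequest", "InformationConsumerComputer5", "GeneOwnerComputer11")
log.append("Accept", "GeneOwnerComputer11", "InformationConsumerComputer5")
```

Altering any logged field after the fact causes verify() to fail, which is the property that makes such a log useful as evidence of the offer and its acceptance.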
Referring now to Figure 3 there is depicted a second phase of operation “B” of the system 100. In response to receiving the Accept message 15, the Smart Contract Controller 3 issues Smart Contract Execute Requests, which are electronic messages that contain requests for various network accessible devices to carry out pre-arranged operations in order to fulfil the terms of the logged Analysis Offer Request 6 that the Gene Owner 13 has accepted. At the same time, the Information Consumer computer 5 issues an analysis specification in the form of an analysis script message 37 to Task Management Server 33. The analysis script message 37 contains information specifying the type of analysis that the Information Consumer 4 wishes to have carried out on Gene Owner 13’s gene sequence.
Accordingly, the SMCC 3 issues a first Smart Contract Execute Request (SCER) 19 to at least one of the Router(s) (or as it is sometimes referred to herein “Distributor”) 21. The SMCC 3 includes in the SCER 19 the network address of Gene Owner Computer 11 and a Public Key of Gene Owner 13 for an asymmetric Public key / Private key encryption pair that is hosted on Gene Owner Computer 11. The SCER 19 is recognised by Router 21, which is programmed to in turn issue SCERs 23-11, ..., 23-mn to data storage and retrieval Hosts 25-11, ..., 25-mn.
In response to receiving the SCERs 23-11, ..., 23-mn the Hosts 25-11, ..., 25-mn unpack the Public Key and IP Address of the Gene Owner Computer 11 from the corresponding SCERs 23-11, ..., 23-mn and send Return Key messages 25-11, ..., 25-mn back to the Gene Owner Computer 11. The Return Key messages each include a different symmetric encryption key that the Hosts 25 have encrypted with the Public key that they received in the SCERs 23-11, ..., 23-mn. In response to receiving the Return Key messages 25-11, ..., 25-mn the Gene Owner Computer 11, which is running the gene sequence distribution software product 27, fragments the Gene Owner’s gene sequence, which has been pre-stored in the computer 11, into Encrypted Data Pieces 11, ..., mn.
One suitable method for fragmenting gene sequences is described in Enabling Privacy-preserving Sharing of Genomic Data for GWASs in Decentralized Networks by Yanjun Zhang, Xin Zhao, Xue Li, Mingyang Zhong, Caitlin Curtis, and Chen Chen. In The Twelfth ACM International Conference on Web Search and Data Mining (WSDM ’19), February 11-15, 2019, Melbourne, VIC, Australia. ACM, New York, NY, USA, the disclosure of which is incorporated herein by reference in its entirety. Other methods are also known and may be suitable.
Computer 11 then uses its Private Key to decrypt the Public Key encrypted Symmetric Keys that it has received in the Return Key messages 25-11, ..., 25-mn. Using the Symmetric Keys the Gene Owner Computer 11 then encrypts each piece and forms corresponding Encrypted Data Piece Messages 29-11, ..., 29-mn which it transmits back to respective Hosts 25-11, ..., 25-mn via Router(s) 21 using data network 9. It will therefore be realised that no single one of the Hosts 25 has a copy of the entire gene sequence and that each piece is encrypted using a different symmetric key, so that criminal interception of the encrypted pieces as they travel in messages 29 across the network is futile. Furthermore, the encrypted pieces are stored redundantly, i.e. in numerous copies across the multiple Hosts 25, so that if some of the Hosts 25 become inoperable it will still be possible for the Gene Owner 13 to operate Gene Owner Computer 11 to recover and recompile his/her gene sequence.
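The effect of redundant storage on recoverability can be illustrated with a small sketch. It assumes, for illustration only, that the host indexed (i, j) holds replica j of piece i, so that m pieces with n replicas each occupy m*n hosts; the specification does not spell out this mapping. The pieces are shown as short plaintext strings for brevity, whereas in the described system each stored piece is encrypted. The function names are hypothetical.

```python
def place_pieces(pieces, n_replicas):
    """Return {(i, j): piece}, with replica j of piece i on host (i, j)."""
    return {
        (i, j): piece
        for i, piece in enumerate(pieces, start=1)
        for j in range(1, n_replicas + 1)
    }


def recover_sequence(hosts, failed, m):
    """Rebuild the sequence from surviving hosts.

    Returns None if every replica of some piece sits on a failed host.
    """
    recovered = []
    for i in range(1, m + 1):
        survivors = [
            piece
            for (piece_index, replica), piece in hosts.items()
            if piece_index == i and (piece_index, replica) not in failed
        ]
        if not survivors:
            return None
        recovered.append(survivors[0])
    return "".join(recovered)


pieces = ["ACGT", "TTGA", "CCAT"]  # toy stand-ins for m = 3 pieces
hosts = place_pieces(pieces, n_replicas=2)

# One replica each of pieces 1 and 3 is lost: recovery still succeeds.
ok = recover_sequence(hosts, failed={(1, 1), (3, 2)}, m=len(pieces))

# Both replicas of piece 2 are lost: the sequence cannot be rebuilt.
lost = recover_sequence(hosts, failed={(2, 1), (2, 2)}, m=len(pieces))
```

Under this assumed layout the sequence survives any pattern of host failures that leaves at least one replica of every piece intact.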
Once the Hosts 25-11, ..., 25-mn have received the encrypted pieces, Router 21 issues a Smart Contract Execute Request 31 to Task Management Server 33. The Task Management Server is programmed to assign analysis jobs to each of a plurality of Analysis Providers 39-11, ..., 39-mn in respect of the encrypted gene sequence pieces that are stored on Hosts 25-11, ..., 25-mn. As previously mentioned, the Task Management Server 33 has previously received the analysis script 37 from the Information Consumer Computer 5.
The Task Management Server 33 is programmed to allocate corresponding analysis jobs for each of the pieces to the respective Analysis Providers 39-11, ..., 39-mn. Accordingly, Task Management Server 33 issues Assign Analysis Job messages 41-11, ..., 41-mn to respective Analysis Providers 39-11, ..., 39-mn.
In response to receiving the Assign Analysis Job messages 41-11, ..., 41-mn, the Analysis Providers each transmit a Return Public Key message 43-11, ..., 43-mn to the Gene Owner Computer 11. Each one of the Analysis Providers has a Public/Private encryption key pair. Each of the Return Public Key messages 43-11, ..., 43-mn contains the Public Key of the corresponding Analysis Provider 39-11, ..., 39-mn.
Turning now to Figure 4, at the same time as transmitting the Return Public Key messages 43-11, ..., 43-mn, the Analysis Providers also each issue Data Piece Request messages 45-11, ..., 45-mn to the Hosts 25-11, ..., 25-mn. The Data Piece Request messages include a unique ID for the Hosts so that each message is received by the correct Host. Encrypted pieces making up the gene sequence of each Gene Owner are stored with redundancy in the Hosts. The overall gene sequence of each Gene Owner may therefore be retrieved by the Gene Owner even if some of the Hosts 25 fail.
In response to receiving the Data Piece Request messages the Hosts issue Return Encrypted Data messages 47-11, ..., 47-mn which contain the encrypted data pieces 11, ..., mn that the Gene Owner Computer 11 previously transmitted to the Hosts in messages 29-11, ..., 29-mn (Figure 3). Consequently the Analysis Providers 39-11, ..., 39-mn now each have an encrypted copy of the piece(s) that they are to analyse in accordance with the Assign Analysis Job messages 41-11, ..., 41-mn that the Analysis Providers have previously received from the Task Management Server 33. However, since each piece of the gene sequence is encrypted by a Symmetric key the Analysis Providers cannot yet commence analysis. In the presently described preferred embodiment of the invention, each one of the Analysis Providers 39-11, ..., 39-mn receives only a single gene sequence piece of each gene owner’s sequence, thereby making re-identification of the gene owner very difficult if not impossible. However, each one of the Analysis Providers 39-11, ..., 39-mn can provide service for many gene owners, which means it can receive one (or less desirably more) data pieces of many gene owners. The Smart Contract Controller 3 (together with the Routers 21), for the purpose of network efficiency and workload balance, will specify which one of the Analysis Providers retrieves which piece of data from which one of the Storage Hosts 25-11, ..., 25-mn, thereby ensuring that each Analysis Provider that is involved in the analysis contract receives only a single gene sequence piece of the Gene Owner 13. In other less desirable embodiments an Analysis Provider 39 may receive more than one piece of the gene sequence of any particular Gene Owner provided that the likelihood of re-identification is kept to a minimum.
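The constraint that an Analysis Provider receives only a single piece of any one Gene Owner's sequence, while still serving many Gene Owners, can be sketched as a simple assignment routine. The rotation used to spread load across owners, the function names, and the provider and owner labels are assumptions of this sketch; the specification leaves the allocation policy to the Smart Contract Controller and Routers.

```python
from collections import defaultdict


def assign_jobs(owner_ids, m, n, providers):
    """Return a list of (provider, owner, piece_index) jobs.

    Each of the m pieces of an owner goes to n providers, and no provider
    receives more than one piece of the same owner. This requires at
    least m * n providers.
    """
    if len(providers) < m * n:
        raise ValueError("need at least m * n providers")
    jobs = []
    for shift, owner in enumerate(owner_ids):
        # Rotate the provider list per owner to spread load (assumption).
        k = shift % len(providers)
        slots = iter(providers[k:] + providers[:k])
        for piece in range(1, m + 1):
            for _ in range(n):
                jobs.append((next(slots), owner, piece))
    return jobs


def pieces_per_provider_and_owner(jobs):
    """Map (provider, owner) to the set of piece indices that provider holds."""
    seen = defaultdict(set)
    for provider, owner, piece in jobs:
        seen[(provider, owner)].add(piece)
    return seen


# m = 3 pieces, n = 2 providers per piece, hence six providers.
providers = [f"39-{i}{j}" for i in range(1, 4) for j in range(1, 3)]
jobs = assign_jobs(["owner-A", "owner-B"], m=3, n=2, providers=providers)
seen = pieces_per_provider_and_owner(jobs)
```

Every entry of seen contains exactly one piece index, which is the property the preferred embodiment relies on to make re-identification by a single Analysis Provider difficult.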
In response to receiving the Return Public Key messages 43-11, ..., 43-mn (Figure 3) the Gene Owner computer 11 transmits the symmetric keys, encrypted with the Analysis Providers’ public keys, in Encrypted Symmetric Key messages 49-11, ..., 49-mn. Therefore, upon receiving the Encrypted Symmetric Key messages 49-11, ..., 49-mn the corresponding Analysis Providers 39-11, ..., 39-mn are able to decrypt the corresponding Symmetric Key using the Private Key of their Private/Public key pair. Once the corresponding Symmetric Keys are decrypted the Analysis Providers apply them to the encrypted gene sequence pieces to decrypt the pieces. The decrypted pieces are then analysed by the corresponding Analysis Providers 39-11, ..., 39-mn according to the tasks assigned by the Task Management Server 33, which together comprise the analysis requirements specified by the Information Consumer 4 in the Upload Analysis Script message 37 (Figure 3).
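The key handling described above follows the general pattern of hybrid encryption: the piece is encrypted with a symmetric key, and the symmetric key is in turn encrypted with the recipient's public key. The following sketch shows only the shape of that exchange. It deliberately uses toy stand-ins so that it runs with the Python standard library alone: textbook RSA with tiny primes and a hash-based XOR keystream. Neither is secure, neither is specified by this document, and a real system would use a vetted cryptographic library. All function names are hypothetical.

```python
import hashlib
import secrets

# Textbook RSA with tiny primes: illustration only, NOT secure.
P, Q, E = 61, 53, 17
N = P * Q   # public modulus, 3233
D = 2753    # private exponent; (E * D) % ((P - 1) * (Q - 1)) == 1


def wrap_key(sym_key, public_exponent, modulus):
    """Encrypt each key byte under the provider's (toy) public key."""
    return [pow(b, public_exponent, modulus) for b in sym_key]


def unwrap_key(wrapped, private_exponent, modulus):
    """Recover the symmetric key with the provider's (toy) private key."""
    return bytes(pow(c, private_exponent, modulus) for c in wrapped)


def keystream_xor(key, data):
    """Toy symmetric cipher: XOR data with a SHA-256 counter keystream.

    Applying the function twice with the same key restores the input.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + counter).digest()
        out.extend(a ^ b for a, b in zip(data[offset:offset + 32], pad))
    return bytes(out)


# Gene Owner side: encrypt a piece, then wrap the key for one provider.
piece = b"ACGTTGCAAGCT"
sym_key = secrets.token_bytes(16)
encrypted_piece = keystream_xor(sym_key, piece)  # as held by a Host
wrapped_key = wrap_key(sym_key, E, N)            # as sent in a message 49

# Analysis Provider side: unwrap the key, then decrypt the piece.
recovered_key = unwrap_key(wrapped_key, D, N)
decrypted_piece = keystream_xor(recovered_key, encrypted_piece)
```

The point of the pattern is that the Host only ever holds the encrypted piece and the Analysis Provider can decrypt it only after the Gene Owner has released the symmetric key to that provider.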
With reference to Figure 5, which illustrates phase D of the method of the preferred embodiment, the results of each of the analysis tasks performed by each of the Analysis Providers are returned to Task Management Server 33 in Result on Single Node messages 51-11, ..., 51-mn. Task Management Server 33 compiles the analysis results and transmits them in encrypted form to Information Consumer computer 5 in an Encrypted Analysis Report message 53. For example, the Analysis Report may be encrypted with a pre-shared public key of a Private/Public key pair that resides on Information Consumer computer 5. Finally, the Task Management Server 33 transmits a Contract Execution Complete message 55 to the Smart Contract Controller 3 to confirm that the Encrypted Analysis Report 53 has been transmitted to the Information Consumer computer 5.
Further embodiments and details concerning embodiments of the invention will now be discussed with reference to Figures 6 to 10. In the following discussion acronyms are used as set out below:
GO: Gene Owner
SC: smart contract
GOpuk: Gene Owner’s public key
GOsg: Gene Owner’s digital signature
GOprk: Gene Owner’s private key
SSPNsg: Sequence Service Provider Node’s digital signature
Vsg: Validator’s digital signature
ORG: Organizations such as Information Consumers
ASPV: Analysis Service Provider and Validator
ASPVsg: Service Provider and Validators signature
ASPVpuk: Service Provider and Validators’ public key
ASPVprk: Service Provider and Validators’ private key
SYSpuk: System’s public key
SYSprk: System’s private key
Figure 6 illustrates the steps by which the Gene Owner 13, operating Gene Owner computer 11, stores pieces 29 of a gene sequence in the various Hosts 25 according to an embodiment of the present invention. The transactions that are undertaken between the network nodes are stored in blockchain 114 and authorisation for updating the blockchain 114 is provided by Validators 14. In the presently described embodiment, Validators are the fastest Analysis Providers of the “n” Analysis Providers in each of the “m” sets of Analysis Providers. The system compares the Validators’ results and regards the majority common result as correct, and the fastest Analysis Provider that produces the correct result wins the opportunity to add a block to update blockchain 114. The various steps in the storage procedure of Figure 6 and the inputs to those steps and outputs from them are set out in Table 1.
Table 1 - Store Process
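The Validator selection rule described above (the majority common result is regarded as correct, and the fastest Analysis Provider that produced it wins the right to add a block) can be sketched as follows. This is an illustrative sketch only, not the patented implementation: the `ProviderResult` type, the provider identifiers, the timing field and the strict-majority requirement are all assumptions introduced for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ProviderResult:
    provider_id: str   # hypothetical identifier of an Analysis Provider
    result: str        # the provider's analysis result (or a digest of it)
    elapsed_s: float   # time the provider took to return the result


def select_validator(results):
    """Return (majority_result, winning_provider_id) for one set of providers.

    The majority common result is treated as correct; among the providers
    that returned it, the fastest one wins the right to add the next block.
    """
    if not results:
        raise ValueError("no results to compare")
    counts = Counter(r.result for r in results)
    majority_result, votes = counts.most_common(1)[0]
    # Assumption: a strict majority is required. The source does not say
    # how ties or the absence of a majority are handled.
    if votes * 2 <= len(results):
        raise ValueError("no majority result")
    winners = [r for r in results if r.result == majority_result]
    fastest = min(winners, key=lambda r: r.elapsed_s)
    return majority_result, fastest.provider_id


results = [
    ProviderResult("AP-1", "variant-present", 4.2),
    ProviderResult("AP-2", "variant-absent", 1.1),
    ProviderResult("AP-3", "variant-present", 2.7),
]
majority, validator = select_validator(results)
print(majority, validator)
```

In this example AP-2 is the fastest provider overall but does not win, because its result is not the majority result.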
Figure 7 illustrates a method by which a Gene Owner is able to check for proof that storage of their gene sequence pieces into various of the data storage Hosts 25 has been effected. The method includes the Gene Owner computers 11, under control of software 27 (Fig. 1), obtaining proof of storage from the data storage Hosts 25 by arranging for the Hosts 25 to return hashes of at least part of the stored data of respective gene owners 13 to the gene owner computers 11 upon request. The hashes are determined by the respective gene owner computers 11 prior to storage of the gene sequence pieces 29 in the data storage hosts 25 in order to carry out the check subsequently.
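The proof-of-storage check can be illustrated with a minimal sketch. The class names `StorageHost` and `GeneOwnerClient`, the choice of SHA-256, and hashing the whole encrypted piece (rather than only part of it) are assumptions made for the example; the specification does not name a hash function or an interface.

```python
import hashlib
import hmac


def piece_hash(encrypted_piece: bytes) -> str:
    """Hash of an encrypted gene sequence piece (SHA-256 is an assumption)."""
    return hashlib.sha256(encrypted_piece).hexdigest()


class StorageHost:
    """Hypothetical stand-in for a data storage Host."""

    def __init__(self):
        self._pieces = {}

    def store(self, piece_id: str, encrypted_piece: bytes) -> None:
        self._pieces[piece_id] = encrypted_piece

    def hash_of(self, piece_id: str) -> str:
        # The Host returns a hash of the stored data upon request.
        return piece_hash(self._pieces[piece_id])


class GeneOwnerClient:
    """Hypothetical stand-in for the software on a Gene Owner computer."""

    def __init__(self):
        self._expected = {}

    def store_piece(self, host: StorageHost, piece_id: str, encrypted_piece: bytes) -> None:
        # The hash is determined before the piece leaves the owner's computer.
        self._expected[piece_id] = piece_hash(encrypted_piece)
        host.store(piece_id, encrypted_piece)

    def verify_storage(self, host: StorageHost, piece_id: str) -> bool:
        # Compare the Host's hash with the one recorded prior to storage.
        return hmac.compare_digest(host.hash_of(piece_id), self._expected[piece_id])


owner, host = GeneOwnerClient(), StorageHost()
owner.store_piece(host, "piece-01", b"\x8f\x02 example ciphertext bytes")
print(owner.verify_storage(host, "piece-01"))
```

A fixed hash of the whole piece only shows that the Host can reproduce that hash. Hashing a part of the data chosen at request time, which the wording "at least part of the stored data" would permit, is one way such a check might be strengthened.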
Figure 8 illustrates the steps by which the Gene Owner 13, operating Gene Owner computer 11, is able to retrieve gene sequence pieces 29 from distributed storage hosts 25. The various steps that are illustrated and the inputs to those steps and outputs from them are set out in Table 2.
Table 2 - Retrieval Process
Figure 9 illustrates a first step by which the Gene Owner 13, operating Gene Owner computer 11, is able to authorise an Information Consumer 4, by means of Information Consumer 4’s computer 5, to receive analysis reports in respect of the Gene Owner’s gene sequence.
The transaction is recorded in blockchain 114 and the addition of a new transaction block is authorised by the Validators 14. The various steps that are illustrated and the inputs to those steps and outputs from them are set out in Table 3.
Table 3 - Authorisation Process - Phase 1
Figure 10 illustrates a second step by which the Gene Owner 13, operating Gene Owner computer 11, is able to authorise Hosts 25 to release encrypted gene sequence pieces to Analysis Providers and also to authorise an Information Consumer 4, via its computer 5, to receive analysis reports in respect of the Gene Owner’s gene sequence. The transaction is recorded in blockchain 114 and the addition of a new transaction block is authorised by the Validators 14. The various steps that are illustrated and the inputs to those steps and outputs from them are set out in Table 4.
Table 4 - Authorisation Process - Phase 2
In a preferred embodiment of the invention a distributed electronic ledger, in the form of a Blockchain 114, is established throughout nodes of the network, i.e. the Gene Owner Computers, Routers, Data Storage Hosts, Analysis Providers, Smart Contract Server, Validators and Information Consumer Computers. Transactions concerning the exchange of the gene sequence data pieces, encryption keys and analysis reports, leading to the forwarding of the compiled reports to the Information Consumer Computers, are stored in the Blockchain.
Figure 11 is a diagram that illustrates the various transactions that are recorded in a distributed ledger in the form of a blockchain according to an embodiment of the present invention. In Figure 11 the following data exchanges across an electronic data network may be observed.
S-1-S-2: Storage process
S-1: Data Fragmentation
S-2: Encrypt and distribute the encrypted data pieces to storage nodes
A-1-A-8: Analysis process
A-1 & A-2: An analysis agreement is signed between the Information Consumer and data owner.
A-3: Information Consumer uploads compiled analysis scripts or chooses scripts from a script library.
A-4: Report node, acting as a task manager, assigns jobs to analysis nodes.
A-5: Data owner transfers encrypted share keys to analysis nodes.
A-6: Analysis nodes retrieve encrypted data pieces from storage nodes.
A-7: Analysis nodes do single-point analysis and report their single-point results to the report node.
A-8: The report node aggregates the complete analysis result, then encrypts and sends it to the information consumer.
T-1-T-6: Transactions
T-1: Transactions between a data owner and the related storage nodes, recording which storage node stores what data piece from whom.
T-2: Transactions between (a) data owner(s) and (an) information consumer(s), recording the analysis agreement between them.
T-3: Transactions between a report node and an information consumer, recording the process of the Information Consumer uploading the analysis script and the report node returning the complete analysis report.
T-4: Transactions between a report node and analysis nodes, recording the process of the report node assigning the analysis jobs to the analysis nodes and the analysis nodes returning the single-point analysis result to the report node.
T-5: Transactions between a data owner and analysis nodes, recording the process of transferring the encrypted share keys.
T-6: Transactions between storage nodes and analysis nodes, recording the transferring of the encrypted data pieces from storage nodes to analysis nodes.
The decision to add a transaction to the blockchain, i.e. achieve consensus, is made by an analysis node acting as a Validator as previously described.
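The recording of transactions T-1 to T-6 in a blockchain can be sketched as a minimal hash-chained ledger. This is an illustrative sketch under stated assumptions, not the ledger of the embodiment: the `Block` and `Ledger` classes, the JSON serialisation, the use of SHA-256, and the transaction field names are all hypothetical, and consensus is reduced to naming the Validator that adds each block.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class Block:
    index: int
    prev_hash: str      # digest of the preceding block
    transactions: list  # e.g. records of type "T-1" ... "T-6"
    validator: str      # analysis node that won the right to add the block

    def digest(self) -> str:
        payload = json.dumps(
            {
                "index": self.index,
                "prev_hash": self.prev_hash,
                "transactions": self.transactions,
                "validator": self.validator,
            },
            sort_keys=True,
        )
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class Ledger:
    def __init__(self):
        # A fixed genesis block anchors the chain.
        self.blocks = [Block(0, "0" * 64, [], "genesis")]

    def add_block(self, transactions, validator: str) -> Block:
        prev = self.blocks[-1]
        block = Block(prev.index + 1, prev.digest(), list(transactions), validator)
        self.blocks.append(block)
        return block

    def is_consistent(self) -> bool:
        # Each block must reference the digest of the block before it.
        return all(
            cur.prev_hash == prev.digest()
            for prev, cur in zip(self.blocks, self.blocks[1:])
        )


ledger = Ledger()
ledger.add_block(
    [{"type": "T-1", "owner": "GO-1", "storage_node": "H-3", "piece": "p-07"}],
    validator="AP-3",
)
ledger.add_block(
    [{"type": "T-6", "storage_node": "H-3", "analysis_node": "AP-5", "piece": "p-07"}],
    validator="AP-5",
)
print(ledger.is_consistent())
```

Because each block stores the digest of its predecessor, altering an earlier transaction record changes that block's digest and breaks the link held by the following block, which `is_consistent` detects.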
Methods according to preferred embodiments of the present invention may provide benefits as follows:
• The Analysis Providers and Data Storage Hosts provide computing resources and the Analysis Providers apply ‘double-blinded’ analysis. That is, the Analysis Providers know neither whose data they are processing (the data they have are unidentifiable pieces) nor what analysis they are applying (the analysis scripts are compiled).
• The security of the network (regarding the prevention of listening) is guaranteed by a PKI encryption framework so that even if the pieces of the gene sequences are intercepted by a criminal agent, they cannot be deciphered.
It will be realized that references to “computers” encompass all electronic computational devices including processor based programmable devices such as desktop computers, tablets, laptops, smartphones, servers and the like.
In one or more examples, the functions described may be implemented by the various network node devices in hardware, software, firmware, or any combination thereof. If implemented in software, the functions may be stored on or transmitted over, as one or more instructions or code, a computer-readable medium and executed by a hardware-based processing unit. Computer-readable media may include computer-readable storage media, which corresponds to a tangible medium such as data storage media, or communication media including any medium that facilitates transfer of a computer program from one place to another, e.g., according to a communication protocol. In this manner, computer-readable media generally may correspond to tangible computer-readable storage media which is non-transitory or a communication medium such as a signal or carrier wave. Data storage media may be any available media that can be accessed by one or more computers or one or more processors to retrieve instructions, code and/or data structures for implementation of the techniques described in this disclosure. A computer program product may include a computer-readable medium. By way of example, and not limitation, such computer-readable storage media can comprise RAM, ROM, EEPROM, CD-ROM or other optical disk storage, magnetic disk storage, or other magnetic storage devices, flash memory, or any other medium that can be used to store desired program code in the form of instructions or data structures and that can be accessed by a computer. Also, any connection is properly termed a computer-readable medium.
For example, if instructions are transmitted from a website, server, or other remote source using a coaxial cable, fiber optic cable, twisted pair, digital subscriber line (DSL), or wireless technologies such as infrared, radio, and microwave, then the coaxial cable, fiber optic cable, twisted pair, DSL, or wireless technologies such as infrared, radio, and microwave are included in the definition of medium. It should be understood, however, that computer-readable storage media and data storage media do not include connections, carrier waves, signals, or other transient media, but are instead directed to non-transient, tangible storage media. Disk and disc, as used herein, includes compact disc (CD), laser disc, optical disc, digital versatile disc (DVD), floppy disk and Blu-ray disc, where disks usually reproduce data magnetically, while discs reproduce data optically with lasers. Combinations of the above should also be included within the scope of computer-readable media.
Instructions may be executed by one or more processors, such as one or more digital signal processors (DSPs), general purpose microprocessors, application specific integrated circuits (ASICs), field programmable logic arrays (FPGAs), or other equivalent integrated or discrete logic circuitry. Accordingly, the term "processor," as used herein may refer to any of the foregoing structure or any other structure suitable for implementation of the techniques described herein. In addition, in some aspects, the functionality described herein may be provided within dedicated hardware and/or software modules configured for encoding and decoding, or incorporated in a combined codec. Also, the techniques could be fully implemented in one or more circuits or logic elements.
The techniques of this disclosure may be implemented in a wide variety of devices or apparatuses, including a wireless handset, an integrated circuit (IC) or a set of ICs (e.g., a chip set). Various components, modules, or units are described in this disclosure to emphasize functional aspects of devices configured to perform the disclosed techniques, but do not necessarily require realization by different hardware units. Rather, various units may be combined in a hardware unit or provided by a collection of interoperative hardware units, including one or more processors in conjunction with suitable software and/or firmware.
In compliance with the statute, the invention has been described in language more or less specific to structural or methodical features. The term “comprises” and its variations, such as “comprising” and “comprised of”, are used throughout in an inclusive sense and not to the exclusion of any additional features. It is to be understood that the invention is not limited to specific features shown or described since the means herein described comprises preferred forms of putting the invention into effect. The invention is, therefore, claimed in any of its forms or modifications within the proper scope of the appended claims appropriately interpreted by those skilled in the art.
Throughout the specification and claims (if present), unless the context requires otherwise, the term "substantially" or "about" will be understood to not be limited to the value for the range qualified by the terms.
Any embodiment of the invention is meant to be illustrative only and is not meant to be limiting to the invention. Therefore, it should be appreciated that various other changes and modifications can be made to any embodiment described without departing from the spirit and scope of the invention.
Throughout the description and claims of this specification, the singular encompasses the plural unless the context otherwise requires. In particular, where the indefinite article is used, the specification is to be understood as contemplating plurality as well as singularity, unless the context requires otherwise.
Features, integers, characteristics, compounds, chemical moieties or groups described in conjunction with a particular aspect, embodiment or example of the invention are to be understood to be applicable to any other aspect, embodiment or example described herein unless incompatible therewith.

Claims

CLAIMS:
1. A method for processing gene sequences of gene owners to provide analysis reports thereof to information consumers, comprising:
providing gene sequence pieces, being pieces of the gene sequences by use of an electronic data network to each of a plurality of network connected analysis providers;
operating a task manager computer to transmit assigned tasks across the data network to each of the analysis providers in respect of the gene pieces of each of the gene owners, the assigned tasks being produced in response to analysis specifications received from computers of the information consumers;
receiving analysis results from the analysis providers for the assigned tasks in respect of the gene sequence pieces of each of the gene owners and compiling respective reports therefrom; and
transmitting the reports across said network to the network connected computers of the information consumers;
wherein the information consumer computers receive neither the gene sequences nor the gene sequence pieces.
2. The method of claim 1, wherein each of the gene sequence pieces are encrypted with keys of their respective owners.
3. The method of claim 1 or claim 2 including provision of a gene sequence fragmentation product installed on computers of each of the gene owners for producing the gene sequence pieces.
4. The method of claim 3, wherein the gene sequence fragmentation software includes instructions to fragment the gene sequence at positions of the gene sequence that minimise re-identification from the resulting pieces.
5. The method of any one of the preceding claims, including facilitating storage of respective gene sequence pieces comprising gene sequences of the owners from gene owner computers into data storage hosts with storage redundancy.
6. The method of claim 5, including configuring the data storage hosts to allow retrieval of assigned gene sequence pieces by assignee analysis providers upon a verification criteria being met.
7. The method of any one of the preceding claims including operating the task management server to compile the reports.
8. The method of any one of the preceding claims, wherein the network includes at least one network attached device that is configured as a smart contract controller, wherein the smart contract controller is arranged to facilitate the exchange of smart contracts between at least the information consumers’ computers and the gene owners’ computers.
9. The method of any one of the preceding claims, including maintaining a distributed electronic ledger throughout nodes of the network for recording distribution and analysis transactions in respect of the gene sequence pieces.
10. The method of claim 9, wherein the distributed electronic ledger comprises a blockchain.
11. The method of claim 10 including establishing at least one validator which is connected to the data network and which has authority for the adding of new transactions into the block chain.
12. The method of any one of the preceding claims, wherein the method includes receiving the respective gene sequence pieces with the data storage hosts wherein said pieces are encrypted with respective public keys of the Gene Owners.
13. The method of claim 5 or claim 6, including providing for the gene owner computers to obtain proof of storage from the data storage hosts by arranging for said hosts to return hashes of at least part of the stored data of respective gene owners to the gene owner computers upon request wherein the hashes are determined by the respective gene owner computers prior to storage of the gene sequence pieces in the data storage hosts.
14. A method for operating a data network including data storage hosts and gene sequence analysis providers to provide analysis reports of gene sequences of gene owners to information consumers, comprising:
provision of a gene sequence fragmentation software product installed on computers of each of the gene owners;
facilitating storage of respective gene sequence fragments from the gene owner computers in the data storage hosts;
operating a task management server to assign analysis jobs to each of a plurality of gene sequence analysis providers;
configuring the data storage hosts to allow retrieval of assigned gene sequence fragments by assignee analysis providers upon a verification criteria being met;
operating the task management server to compile reports in respect of each of the gene owners based on analysis of corresponding gene sequence fragments from the analysis providers;
forwarding the compiled reports to computers of the information consumers;
maintaining a distributed electronic ledger throughout nodes of the network for transactions occurring thereon leading to the forwarding of the compiled reports; and
validating updates to the electronic ledger;
wherein the gene sequence fragments are encrypted with keys controlled by the gene owners.
15. A method for operating a gene owner’s computer to fragment a gene sequence stored thereon, the method including:
producing a plurality of pieces of the gene sequence;
encrypting the pieces of the gene sequence;
transmitting the encrypted pieces of the gene sequence across a data network to a number of remote storage hosts.
16. A method according to claim 15, further comprising:
operating the gene owner’s computer to produce and store a hash of one or more of the encrypted gene sequences; and
requesting the hosts to provide a hash of the one or more of the encrypted gene sequences for verification with reference to said stored hash.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
AU2018902719 2018-07-26
AU2018902719A AU2018902719A0 (en) 2018-07-26 A method for secure handling of gene sequences

Publications (1)

Publication Number Publication Date
WO2020019039A1 true WO2020019039A1 (en) 2020-01-30

Family

ID=69180216

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/AU2019/050787 WO2020019039A1 (en) 2018-07-26 2019-07-25 A method for secure handling of gene sequences

Country Status (1)

Country Link
WO (1) WO2020019039A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023119268A1 (en) * 2021-12-22 2023-06-29 Igentify Ltd. Distributed storage of genomic data

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20010051881A1 (en) * 1999-12-22 2001-12-13 Aaron G. Filler System, method and article of manufacture for managing a medical services network
US20050059059A1 (en) * 2003-08-06 2005-03-17 Benjamin Liang Novel nucleic acid based steganography system and application thereof
US20090240441A1 (en) * 2008-03-20 2009-09-24 Helicos Biosciences Corporation System and method for analysis and presentation of genomic data
US20160110500A1 (en) * 2011-05-13 2016-04-21 Indiana University Research And Technology Corporation Secure and scalable mapping of human sequencing reads on hybrid clouds
US20180046766A1 (en) * 2016-06-27 2018-02-15 Novus Paradigm Technologies Corporation System for rapid tracking of genetic and biomedical information using a distributed cryptographic hash ledger

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19842063

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19842063

Country of ref document: EP

Kind code of ref document: A1