WO2004031922A2 - Method and apparatus for secure data storage - Google Patents

Method and apparatus for secure data storage

Info

Publication number
WO2004031922A2
Authority
WO
WIPO (PCT)
Prior art keywords
entity
data
information
key
computing means
Prior art date
Application number
PCT/GB2003/004262
Other languages
French (fr)
Other versions
WO2004031922A3 (en
Inventor
Bernard Harvey Gaus
Callum Thomas Peter Kennedy
Original Assignee
Avoca Systems Limited
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Avoca Systems Limited
Priority to AU2003274302A
Publication of WO2004031922A2
Publication of WO2004031922A3

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6254 Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification

Definitions

  • identifying information identifying an entity is not in itself sensitive.
  • Anonymous healthcare data such as data relating to a particular test, is also of relatively low sensitivity provided that it cannot be matched up to the person to whom it relates. It is the ability to relate identifying information identifying a person to and from healthcare data relating to that person which is sensitive and which the present invention aims to control.
  • first data storage means relatably to the key prepared by the key preparation computing means, some or all of the received information concerning the entity, but not any information identifying the entity.
  • the invention also provides a method of retrieving information concerning an entity, the method comprising the steps executed by computing means of:
  • the first data storage means being configured to store and allow retrieval of data comprising information concerning an entity (preferably excluding any information identifying the entity) relatably to a key prepared by the key preparation computing means.
  • the data stored in the first data storage means may comprise a record.
  • the record may comprise the information concerning the entity and the key prepared by the key preparation computing means, but not any information identifying the entity.
  • the invention therefore provides a methodology for storing and retrieving information concerning an entity in a data storage means where it is not stored with identifying information identifying the entity which it concerns.
  • information stored in the first data storage means cannot by itself be analysed to determine the entity which the stored information concerns.
  • an authorised party that already has identifying information concerning the entity can store and retrieve information concerning the entity in the first data storage means.
  • the information concerning an entity is healthcare data relating to a person.
  • a healthcare worker caring for a patient of a healthcare system can store medical information such as test results, details of treatments etc. concerning that patient on the first data storage means, and retrieve previously stored medical information concerning that patient.
  • there is not a computational process which would allow a party with access (perhaps unauthorised access) to the first data storage means to establish whom medical information therein concerns.
  • This has important security benefits.
  • a data storage means which includes a key prepared from identifying information using a reversible computational process, such as a two-way encryption algorithm, it should not be possible for a party to make a connection from medical information to the identity of the person which it concerns.
  • the first data storage means may store medical information in an unencrypted form, or a form that can be rapidly unencrypted, and still remain secure. This allows potentially faster and cheaper access to the first data storage means.
  • An irreversible computing procedure is one which does not have a procedure for relating the product of the irreversible procedure to the identifier used as an input to the procedure.
  • a computing procedure consisting of or including a hash function step will be inherently irreversible, as there are in general a plurality of inputs to a hash function which give the same output.
  • a reversible computing process is one with a procedure for determining the input to the process given its output.
  • the reverse (de-encryption) procedure may be entirely different to the forward (encryption) procedure.
  • the irreversible computing procedure comprises the steps of applying a hash function to the identifier of the entity (or a value derived therefrom). More preferably, the irreversible computing procedure comprises the further step of subsequently encrypting the hashed identifier.
  • the hash step comprises the application of an SHA hash function, for example SHA-1, SHA-256, SHA-384 or SHA-512, or another hash function specified in ISO/IEC 10118-3, such as Whirlpool.
  • Subsequent encryption may employ the well-known RSA algorithm, or triple-DES algorithm.
  • the invention also extends to processes for updating, changing and deleting data stored in the first data storage means, by a process corresponding to the method of storing information concerning an entity except that information concerning the entity in the first data storage means is updated, changed or deleted instead of being stored.
  • the identifying information typically includes the name, address, postcode and other information which could be used to readily identify an individual and may include other indicia which can be resolved to identify a person, such as National Health Service Number, an alias, etc.
  • the identifier of the entity upon which the prepared key depends may be any identifying information, preferably a person's identification number issued by the Health Service, for example, a British NHS number.
  • the method may include the step of receiving information concerning the identity of an entity, and retrieving the identifier of the identity from an identifying information storage means (such as a computer system implementing a database of identifying information).
  • a user may query the identifying information storage means (e.g. by carrying out a search based on a patient's name), allowing retrieval of the identifier of the identity (e.g. a patient's NHS number).
  • the information concerning an entity which is received may include identifying information, which is not the identifier of the identity, usable by the key preparation computing means to prepare a key.
  • the first computing means may interrogate the identifying information storage means, to establish the entity to which the information relates, and retrieve the identifier of the entity from the identifying information storage means.
  • the key preparation computing means may comprise a server or other computing device or devices to which the identifier of the entity is transmitted, and from which a key dependent on the identifier of the entity is received.
  • the key is not stored in a non-volatile storage means by the key preparation computing means.
  • the key is not stored in a non-volatile storage means other than in the first data storage means.
  • the method of storing information may further comprise the step of generating a data identifier of at least some of the information concerning an entity, storing that data identifier in a second data storage means relatably to the information concerning the entity (and preferably also the key dependent on the index indicium), and storing in a data identifier database an identifier of the entity which the information concerns and a value determined from the data identifier using an irreversible computational procedure.
  • the invention may also extend to an identity resolution process for determining the entity to which data stored in the second data storage means relates, comprising the steps of retrieving from the second data storage means a data identifier related to information concerning the entity, determining a value from the data identifier using the irreversible computation procedure and then searching the data identifier database for the corresponding identifier of the entity.
  • the identity resolution procedure is allowed only after authorisation.
  • each use of the identity resolution procedure is logged, typically along with an identifier of the person or organisation using this identity resolution procedure.
  • the second data storage means includes information which can be matched to other information concerning the same entity (by virtue of the key dependent on the index indicium denoting the entity which the information concerns) allowing detailed analysis of the information, but which can be linked to the identity of the entity to which it relates by virtue of the identity resolution procedure.
  • the data identifier database cannot be used to relate the identifiers of entities stored therein to information stored in the second data storage means as it is a value determined from the data identifier using an irreversible computational procedure which is stored in the first data storage means.
  • the second data storage means may be the first data storage means, or may be provided as an alternative to the first data storage means, or implemented as well as the first data storage means. Where the information is healthcare data relating to patients, and the second data storage means is implemented as well as the first data storage means, the first data storage means may be used to provide information for use in patient care, and the second data storage means may be used to provide information for use in epidemiological and other studies.
  • the invention also extends to a method for storing information concerning an entity, the method comprising the steps executed by computing means of:
  • key preparation computing means operable to receive an identifier of an entity which received information concerns, and to prepare a key dependent on the identifier of the entity using an irreversible computational procedure;
  • first data storage means configured to store some or all of the received information concerning the entity, but not any information identifying the entity, relatably to the key prepared by the key preparation computing means.
  • computing means configured to ensure that the information concerning the entity which is stored in the first data storage means does not include information identifying the entity. This will not be necessary if it is known that the received information will not include an identifier of the entity.
  • the computer apparatus may comprise first data processing means configured to receive the information concerning an entity, transmit some or all of the information concerning the entity to the first data storage means, and cause the key preparation computing means to transmit a key to the first data storage means for storage relatable to the information concerning the entity which is transmitted to the first data storage means.
  • the first data processing means will receive the key from the key preparation computing means and transmit it to the first data storage means.
  • the first data processing means will function as computing means configured to ensure that the information concerning the entity which is stored in the first data storage means does not include information identifying the entity, by transmitting to the first data storage means only information that does not identify the entity.
  • the identifier of the entity will typically be transferred to the key preparation computing means, by the first data processing means.
  • the identifier of the entity may be received by the first data processing means with the information concerning an entity and transmitted to the key preparation computing means, but not transmitted to the first data storage means.
  • the computer apparatus may further comprise an identifying information database including identifying data concerning a plurality of entities, relatably to the identifier of the entity.
  • the first data processing means may receive information identifying an entity, transmit the information to the identifying information database and receive the identifier of the entity.
  • the identifying information database could transmit the identifier of the entity directly to the key preparation computing means.
  • the means to receive information may be a connection to a network through which information can be received from a plurality of clients.
  • the first data processing means, the identifying information storage means, the key preparation computing means, and the first data storage means may each comprise a plurality of separate computers and/or storage devices.
  • information concerning an entity is healthcare data concerning a person.
  • the invention also extends to computer apparatus for storing (and preferably also retrieving) information concerning entities, the apparatus comprising:
  • data identifier generation means operable to generate a data identifier of the information concerning the entity
  • data storage means configured to store the data identifier relatably to the information concerning the entity, but not any information identifying the entity; and means for storing in a data identifier database an identifier of the entity which the at least some of the information concerns and a value determined from the data identifier using an irreversible computational procedure.
  • the key is prepared by the privacy control computing means from the identifying information using an irreversible computational procedure or algorithm. This means that there does not exist a function which allows the identity of an entity to be produced from the data which does not include identifying information, but does include a key. Nevertheless, it is possible to retrieve the data relating to an entity, given identifying information identifying the entity.
  • the key included in the anonymous data originated from the privacy control computing means.
  • the recipient computing means may query a plurality of remote or local data storage means for anonymous data including the key, and then retrieves that anonymous data.
  • the privacy control computing means is operably connected to an entity information database which contains records of identifying information concerning a plurality of entities.
  • each entity is referenced by an index indicium for an index of identifying information retained on the privacy control computing means, the anonymous data being incapable of being matched with the identifying information without reference to the index, and thus without access to the privacy control computing means.
  • the indicium is encrypted by the privacy control computing means to create the key.
  • the privacy control computing means has encryption/decryption means operable to encrypt the indicium to create the key and operable to decrypt the key to yield the indicium. This means that even if both the anonymous data and the entity information database were compromised, for example by a hacker with access to both the privacy control computing means and data source computing means (see below) or recipient computing means, then provided that the encryption/decryption means remains secure, the anonymous data and the identifying information cannot be matched up.
  • the key is never stored in non-volatile storage means by the privacy control computing means, but is created only when required.
  • the step of ensuring that the anonymous data does not include said identifying information but does include a key comprises the stages of firstly exchanging identifying information with the privacy control computing means, for the key, and secondly including the said key in the anonymous data.
  • the entities are people, such as patients of a healthcare system.
  • the identifying information typically includes name, address, post code and other information which could be used to readily identify an individual and may include other indicia which can be resolved to identify a person, such as National Health Service number, an alias etc.
  • Aliases and identifying numbers may be specific to particular source computing means. Aliases and identifying numbers may be resolved to identify a person by the privacy control computing means. This embodiment enables data management problems relating to the use of different identifiers for patients in different source computing means to be resolved or mitigated. New index indicia may be allocated to new entities, allowing handling of data relating to entities not previously known.
  • the anonymous data may include, for example, items relating to the results of patient consultations, such as the results of tests or studies, medical history information, details of diagnosis, prescriptions, symptoms, domestic circumstances; information about clinical interventions (such as procedures carried out, treatments, diagnoses); referrals, results of diagnostic reports or appointments.
  • the anonymous data may comprise records relating to individual events relating to a patient, such as events listed above.
  • the anonymous data may be streamed data.
  • the anonymous data may include all types of data which can be packaged in the XML format.
  • the recipient computing means includes a reconstruct server and one or more recipient systems
  • the user typically interacts with the recipient system.
  • several users interacting with recipient systems may receive data records from the same reconstruct server.
  • Different classes or groupings of authorised users or recipient computing means may be specified; for example, a patient may allow access to their medical records to the accident and emergency department of any UK NHS hospital, but no clinics.
  • Different recipient computing means or users thereof may have authorisation to receive different subsets of the identifying information.
  • the step of determining whether the recipient computing means or a user thereof is authorised to match the data to the identifying information may include receiving override instructions from a user, authorising the matching of data to identifying information and then logging the user and details (such as date, time, recipient computing means, reason, data, entity etc) of the override instructions.
  • determining whether the recipient computing means or a user thereof is authorised to match the data to the identifying information takes place with reference to a consent register which includes data and/or rules specific to individual entities or to groups of entities.
  • the consent register or data and/or rules therein may be specified by an administrator of the consent register or by consent surveys completed by or in relation to entities whose data is stored. This enables different identifying information to be available to different parties; for example, patient record administrators, specialist clinics, accident and emergency departments etc. will have different authorisations. This could be used to implement enhanced patient control of access to their personal identifying information.
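The consent register described above can be pictured as a lookup from an entity and a class of recipient to the subset of identifying information that may be released, with overrides logged. The following is a minimal sketch under stated assumptions: the class name `ConsentRegister`, its methods, the recipient class labels and the field names are all hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class ConsentRegister:
    """Hypothetical consent register: entity -> recipient class -> permitted fields."""

    rules: dict = field(default_factory=dict)
    override_log: list = field(default_factory=list)

    def permitted_fields(self, entity_id, recipient_class):
        # An entity or recipient class with no rule gets no identifying information.
        return set(self.rules.get(entity_id, {}).get(recipient_class, set()))

    def release(self, entity_id, recipient_class, identifying):
        # Return only the subset of identifying information this class may see.
        allowed = self.permitted_fields(entity_id, recipient_class)
        return {k: v for k, v in identifying.items() if k in allowed}

    def override(self, entity_id, recipient_class, user, reason, identifying):
        # An override releases everything, but the user and details are logged first.
        self.override_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "recipient_class": recipient_class,
            "entity": entity_id,
            "reason": reason,
        })
        return dict(identifying)


# Example: a patient allows accident and emergency departments to see name
# and address, but grants nothing to clinics.
register = ConsentRegister(
    rules={"patient-1": {"accident_and_emergency": {"name", "address"}}}
)
identity = {"name": "A. Patient", "address": "1 Example Street", "alias": "AP"}
ae_view = register.release("patient-1", "accident_and_emergency", identity)
clinic_view = register.release("patient-1", "clinic", identity)
```

Defaulting to an empty set when no rule exists is a design choice of this sketch, so that absence of consent never releases identifying information.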
  • the data processing computing means may comprise a server under the control of an external organisation. This information can therefore be used for statistical purposes, or any other type of analysis, without the data processing computing means having access to identifying information.
  • this embodiment allows anonymous data to be processed by third parties without the identity of the party to whom it refers being disclosed.
  • the recipient computing means, privacy control computing means and, where present, data store and/or data processing computing means are each contained in "safe havens", being secure and trusted environments in an organisation or part of an organisation, where there are procedures in place to ensure the safety and secure handling of data.
  • the recipient computing means may transfer the anonymous data or data derived therefrom matched with identifying information to a second recipient system, typically within a safe haven within which the recipient computing means resides.
  • the privacy control computing means may determine whether the recipient computing means is authorised to transfer matched data to the second recipient system. Consent to transfer such matched data may also be stored in the consent register. Transmissions between or within computing means may be effected by a secure encrypted link, for additional security.
  • any one or more of the recipient computing means or a part thereof such as a reconstruct server if present, the privacy control computing means and, if present, the data source computing means and/or data processing computing means may comprise a respective server.
  • the exchange means may relay identifying information relating to an entity to said recipient computing means in response to the receipt of a key related to that entity from said recipient computing means.
  • the exchange means may relay a key relating to an entity to said recipient computing means responsive to receipt of identifying information related to that entity from said recipient computing means.
  • the relation computing means comprises encryption/de-encryption computing means operable to encrypt the index indicium to give the key and operable to de-encrypt the key to yield the index indicium.
  • the recipient computing means further comprises matching means to match the anonymous data with the identifying information.
  • the recipient computing means further comprises forwarding means for forwarding the anonymous data and identifying information to a second recipient system.
  • a deconstruction computing means comprising receiving means for receiving data relating to an entity including identifying information, exchange means for exchanging the received identifying information for a key relating to the entity with a privacy control computing means, and anonymisation means for preparing anonymous data comprising received data and the key, but not the received identifying information.
  • the anonymisation means may be operable to prepare anonymous data including a generalisation, approximation or category of received data and/or identifying information.
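As an illustration of the deconstruction step just described, the sketch below exchanges identifying information for a key, strips the identifying fields, and keeps only generalised forms of two of them. It is a rough sketch, not the claimed apparatus: the field names, the choice of postcode area and ten-year age band as generalisations, and the `exchange_for_key` callback standing in for the privacy control computing means are all assumptions.

```python
# Hypothetical set of fields treated as identifying information.
IDENTIFYING_FIELDS = {"name", "address", "postcode", "nhs_number", "age"}


def postcode_area(postcode):
    """Generalise a UK-style postcode to its outward part, e.g. 'EH1 1AA' -> 'EH1'."""
    return postcode.split()[0]


def age_band(age, width=10):
    """Generalise an exact age to a band, e.g. 47 -> '40-49'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def anonymise(record, exchange_for_key):
    """Prepare anonymous data: received data plus key, minus identifying fields.

    `exchange_for_key` is a stand-in for the exchange with the privacy control
    computing means; it receives the identifying information and returns a key.
    """
    identifying = {k: v for k, v in record.items() if k in IDENTIFYING_FIELDS}
    anonymous = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
    anonymous["key"] = exchange_for_key(identifying)
    # Optionally retain a generalisation rather than the exact value.
    if "postcode" in record:
        anonymous["postcode_area"] = postcode_area(record["postcode"])
    if "age" in record:
        anonymous["age_band"] = age_band(record["age"])
    return anonymous


# Usage with a stub exchange that always returns the same illustrative key.
event = {
    "name": "A. Patient",
    "postcode": "EH1 1AA",
    "age": 47,
    "test": "blood pressure",
    "result": "120/80",
}
anonymous_event = anonymise(event, lambda identifying: "KEY-0001")
```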
  • the embodiments of the invention described with reference to the drawings comprise processes performed in computer apparatus and computer networks, and also computing apparatus and computer networks
  • the invention also extends to computer programs, particularly computer programs on or in a carrier, adapted for carrying out the processes of the invention or for causing a computer to perform as the computer apparatus of the invention.
  • Programs may be in the form of source code, object code, a code intermediate source, such as in partially compiled form, or any other form suitable for use in the implementation of the processes according to the invention.
  • the carrier may be any entity or device capable of carrying the program.
  • the computing means may comprise a plurality of separate discontinuously or continuously operatively connected processors or other computing apparatus, with individual method steps carried out on one or more distinct apparatus. Additionally or alternatively, the computing means may comprise computing apparatus controlled by different parties.
  • Figure 2 is a schematic diagram of an alternative computer apparatus forming a healthcare data infrastructure
  • Figure 3 is a schematic diagram of an infrastructure for relaying information about patients in a healthcare system
  • Figure 6 is a schematic diagram of data records used in the process of Figure 4.
  • Figure 8 is a schematic diagram illustrating the movement of data around part of the infrastructure of Figure 3 by the process of Figure 7.
  • Figure 1 illustrates the components of computer apparatus forming a healthcare infrastructure.
  • One or more control computer systems 1 (functioning as first data processing means) are connected to a plurality of client computers 2 across a network 4.
  • the network is nationwide, with individual client systems being located with healthcare providers, such as in doctors' surgeries, hospital departments, in ambulances and the like, and in mobile form carried by healthcare personnel.
  • Client computers 2 are used to both submit healthcare information for storage and to retrieve healthcare information relating to particular patients.
  • Control computer system 1 is operatively linked to a database of demographic information (functioning as identifying information storage means) 6, a privacy gate keeper computer system 10 (functioning as key preparation computing means), and a data warehouse 8 (functioning as first data storage means).
  • When data is to be submitted for storage, it is transferred by client 2 to the control computer system 1, which establishes an identifier of the patient to whom the information refers.
  • This unique patient identifier (PID), such as a patient's National Health Service number, may be included in the information submitted by the client terminal 2. If it is not, then it may be retrieved from the demographic database 6, and the control computer system 1 may interact with the client terminal 2 to enable the client terminal to query the database 6 and so determine the PID of the patient to whom the information relates.
  • the identifier of the entity is transmitted to the privacy gate keeper 10.
  • the privacy gate keeper 10 then carries out an irreversible computational process on the patient identifier, or a value derived from the PID, to prepare a patient privacy key (PPK), which is a key unique to a specific patient.
  • the irreversible computing process includes the steps of applying the SHA-256 hash function (defined in ISO/IEC 10118-3:2003(E)) to the PID, followed by triple DES encryption, producing the PPK.
  • the PPK is transmitted back to the control computer system 1.
  • the control computer system 1 then prepares a data record comprising information which was received from the client terminal 2, except that any information identifying the patient to which it relates (e.g. their name, address etc) is removed. Instead, the key received from the privacy gatekeeper is included in the database record which is then transmitted to the data warehouse 8.
  • Within the data warehouse 8 are stored records, each comprising the PPK of the person to whom it relates and the data 14, in a format such as XML or HL7, using otherwise conventional database technology.
  • a user of a client 2 inputs the PID of a patient, or identifying information (e.g. name, address) sufficient to enable the PID to be deduced by the control computer system 1 with reference to the database of demographic information.
  • the PID is then submitted by the control computer system to the privacy gate keeper 10.
  • the privacy gate keeper carries out an irreversible computational process on the patient identifier, or a value derived from the patient identifier, and so prepares the same patient privacy key as was used when storing data relating to the same individual in the past.
  • the PPK is transmitted back to the control computer system which then submits it to the data warehouse 8, along with a request for the desired data concerning the person.
  • Healthcare information relating to the patient with the calculated PPK can therefore be found, using standard database query techniques, and that healthcare information is then transmitted to the client 2.
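The storage and retrieval flow of Figure 1 can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patented implementation: the names `prepare_ppk`, `DataWarehouse`, `store_submission` and `retrieve_for_patient` are hypothetical, the secret is a placeholder, and because the Python standard library has no triple DES, a keyed HMAC-SHA-256 step is swapped in for the encryption stage that follows the SHA-256 hash.

```python
import hashlib
import hmac

# Hypothetical secret held only by the privacy gate keeper (placeholder value).
GATEKEEPER_SECRET = b"example-secret-not-for-production"

# Hypothetical names of fields that identify the patient.
IDENTIFYING_FIELDS = {"pid", "name", "address", "postcode"}


def prepare_ppk(pid):
    """Privacy gate keeper: derive a patient privacy key (PPK) from a PID.

    Step 1 is the SHA-256 hash named in the text. Step 2 is a keyed
    HMAC-SHA-256, an assumption standing in for triple DES encryption.
    """
    hashed = hashlib.sha256(pid.encode("utf-8")).digest()
    return hmac.new(GATEKEEPER_SECRET, hashed, hashlib.sha256).hexdigest()


class DataWarehouse:
    """First data storage means: records hold a PPK and data, never identity."""

    def __init__(self):
        self.records = []

    def store(self, ppk, data):
        self.records.append({"ppk": ppk, "data": data})

    def retrieve(self, ppk):
        return [r["data"] for r in self.records if r["ppk"] == ppk]


def store_submission(warehouse, submission):
    """Control computer system: replace identifying fields with the PPK."""
    ppk = prepare_ppk(submission["pid"])
    data = {k: v for k, v in submission.items() if k not in IDENTIFYING_FIELDS}
    warehouse.store(ppk, data)


def retrieve_for_patient(warehouse, pid):
    """Recompute the same PPK from the PID and query the warehouse with it."""
    return warehouse.retrieve(prepare_ppk(pid))


warehouse = DataWarehouse()
store_submission(
    warehouse,
    {"pid": "PID-0001", "name": "A. Patient", "test": "HbA1c", "result": "41"},
)
found = retrieve_for_patient(warehouse, "PID-0001")
```

Because the derivation is deterministic, the same PID always yields the same PPK, which is what lets a party holding the PID find the records while the warehouse itself holds nothing that identifies the patient.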
  • Figure 2 illustrates a further example of computer infrastructure for a healthcare system.
  • the components of the infrastructure are as before, except that there is further provided a secondary-use data warehouse 20, operably linked to the control computer system 1.
  • Secondary-use data warehouse 20 may be accessed by authorised external computer systems 24.
  • data to be stored can be transmitted by the client computers 2 across the network 4 to a control computer system 1.
  • the PID of the patient to which the received data relates is determined, either because it is included in or deducible from the received data, or can be retrieved by querying the identifying information in the demographic database 6.
  • this PID is exchanged for a PPK prepared using a one-way computational algorithm, with privacy gatekeeper computing system 10, and then, the healthcare data, excluding any identifying information, but including the PPK, is transmitted to the data warehouse 8 for storage.
  • a further process takes place when the PID is submitted to the privacy gatekeeper 10 for preparation of a PPK.
  • Privacy gate keeper 10 assigns a unique identifier to a specified batch or stream of data (hereafter referred to as a message) which is to be stored. This identifier is hereafter referred to as the unique message identifier (UMID).
  • the UMID is sent to the secondary-use database along with the PPK and data which is being sent to the data warehouse 8.
  • the secondary-use data warehouse 20 stores records including, for each record, both PPK 12, and the relevant healthcare data 14, along with the UMID 22.
  • the privacy gate keeper 10 adds a new record to a database 26, consisting of the PID 28 of the person to whom the message relates and data 30 which is the output of an irreversible computational process applied to the UMID.
  • the secondary-use database 20 is made available to authorised users, who can use it for studies, analysis, etc. As each record includes a PPK, data relating to the same entity can be matched to other data concerning the same entity, allowing queries identifying people who fit complex criteria. However, the PPK cannot be used to determine the identity of the individuals which the data concerns.
  • Data submitted from client computers 2 to the control computer system 1 would typically be stored as quickly as possible in data warehouse 8, as it may be urgent, and relating to a current health emergency.
  • Data for the secondary-use database may be stored at a later date, perhaps after further processing and analysis.
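The UMID mechanism of Figure 2, together with the authorised and logged identity resolution procedure described earlier, might be sketched as follows. This is an illustration only: the class and method names are hypothetical, a random UUID is assumed as the UMID format, plain SHA-256 is assumed as the irreversible process applied to the UMID, and logging denied attempts as well as permitted ones is a choice of this sketch.

```python
import hashlib
import uuid


def irreversible(value):
    """Assumed irreversible process applied to the UMID (plain SHA-256 here)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


class PrivacyGateKeeper:
    def __init__(self, authorised_users):
        self.authorised_users = set(authorised_users)
        # Stand-in for database 26: irreversible(UMID) -> PID.
        self.resolution_db = {}
        self.audit_log = []

    def assign_umid(self, pid):
        """Assign a unique message identifier and record its hashed form with the PID."""
        umid = uuid.uuid4().hex
        self.resolution_db[irreversible(umid)] = pid
        return umid

    def resolve(self, umid, user):
        """Identity resolution: only after authorisation, and every use is logged."""
        allowed = user in self.authorised_users
        self.audit_log.append({"user": user, "umid": umid, "allowed": allowed})
        if not allowed:
            raise PermissionError(f"{user} is not authorised to resolve identities")
        return self.resolution_db.get(irreversible(umid))


gatekeeper = PrivacyGateKeeper(authorised_users={"auditor"})
umid = gatekeeper.assign_umid("PID-0001")

# The secondary-use record carries the UMID but no identifying information.
secondary_record = {"umid": umid, "data": {"test": "HbA1c", "result": "41"}}
resolved_pid = gatekeeper.resolve(secondary_record["umid"], "auditor")
```

Since database 26 holds only the hashed UMID, reading that database alone does not reveal which secondary-use record belongs to which PID; the link is only re-established by starting from a record's UMID and hashing it.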
  • Figure 3 illustrates in schematic form key components of another example infrastructure 1001 for relaying information about patients in a healthcare system.
  • a source computer system 1002 functioning as a data source computing means, holds both data concerning events and personal identifying information about the person to whom the data relates.
  • Source computer system 1002 is operatively connected to deconstruct server 1004.
  • Deconstruct server 1004 is operatively connected to event server 1006 which functions as data processing computing means.
  • the event server 1006 is in turn operatively connected to reconstruct server 1008, which along with the event server 1006 constitutes recipient computing means.
  • the reconstruct server 1008 is in turn able to transmit data to a recipient system 1009.
  • Figure 3 also shows boundaries between different organisations 1018.
  • the source computer system 1002 and deconstruct server 1004 are contained within a first safe haven 1020a within which an organisation has procedures in place to control access to data.
  • the privacy gatekeeper server 1010 is contained within a second safe haven 1020b.
  • the reconstruct server 1008 and recipient system 1009 are contained within a third safe haven 1020c.
  • the first, second and third safe havens 1020a, 1020b and 1020c are different parts of a healthcare service, for example a clinic, a computing centre and a hospital accident and emergency department respectively.
  • the event server 1006 is operated by an external organisation, such as a computing consultancy.
  • a national or international infrastructure includes a plurality of first safe havens 1020a, each with source computer systems 1002 and deconstruct servers 1004; a plurality of third safe havens 1020c, each with reconstruct server 1008 and recipient system 1009; and a plurality of event servers 1006.
  • the safe haven 1020b including the privacy gatekeeper server 1010 is centrally located.
  • Figure 4 is a flow diagram showing the steps that take place in the relaying of information from the source computer system 1002 to the recipient system 1009.
  • Figure 5 shows the concomitant movement of data between components of the infrastructure 1001.
  • Figure 6 shows in block form data records used at various stages of the relaying of information.
  • source computer system 1002 transmits initial data record 1200 (Figure 4) to deconstruct server 1004, within safe haven 1020a (step 1100).
  • Initial data record 1200 includes data concerning events 1202 and also personal identifying information 1204, such as name and date of birth information, about the person to whom the data concerning events 1202 relates.
  • the deconstruct server 1004 then anonymises the initial data record 1200, producing anonymous data record 1210 (step 1102).
  • in anonymous data record 1210, personal identifying information 1204 is removed, generalised, approximated or categorised. In this example, the date of birth of a patient is replaced with the age range 1212 into which they fall.
  • the deconstruct server 1004 concomitantly relays the identifying information 1204 to the privacy gatekeeper server 1010 (step 1104).
  • the privacy gatekeeper server 1010 uses the identifying information 1204 to establish the index indicium 1216 for the person to whom initial data record 1200 pertains. This is achieved by querying master index data store 1012 using conventional database searching techniques (step 1106).
  • Figure 6 illustrates part of a database 1218 recorded in master index data store 1012.
  • Database 1218 comprises a plurality of records 1220 for individual patients, referenced by index indicia 1216.
  • Each record 1220 includes identifying information 1222 and consent rules/data 1224 which specify what identifying information 1222 may be disclosed to particular recipient machines or users.
  • the encryption/decryption module 1014 then encrypts the index indicium 1216 to form the key 1214 (step 1108).
  • the key 1214 is not stored in non-volatile memory, but is relayed 1110 immediately to the deconstruct server 1004 where it is included in anonymous data record 1210.
  • the privacy gatekeeper server 1010 checks 1112 to see if the identifying information 1204 which is received is different, more up-to-date information than the identifying information 1222 stored for the particular patient, identified by the index indicium 1216. If it is different, more up-to-date information, it updates the identifying information 1222 stored in master index data store 1012 (step 1114). If there is not already a record 1220 relating to the particular patient, a new one is automatically created.
  • the deconstruct server 1004 relays 1116 the anonymous data 1210 to the event server 1006.
  • the event server 1006 relays the anonymous data 1210 onward to the reconstruct server 1008 according to an onward relaying schedule (step 1118).
  • the anonymous data 1210 is also stored in an event data repository 1007 for use in data processing, analysis and reporting.
  • the data repository 1007 provides a useful archive of anonymous data 1210.
  • Data processing rules defined in the event server 1006 specify actions, such as automatic processing, or analysis, for instance to gather statistics, which are carried out on received anonymous data 1210.
  • Data distribution management software determines which reconstruct servers 1008 data should be relayed to with reference to configured rules.
  • the privacy gatekeeper server 1010 authenticates 1124 the reconstruct server 1008.
  • a user of the reconstruct server 1008 uses a web browser incorporating secure socket layer (SSL) technology to securely log on to the privacy gatekeeper server 1010, providing a user ID and password allowing their identity to be authenticated, as is well known in the art.
  • the reconstruct server prepares a reconstructed data record 1228 comprising data 1230 from the source computer system 1002 and the requested identifying information 1226 (step 1130). This reconstructed data record 1228 is then available for relaying onwards to recipient system 1009 (step 1132).
  • the anonymous data 1210 and identifying information 1204 were sent separately and could only be recombined by exchanging the key 1214 with the requested identifying information 1226 after authentication. All point-to-point transfers of data are by 128-bit triple-DES encrypted links as an additional layer of security. If an unauthorised party accesses the anonymous data 1210, they cannot identify to whom it relates. Even if the same unauthorised party finds a way to access the data stored in the master index storage device 1012, they still cannot match the anonymous data 1210 and the information identifying to whom it relates, as the anonymous data 1210 does not contain the index indicia 1216, but only an encrypted key 1214.
  • Figure 7 is a flow diagram showing the procedural steps in a related method carried out on the same apparatus, for establishing which data relates to a person, given identifying information pertaining to that person.
  • Figure 8 illustrates the resulting flow of data.
  • a user is authenticated (step 1300).
  • a user of the reconstruct server 1008 logs in to the privacy gatekeeper server 1010 using a web browser with a secure socket layer, providing a userid and password.
  • the user submits identifying information, such as a name or health service number of a patient that is then relayed from the reconstruct server 1008 to the privacy gatekeeper server 1010 (step 1304).
  • the privacy gatekeeper server 1010 then retrieves the index indicium 1216 for the person whose identifying information has been submitted (step 1306). This is carried out by conventional database searching techniques.
  • the encryption/decryption module 1014 then encrypts the index indicium 1216 to form a key 1214 which is then relayed to the reconstruct server 1008 (step 1310).
  • the reconstruct server 1008 and privacy gatekeeper server 1010 have exchanged the key 1214 and the identifying information.
  • the reconstruct server 1008 is now enabled to match anonymous data 1210 containing the retrieved key 1214 with identifying information.
  • the reconstruct server 1008 now requests data from each event server 1006 by supplying the key 1214 to each server 1006 (step 1312).
  • Event servers 1006 having anonymous data 1210 including the key 1214 in their data repositories 1007 then relay that anonymous data to the reconstruct server 1008 for matching to the identifying information.
  • a user can retrieve anonymous data 1210 relating to a particular individual.
  • a national or international infrastructure can be implemented, including deconstruct servers in locations such as clinics where patient data is created; reconstruct servers in locations such as doctors' or dental surgeries, hospital departments, ambulances etc. where patient data matched to patient identifying information is required; and event servers to store anonymous data for processing.
  • the privacy gatekeeper server 1010 is located centrally where it can be carefully regulated.
  • Consent rules 1224 stored in the master index data store 1012 are created with reference to default settings and in response to patient questionnaires.
  • All communication between servers is over HTTP or HTTPS ports.
  • Data (including anonymous data and identifying information) is formatted according to the XML format, well known in the art.
  • Servers implement SOAP (Simple Object Access Protocol), (SOAP is a trademark of Microsoft Corporation, Redmond, WA, USA) to minimise the complexity of organisational firewalls.
  • the reconstruct server or a user thereof is authenticated for the purpose of determining authorisation to match the anonymous data to identifying information by means of the HTTPS protocol.
  • initial data record 1200 is transmitted to deconstruct server 1004 via SOAP over HTTP (step 1100).
  • reconstructed data record 1228 is relayed 1132 to recipient system 1009 via SOAP over HTTP.
  • the following data is relayed via SOAP over HTTPS: anonymous data 1210 from deconstruct server 1004 to event server 1006; anonymous data 1210 from event server 1006 to reconstruct server 1008; identifying information 1204 from deconstruct server 1004 to privacy gatekeeper server 1010 and identifying information 1204 from privacy gatekeeper server 1010 to reconstruct server 1008.
  • the data repositories 1007 provide copies of all anonymous data sent from deconstruct servers to reconstruct servers, which can be used for analysis and reporting.
  • the event server 1006 can be an external organisation, allowing secure, confidential processing of anonymous data at the premises of third party IT consultants.
  • Consent rules 1224 can be personalised for each patient, allowing patient control of their personal data.
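The anonymisation carried out by the deconstruct server (step 1102) can be illustrated with a minimal sketch. The field names, the ten-year age bands and the function names below are assumptions made for illustration; the source states only that identifying information such as name and date of birth is removed, generalised, approximated or categorised, with the date of birth replaced by an age range.

```python
from datetime import date

# Hypothetical field names: the source does not define a record layout.
IDENTIFYING_FIELDS = ("name", "date_of_birth")


def age_range(date_of_birth: date, today: date, band: int = 10) -> str:
    """Generalise a date of birth to an age band such as '40-49' (band width is assumed)."""
    age = today.year - date_of_birth.year
    # Subtract one year if this year's birthday has not yet occurred.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        age -= 1
    low = (age // band) * band
    return f"{low}-{low + band - 1}"


def deconstruct(initial_record: dict, today: date) -> tuple[dict, dict]:
    """Split an initial record into an anonymous record and the identifying information."""
    identifying = {field: initial_record[field] for field in IDENTIFYING_FIELDS}
    anonymous = {
        name: value
        for name, value in initial_record.items()
        if name not in IDENTIFYING_FIELDS
    }
    # The exact date of birth is replaced by the age range into which it falls.
    anonymous["age_range"] = age_range(initial_record["date_of_birth"], today)
    return anonymous, identifying


initial = {
    "name": "Jane Example",
    "date_of_birth": date(1980, 6, 15),
    "event": "blood test",
}
anonymous, identifying = deconstruct(initial, today=date(2024, 1, 1))
```

In this sketch the identifying part would go only to the privacy gatekeeper server, while the anonymous part, once the key has been added, would go to the event server.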

Abstract

Disclosed is a method of storing information concerning an entity, the method comprising the steps executed by computing means of receiving information concerning an entity; receiving (with or separately to the information concerning an entity) an identifier of the entity which the information concerns; causing key preparation computing means to prepare a key dependent on the identifier of the entity using an irreversible computational procedure; and storing, in first data storage means, relatably to the key prepared by the key preparation computing means, some or all of the received information concerning the entity, but not including any information identifying the entity. There is also disclosed a method of transferring data related to an entity and identifying information identifying the entity, the method comprising the steps of ensuring that the data does not include said identifying information, and is hence anonymous, but does include a key, the key being relatable to identifying information identifying the entity to which the data relates by privacy control computing means; receiving the anonymous data at a recipient computing means; before, during or after said receiving, determining whether the recipient computing means or a user thereof is authorised to match the data to the identifying information and, if so, exchanging the key and the identifying information between the privacy control computing means and the recipient computing means separately from said data; thereby enabling the recipient computing means to match the anonymous data and the identifying information identifying said entity.

Description

Method of and apparatus for storing data
Field of the Invention
The invention relates to a method of and apparatus for storing, retrieving and/or transferring data related to an entity and identifying information identifying the entity. The invention is particularly concerned with the secure storage, retrieval, and relaying of healthcare data within a healthcare system.
Background to the Invention
Although the subject matter of the present invention can relate to any type of information concerning any type of entity, various issues will now be discussed with reference to the specific example of personal healthcare data.
In order to run a healthcare system effectively, it is desirable to store information about patients in a standardised and to some extent centralised way. There are two particular types of information which it is desirable to store in this way. The first is identifying information identifying a patient. For example, their name, address, their number allocated by the healthcare system (e.g. in the UK, a patient's NHS number). Secondly, it is desirable to store healthcare information concerning individual patients. For example, the results of tests, information about past and future treatments, allergies, doctor's notes etc.
However, it is important to maintain confidentiality of this personal data. It is difficult to implement a system for storing and retrieving personal healthcare data in a fashion which ensures that unauthorised access to information concerning a particular person cannot be obtained.
In general, identifying information identifying an entity is not in itself sensitive. Anonymous healthcare data, such as data relating to a particular test, is also of relatively low sensitivity provided that it cannot be matched up to the person to whom it relates. It is the ability to relate identifying information identifying a person to and from healthcare data relating to that person which is sensitive and which the present invention aims to control.
Summary of the Invention
According to the present invention there is provided a method of storing information concerning an entity, the method comprising the steps executed by computing means of:
receiving information concerning an entity;
receiving (with or separately to the information concerning an entity) an identifier of the entity which the information concerns;
causing key preparation computing means to prepare a key dependent on the identifier of the entity using an irreversible computational procedure; and
storing, in first data storage means, relatably to the key prepared by the key preparation computing means, some or all of the received information concerning the entity, but not any information identifying the entity.
The invention also provides a method of retrieving information concerning an entity, the method comprising the steps executed by computing means of:
receiving an identifier of an entity at a key preparation computing means;
determining at the key preparation computing means a key dependent on the identifier of the entity using an irreversible computational procedure; and
retrieving information related to the key from a first data storage means, the first data storage means being configured to store and allow retrieval of data comprising information concerning an entity (preferably excluding any information identifying the entity) relatably to a key prepared by the key preparation computing means.
Preferably, data is stored and retrieved by the above methods.
For example, the data stored in the first data storage means may comprise a record. The record may comprise the information concerning the entity and the key prepared by the key preparation computing means, but not any information identifying the entity.
The invention therefore provides a methodology for storing and retrieving information concerning an entity in a data storage means where it is not stored with identifying information identifying the entity which it concerns. As the key is prepared using an irreversible computational process, information stored in the first data storage means cannot by itself be analysed to determine the entity which the stored information concerns. However, an authorised party that already has identifying information concerning the entity can store and retrieve information concerning the entity in the first data storage means.
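The storing and retrieving methods above can be sketched in a few lines. This is an illustration under stated assumptions rather than the claimed implementation: the class and function names are hypothetical, an in-memory dictionary stands in for the first data storage means, a plain SHA-256 digest stands in for the irreversible computational procedure, and the identifier "EXAMPLE-0001" is a made-up placeholder.

```python
import hashlib


def prepare_key(identifier: str) -> str:
    """Irreversible step; a plain SHA-256 digest is an illustrative choice."""
    return hashlib.sha256(identifier.encode("utf-8")).hexdigest()


class FirstDataStore:
    """Holds information relatably to a key and never holds the identifier itself."""

    def __init__(self) -> None:
        self._records: dict[str, list[dict]] = {}

    def store(self, key: str, information: dict) -> None:
        # Several pieces of information may accumulate under the same key.
        self._records.setdefault(key, []).append(information)

    def retrieve(self, key: str) -> list[dict]:
        # Return a copy so callers cannot alter the stored list.
        return list(self._records.get(key, []))


store = FirstDataStore()

# Storing: the caller holds the identifier and exchanges it for a key.
store.store(prepare_key("EXAMPLE-0001"), {"test": "HbA1c", "result": "41 mmol/mol"})

# Retrieving: the same identifier yields the same key, so the record is found.
retrieved = store.retrieve(prepare_key("EXAMPLE-0001"))
```

A party holding only the contents of the store sees keys and medical information, but nothing in the store maps a key back to an identifier.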
In a particularly preferred embodiment, the information concerning an entity is healthcare data relating to a person. Thus, a healthcare worker caring for a patient of a healthcare system can store medical information such as test results, details of treatments etc. concerning that patient on the first data storage means, and retrieve previously stored medical information concerning that patient. However, there is not a computational process which would allow a party with access (perhaps unauthorised access) to the first data storage means to establish whom medical information therein concerns. This has important security benefits. In contrast to a data storage means which includes a key prepared from identifying information using a reversible computational process, such as a two-way encryption algorithm, it should not be possible for a party to make a connection from medical information to the identity of the person which it concerns.
Another benefit is that the first data storage means may store medical information in an unencrypted form, or a form that can be rapidly unencrypted, and still remain secure. This allows potentially faster and cheaper access to the first data storage means.
An irreversible computing procedure is one which does not have a procedure for relating the product of the irreversible procedure to the identifier used as an input to the procedure. For example, a computing procedure consisting of or including a hash function step will be inherently irreversible, as there are in general a plurality of inputs to a hash function which give the same output. A reversible computing process is one with a procedure for determining the input to the process given its output. The reverse (de-encryption) procedure may be entirely different to the forward (encryption) procedure.
Preferably, the irreversible computing procedure comprises the step of applying a hash function to the identifier of the entity (or a value derived therefrom). More preferably, the irreversible computing procedure comprises the further step of subsequently encrypting the hashed identifier. This makes it harder for the procedure to be compromised by an attack consisting of applying the hash function to all possible identifiers and so preparing a lookup table which could be used to determine the identifier from the key. Preferably, the hash step comprises the application of an SHA hash function, for example SHA-1, SHA-256, SHA-384 or SHA-512, or another hash function specified in ISO/IEC 10118-3:(E), such as Whirlpool. Subsequent encryption may employ the well-known RSA algorithm or the triple-DES algorithm.
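The effect of adding a secret-dependent step after the hash can be sketched as follows. Note the substitution: the text names RSA or triple-DES for the second step, whereas this sketch uses a keyed hash (HMAC-SHA-256) from the Python standard library, which is a different technique chosen only so that the example is self-contained. The secret value, the identifier format and the function name are assumptions.

```python
import hashlib
import hmac

# Hypothetical secret, assumed to be held only by the key preparation computing means.
SECRET = b"replace-with-a-randomly-generated-secret"


def prepare_key(identifier: str, secret: bytes = SECRET) -> str:
    """Hash the identifier, then apply a secret-dependent step to the digest."""
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    # HMAC stands in here for the encryption step named in the text.
    return hmac.new(secret, digest, hashlib.sha256).hexdigest()


# An attacker without the secret can only tabulate plain hashes of every
# possible identifier (a made-up four-digit identifier space is assumed).
candidates = [f"ID-{n:04d}" for n in range(1000)]
lookup_table = {
    hashlib.sha256(c.encode("utf-8")).hexdigest(): c for c in candidates
}

key = prepare_key("ID-0042")
found_in_table = key in lookup_table
```

With a plain hash alone, the lookup table would contain the key for "ID-0042"; with the secret-dependent step, the table built without the secret does not help.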
The invention also extends to processes for updating, changing and deleting data stored in the first data storage means, by a process corresponding to the method of storing information concerning an entity except that information concerning the entity in the first data storage means is updated, changed or deleted instead of being stored. The identifying information typically includes the name, address, postcode and other information which could be used to readily identify an individual and may include other indicia which can be resolved to identify a person, such as National Health Service Number, an alias, etc. The identifier of the entity upon which the prepared key depends may be any identifying information, preferably a person's identification number issued by the Health Service, for example, a British NHS number.
The method may include the step of receiving information concerning the identity of an entity, and retrieving the identifier of the entity from an identifying information storage means (such as a computer system implementing a database of identifying information). For example, a user may query the identifying information storage means (e.g. by carrying out a search based on a patient's name), allowing retrieval of the identifier of the entity (e.g. a patient's NHS number). Alternatively, the information concerning an entity which is received may include identifying information, which is not the identifier of the entity, usable by the key preparation computing means to prepare a key. In this case, the first computing means may interrogate the identifying information storage means, to establish the entity to which the information relates, and retrieve the identifier of the entity from the identifying information storage means.
The key preparation computing means may comprise a server or other computing device or devices to which the identifier of the entity is transmitted, and from which a key dependent on the identifier of the entity is received.
Preferably, the key is not stored in a non-volatile storage means by the key preparation computing means. Preferably also, the key is not stored in a non-volatile storage means other than in the first data storage means.
The method of storing information may further comprise the step of generating a data identifier of at least some of the information concerning an entity, storing that data identifier in a second data storage means relatably to the information concerning the entity (and preferably also the key dependent on the index indicium), and storing in a data identifier database an identifier of the entity which the information concerns and a value determined from the data identifier using an irreversible computational procedure.
For example, the information may be received and then stored in the second data storage means in the form of records, and the data identifier may be an index number for the relevant record stored in the second storage means.
The invention may also extend to an identity resolution process for determining the entity to which data stored in the second data storage means relates, comprising the steps of retrieving from the second data storage means a data identifier related to information concerning the entity, determining a value from the data identifier using the irreversible computation procedure and then searching the data identifier database for the corresponding identifier of the entity. Preferably, the identity resolution procedure is allowed only after authorisation. Preferably also, each use of the identity resolution procedure is logged, typically along with an identifier of the person or organisation using this identity resolution procedure.
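The data identifier mechanism and the identity resolution process can be sketched as follows. All names are hypothetical, in-memory dictionaries stand in for the second data storage means and the data identifier database, a random UUID stands in for the data identifier, and SHA-256 stands in for the irreversible computational procedure.

```python
import hashlib
import uuid


def irreversible(value: str) -> str:
    """Illustrative irreversible procedure applied to the data identifier."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


second_store: dict[str, dict] = {}          # data identifier -> anonymous information
data_identifier_db: dict[str, str] = {}     # irreversible(data identifier) -> entity identifier
resolution_log: list[tuple[str, str]] = []  # (requester, data identifier)


def store_information(entity_id: str, information: dict) -> str:
    """Store anonymous information and record the hashed data identifier."""
    data_id = uuid.uuid4().hex
    second_store[data_id] = information
    # Only the hashed data identifier is kept alongside the entity identifier.
    data_identifier_db[irreversible(data_id)] = entity_id
    return data_id


def resolve_identity(data_id: str, requester: str, authorised: set[str]) -> str:
    """Identity resolution: allowed only after authorisation, and always logged."""
    if requester not in authorised:
        raise PermissionError(f"{requester} is not authorised to resolve identities")
    resolution_log.append((requester, data_id))
    return data_identifier_db[irreversible(data_id)]
```

Starting from a record in the second store, an authorised party can hash its data identifier and look up the entity. Starting from the data identifier database, there is no way to find the record, because only hashed data identifiers are held there.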
Thus the second data storage means includes information which can be matched to other information concerning the same entity (by virtue of the key dependent on the index indicium denoting the entity which the information concerns), allowing detailed analysis of the information, but which can be linked to the identity of the entity to which it relates only by virtue of the identity resolution procedure. The data identifier database cannot by itself be used to relate the identifiers of entities stored therein to information stored in the second data storage means, as what it stores is not the data identifier but a value determined from the data identifier using an irreversible computational procedure.
The second data storage means may be the first data storage means, or may be provided as an alternative to the first data storage means, or implemented as well as the first data storage means. Where the information is healthcare data relating to patients, and the second data storage means is implemented as well as the first data storage means, the first data storage means may be used to provide information for use in patient care, and the second data storage means may be used to provide information for use in epidemiological and other studies.
The invention also extends to a method for storing information concerning an entity, the method comprising the steps executed by computing means of:
receiving information concerning an entity;
receiving (with or separately to the information concerning an entity) an identifier of the entity which the information concerns;
generating a data identifier of the information concerning the entity;
storing in data storage means the data identifier relatably to the information concerning the entity, but not any information identifying the entity;
storing in a data identifier database an identifier of the entity which the information concerns and a value determined from the data identifier using an irreversible computational procedure.
According to a further aspect of the present invention there is provided computer apparatus for storing (and preferably also retrieving) information concerning entities, the apparatus comprising:
means to receive information concerning an entity;
key preparation computing means operable to receive an identifier of an entity which received information concerns, and to prepare a key dependent on the identifier of the entity using an irreversible computational procedure; and
first data storage means, configured to store some or all of the received information concerning the entity, but not any information identifying the entity, relatably to the key prepared by the key preparation computing means.
If necessary, there is also provided computing means configured to ensure that the information concerning the entity which is stored in the first data storage means does not include information identifying the entity. This will not be necessary if it is known that the received information will not include an identifier of the entity.
The computer apparatus may comprise first data processing means configured to receive the information concerning an entity, transmit some or all of the information concerning the entity to the first data storage means, and cause the key preparation computing means to transmit a key to the first data storage means for storage relatable to the information concerning the entity which is transmitted to the first data storage means.
Typically, the first data processing means will receive the key from the key preparation computing means and transmit it to the first data storage means.
Typically also, the first data processing means will function as computing means configured to ensure that the information concerning the entity which is stored in the first data storage means does not include information identifying the entity, by transmitting to the first data storage means only information that does not identify the entity.
The identifier of the entity will typically be transferred to the key preparation computing means, by the first data processing means. The identifier of the entity may be received by the first data processing means with the information concerning an entity and transmitted to the key preparation computing means, but not transmitted to the first data storage means. The computer apparatus may further comprise an identifying information database including identifying data concerning a plurality of entities, relatably to the identifier of the entity. Thus, the first data processing means may receive information identifying an entity, transmit the information to the identifying information database and receive the identifier of the entity.
However, the identifying information database could transmit the identifier of the entity directly to the key preparation computing means.
The means to receive information may be a connection to a network through which information can be received from a plurality of clients.
The first data processing means, the identifying information storage means, the key preparation computing means, and the first data storage means may each comprise a plurality of separate computers and/or storage devices.
Preferably, information concerning an entity is healthcare data concerning a person.
The invention also extends to computer apparatus for storing (and preferably also retrieving) information concerning entities, the apparatus comprising:
means to receive information concerning an entity;
data identifier generation means operable to generate a data identifier of the information concerning the entity;
data storage means configured to store the data identifier relatably to the information concerning the entity, but not any information identifying the entity; and
means for storing in a data identifier database an identifier of the entity which the information concerns and a value determined from the data identifier using an irreversible computational procedure.
According to a further aspect of the present invention there is provided a method of transferring data related to an entity and identifying information identifying the entity, the method comprising the steps of:
(a) ensuring that the data does not include said identifying information, and is hence anonymous, but does include a key, the key being relatable to identifying information identifying the entity to which the data relates by privacy control computing means;
(b) receiving the anonymous data at a recipient computing means;
(c) before, during or after said receiving, determining whether the recipient computing means or a user thereof is authorised to match the data to the identifying information and, if so,
(d) exchanging the key and the identifying information between the privacy control computing means and the recipient computing means separately from said data; thereby enabling the recipient computing means to match the anonymous data and the identifying information identifying said entity.
Anonymous data is therefore available for relaying across organisational boundaries or outwith an organisation so that it may be stored or analysed at the most appropriate location; however, a Data Controller who controls the privacy control computing means can retain control of the matching of anonymous data to identifying information, even though a Data Controller may not control the recipient computing means and/or all routes by which data may reach recipient computing means.
The invention also extends to a method of transferring data related to an entity and identifying information identifying the entity as above in which the key is prepared by privacy control computing means from an identifier of the entity using a reversible computational procedure. This means that a function exists which allows the key to be decrypted to give an identifier of the entity. The function and/or the key associated therewith is kept secure and the process of decrypting the key to give an identifier is allowed only under controlled and preferably logged circumstances. This means that if the anonymous data is intercepted, wrongly transmitted, or accessed by an unauthorised party, it cannot be matched up to the entity to which it relates without authorisation. Anonymous data including a key prepared from an identifier of the entity using a reversible algorithm could therefore be made available for processing by parties who are not entitled to match it to identifying information but yet the data can subsequently be matched up to identifying information by authorised parties.
However, in a further aspect of the invention, the key is prepared by the privacy control computing means from the identifying information using an irreversible computational procedure or algorithm. This means that there does not exist a function which allows the identity of an entity to be produced from the data which does not include identifying information, but does include a key. Nevertheless, it is possible to retrieve the data relating to an entity, given identifying information identifying the entity.
An identifier of the entity may be an index indicium, discussed below. For example, the identifier of the entity might be the identification number issued by a national health care service, such as a United Kingdom National Health Service number.
Preferably, the key included in the anonymous data originated from the privacy control computing means.
Preferably also, the same key is used consistently in relation to the same entity. This means that it is possible to cross-reference different anonymous data concerning the entity without needing to access identifying information relating to that entity. For instance, anonymous data concerning treatments given to patients could be cross-referenced with data concerning clinical outcomes of treatments by reference to the key, without the party carrying out that cross-referencing having authorisation to match the anonymous data with identifying information. The step of exchanging the key and the identifying information may take place before, during or after the receipt of the anonymous data, including the key.
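The cross-referencing described above can be sketched as a join on the key alone. This is a minimal illustration, not the method of the specification: the function name, the field names ("key", "treatment", "outcome") and the sample rows are all hypothetical.

```python
from collections import defaultdict


def cross_reference(treatments, outcomes):
    """Group anonymous treatment and outcome rows by their shared key.

    No identifying information is consulted; rows are linked purely
    because the same key is used consistently for the same entity.
    """
    by_key = defaultdict(lambda: {"treatments": [], "outcomes": []})
    for row in treatments:
        by_key[row["key"]]["treatments"].append(row["treatment"])
    for row in outcomes:
        by_key[row["key"]]["outcomes"].append(row["outcome"])
    return dict(by_key)


# Hypothetical anonymous data sets held by an analyst without
# authorisation to see identifying information.
treatments = [
    {"key": "k1", "treatment": "drug A"},
    {"key": "k2", "treatment": "drug B"},
]
outcomes = [{"key": "k1", "outcome": "recovered"}]
linked = cross_reference(treatments, outcomes)
```

The analyst learns that the entity behind "k1" received drug A and recovered, but nothing in the linked data says who that entity is.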
Thus, in one example of the invention, the key may be relayed from the recipient computing means to the privacy control computing means after the recipient computing means has received said data, the privacy control computing means then relating the key to the entity and then relaying identifying information concerning that entity to the recipient computing means. Thus the recipient computing means can match the anonymous data to identifying information identifying said entity only if the recipient computing means or a user thereof is authorised to do so. Preferably, the recipient computing means requests identifying information from the privacy control computing means.
Alternatively, the recipient computing means may first receive identifying information about an entity (for example, by user input at a keyboard), and then, if authorised, exchange that identifying information for the key relating to the same entity with the privacy control computing means. The recipient computing means can then interrogate the data storage means for anonymous data containing the key and so identify, retrieve or otherwise carry out actions on or with reference to anonymous data associated with the entity identified by the received identifying information.
The recipient computing means may query a plurality of remote or local data storage means for anonymous data including the key, and then retrieve that anonymous data.
Matching the anonymous data to the identifying information relating to the entity preferably includes preparing a further data record including both the anonymous data and the identifying information at a reconstruct server, which may then be forwarded to one or more recipient systems. Thus, the recipient computing means may include a reconstruct server and one or more recipient systems. In this case, the reconstruct server and one or more recipient systems are preferably within the same safe haven. The reconstruct server and one or more recipient systems can be implemented as a LAN. However, matching does not necessarily require both the identifying information and anonymous data to be present at once in the recipient computing means; for example, the recipient computing means may relay identifying information without storing it.
Preferably, the privacy control computing means is operably connected to an entity information database which contains records of identifying information concerning a plurality of entities.
Preferably, each entity is referenced by an index indicium for an index of identifying information retained on the privacy control computing means, the anonymous data being incapable of being matched with the identifying information without reference to the index, and thus without access to the privacy control computing means.
Preferably, the indicium is encrypted by the privacy control computing means to create the key. Preferably, the privacy control computing means has encryption/decryption means operable to encrypt the indicium to create the key and operable to decrypt the key to yield the indicium. This means that even if both the anonymous data and the entity information database were compromised, for example by a hacker with access to both the privacy control computing means and data source computing means (see below) or recipient computing means, then provided that the encryption/decryption means remains secure, the anonymous data and the identifying information cannot be matched up. Preferably, the key is never stored in non-volatile storage means by the privacy control computing means, but is created only when required.
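A reversible key of this kind can be illustrated with a small keyed permutation that is recomputed on demand rather than stored. The specification does not name a cipher for this step, so the sketch below swaps in a four-round Feistel network built on HMAC-SHA256 purely because it runs on the Python standard library; the function names, the round count and the secret are assumptions, and a real deployment would use a vetted block cipher.

```python
import hashlib
import hmac

ROUNDS = 4  # assumption: illustrative round count only


def _round_value(secret: bytes, round_no: int, half: bytes) -> bytes:
    """Keyed round function: first 8 bytes of HMAC-SHA256."""
    return hmac.new(secret, bytes([round_no]) + half, hashlib.sha256).digest()[:8]


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt_indicium(secret: bytes, indicium: int) -> str:
    """Derive the key from an index indicium (reversible with the secret)."""
    block = indicium.to_bytes(16, "big")
    left, right = block[:8], block[8:]
    for i in range(ROUNDS):
        left, right = right, _xor(left, _round_value(secret, i, right))
    return (left + right).hex()


def decrypt_key(secret: bytes, key: str) -> int:
    """Recover the index indicium from a key by undoing the rounds."""
    block = bytes.fromhex(key)
    left, right = block[:8], block[8:]
    for i in reversed(range(ROUNDS)):
        left, right = _xor(right, _round_value(secret, i, left)), left
    return int.from_bytes(left + right, "big")


secret = b"hypothetical-gatekeeper-secret"
key = encrypt_indicium(secret, 1001)
```

Because the key is a pure function of the indicium and the secret, nothing needs to be written to non-volatile storage: holding the anonymous data and the entity information database is not enough to link them without the secret.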
The step of ensuring that the anonymous data does not include said identifying information but does include a key preferably takes place at a deconstruction computing means operative to ensure the anonymous data does not include said identifying information and to relay the anonymous data to the recipient computing means. The step of ensuring that the anonymous data does not include said identifying information but does include a key is typically preceded by the step of receiving data including identifying information from a source computing means.
Preferably, the step of ensuring that the anonymous data does not include said identifying information but does include a key, comprises the stages of firstly exchanging identifying information with the privacy control computing means, for the key, and secondly including the said key in the anonymous data.
Thus, data may be relayed from the deconstruction computing means to the recipient computing means by two separate processes. Anonymous data can be relayed to the recipient computing means without identifying information. Identifying information can be exchanged for the key between the deconstruction computing means and the recipient computing means. Identifying information is then retrieved from the privacy control computing means by the recipient computing means exchanging the identifying information for the key if authorised.
The method may include the step of supplying to the deconstruction computing means data which relates to an entity whose identifying information is not to be released to the recipient computing means (e.g. a patient of a different healthcare service). In that case, the deconstruction computing means ensures that this data includes neither identifying information nor any key. Such data can still be used (for example by the recipient computing means) for some statistical purposes, but cannot be rematched to identifying information via the exchange of identifying information for a key by the process of the present invention.
The identifying information exchanged for the key by the deconstruction computing means and the recipient computing means may be different. Responsive to receipt of identifying information, the privacy control computing means may update, add to, delete or otherwise alter identifying information stored in the entity information database or create new records if required. The recipient computing means may receive more, less or different identifying information to that relayed to the privacy control computing means by a deconstruction computing means.
Preferably the entities are people, such as patients of a healthcare system.
In this case, the identifying information typically includes name, address, post code and other information which could be used to readily identify an individual and may include other indicia which can be resolved to identify a person, such as National Health Service number, an alias etc. Aliases and identifying numbers may be specific to particular source computing means. Aliases and identifying numbers may be resolved to identify a person by the privacy control computing means. This embodiment enables data management problems relating to the use of different identifiers for patients in different source computing means to be resolved or mitigated. New index indicia may be allocated to new entities, allowing handling of data relating to entities not previously known.
The anonymous data may include, for example, items relating to the results of patient consultations, such as the results of tests or studies, medical history information, details of diagnosis, prescriptions, symptoms, domestic circumstances; information about clinical interventions (such as procedures carried out, treatments, diagnoses); referrals, results of diagnostic reports or appointments. The anonymous data may comprise records relating to individual events relating to a patient, such as events listed above.
The anonymous data may be streamed data. The anonymous data may include all types of data which can be packaged in the XML format.
Anonymous data may include data derived from identifying information in the data source computing means. The derivation may be by generalisation, approximation, or categorisation. For example, if the identifying information is a date of birth of a person, the anonymous data may include the person's age in years, or into which of several age bands the person falls.
Determining whether the recipient computing means or a user thereof is authorised to match the data to the identifying information may take place dependent on the identity of a user of the recipient computing means, the identity of the recipient computing means, or be based on properties of the recipient computing means, or a user thereof. Preferably, authorisation is dependent on the identity of the user. Where the recipient computing means includes a reconstruct server and one or more recipient systems, the user typically interacts with the recipient system. Thus, several users interacting with recipient systems may receive data records from the same reconstruct server. Different classes or groupings of authorised users or recipient computing means may be specified; for example, a patient may allow access to their medical records to the accident and emergency department of any UK NHS hospital, but no clinics. Different recipient computing means or users thereof may have authorisation to receive different subsets of the identifying information. The step of determining whether the recipient computing means or a user thereof is authorised to match the data to the identifying information may include receiving override instructions from a user, authorising the matching of data to identifying information and then logging the user and details (such as date, time, recipient computing means, reason, data, entity etc) of the override instructions.
Preferably, determining whether the recipient computing means or a user thereof is authorised to match the data to the identifying information takes place with reference to a consent register which includes data and/or rules specific to individual entities or to groups of entities. The consent register or data and/or rules therein may be specified by an administrator of the consent register or by consent surveys completed by or in relation to entities whose data is stored. This enables different identifying information to be available to different parties; for example, patient record administrators, specialist clinics, accident and emergency departments etc. will have different authorisations. This could be used to implement enhanced patient control of access to their personal identifying information.
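The role of a consent register can be sketched as a per-entity table of which identifying fields each class of user may receive, together with a logged override. This is an illustrative sketch only: the data layout, the role names, the field names and the behaviour of an override (releasing the full record) are assumptions not fixed by the text.

```python
from datetime import datetime, timezone

# Hypothetical entity information database, keyed by index indicium.
entity_database = {
    1001: {"name": "Jane Example", "address": "1 Example Street", "post_code": "ZZ1 1ZZ"},
}

# Hypothetical consent register: per entity, the fields each role may see.
consent_register = {
    1001: {
        "accident_and_emergency": {"name", "address", "post_code"},
        "records_admin": {"name"},
    },
}

override_log = []


def disclose(indicium, user, role, override_reason=None):
    """Return only the identifying fields the consent register permits."""
    record = entity_database[indicium]
    if override_reason is not None:
        # Assumption: an override releases the whole record and is always logged.
        override_log.append({
            "when": datetime.now(timezone.utc),
            "user": user,
            "role": role,
            "entity": indicium,
            "reason": override_reason,
        })
        return dict(record)
    allowed = consent_register.get(indicium, {}).get(role, set())
    return {field: value for field, value in record.items() if field in allowed}
```

For instance, `disclose(1001, "u1", "records_admin")` yields only the name, while a role absent from the register receives an empty result unless an override reason is supplied.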
Anonymous data is preferably relayed from the data source computing means to the recipient computing means at a different time to when associated identifying information is relayed to the privacy control computing means. Anonymous data may be relayed from the data source computing means to the recipient computing means via data processing computing means. The data processing computing means may store the anonymous data temporarily or permanently, and may process the anonymous data, alter the anonymous data or prepare new anonymous data derived therefrom for use by the recipient computing means, for relaying to the recipient computing means or any other usage on receipt, on demand, periodically or according to any other time pattern. The data processing computing means may be in an insecure or less secure environment than the recipient computing means. For example, healthcare information will often have to be transmitted to a specialised software company for complex data processing. Thus, the data processing computing means may comprise a server under the control of an external organisation. This information can therefore be used for statistical purposes, or any other type of analysis, without the data processing computing means having access to identifying information. Thus, this embodiment allows anonymous data to be processed by third parties without the identity of the party to whom it refers being disclosed.
Preferably, the recipient computing means, privacy control computing means and, where present, data store and/or data processing computing means are each contained in "safe havens", being secure and trusted environments in an organisation or part of an organisation, where there are procedures in place to ensure the safety and secure handling of data.
The recipient computing means may transfer the anonymous data or data derived therefrom matched with identifying information to a second recipient system, typically within a safe haven within which the recipient computing means resides.
The privacy control computing means may determine whether the recipient computing means is authorised to transfer matched data to the second recipient system. Consent to transfer such matched data may also be stored in the consent register. Transmissions between or within computing means may be effected by a secure encrypted link, for additional security.
Any one or more of the recipient computing means or a part thereof such as a reconstruct server if present, the privacy control computing means and, if present, the data source computing means and/or data processing computing means may comprise a respective server.
According to another aspect of the present invention there is provided a privacy control computing means comprising storage means for storing an entity information database in which identifying information relating to a particular entity is stored with reference to an index indicium; authorisation determining means operable to determine whether a recipient computing means or a user thereof is authorised to exchange identifying information relating to an entity for a key; exchange means operable to exchange identifying information relating to an entity for a key with a recipient computing means if authorised; and relation computing means operable to relate a key to an entity.
Thus, for example, the exchange means may relay identifying information relating to an entity to said recipient computing means in response to the receipt of a key related to that entity from said recipient computing means.
Alternatively, the exchange means may relay a key relating to an entity to said recipient computing means responsive to receipt of identifying information related to that entity from said recipient computing means.
Preferably, the relation computing means comprises encryption/decryption computing means operable to encrypt the index indicium to give the key and operable to decrypt the key to yield the index indicium.
According to a further aspect of the present invention, there is provided recipient computing means comprising: means to receive anonymous data including a key, and exchange means operable to exchange the key for identifying information with a privacy control computing means responsive to the receipt of the key, wherein the exchange means is adapted to provide authentication information (such as a userid and password) to the privacy control computing means.
Preferably the recipient computing means further comprises matching means to match the anonymous data with the identifying information.
Preferably the recipient computing means further comprises forwarding means for forwarding the anonymous data and identifying information to a second recipient system.
According to a further aspect of the present invention there is provided a deconstruction computing means comprising receiving means for receiving data relating to an entity including identifying information, exchange means for exchanging the received identifying information for a key relating to the entity with a privacy control computing means, and anonymisation means for preparing anonymous data comprising received data and the key, but not the received identifying information.
The anonymisation means may be operable to prepare anonymous data including a generalisation, approximation or category of received data and/or identifying information.
According to a further aspect of the present invention there is provided data transmission apparatus, the apparatus comprising a privacy control computing means according to the second aspect, recipient computing means according to the third aspect and deconstruction computing means according to the fourth aspect operatively connected to carry out the method of the first aspect.
Although the embodiments of the invention described with reference to the drawings comprise processes performed in computer apparatus and computer networks, and also computing apparatus and computer networks, the invention also extends to computer programs, particularly computer programs on or in a carrier, adapted for carrying out the processes of the invention or for causing a computer to perform as the computer apparatus of the invention. Programs may be in the form of source code, object code, code intermediate between source and object code, such as in partially compiled form, or any other form suitable for use in the implementation of the processes according to the invention. The carrier may be any entity or device capable of carrying the program.
For example, the carrier may comprise a storage medium, such as a ROM, for example a CD ROM or a semiconductor ROM, or a magnetic recording medium, for example a floppy disc or hard disc. Further, the carrier may be a transmissible carrier such as an electrical or optical signal which may be conveyed via electrical or optical cable or by radio or other means. When a program is embodied in a signal which may be conveyed directly by cable, the carrier may be constituted by such cable or other device or means. Alternatively, the carrier may be an integrated circuit in which the program is embedded, the integrated circuit being adapted for performing, or for use in the performance of, the relevant processes.
The computing means may comprise a plurality of separate discontinuously or continuously operatively connected processors or other computing apparatus, with individual method steps carried out on one or more distinct apparatus. Additionally or alternatively, the computing means may comprise computing apparatus controlled by different parties.
Brief Description of the Drawings
Figure 1 is a schematic diagram of computer apparatus forming a healthcare data infrastructure;
Figure 2 is a schematic diagram of an alternative computer apparatus forming a healthcare data infrastructure;
Figure 3 is a schematic diagram of an infrastructure for relaying information about patients in a healthcare system;
Figure 4 is a flow diagram of a process for relaying data in the infrastructure of Figure 3;
Figure 5 is a schematic diagram illustrating the movement of data around the infrastructure of Figure 3;
Figure 6 is a schematic diagram of data records used in the process of Figure 4;
Figure 7 is a flow diagram of a second process for relaying data in the infrastructure of Figure 3; and
Figure 8 is a schematic diagram illustrating the movement of data around part of the infrastructure of Figure 3 by the process of Figure 7.
Detailed Description of Example Embodiments
Example 1
Figure 1 illustrates the components of computer apparatus forming a healthcare infrastructure. One or more control computer systems 1 (functioning as first data processing means) are connected to a plurality of client computers 2 across a network 4. The network is nationwide, with individual client systems being located with healthcare providers, such as in doctors' surgeries, hospital departments, in ambulances and the like, and in mobile form carried by healthcare personnel.
Client computers 2 are used to both submit healthcare information for storage and to retrieve healthcare information relating to particular patients. Control computer system 1 is operatively linked to a database of demographic information (functioning as identifying information storage means) 6, a privacy gate keeper computer system 10 (functioning as key preparation computing means), and a data warehouse 8 (functioning as first data storage means).
Data transmitted across the network 4 can primarily be classified into two types. The first type is filing actions, so-called CRUD (create, record, update, delete) requests. The second type of data transfer is a query request, which should result in the retrieval of information concerning a patient.
Where data is to be submitted for storage, it is transferred by client 2 to the control computer system 1, which establishes an identifier of the patient to which the information refers. This unique patient identifier (PID), such as a patient's National Health Service number, may be included in the information submitted by the client terminal 2. If it is not, then it may be retrieved from the demographic database 6, and the control computer system 1 may interact with the client terminal 2 to enable the client terminal to query the database 6 to determine the PID of the patient to which the information relates.
Next, the identifier of the entity is transmitted to the privacy gate keeper 10. The privacy gate keeper 10 then carries out an irreversible computational process on the patient identifier, or a value derived from the PID, to prepare a patient privacy key (PPK), which is a key unique to a specific patient. The irreversible computing process includes the steps of applying the SHA-256 hash function (defined in ISO/IEC 10118-3:2003(E)) to the PID, followed by triple DES encryption, producing the PPK. Next, the PPK is transmitted back to the control computer system 1.
The control computer system 1 then prepares a data record comprising information which was received from the client terminal 2, except that any information identifying the patient to which it relates (e.g. their name, address etc) is removed. Instead, the key received from the privacy gatekeeper is included in the database record which is then transmitted to the data warehouse 8. Within the data warehouse 8, are stored records comprising the PPK of the person to which it relates, and the data 14, in a format such as XML, or HL7, using otherwise conventional database technology.
In order to retrieve healthcare data from the data warehouse 8, a user of a client 2 inputs the PID of a patient, or identifying information (e.g. name, address) sufficient to enable the PID to be deduced by the control computer system 1 with reference to the database of demographic information. The PID is then submitted by the control computer system to the privacy gate keeper 10. As before, the privacy gate keeper carries out an irreversible computational process on the patient identifier, or a value derived from the patient identifier, and so prepares the same patient privacy key as was used when storing data relating to the same individual in the past.
Next, the PPK is transmitted back to the control computer system which then submits it to the data warehouse 8, along with a request for the desired data concerning the person. Healthcare information relating to the patient with the calculated PPK can therefore be found, using standard database query techniques, and that healthcare information is then transmitted to the client 2.
It is notable that both data storage, and data retrieval methods have used the same irreversible computational process to calculate a PPK from a PID. This means, that it is possible to go from the identity of a person to the healthcare data relating to them. This is an acceptable use of healthcare data, necessary to run a healthcare service.
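The storage and retrieval paths of this example can be sketched as follows. The SHA-256 step is taken from the text; the triple DES step is replaced here by a keyed HMAC-SHA256 solely because the Python standard library has no triple DES, so this is a stand-in rather than the process described. The secret, the field names and the in-memory warehouse are hypothetical.

```python
import hashlib
import hmac
from collections import defaultdict

# Assumption: a secret known only to the privacy gate keeper.
GATEKEEPER_SECRET = b"hypothetical-gatekeeper-secret"

# Assumption: which submitted fields count as identifying information.
IDENTIFYING_FIELDS = {"pid", "name", "address"}

# Stand-in for data warehouse 8: PPK -> list of anonymous records.
warehouse = defaultdict(list)


def make_ppk(pid: str) -> str:
    """Irreversibly derive a patient privacy key from a patient identifier."""
    digest = hashlib.sha256(pid.encode("utf-8")).digest()
    # Keyed step standing in for the triple DES encryption in the text.
    return hmac.new(GATEKEEPER_SECRET, digest, hashlib.sha256).hexdigest()


def store(record: dict) -> None:
    """Strip identifying fields and file the remainder under the PPK."""
    ppk = make_ppk(record["pid"])
    anonymous = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
    warehouse[ppk].append(anonymous)


def retrieve(pid: str) -> list:
    """Recompute the same PPK from the PID and look up the stored records."""
    return warehouse.get(make_ppk(pid), [])


store({"pid": "PID-0001", "name": "Jane Example", "address": "1 Example Street",
       "diagnosis": "asthma"})
```

Calling `retrieve("PID-0001")` recomputes the same PPK and finds the record, whereas the warehouse contents alone contain no PID, name or address from which the patient could be identified.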
However, it is not possible, given the database records in the data warehouse 8, to determine the identity of the person to which they relate. Due to the use of an irreversible computational process to prepare the PPK, there is not a computational process to go back to the identity of the individual given the PPK. This means that parties who require access to the data warehouse 8, or obtain access to the data warehouse 8, cannot relate the information therein to individuals. It is in principle possible to prepare a database of PIDs and PPKs to which they relate, in order to allow controlled matching of records from the data warehouse to the identity of the person to which they relate. Access to this database can be strictly controlled. However, preferably no such database is made.
This allows a very high level of confidence that it is not possible to relate healthcare information to a person's identity; it is only possible to retrieve healthcare data for a specific person. This would not be the case if the PPK were made using a reversible computational process. Even if the reverse process required to go from the PPK to the PID were a decryption process requiring a key which was securely held, the risk that healthcare information could be matched up to the person to whom it relates would be higher.
Example 2
Figure 2 illustrates a further example of computer infrastructure for a healthcare system. The components of the infrastructure are as before, except that there is further provided a secondary-use data warehouse 20, operably linked to the control computer system 1. Secondary-use data warehouse 20 may be accessed by authorised external computer systems 24.
As before, data to be stored can be transmitted by the client computers 2 across the network 4 to a control computer system 1. As before, the PID of the patient to which the received data relates is determined, either because it is included in or deducible from the received data, or can be retrieved by querying the identifying information in the demographic database 6. As before, this PID is exchanged for a PPK prepared using a one-way computational algorithm, with privacy gatekeeper computing system 10, and then, the healthcare data, excluding any identifying information, but including the PPK, is transmitted to the data warehouse 8 for storage. However, in this example, a further process takes place when the PID is submitted to the privacy gatekeeper 10 for preparation of a PPK. Privacy gate keeper 10 assigns a unique identifier to a specified batch or stream of data (hereafter referred to as a message) which is to be stored. This identifier is hereafter referred to as the unique message identifier (UMID). The UMID is sent to the secondary-use database along with the PPK and data which is being sent to the data warehouse 8. Thus, the secondary-use data warehouse 20 stores records including, for each record, both PPK 12, and the relevant healthcare data 14, along with the UMID 22.
When the UMID is allocated, the privacy gate keeper 10 adds a new record to a database 26, consisting of the PID 28 of the person to whom the message relates and data 30 which is the output of an irreversible computational process applied to the UMID.
The secondary-use database 20 is made available to authorised users, who can use it for studies, analysis, etc. As each record includes a PPK, data relating to the same entity can be matched to other data concerning the same entity, allowing queries identifying people who fit complex criteria. However, the PPK cannot be used to determine the identity of the individuals which the data concerns.
However, in this example, the UMID can be used, under strictly controlled conditions to allow data to be related to the patient which it concerns. After receiving authorisation, users of the secondary-use database 20 may submit UMIDs which can be processed using the same irreversible computational process that was used to prepare database 26, then database 26 can be searched and the PID identifying the particular patient returned.
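The UMID mechanism can be sketched as a lookup table that stores only a hash of each UMID against the PID. This is a hedged illustration: the use of SHA-256 for the irreversible process, random UUIDs as UMIDs, the function names and the boolean authorisation flag are all assumptions.

```python
import hashlib
import uuid

# Stand-in for database 26: irreversible output for a UMID -> PID.
reidentification_db = {}


def _hash_umid(umid: str) -> str:
    """Irreversible process applied to the UMID (SHA-256 assumed here)."""
    return hashlib.sha256(umid.encode("utf-8")).hexdigest()


def allocate_umid(pid: str) -> str:
    """Assign a fresh UMID to a message and record its hash against the PID."""
    umid = uuid.uuid4().hex
    reidentification_db[_hash_umid(umid)] = pid
    return umid


def reidentify(umid: str, authorised: bool):
    """Return the PID for a submitted UMID, but only for authorised requests."""
    if not authorised:
        raise PermissionError("re-identification requires authorisation")
    return reidentification_db.get(_hash_umid(umid))


umid = allocate_umid("PID-0001")
```

Because only the hashed UMID is stored, a copy of database 26 on its own does not reveal which UMIDs in the secondary-use warehouse belong to which patient; the UMID must be submitted and hashed again under controlled conditions.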
Thus, data can safely be made available for epidemiological studies, whilst retaining a high level of security.
Data submitted from client computers 2 to the control computer system 1 would typically be stored as quickly as possible in data warehouse 8, as it may be urgent and relate to a current health emergency. Data for the secondary-use database may be stored at a later date, perhaps after further processing and analysis.
Example 3
Figure 3 illustrates in schematic form key components of another example infrastructure 1001 for relaying information about patients in a healthcare system. A source computer system 1002, functioning as a data source computing means, holds both data concerning events and personal identifying information about the person to whom the data relates.
Source computer system 1002 is operatively connected to deconstruct server 1004. Deconstruct server 1004 is operatively connected to event server 1006 which functions as data processing computing means. The event server 1006 is in turn operatively connected to reconstruct server 1008, which along with the event server 1006 constitutes recipient computing means. The reconstruct server 1008 is in turn able to transmit data to a recipient system 1009.
Deconstruct server 1004 and reconstruct server 1008 are both operatively connected to a privacy gatekeeper server 1010, which functions as privacy control computing means. The privacy gatekeeper server 1010 includes master index data store 1012 which stores identifying information relating to patients, stored with reference to an index indicium for each patient. The privacy gatekeeper server 1010 has an encryption/decryption module 1014, functioning as relation computing means, which is operable to prepare an encrypted key from an index indicium and which is also operable to decrypt a received key to yield the index indicium.
Figure 3 also shows boundaries between different organisations 1018. The source computer system 1002 and deconstruct server 1004 are contained within a first safe haven 1020a within which an organisation has procedures in place to control access to data, the privacy gatekeeper server 1010 is contained within a second safe haven 1020b and the reconstruct server 1008 and recipient system 1009 are contained within a third safe haven 1020c. The first, second and third safe havens 1020a, 1020b and 1020c are different parts of a healthcare service, for example a clinic, a computing centre and a hospital accident and emergency department respectively. The event server 1006 is operated by an external organisation, such as a computing consultancy. A national or international infrastructure includes a plurality of first safe havens 1020a, each with source computer systems 1002 and deconstruct servers 1004; a plurality of third safe havens 1020c, each with reconstruct server 1008 and recipient system 1009; and a plurality of event servers 1006. The safe haven 1020b including the privacy gatekeeper server 1010 is centrally located.
Figure 4 is a flow diagram showing the steps that take place in the relaying of information from the source computer system 1002 to the recipient system 1009. Figure 5 shows the concomitant movement of data between components of the infrastructure 1001. Figure 6 shows in block form data records used at various stages of the relaying of information.
Firstly, source computer system 1002 transmits initial data record 1200 (Figure 4) to deconstruct server 1004, within safe haven 1020a (step 1100). Initial data record 1200 includes data concerning events 1202 and also personal identifying information 1204, such as name and date of birth information, about the person to whom the data concerning events 1202 relates.
The deconstruct server 1004 then anonymises the initial data record 1200, producing anonymous data record 1210 (step 1102). To form anonymous data record 1210, personal identifying information 1204 is removed, generalised, approximated or categorised. In this example, the date of birth of a patient is replaced with the age range 1212 into which they fall.
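The anonymisation of step 1102 can be sketched as follows. This is a minimal illustration only, assuming that a record is held as a dictionary and that ages are grouped into ten-year bands; the field names (`name`, `date_of_birth`, `events`), the band width and the function names are hypothetical and are not specified in this description.

```python
from datetime import date
from typing import Tuple


def age_range(date_of_birth: date, today: date, width: int = 10) -> str:
    """Generalise a date of birth into an age band such as '40-49'."""
    age = today.year - date_of_birth.year
    # Subtract one year if this year's birthday has not yet occurred.
    if (today.month, today.day) < (date_of_birth.month, date_of_birth.day):
        age -= 1
    lower = (age // width) * width
    return f"{lower}-{lower + width - 1}"


def deconstruct(initial_record: dict, today: date) -> Tuple[dict, dict]:
    """Split an initial record into (anonymous record, identifying information)."""
    identifying = {
        "name": initial_record["name"],
        "date_of_birth": initial_record["date_of_birth"],
    }
    anonymous = {
        # Event data is carried over unchanged.
        "events": list(initial_record["events"]),
        # The date of birth is replaced by the age range into which it falls.
        "age_range": age_range(initial_record["date_of_birth"], today),
    }
    return anonymous, identifying


record = {
    "name": "Jane Example",
    "date_of_birth": date(1960, 6, 15),
    "events": ["blood pressure check"],
}
anonymous, identifying = deconstruct(record, today=date(2003, 10, 3))
```

In this sketch the anonymous record keeps the event data and the age range, while the name and exact date of birth appear only in the identifying information that is relayed separately to the privacy gatekeeper server.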
The deconstruct server 1004 concomitantly relays the identifying information 1204 to the privacy gatekeeper server 1010 (step 1104). The privacy gatekeeper server 1010 uses the identifying information 1204 to establish the index indicium 1216 for the person to whom initial data record 1200 pertains. This is achieved by querying master index data store 1012 using conventional database searching techniques (step 1106). Figure 6 illustrates part of a database 1218 recorded in master index data store 1012. Database 1218 comprises a plurality of records 1220 for individual patients, referenced by index indicia 1216. Each record 1220 includes identifying information 1222 and consent rules/data 1224 which specify what identifying information 1222 may be disclosed to particular recipient machines or users. By querying the database 1218, using identifying information 1204, the patient can be identified and the index indicium 1216 determined.
The encryption/decryption module 1014 then encrypts the index indicium 1216 to form the key 1214 (step 1108). The key 1214 is not stored in non-volatile memory, but is relayed 1110 immediately to the deconstruct server 1004 where it is included in anonymous data record 1210.
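The reversible preparation of the key 1214 from the index indicium 1216 can be illustrated as below. The description does not name the cipher used by the encryption/decryption module 1014, so this sketch substitutes a four-round Feistel network with HMAC-SHA256 as the round function, purely as a stand-in for a deterministic, reversible procedure; a deployed system would presumably use a vetted cipher instead. It also assumes that the index indicium is a 64-bit unsigned integer. All names (`encrypt_indicium`, `decrypt_key`, `secret`) are hypothetical.

```python
import hashlib
import hmac

ROUNDS = 4
HALF_BITS = 32
HALF_MASK = (1 << HALF_BITS) - 1


def _round(secret: bytes, round_no: int, half: int) -> int:
    """Round function: first 32 bits of HMAC-SHA256 over (round number, half)."""
    message = bytes([round_no]) + half.to_bytes(4, "big")
    digest = hmac.new(secret, message, hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big")


def encrypt_indicium(secret: bytes, indicium: int) -> int:
    """Map a 64-bit index indicium to a 64-bit key (illustrative stand-in cipher)."""
    if not 0 <= indicium < (1 << 64):
        raise ValueError("index indicium must fit in 64 bits")
    left, right = indicium >> HALF_BITS, indicium & HALF_MASK
    for r in range(ROUNDS):
        left, right = right, left ^ _round(secret, r, right)
    return (left << HALF_BITS) | right


def decrypt_key(secret: bytes, key: int) -> int:
    """Invert encrypt_indicium by running the rounds in reverse order."""
    left, right = key >> HALF_BITS, key & HALF_MASK
    for r in reversed(range(ROUNDS)):
        left, right = right ^ _round(secret, r, left), left
    return (left << HALF_BITS) | right


secret = b"gatekeeper-secret"  # hypothetical; held only by the gatekeeper
key = encrypt_indicium(secret, 123456)
assert decrypt_key(secret, key) == 123456
```

Because the mapping is deterministic, the same index indicium always yields the same key, so the key need not be stored by the gatekeeper and can be recreated whenever it is required.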
The privacy gatekeeper server 1010 checks 1112 to see if the identifying information 1204 which is received is different, more up-to-date information than the identifying information 1222 stored for the particular patient, identified by the index indicium 1216. If it is different, more up-to-date information, it updates the identifying information 1222 stored in master index data store 1012 (step 1114). If there is not already a record 1220 relating to the particular patient, a new one is automatically created.
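The lookup, update and automatic creation of records in the master index (steps 1106, 1112 and 1114) might be sketched as follows. This is an in-memory illustration under stated assumptions: patients are matched on a hypothetical `health_service_number` field, index indicia are sequential integers, and any received information that differs from the stored information is treated as more up to date. None of these details is fixed by the description.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class PatientRecord:
    identifying: dict
    consent_rules: dict = field(default_factory=dict)


class MasterIndex:
    """In-memory stand-in for the master index data store."""

    def __init__(self):
        self._records = {}          # index indicium -> PatientRecord
        self._next_indicium = count(1)

    def resolve(self, identifying: dict,
                match_on=("health_service_number",)) -> int:
        """Return the index indicium, updating or creating a record as needed."""
        if any(name not in identifying for name in match_on):
            raise ValueError("identifying information lacks a matching field")
        for indicium, record in self._records.items():
            if all(record.identifying.get(name) == identifying[name]
                   for name in match_on):
                # Assumption: differing received information is newer.
                if record.identifying != identifying:
                    record.identifying.update(identifying)
                return indicium
        # No existing record for this patient: create one automatically.
        indicium = next(self._next_indicium)
        self._records[indicium] = PatientRecord(dict(identifying))
        return indicium

    def record(self, indicium: int) -> PatientRecord:
        return self._records[indicium]


index = MasterIndex()
first = index.resolve({"health_service_number": "H1", "name": "Jane Example"})
again = index.resolve({"health_service_number": "H1", "name": "Jane Sample"})
```

Here the second call finds the existing record, returns the same index indicium and replaces the stored name with the newly received one.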
The deconstruct server 1004 relays 1116 the anonymous data 1210 to the event server 1006. The event server 1006 relays the anonymous data 1210 onward to the reconstruct server 1008 according to an onward relaying schedule (step 1118). The anonymous data 1210 is also stored in an event data repository 1007 for use in data processing, analysis and reporting. The data repository 1007 provides a useful archive of anonymous data 1210. Data processing rules defined in the event server 1006 specify actions, such as automatic processing, or analysis, for instance to gather statistics, which are carried out on received anonymous data 1210. Data distribution management software determines which reconstruct servers 1008 data should be relayed to with reference to configured rules. Once the anonymous data 1210 has been relayed to the reconstruct server 1008, the reconstruct server 1008 relays the key 1214 to the privacy gatekeeper server 1010, along with a request specifying which of the identifying information 1222 it requires for its purpose (step 1120). The encryption/decryption module 1014 then decrypts the key 1214 to yield the index indicium 1216 for the person to whom the anonymous data 1210 relates.
The privacy gatekeeper server 1010 authenticates 1124 the reconstruct server 1008. In this example, a user of the reconstruct server 1008 uses a web browser incorporating secure socket layer (SSL) technology to securely log on to the privacy gatekeeper server 1010, providing a user ID and password allowing their identity to be authenticated, as is well known in the art.
Using the consent rules 1224 pertaining to the patient with the particular index indicium 1216 and the authenticated identity of the user, the privacy gatekeeper server 1010 establishes whether it is authorised to relay the requested identifying information to the reconstruct server 1008 (step 1126). If so, the requested identifying information 1226 is relayed to the reconstruct server 1008 (step 1128). Accordingly, the key 1214 has now been exchanged for the requested identifying information 1226.
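The authorisation decision of step 1126 can be illustrated with a short sketch. The description does not define the structure of the consent rules 1224, so the sketch assumes a hypothetical mapping from a recipient role to the set of identifying fields that may be disclosed to that role, and it treats the request as all-or-nothing. The role and field names are invented for illustration.

```python
def authorised_fields(consent_rules: dict, role: str, requested: list) -> list:
    """Return those requested fields which the consent rules allow for the role."""
    allowed = consent_rules.get(role, set())
    return [name for name in requested if name in allowed]


def release(identifying: dict, consent_rules: dict,
            role: str, requested: list) -> dict:
    """Release the requested identifying information only if all of it is permitted."""
    permitted = authorised_fields(consent_rules, role, requested)
    if len(permitted) != len(requested):
        raise PermissionError("consent rules do not permit this disclosure")
    return {name: identifying[name] for name in requested}


# Hypothetical consent rules for one patient.
rules = {
    "accident_and_emergency": {"name", "date_of_birth"},
    "research": set(),
}
patient = {
    "name": "Jane Example",
    "date_of_birth": "1960-06-15",
    "address": "1 High Street",
}
disclosed = release(patient, rules, "accident_and_emergency", ["name"])
```

A request from the hypothetical `research` role for the same field would raise `PermissionError`, so no identifying information would be relayed in exchange for the key.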
Finally, the reconstruct server prepares a reconstructed data record 1228 comprising data 1230 from the source computer system 1002 and the requested identifying information 1226 (step 1130). This reconstructed data record 1228 is then available for relaying onwards to recipient system 1009 (step 1132).
By the above process, data has been relayed from the source computer system 1002 to the recipient system 1009. The anonymous data 1210 and identifying information 1204 were sent separately and could only be recombined by exchanging the key 1214 for the requested identifying information 1226 after authentication. All point-to-point transfers of data are by 128-bit triple-DES encrypted links as an additional layer of security. If an unauthorised party accesses the anonymous data 1210, they cannot identify to whom it relates. Even if the same unauthorised party finds a way to access the data stored in the master index data store 1012, they still cannot match the anonymous data 1210 and the information identifying to whom it relates, as the anonymous data 1210 does not contain the index indicium 1216, but only an encrypted key 1214.
Figure 7 is a flow diagram showing the procedural steps in a related method carried out on the same apparatus, for establishing which data relates to a person, given identifying information pertaining to that person. Figure 8 illustrates the resulting flow of data.
Firstly, a user is authenticated (step 1300). In authentication, a user of the reconstruct server 1008 logs in to the privacy gatekeeper server 1010 using a web browser with a secure socket layer, providing a userid and password. If the user is authenticated (step 1302), the user submits identifying information, such as a name or health service number of a patient, which is then relayed from the reconstruct server 1008 to the privacy gatekeeper server 1010 (step 1304).
The privacy gatekeeper server 1010 then retrieves the index indicium 1216 for the person whose identifying information has been submitted (step 1306). This is carried out by conventional database searching techniques.
The encryption/decryption module 1014 then encrypts the index indicium 1216 to form a key 1214 which is then relayed to the reconstruct server 1008 (step 1310). Thus, the reconstruct server 1008 and privacy gatekeeper server 1010 have exchanged the key 1214 and the identifying information.
The reconstruct server 1008 is now enabled to match anonymous data 1210 containing the retrieved key 1214 with identifying information. In an example embodiment including a plurality of disparate event servers 1006, the reconstruct server 1008 now requests data from each event server 1006 by supplying the key 1214 to each server 1006 (step 1312). Event servers 1006 having anonymous data 1210 including the key 1214 in their data repositories 1007 then relay that anonymous data to the reconstruct server 1008 for matching to the identifying information. Thus, a user can retrieve anonymous data 1210 relating to a particular individual.
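The retrieval of step 1312 can be sketched as a query of each event data repository by key. This is a minimal in-memory illustration; the class and function names, the `key` field and the shape of the matched records are assumptions made for the example rather than details taken from the description.

```python
class EventRepository:
    """In-memory stand-in for an event data repository of anonymous records."""

    def __init__(self):
        self._records = []

    def store(self, anonymous_record: dict) -> None:
        self._records.append(anonymous_record)

    def find_by_key(self, key) -> list:
        """Return all stored anonymous records which include the given key."""
        return [r for r in self._records if r["key"] == key]


def collect_records(repositories, key, identifying: dict) -> list:
    """Query every repository for the key and match results to the identity."""
    matched = []
    for repository in repositories:
        for record in repository.find_by_key(key):
            data = {k: v for k, v in record.items() if k != "key"}
            matched.append({"identifying": identifying, "data": data})
    return matched


clinic_events = EventRepository()
hospital_events = EventRepository()
clinic_events.store({"key": "K1", "event": "blood pressure check"})
clinic_events.store({"key": "K2", "event": "vaccination"})
hospital_events.store({"key": "K1", "event": "x-ray"})

results = collect_records(
    [clinic_events, hospital_events], "K1", {"name": "Jane Example"}
)
```

Only records carrying the supplied key are returned, and the event repositories never see the identifying information; the matching takes place at the reconstruct server.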
A national or international infrastructure can be implemented, including deconstruct servers in locations such as clinics where patient data is created; reconstruct servers in locations such as doctors' or dental surgeries, hospital departments and ambulances, where patient data matched to patient identifying information is required; and event servers to store anonymous data for processing. The privacy gatekeeper server 1010 is located centrally where it can be carefully regulated. Consent rules 1224 stored in the master index data store 1012 are created with reference to default settings and in response to patient questionnaires.
All communication between servers is over HTTP or HTTPS ports. Data (including anonymous data and identifying information) is formatted according to the XML format, well known in the art. Servers implement SOAP (Simple Object Access Protocol; SOAP is a trademark of Microsoft Corporation, Redmond, WA, USA) to minimise the complexity of organisational firewalls. The reconstruct server or a user thereof is authenticated for the purpose of determining authorisation to match the anonymous data to identifying information by means of the HTTPS protocol.
In particular, initial data record 1200 is transmitted to deconstruct server 1004 via SOAP over HTTP (step 1100). Similarly, reconstructed data record 1228 is relayed 1132 to recipient system 1009 via SOAP over HTTP. However, the following data is relayed via SOAP over HTTPS: anonymous data 1210 from deconstruct server 1004 to event server 1006; anonymous data 1210 from event server 1006 to reconstruct server 1008; identifying information 1204 from deconstruct server 1004 to privacy gatekeeper server 1010 and identifying information 1204 from privacy gatekeeper server 1010 to reconstruct server 1008.
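By way of illustration, an anonymous data record might be wrapped in a SOAP envelope as sketched below. The envelope namespace is that of SOAP 1.1; the body element name `AnonymousDataRecord`, the child element names and the function name are hypothetical, since the description does not define a message schema. The sketch only builds the XML payload and does not perform the HTTP or HTTPS transfer.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace


def build_envelope(anonymous_record: dict) -> bytes:
    """Serialise an anonymous record as the body of a SOAP 1.1 envelope."""
    ET.register_namespace("soap", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    # Hypothetical element name; no schema is given in the description.
    record = ET.SubElement(body, "AnonymousDataRecord")
    for name, value in anonymous_record.items():
        child = ET.SubElement(record, name)
        child.text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


payload = build_envelope(
    {"key": "3f9a", "age_range": "40-49", "event": "blood pressure check"}
)
```

The resulting bytes would form the body of an HTTP POST for the hops which use plain HTTP, or would be sent over HTTPS for the hops listed above as requiring it.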
Benefits of the example embodiment include:
1. The data repositories 1007 provide copies of all anonymous data sent from deconstruct servers to reconstruct servers, which can be used for analysis and reporting.
2. The event server 1006 can be an external organisation, allowing secure, confidential processing of anonymous data at the premises of third party IT consultants.
3. No personal identifying information is stored or handled by any system outside of a safe haven, and in particular identifying information is not stored or handled by event servers.
4. It is not possible to reconcile anonymous data 1210 with identifying information without recourse to the privacy gatekeeper server. Indeed, even with access to the master index data store 1012, anonymous data 1210 cannot be matched to identifying information without recourse to the encryption/decryption module 1014, the key to which is stored securely.
5. Consent rules 1224 can be personalised for each patient, allowing patient control of their personal data.
Further alterations and modifications can be made by one skilled in the art within the scope of the invention herein disclosed.

Claims

1. A method of storing information concerning an entity, the method comprising the steps executed by computing means of:
receiving information concerning an entity;
receiving (with or separately to the information concerning an entity) an identifier of the entity which the information concerns;
causing key preparation computing means to prepare a key dependent on the identifier of the entity using an irreversible computational procedure; and
storing, in first data storage means, relatably to the key prepared by the key preparation computing means, some or all of the received information concerning the entity, but not any information identifying the entity.
2. A method according to claim 1, wherein the irreversible computational procedure includes a hash step.
3. A method according to claim 1 or claim 2, wherein the data stored in the first data storage means comprises records, the records comprising the information concerning the entity and the key prepared by the key preparation computing means, but not any information identifying the entity.
4. A method according to any one preceding claim, wherein the information concerning an entity is healthcare data relating to a person.
5. A method according to any one preceding claim, further comprising the step of receiving information concerning the identity of an entity, and retrieving the identifier of the entity from an identifying information storage means.
6. A method according to any one preceding claim, during which the key is not stored in a non-volatile storage means other than in the first data storage means.
7. A method according to any one preceding claim wherein the first data storage means stores medical information in an unencrypted form.
8. A method according to any one preceding claim, wherein the method further comprises the step of generating a data identifier of at least some of the information concerning an entity, storing that data identifier in a second data storage means relatably to the information concerning the entity and the key dependent on the index indicium, and storing in a data identifier database an identifier of the entity which the information concerns and a value determined from the data identifier using an irreversible computational procedure.
9. A method of resolving the identity of an entity which data stored by the method of claim 8 concerns, comprising the steps of retrieving from the second data storage means a data identifier related to information concerning the entity, determining a value from the data identifier using the irreversible computational procedure and then searching the data identifier database for the corresponding identifier of the entity.
10. Computer apparatus for storing and retrieving information concerning entities, the apparatus comprising:
means to receive information concerning an entity;
key preparation computing means operable to receive an identifier of an entity which received information concerns, and to prepare a key dependent on the identifier of the entity using an irreversible computational procedure; and
first data storage means, configured to store some or all of the received information, but not any information identifying the entity which the information concerns, relatably to the key prepared by the key preparation computing means.
11. Computer apparatus according to claim 10, further comprising computing means configured to ensure that the information concerning the entity which is stored in the first data storage means does not include information identifying the entity.
12. Apparatus according to claim 10 or claim 11, wherein the first data processing means is configured to receive the information concerning an entity, transmit some or all of the information concerning the entity to the first data storage means, and cause the key preparation computing means to transmit a key to the first data storage means for storage relatably to the information concerning the entity which is transmitted to the first data storage means.
13. Computer apparatus according to any one of claims 10 to 12 further comprising an identifying information database including identifying data concerning a plurality of entities, relatably to the identifier of the entity.
14. Computer apparatus according to any one of claims 10 to 13, wherein the information concerning an entity is healthcare data concerning a person.
15. Computer apparatus according to any one of claims 10 to 14, further comprising data identifier generation means operable to generate a data identifier of at least some of the information concerning an entity; second data storage means configured to store the data identifier relatably to the at least some of the information concerning the entity; and means for storing in a data identifier database an identifier of the entity which the at least some of the information concerns and a value determined from the data identifier using an irreversible computational procedure.
16. Computer apparatus according to claim 15 wherein the data identifier generation means and the means for storing in a data identifier database are comprised within the key preparation computing means.
17. Computer apparatus according to claim 15 or claim 16, wherein the second data storage means is configured to store the key dependent on the index indicium relatably also to the information concerning the entity.
18. Computer apparatus for storing information concerning entities, the apparatus comprising:
means to receive information concerning an entity;
data identifier generation means operable to generate a data identifier of the information concerning the entity;
data storage means configured to store the data identifier relatably to the information concerning the entity, but not any information identifying the entity; and
means for storing in a data identifier database an identifier of the entity which the at least some of the information concerns and a value determined from the data identifier using an irreversible computational procedure.
19. A method of storing information concerning an entity, the method comprising the steps executed by computing means of:
receiving information concerning an entity;
receiving (with or separately to the information concerning an entity) an identifier of the entity which the information concerns;
generating a data identifier of the information concerning the entity;
storing in data storage means the data identifier relatably to the information concerning the entity, but not any information identifying the entity; and
storing in a data identifier database an identifier of the entity which the information concerns and a value determined from the data identifier using an irreversible computational procedure.
20. A method of transferring data related to an entity and identifying information identifying the entity, the method comprising the steps of:
ensuring that the data does not include said identifying information, and is hence anonymous, but does include a key, the key being relatable to identifying information identifying the entity to which the data relates by privacy control computing means;
receiving the anonymous data at a recipient computing means;
before, during or after said receiving, determining whether the recipient computing means or a user thereof is authorised to match the data to the identifying information and, if so,
exchanging the key and the identifying information between the privacy control computing means and the recipient computing means separately from said data; thereby enabling the recipient computing means to match the anonymous data and the identifying information identifying said entity.
21. A method according to claim 20 wherein the key is prepared by privacy control computing means from an identifier of the entity using a reversible computational procedure.
22. A method according to claim 20 wherein the key is prepared by privacy control computing means from an identifier of the entity using an irreversible computational procedure.
23. A method according to any one of claims 20 to 22, wherein the same key is used consistently in relation to the same entity.
24. A method according to any one of claims 20 to 23, wherein the step of exchanging the key and the identifying information takes place before the receipt of the anonymous data, including the key.
25. A method according to any one of claims 20 to 23, wherein the step of exchanging the key and the identifying information takes place after the receipt of the anonymous data, including the key.
26. A method according to any one of claims 20 to 25, wherein the key is relayed from the recipient computing means to the privacy control computing means after the recipient computing means has received said data, the privacy control computing means then relating the key to the entity and then relaying identifying information concerning that entity to the recipient computing means.
27. A method according to any one of claims 20 to 25, wherein the recipient computing means first receives identifying information about an entity and then, if authorised, exchanges that identifying information for the key relating to the same entity with the privacy control computing means.
28. A method according to claim 27 wherein the recipient computing means queries a plurality of data storage means for anonymous data including the key, and then retrieves that anonymous data.
29. A method according to any one of claims 20 to 28, wherein matching the anonymous data to the identifying information includes the step of preparing a further data record including both the anonymous data and the identifying information at a reconstruct server.
30. A method according to any one of claims 20 to 29, wherein the privacy control computing means is operably connected to an entity information database which contains records of identifying information concerning a plurality of entities.
31. A method according to any one of claims 20 to 30, wherein each entity is referenced by an index indicium for an index of identifying information retained on the privacy control computing means and the anonymous data is incapable of being matched with the identifying information without reference to the index, and thus without access to the privacy control computing means.
32. A method according to claim 31 wherein the indicium is encrypted by the privacy control computing means to create the key.
33. A method according to any one of claims 20 to 32, wherein the key is never stored in non-volatile storage means by the privacy control computing means, but is created only when required.
34. A method according to any one of claims 20 to 33, wherein the information exchanged for the key by the deconstruction means and the recipient computing means may be different.
35. A method according to any one of claims 20 to 34, wherein the entities are patients of a healthcare system.
36. A method according to any one of claims 20 to 35, wherein the anonymous data includes data derived from identifying information in the data source computing means.
37. A method according to any one of claims 20 to 36, wherein determining whether the recipient computing means or a user thereof is authorised to match the data to the identifying information takes place with reference to a consent register which includes data and/or rules specific to individual entities or groups of entities.
38. A method according to any one of claims 20 to 37, wherein anonymous data is relayed from the data source computing means to the recipient computing means at a different time to when associated identifying information is relayed to the privacy control computing means.
39. A privacy control computing means comprising storage means for storing an entity information database in which identifying information relating to a particular entity is stored with reference to an index indicium; authorisation determining means operable to determine whether a recipient computing means or a user thereof is authorised to exchange identifying information relating to an entity for a key; exchange means operable to exchange identifying information relating to an entity for a key with a recipient computing means if authorised; and relation computing means operable to relate a key to an entity.
40. Recipient computing means comprising: means to receive anonymous data including a key, and exchange means operable to exchange the key for identifying information with a privacy control computing means responsive to the receipt of the key, wherein the exchange means is adapted to provide authentication information (such as a userid and password) to the privacy control computing means.
41. Deconstruction computing means comprising receiving means for receiving data relating to an entity including identifying information, exchange means for exchanging the received identifying information for a key relating to the entity with a privacy control computing means, and anonymisation means for preparing anonymous data comprising received data and the key, but not the received identifying information.
42. Data transmission apparatus, the apparatus comprising a privacy control computing means according to claim 39, recipient computing means according to claim 40 and deconstruction computing means according to claim 41 operatively connected to carry out the method of claim 20.
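The storage method of claims 1 and 2 and the data identifier scheme of claims 8 and 9 can be illustrated with the following sketch. It is a simplified in-memory illustration under stated assumptions: HMAC-SHA256 stands in for the irreversible procedure that prepares the key, plain SHA-256 stands in for the irreversible procedure applied to the data identifier, and data identifiers are random UUIDs. The class, method and variable names are hypothetical.

```python
import hashlib
import hmac
import uuid


def prepare_key(secret: bytes, entity_identifier: str) -> str:
    """Irreversible key: keyed hash (HMAC-SHA256) of the entity identifier."""
    return hmac.new(
        secret, entity_identifier.encode("utf-8"), hashlib.sha256
    ).hexdigest()


def hash_data_identifier(data_identifier: str) -> str:
    """Irreversible value determined from a data identifier (SHA-256)."""
    return hashlib.sha256(data_identifier.encode("utf-8")).hexdigest()


class Stores:
    """In-memory stand-ins for the storage means recited in the claims."""

    def __init__(self, secret: bytes):
        self._secret = secret
        self.first_store = {}          # key -> list of information items
        self.second_store = {}         # data identifier -> (information, key)
        self.data_identifier_db = {}   # hashed data identifier -> entity identifier

    def store(self, entity_identifier: str, information: dict) -> str:
        """Store information relatably to the key; return the data identifier."""
        key = prepare_key(self._secret, entity_identifier)
        self.first_store.setdefault(key, []).append(information)
        data_identifier = uuid.uuid4().hex
        self.second_store[data_identifier] = (information, key)
        self.data_identifier_db[hash_data_identifier(data_identifier)] = (
            entity_identifier
        )
        return data_identifier

    def resolve_identity(self, data_identifier: str) -> str:
        """Hash the data identifier and look up the corresponding entity."""
        return self.data_identifier_db[hash_data_identifier(data_identifier)]


stores = Stores(secret=b"key-preparation-secret")  # hypothetical secret
data_id = stores.store("patient-42", {"event": "blood pressure check"})
```

In this sketch neither the first nor the second store contains the entity identifier, and the data identifier database holds only hashed data identifiers, so the identity can be resolved only by a party that holds both a data identifier and access to that database.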
PCT/GB2003/004262 2002-10-03 2003-10-03 Method and apparatus for secure data storage WO2004031922A2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
AU2003274302A AU2003274302A1 (en) 2002-10-03 2003-10-03 Method and apparatus for secure data storage

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
GB0222896A GB0222896D0 (en) 2002-10-03 2002-10-03 Method of and apparatus for transferring data
GB0222896.3 2002-10-03

Publications (2)

Publication Number Publication Date
WO2004031922A2 true WO2004031922A2 (en) 2004-04-15
WO2004031922A3 WO2004031922A3 (en) 2004-09-16

Family

ID=9945210

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2003/004262 WO2004031922A2 (en) 2002-10-03 2003-10-03 Method and apparatus for secure data storage

Country Status (3)

Country Link
AU (1) AU2003274302A1 (en)
GB (1) GB0222896D0 (en)
WO (1) WO2004031922A2 (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006111205A1 (en) 2005-04-22 2006-10-26 Daon Holdings Limited A system and method for protecting the privacy and security of stored biometric data
US7522751B2 (en) 2005-04-22 2009-04-21 Daon Holdings Limited System and method for protecting the privacy and security of stored biometric data
WO2017102390A1 (en) * 2015-12-16 2017-06-22 Cbra Genomics, S.A. Genome query handling
FR3067158A1 (en) * 2017-06-01 2018-12-07 Ineo METHOD FOR PROCESSING HYBRID DIGITAL DATA
US11688015B2 (en) 2009-07-01 2023-06-27 Vigilytics LLC Using de-identified healthcare data to evaluate post-healthcare facility encounter treatment outcomes

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5606610A (en) * 1993-11-30 1997-02-25 Anonymity Protection In Sweden Ab Apparatus and method for storing data
EP0884670A1 (en) * 1997-06-14 1998-12-16 International Computers Limited Secure database
WO2001018631A1 (en) * 1999-09-02 2001-03-15 Medical Data Services Gmbh Method for anonymizing data

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006111205A1 (en) 2005-04-22 2006-10-26 Daon Holdings Limited A system and method for protecting the privacy and security of stored biometric data
US7522751B2 (en) 2005-04-22 2009-04-21 Daon Holdings Limited System and method for protecting the privacy and security of stored biometric data
AU2005330619B2 (en) * 2005-04-22 2011-08-11 Daon Technology A system and method for protecting the privacy and security of stored biometric data
US11688015B2 (en) 2009-07-01 2023-06-27 Vigilytics LLC Using de-identified healthcare data to evaluate post-healthcare facility encounter treatment outcomes
WO2017102390A1 (en) * 2015-12-16 2017-06-22 Cbra Genomics, S.A. Genome query handling
US10726155B2 (en) 2015-12-16 2020-07-28 Cbra Genomics, S.A. Genome query handling
FR3067158A1 (en) * 2017-06-01 2018-12-07 Ineo METHOD FOR PROCESSING HYBRID DIGITAL DATA

Also Published As

Publication number Publication date
AU2003274302A8 (en) 2004-04-23
GB0222896D0 (en) 2002-11-13
AU2003274302A1 (en) 2004-04-23
WO2004031922A3 (en) 2004-09-16

Legal Events

Date Code Title Description
AK Designated states

Kind code of ref document: A2

Designated state(s): AE AG AL AM AT AU AZ BA BB BG BR BY BZ CA CH CN CO CR CU CZ DE DK DM DZ EC EE EG ES FI GB GD GE GH GM HR HU ID IL IN IS JP KE KG KP KR KZ LC LK LR LS LT LU LV MA MD MG MK MN MW MX MZ NI NO NZ OM PG PH PL PT RO RU SC SD SE SG SK SL SY TJ TM TN TR TT TZ UA UG US UZ VC VN YU ZA ZM ZW

AL Designated countries for regional patents

Kind code of ref document: A2

Designated state(s): GH GM KE LS MW MZ SD SL SZ TZ UG ZM ZW AM AZ BY KG KZ MD RU TJ TM AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HU IE IT LU MC NL PT RO SE SI SK TR BF BJ CF CG CI CM GA GN GQ GW ML MR NE SN TD TG

121 Ep: the epo has been informed by wipo that ep was designated in this application
122 Ep: pct application non-entry in european phase
NENP Non-entry into the national phase in:

Ref country code: JP

WWW Wipo information: withdrawn in national office

Country of ref document: JP