WO2020212611A1 - Method and system for transmitting combined parts of distributed data - Google Patents

Method and system for transmitting combined parts of distributed data

Info

Publication number
WO2020212611A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
server
user
data element
user device
Prior art date
Application number
PCT/EP2020/060927
Other languages
French (fr)
Inventor
Baher AL HAKIM
Bassel ALKHATIB
Makram SALEH
Mouhamad KAWAS
Rafael VARTIAN
Firas ATAYA
Hazem ATAYA
Original Assignee
Medicus Ai Gmbh
Application filed by Medicus Ai Gmbh

Classifications

    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H40/00 - ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices
    • G16H40/20 - ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the management or administration of healthcare resources or facilities, e.g. managing hospital staff or surgery rooms
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 - Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 - Protecting data
    • G06F21/62 - Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 - Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 - Protecting personal data, e.g. for financial or medical purposes
    • G06F21/6254 - Protecting personal data, e.g. for financial or medical purposes by anonymising data, e.g. decorrelating personal data from the owner's identification
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H40/00 - ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices
    • G16H40/60 - ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the operation of medical equipment or devices
    • G16H40/67 - ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the operation of medical equipment or devices for remote operation
    • G - PHYSICS
    • G16 - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H - HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H80/00 - ICT specially adapted for facilitating communication between medical practitioners or patients, e.g. for collaborative diagnosis, therapy or health monitoring
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 - User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/07 - User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail characterised by the inclusion of specific contents
    • H04L51/10 - Multimedia information
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00 - User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/42 - Mailbox-related aspects, e.g. synchronisation of mailboxes
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 - Network architectures or network communication protocols for network security
    • H04L63/04 - Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0407 - Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the identity of one or more communicating identities is hidden
    • H04L63/0421 - Anonymous communication, i.e. the party's identifiers are hidden from the other party or parties, e.g. using an anonymizer
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 - Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 - Protecting data
    • G06F21/606 - Protecting data by securing the transmission between two devices or processes
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 - Network architectures or network communication protocols for network security
    • H04L63/04 - Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks
    • H04L63/0428 - Network architectures or network communication protocols for network security for providing a confidential data exchange among entities communicating through data packet networks wherein the data content is protected, e.g. by encrypting or encapsulating the payload
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 - Network arrangements or protocols for supporting network services or applications
    • H04L67/01 - Protocols
    • H04L67/12 - Protocols specially adapted for proprietary or special-purpose networking environments, e.g. medical networks, sensor networks, networks in vehicles or remote metering networks

Definitions

  • the invention relates to data analysis for privacy-sensitive or otherwise confidential data on distributed systems.
  • selective information delivery to users is usually performed by applying the selection criteria centrally in order to extract the contact data, e.g. postal or electronic addresses, of the people to contact or to deliver information to.
  • the information is then transmitted to the user, using the extracted contact data.
  • the procedure is equivalent: Even if information is masked or removed at the information source in order to protect the user's privacy or to protect data that may be confidential in any way, this must be performed at the data-requesting party, the data source or in between.
  • the user often has no inherent control of the use of their data, but they have to trust entities holding their data (e.g. a hospital) to ensure the protection of their privacy when they provide their contact data or when they deliver other information such as health data to third parties. Furthermore, even if privacy-sensitive or confidential data is masked or removed, it is at least once available together with the finally requested information (before being removed).
  • a supplementary problem in this context is the storage of medical data.
  • Medical data is usually saved either in a distributed way, so that every health care provider holds its own data concerning a patient, or centrally. In the distributed case, retrieving information that is not comprehensively stored at one single health care provider requires matching this data across providers. If all patient data is stored centrally instead, protecting the patient's privacy is difficult, as the data is available at least to the processing unit belonging to the central storage system.
  • US20090150362A1 discloses a double-blinded, privacy-safe distributed data mining protocol among an aggregator, a data consumer entity having privacy-sensitive information, and data source entities having privacy-sensitive information. The aggregator does not have access to the privacy-sensitive information at either the data consumer entity or the data source entities.
  • the aggregator formulates a query without using privacy-sensitive information, and sends the query to the data consumer entity.
  • the data consumer entity generates a list of specific instances that meet the conditions of the query and sends the list, encrypted, to the data source entities either directly or through the aggregator.
  • the data source entities match the list against transactional data, de-identify the matched results, and send them to the aggregator.
  • the aggregator combines results from data source entities and sends the combined result to the data consumer entity. This allows for privacy-safe data mining where both the data consumer entity and data source entities have privacy-sensitive information not available for the aggregator to see or use.
  • US20020116227A1 discloses a method for searching for medical information executed by one or more computers comprising the steps of formulating a request for medical information concerning an individual or group of individuals, transmitting a record request to a record facilitator, the record facilitator determining which patient record sources to investigate, a record query being sent from the facilitator to the patient record sources which are appropriate, receiving a patient record report back from the patient record sources, normalizing and augmenting the patient record report before forwarding it back to the requester, and de-identifying the patient record to remove any identifying information.
  • US7823207B2 discloses a privacy preserving data-mining protocol, between a secure "aggregator" and "sources" having respective access to privacy-sensitive micro-data, the protocol including: the "aggregator" accepting a user query and transmitting a parameter list for that query to the "sources" (often including privacy-problematic identifiable specifics to be analyzed); the "sources" then forming files of privacy-sensitive data-items according to the parameter list and privacy filtering out details particular to less than a predetermined quantity of micro-data-specific data-items; and the "aggregator" merging the privacy-filtered files into a data-warehouse to formulate a privacy-safe response to the user, even though the user may have included privacy-problematic identifiable specifics.
  • US20090326981A1 provides a system and/or a method that facilitates collecting a portion of health data from a collection of users.
  • An interface component can receive health data communicated from a collection of users, wherein each user within the collection is associated with a respective portion of health data.
  • a verification component can authenticate at least one of a transmission source of the portion of health data, an ownership between a portion of health data and a user, an integrity level associated with the portion of health data, or a user submitting the portion of health data.
  • a collection component can aggregate authenticated health data into a semantic data store in which the health data is indicative of a raw and unmolested source of health information from the collection of users. The collection component can further organize the health data to facilitate identification of a medical related trend.
  • US20060069957A1 discloses a communication system processing element that comprises a processor coupled to a memory and implements at least a portion of a distributed expert system.
  • the distributed expert system is arranged in at least two hierarchical levels, including an upper level comprising a central controller, and a lower level comprising a plurality of local agents each associated with one or more communication devices of the system.
  • US6397224B1 discloses a system for anonymously linking a plurality of data records, each data record comprising a plurality of elements for identifying an associated individual. The system includes a first identity reference encoding module configured to encode a first encoded identity reference from a first subset of the identifying elements of a data record; a second identity reference encoding module configured to encode a second encoded identity reference from a second subset of the identifying elements of the data record; and an anonymization code assignment module configured to assign to each of the first and second encoded identity references an identical anonymization code for anonymously representing the individual associated with the data record.
  • US7543149B2 discloses a method for securing patient identity comprising accessing an electronic medical records database including patient data for a plurality of patients. Each patient in the electronic medical records database is assigned a unique patient identifier. Patient data for a first patient, including a first patient identifier, is retrieved from the electronic medical records database. The first patient is de-identified from the patient data. De-identifying includes the creation of a first encoded patient identifier responsive to the first patient identifier. The de-identifying results in de-identified first patient data and includes the replacement of the first patient identifier with the first encoded patient identifier. The de-identified first patient data is transmitted to a data warehouse system. The method further comprises identifying a second patient in response to receiving report data that includes a second encoded patient identifier from the data warehouse system. The identifying includes the creation of a second patient identifier responsive to the second encoded patient identifier.
  • While the prior art approaches may be satisfactory in some regards, they have certain shortcomings and disadvantages. More specifically, the prior art discloses storing multiple patient records or parts thereof either at single health care providers or at a central entity. In the former case of records that are saved in a distributed way, the data needs to be matched by patient, especially if a comprehensive medical record (or excerpts or derived information that is not saved at one health care provider) is needed. In the latter case of a central data storage, the data is available to the central entity in any case. Also, the patient has no control of their data and must trust the operator of the storage units, whether the data is stored centrally or locally at each health care provider. The user has no means to directly control their data at these storage entities.
  • the patient or user does not control the data storage entity. Furthermore, their data is available to the storage entity and the storage entity will technically govern the access of third parties to the patient's data.
  • the storage entity or entities can be all entities that hold patient data: health care providers, central registers, employers as far as they need to keep a medical record, etc., also including IT providers operating transmission, processing or storing services, hardware or systems for the aforementioned entities.
  • a method for sending combined parts of distributed data from user devices to at least one recipient comprises a device data storing step that comprises for each of at least one user device, storing user data relating to the respective user device on said respective user device.
  • the method also comprises a device sending step that comprises sending at least one data set from the at least one user device to a server.
  • the method further comprises a server receiving step that comprises receiving the at least one data set by the server.
  • the method also comprises a server packaging step that comprises combining data elements of the at least one received data set to at least one data container.
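The four steps above (device data storing, device sending, server receiving, server packaging) can be illustrated with a minimal sketch. All class names, method names and field names below are hypothetical and chosen only for illustration; the data container is modelled as a plain list of rows, which is an assumption rather than anything the text prescribes.

```python
from dataclasses import dataclass, field


@dataclass
class UserDevice:
    """Keeps user data locally (device data storing step)."""
    device_id: str
    user_data: dict = field(default_factory=dict)

    def send(self, fields):
        # Device sending step: the data set is a selection of the local data.
        return {name: self.user_data[name] for name in fields if name in self.user_data}


@dataclass
class Server:
    received: list = field(default_factory=list)

    def receive(self, data_set):
        # Server receiving step.
        self.received.append(data_set)

    def package(self, fields):
        # Server packaging step: combine data elements of the received
        # data sets into one data container (here: a list of rows).
        return [{f: ds[f] for f in fields if f in ds} for ds in self.received]


devices = [
    UserDevice("a", {"height_cm": 172, "glucose_mg_dl": 95}),
    UserDevice("b", {"height_cm": 181, "glucose_mg_dl": 110}),
]
server = Server()
for device in devices:
    server.receive(device.send(["height_cm", "glucose_mg_dl"]))
container = server.package(["glucose_mg_dl"])
```

Note that in this sketch the device identifier never leaves the device; only the selected data elements are sent.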
  • the user device may comprise a user terminal and/or a portable computing device such as a smartphone, a tablet, a laptop, a wearable device and/or the like.
  • the user devices may comprise a data storage, at least one processing component configured to execute a program in a suitable form and format and a communication component at least configured to communicate with a remote server.
  • the processing component can for example comprise processors, hardware accelerators and/or microcontrollers.
  • the data storage can comprise memory components, such as, main memory (e.g. RAM), cache memory (e.g. SRAM) and/or secondary memory (e.g. HDD, SSD).
  • the user devices can comprise busses configured to facilitate data exchange between their components, such as, the communication between the data storage and the processing component.
  • the user device can comprise network interface controllers that can be configured to connect the data processing device to a network, such as, to the Internet.
  • the server can comprise a single server, a server system composed of multiple servers, and/or a program emulating the functionality of a server, running on a cloud computing platform or any system configured to implement the functionality of a server.
  • the server can comprise means of data processing, such as, processor units, hardware accelerators and/or microcontrollers.
  • the server can comprise memory components, such as, main memory (e.g. RAM), cache memory (e.g. SRAM) and/or secondary memory (e.g. HDD, SSD).
  • the server can comprise busses configured to facilitate data exchange between components of the server, such as, the communication between the memory components and the processing components of the server.
  • the server can comprise network interface cards that can be configured to connect the server to a network, such as, to the Internet.
  • the server can comprise user interfaces, such as:
  • output user interface such as screens or monitors configured to display visual data and/or speakers configured to communicate audio data
  • input user interface such as a camera, a microphone configured to capture audio data, a keyboard, a trackpad, mouse, touchscreen and/or joystick.
  • the server can also be configured to be controlled from another computer system, such as via a remote-desktop connection, via a secure shell connection (SSH) or the like.
  • the server can be a processing unit configured to carry out instructions of a program.
  • the server can be a system-on-chip comprising processing units, memory components and busses.
  • the server can be a processing unit or a system-on-chip that can be interfaced with a personal computer, a laptop, a pocket computer, a smartphone, a tablet computer and/or user interfaces (such as the above-mentioned user interfaces).
  • User data may comprise any data related to the user of the user device. Multiple users may also be associated with one user device, where each individual user would then have a unique "user profile" or the like.
  • user data comprises, at least partially, medical data associated with the user. This can comprise results of various medical tests or procedures, diagnoses, measurements from fitness tracking or medical devices and the like.
  • the present method may advantageously allow parts of sensitive user data (such as for example medical data) to be shared securely with third parties (e.g. research institutions or the like) without compromising user privacy.
  • the user data may first be anonymised via a certain technique and sent from the user device to the server. There, the data may be stored until parts or all of it are needed by a third party. The data may then be anonymised again, via a different technique, and provided to the third party packaged into a data container.
  • the present method is useful for ensuring that user data is handled with utmost care and user privacy is respected, while the integrity of the data is preserved so that it can be further analysed and/or studied and/or otherwise used by third parties.
  • the user data can be specific to the respective user device.
  • the device data storing step can comprise storing medical data, wherein the user data can comprise medical data.
  • the device data storing step can comprise storing at least a part of the user data in a machine-interpretable form.
  • storing at least the part of the user data in a machine-interpretable form can comprise at least one of using a homogenous naming for fields and, for each field, encoding values with a same dimension unit.
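As an illustration of homogeneous field naming and a single dimension unit per field, the following sketch normalises a height measurement. The alias table, the canonical field name `height_cm` and the conversion table are hypothetical and not taken from the text.

```python
# Hypothetical alias table: maps source field names to one canonical name.
FIELD_ALIASES = {
    "Height": "height_cm",
    "body_height": "height_cm",
    "height_cm": "height_cm",
}

# Hypothetical conversion factors to the canonical unit (centimetres).
TO_CM = {"cm": 1.0, "m": 100.0, "in": 2.54}


def normalise_height(name, value, unit):
    """Return (canonical field name, value in centimetres)."""
    canonical = FIELD_ALIASES[name]
    return canonical, round(value * TO_CM[unit], 1)
```

With such a scheme, every device stores the same field under the same name and in the same unit, so data sets from different devices can be combined without further matching.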
  • the device data storing step can comprise storing at least partially automatically generated medical data that comprise at least one of at least one medical image, at least one result of a laboratory analysis of material originating from or expelled by the human body, and data from a sensing device that senses biometrical or medical data of the user.
  • Material originating from or expelled by the human body for example can comprise body fluids such as blood or urine, stool or tissue samples.
  • the at least partially automatically generated medical data can be automatically generated.
  • the device sending step can comprise a device processing step that comprises processing the at least one data set on the at least one device.
  • the device sending step can comprise a device data set selection step that can comprise selecting at least one data set from the user data on the at least one user device. In some embodiments, the device sending step can be performed by at least one of the at least one user device periodically and/or upon request by the server.
  • the server receiving step can comprise connecting at least one of the at least one user device at least at some points in time to the server.
  • the server receiving step can comprise storing server data on the server, wherein the server data can comprise at least a part of at least one of the at least one data set received by the server.
  • the server packaging step can comprise receiving at least one data request from at least one requesting party.
  • a request can comprise, for example a request for a specific type of medical data and/or a patient profile.
  • the server packaging step can furthermore comprise a server processing step.
  • the server packaging step can furthermore comprise a server data selection step that can comprise selecting the data elements of the at least one received data set to be combined to the at least one data container.
  • the server packaging step can furthermore comprise a server container releasing step that comprises preventing the release of at least one of the at least one data container before at least one container releasing condition is matched.
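The container releasing step can be sketched as a gate that withholds a container until every releasing condition holds. The text does not specify what a releasing condition is at this point; the minimum-row-count condition below is a hypothetical example, and the function names are illustrative.

```python
def min_rows(n):
    # Hypothetical releasing condition: the container holds at least n rows.
    return lambda container: len(container) >= n


def release_container(container, conditions):
    """Return the container only if all releasing conditions are matched.

    Returns None otherwise, i.e. the release is prevented.
    """
    if all(condition(container) for condition in conditions):
        return container
    return None
```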
  • the at least one user device can comprise a plurality of user devices.
  • the user data on the at least one user device can comprise at least one data element.
  • This data element can comprise one or more of the following data: at least one numeric value, single selectable options from at least one list, multiple selectable options from at least one list, at least one time-stamped value, and at least one binary value.
  • the method can further comprise a device processing step that comprises processing at least one data element of at least one data set of the user data on at least one user device by the respective user device. Processing at least one data element may also be processing at least one value of the data element, if the data element comprises a plurality of values, such as a vector.
  • the device processing step can comprise, on at least one user device, at least one of removing information from at least a part of the user data and limiting a precision of at least a part of the user data. This can be achieved by measures such as adding noise, adding errors, changing a data type of a value, or indicating only a range, selected from a pre-defined set of ranges, within which the value lies. That is, the device processing step can anonymise data, or at least limit a traceability of data or inhibit direct linking of parts of data sets obtained by an adverse party, if these data sets all refer to a same user or user device.
  • the device processing step can comprise processing at least one numerical data element.
  • the at least one numerical data element can comprise a data element which comprises at least one numeric value.
  • the device processing step can furthermore comprise combining numerical noise and the numerical data element.
  • combining numerical noise and the at least one numerical data element can comprise adding the numerical noise to at least one of the at least one numeric value of the numerical data element.
  • combining numerical noise and the at least one numerical data element can comprise adding the numerical noise to the numeric value of the respective numeric data element and for at least one of the at least one numerical data element, limiting the numerical noise so that the respective numeric value does not exceed a pre-defined interval (and/or a pre-defined threshold value).
  • the pre-defined interval or threshold value can be global, such as for the height of a user, wherein a minimum height may be fixed.
  • the interval may also be selected per value: for example, for a biomarker that has three ranges, "high", "medium" and "low", if a value of the biomarker was in the high range, the value of the biomarker may be limited to said range.
  • the numerical noise can be generated by a Laplace distribution with an appropriate scaling. This can be particularly advantageous to provide sufficient anonymity to user data, while maintaining its statistical properties.
  • a probability density function of a variable that is added as noise can optionally be given by the following formula with appropriate m and b.
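The formula referred to is not reproduced in this text. Assuming that m denotes the location parameter and b > 0 the scale parameter, the standard probability density function of a Laplace distribution reads:

```latex
f(x \mid m, b) = \frac{1}{2b} \exp\left( -\frac{|x - m|}{b} \right)
```

For zero-centred noise one would choose m = 0, so that the noise does not shift the expected value of the data element; the scale b then controls how strongly individual values are perturbed.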
  • the device processing step can comprise processing at least one data element by converting a representation of the data element from a first encoding to a second encoding. That can also comprise changing a part of an encoding, for example an encoding of a quantity of consumed cigarettes per day if a data element comprises a quantity of consumed cigarettes per day and a timestamp.
  • the first and the second encoding can be, at least for some values of the data element, not equivalent, and converting a representation of the data element into the second encoding can comprise using an appropriate random function. This can be the case if a range A in a first encoding alpha (for example corresponding to "high") corresponds to two values B and B* in a second encoding beta (for example "critically high" and "over-average").
  • the device processing step can comprise processing at least one timestamped data element.
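A minimal sketch of such a conversion between non-equivalent encodings follows. The mapping table is hypothetical, and a uniform random choice is only one possible "appropriate random function"; the text does not fix the distribution.

```python
import random

# Hypothetical mapping from a first encoding (alpha) to a second encoding (beta).
# "high" has no single equivalent and maps to two candidate values.
SECOND_ENCODING = {
    "low": ["low"],
    "medium": ["medium"],
    "high": ["critically high", "over-average"],
}


def convert(value, rng=random):
    """Convert a value of the first encoding into the second encoding.

    Where several target values are possible, one is drawn at random
    (uniformly here, as an assumption).
    """
    return rng.choice(SECOND_ENCODING[value])
```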
  • the at least one timestamped data element can comprise a data element which comprises at least one timestamped value.
  • the device processing step can then comprise replacing a timestamp of at least one of the at least one timestamped value.
  • the step of replacing a timestamp of the at least one of the at least one timestamped value can comprise limiting the precision of said timestamp.
  • the step of replacing a timestamp of the at least one of the at least one timestamped value can comprise replacing said timestamp by a temporal distance relative to another point in time, such as a timestamp of another data element.
  • the method can further comprise limiting the precision of said temporal distance relative to another point in time.
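The timestamp replacements described above can be sketched as follows. Truncating to the calendar day and expressing the temporal distance in whole days are assumptions made for illustration; the text leaves the granularity open.

```python
from datetime import datetime


def limit_precision(ts):
    """Limit the precision of a timestamp by dropping the time of day."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


def relative_days(ts, reference):
    """Replace a timestamp by its temporal distance to a reference point.

    The distance is itself limited in precision: only whole days are kept
    (rounded down), the remaining hours and minutes are discarded.
    """
    return (ts - reference).days


measurement = datetime(2020, 4, 17, 14, 35, 12)
first_visit = datetime(2020, 4, 10, 9, 0, 0)
coarse = limit_precision(measurement)
distance = relative_days(measurement, first_visit)
```

A relative distance keeps the ordering and spacing of events for analysis while hiding the absolute dates on which they occurred.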
  • the device processing step can comprise an operation to anonymize at least a part of the user data on at least one user device.
  • At least one data element of the at least one data element can comprise at least one of the following data: at least one numeric value, single selectable options from at least one list, multiple selectable options from at least one list, at least one time-stamped value, and at least one binary value.
  • the server processing step can comprise processing at least one data element of at least one data set of the server data on the server.
  • the server processing step can comprise at least one of removing information from at least a part of the user data and limiting a precision of at least a part of the respective data element. This can be achieved by measures such as adding noise, adding errors, changing a data type of a value, or indicating only a range, selected from a pre-defined set of ranges, within which the value lies. That is, the server processing step can anonymise data, or at least limit a traceability of data or inhibit direct linking of parts of data sets obtained by an adverse party.
  • the server processing step can comprise processing at least one numerical data element.
  • the at least one numerical data element can be a data element which comprises at least one numeric value.
  • the server processing step can furthermore comprise combining numerical noise and the numerical data element.
  • combining numerical noise and the at least one numerical data element can comprise adding the numerical noise to at least one of the at least one numeric value of the numerical data element.
  • combining numerical noise and the at least one numerical data element can comprise adding the numerical noise to the numeric value of the respective numeric data element and for at least one of the at least one numerical data element, limiting the numerical noise so that the respective numeric value does not exceed a pre-defined interval.
  • the numerical noise can be generated by a Laplace distribution with an appropriate scaling. This can be particularly advantageous to provide sufficient anonymity to user data, while maintaining its statistical properties.
  • a probability density function of a variable that is added as noise can optionally be given by the following formula with appropriate m and b.
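The noise step, as described for both the device and the server, can be sketched with the Python standard library alone. The sketch assumes zero-centred Laplace noise with scale b and enforces the pre-defined interval by clamping the noisy value; clamping is one possible way of limiting the noise and is not stated in the text. The function names are hypothetical.

```python
import random


def laplace_noise(b, rng=random):
    """Draw zero-centred Laplace noise with scale b.

    Uses the fact that the difference of two independent exponential
    variables with mean b follows a Laplace(0, b) distribution.
    """
    return rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)


def add_bounded_noise(value, b, low, high, rng=random):
    """Add Laplace noise and keep the result inside [low, high]."""
    noisy = value + laplace_noise(b, rng)
    return min(max(noisy, low), high)
```

For the biomarker example, `low` and `high` would be the bounds of the range the original value falls into, so that a value in the "high" range stays in that range after noise is added.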
  • the server processing step can comprise processing at least one data element by converting a representation of the data element from a first encoding to a second encoding. That can also comprise changing a part of an encoding, for example an encoding of a quantity of consumed cigarettes per day if a data element comprises a quantity of consumed cigarettes per day and a timestamp.
  • the first and the second encoding can be, at least for some values of the data element, not equivalent, and converting a representation of the data element into the second encoding can comprise using an appropriate random function. This can be the case if a range A in a first encoding alpha (for example corresponding to "high") corresponds to two values B and B* in a second encoding beta (for example "critically high" and "over-average").
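A minimal sketch of such a conversion between non-equivalent encodings follows. The mapping table is hypothetical except for the "high" entry, which mirrors the example in the text, and a uniform random choice is only one possible "appropriate random function".

```python
import random

# Hypothetical mapping from encoding alpha to candidate values in encoding beta.
# Only the "high" entry is taken from the example in the text.
BETA_CANDIDATES = {
    "low": ["below-average"],
    "normal": ["average"],
    "high": ["critically high", "over-average"],
}


def convert_encoding(alpha_value, rng=random):
    """Convert a value from encoding alpha to encoding beta.

    Where one alpha value corresponds to several beta values,
    one of them is picked at random (uniformly, as an assumption).
    """
    return rng.choice(BETA_CANDIDATES[alpha_value])


converted = convert_encoding("high")
```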
  • the server processing step can comprise processing at least one timestamped data element.
  • the at least one timestamped data element can be a data element which comprises at least one timestamped value.
  • the server processing step can comprise replacing a timestamp of at least one of the at least one timestamped value.
  • the step of replacing a timestamp of the at least one of the at least one timestamped value can comprise limiting the precision of said timestamp. In some other embodiments, the step of replacing a timestamp of the at least one of the at least one timestamped value can comprise replacing said timestamp by a temporal distance relative to another point in time, such as a timestamp of another data element.
  • the method can further comprise limiting the precision of said temporal distance relative to another point in time.
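The two timestamp options can be sketched as follows. Truncating to the month and bucketing the distance into whole weeks are illustrative assumptions; the function names are hypothetical.

```python
from datetime import datetime


def limit_precision(timestamp):
    """Limit the precision of a timestamp by truncating it to its month."""
    return timestamp.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


def relative_days(timestamp, reference, bucket_days=7):
    """Replace a timestamp by its distance in days to a reference point in time.

    The distance is rounded down to a multiple of bucket_days, which
    limits the precision of the temporal distance as well.
    """
    days = (timestamp - reference).days
    return (days // bucket_days) * bucket_days


measured = datetime(2020, 4, 17, 13, 45, 12)
first_visit = datetime(2020, 4, 1)
coarse = limit_precision(measured)
distance = relative_days(measured, first_visit)
```

A relative distance keeps the order and spacing of events within one profile while removing the calendar date itself.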
  • the server processing step can comprise an operation to anonymize at least a part of the server data.
  • the at least one data set in the device data set selection step can be selected only from a pre-defined part of user data on the user device. This part can for example exclude identifying data, such as contact data, a user's address or at least a part thereof and/or his payment data.
  • the server data selection step can comprise receiving at least one data request.
  • the data request can be received from third parties, such as research partners.
  • the at least one data request comprises a data request condition and a first list of fields.
  • the data request condition is a condition that specifies criteria for users that are relevant for the third party or the research partner. Technically, it is a condition that needs to be matched for data to be selected.
  • the first list of fields lists a minimum of data elements necessary for the purpose of the third party, such as a research purpose for a third party.
  • the at least one data request can be a plurality of data requests.
  • each of the at least one data container is specific to a respective data request. That is, each data container comprises the data corresponding to the respective data request.
  • the server data can comprise at least one data element group, wherein the at least one data element group comprises at least one data element and the at least one data element comprises a common group key that corresponds to the at least one data element group.
  • the common group key can also be linked to a data element by the data set to which the group key and the data element belong.
  • Each data element group can be understood as a collection of data that have such a common group key element, so that the common group key defines a data profile.
  • the common group key can comprise a user device indicator. That is, the data element groups can be understood as anonymised profiles of users that collect the anonymised data that is sent by the users.
  • the user device indicator may also be the same for a plurality of devices if the user device receives a corresponding instruction, e.g. if a user changes his user device.
  • the at least one data element group can be a plurality of data element groups.
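As a rough illustration, data element groups could be built by collecting data elements under their common group key. The dictionary layout and the field name `group_key` are assumptions made for this sketch only.

```python
from collections import defaultdict


def group_by_key(data_elements):
    """Collect data elements into data element groups by their common group key.

    Each data element is assumed to be a dict with a "group_key" entry,
    for example an anonymised user device indicator.
    """
    groups = defaultdict(list)
    for element in data_elements:
        groups[element["group_key"]].append(element)
    return dict(groups)


# Hypothetical server data received from two devices.
server_data = [
    {"group_key": "device-a", "field": "hdl", "value": 1.3},
    {"group_key": "device-b", "field": "hdl", "value": 1.1},
    {"group_key": "device-a", "field": "ldl", "value": 2.9},
]
groups = group_by_key(server_data)
```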
  • the server data selection step comprises evaluating for each data request at least one server selection condition by the server, wherein each server selection condition corresponds to one data request condition.
  • each server selection condition can comprise the corresponding data request condition. That is, the server can add at least one criterion to the data request condition, e.g. in order to protect the privacy of the users, as will be detailed below.
  • each server selection condition can comprise a condition regarding whether a data element group comprises at least some or all data elements indicated by the corresponding data request's first list of fields. This can be optionally advantageous to limit data element groups that match the server condition to data element groups that can be selected to respond to the data request.
  • each server selection condition can comprise a condition regarding a proportion of data elements indicated by the corresponding data request's first list of fields and a data element group's data elements.
  • a proportion can be a proportion such as a ratio of a number of requested fields or data elements to a number of fields or data elements of the data element group.
  • the data selection step can comprise adding a selection flag to each data element that is selected during processing of a data request.
  • each server selection condition can comprise a condition referring to an amount of data elements of a data element group that are indicated by the corresponding data request's first list of fields and to which a selection flag was added.
  • a condition could for example be a maximum number of selection flags that may have been added to the data elements that are indicated by the first list of fields and that might therefore be used to match user profiles if an adverse party gains access to more than one data container.
  • each server selection condition can comprise a condition referring to a proportion of those data elements of a data element group that are indicated by the corresponding data request's first list of fields and to which a selection flag was added, relative to all data elements of the data element group that are indicated by the corresponding data request's first list of fields.
  • the advantage of the preceding paragraph can apply accordingly.
  • An example for said part of each server selection condition can be that the proportion of data elements that were previously shared with a research partner and all the data elements of the data element group is below 50%.
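The 50% example above could be checked as in the following sketch. It assumes each data element is a dict whose optional `selection_flag` entry marks elements that were already selected for an earlier data request; these names are hypothetical.

```python
def previously_shared_ratio(group):
    """Return the share of data elements in a group that carry a selection flag."""
    if not group:
        return 0.0
    flagged = sum(1 for element in group if element.get("selection_flag"))
    return flagged / len(group)


def flag_condition_met(group, max_ratio=0.5):
    """Check that the previously shared share of the group is below max_ratio."""
    return previously_shared_ratio(group) < max_ratio


# Hypothetical data element group with one previously shared element.
group = [
    {"field": "hdl", "selection_flag": True},
    {"field": "ldl"},
    {"field": "glucose"},
    {"field": "smoker"},
]
can_share_more = flag_condition_met(group)
```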
  • each server selection condition can comprise a condition regarding a maximum number of data element groups that are selected for the data request corresponding to the server selection condition.
  • the server data selection step can comprise for each data request, evaluating the server selection condition data element group-wise until a finishing condition is matched. That is, the server checks for each data element group whether it matches the respective server condition and selects it accordingly.
  • An optional advantage can be a limited processing time, as not all of the data element groups need to be checked. Furthermore, it can be easier to limit the selection to a part of the data element groups.
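The group-wise evaluation with a finishing condition might look like the sketch below. It assumes the finishing condition is a maximum number of selected data element groups; `select_groups` and its parameters are hypothetical names.

```python
def select_groups(groups, selection_condition, max_groups):
    """Evaluate the selection condition group by group until enough groups match.

    groups: mapping from group key to the data element group.
    selection_condition: callable returning True if a group matches.
    max_groups: finishing condition (assumed here to be a maximum count).
    """
    selected = []
    for key, group in groups.items():
        if len(selected) >= max_groups:
            break  # finishing condition matched; remaining groups are not checked
        if selection_condition(group):
            selected.append(key)
    return selected


# Hypothetical groups; the condition requires at least two data elements.
example_groups = {"a": [1, 2], "b": [1], "c": [1, 2, 3], "d": [4, 5]}
chosen = select_groups(example_groups, lambda g: len(g) >= 2, max_groups=2)
```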
  • the server data selection step can comprise for each data request selecting data elements from the at least one data element group based on the data request and a result of evaluating the server selection condition for the respective data element group. That is, a result of verifying the server selection condition can be used as selection criterion for data element groups, as implied above.
  • selecting data elements from the at least one data element group based on the data request and a result of evaluating the server selection condition for the respective data element group can comprise selecting the data elements from the at least one data element group that are indicated by the first list of fields if the server selection condition was matched for the respective data element group.
  • the at least one data request can comprise a second list of fields.
  • selecting data elements from the at least one data element group based on the data request and a result of evaluating the server selection condition for the respective data element group can comprise selecting data elements from the at least one data element group that are indicated by the second list of fields if the server selection condition was matched for the respective data element group.
  • selecting data elements from the at least one data element group based on the data request and a result of evaluating the server selection condition for the respective data element group can comprise furthermore selecting data elements from the at least one data element group that are indicated by the second list of fields if the server selection condition was matched for the respective data element group, until the part of the server condition regarding the proportion of data elements indicated by the corresponding data request's first list of fields and a data element group's data elements is not matched anymore. That is, if the first list of fields specifies x data elements to be sent and by the aforementioned criterion regarding the proportion of data elements, y data elements may be selected, then for y>x, up to y-x data elements are selected according to the second list of fields.
  • an optional advantage can be that data element groups are also considered that comprise all data elements from the first list of fields but none or not all of the data elements from the second list of fields. So, this option allows the specification of optional data elements that are selected from the data element group, but at the same time does not limit the quantity of data element groups that match the server selection condition.
  • the at least one data request can comprise at least one further list of fields, such as a third list of fields.
  • selecting data elements from the at least one data element group based on the data request and a result of evaluating the server selection condition for the respective data element group can comprise furthermore selecting data elements from the at least one data element group that are indicated by the further list of fields, such as the third list of fields, when there are no data elements left that are indicated by the first and the second list of fields, if the server selection condition was matched for the respective data element group, until the part of the server condition regarding the proportion of data elements indicated by the corresponding data request's first list of fields and a data element group's data elements is not matched anymore.
  • This can have the same advantages as specified in the preceding paragraph.
  • the server data selection step can comprise finishing selecting data elements from a data element group when a condition referring to a proportion of data elements that are selected and a data element group's data elements is matched, such as the condition discussed above. That is, the data elements can be selected from the second and subsequently from the third list of fields as long as the proportion of selected data elements and available data elements of a respective data element group fulfils a condition, such as not exceeding a ratio of 50%.
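The interplay of the first, second and further lists of fields with the proportion condition can be sketched as follows. The sketch assumes that a group only matches if it contains every field of the first list, and that the proportion is the number of selected fields divided by the number of fields in the group; all names are hypothetical.

```python
def select_fields(group_fields, first, second=(), further=(), max_ratio=0.5):
    """Select fields from one data element group for one data request.

    Returns None if the group does not match, otherwise the selected fields:
    all fields of the first list, then optional fields from the second and
    further lists while the selected share stays within max_ratio.
    """
    available = set(group_fields)
    if not available or not set(first) <= available:
        return None  # mandatory fields are missing
    if len(first) / len(available) > max_ratio:
        return None  # mandatory fields alone exceed the allowed proportion
    selected = list(first)
    for field in list(second) + list(further):
        if (len(selected) + 1) / len(available) > max_ratio:
            break  # one more field would exceed the allowed proportion
        if field in available and field not in selected:
            selected.append(field)
    return selected


# Hypothetical group with 8 fields: x = 2 mandatory fields, y = 4 allowed.
fields = ["a", "b", "c", "d", "e", "f", "g", "h"]
result = select_fields(fields, first=["a", "b"], second=["c", "x", "d"], further=["e"])
```

In this example up to y - x = 2 optional fields are added from the second list, and the further list is not reached because the 50% limit is already met.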
  • the at least one container releasing condition can be verified for each of the at least one data container separately.
  • the at least one container releasing condition can comprise a minimum number of different data element groups from which data elements were selected for the respective data container.
  • the server container releasing step can comprise preventing releasing each of the at least one data container before the at least one container releasing condition is matched for the respective data container. That is, sharing a data container comprising data elements from too few data element groups can be prevented.
  • This can be optionally advantageous in a case where the absolute number of data element groups satisfying the data request condition or the server selection condition respectively is small.
  • In such a case, a shared part of the user's data would be exposed and it would be possible to match said part of the user's data to the user if the data request condition is sufficiently specific and another party obtains knowledge about the specificity by other means.
  • For example, an insurance company insuring patients with a very rare disease could thus obtain information on their customers. Avoiding such a scenario may be an advantage of the option discussed in this paragraph.
  • the server packaging step can comprise furthermore a server container releasing step that comprises preventing releasing each of the at least one data container respectively before at least one container releasing condition that is specific to the respective data container is matched. That is, the container releasing conditions can be adapted to the container and thus to the data request condition of the respective data request, for example depending on how specific the data request condition is.
  • At least one of the at least one container releasing condition can comprise a minimum number of different data element groups from which data elements were selected for the respective data container. That is, again for the example with data element groups having a user device indicator as key element, that data from a minimum number of users must be selected.
  • At least one of the at least one container releasing condition can comprise a condition regarding a uniqueness of data elements from data element groups that were selected for the respective data container.
  • the uniqueness can also be measured with a vectoral proximity measure or by fuzzy measures and does not need to be strict.
  • Many unique data elements can be an indicator for a high variety of data, which can be an optional advantage in particular for third parties or research partners that want to research a phenomenon without limiting themselves to special cases or for example if the data are used for data mining.
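A container releasing check combining a minimum number of data element groups with a simple uniqueness measure might look like this sketch. Strict equality of the selected values is used as the uniqueness measure, although the text also allows proximity or fuzzy measures; the container layout and the thresholds are assumptions.

```python
def may_release(container, min_groups=10, min_unique_ratio=0.0):
    """Decide whether a data container may be released.

    container: list of entries, each a dict with a "group_key" and a
    "fields" dict of selected data elements (hypothetical layout).
    """
    if not container:
        return False
    group_keys = {entry["group_key"] for entry in container}
    if len(group_keys) < min_groups:
        return False  # data from too few data element groups
    # Strict uniqueness: share of distinct field/value combinations.
    rows = [tuple(sorted(entry["fields"].items())) for entry in container]
    unique_ratio = len(set(rows)) / len(rows)
    return unique_ratio >= min_unique_ratio


# Hypothetical container with entries from three data element groups.
container = [
    {"group_key": "u1", "fields": {"hdl": 1.2}},
    {"group_key": "u2", "fields": {"hdl": 1.2}},
    {"group_key": "u3", "fields": {"hdl": 1.5}},
]
releasable = may_release(container, min_groups=3)
```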
  • a system for sending combined parts of distributed data from user devices to at least one recipient comprises at least one user device configured to store user data relating to the respective user device on said respective user device and to send at least one data set from the at least one user device to a server.
  • the system also comprises at least one server configured to receive the at least one data set and combine data elements of the at least one received data set to at least one data container.
  • the system further comprises at least one data container configured to store data.
  • the various system elements can be as described above with respect to the method embodiments.
  • the present system may be particularly configured to execute or perform the method for selectively transmitting data as described in the above embodiments.
  • the user data can comprise medical data.
  • user data can comprise at least partially technical medical data.
  • the user device can be further configured to at least partially encode the technical medical data by replacing at least parts of the data by machine-interpretable expressions.
  • user data can comprise at least partially automatically generated medical data that can comprise at least one of at least one medical image, at least one result of a laboratory analysis of material originating from or expelled by the human body, and data from a sensing device that senses biometrical or medical data of the user.
  • the user device may even be configured to generate some of the original data such as images via the user device's sensors such as a camera (further sensors such as biometric sensors may also be used).
  • the system can further comprise at least one requesting party.
  • the requesting party may be interested in receiving parts or all of user data for further analysis and/or research (particularly medical research for example). However, the user data should be anonymised and protected from abuse while maintaining its statistical features.
  • the server can be configured to receive a data request from the requesting party.
  • the user device can be further configured to process at least one data element of at least one data set of the user data.
  • the processing can comprise at least one of removing information from at least a part of the user data and limiting precision of at least a part of the user data.
  • the processing can further comprise processing at least one numerical data element, wherein the at least one numerical data element is a data element which comprises at least one numeric value, and wherein the device processing step comprises furthermore combining numerical noise and the numerical data element.
  • the server can be configured to perform an operation to anonymize at least a part of the server data.
  • M1. A method for sending combined parts of distributed data from user devices to at least one recipient, comprising
  • a device data storing step (DD) that comprises for each of at least one user device (11), storing user data (1) relating to the respective user device (11) on said respective user device (11),
  • a device sending step (DS) that comprises sending at least one data set (2) from the at least one user device (11) to a server (12),
  • a server receiving step (SR) that comprises receiving the at least one data set (2) by the server, and
  • a server packaging step that comprises combining data elements of the at least one received data set (2) to at least one data container (5).
  • the user data (1) are specific to the respective user device (11).
  • the device data storing step (DD) comprises storing medical data and wherein the user data (1) comprise medical data.
  • the device data storing step (DD) comprises storing at least a part of the user data (1) in a machine-interpretable form.
  • storing at least the part of the user data (1) in a machine-interpretable form comprises at least one of
  • the device data storing step (DD) comprises storing at least partially automatically generated medical data that comprise at least one of
  • the device sending step (DS) comprises a device processing step (DPS) that comprises processing the at least one data set on the at least one device.
  • DPS device processing step
  • the device sending step (DS) comprises a device data set selection step (DDS) that comprises selecting at least one data set (2) from the user data (1) on the at least one user device (11).
  • DDS device data set selection step
  • the device sending step (DS) is performed by at least one of the at least one user device (11) periodically and/or upon request by the server (12).
  • server receiving step (SR) comprises connecting at least one of the at least one user device (11) at least at some points in time to the server (12).
  • server receiving step (SR) comprises storing server data (6) on the server (12),
  • server data (6) comprise at least a part of at least one of the at least one data set (2) received by the server (12).
  • server packaging step (SP) comprises receiving at least one data request (4) from at least one requesting party (13).
  • server packaging step (SP) comprises furthermore a server processing step (SPS).
  • server packaging step comprises furthermore a server data selection step (SDS) that comprises selecting the data elements of the at least one received data set (2) to be combined to the at least one data container (5).
  • SDS server data selection step
  • server packaging step comprises furthermore a server container releasing step (SCR) that comprises preventing releasing at least one of the at least one data container (5) before at least one container releasing condition (25) is matched.
  • SCR server container releasing step
  • the user data (1) on the at least one user device (11) comprise at least one data element (3), wherein at least one data element (3) of the at least one data element (3) comprises at least one of the following data :
  • the method comprises a device processing step (DPS) that comprises processing at least one data element (3) of at least one data set (2) of the user data (1) on at least one user device (11) by the respective user device (11).
  • the device processing step comprises on at least one user device (11) at least one of removing information from at least a part of the user data (1) and limiting a precision of at least a part of the user data (1).
  • the at least one numerical data element is a data element (3) which comprises at least one numeric value
  • combining numerical noise and the at least one numerical data element comprises adding the numerical noise to at least one of the at least one numeric value of the numerical data element.
  • combining numerical noise and the at least one numerical data element comprises adding the numerical noise to the numeric value of the respective numeric data element and for at least one of the at least one numerical data element, limiting the numerical noise so that the respective numeric value does not exceed a pre-defined interval.
  • the device processing step comprises processing at least one data element (3) by converting a representation of the data element (3) from a first encoding to a second encoding.
  • first and the second encoding are at least for some values of the data element (3) not equivalent and converting a representation of the data element (3) in the second encoding comprises using an appropriate random function.
  • the at least one timestamped data element is a data element (3) which comprises at least one timestamped value
  • step of replacing a timestamp of the at least one of the at least one timestamped value comprises limiting the precision of said timestamp.
  • step of replacing a timestamp of the at least one of the at least one timestamped value comprises replacing said timestamp by a temporal distance relative to another point in time, such as a timestamp of another data element (3).
  • server data (6) on the server comprise at least one data element (3), wherein at least one data element (3) of the at least one data element (3) comprises at least one of the following data :
  • server processing step comprises processing at least one data element (3) of at least one data set (2) of the server data (6) on the server (12).
  • server processing step comprises at least one of removing information from at least a part of the server data (6) and limiting a precision of at least a part of the server data (6).
  • server processing step comprises processing at least one numerical data element
  • the at least one numerical data element is a data element (3) which comprises at least one numeric value
  • server processing step comprises furthermore combining numerical noise and the numerical data element.
  • combining numerical noise and the at least one numerical data element comprises adding the numerical noise to at least one of the at least one numeric value of the numerical data element.
  • combining numerical noise and the at least one numerical data element comprises adding the numerical noise to the numeric value of the respective numeric data element and for at least one of the at least one numerical data element, limiting the numerical noise so that the respective numeric value does not exceed a pre-defined interval.
  • server processing step comprises processing at least one data element (3) by converting a representation of the data element (3) from a first encoding to a second encoding.
  • first and the second encoding are at least for some values of the data element (3) not equivalent and converting a representation of the data element (3) in the second encoding comprises using an appropriate random function.
  • server processing step (SPS) comprises processing at least one timestamped data element
  • the at least one timestamped data element is a data element (3) which comprises at least one timestamped value
  • server processing step comprises replacing a timestamp of at least one of the at least one timestamped value.
  • step of replacing a timestamp of the at least one of the at least one timestamped value comprises limiting the precision of said timestamp.
  • step of replacing a timestamp of the at least one of the at least one timestamped value comprises replacing said timestamp by a temporal distance relative to another point in time, such as a timestamp of another data element (3).
  • the at least one data set (2) in the device data set selection step (DDS) is selected only from a pre-defined part of user data (1) on the user device (11).
  • server data selection step (SDS) comprises receiving at least one data request (4).
  • the at least one data request (4) comprises a data request condition (29) and a first list of fields (20).
  • the at least one data request (4) is a plurality of data requests (4).
  • each of the at least one data container (5) is specific to a respective data request (4).
  • server data (6) comprise at least one data element group (7)
  • the at least one data element group (7) comprises at least one data element (3)
  • the at least one data element (3) comprises a common group key (8) that corresponds to the at least one data element group (7).
  • the common group key (8) comprises a user device indicator (UDI).
  • UDI user device indicator
  • the at least one data element group (7) is a plurality of data element groups (7).
  • server data selection step (SDS) comprises evaluating for each data request (4) at least one server selection condition (30) by the server,
  • each server selection condition (30) corresponds to one data request condition (29).
  • each server selection condition (30) comprises the corresponding data request condition (29).
  • each server selection condition (30) comprises a condition regarding whether a data element group (7) comprises at least some or all data elements (3) indicated by the corresponding data request's (4) first list of fields (20).
  • each server selection condition (30) comprises a condition regarding a proportion of data elements (3) indicated by the corresponding data request's (4) first list of fields (20) and a data element group's (7) data elements (3).
  • the data selection step comprises adding a selection flag (28) to each data element (3) that is selected during the processing of a data request (4).
  • each server selection condition (30) comprises a condition referring to an amount of data elements (3) of a data element group (7) that are indicated by the corresponding data request's (4) first list of fields (20) and to which a selection flag (28) was added.
  • each server selection condition (30) comprises a condition referring to a proportion of data elements (3) of a data element group (7) that are indicated by the corresponding data request's (4) first list of fields (20) and to which a selection flag (28) was added to the data elements (3) of a data element group (7) that are indicated by the corresponding data request's (4) first list of fields (20).
  • each server selection condition (30) comprises a condition regarding a maximum number of data element groups (7) that are selected for the data request (4) corresponding to the server selection condition (30).
  • server data selection step (SDS) comprises for each data request (4), evaluating the server selection condition (30) data element group-wise (7) until a finishing condition (31) is matched.
  • server data selection step (SDS) comprises for each data request (4) selecting data elements (3) from the at least one data element group (7) based on the data request (4) and a result of evaluating the server selection condition (30) for the respective data element group (7).
  • the at least one data request (4) comprises a second list of fields (21) and wherein selecting data elements (3) from the at least one data element group (7) based on the data request (4) and a result of evaluating the server selection condition (30) for the respective data element group (7)
  • the at least one data request (4) comprises a second list of fields (21) and
  • the at least one data request comprises at least one further list of fields, such as a third list of fields (22),
  • server data selection step comprises
  • the at least one container releasing condition (25) is verified for each of the at least one data container (5) separately and wherein the at least one container releasing condition (25) comprises a minimum number of different data element groups (7) from which data elements (3) were selected for the respective data container (5).
  • server container releasing step (SCR) comprises preventing releasing each of the at least one data container (5) before the at least one container releasing condition (25) is matched for the respective data container (5).
  • server packaging step comprises furthermore a server container releasing step (SCR) that comprises preventing releasing each of the at least one data container (5) respectively before at least one container releasing condition (25) that is specific to the respective data container (5) is matched.
  • At least one of the at least one container releasing condition (25) comprises a minimum number of different data element groups (7) from which data elements (3) were selected for the respective data container (5).
  • At least one of the at least one container releasing condition (25) comprises a condition regarding a uniqueness of data elements (3) from data element groups (7) that were selected for the respective data container (5).
  • At least one user device (11) configured to store user data (1) relating to the respective user device (11) on said respective user device (11) and to send at least one data set (2) from the at least one user device (11) to a server (12);
  • At least one server (12) configured to receive the at least one data set (2) and combine data elements of the at least one received data set (2) to at least one data container (5);
  • At least one data container (5) configured to store data.
  • user data (1) comprises at least partially automatically generated medical data that comprise at least one of
  • the server (12) is configured to receive a data request (4) from the requesting party.
  • the user device (11) is further configured to process at least one data element (3) of at least one data set (2) of the user data (1).
  • processing comprises at least one of removing information from at least a part of the user data (1) and limiting precision of at least a part of the user data (1).
  • processing further comprises processing at least one numerical data element, wherein the at least one numerical data element is a data element (3) which comprises at least one numeric value, and wherein the device processing step (DPS) comprises furthermore combining numerical noise and the numerical data element.
  • server (12) is configured to perform an operation to anonymize at least a part of the server data (6).
  • Figure 1 depicts a part of the method comprising method steps that are performed on user devices and a step to send data to a server as well as receiving it there.
  • Figure 2 depicts a part of the method comprising method steps that are performed on a server, receiving data requests and releasing data to third parties.
  • Figure 1 shows as an example three user devices 11.
  • a device data storing step DD is performed on each of the three user devices 11.
  • the device data storing step comprises storing user data 1 on each user device 11.
  • the user data 1 are specific to at least one user of the respective user device 11.
  • the user data are in this example medical data. They can for example be a patient record of the user of the respective user device.
  • the device data storing step may be performed for a period of time during which other steps of the method can be performed.
  • the device data set selection step DDS can be performed while the respective user device 11 is still storing user data 1.
  • the device data set selection step comprises selecting a data set 2 from the user data 1, wherein the data set 2 comprises at least one or a plurality of data elements 3.
  • the device data set selection step selects no more than a maximum of 50 % of data elements 3 from the user data 1 for the data set 2.
  • the selected data are processed in the device data processing step DPS.
  • the device data processing step comprises anonymizing at least a part of the data in the data set 2.
  • for anonymizing, the encoding of data elements 3 can be changed, their precision can be limited, parts of the data can be removed and noise can be added.
  • the noise is preferably according to a Laplace-distribution with an appropriate scaling, as this distribution leaves the statistical features of a collection of such data sets 2 unchanged. Also, other anonymisation techniques can be applied.
  • the processed data sets 2 are then sent to the server 12 in a device sending step DS.
  • This step comprises sending the processed data sets from the respective device to the server 12.
  • the server receiving step SR comprises receiving the data sets 2 from the at least one or in this example three user devices 11.
  • the server receiving step can, as well as the device data storing step, be performed for a period of time during which other steps of the method can be performed, as a person skilled in the art will easily understand.
  • the device data set selection step DDS can be triggered periodically, on request of the server or by a different trigger.
  • the three device data set selection steps DDS are to be understood as an example of one or a plurality of device data set selection steps DDS per user device 11. Also, the user devices 11 do not need to perform the same number of device data set selection steps DDS and subsequent steps.
  • FIG. 2 shows the server 12 and, as an example, three incoming data requests 4 and three outgoing data containers 5, although there may also be more or fewer data requests 4 and data containers 5.
  • the number of data requests 4 can be the same as the number of data containers 5, as each data container 5 corresponds to a data request 4.
  • the server receiving step SR generates server data 6.
  • the server packaging step SP comprises for each data request 4, performing a server data selection step SDS.
  • Each data request 4 comprises a data request condition 29 that needs to be satisfied by data of the server data 6 that are selected for the respective data container 5.
  • Each data request furthermore comprises at least a first list of fields 20 that specifies which fields of the server data 6 relating to a user device 11 that satisfies the request condition 29 are requested by the data request.
  • the first list of fields 20 is a mandatory list.
  • the server checks the server data 6 for corresponding data and selects data element groups 7 that match the data request condition 29 and that comprise at least data elements 3 that are specified by the first list of fields 20.
  • the method comprises a server processing step SPS that comprises processing the data that are selected in the server data selection step analogously to the processing on the user device 11.
  • the method further comprises a server container releasing step SCR that comprises preventing the release of the respective data container 5 before a certain number of data element groups 7 has been added to the container, to prevent a de-anonymisation of a user in case the data request condition 29 is matched by only a few data element groups 7.
  • After the certain number of data element groups 7 has been added to the respective data container 5, the data container is released. It can be sent to a third party such as a research partner, but it can also merely be made available to said third party, who then accesses it on the server 12.
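
The server data selection and container releasing steps listed above can be illustrated with a minimal sketch. The class names, the field names and the threshold `MIN_GROUPS` are hypothetical and chosen only for illustration; the source does not state a concrete number of data element groups.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A data element group: field name -> value, all relating to one user device.
ElementGroup = Dict[str, object]


@dataclass
class DataRequest:
    condition: Callable[[ElementGroup], bool]  # data request condition
    mandatory_fields: List[str]                # first list of fields


@dataclass
class DataContainer:
    groups: List[ElementGroup] = field(default_factory=list)


MIN_GROUPS = 3  # hypothetical release threshold, not a value from the source


def select_groups(server_data: List[ElementGroup], request: DataRequest) -> DataContainer:
    """Keep groups that match the condition and contain every mandatory field,
    reduced to the requested fields."""
    container = DataContainer()
    for group in server_data:
        has_fields = all(name in group for name in request.mandatory_fields)
        if has_fields and request.condition(group):
            container.groups.append({name: group[name] for name in request.mandatory_fields})
    return container


def may_release(container: DataContainer, min_groups: int = MIN_GROUPS) -> bool:
    """Hold the container back until enough groups have been added,
    so that a small number of matches cannot single out a user."""
    return len(container.groups) >= min_groups
```

In this sketch a container that matches only two groups is withheld under the assumed threshold of three, and would only be released once further matching data sets arrive.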

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Epidemiology (AREA)
  • Public Health (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Primary Health Care (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • Computer Security & Cryptography (AREA)
  • Bioethics (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Pathology (AREA)
  • Information Transfer Between Computers (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)
  • Storage Device Security (AREA)

Abstract

Disclosed are a method and system for sending combined parts of distributed data from user devices to at least one recipient. The method comprises a device data storing step (DD) that comprises for each of at least one user device (11), storing user data (1) relating to the respective user device (11) on said respective user device (11), a device sending step (DS) that comprises sending at least one data set (2) from the at least one user device (11) to a server (12), a server receiving step (SR) that comprises receiving the at least one data set (2) by the server, and a server packaging step (SP) that comprises combining data elements of the at least one received data set (2) to at least one data container (5). The system comprises at least one user device (11) configured to store user data (1) relating to the respective user device (11) on said respective user device (11) and to send at least one data set (2) from the at least one user device (11) to a server (12); at least one server (12) configured to receive the at least one data set (2) and combine data elements of the at least one received data set (2) to at least one data container (5); and at least one data container (5) configured to store data.

Description

Method and system for transmitting combined parts of distributed data
Field
The invention relates to data analysis for privacy-sensitive or otherwise confidential data on distributed systems.
Background
It is a known task to analyze user data with data mining techniques. There is often a trade-off to be made between the quality of the data and the privacy of the users, if the original or the generated data are privacy-relevant, such as medical data for example.
If data is saved centrally (for example in a hospital's medical records of their patients or in the electoral register), selective information delivery to users is usually performed by applying the selection criteria centrally in order to extract the contact data, e.g. postal or electronic addresses, of the people to contact or to deliver information to. The information is then transmitted to the user, using the extracted contact data. To acquire data of a certain user group for the purpose of data mining, the procedure is equivalent: Even if information is masked or removed at the information source in order to protect the user's privacy or to protect data that may be confidential in any way, this must be performed at the data-requesting party, the data source or in between.
The user often has no inherent control of the use of their data, but they have to trust entities holding their data (e.g. a hospital) to ensure the protection of their privacy when they provide their contact data or when they deliver other information such as health data to third parties. Furthermore, even if privacy-sensitive or confidential data is masked or removed, it is at least once available together with the finally requested information (before being removed).
Especially for the purpose of medical research, the conflict of interest between the preservation of privacy of the patients and the use of patient data, especially in cases where information from different sources should be linked, is a disputed issue.
A supplementary problem in this context is the storage of medical data. Medical data is usually saved either in a distributed way or centrally. In the distributed case, every health care provider holds their own data concerning a patient, so that, to retrieve information that is not comprehensively stored at one single health care provider, it will be necessary to match this data. If all patient data is stored centrally instead, the protection of the patient's privacy is quite difficult as the data is available at least to the processing unit belonging to the central storage system.
US20090150362A1 discloses a double blinded privacy-safe distributed data mining protocol among an aggregator, a data consumer entity having privacy-sensitive information, and data source entities having privacy-sensitive information. The aggregator does not have access to the privacy-sensitive information at either the data consumer entity or the data source entities. The aggregator formulates a query without using privacy-sensitive information, and sends the query to the data consumer entity. The data consumer entity generates a list of specific instances that meet the conditions of the query and sends the list, encrypted, to the data source entities either directly or through the aggregator. The data source entities match the list against transactional data, de-identify the matched results, and send them to the aggregator. The aggregator combines results from data source entities and sends the combined result to the data consumer entity. This allows for privacy-safe data mining where both the data consumer entity and data source entities have privacy-sensitive information not available for the aggregator to see or use.
US20020116227A1 discloses a method for searching for medical information executed by one or more computers comprising the steps of formulating a request for medical information concerning an individual or group of individuals, transmitting a record request to a record facilitator, the record facilitator determining which patient record sources to investigate, a record query being sent from the facilitator to the patient record sources which are appropriate, receiving a patient record report back from the patient record sources, normalizing and augmenting the patient record report before forwarding it back to the requester, and de-identifying the patient record to remove any identifying information.
US7823207B2 discloses a privacy preserving data-mining protocol, between a secure "aggregator" and "sources" having respective access to privacy-sensitive micro-data, the protocol including: the "aggregator" accepting a user query and transmitting a parameter list for that query to the "sources" (often including privacy-problematic identifiable specifics to be analyzed); the "sources" then forming files of privacy-sensitive data-items according to the parameter list and privacy filtering out details particular to less than a predetermined quantity of micro-data-specific data-items; and the "aggregator" merging the privacy-filtered files into a data-warehouse to formulate a privacy-safe response to the user, even though the user may have included privacy-problematic identifiable specifics.
US20090326981A1 provides a system and/or a method that facilitates collecting a portion of health data from a collection of users. An interface component can receive health data communicated from a collection of users, wherein each user within the collection is associated with a respective portion of health data. A verification component can authenticate at least one of a transmission source of the portion of health data, an ownership between a portion of health data and a user, an integrity level associated with the portion of health data, or a user submitting the portion of health data. A collection component can aggregate authenticated health data into a semantic data store in which the health data is indicative of a raw and unmolested source of health information from the collection of users. The collection component can further organize the health data to facilitate identification of a medical related trend.
US20060069957A1 discloses a communication system processing element comprises a processor coupled to a memory and implements at least a portion of a distributed expert system. The distributed expert system is arranged in at least two hierarchical levels, including an upper level comprising a central controller, and a lower level comprising a plurality of local agents each associated with one or more communication devices of the system.
US6397224B1 discloses a system for anonymously linking a plurality of data records, each data record comprising a plurality of elements for identifying an associated individual, includes a first identity reference encoding module configured to encode a first encoded identity reference from a first subset of the identifying elements of a data record; a second identity reference encoding module configured to encode a second encoded identity reference from a second subset of the identifying elements of the data record; and an anonymization code assignment module configured to assign to each of the first and second encoded identity references an identical anonymization code for anonymously representing the individual associated with the data record.
US7543149B2 discloses a method for securing patient identity comprising accessing an electronic medical records database including patient data for a plurality of patients. Each patient in the electronic medical records database is assigned a unique patient identifier. Patient data for a first patient, including a first patient identifier, is retrieved from the electronic medical records database. The first patient is de-identified from the patient data. De-identifying includes the creation of a first encoded patient identifier responsive to the first patient identifier. The de-identifying results in de-identified first patient data and includes the replacement of the first patient identifier with the first encoded patient identifier. The de-identified first patient data is transmitted to a data warehouse system. The method further comprises identifying a second patient in response to receiving report data that includes a second encoded patient identifier from the data warehouse system. The identifying includes the creation of a second patient identifier responsive to the second encoded patient identifier.
While the prior art approaches may be satisfactory in some regards, they have certain shortcomings and disadvantages. More specifically, the prior art discloses storing multiple patient records or parts thereof either at single health care providers or at a central entity. In the former case of records that are saved in a distributed way, the data needs to be matched by patient, especially if a comprehensive medical record (or excerpts or derived information that is not saved at one health care provider) is needed. In the latter case of a central data storage, the data is available to the central entity in any case. Also, the patient has no control of their data and must trust the operator of the storage units, whether the data is stored centrally or locally at each health care provider. The user has no means to directly control their data at these storage entities.
In any of the two cases of data storage or in case of a hybrid combination of those two, the patient or user does not control the data storage entity. Furthermore, their data is available to the storage entity and the storage entity will technically govern the access of third parties to the patient's data.
Obviously, the storage entity or entities can be all entities that hold patient data: health care providers, central registers, employers as far as they need to keep a medical record, etc., also including IT providers operating transmission, processing or storing services, hardware or systems for the aforementioned entities.
Summary
It is therefore an object of the invention to overcome or at least alleviate the shortcomings and disadvantages of the prior art. More particularly, it is an object of the present invention to provide a method and system for sending or transmitting combined parts of distributed data. It is also an object to provide an improved secure and reliable way of transmitting sensitive user data (such as medical data) anonymously to third parties.
In a first embodiment, a method for sending combined parts of distributed data from user devices to at least one recipient is disclosed. The method comprises a device data storing step that comprises for each of at least one user device, storing user data relating to the respective user device on said respective user device. The method also comprises a device sending step that comprises sending at least one data set from the at least one user device to a server. The method further comprises a server receiving step that comprises receiving the at least one data set by the server. The method also comprises a server packaging step that comprises combining data elements of the at least one received data set to at least one data container.
The user device may comprise a user terminal and/or a portable computing device such as a smartphone, a tablet, a laptop, a wearable device and/or the like. The user devices may comprise a data storage, at least one processing component configured to execute a program in a suitable form and format and a communication component at least configured to communicate with a remote server.
The processing component can for example comprise processors, hardware accelerators and/or microcontrollers.
The data storage can comprise memory components, such as main memory (e.g. RAM), cache memory (e.g. SRAM) and/or secondary memory (e.g. HDD, SSD).
The user devices can comprise busses configured to facilitate data exchange between their components, such as, the communication between the data storage and the processing component. The user device can comprise network interface controllers that can be configured to connect the data processing device to a network, such as, to the Internet.
The server can comprise a single server, a server system composed of multiple servers, and/or a program emulating the functionality of a server, running on a cloud computing platform or any system configured to implement the functionality of a server.
The server can comprise means of data processing, such as processor units, hardware accelerators and/or microcontrollers. The server can comprise memory components, such as main memory (e.g. RAM), cache memory (e.g. SRAM) and/or secondary memory (e.g. HDD, SSD). The server can comprise busses configured to facilitate data exchange between components of the server, such as the communication between the memory components and the processing components of the server. The server can comprise network interface cards that can be configured to connect the server to a network, such as the Internet. The server can comprise user interfaces, such as:
• output user interface, such as screens or monitors configured to display visual data and/or speakers configured to communicate audio data,
• input user interface, such as a camera, a microphone configured to capture audio data, a keyboard, a trackpad, mouse, touchscreen and/or joystick.
The server can also be configured to be controlled from another computer system, such as via a remote-desktop connection, via a secure shell connection (SSH) or the like.
To put it simply, the server can be a processing unit configured to carry out instructions of a program. The server can be a system-on-chip comprising processing units, memory components and busses. The server can be a processing unit or a system-on-chip that can be interfaced with a personal computer, a laptop, a pocket computer, a smartphone, a tablet computer and/or user interfaces (such as the above-mentioned user interfaces).
User data may comprise any data related to the user of the user device. Multiple users may also be associated with one user device, where each individual user would then have a unique "user profile" or the like. In a preferred embodiment, user data comprises, at least partially, medical data associated with the user. This can comprise results of various medical tests or procedures, diagnoses, measurements from fitness tracking or medical devices and the like.
The present method may advantageously allow parts of sensitive user data (such as for example medical data) to be shared securely with third parties (e.g. research institutions or the like) without compromising user privacy. The user data may first be anonymised via a certain technique and sent from the user device to the server. There, the data may be stored until parts or all of it are needed by a third party. The data may then be anonymised again, via a different technique, and provided to the third party packaged into a data container. The present method is useful for ensuring that user data is handled with utmost care and user privacy is respected, while the integrity of the data is preserved so that it can be further analysed and/or studied and/or otherwise used by third parties.
In some embodiments, for at least one of user devices, the user data can be specific to the respective user device.
In some embodiments, the device data storing step can comprise storing medical data, wherein the user data can comprise medical data. In some such embodiments, the device data storing step can comprise storing at least a part of the user data in a machine-interpretable form. In some such embodiments, storing at least the part of the user data in a machine-interpretable form can comprise at least one of using a homogeneous naming for fields and, for each field, encoding values with a same dimension unit.
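
A machine-interpretable form of this kind could, for example, be obtained by mapping field names to one canonical name and converting values to one unit per field. The following is a minimal sketch under that assumption; the alias table, the chosen canonical units and the function name `normalize` are hypothetical and not taken from the source.

```python
# Hypothetical alias and unit tables; a real deployment would define its own.
FIELD_ALIASES = {
    "weight": "weight", "body weight": "weight", "wt": "weight",
    "height": "height", "body height": "height",
}
CANONICAL_UNIT = {"weight": "kg", "height": "cm"}
TO_CANONICAL = {
    ("weight", "kg"): 1.0, ("weight", "g"): 0.001, ("weight", "lb"): 0.45359237,
    ("height", "cm"): 1.0, ("height", "m"): 100.0, ("height", "in"): 2.54,
}


def normalize(name: str, value: float, unit: str) -> tuple:
    """Return (canonical field name, value in canonical unit, canonical unit)."""
    field_name = FIELD_ALIASES[name.strip().lower()]
    factor = TO_CANONICAL[(field_name, unit)]
    return field_name, value * factor, CANONICAL_UNIT[field_name]
```

With such a mapping, a value entered as "Body Weight" in pounds and one entered as "wt" in kilograms end up in the same field with the same unit, which is what allows data sets from different user devices to be combined later.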
In some embodiments, the device data storing step can comprise storing at least partially automatically generated medical data that comprise at least one of at least one medical image, at least one result of a laboratory analysis of material originating from or expelled by the human body, and data from a sensing device that senses biometrical or medical data of the user. Material originating from or expelled by the human body for example can comprise body fluids such as blood or urine, stool or tissue samples.
In some such embodiments, the at least partially automatically generated medical data can be automatically generated.
In some embodiments, the device sending step can comprise a device processing step that comprises processing the at least one data set on the at least one device.
In some embodiments, the device sending step can comprise a device data set selection step that can comprise selecting at least one data set from the user data on the at least one user device. In some embodiments, the device sending step can be performed by at least one of the at least one user device periodically and/or upon request by the server.
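
The device data set selection step can be sketched as follows. The sketch assumes that the selection is capped at a fixed fraction of the data elements held on the device (the value of 50 % is the bound given for the example elsewhere in this document); the function name, the `requested` parameter and the random thinning of candidates are assumptions made for illustration.

```python
import random
from typing import Dict, List, Optional

MAX_FRACTION = 0.5  # assumed upper bound on the share of elements per data set


def select_data_set(user_data: Dict[str, object], requested: List[str],
                    rng: Optional[random.Random] = None) -> Dict[str, object]:
    """Pick a data set from the user data, capped at MAX_FRACTION of all elements."""
    rng = rng or random.Random()
    limit = int(len(user_data) * MAX_FRACTION)
    # Only elements that actually exist on the device can be selected.
    candidates = [name for name in requested if name in user_data]
    if len(candidates) > limit:
        candidates = rng.sample(candidates, limit)
    return {name: user_data[name] for name in candidates}
```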
In some embodiments, the server receiving step can comprise connecting at least one of the at least one user device at least at some points in time to the server.
In some embodiments, the server receiving step can comprise storing server data on the server, wherein the server data can comprise at least a part of at least one of the at least one data set received by the server.
In some embodiments, the server packaging step can comprise receiving at least one data request from at least one requesting party. Such a request can comprise, for example, a request for a specific type of medical data and/or a patient profile.
In some embodiments, the server packaging step can comprise furthermore a server processing step.
In some embodiments, the server packaging step can comprise furthermore a server data selection step that can comprise selecting the data elements of the at least one received data set to be combined to the at least one data container.
In some embodiments, the server packaging step can comprise furthermore a server container releasing step that comprises preventing releasing at least one of the at least one data container before at least one container releasing condition is matched.
In some embodiments, the at least one user device can comprise a plurality of user devices.
In some embodiments, the user data on the at least one user device can comprise at least one data element. This data element can comprise one or more of the following data: at least one numeric value, single selectable options from at least one list, multiple selectable options from at least one list, at least one time-stamped value, and at least one binary value.
In some embodiments, the method can further comprise a device processing step that comprises processing at least one data element of at least one data set of the user data on at least one user device by the respective user device. Processing at least one data element may also be processing at least one value of the data element, if the data element comprises a plurality of values, such as a vector.
In some such embodiments, the device processing step can comprise, on at least one user device, at least one of removing information from at least a part of the user data and limiting a precision of at least a part of the user data. This can be achieved by measures such as adding noise, adding errors, changing a data type of a value, or indicating only a range, selected from a pre-defined set of ranges, within which the value lies. That is, the device processing step can anonymise data, or at least limit a traceability of data or inhibit direct linking of parts of data sets obtained by an adverse party, if these data sets all refer to a same user or user device.
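
Two of the measures named above, limiting the precision of a value and reporting only a range from a pre-defined set, can be sketched as follows. The cut points, the labels and the function names are hypothetical examples and not values from the source.

```python
from bisect import bisect_right

# Hypothetical pre-defined ranges for a systolic blood pressure value in mmHg.
BOUNDS = [90, 120, 140]  # cut points; each range includes its lower bound
LABELS = ["<90", "90 to <120", "120 to <140", ">=140"]


def to_range(value: float) -> str:
    """Replace a numeric value by the label of the range that contains it."""
    return LABELS[bisect_right(BOUNDS, value)]


def limit_precision(value: float, step: float) -> float:
    """Round a value to the nearest multiple of step."""
    return round(value / step) * step
```

For instance, a reading of 127 would be reported only as "120 to <140", and a pulse of 72.4 rounded with a step of 5 would be reported as 70.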
In some such embodiments, the device processing step can comprise processing at least one numerical data element. The at least one numerical data element can comprise a data element which comprises at least one numeric value. The device processing step can furthermore comprise combining numerical noise and the numerical data element.
In some embodiments, combining numerical noise and the at least one numerical data element can comprise adding the numerical noise to at least one of the at least one numeric value of the numerical data element.
In some embodiments, combining numerical noise and the at least one numerical data element can comprise adding the numerical noise to the numeric value of the respective numeric data element and, for at least one of the at least one numerical data element, limiting the numerical noise so that the respective numeric value does not exceed a pre-defined interval (and/or a pre-defined threshold value). The pre-defined interval or threshold value can be global, such as for the height of a user, wherein a minimum height may be fixed. For other values, there may be a plurality of intervals, and the interval may be selected depending on the original value. For example, for a biomarker that has three ranges, "high", "medium" and "low", if a value of the biomarker was in the high range, the noisy value of the biomarker may be limited to said range. An optional advantage of this technique can be that results of a subsequent analysis are not perturbed.
In some such embodiments, the numerical noise can be generated by a Laplace distribution with an appropriate scaling. This can be particularly advantageous to provide sufficient anonymity to user data, while maintaining its statistical properties. A probability density function of a variable that is added as noise can optionally be given by the following formula with appropriate μ and b.
$$f(x \mid \mu, b) = \frac{1}{2b} \exp\left(-\frac{|x - \mu|}{b}\right)$$
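
Adding Laplace-distributed noise to a numeric value and keeping the result inside a pre-defined interval might look like the following sketch. It assumes zero-centred noise, draws it as the difference of two exponential variates (which is Laplace-distributed with the same scale), and limits the result by simple clamping; the clamping strategy, the function names and the parameter values are assumptions, since the source does not prescribe how the noise is limited.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Zero-centred Laplace noise with scale b, as a difference of two exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def add_bounded_noise(value: float, scale: float, low: float, high: float,
                      rng: random.Random) -> float:
    """Add Laplace noise, then clamp so the result stays within [low, high]."""
    noisy = value + laplace_noise(scale, rng)
    return min(max(noisy, low), high)
```

Clamping is only one possible way of limiting the noise; it concentrates some probability mass at the interval bounds, so redrawing the noise until the value falls inside the interval would be an alternative with different statistical behaviour.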
In some embodiments, the device processing step can comprise processing at least one data element by converting a representation of the data element from a first encoding to a second encoding. That can also comprise changing a part of an encoding, for example an encoding of a quantity of consumed cigarettes per day if a data element comprises a quantity of consumed cigarettes per day and a timestamp.
In some such embodiments, the first and the second encoding can be, at least for some values of the data element, not equivalent, and converting a representation of the data element into the second encoding can comprise using an appropriate random function. This can be the case if a range A in a first encoding alpha (for example corresponding to "high") corresponds to two values B and B* in a second encoding beta (for example "critically high" and "over-average").
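
The conversion between non-equivalent encodings can be sketched with a lookup table in which one value of encoding alpha maps to several candidate values of encoding beta, one of which is chosen at random. The table entries other than the "high" example, the uniform choice and the function name are assumptions for illustration.

```python
import random

# Hypothetical mapping from encoding alpha to encoding beta.
ALPHA_TO_BETA = {
    "low": ["under-average"],
    "medium": ["average"],
    "high": ["critically high", "over-average"],  # no one-to-one equivalent
}


def reencode(value: str, rng: random.Random) -> str:
    """Convert a value from encoding alpha to beta, choosing randomly where ambiguous."""
    return rng.choice(ALPHA_TO_BETA[value])
```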
In some embodiments, the device processing step can comprise processing at least one timestamped data element. The at least one timestamped data element can comprise a data element which comprises at least one timestamped value. The device processing step can then comprise replacing a timestamp of at least one of the at least one timestamped value.
In some such embodiments, the step of replacing a timestamp of the at least one of the at least one timestamped value can comprise limiting the precision of said timestamp.
In some embodiments, the step of replacing a timestamp of the at least one of the at least one timestamped value can comprise replacing said timestamp by a temporal distance relative to another point in time, such as a timestamp of another data element.
In some embodiments, the method can further comprise limiting the precision of said temporal distance relative to another point in time.
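
The timestamp replacements described above can be sketched as follows: truncating a timestamp to day precision, and replacing it by a temporal distance to a reference point expressed in whole units. The function names and the choice of days and weeks as units are assumptions for illustration.

```python
from datetime import datetime, timedelta


def truncate_to_day(ts: datetime) -> datetime:
    """Limit the precision of a timestamp to the calendar day."""
    return ts.replace(hour=0, minute=0, second=0, microsecond=0)


def relative_distance(ts: datetime, reference: datetime,
                      unit: timedelta = timedelta(days=1)) -> int:
    """Replace a timestamp by its distance to a reference, in whole units (floored)."""
    return (ts - reference) // unit
```

For a measurement taken on 10 March 2020 at 14:30 and a reference on 1 March 2020 at 09:00, the distance is 9 in days and 1 in weeks, so the coarser unit reveals less about when the measurement was taken.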
In some embodiments, the device processing step can comprise an operation to anonymize at least a part of the user data on at least one user device.
In some embodiments where the server data on the server comprise at least one data element, at least one data element of the at least one data element can comprise at least one of the following data: at least one numeric value, single selectable options from at least one list, multiple selectable options from at least one list, at least one time-stamped value, and at least one binary value.
In some embodiments, the server processing step can comprise processing at least one data element of at least one data set of the server data on the server. In some such embodiments, the server processing step can comprise at least one of removing information from at least a part of the user data and limiting a precision of at least a part of the respective data element. This can be achieved by measures such as adding noise, adding errors, changing a data type of a value, or indicating only a range, selected from a pre-defined set of ranges, within which the value lies. That is, the server processing step can anonymise data, or at least limit a traceability of data or inhibit direct linking of parts of data sets obtained by an adverse party.
In some such embodiments, the server processing step can comprise processing at least one numerical data element. The at least one numerical data element can be a data element which comprises at least one numeric value. The server processing step can furthermore comprise combining numerical noise and the numerical data element.
In some such embodiments, combining numerical noise and the at least one numerical data element can comprise adding the numerical noise to at least one of the at least one numeric value of the numerical data element.
In some other embodiments, combining numerical noise and the at least one numerical data element can comprise adding the numerical noise to the numeric value of the respective numeric data element and for at least one of the at least one numerical data element, limiting the numerical noise so that the respective numeric value does not exceed a pre-defined interval.
In some such embodiments, the numerical noise can be generated by a Laplace distribution with an appropriate scaling. This can be particularly advantageous to provide sufficient anonymity to user data, while maintaining its statistical properties. A probability density function of a variable that is added as noise can optionally be given by the following formula with appropriate μ and b.
$$ f(x \mid \mu, b) = \frac{1}{2b} \exp\left(-\frac{|x - \mu|}{b}\right) $$
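As a purely illustrative, non-limiting sketch, combining Laplace-distributed noise with a numeric value and keeping the result within a pre-defined interval could look as follows. The function names, the choice of clamping the noisy value to the interval, and all parameter values are assumptions made for this example and are not taken from the embodiments.

```python
import random


def laplace_noise(mu: float, b: float, rng: random.Random) -> float:
    """Draw one sample from a Laplace distribution with location mu and scale b.

    The difference of two independent exponential variables with mean b
    follows a Laplace distribution with location 0 and scale b.
    """
    return mu + rng.expovariate(1.0 / b) - rng.expovariate(1.0 / b)


def add_bounded_noise(value: float, b: float, low: float, high: float,
                      rng: random.Random) -> float:
    """Add zero-centred Laplace noise and clamp the result to [low, high].

    Clamping is only one possible way (assumed here) of ensuring that the
    noisy value does not exceed the pre-defined interval.
    """
    noisy = value + laplace_noise(0.0, b, rng)
    return min(max(noisy, low), high)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical example: a laboratory value of 5.6 with noise scale 0.2.
    print(add_bounded_noise(5.6, 0.2, 0.0, 15.0, rng))
```

A smaller scale b keeps the noisy values closer to the original values, whereas a larger b gives stronger obfuscation at the cost of accuracy.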
In some embodiments, the server processing step can comprise processing at least one data element by converting a representation of the data element from a first encoding to a second encoding. That can also comprise changing a part of an encoding, for example an encoding of a quantity of consumed cigarettes per day if a data element comprises a quantity of consumed cigarettes per day and a timestamp.
In some such embodiments, the first and the second encoding can be, at least for some values of the data element, not equivalent, and converting a representation of the data element into the second encoding can comprise using an appropriate random function. This can be the case if a range A in a first encoding alpha (for example corresponding to "high") corresponds to two values B and B* in a second encoding beta (for example "critically high" and "over-average").
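As a non-limiting sketch of such a conversion, a value of the first encoding that has more than one counterpart in the second encoding can be mapped by a random choice among the candidates. The mapping table and the uniform random choice are assumptions made for illustration only.

```python
import random

# Hypothetical mapping from a first encoding (alpha) to a second encoding (beta).
# "high" has two non-equivalent counterparts in the second encoding.
ALPHA_TO_BETA = {
    "low": ["low"],
    "normal": ["normal"],
    "high": ["critically high", "over-average"],
}


def convert_encoding(value: str, rng: random.Random) -> str:
    """Convert a value from the first to the second encoding.

    If several candidates exist, one is picked at random (uniformly here,
    which is an assumption; another random function could be used).
    """
    candidates = ALPHA_TO_BETA[value]
    return rng.choice(candidates)


if __name__ == "__main__":
    rng = random.Random(1)
    print(convert_encoding("high", rng))
```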
In some embodiments where the server processing step comprises processing at least a timestamped data element, the at least one timestamped data element can be a data element which comprises at least one timestamped value. The server processing step can comprise replacing a timestamp of at least one of the at least one timestamped value.
In some such embodiments, the step of replacing a timestamp of the at least one of the at least one timestamped value can comprise limiting the precision of said timestamp. In some other embodiments, the step of replacing a timestamp of the at least one of the at least one timestamped value can comprise replacing said timestamp by a temporal distance relative to another point in time, such as a timestamp of another data element.
In some such embodiments, the method can further comprise limiting the precision of said temporal distance relative to another point in time.
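The timestamp replacements described above can be sketched, in a non-limiting manner, as follows. Truncating to the first day of the month and rounding the temporal distance down to whole weeks are assumptions chosen for this example; other granularities are equally conceivable.

```python
from datetime import datetime


def limit_timestamp_precision(ts: datetime) -> datetime:
    """Limit the precision of a timestamp to the month (assumed granularity)."""
    return ts.replace(day=1, hour=0, minute=0, second=0, microsecond=0)


def relative_distance_days(ts: datetime, reference: datetime,
                           bucket_days: int = 7) -> int:
    """Replace a timestamp by its distance in days to a reference point in time.

    The distance is rounded down to a multiple of bucket_days, which limits
    the precision of the temporal distance.
    """
    delta_days = (ts - reference).days
    return (delta_days // bucket_days) * bucket_days


if __name__ == "__main__":
    measured = datetime(2020, 4, 17, 13, 45, 12)
    first_entry = datetime(2020, 4, 1)
    print(limit_timestamp_precision(measured))
    print(relative_distance_days(measured, first_entry))
```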
In some embodiments, the server processing step can comprise an operation to anonymize at least a part of the server data.
In some embodiments, the at least one data set in the device data set selection step can be selected only from a pre-defined part of user data on the user device. This part can for example exclude identifying data, such as contact data, a user's address or at least a part thereof and/or his payment data.
In some embodiments, the server data selection step can comprise receiving at least one data request. The data request can be received from third parties, such as research partners.
In some such embodiments, the at least one data request comprises a data request condition and a first list of fields. The data request condition is a condition that specifies criteria for users that are relevant for the third party or the research partner. Technically, it is a condition that needs to be matched for data to be selected. The first list of fields lists a minimum of data elements necessary for the purpose of the third party, such as a research purpose for a third party.
In some embodiments, the at least one data request can be a plurality of data requests.
In some such embodiments, each of the at least one data container is specific to a respective data request. That is, each data container comprises the data corresponding to the respective data request.
In some embodiments, the server data can comprise at least one data element group, wherein the at least one data element group comprises at least one data element and the at least one data element comprises a common group key that corresponds to the at least one data element group. The common group key can also be linked to a data element by the data set to which the group key and the data element belong. Each data element group can be understood as a collection of data that have such a common group key element, so that the common group key defines a data profile.
In some embodiments, the common group key can comprise a user device indicator. That is, the data element groups can be understood as anonymised profiles of users that collect the anonymised data that is sent by the users. The user device indicator may also be the same for a plurality of devices if the user device receives a corresponding instruction, e.g. if a user changes his user device. In some such embodiments, the at least one data element group can be a plurality of data element groups.
In some such embodiments, the server data selection step comprises evaluating for each data request at least one server selection condition by the server, wherein each server selection condition corresponds to one data request condition.
In such embodiments, each server selection condition can comprise the corresponding data request condition. That is, the server can add at least one criterion to the data request condition, e.g. in order to protect the privacy of the users, as will be detailed below.
In some such embodiments, each server selection condition can comprise a condition regarding whether a data element group comprises at least some or all data elements indicated by the corresponding data request's first list of fields. This can be optionally advantageous to limit data element groups that match the server condition to data element groups that can be selected to respond to the data request.
In some such embodiments, each server selection condition can comprise a condition regarding a proportion of data elements indicated by the corresponding data request's first list of fields and a data element group's data elements. Such a proportion can be, for example, a ratio of a number of requested fields or data elements to a number of fields or data elements of the data element group. An optional advantage of this feature can be that consequently, in one data container, there can be only a limited amount of data regarding a user and the user data that are stored on the server cannot be disclosed by just a single data request.
In some such embodiments, the data selection step can comprise adding a selection flag to each data element that is selected during processing of a data request.
In some such embodiments, each server selection condition can comprise a condition referring to an amount of data elements of a data element group that are indicated by the corresponding data request's first list of fields and to which a selection flag was added. Such a condition could for example be a maximum number of selection flags that may have been added to the data elements that are indicated by the first list of fields and that might therefore be used to match user profiles if an adverse party gains access to more than one data container.
In some such embodiments, each server selection condition can comprise a condition referring to a proportion of, on the one hand, data elements of a data element group that are indicated by the corresponding data request's first list of fields and to which a selection flag was added and, on the other hand, the data elements of the data element group that are indicated by the corresponding data request's first list of fields. The advantage of the preceding paragraph can apply accordingly. An example for said part of each server selection condition can be that the proportion of data elements that were previously shared with a research partner to all the data elements of the data element group is below 50%.
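A server selection condition combining several of the criteria discussed above can be sketched, in a non-limiting manner, as a predicate over a data element group. The class `DataElementGroup`, the function name, the threshold values and the example fields are hypothetical and serve only as an illustration.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DataElementGroup:
    group_key: str                              # e.g. a user device indicator
    elements: dict                              # field name -> value
    flagged: set = field(default_factory=set)   # fields selected for earlier requests


def matches_server_selection_condition(
    group: DataElementGroup,
    request_condition: Callable[[dict], bool],
    first_fields: list,
    max_field_ratio: float = 0.5,
    max_flagged: int = 1,
) -> bool:
    """Evaluate an assumed server selection condition for one group."""
    if not group.elements:
        return False
    # The data request condition is part of the server selection condition.
    if not request_condition(group.elements):
        return False
    # All fields of the first list must be present in the group.
    if not all(f in group.elements for f in first_fields):
        return False
    # Ratio of requested fields to all fields of the group must stay limited.
    if len(first_fields) / len(group.elements) > max_field_ratio:
        return False
    # Limit how many requested fields already carry a selection flag.
    already_flagged = sum(1 for f in first_fields if f in group.flagged)
    return already_flagged <= max_flagged


if __name__ == "__main__":
    group = DataElementGroup(
        "udi-1", {"age": 54, "hba1c": 7.1, "bmi": 29.0, "ldl": 130}, {"age"}
    )
    condition = lambda e: e.get("hba1c", 0) > 6.5
    print(matches_server_selection_condition(group, condition, ["age", "hba1c"]))
```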
In some such embodiments, each server selection condition can comprise a condition regarding a maximum number of data element groups that are selected for the data request corresponding to the server selection condition.
In some such embodiments, the server data selection step can comprise for each data request, evaluating the server selection condition data element group-wise until a finishing condition is matched. That is, the server checks for each data element group whether it matches the respective server condition and selects it accordingly. An optional advantage can be a limited processing time, as not all of the data element groups need to be checked. Furthermore, it can be easier to limit the selection to a part of the data element groups.
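The group-wise evaluation with a finishing condition can be illustrated by the following non-limiting sketch, in which the finishing condition is assumed to be a maximum number of selected data element groups. The function name and the example values are hypothetical.

```python
from typing import Callable, Iterable, List


def select_groups(groups: Iterable, condition: Callable[[object], bool],
                  max_groups: int) -> List:
    """Check groups one by one and stop once max_groups have been selected."""
    selected = []
    for group in groups:
        if len(selected) >= max_groups:
            break  # finishing condition matched; remaining groups are not checked
        if condition(group):
            selected.append(group)
    return selected


if __name__ == "__main__":
    # Hypothetical groups represented by integers; even numbers "match".
    print(select_groups(range(100), lambda g: g % 2 == 0, 3))
```

Because the loop stops early, only as many data element groups are evaluated as are needed to reach the finishing condition.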
In some such embodiments, the server data selection step can comprise for each data request selecting data elements from the at least one data element group based on the data request and a result of evaluating the server selection condition for the respective data element group. That is, a result of verifying the server selection condition can be used as selection criterion for data element groups, as implied above.
In such embodiments, selecting data elements from the at least one data element group based on the data request and a result of evaluating the server selection condition for the respective data element group can comprise selecting the data elements from the at least one data element group that are indicated by the first list of fields if the server selection condition was matched for the respective data element group. An advantage of this can be that there is a clear criterion for the selection of data elements from data element groups for answering a data request.
In such embodiments, the at least one data request can comprise a second list of fields.
Furthermore, selecting data elements from the at least one data element group based on the data request and a result of evaluating the server selection condition for the respective data element group can comprise selecting data elements from the at least one data element group that are indicated by the second list of fields if the server selection condition was matched for the respective data element group.
Alternatively, selecting data elements from the at least one data element group based on the data request and a result of evaluating the server selection condition for the respective data element group can comprise furthermore selecting data elements from the at least one data element group that are indicated by the second list of fields if the server selection condition was matched for the respective data element group, until the part of the server condition regarding the proportion of data elements indicated by the corresponding data request's first list of fields and a data element group's data elements is not matched anymore. That is, if the first list of fields specifies x data elements to be sent and by the aforementioned criterion regarding the proportion of data elements, y data elements may be selected, then for y>x, up to y-x data elements are selected according to the second list of fields. Furthermore, an optional advantage can be that data element groups are considered that only comprise all data elements from the first list of fields and none or not all of the data elements from the second list of fields. So, this option allows the specification of optional data elements that are selected from the data element group, but does at the same time not limit the quantity of data element groups that match the server selection condition.
In such embodiments, the at least one data request can comprise at least one further list of fields, such as a third list of fields. Furthermore, selecting data elements from the at least one data element group based on the data request and a result of evaluating the server selection condition for the respective data element group can comprise furthermore selecting data elements from the at least one data element group that are indicated by the further list of fields, such as the third list of fields, when there are no data elements left that are indicated by the first and the second list of fields, if the server selection condition was matched for the respective data element group, until the part of the server condition regarding the proportion of data elements indicated by the corresponding data request's first list of fields and a data element group's data elements is not matched anymore. This can have the same advantages as specified in the preceding paragraph.
In some such embodiments, the server data selection step can comprise finishing selecting data elements from a data element group when a condition referring to a proportion of data elements that are selected and a data element group's data elements is matched, such as the condition discussed above. That is, the data elements can be selected from the second and subsequently from the third list of fields as long as the proportion of selected data elements to available data elements of a respective data element group fulfils a condition, such as not exceeding a ratio of 50%.
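The selection of data elements according to a first list of fields and subsequently according to optional further lists of fields can be sketched as follows. This is a non-limiting example that assumes the server selection condition was already matched for the group; the function name, the field names and the 50% limit are illustrative assumptions.

```python
from typing import List, Sequence


def select_fields(group_fields: Sequence[str],
                  field_lists: Sequence[Sequence[str]],
                  max_ratio: float = 0.5) -> List[str]:
    """Select field names from one data element group.

    field_lists[0] is the mandatory first list; later lists are optional and
    are consumed in order while the ratio of selected fields to all fields of
    the group does not exceed max_ratio.
    """
    available = set(group_fields)
    first = list(field_lists[0])
    total = len(group_fields)
    # Mandatory fields must all be present and must respect the ratio.
    if total == 0 or not all(f in available for f in first):
        return []
    if len(first) / total > max_ratio:
        return []
    selected = first
    for optional in field_lists[1:]:
        for name in optional:
            if name not in available or name in selected:
                continue
            if (len(selected) + 1) / total > max_ratio:
                return selected  # proportion limit reached
            selected.append(name)
    return selected


if __name__ == "__main__":
    fields = ["age", "sex", "hba1c", "bmi", "smoker", "ldl", "hdl", "weight"]
    print(select_fields(fields, [["age", "hba1c"], ["bmi", "ldl", "height"], ["hdl"]]))
```

In this sketch, a group lacking some or all optional fields is still eligible, since only the first list is treated as mandatory.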
In some embodiments, the at least one container releasing condition can be verified for each of the at least one data container separately. In such embodiments, the at least one container releasing condition can comprise a minimum number of different data element groups from which data elements were selected for the respective data container.
In such embodiments, the server container releasing step can comprise preventing releasing each of the at least one data container before the at least one container releasing condition is matched for the respective data container. That is, sharing a data container comprising data elements from too few data element groups can be prevented. This can be optionally advantageous in a case where the absolute number of data element groups satisfying the data request condition or the server selection condition respectively is small. As an example, in an extreme case with only one data element group that matches the data request condition, and where a data element group corresponds to data of a user, a shared part of the user's data would be exposed, and it would be possible to match said part of the user's data to the user if the data request condition is sufficiently specific and another party obtains knowledge about the specificity by other means. For example, an insurance company insuring patients with a very rare disease could thus obtain information on their customers. Avoiding such a scenario may be an advantage of the option discussed in this paragraph.
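A container releasing condition based on a minimum number of different data element groups can be sketched, in a non-limiting manner, as follows. Representing a data container as a list of entries that each carry a `group_key` is an assumption made for this example, as is the threshold value.

```python
from typing import Dict, List


def may_release(container: List[Dict], min_groups: int) -> bool:
    """Return True only if the container holds data from enough distinct groups."""
    distinct_groups = {entry["group_key"] for entry in container}
    return len(distinct_groups) >= min_groups


if __name__ == "__main__":
    # Hypothetical container with data elements from two data element groups.
    container = [
        {"group_key": "udi-1", "field": "hba1c", "value": 7.1},
        {"group_key": "udi-1", "field": "age", "value": 54},
        {"group_key": "udi-2", "field": "hba1c", "value": 6.8},
    ]
    print(may_release(container, min_groups=5))  # release is prevented
```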
In some such embodiments, the server packaging step can comprise furthermore a server container releasing step that comprises preventing releasing each of the at least one data container respectively before at least one container releasing condition that is specific to the respective data container is matched. That is, the container releasing conditions can be adapted to the container and thus to the data request condition of the respective data request, for example depending on how specific the data request condition is.
In such embodiments, at least one of the at least one container releasing condition can comprise a minimum number of different data element groups from which data elements were selected for the respective data container. That is, again for the example with data element groups having a user device indicator as key element, that data from a minimum number of users must be selected. The optional advantages of the penultimate paragraph apply respectively.
In some such embodiments, at least one of the at least one container releasing condition can comprise a condition regarding a uniqueness of data elements from data element groups that were selected for the respective data container. The uniqueness can also be measured with a vectoral proximity measure or by fuzzy measures and does not need to be strict. Many unique data elements can be an indicator for a high variety of data, which can be an optional advantage in particular for third parties or research partners that want to research a phenomenon without limiting themselves to special cases or for example if the data are used for data mining.
In a second embodiment, a system for sending combined parts of distributed data from user devices to at least one recipient is disclosed. The system comprises at least one user device configured to store user data relating to the respective user device on said respective user device and to send at least one data set from the at least one user device to a server. The system also comprises at least one server configured to receive the at least one data set and combine data elements of the at least one received data set to at least one data container. The system further comprises at least one data container configured to store data.
The various system elements (server, user device, user data, etc.) and their functions can be as described above with respect to the method embodiments. The present system may be particularly configured to execute or perform the method for selectively transmitting data as described in the above embodiments.
In some embodiments, the user data can comprise medical data. In some embodiments, user data can comprise at least partially technical medical data. The user device can be further configured to at least partially encode the technical medical data by replacing at least parts of the data by machine-interpretable expressions.
In some embodiments, user data can comprise at least partially automatically generated medical data that can comprise at least one of at least one medical image, at least one result of a laboratory analysis of material originating from or expelled by the human body, and data from a sensing device that senses biometrical or medical data of the user. The user device may even be configured to generate some of the original data such as images via the user device's sensors such as a camera (further sensors such as biometric sensors may also be used).
In some embodiments, the system can further comprise at least one requesting party. The requesting party may be interested in receiving parts or all of user data for further analysis and/or research (particularly medical research for example). However, the user data should be anonymised and protected from abuse while maintaining its statistical features. In some such embodiments, the server can be configured to receive a data request from the requesting party.
The user device can be further configured to process at least one data element of at least one data set of the user data. In some such embodiments, the processing can comprise at least one of removing information from at least a part of the user data and limiting precision of at least a part of the user data.
In some such embodiments, the processing can further comprise processing at least one numerical data element, wherein the at least one numerical data element is a data element which comprises at least one numeric value, and wherein the device processing step comprises furthermore combining numerical noise and the numerical data element.
In some embodiments, the server can be configured to perform an operation to anonymize at least a part of the server data.
The following numbered embodiments also form part of the invention. Below, method embodiments will be discussed. These embodiments are abbreviated by the letter "M" followed by a number. Whenever reference is herein made to "method embodiments", these embodiments are meant.
M1 A method for sending combined parts of distributed data from user devices to at least one recipient, comprising
a device data storing step (DD) that comprises for each of at least one user device (11), storing user data (1) relating to the respective user device (11) on said respective user device (11),
a device sending step (DS) that comprises sending at least one data set (2) from the at least one user device (11) to a server (12),
a server receiving step (SR) that comprises receiving the at least one data set (2) by the server, and
a server packaging step (SP) that comprises combining data elements of the at least one received data set (2) to at least one data container (5).
M2 The method according to any of the preceding method embodiments,
wherein for at least one of the at least one user device (11), the user data (1) are specific to the respective user device (11).
M3 The method according to any of the preceding method embodiments,
wherein the device data storing step (DD) comprises storing medical data and wherein the user data (1) comprise medical data.
M4 The method according to the preceding embodiment,
wherein the device data storing step (DD) comprises storing at least a part of the user data (1) in a machine-interpretable form.
M5 The method according to the preceding embodiment,
wherein storing at least the part of the user data (1) in a machine-interpretable form comprises at least one of
(a) using a homogenous naming for fields and
(b) for each field, encoding values with a same dimension unit.
M6 The method according to any of the preceding method embodiments,
wherein the device data storing step (DD) comprises storing at least partially automatically generated medical data that comprise at least one of
(a) at least one medical image,
(b) at least one result of a laboratory analysis of material originating from or expelled by the human body, and
(c) data from a sensing device that senses biometrical or medical data of the user.
M7 The method according to the preceding method embodiment,
wherein the at least partially automatically generated medical data are automatically generated.
M8 The method according to any of the preceding method embodiments,
wherein the device sending step (DS) comprises a device processing step (DPS) that comprises processing the at least one data set on the at least one device.
M9 The method according to any of the preceding method embodiments,
wherein the device sending step (DS) comprises a device data set selection step (DDS) that comprises selecting at least one data set (2) from the user data (1) on the at least one user device (11).
M10 The method according to any of the preceding method embodiments,
wherein the device sending step (DS) is performed by at least one of the at least one user device (11) periodically and/or upon request by the server (12).
M11 The method according to any of the preceding method embodiments,
wherein the server receiving step (SR) comprises connecting at least one of the at least one user device (11) at least at some points in time to the server (12).
M12 The method according to any of the preceding method embodiments,
wherein the server receiving step (SR) comprises storing server data (6) on the server (12),
wherein the server data (6) comprise at least a part of at least one of the at least one data set (2) received by the server (12).
M13 The method according to any of the preceding method embodiments,
wherein the server packaging step (SP) comprises receiving at least one data request (4) from at least one requesting party (13).
M14 The method according to any of the preceding method embodiments,
wherein the server packaging step (SP) comprises furthermore a server processing step (SPS).
M15 The method according to any of the preceding method embodiments,
wherein the server packaging step comprises furthermore a server data selection step (SDS) that comprises selecting the data elements of the at least one received data set (2) to be combined to the at least one data container (5).
M16 The method according to any of the preceding method embodiments,
wherein the server packaging step comprises furthermore a server container releasing step (SCR) that comprises preventing releasing at least one of the at least one data container (5) before at least one container releasing condition (25) is matched.
M17 The method according to any of the preceding method embodiments, wherein the at least one user device (11) is a plurality of user devices (11).
M18 The method according to any of the preceding method embodiments,
wherein the user data (1) on the at least one user device (11) comprise at least one data element (3), wherein at least one data element (3) of the at least one data element (3) comprises at least one of the following data:
(a) at least one numeric value,
(b) single selectable options from at least one list,
(c) multiple selectable options from at least one list,
(d) at least one time-stamped value, and
(e) at least one binary value.
M19 The method according to any of the preceding method embodiments with the features of M18,
wherein the method comprises a device processing step (DPS) that comprises processing at least one data element (3) of at least one data set (2) of the user data (1) on at least one user device (11) by the respective user device (11).
M20 The method according to the preceding method embodiment,
wherein the device processing step comprises on at least one user device (11) at least one of removing information from at least a part of the user data (1) and limiting a precision of at least a part of the user data (1).
M21 The method according to any of the two preceding embodiments,
wherein the device processing step (DPS) comprises processing at least one numerical data element,
wherein the at least one numerical data element is a data element (3) which comprises at least one numeric value,
and wherein the device processing step (DPS) comprises furthermore combining numerical noise and the numerical data element.
M22 The method according to the preceding method embodiment,
wherein combining numerical noise and the at least one numerical data element comprises adding the numerical noise to at least one of the at least one numeric value of the numerical data element.
M23 The method according to the penultimate method embodiment,
wherein combining numerical noise and the at least one numerical data element comprises adding the numerical noise to the numeric value of the respective numeric data element and for at least one of the at least one numerical data element, limiting the numerical noise so that the respective numeric value does not exceed a pre-defined interval.
M24 The method according to any of the two preceding method embodiments,
wherein the numerical noise is generated by a Laplace-distribution with an appropriate scaling.
M25 The method according to any of the preceding method embodiments with the features of M19,
wherein the device processing step (DPS) comprises processing at least one data element (3) by converting a representation of the data element (3) from a first encoding to a second encoding.
M26 The method according to the preceding embodiment,
wherein the first and the second encoding are at least for some values of the data element (3) not equivalent and converting a representation of the data element (3) in the second encoding comprises using an appropriate random function.
M27 The method according to any of the preceding method embodiments with the features of M19,
wherein the device processing step (DPS) comprises processing at least a timestamped data element,
wherein the at least one timestamped data element is a data element (3) which comprises at least one timestamped value,
and wherein the device processing step (DPS) comprises replacing a timestamp of at least one of the at least one timestamped value.
M28 The method according to the preceding method embodiment,
wherein the step of replacing a timestamp of the at least one of the at least one timestamped value comprises limiting the precision of said timestamp.
M29 The method according to the penultimate method embodiment,
wherein the step of replacing a timestamp of the at least one of the at least one timestamped value comprises replacing said timestamp by a temporal distance relative to another point in time, such as a timestamp of another data element (3).
M30 The method according to the preceding embodiment,
comprising limiting the precision of said temporal distance relative to another point in time.
M31 The method according to any of the preceding method embodiments with the features of M19, wherein the device processing step (DPS) comprises an operation to anonymize at least a part of the user data (1) on at least one user device (11).
M32 The method according to any of the preceding method embodiments that comprise server data (6),
wherein the server data (6) on the server comprise at least one data element (3), wherein at least one data element (3) of the at least one data element (3) comprises at least one of the following data:
(a) at least one numeric value,
(b) single selectable options from at least one list,
(c) multiple selectable options from at least one list,
(d) at least one time-stamped value, and
(e) at least one binary value.
M33 The method according to any of the preceding method embodiments with the features of M32 and M14,
wherein the server processing step (SPS) comprises processing at least one data element (3) of at least one data set (2) of the server data (6) on the server (12).
M34 The method according to the preceding method embodiment,
wherein the server processing step (SPS) comprises at least one of removing information from at least a part of the server data (6) and limiting a precision of at least a part of the server data (6).
M35 The method according to the preceding embodiment,
wherein the server processing step (SPS) comprises processing at least one numerical data element,
wherein the at least one numerical data element is a data element (3) which comprises at least one numeric value,
and wherein the server processing step (SPS) comprises furthermore combining numerical noise and the numerical data element.
M36 The method according to the preceding method embodiment,
wherein combining numerical noise and the at least one numerical data element comprises adding the numerical noise to at least one of the at least one numeric value of the numerical data element.
M37 The method according to the penultimate method embodiment,
wherein combining numerical noise and the at least one numerical data element comprises adding the numerical noise to the numeric value of the respective numeric data element and for at least one of the at least one numerical data element, limiting the numerical noise so that the respective numeric value does not exceed a pre-defined interval.
M38 The method according to any of the two preceding method embodiments,
wherein the numerical noise is generated by a Laplace-distribution with an appropriate scaling.
M39 The method according to any of the preceding method embodiments with the features of M33,
wherein the server processing step (SPS) comprises processing at least one data element (3) by converting a representation of the data element (3) from a first encoding to a second encoding.
M40 The method according to the preceding embodiment,
wherein the first and the second encoding are at least for some values of the data element (3) not equivalent and converting a representation of the data element (3) in the second encoding comprises using an appropriate random function.
M41 The method according to any of the preceding method embodiments with the features of M33,
wherein the server processing step (SPS) comprises processing at least a timestamped data element,
wherein the at least one timestamped data element is a data element (3) which comprises at least one timestamped value,
and wherein the server processing step (SPS) comprises replacing a timestamp of at least one of the at least one timestamped value.
M42 The method according to the preceding method embodiment,
wherein the step of replacing a timestamp of the at least one of the at least one timestamped value comprises limiting the precision of said timestamp.
M43 The method according to the penultimate method embodiment,
wherein the step of replacing a timestamp of the at least one of the at least one timestamped value comprises replacing said timestamp by a temporal distance relative to another point in time, such as a timestamp of another data element (3).
M44 The method according to the preceding embodiment,
comprising limiting the precision of said temporal distance relative to another point in time.
M45 The method according to any of the preceding embodiments with the features of M33, wherein the server processing step (SPS) comprises an operation to anonymize at least a part of the server data (6).
M46 The method according to any of the preceding method embodiments with the features of M9,
wherein the at least one data set (2) in the device data set selection step (DDS) is selected only from a pre-defined part of user data (1) on the user device (11).
M47 The method according to any of the preceding method embodiments with the features of M13 and M15,
wherein the server data selection step (SDS) comprises receiving at least one data request (4).
M48 The method according to the preceding method embodiment,
wherein the at least one data request (4) comprises a data request condition (29) and a first list of fields (20).
M49 The method according to any of the two preceding method embodiments,
wherein the at least one data request (4) is a plurality of data requests (4).
M50 The method according to any of the preceding two method embodiments,
wherein each of the at least one data container (5) is specific to a respective data request (4).
M51 The method according to the preceding method embodiment and with the features of M12,
wherein the server data (6) comprise at least one data element group (7), wherein the at least one data element group (7) comprises at least one data element (3) and the at least one data element (3) comprises a common group key (8) that corresponds to the at least one data element group (7).
M52 The method according to the preceding method embodiment,
wherein the common group key (8) comprises a user device indicator (UDI).
M53 The method according to any of the two preceding method embodiments,
wherein the at least one data element group (7) is a plurality of data element groups (7).
M54 The method according to any of the preceding method embodiments with the features of M48,
wherein the server data selection step (SDS) comprises evaluating for each data request (4) at least one server selection condition (30) by the server,
wherein each server selection condition (30) corresponds to one data request condition (29).
M55 The method according to the preceding method embodiment,
wherein each server selection condition (30) comprises the corresponding data request condition (29).
M56 The method according to any of the preceding two method embodiments,
wherein each server selection condition (30) comprises a condition regarding whether a data element group (7) comprises at least some or all data elements (3) indicated by the corresponding data request's (4) first list of fields (20).
M57 The method according to any of the three preceding method embodiments,
wherein each server selection condition (30) comprises a condition regarding a proportion of data elements (3) indicated by the corresponding data request's (4) first list of fields (20) and a data element group's (7) data elements (3).
M58 The method according to any of the preceding method embodiments with the features of M33 and M47,
wherein the data selection step comprises adding a selection flag (28) to each data element (3) that is selected during the processing of a data request (4).
M59 The method according to any of the preceding method embodiments with the features of M58 and M54,
wherein each server selection condition (30) comprises a condition referring to an amount of data elements (3) of a data element group (7) that are indicated by the corresponding data request's (4) first list of fields (20) and to which a selection flag (28) was added.
M60 The method according to any of the preceding method embodiments with the features of the preceding embodiment and M54,
wherein each server selection condition (30) comprises a condition referring to a proportion of those data elements (3) of a data element group (7) that are indicated by the corresponding data request's (4) first list of fields (20) and to which a selection flag (28) was added, relative to the data elements (3) of the data element group (7) that are indicated by the corresponding data request's (4) first list of fields (20).
M61 The method according to any of the preceding method embodiments with the features of M54,
wherein each server selection condition (30) comprises a condition regarding a maximum number of data element groups (7) that are selected for the data request (4) corresponding to the server selection condition (30).
M61 The method according to any of the preceding method embodiments with the features of M54, wherein the server data selection step (SDS) comprises for each data request (4), evaluating the server selection condition (30) data element group-wise (7) until a finishing condition (31) is matched.
M62 The method according to the preceding method embodiment,
wherein the server data selection step (SDS) comprises for each data request (4) selecting data elements (3) from the at least one data element group (7) based on the data request (4) and a result of evaluating the server selection condition (30) for the respective data element group (7).
M63 The method according to the preceding method embodiment,
wherein selecting data elements (3) from the at least one data element group (7) based on the data request (4) and a result of evaluating the server selection condition (30) for the respective data element group (7)
comprises selecting the data elements (3) from the at least one data element group (7) that are indicated by the first list of fields (20) if the server selection condition (30) was matched for the respective data element group (7).
M64 The method according to the preceding method embodiment,
wherein the at least one data request (4) comprises a second list of fields (21) and wherein selecting data elements (3) from the at least one data element group (7) based on the data request (4) and a result of evaluating the server selection condition (30) for the respective data element group (7)
comprises furthermore selecting data elements (3) from the at least one data element group (7) that are indicated by the second list of fields (21) if the server selection condition (30) was matched for the respective data element group (7).
M65 The method according to the penultimate method embodiment and with the features of M57,
wherein the at least one data request (4) comprises a second list of fields (21) and
wherein selecting data elements (3) from the at least one data element group (7) based on the data request (4) and a result of evaluating the server selection condition (30) for the respective data element group (7)
comprises furthermore selecting data elements (3) from the at least one data element group (7) that are indicated by the second list of fields (21) if the server selection condition (30) was matched for the respective data element group (7), until the part of the server condition (30) specified in M57 is not matched anymore.
M66 The method according to the preceding method embodiment,
wherein the at least one data request comprises at least one further list of fields, such as a third list of fields (22),
and
wherein selecting data elements (3) from the at least one data element group (7) based on the data request (4) and a result of evaluating the server selection condition (30) for the respective data element group (7)
comprises furthermore selecting data elements (3) from the at least one data element group (7) that are indicated by the further list of fields, such as the third list of fields (22), when there are no data elements (3) left that are indicated by the first and the second list of fields (20, 21), if the server selection condition (30) was matched for the respective data element group (7), until the part of the server condition (30) specified in M57 is not matched anymore.
M67 The method according to any of the preceding method embodiments with the features of M51,
wherein the server data selection step comprises
finishing selecting data elements (3) from a data element group (7) when a condition referring to a proportion of data elements (3) that are selected and a data element group's (7) data elements (3) is matched.
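By way of a non-limiting illustration, selecting data elements from a data element group in the order of a first, second and further list of fields, and finishing once a proportion condition is reached, might be sketched as follows. The sketch assumes that the proportion condition is a cap on the share of a group's data elements that may be selected; this is only one possible interpretation, and the function and parameter names are hypothetical.

```python
def select_with_priority(group, field_lists, max_proportion):
    """Select fields from `group` in list priority order while keeping
    len(selected) / len(group) <= max_proportion.

    `field_lists[0]` is treated as the mandatory first list of fields; the
    remaining lists are optional and are consulted in order.
    """
    first, *optional = field_lists
    # The mandatory fields must all be present in the group.
    if not all(field in group for field in first):
        return {}
    # Assumption: a group is skipped if the mandatory fields alone exceed the cap.
    if len(first) > max_proportion * len(group):
        return {}
    selected = {field: group[field] for field in first}
    for fields in optional:
        for field in fields:
            if field in group and field not in selected:
                # Finish as soon as one more element would exceed the cap.
                if (len(selected) + 1) / len(group) > max_proportion:
                    return selected
                selected[field] = group[field]
    return selected

# Hypothetical data element group with six data elements.
group = {"a": 1, "b": 2, "c": 3, "d": 4, "e": 5, "f": 6}
result = select_with_priority(group, [["a", "b"], ["c", "d"], ["e"]], 0.5)
```

Here the first list contributes two data elements, the second list contributes one more, and the selection finishes before the third list is reached because half of the group's data elements have been selected.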
M68 The method according to any of the preceding method embodiments with the features of M16 and M51,
wherein the at least one container releasing condition (25) is verified for each of the at least one data container (5) separately and wherein the at least one container releasing condition (25) comprises a minimum number of different data element groups (7) from which data elements (3) were selected for the respective data container (5).
M69 The method according to the preceding method embodiment,
wherein the server container releasing step (SCR) comprises preventing releasing each of the at least one data container (5) before the at least one container releasing condition (25) is matched for the respective data container (5).
M70 The method according to any of the preceding method embodiments with the features of M51,
wherein the server packaging step comprises furthermore a server container releasing step (SCR) that comprises preventing releasing each of the at least one data container (5) respectively before at least one container releasing condition (25) that is specific to the respective data container (5) is matched.
M71 The method according to the preceding method embodiment,
wherein at least one of the at least one container releasing condition (25) comprises a minimum number of different data element groups (7) from which data elements (3) were selected for the respective data container (5).
M72 The method according to any of the preceding four method embodiments,
wherein at least one of the at least one container releasing condition (25) comprises a condition regarding a uniqueness of data elements (3) from data element groups (7) that were selected for the respective data container (5).
Below, system embodiments will be discussed. These embodiments are abbreviated by the letter "S" followed by a number. Whenever reference is herein made to "system embodiments", these embodiments are meant.
S1. A system for sending combined parts of distributed data from user devices to at least one recipient, the system comprising
At least one user device (11) configured to store user data (1) relating to the respective user device (11) on said respective user device (11) and to send at least one data set (2) from the at least one user device (11) to a server (12);
At least one server (12) configured to receive the at least one data set (2) and combine data elements of the at least one received data set (2) to at least one data container (5);
At least one data container (5) configured to store data.
S2. The system according to the preceding embodiment wherein the user data (1) comprise medical data.
S3. The system according to any of the preceding system embodiments wherein user data (1) comprises at least partially automatically generated medical data that comprise at least one of
(a) at least one medical image,
(b) at least one result of a laboratory analysis of material originating from or expelled by the human body, and
(c) data from a sensing device that senses biometrical or medical data of the user.
S4. The system according to any of the preceding system embodiments further comprising at least one requesting party (13).
S5. The system according to the preceding embodiment wherein the server (12) is configured to receive a data request (4) from the requesting party.
S6. The system according to any of the preceding system embodiments wherein the user device (11) is further configured to process at least one data element (3) of at least one data set (2) of the user data (1).
S7. The system according to the preceding embodiment wherein the processing comprises at least one of removing information from at least a part of the user data (1) and limiting precision of at least a part of the user data (1).
S8. The system according to the preceding embodiment wherein the processing further comprises processing at least one numerical data element, wherein the at least one numerical data element is a data element (3) which comprises at least one numeric value, and wherein the device processing step (DPS) comprises furthermore combining numerical noise and the numerical data element.
S9. The system according to any of the preceding system embodiments wherein the server (12) is configured to perform an operation to anonymize at least a part of the server data (6).
S10. The system according to any of the preceding system embodiments configured to perform the method according to any of the preceding method embodiments.
Figure Description
Figure 1 depicts a part of the method comprising method steps that are performed on user devices and a step to send data to a server as well as receiving it there.
Figure 2 depicts a part of the method comprising method steps that are performed on a server, receiving data requests and releasing data to third parties.
Figure 1 shows as an example three user devices 11. On each of the three user devices 11, a device data storing step DD is performed. The device data storing step comprises storing user data 1 on each user device 11. The user data 1 are specific to at least one user of the respective user device 11. The user data are in this example medical data. They can for example be a patient record of the user of the respective user device. The device data storing step may be performed for a period of time during which other steps of the method can be performed.
From the user data 1, data are selected in a device data set selection step DDS. The device data set selection step DDS can be performed while the respective user device 11 is still storing user data 1. The device data set selection step comprises selecting a data set 2 from the user data 1, wherein the data set 2 comprises at least one or a plurality of data elements 3. In this example, the device data set selection step selects no more than a maximum of 50 % of data elements 3 from the user data 1 for the data set 2.
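By way of a non-limiting illustration, a device data set selection step that never selects more than 50 % of the data elements might be sketched as follows. The random choice of data elements, the function name and the example fields are assumptions of this sketch; the description only states the upper limit.

```python
import random

def select_data_set(user_data, max_fraction=0.5, rng=random):
    """Return a data set holding at most `max_fraction` of the data elements."""
    # Round down so that the limit is never exceeded.
    limit = int(len(user_data) * max_fraction)
    # Assumption: the data elements are chosen at random among all fields.
    chosen = rng.sample(sorted(user_data), k=limit)
    return {field: user_data[field] for field in chosen}

# Hypothetical user data stored on one user device.
user_data = {"age": 47, "hba1c": 6.1, "ldl": 130, "weight_kg": 82.5, "smoker": False}
data_set = select_data_set(user_data, rng=random.Random(1))
```

With five data elements stored on the device, the resulting data set contains two of them.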
The selected data are processed in the device data processing step DPS. The device data processing step comprises anonymizing at least a part of the data in the data set 2. For anonymizing, the encoding of data elements 3 can be changed, their precision can be limited, parts of the data can be removed and noise can be added. The noise is preferably according to a Laplace-distribution with an appropriate scaling, as this distribution leaves the statistical features of a collection of such data sets 2 unchanged. Also, other anonymisation techniques can be applied.
The processed data sets 2 are then sent to the server 12 in a device sending step DS. This step comprises sending the processed data sets from the respective device to the server 12.
The server receiving step SR comprises receiving the data sets 2 from the at least one or in this example three user devices 11. The server receiving step can, as well as the device data storing step, be performed for a period of time during which other steps of the method can be performed, as a person skilled in the art will easily understand.
The device data set selection step DDS can be triggered periodically, on request of the server or by a different trigger. The three device data set selection steps DDS are to be understood as an example of one or a plurality of device data set selection steps DDS per user device 11. Also, the user devices 11 do not need to perform the same number of device data set selection steps DDS and subsequent steps.
Figure 2 shows the server 12 and, as an example, three incoming data requests 4 and three outgoing data containers 5, although there may also be more or fewer data requests 4 and data containers 5. The number of data requests 4 can be the same as the number of data containers 5, as each data container 5 corresponds to a data request 4. The server receiving step SR generates server data 6.
The server packaging step SP comprises, for each data request 4, performing a server data selection step SDS. Each data request 4 comprises a data request condition 29 that needs to be satisfied by the data of the server data 6 that are selected for the respective data container 5. Each data request 4 furthermore comprises at least a first list of fields 20, which specifies which fields of the server data 6 relating to a user device 11 that satisfy the data request condition 29 are requested by the data request 4. The first list of fields 20 is a mandatory list. The server 12 then checks the server data 6 for corresponding data and selects data element groups 7 that match the data request condition 29 and that comprise at least the data elements 3 specified by the first list of fields 20. Furthermore, the method comprises a server processing step SPS that comprises processing the data selected in the server data selection step SDS analogously to the processing on the user device 11.
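By way of a non-limiting illustration, the selection of data element groups for one data request might be sketched as follows. The representation of the server data as a mapping from group key to data elements, the callable used for the data request condition and all example values are assumptions of this sketch.

```python
def select_groups(server_data, request_condition, first_list_of_fields):
    """Return, per matching group key, the data elements named in the first list."""
    selected = {}
    for group_key, elements in server_data.items():
        # The first list of fields is mandatory: every field must be present.
        has_mandatory = all(field in elements for field in first_list_of_fields)
        if has_mandatory and request_condition(elements):
            selected[group_key] = {f: elements[f] for f in first_list_of_fields}
    return selected

# Hypothetical server data: one data element group per user device indicator.
server_data = {
    "udi-1": {"age_band": "40-49", "hba1c": 6.1, "ldl": 130},
    "udi-2": {"age_band": "40-49", "hba1c": 7.2},
    "udi-3": {"age_band": "60-69", "hba1c": 5.9, "ldl": 110},
}

def request_condition(elements):
    # Hypothetical data request condition.
    return elements.get("age_band") == "40-49"

selection = select_groups(server_data, request_condition, ["hba1c", "ldl"])
```

In this example only the group "udi-1" is selected: "udi-2" lacks a mandatory field and "udi-3" does not satisfy the data request condition.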
The method further comprises a server container releasing step SCR that comprises preventing the release of the respective data container 5 before a certain number of data element groups 7 has been added to the container, in order to prevent a de-anonymisation of a user in case the data request condition 29 is matched by only a few data element groups 7.
After the certain number of data element groups 7 was added to the respective data container 5, the data container is released. It can be sent to a third party such as a research partner, but it can also only be made available to said third party who then accesses it on the server 12.
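By way of a non-limiting illustration, a data container that cannot be released before data elements from a minimum number of different data element groups have been added might be sketched as follows. The class name, the method names and the choice of exception are assumptions of this sketch.

```python
class DataContainer:
    """Collects selected data elements and blocks release until enough
    different data element groups have contributed."""

    def __init__(self, min_groups):
        self.min_groups = min_groups
        self._elements = {}  # group key -> selected data elements

    def add(self, group_key, elements):
        self._elements.setdefault(group_key, {}).update(elements)

    def can_release(self):
        # Container releasing condition: minimum number of different groups.
        return len(self._elements) >= self.min_groups

    def release(self):
        if not self.can_release():
            raise PermissionError("container releasing condition not matched")
        return dict(self._elements)

# Hypothetical container that requires three different data element groups.
container = DataContainer(min_groups=3)
container.add("udi-1", {"hba1c": 6.1})
container.add("udi-2", {"hba1c": 7.2})
blocked = not container.can_release()
container.add("udi-3", {"hba1c": 5.9})
released = container.release()
```

Adding further data elements from a group that already contributed does not count towards the minimum, since only different group keys are counted.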
Numbered elements
1 user data
2 data set
3 data element
4 data request
5 data container
6 server data
7 data element group
8 group key
11 user device
12 server
13 requesting party
20 first list of fields
21 second list of fields
22 third list of fields
23 user device indicator (UDI)
24 set of user indicators
25 container releasing condition
28 selection flag
29 data request condition
30 server selection condition
31 finishing condition
Named method steps
DD device data storing step
DDS device data set selection step
DPS device processing step
DS device sending step
SCR server container releasing step
SP server packaging step
SPS server processing step
SR server receiving step
SDS server data selection step
SUI user indicator set selection step

Claims
1. A method for sending combined parts of distributed data from user devices to at least one recipient, comprising
a device data storing step (DD) that comprises for each of at least one user device (11), storing user data (1) relating to the respective user device (11) on said respective user device (11),
a device sending step (DS) that comprises sending at least one data set (2) from the at least one user device (11) to a server (12),
a server receiving step (SR) that comprises receiving the at least one data set (2) by the server, and
a server packaging step (SP) that comprises combining data elements of the at least one received data set (2) to at least one data container (5).
2. The method according to the preceding claim,
wherein the device data storing step (DD) comprises storing medical data;
and wherein the user data (1) comprise medical data; and
wherein the device data storing step (DD) comprises storing at least a part of the user data (1) in a machine-interpretable form; and
wherein storing at least the part of the user data (1) in a machine-interpretable form comprises at least one of
(a) using a homogenous naming for fields and
(b) for each field, encoding values with a same dimension unit.
3. The method according to any of the preceding claims,
wherein the device data storing step (DD) comprises storing at least partially automatically generated medical data that comprise at least one of
(a) at least one medical image,
(b) at least one result of a laboratory analysis of material originating from or expelled by the human body, and
(c) data from a sensing device that senses biometrical or medical data of the user.
4. The method according to any of the preceding claims wherein the server packaging step comprises furthermore a server data selection step (SDS) that comprises selecting the data elements of the at least one received data set (2) to be combined to the at least one data container (5).
5. The method according to any of the preceding claims wherein the server packaging step comprises furthermore a server container releasing step (SCR) that comprises preventing releasing at least one of the at least one data container (5) before at least one container releasing condition (25) is matched.
6. The method according to any of the preceding method claims, wherein the user data (1) on the at least one user device (11) comprise at least one data element (3), wherein at least one data element (3) of the at least one data element (3) comprises at least one of the following data:
(a) at least one numeric value,
(b) single selectable options from at least one list,
(c) multiple selectable options from at least one list,
(d) at least one time-stamped value, and
(e) at least one binary value.
7. The method according to the preceding claim wherein the method comprises a device processing step (DPS) that comprises processing at least one data element (3) of at least one data set (2) of the user data (1) on at least one user device (11) by the respective user device (11) and wherein
the device processing step (DPS) comprises processing at least one numerical data element and the at least one numerical data element is a data element (3) which comprises at least one numeric value; and
wherein the device processing step (DPS) comprises furthermore combining numerical noise and the numerical data element; and
combining numerical noise and the at least one numerical data element comprises adding the numerical noise to at least one of the at least one numeric value of the numerical data element.
8. The method according to the preceding claim, wherein the numerical noise is generated by a Laplace-distribution with an appropriate scaling.
9. The method according to any of the two preceding claims wherein the device processing step (DPS) comprises an operation to anonymize at least a part of the user data (1) on at least one user device (11).
10. The method according to any of the preceding claims and with features of claim 4 wherein the server data (6) on the server comprise at least one data element (3), wherein at least one data element (3) of the at least one data element (3) comprises at least one of the following data:
(a) at least one numeric value,
(b) single selectable options from at least one list,
(c) multiple selectable options from at least one list,
(d) at least one time-stamped value, and
(e) at least one binary value; and
wherein the server processing step (SPS) comprises processing at least one data element (3) of at least one data set (2) of the server data (6) on the server (12).
11. The method according to the preceding claim wherein the server processing step (SPS) comprises processing at least one data element (3) by converting a representation of the data element (3) from a first encoding to a second encoding and wherein the first and the second encoding are at least for some values of the data element (3) not equivalent and converting a representation of the data element (3) in the second encoding comprises using an appropriate random function.
12. The method according to any of the two preceding claims wherein the server processing step (SPS) comprises processing at least one timestamped data element,
wherein the at least one timestamped data element is a data element (3) which comprises at least one timestamped value,
and wherein the server processing step (SPS) comprises replacing a timestamp of at least one of the at least one timestamped value.
13. The method according to any of the three preceding claims wherein the server processing step (SPS) comprises an operation to anonymize at least a part of the server data (6).
14. The method according to any of the preceding claims and with features of claim 4 wherein the server data selection step (SDS) comprises receiving at least one data request (4) comprising a data request condition (29) and a first list of fields (20) and
wherein the data selection step comprises adding a selection flag (28) to each data element (3) that is selected during the processing of a data request (4).
15. A system for sending combined parts of distributed data from user devices to at least one recipient, the system comprising
At least one user device (11) configured to store user data (1) relating to the respective user device (11) on said respective user device (11) and to send at least one data set (2) from the at least one user device (11) to a server (12);
At least one server (12) configured to receive the at least one data set (2) and combine data elements of the at least one received data set (2) to at least one data container (5); and
At least one data container (5) configured to store data.
16. The system according to the preceding claim wherein the user data (1) comprises at least partially automatically generated medical data that comprise at least one of
(a) at least one medical image,
(b) at least one result of a laboratory analysis of material originating from or expelled by the human body, and
(c) data from a sensing device that senses biometrical or medical data of the user.
PCT/EP2020/060927 2019-04-18 2020-04-17 Method and system for transmitting combined parts of distributed data WO2020212611A1 (en)

Applications Claiming Priority (8)

Application Number Priority Date Filing Date Title
EP19170091.3 2019-04-18
EP19170100 2019-04-18
EP19170091 2019-04-18
EP19170111 2019-04-18
EP19170111.9 2019-04-18
EP19170096 2019-04-18
EP19170100.2 2019-04-18
EP19170096.2 2019-04-18

Publications (1)

Publication Number Publication Date
WO2020212611A1 WO2020212611A1 (en) 2020-10-22

Family

ID=69846023

Family Applications (4)

Application Number Title Priority Date Filing Date
PCT/EP2020/060916 WO2020212604A1 (en) 2019-04-18 2020-04-17 Method and system for selectively transmitting data
PCT/EP2020/060927 WO2020212611A1 (en) 2019-04-18 2020-04-17 Method and system for transmitting combined parts of distributed data
PCT/EP2020/060926 WO2020212610A1 (en) 2019-04-18 2020-04-17 Method and system for selective broadcasting
PCT/EP2020/060925 WO2020212609A1 (en) 2019-04-18 2020-04-17 Secure medical data analysis for mobile devices

Family Applications Before (1)

Application Number Title Priority Date Filing Date
PCT/EP2020/060916 WO2020212604A1 (en) 2019-04-18 2020-04-17 Method and system for selectively transmitting data

Family Applications After (2)

Application Number Title Priority Date Filing Date
PCT/EP2020/060926 WO2020212610A1 (en) 2019-04-18 2020-04-17 Method and system for selective broadcasting
PCT/EP2020/060925 WO2020212609A1 (en) 2019-04-18 2020-04-17 Secure medical data analysis for mobile devices

Country Status (1)

Country Link
WO (4) WO2020212604A1 (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6397224B1 (en) 1999-12-10 2002-05-28 Gordon W. Romney Anonymously linking a plurality of data records
US20020116227A1 (en) 2000-06-19 2002-08-22 Dick Richard S. Method and apparatus for requesting, retrieving, and obtaining de-identified medical informatiion
US20060069957A1 (en) 2004-09-13 2006-03-30 Sangeetha Ganesh Distributed expert system for automated problem resolution in a communication system
US7543149B2 (en) 2003-04-22 2009-06-02 Ge Medical Systems Information Technologies Inc. Method, system and computer product for securing patient identity
US20090150362A1 (en) 2006-08-02 2009-06-11 Epas Double Blinded Privacy-Safe Distributed Data Mining Protocol
US20090326981A1 (en) 2008-06-27 2009-12-31 Microsoft Corporation Universal health data collector and advisor for people
US7823207B2 (en) 2004-04-02 2010-10-26 Crossix Solutions Inc. Privacy preserving data-mining protocol
US20140114675A1 (en) * 2011-03-22 2014-04-24 Nant Holdings Ip, Llc Healthcare Management Objects
US20170177798A1 (en) * 2015-12-18 2017-06-22 Aetna Inc. System and method of aggregating and interpreting data from connected devices
US20170249432A1 (en) * 2014-09-23 2017-08-31 Surgical Safety Technologies Inc. Operating room black-box device, system, method and computer readable medium

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5660176A (en) 1993-12-29 1997-08-26 First Opinion Corporation Computerized medical diagnostic and treatment advice system
DE69507724T2 (en) * 1995-06-19 1999-09-30 Ibm Semea S.P.A., Mailand/Milano METHOD AND METHOD FOR RECEIVING DATA PACKAGES IN A ONE-WAY TRANSMISSION DEVICE
NL1019277C2 (en) 2001-11-01 2003-05-07 Vivici Device for making a diagnosis.
US20030225597A1 (en) 2002-05-29 2003-12-04 Levine Joseph H. Methods and systems for the creation and use of medical information
US7966368B2 (en) * 2003-05-02 2011-06-21 Microsoft Corporation Communicating messages over transient connections in a peer-to-peer network
US20050086481A1 (en) * 2003-10-15 2005-04-21 Cisco Technology, Inc. Naming of 802.11 group keys to allow support of multiple broadcast and multicast domains
US7433853B2 (en) 2004-07-12 2008-10-07 Cardiac Pacemakers, Inc. Expert system for patient medical information analysis
DE202005012454U1 (en) 2005-08-08 2005-10-20 Bitos Gmbh Mobile medical expert system, e.g. a first aid system, comprises a mobile terminal with a medical expert system software application which can connect to a central database via wireless communications for information exchange
US10410308B2 (en) 2006-04-14 2019-09-10 Fuzzmed, Inc. System, method, and device for personal medical care, intelligent analysis, and diagnosis
US10231077B2 (en) * 2007-07-03 2019-03-12 Eingot Llc Records access and management
EP2948880A1 (en) 2013-01-25 2015-12-02 Vanderbilt University Smart mobile health monitoring system and related methods
US20160357173A1 (en) * 2015-06-08 2016-12-08 Evidation Health Evidence Generation and Data Interpretation Platform
US20180129900A1 (en) * 2016-11-04 2018-05-10 Siemens Healthcare Gmbh Anonymous and Secure Classification Using a Deep Learning Network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WIKIPEDIA: "Differential privacy", 17 April 2019 (2019-04-17), XP055703133, Retrieved from the Internet <URL:https://en.wikipedia.org/w/index.php?title=Differential_privacy&oldid=892939666> [retrieved on 20200609] *

Also Published As

Publication number Publication date
WO2020212609A1 (en) 2020-10-22
WO2020212610A1 (en) 2020-10-22
WO2020212604A1 (en) 2020-10-22

Similar Documents

Publication Publication Date Title
Esposito et al. Blockchain: A panacea for healthcare cloud-based data security and privacy?
US11688015B2 (en) Using de-identified healthcare data to evaluate post-healthcare facility encounter treatment outcomes
US20180046766A1 (en) System for rapid tracking of genetic and biomedical information using a distributed cryptographic hash ledger
US10886012B1 (en) De-identifying medical history information for medical underwriting
US10176340B2 (en) Abstracted graphs from social relationship graph
US20170277907A1 (en) Abstracted Graphs from Social Relationship Graph
US20040199781A1 (en) Data source privacy screening systems and methods
WO2005094175A2 (en) A privacy preserving data-mining protocol
US10622104B2 (en) System and method utilizing facial recognition with online (social) network to access casualty health information in an emergency situation
KR20170052465A (en) Computer-implemented system and method for anonymizing encrypted data
Pear et al. Firearm violence following the implementation of California’s gun violence restraining order law
Kim et al. A trusted sharing model for patient records based on permissioned Blockchain
EP3816835A1 (en) Personal information analysis system and personal information analysis method
WO2019148248A1 (en) Personal record repository arrangement and method for incentivised data analytics
Kartal et al. Protecting privacy when sharing and releasing data with multiple records per person
WO2018207016A1 (en) Generating synthetic non-reversible electronic data records based on real-time electronic querying
CN113139168A (en) Apparatus and method for processing data request sent from client
WO2020212611A1 (en) Method and system for transmitting combined parts of distributed data
US20190050457A1 (en) Secure low-weight data hub
CN113591154A (en) Diagnosis and treatment data de-identification method and device and query system
Driver et al. Encrypted data-sharing for preserving privacy in wastewater-based epidemiology
Pasierb et al. Privacy-preserving data mining, sharing and publishing
Kitamura et al. Privacy preserving medical knowledge discovery by multiple “patient characteristics” formatted data
US12100490B1 (en) De-identifying medical history information for medical underwriting
Westbrook et al. Patterns of utilisation of the Clinical Information Access Program (CIAP) by clinicians in NSW: An analysis of web server logs

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
Ref document number: 20718682
Country of ref document: EP
Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established
Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 28/02/2022)

122 Ep: pct application non-entry in european phase
Ref document number: 20718682
Country of ref document: EP
Kind code of ref document: A1