WO2008098106A2 - Translational data mart - Google Patents


Info

Publication number
WO2008098106A2
Authority
WO
WIPO (PCT)
Prior art keywords
data
translational
tram
mapped
data system
Application number
PCT/US2008/053279
Other languages
French (fr)
Other versions
WO2008098106A9 (en)
WO2008098106A3 (en)
Inventor
Xiaoming Wang
Olufunmilayo I. Olopade
Original Assignee
University Of Chicago
Application filed by University Of Chicago
Publication of WO2008098106A2
Publication of WO2008098106A3
Publication of WO2008098106A9


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/20 Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B50/00 ICT programming tools or database systems specially adapted for bioinformatics
    • G16B50/20 Heterogeneous data integration
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00 ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations

Definitions

  • Translational medicine, or “translational research”, typically refers to the “translation” of basic research into real therapies for real patients.
  • Well-designed interactive and systematic studies facilitate medical applications based on knowledge drawn from basic research, animal model experiments, clinical data, clinical research trial data, and patient history and diagnosis data.
  • This personalized, continuous, bidirectional research spectrum, termed “translational research”, continues to evolve.
  • Translational research has its roots in many different domains, or fields of research, and the connections made among these various domains (domain research operations) may at first appear no different from traditional approaches.
  • However, the overall translational research scheme is not a simple mix of unplanned domain research activities. Instead, interactive multidisciplinary efforts need to be carefully planned by translational researchers and regulated by translational logic.
  • This newly emerging scientific discipline has been adopted in many medical specialties, including cancer, neurological dysfunctions and mental disabilities, immune and metabolic disorders, and a variety of genetically related diseases.
  • The concept of translational research has been essential to biomarker and pharmaceutical discoveries. As new technologies are developed and scientific insights expand, data volumes grow exponentially, and data types and sources tend to be more heterogeneous than those of the pre-genome era.
  • Provided are methods and systems for providing a continuous translational data system, comprising retrieving, from a plurality of databases, data reflecting a plurality of research domains, wherein the data have translational elements; standardizing the data; mapping the translational elements between the data; and aggregating the standardized data into a centralized data structure, wherein the mapped translational elements allow for continuous translational dataflow.
  • Also provided are methods and systems for providing a continuous translational data system, comprising assembling a data system having mapped translational elements, wherein the data comprising the data system are received from a first data source and a second data source, and providing access to the data system for a fee.
  • Also provided are methods and systems for querying a continuous translational data system, comprising identifying a targeted object; accessing a data system, wherein the data system comprises data having mapped translational elements; querying the data system; receiving results associated with the query, wherein the results reflect a continuous translational dataflow; and displaying the received results.
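The four steps of the first method (retrieve, standardize, map translational elements, aggregate) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the TraM implementation: the in-memory rows stand in for records retrieved from two satellite databases, and the field names, the `FIELD_MAP` table, and the use of a medical record number as the shared translational element are all hypothetical.

```python
from collections import defaultdict

# Hypothetical extracts "retrieved" from two satellite databases; local field names differ.
CLINIC_ROWS = [{"MRN": "m-001", "Dx": "breast carcinoma"}]
LAB_ROWS = [{"patient_mrn": "m-001", "genotype": "A/G"}]

# Hypothetical per-source mapping from local field names to common data element names.
FIELD_MAP = {
    "clinic": {"MRN": "person_id", "Dx": "diagnosis"},
    "lab": {"patient_mrn": "person_id", "genotype": "genotype"},
}


def standardize(source, rows):
    """Rename local fields to CDE names and tag each record with its research domain."""
    mapping = FIELD_MAP[source]
    return [{"domain": source, **{mapping[k]: v for k, v in row.items()}} for row in rows]


def aggregate(*record_sets):
    """Group standardized records from all domains under their shared translational element."""
    mart = defaultdict(list)
    for records in record_sets:
        for rec in records:
            te = rec["person_id"]  # the mapped translational element linking the domains
            mart[te].append({k: v for k, v in rec.items() if k != "person_id"})
    return dict(mart)


mart = aggregate(standardize("clinic", CLINIC_ROWS), standardize("lab", LAB_ROWS))
print(mart)
```

Once the local identifiers are mapped to one element, records from both domains land in the same per-person entry of the centralized structure.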
  • Figure 1 is an exemplary translational workflow.
  • Figure 2 serves as an example of the research domains that can be relevant in a translational workflow in the specific translational research domain of Genomic Medicine.
  • Figure 3 is a block diagram illustrating an exemplary operating environment for performing the disclosed methods.
  • Figure 4 is an exemplary questionnaire tree structure.
  • Figure 5 shows an example of computational data dependency flow control.
  • Figure 6 is an example of overall data aggregation mechanisms.
  • Figure 7A illustrates a set of allele data in a Microsoft Excel spreadsheet before standardization.
  • Figure 7B illustrates the data from FIG. 7A after it has been reconfigured, standardized, and deployed in a TraM database.
  • Figure 8A illustrates an exemplary Curator Person Module Page.
  • Figure 8B illustrates TraM data privacy control.
  • Figure 9 is a flowchart illustrating an exemplary method.
  • Figure 10 is a flowchart illustrating an exemplary method.
  • Figure 11 is a flowchart illustrating an exemplary method.
  • Object or “research object” refers to animals or human individuals, biospecimens derived from animals or humans, bio-samples processed from the biospecimens, and the like, from which demographic, genotypic, phenotypic, progeny-related, progenitor-related, and other information is obtained.
  • A targeted object refers to an object that is the focus of a query.
  • Researchers refers to patient care providers, physicians, clinicians, scientific investigators, and clinical and laboratory managers and personnel operating within an institution, which can be any type of entity, such as an ongoing business, a university or college, a foundation, a research institute, or any other organization.
  • Provided herein is a continuous translational data system, referred to as a translational data management mart (TraM) system.
  • The TraM system can be used as a stand-alone, streamlined data administration and utilization tool for continuous translational data assembly, management, and use.
  • The TraM system can generally be applied to a plurality of translational research branches, such as cancer research, diabetes research, neuronal-disorder research, and the like, and can provide personalized translational data continuity and integrity.
  • Each branch of translational research can involve multiple types of research data that all contribute to a translational research workflow. As shown in FIG.
  • The translational research workflow specifically encompasses unique research domains, such as patient demographic data, patient lifestyle data, patient family history data, clinical data, pathology data, biospecimen samples, laboratory results, and the like.
  • Clinical data can include any data generated in a clinic, for example, clinical patient data, clinical diagnosis data, clinical trial data, clinical treatment data, and the like.
  • FIG. 2 serves as an example of the research domains that can be relevant in a translational workflow in the specific translational research domain of Genomic Medicine.
  • The scheme shows how disparate domains of basic medical research and patient data can be effectively combined to produce a translational workflow that connects basic medical research and patient data more directly to patient care.
  • Some of the key features that allow the TraM system to uniquely address translational research are the following:
  • The TraM system is stand-alone in that TraM does not rely on software components that are under construction by third parties.
  • The TraM system is administrative in that TraM provides a curator application interface, which allows users to actively manage the translational data.
  • The TraM system is streamlined in that TraM focuses on the essential translational research results rather than attempting to process and store all detailed records obtained from distinct local and satellite data management systems.
  • The TraM system is personalized in that TraM sets up a data dependency mechanism to enforce data curation in the context of translational data continuity. For example, such a mechanism allows all data for individual human subjects who have participated in a translational research project, or have been treated in a hospital setting, to be tracked over the subjects' lifetimes, and also allows tracking through genealogical relationships.
  • The TraM system is generic in that one database system and a single application software package support a plurality of translational research branches. Finally, it is powerful enough to support user-driven query strategies for global searches of the TraM database.
  • The TraM system can export data for analysis with specialized tools such as SAS, MATLAB, Maple, and the like.
  • The TraM system uses an integrated computational approach to build data supply pipelines.
  • The global TraM system can establish data pipelines between satellite databases and remote TraM systems.
  • There can also be a distributed TraM system, whereby a plurality of TraM systems exchange data with other, remote TraM systems, making the remote TraM systems, in effect, satellite databases.
  • A data pipeline between multiple TraM systems can be two-way. This allows the most current data available in the plurality of TraM systems to be updated and retrieved.
  • The TraM system can aggregate data using any means known in the art for data exchange, for example, a web service.
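A two-way pipeline can be pictured as a merge after which each side holds the most recent version of every record. The sketch below assumes a last-write-wins rule keyed on an update timestamp; the source does not say how TraM resolves conflicting updates, so the rule, the `sync_two_way` name, and the record layout are illustrative assumptions.

```python
def sync_two_way(local, remote):
    """Bring both stores to the newest version of every record (last-write-wins).

    Each store maps record_id -> (updated_at, payload); updated_at must be
    comparable, e.g. ISO-8601 date strings. On a tie the local version is kept.
    """
    for record_id in set(local) | set(remote):
        versions = [store[record_id] for store in (local, remote) if record_id in store]
        newest = max(versions, key=lambda version: version[0])
        local[record_id] = newest
        remote[record_id] = newest


# Hypothetical contents of two TraM sites before synchronization.
site_a = {"p1": ("2008-01-05", {"stage": "II"})}
site_b = {"p1": ("2008-02-01", {"stage": "III"}), "p2": ("2008-01-20", {"stage": "I"})}
sync_two_way(site_a, site_b)
```

After the call both sites hold the newer record for `p1` and the record for `p2` that previously existed only at `site_b`.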
  • As the data in a TraM database represent the complete life cycle of translational data, the database functions as a “one-stop” data resource for translational researchers (a data mart).
  • The TraM system can also utilize a web service API to exchange data with outside communities, such as the caGrid network, to enable a broader range of translational data curation.
  • Because TraM can be used as a “data mart” for human subjects, it is important that the system be compliant with the Health Insurance Portability and Accountability Act (HIPAA).
  • The TraM system can use three levels of control for HIPAA compliance: the first level is the network system configuration and firewall settings for TraM servers, the second is authentication and authorization procedures for all TraM users, and the third is a filter for de-identification of patient data. Together, the second and third levels prevent certain unauthorized TraM users, such as public users, from seeing any HIPAA-protected information.
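The second and third levels of control can be sketched as a function that first rejects unknown roles and then strips protected fields for roles without clearance. The role names, the clearance table, and `PHI_FIELDS` are hypothetical, and the field set is a small illustrative subset, not the complete list of identifiers that HIPAA de-identification requires. The first level (network and firewall configuration) lives outside application code and is not modeled.

```python
# Illustrative subset of protected fields only; not the full HIPAA identifier list.
PHI_FIELDS = {"name", "mrn", "birth_date", "address", "phone"}

# Hypothetical roles: True means the role is cleared to see identified data.
ROLE_CLEARANCE = {"curator": True, "clinician": True, "researcher": False, "public": False}


def view_record(record, role):
    """Level 2: reject unknown roles. Level 3: de-identify for roles without clearance."""
    if role not in ROLE_CLEARANCE:
        raise PermissionError(f"unknown or unauthenticated role: {role!r}")
    if ROLE_CLEARANCE[role]:
        return dict(record)
    return {k: v for k, v in record.items() if k not in PHI_FIELDS}


# Hypothetical patient record.
patient = {"name": "Jane Doe", "mrn": "m-001", "birth_date": "1960-01-01",
           "diagnosis": "carcinoma", "stage": "II"}
public_view = view_record(patient, "public")
```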
  • TraM addresses the need for a system that can handle the complex nature of the basic research that must be translated into real therapies for patients.
  • The nature of translational research is such that translational logic regulates the meaningful connections that can be established among a series of research processes conducted in distinct and seemingly disparate research domains.
  • The datasets generated in the different domains and stages of this translational workflow are complicated, but they are regularly attached to unique identifiers, referred to as translational elements (TEs).
  • The data can be stored in a domain knowledge-specific computational data management system, referred to as a satellite database (SD).
  • The TEs can contribute to standardizing the data in order to facilitate the translational workflow. For example, the TEs allow the mapping process to determine which identifiers from an original data source provide datasets that represent only distinct data entities. The data can then be further standardized and normalized based on the common data elements (CDEs) defined in the TraM system. The CDEs accomplish this by providing a core set of requisite data elements that are necessary for the successful deposition of complete datasets into the TraM system. In addition, the relationships among the distinct data entities can be reevaluated according to translational research logic.
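The role of CDEs as a gate on deposition can be sketched as a completeness check run before records enter the central store. The dataset types and required element names below are hypothetical examples, not the CDEs actually defined in TraM.

```python
# Hypothetical core sets of requisite CDEs, keyed by dataset type.
REQUIRED_CDES = {
    "specimen": {"specimen_id", "person_id", "specimen_type", "collection_date"},
    "genotype": {"sample_id", "marker", "allele_1", "allele_2"},
}


def missing_cdes(dataset_type, record):
    """Return the requisite CDEs that are absent or empty in a record, sorted by name."""
    return sorted(f for f in REQUIRED_CDES[dataset_type] if record.get(f) in (None, ""))


def deposit(dataset_type, records, store):
    """Append complete records to `store`; return the incomplete ones with their gaps."""
    rejected = []
    for rec in records:
        gaps = missing_cdes(dataset_type, rec)
        if gaps:
            rejected.append((rec, gaps))
        else:
            store.append(rec)
    return rejected


store = []
rejected = deposit("specimen", [
    {"specimen_id": "S1", "person_id": "P1", "specimen_type": "blood", "collection_date": "2007-11-02"},
    {"specimen_id": "S2", "person_id": "P1", "specimen_type": "tumor tissue", "collection_date": ""},
], store)
```

Returning the gaps rather than silently dropping the record mirrors the curator workflow described later, where incomplete datasets are completed rather than discarded.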
  • Translational research logic is defined by the meaningful scientific connections between or among distinct datasets that can be transformed into medical knowledge for patient treatment.
  • For example, a patient care provider might apply such logic to a dataset of genetic polymorphisms in a related patient population (familial relations, for example) and determine the susceptibility of certain patients in that population to a disease that has been meaningfully correlated with the presence of that genetic polymorphism.
  • This logic is facilitated in the TraM system because the TraM system is based on an entity relation design/diagram (ERD), whereby TEs can be mapped within the system using consistent TraM identifiers such as “equal” or “parent-child”.
  • TEs can be recorded in the TraM system as required TraM identifiers, ensuring that the field is filled in for data tracking purposes.
  • For example, a foreign key field can be set to NOT NULL to enforce upstream and downstream data continuity.
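The NOT NULL foreign-key rule and the constrained TE relations can be illustrated with SQLite from the Python standard library. The person, specimen, and sample chain and the `te_map` table form a hypothetical minimal schema, not the TraM ERD; the point is that a downstream row cannot be stored without a valid upstream parent, and that mapping relations are limited to a fixed vocabulary such as "equal" and "parent-child".

```python
import sqlite3

# Hypothetical minimal schema: person -> specimen -> sample, plus a TE mapping table.
SCHEMA = """
CREATE TABLE person   (person_id   TEXT PRIMARY KEY);
CREATE TABLE specimen (specimen_id TEXT PRIMARY KEY,
                       person_id   TEXT NOT NULL REFERENCES person(person_id));
CREATE TABLE sample   (sample_id   TEXT PRIMARY KEY,
                       specimen_id TEXT NOT NULL REFERENCES specimen(specimen_id));
CREATE TABLE te_map   (from_te  TEXT NOT NULL,
                       to_te    TEXT NOT NULL,
                       relation TEXT NOT NULL CHECK (relation IN ('equal', 'parent-child')),
                       PRIMARY KEY (from_te, to_te));
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(SCHEMA)

conn.execute("INSERT INTO person VALUES ('P1')")
conn.execute("INSERT INTO specimen VALUES ('S1', 'P1')")
conn.execute("INSERT INTO sample VALUES ('D1', 'S1')")
conn.execute("INSERT INTO te_map VALUES ('P1', 'S1', 'parent-child')")

rejected = []
for statement in (
    "INSERT INTO sample VALUES ('D2', NULL)",             # no upstream specimen at all
    "INSERT INTO sample VALUES ('D3', 'S404')",           # upstream specimen does not exist
    "INSERT INTO te_map VALUES ('P1', 'D1', 'sibling')",  # relation outside the allowed set
):
    try:
        conn.execute(statement)
    except sqlite3.IntegrityError as err:
        rejected.append(str(err))
```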
  • The TraM system overcomes these obstacles by providing users with a curator interface, allowing them to bridge the gaps, patch the holes, and complete the originally broken datasets as they enter and are maintained by the TraM system. All of these features uniquely enable the TraM system to give researchers the necessary tools to reach the goal of truly personalized treatment in medicine. As such, the TraM system needs to keep the personalized translational dataflow intact.
  • The translational dataflow from this research typically includes records from a vulnerable population, genetic and environmental factors, clinical research and/or trials, pathological diagnoses, and a variety of laboratory results.
  • The research objects can be animals or human individuals, biospecimens derived from animals or humans, a variety of bio-samples processed from the biospecimens, and the like.
  • The sample types can range from normal or diseased tissue samples, DNA, RNA, and proteins to primary or transformed cell lines cultured from the specimens.
  • The TraM system overcomes such limitations in data storage and analysis by ensuring data continuity per research object (i.e., a person recruited in a study) in the translational dataflow.
  • The TraM system allows data generated in disparate research domains to be accessed and shared. This allows researchers to oversee research efforts and the progress of research in a translational workflow.
  • Through a data dependency mechanism, the TraM system computationally assures personalized data continuity and keeps the dataflow updated simultaneously.
  • The TraM system recognizes that the copy of domain data in a satellite database does not equal the copy of translational data in the TraM system, owing to differences in data update frequency, levels of data standardization and normalization, data continuity and completeness status, and data coverage scope. However, the substance of the information in the area of overlap between the TraM system and the satellite databases is consistent.
  • Simple data aggregation methodology cannot efficiently provide a continuous translational data solution, because such methodology does not improve the data integrity of translational records. For example, a piece of domain data that has all the required mapping identifiers may still be unrelated to the individuals who are recruited to a translational research project. The situation can be likened to importing the entire human genome dataset into a data-sharing platform and combining it with a massive number of clinical datasets. One cannot make a simple connection between such datasets, because the genomic data are not personalized to the subjects from whom the clinical records were obtained.
  • The quantity of data is not the major obstacle in translational data analysis; instead, it is the quality of the data that is essential for making meaningful connections between or among different datasets.
  • Satellite databases are individually, independently, and often inefficiently operated data management systems. There is no synchronization mechanism that allows satellite databases to recruit data derived from the same person, so records associated with the same person can be recruited separately in different satellite databases, with no meaningful association among them. Because the TraM system has a single database system with strong data dependency control, any data entered into the TraM database, no matter which research area within a user's organization they come from, are automatically positioned within a personalized dataflow, so that a researcher can easily tell which part of the domain records is missing or needs to be improved.
  • Satellite databases are often unable to update records for a person simultaneously, so from one satellite database one cannot see the updated information stored in the other satellite databases, and vice versa.
  • In the TraM system, by contrast, any update to domain data automatically updates the entire dataflow for an object, and the update can easily be viewed and tracked by TraM users in all research areas.
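Because all domain records for an object live in one store, the object's dataflow and its gaps can be computed on demand, so a newly entered record is reflected the next time the flow is read. The sketch below assumes a hypothetical ordered list of workflow domains and a flat record layout; it illustrates spotting missing domain records, not TraM's actual data dependency mechanism.

```python
# Hypothetical ordered list of domains along one translational workflow.
EXPECTED_DOMAINS = ("demographics", "clinical", "pathology", "biospecimen", "laboratory")


def dataflow(records, object_id):
    """Return (domain, records) pairs in workflow order and the domains still missing."""
    by_domain = {}
    for rec in records:
        if rec["object_id"] == object_id:
            by_domain.setdefault(rec["domain"], []).append(rec)
    flow = [(domain, by_domain.get(domain, [])) for domain in EXPECTED_DOMAINS]
    missing = [domain for domain, recs in flow if not recs]
    return flow, missing


# Hypothetical records for one research object.
store = [
    {"object_id": "P1", "domain": "demographics", "value": "enrolled"},
    {"object_id": "P1", "domain": "clinical", "value": "diagnosis recorded"},
    {"object_id": "P1", "domain": "laboratory", "value": "genotype called"},
]
_, gaps_before = dataflow(store, "P1")
store.append({"object_id": "P1", "domain": "pathology", "value": "slide reviewed"})
_, gaps_after = dataflow(store, "P1")
```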
  • Satellite databases generally include very detailed local operation records, including experimental processes, scheduling, billing, and sample locations. The TraM system does not store all these records, as they are only important to local operations. Instead, the data coverage in TraM is highly streamlined, results-oriented, and continuity-based. Satellite databases may also contain data in various formats, named with locally invented terminologies.
  • The TraM system follows standardized guidelines and nomenclature, such as caBIG, SNOMED, ICD, and the NCI Thesaurus, and keeps metadata records updated during development.
  • The TraM system allows translational researchers to see domain data from disparate domain satellite databases and connects the data from upstream to downstream in a complete translational research cycle, resulting in continuous translational data.
  • FIG. 3 is a block diagram illustrating an exemplary operating environment for performing the disclosed methods.
  • This exemplary operating environment is only an example of an operating environment and is not intended to suggest any limitation as to the scope of use or functionality of operating environment architecture. Neither should the operating environment be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary operating environment.
  • The system and method of the present invention can be operational with numerous other general-purpose or special-purpose computing system environments or configurations.
  • Examples of well-known computing systems, environments, and/or configurations that can be suitable for use with the system and method comprise, but are not limited to, personal computers, server computers, laptop devices, and multiprocessor systems. Additional examples comprise set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that comprise any of the above systems or devices, and the like.
  • The processing of the disclosed system and method of the present invention can be performed by software components.
  • The disclosed system and method can be described in the general context of computer-executable instructions, such as program modules, being executed by one or more computers or other devices.
  • Generally, program modules comprise computer code, routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types.
  • The disclosed method can also be practiced in grid-based and distributed computing environments where tasks are performed by remote processing devices that are linked through a communication network.
  • In a distributed computing environment, program modules can be located in both local and remote computer storage media, including memory storage devices.
  • The components of the computer 301 can comprise, but are not limited to, one or more processors or processing units 303, a system memory 312, and a system bus 313 that couples various system components, including the processor 303, to the system memory 312.
  • The system bus 313 represents one or more of several possible types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures.
  • By way of example, such bus architectures can comprise an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an Enhanced ISA (EISA) bus, a Video Electronics Standards Association (VESA) local bus, an Accelerated Graphics Port (AGP) bus, and a Peripheral Component Interconnect (PCI) bus, also known as a Mezzanine bus.
  • The bus 313, and all buses specified in this description, can also be implemented over a wired or wireless network connection, and each of the subsystems, including the processor 303, a mass storage device 304, an operating system 305, TraM software 306, TraM data 307, a network adapter 308, system memory 312, an Input/Output Interface 310, a display adapter 309, a display device 311, and a human machine interface 302, can be contained within one or more remote computing devices 314a,b,c at physically separate locations, connected through buses of this form, in effect implementing a fully distributed system.
  • The computer 301 typically comprises a variety of computer readable media. Exemplary readable media can be any available media that is accessible by the computer 301 and comprises, for example and not meant to be limiting, both volatile and non-volatile media, removable and non-removable media.
  • The system memory 312 comprises computer readable media in the form of volatile memory, such as random access memory (RAM), and/or non-volatile memory, such as read only memory (ROM).
  • The system memory 312 typically contains data such as TraM data 307 and/or program modules such as operating system 305 and TraM software 306 that are immediately accessible to and/or are presently operated on by the processing unit 303.
  • The computer 301 can also comprise other removable/non-removable, volatile/non-volatile computer storage media.
  • FIG. 3 illustrates a mass storage device 304, which can provide non-volatile storage of computer code, computer readable instructions, data structures, program modules, and other data for the computer 301.
  • For example, a mass storage device 304 can be a hard disk, a removable magnetic disk, a removable optical disk, magnetic cassettes or other magnetic storage devices, flash memory cards, CD-ROM, digital versatile disks (DVD) or other optical storage, random access memories (RAM), read only memories (ROM), electrically erasable programmable read-only memory (EEPROM), and the like.
  • Any number of program modules can be stored on the mass storage device 304, including, by way of example, an operating system 305 and TraM software 306.
  • Each of the operating system 305 and TraM software 306 (or some combination thereof) can comprise elements of the programming and the TraM software 306.
  • TraM data 307 can also be stored on the mass storage device 304.
  • TraM data 307 can be stored in any of one or more databases known in the art. Examples of such databases comprise DB2®, Microsoft® Access, Microsoft® SQL Server, Oracle®, MySQL, PostgreSQL, and the like. The databases can be centralized or distributed across multiple systems.
  • TraM data 307 can be stored in a centralized data structure, such as a standardized relational satellite database or a centralized TraM database.
  • A TraM database can be used for assembling and storing continuous translational research data.
  • The TraM database allows a user to focus on translational element mapping and streamlined data collection.
  • The TraM database system is designed to provide flexibility across a plurality of translational research branches.
  • For example, the demographic field section of the TraM system is highly generic, easy to maintain, and under user control.
  • The TraM system takes advantage of the discovery that a common logic exists among various branches of translational research; thus, the resulting system is able to cover a broad range of branch-specific research, treatment, and scientific applications.
  • The TraM system is highly abstract and representative.
  • A domain ontology (DO) can be used as an integrated ERM structure for organizing constantly evolving domain knowledge and concepts.
  • However, a DO structure alone does not accommodate many-to-many relationships and therefore is not suitable for connecting research objects (person, specimen, and sample) to research data.
  • In addition, the hierarchy of a DO cannot connect to other DOs without a higher-order ontology.
  • The TraM system can focus on domain data that are generated during a translational workflow. These data may belong to the concepts classified in a DO. Therefore, the ER model can establish a relationship between a leaf class of a DO and a research object to integrate domain data. In one aspect, if knowledge concepts in a particular research domain are poorly standardized, they can be organized in a DO structure.
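The arrangement described above, in which an ER relationship links research objects to the leaf classes of a DO, can be sketched as an association set between leaf concepts and object identifiers, which gives the many-to-many link a bare DO hierarchy lacks. The ontology fragment, the object identifiers, and the leaf-only rule are illustrative assumptions.

```python
# Hypothetical fragment of a domain ontology: concept -> list of child concepts.
ONTOLOGY = {
    "tumor morphology": ["ductal carcinoma", "lobular carcinoma"],
    "ductal carcinoma": [],
    "lobular carcinoma": [],
}

# Many-to-many association between ontology leaf classes and research objects.
links = set()


def is_leaf(concept):
    return concept in ONTOLOGY and not ONTOLOGY[concept]


def link(concept, object_id):
    """Attach a research object to a leaf class; inner nodes are rejected."""
    if not is_leaf(concept):
        raise ValueError(f"{concept!r} is not a leaf class of the ontology")
    links.add((concept, object_id))


def objects_for(concept):
    return sorted(obj for c, obj in links if c == concept)


def concepts_for(object_id):
    return sorted(c for c, obj in links if obj == object_id)


link("ductal carcinoma", "specimen-7")
link("ductal carcinoma", "specimen-9")
link("lobular carcinoma", "specimen-9")
```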
  • The TraM system makes use of consistent descriptors for the same data concepts.
  • For example, concept names are associated with each question; each name has a specific meaning and is used consistently.
  • A concept branch name defines a broader area that a question belongs to, such as social behavior or the reproductive system.
  • The sub-concepts under it are not the questions themselves but the category names of groups of questions, such as drinking, smoking, diet, and so on.
  • The concept name of an item is the end node, or leaf, of the questionnaire tree, so there is no sub-concept under it, only the properties that describe it, such as unit of measure (UOM), data type, description, and answer options.
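The three-level questionnaire tree (concept branch, category, item) can be modeled with dataclasses, with the descriptive properties attached only at the leaf. Class and field names are hypothetical and chosen to mirror the terms in the text.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Item:
    """Leaf of the tree: one question, described only by its properties."""
    name: str
    data_type: str
    description: str = ""
    uom: Optional[str] = None  # unit of measure, if any
    answer_options: List[str] = field(default_factory=list)


@dataclass
class Category:
    """Sub-concept naming a group of related questions, e.g. "smoking"."""
    name: str
    items: List[Item] = field(default_factory=list)


@dataclass
class Branch:
    """Concept branch naming a broad area, e.g. "social behavior"."""
    name: str
    categories: List[Category] = field(default_factory=list)


def leaf_paths(branch):
    """List every item as branch/category/item."""
    return [f"{branch.name}/{c.name}/{i.name}" for c in branch.categories for i in c.items]


smoking = Category("smoking", [
    Item("packs per day", "decimal", uom="packs/day"),
    Item("current smoker", "boolean", answer_options=["yes", "no"]),
])
social = Branch("social behavior", [smoking])
```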
  • The TraM system utilizes unique identifiers from source data. Sometimes a piece of data may have more than one identifier (ID) describing the same object, such as a clinical trial database ID and a hospital ID associated with the same person.
  • The combination of the original data ID and the site data ID can be used to decide the uniqueness of an object's record. It is therefore important to define which ID should be considered the original data ID.
  • Original data IDs in the TraM system can be defined as the IDs that were associated with the object's original records.
  • For example, the original data ID of a person will be the MRN instead of the CTN, even if the data could be transferred directly from a clinical trial database (satellite database).
  • The CTN can be the secondary identifier.
  • The secondary identifier field can hold a list of identifiers, all of equivalent rank at the semantic level, as they all refer to the same object. Tracking the secondary identifiers allows the TraM system to match an object that received a secondary identifier before its original data ID to that original ID once it is issued. Therefore, there is no ambiguity when data are imported from satellite databases.
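These identifier rules can be sketched as a small registry: an object is unique by the pair (site, original data ID), secondary identifiers are stored as equal-rank aliases, and a record first seen only under a secondary identifier is matched to its original data ID once that ID appears. The class name, the generated `obj-N` keys, and the treatment of secondary IDs as globally unique are illustrative assumptions; conflicting mappings are not reconciled in this sketch.

```python
class IdentifierRegistry:
    """Resolve incoming identifiers to one internal key per research object."""

    def __init__(self):
        self.by_original = {}   # (site_id, original_id) -> object key
        self.by_secondary = {}  # secondary_id -> object key (assumed globally unique)
        self.count = 0

    def register(self, site_id, original_id=None, secondary_ids=()):
        key = None
        if original_id is not None:
            key = self.by_original.get((site_id, original_id))
        if key is None:
            # A record seen earlier only under a secondary ID is matched here.
            key = next((self.by_secondary[s] for s in secondary_ids if s in self.by_secondary), None)
        if key is None:
            self.count += 1
            key = f"obj-{self.count}"
        if original_id is not None:
            self.by_original.setdefault((site_id, original_id), key)
        for s in secondary_ids:
            self.by_secondary.setdefault(s, key)  # existing mappings are never overwritten
        return key


reg = IdentifierRegistry()
trial_only = reg.register("hospital", secondary_ids=["CTN-17"])  # no MRN issued yet
with_mrn = reg.register("hospital", original_id="MRN-5", secondary_ids=["CTN-17"])
mrn_only = reg.register("hospital", original_id="MRN-5")
other_site = reg.register("clinic-2", original_id="MRN-5")  # same ID, different site
```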
  • The TraM system implements streamlined data coverage in order not to duplicate unrelated records in the TraM database.
  • Streamlined data coverage ignores data that may be necessary in a satellite database for domain operations but are irrelevant to translational research.
  • For example, required data include sample identifiers and basic property descriptions associated with the translational research subjects.
  • Sample location records at the departmental or institutional level can also be required, as translational research is typically conducted in an interdepartmental or cross-institutional fashion. In terms of data acquisition into the TraM database, this type of data can either be entered into a particular box and a particular cell or be left NULL, since the data likely have no direct impact on the translational dataflow.
  • The TraM system also supports project-specific extensions, such as subsequently diversifying questionnaires in a longitudinal demographic survey, without breaking the generic architecture of the relational schema.
  • The TraM database can store both high-resolution data and high-throughput data. However, it is more efficient to store data tracking identifiers for high-throughput data that are stored in satellite databases, such as array or sequence databases. For example, the TraM system does not preserve the sequencing data for SNP screening, but it can save the mutation records to some extent once they are identified and characterized. Similarly, TraM does not preserve expression array data points; those records can go to specialized databases, such as caArray and ITTACA. Instead, the TraM system can more efficiently register the experiment number that connects the patient sample to the gene profiling results.
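Storing a tracking identifier instead of raw high-throughput data amounts to keeping a link from a sample to an experiment held in a specialized database and following that link only when the data are needed. In the sketch below, `TrackingIndex`, the database labels, and the caller-supplied `fetch` function are hypothetical; a real deployment would call whatever retrieval interface the source database offers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackingLink:
    """Pointer from a TraM sample to a result held in a specialized source database."""
    sample_id: str
    source_db: str      # e.g. an expression-array or sequence database
    experiment_id: str  # identifier assigned by that source database


class TrackingIndex:
    def __init__(self):
        self._links = []

    def register(self, sample_id, source_db, experiment_id):
        link = TrackingLink(sample_id, source_db, experiment_id)
        if link not in self._links:  # avoid duplicate registrations
            self._links.append(link)
        return link

    def resolve(self, sample_id, fetch):
        """Follow a sample's links; `fetch(source_db, experiment_id)` is caller-supplied."""
        return {link.experiment_id: fetch(link.source_db, link.experiment_id)
                for link in self._links if link.sample_id == sample_id}


def fake_fetch(source_db, experiment_id):
    # Stand-in for a call to the source database.
    return f"{source_db}:{experiment_id}"


index = TrackingIndex()
index.register("D1", "array-db", "EXP-0042")
index.register("D1", "array-db", "EXP-0042")  # duplicate is ignored
index.register("D2", "sequence-db", "RUN-7")
profiles = index.resolve("D1", fake_fetch)
```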
  • The TraM system can, but does not need to, include databases that have special built-in analytical functions, such as image storage databases with 3D visualization functions, with which researchers can manipulate images in order to study them, and pedigree databases, which are used to store genetic annotation records and have graphic displays of pedigree trees. Having a full copy of such data saved locally in the TraM database does not improve users' ability to utilize them. Instead, storing in the TraM system the experimental log ID that maps to the research object can efficiently help users locate and utilize the data at the source database systems. Because the TraM system holds the data tracking identifiers, researchers can easily follow these leads to retrieve all the information accurately and efficiently.
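As a minimal sketch of the "keep the tracking identifier, not the data" approach described above, the following Python registry stores only a pointer (satellite database name plus experiment number) for each sample. The class names, database labels, and IDs are hypothetical illustrations, not part of the TraM implementation.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackingRef:
    """Hypothetical pointer to a record that stays in a satellite database."""
    satellite_db: str   # e.g. an array or sequence database
    experiment_id: str  # experiment number / log ID registered in TraM


class TrackingRegistry:
    """Stores data tracking identifiers only; the bulky data stay at the source."""

    def __init__(self) -> None:
        self._refs: dict[str, list[TrackingRef]] = defaultdict(list)

    def register(self, sample_id: str, satellite_db: str, experiment_id: str) -> TrackingRef:
        ref = TrackingRef(satellite_db, experiment_id)
        if ref not in self._refs[sample_id]:  # ignore duplicate registrations
            self._refs[sample_id].append(ref)
        return ref

    def leads_for(self, sample_id: str, satellite_db: str | None = None) -> list[TrackingRef]:
        """Return the leads a researcher can follow back to the source systems."""
        refs = self._refs.get(sample_id, [])
        return [r for r in refs if satellite_db is None or r.satellite_db == satellite_db]


registry = TrackingRegistry()
registry.register("S-0001", "array_db", "EXP-42")
registry.register("S-0001", "sequence_db", "RUN-7")
print(registry.leads_for("S-0001", "array_db"))
# [TrackingRef(satellite_db='array_db', experiment_id='EXP-42')]
```

The design keeps the registry free of any satellite payload, so the source database remains the single place where the full record lives.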
  • the TraM system allows for querying of TraM data 307 globally.
  • the TraM system can utilize SQL scripts applied against the static TraM schema to assure consistent and controllable query performance, for example.
  • the query strategies can be instantly assembled and dynamically executed, and controlled by users.
  • the TraM system also allows for cross-domain data query, enabling translational researchers to query data across research domains.
  • the TraM database schema provides a foundation for global queries, as all the tables are related to each other.
  • a sophisticated query application can provide all the options to globally query TraM data 307, with constitutively displayed and dynamically displayed fields.
  • the TraM system allows for a user driven dynamic query strategy. It allows users to both determine whether they need to add another layer of specific filters, and choose desired conditions to specify the query targets. All of these options can be determined instantly so that users can adjust their query strategies dynamically to reach the most satisfactory results.
  • the TraM system allows read and write processes to happen in parallel.
  • the TraM system allows users to query any predefined fields in a field option list.
  • the field option list can be determined by a TraM database designer after discussions with translational researchers.
  • Each predefined field can support certain data type options, such as keywords, a range of values, or Boolean options.
  • the TraM system allows users to decide which operator to use, including AND, OR, and NOT. Users can iterate the query selection process until they are satisfied with the search results.
  • the TraM system allows users to control query returning fields and data output formats.
  • the query returns are standardized: both fixed and dynamic (user-selected) fields are shown when the first result is displayed.
  • a person ID, ethnicity, age, gender, and primary diagnosis can be the constitutive fields displayed each time results are returned in an initial query.
  • Dynamic fields can be search variables captured through the user interface when a user assembles a query.
  • the program allows the user to see the details of each field. For example, from a public person ID, the user can retrieve all the defined information for that person, such as demographic information and family history (Progeny) records.
  • the TraM system allows a user to edit which field should be displayed until the user has obtained the desired results. Once query results are satisfactory, the TraM system allows users to select data output formats for different purposes, e.g. XML for data exchange, Excel/table for data manipulation, and SAS for statistical analysis.
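The user-driven query strategy above can be sketched in Python as a function that assembles a parameterized SQL statement from (operator, field, value) filters, always returning the constitutive fields and appending each searched field as a dynamic field. The `person` table, its column names, and the field option list are assumptions for illustration; the actual TraM schema is not given here.

```python
import sqlite3

# Constitutive fields shown for every initial query, per the description.
CONSTITUTIVE_FIELDS = ["person_id", "ethnicity", "age", "gender", "primary_diagnosis"]

# Hypothetical field option list: queryable field -> supported filter type.
FIELD_OPTIONS = {
    "ethnicity": "keyword",
    "gender": "keyword",
    "primary_diagnosis": "keyword",
    "age": "range",
    "brca1_carrier": "boolean",
}


def build_query(filters):
    """Build (sql, params) from (operator, field, value) triples.

    operator is "AND", "OR" or "NOT" (read as AND NOT). Field names are only
    interpolated after being checked against FIELD_OPTIONS; values are bound
    as parameters. Mixed AND/OR follow SQL's usual precedence.
    """
    clauses, params, dynamic = [], [], []
    for i, (op, name, value) in enumerate(filters):
        kind = FIELD_OPTIONS.get(name)
        if kind is None:
            raise ValueError(f"{name!r} is not in the field option list")
        if op not in ("AND", "OR", "NOT"):
            raise ValueError(f"unsupported operator {op!r}")
        if kind == "range":
            cond = f"{name} BETWEEN ? AND ?"
            params.extend(value)  # value is a (low, high) pair
        else:
            cond = f"{name} = ?"
            params.append(value)
        cond = f"NOT ({cond})" if op == "NOT" else f"({cond})"
        joiner = "AND" if op == "NOT" else op
        clauses.append(cond if i == 0 else f"{joiner} {cond}")
        if name not in CONSTITUTIVE_FIELDS and name not in dynamic:
            dynamic.append(name)  # searched variables become dynamic fields
    sql = f"SELECT {', '.join(CONSTITUTIVE_FIELDS + dynamic)} FROM person"
    if clauses:
        sql += " WHERE " + " ".join(clauses)
    return sql, params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (person_id TEXT, ethnicity TEXT, age INTEGER,"
             " gender TEXT, primary_diagnosis TEXT, brca1_carrier INTEGER)")
conn.executemany("INSERT INTO person VALUES (?, ?, ?, ?, ?, ?)", [
    ("P1", "A", 45, "F", "breast cancer", 1),
    ("P2", "B", 60, "F", "breast cancer", 0),
    ("P3", "A", 30, "M", "head and neck cancer", 0),
])
sql, params = build_query([
    ("AND", "primary_diagnosis", "breast cancer"),
    ("AND", "age", (40, 65)),
    ("NOT", "brca1_carrier", True),
])
rows = conn.execute(sql, params).fetchall()
print(rows)  # [('P2', 'B', 60, 'F', 'breast cancer', 0)]
```

Whitelisting field names against the option list is what makes it safe to interpolate them into the SQL text, while user-supplied values always travel as bound parameters.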
  • the computer 301 can operate in a networked environment using logical connections to one or more remote computing devices 314a,b,c.
  • the one or more remote computing devices can comprise one or more domain satellite databases.
  • a remote computing device can be a personal computer, portable computer, a server, a router, a network computer, a peer device or other common network node, and so on.
  • Logical connections between the computer 301 and a remote computing device 314a,b,c can be made via a local area network (LAN) and a general wide area network (WAN).
  • a network adapter 308 can be implemented in both wired and wireless environments. Such networking environments are conventional and commonplace in offices, enterprise-wide computer networks, intranets, and the Internet 315.
  • the TraM system can utilize various methods for data aggregation.
  • One method for data aggregation occurs through a front-end data curation interface that provides a user with read and write capability to enter and manage data directly into the TraM system.
  • Another method, for automated data aggregation, occurs through a data pipeline established between the computer 301 and one or more of the remote computing devices 314a,b,c over a network such as the Internet 315 through network adapter 308.
  • a two-way data pipeline can be established between the TraM systems allowing retrieval and updating of data between the TraM systems.
  • a data dependency control is used to achieve translational data continuity when data are received from the user or the disparate satellite databases.
  • Data is often fragmented or discontinuous, causing a barrier to translational analysis.
  • Data in the workflow are often unrelated to each other, as is the case when data do not come from the same object.
  • a simple distributed search engine that aggregates the data from all individual satellite databases does not overcome this barrier.
  • the TraM system uses a more realistic and practical computational regulatory tool to enforce object-related dataflow continuity.
  • the TraM system maps translational elements across the heterogeneous domain data to assure data continuity for a given research object, such as a consented patient. Additionally, researchers commonly lose track of data for individuals in a particular research domain.
  • the TraM system implements a data dependency mechanism in order to keep data continuity of the objects in the study, and this mechanism is particularly enforced in the transition of major research objects.
  • FIG. 5 shows an example of computational data dependency flow control.
  • This exemplary curation flow chart describes the data dependency logic control applied when a curator needs to enter records in an Allele Map table 501 in a TraM system.
  • the solid lines show a required action before the next step takes place.
  • the arrows show the direction of logic control.
  • the dashed lines show events without strong data dependency, which means that the event may or may not happen - the data may or may not be required.
  • Diamonds show a decision that is made in order to force the curator to complete the required information. Note that a public user takes a completely different route (to query home 502) and sees a different interface. For example, if a curator needs to build an allele map for a person, the curator must fill in all related data entities that are connected with solid lines in FIG. 5.
  • the TraM system requires the curator to always fill in the necessary upstream data before moving on to the downstream records, so that the available data values in the parent table can easily be arranged as the option list when a child entity needs to make an association.
  • TraM allows the curator to fill in "unknown" as an option without breaking data dependency. Therefore, even if TraM cannot completely restore data continuity for retrospective data, it at least assures prospective data continuity and integrity.
  • the TraM curator interface allows patching of missing data when the missing data are available; "unknown" fields can be updated with known data so that data quality and continuity are significantly improved.
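The dependency rules above (upstream records first, parent records as the child's option list, "unknown" as an allowed value, later patching) can be sketched as a small in-memory store. The entity chain, class names, and field names here are hypothetical and only loosely modeled on a FIG. 5-style flow; a real deployment would enforce the same rules with foreign keys in the relational schema.

```python
UNKNOWN = "unknown"

# Hypothetical parent -> child chain along the solid lines of a FIG. 5-style flow.
PARENT_OF = {"person": "project", "sample": "person", "allele_map": "sample"}


class DependencyError(Exception):
    """Raised when a downstream record is entered before its upstream parent."""


class CurationStore:
    def __init__(self):
        self.records = {entity: {} for entity in ["project", *PARENT_OF]}

    def options_for(self, child_entity):
        """Existing parent records, offered as the option list for a child entity."""
        return sorted(self.records[PARENT_OF[child_entity]])

    def add(self, entity, key, parent_key=None, **fields):
        parent = PARENT_OF.get(entity)
        if parent is not None and parent_key not in self.records[parent]:
            raise DependencyError(f"enter {parent} {parent_key!r} before this {entity}")
        # Missing values are recorded as "unknown" instead of blocking entry.
        values = {name: UNKNOWN if v is None else v for name, v in fields.items()}
        self.records[entity][key] = {"parent": parent_key, **values}

    def patch(self, entity, key, **fields):
        """Fill in fields that were entered as "unknown" once real data exist."""
        record = self.records[entity][key]
        for name, value in fields.items():
            if record.get(name) == UNKNOWN:
                record[name] = value


store = CurationStore()
store.add("project", "breast_cancer")
store.add("person", "P1", parent_key="breast_cancer", birth_year=None)
store.add("sample", "S1", parent_key="P1")
print(store.options_for("allele_map"))  # ['S1']
store.patch("person", "P1", birth_year=1960)
```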
  • the front-end data curation interface utilized by the TraM system allows for TraM system data entry and management.
  • Translational data curators and other users, such as researchers, can view the data integrity and continuity over the entire translational workflow.
  • a statistical overview can be provided in the curator interface for each project with a built-in trigger, so that curators and translational researchers can see a global picture of their data collection and research progress in different areas from time to time.
  • the TraM system provides flexibility and regulatory functions for data entry. For example, demographic surveys on research subjects are indispensable elements in translational research. As each survey has a mixture of questionnaires defined by individual principal investigators, this may generate a plurality of descriptors for the same concepts.
  • a tree structure, an example of which is shown in FIG. 4, can be used to organize the questionnaires and support almost unlimited flexibility in the question options for various branches.
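A questionnaire tree of this kind can be sketched with one node type whose follow-up questions hang off specific answer options. The `Question` class and the sample questions are hypothetical; the sketch only illustrates how branching keeps options flexible without changing a fixed schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Question:
    """Hypothetical node of a questionnaire tree."""
    text: str
    options: list[str] = field(default_factory=list)
    # Follow-up questions keyed by the answer option that opens each branch.
    branches: dict[str, list[Question]] = field(default_factory=dict)

    def add_branch(self, option: str, child: Question) -> Question:
        if option not in self.options:
            raise ValueError(f"{option!r} is not an option of {self.text!r}")
        self.branches.setdefault(option, []).append(child)
        return child


def questions_seen(node: Question, answers: dict[str, str]):
    """Yield the questions a respondent reaches, given answers keyed by question text."""
    yield node.text
    for child in node.branches.get(answers.get(node.text), []):
        yield from questions_seen(child, answers)


smoking = Question("Do you smoke?", ["yes", "no"])
smoking.add_branch("yes", Question("Packs per day?", ["<1", "1+"]))
print(list(questions_seen(smoking, {"Do you smoke?": "yes"})))
# ['Do you smoke?', 'Packs per day?']
```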
  • dictionary data tables can be pre-deployed by an experienced curator or bioinformatician, to control annotation ambiguity and redundancy.
  • Dictionary data describe concepts or official nomenclature used across a domain, such as disease concept names defined by SNOMED or ICD. The TraM system can directly adopt these concepts from public domains.
  • This type of data can be relatively static and without upper level data dependency; a relationship table can be used to establish the connection between research objects and biomedical concepts. This relationship describes when and how an action is taken, for example, a diagnostic date or an experiment date and the diagnostic methods or experimental methods.
  • the front-end data curation interface described above can be used to conduct translational data management.
  • a complementary solution is provided herein to interact with satellite databases for consistent and high-throughput data aggregation.
  • data aggregation mechanisms available in the art, such as web services, have limitations.
  • limitations include: translational data stored in satellite databases are currently in a mixture of formats; semantic integration of data concepts has not been applied; satellite database systems do not always reside in a web server environment; and satellite database administrators may resist the idea of setting up a web service interface for data export. Therefore, web services are but one solution for transferring data from a satellite database to the TraM system, and accordingly, other means for transferring the data are specifically contemplated.
  • the TraM web service allows for data aggregation from the translational community, and relies on content compliant with standards (i.e., caBIG, SNOMED, etc.).
  • An example of the overall data aggregation mechanisms is diagrammed in FIG. 6.
  • satellite databases could have been built with an assortment of technologies. Also, satellite database architectures could have been constructed based on a mixture of design philosophies, or developed by people with various qualifications. Finally, the systems can be under various administrative policies. Sometimes, there is no straightforward way to retrieve the data from them, or even to access the data remotely. These satellite databases can be treated on a case-by-case basis. If the satellite database does not reside on a web server, the TraM system can access the database through a programming API to retrieve the data. This can be used to aggregate large amounts of data from, for example, HIPAA-compliant satellite databases which are not supported by web servers and are under restrictive management policies.
  • the TraM system can interact with HIPAA-compliant databases and public databases.
  • Data supply pipelines can connect to HIPAA-compliant satellite databases, since TraM users often have no other efficient way to access these satellite databases, nor are they able to make a connection between the satellite databases. There could be a lag between the updating of satellite databases and that of the TraM system.
  • examples of public databases include dbSNP and Entrez Gene.
  • Using the Internet through direct links to public data can be sufficient for researchers to view the most updated public information.
  • the TraM database can store public data identifiers rather than store the public data.
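Storing only public identifiers works if each identifier can be turned back into a direct link to the live record. The sketch below assumes URL templates and identifier formats modeled on NCBI's current dbSNP and Gene record pages; those templates are assumptions to verify, not something the source specifies.

```python
import re

# Assumed URL templates and ID formats, modeled on NCBI's public record pages;
# verify them before use.
PUBLIC_SOURCES = {
    "dbSNP": ("https://www.ncbi.nlm.nih.gov/snp/{id}", r"rs\d+"),
    "Entrez Gene": ("https://www.ncbi.nlm.nih.gov/gene/{id}", r"\d+"),
}


def public_link(source: str, identifier: str) -> str:
    """Build a direct link from a stored public identifier to the live record."""
    if source not in PUBLIC_SOURCES:
        raise KeyError(f"no link template for {source!r}")
    template, pattern = PUBLIC_SOURCES[source]
    if not re.fullmatch(pattern, identifier):
        raise ValueError(f"{identifier!r} does not look like a {source} identifier")
    return template.format(id=identifier)


print(public_link("dbSNP", "rs334"))  # https://www.ncbi.nlm.nih.gov/snp/rs334
```

Validating the identifier format before building the link catches curation typos early, while the public data themselves are always read fresh from the source.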
  • a method for data standardization comprises identifying a mapping relation between source data and an existing TraM data structure, redefining and reconfiguring the data value and concept domain, using consistent terminology to describe the data concept and keeping the metadata record for the definition, and reformatting the value domain expression, converting data from various sources into a consistent format.
  • FIG. 7A shows a set of allele data in a Microsoft Excel spreadsheet before standardization.
  • the value domain of the allele was misused as the column name, which should be a concept domain.
  • the association of these alleles with a patient was treated as Boolean (shown as a check mark), so the allele records arranged this way have no flexibility or scalability. In addition, this piece of data has not been integrated into the translational dataflow, so there is no easy way to make a connection between the allele map and patient diagnosis or treatments.
  • FIG. 7B shows data that has been reconfigured and standardized and deployed in a TraM database. Therefore information can easily be pulled from any field of interest starting from the listed fields as long as they are attributes within the TraM database.
  • FIG. 7A and 7B provide a confirmation that the schema is able to manage the data from a full life cycle of translational research.
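The reconfiguration from FIG. 7A to FIG. 7B can be sketched as a wide-to-long reshaping: each checked allele column becomes a value in an "allele" concept column, and an optional synonym map applies consistent terminology. The column names, check-mark symbols, and allele labels are hypothetical placeholders, not the actual data in the figures.

```python
def standardize_allele_sheet(rows, id_column="patient_id", marks=("✓", "x", "X"), synonyms=None):
    """Turn a wide sheet whose column names are allele values into long records.

    Each checked cell becomes one {"patient_id", "allele"} record, so the allele
    becomes a value in an "allele" concept column rather than a column name.
    synonyms optionally maps raw column labels to one consistent allele name.
    """
    synonyms = synonyms or {}
    records = []
    for row in rows:
        for column, cell in row.items():
            if column == id_column or str(cell).strip() not in marks:
                continue
            records.append({"patient_id": row[id_column],
                            "allele": synonyms.get(column, column)})
    return records


sheet = [
    {"patient_id": "P1", "allele_1": "✓", "allele 2": ""},
    {"patient_id": "P2", "allele_1": "", "allele 2": "✓"},
]
print(standardize_allele_sheet(sheet, synonyms={"allele 2": "allele_2"}))
# [{'patient_id': 'P1', 'allele': 'allele_1'}, {'patient_id': 'P2', 'allele': 'allele_2'}]
```

Once the allele is a value rather than a column name, adding a new allele needs no schema change, and the records can be joined to diagnosis or treatment tables through the patient identifier.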
  • the TraM system is an efficient tool that allows researchers to focus on their scientific problems, shielded from computation requirements, and to some degree, independent from informatics support, by providing an end user administration interface to a TraM administrator, or any type of special end user.
  • the TraM system accomplishes this by providing user-controllable utilities so that they can make system and data management an efficient and satisfactory experience.
  • the TraM system allows for user authentication.
  • user credentials can be stored in an LDAP (Lightweight Directory Access Protocol) server, which is commonly available in a Unix/Linux server environment. If a server or server system supports more than one database at the same time, the authentication procedure can be shared among them. The validity of a credential record can be known only from the return value of a comparison, not from the credential itself, which prevents anyone other than the users themselves from uncovering these credentials.
  • the TraM system allows for role authorization.
  • Authorization sets permission for users to access curator pages and/or query pages. This can be implemented, for example, as a static HashMap object with pairs made of a string name and an integer number to represent different levels of role authorization.
  • the TraM system can provide a "broker function" that regulates access so that different users reach different levels of information, or the same level of information but different kinds of records.
  • the administrator, curator, power user, and public user can be represented by levels 0, 1, 2, and 3, respectively.
  • TraM users are grouped and authorized to work with or view the data within their own translational branches. Under each translational branch, TraM can allow, for example, four types of users: 1) TraM administrator, who can assign the other users their proper roles, with different privileges, in TraM applications, so that TraM administration does not depend on a database administrator; 2) TraM curator, who can read and write TraM data and can see both "private" (before de-identification) and "public" (after de-identification) data.
  • the curator can be responsible for inputting patient original data identifiers to keep tracking records of the data's origins.
  • Private identifiers, such as medical record numbers, can be HIPAA-protected information.
  • 3) TraM public/regular user, who, after logging in, sees read-only data to conduct data queries.
  • Public users see only the de-identified (de-ID) data results.
  • 4) Power user, who is usually a principal investigator or authorized researcher. This type of user, like the public user, has read-only permission, but can view private identifiers through a special ID mapping table, which can facilitate tracking of original records if needed.
  • a curator is also a power user by default. All these authorization procedures and role privilege controls have been implemented and tested, and will be continuously tested when the new application module is delivered.
  • the user's signature can be compared to the page-allowed profile, such as project group, user group, and application group, to decide whether the privilege is granted.
  • the user profile can be kept in an account table in the TraM database.
  • the account table contains all the information about a user profile, including user role, application type, which project the user belongs to, and the user's basic login information.
  • the user profile (in account) functions as a user's signature. It provides sufficient information to make authorization decisions.
  • the signature can be kept in a session object and a user's activities can be recorded, especially in the curation process.
  • the project table is the root table. It connects user account and person. If the project is matched and the role level is higher or equal to the required role level, then the user is permitted to access the page. Otherwise the user can be denied access and, for example, presented with a message requesting a higher level of the authorization.
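The authorization check described above can be sketched as follows. The text says access is granted when the project matches and the role level is "higher or equal to" the required level; because the administrator is level 0, this sketch reads "higher" as "more privileged", that is, a numerically lower or equal level. That interpretation, the class names, and the messages are assumptions. The source mentions a static HashMap of role names to integers; a Python dict plays that role here.

```python
from __future__ import annotations

from dataclasses import dataclass

# Role levels from the description; a lower number means more privilege.
ROLE_LEVELS = {"administrator": 0, "curator": 1, "power_user": 2, "public_user": 3}


@dataclass(frozen=True)
class Account:
    """User signature kept in the session (hypothetical subset of the account table)."""
    username: str
    role: str
    project: str


@dataclass(frozen=True)
class PageProfile:
    """What a page requires before it can be shown (hypothetical)."""
    project: str
    required_role: str


def check_access(account: Account, page: PageProfile) -> tuple[bool, str]:
    """Grant access only if the project matches and the role is privileged enough."""
    if account.project != page.project:
        return False, "This record belongs to another project."
    if ROLE_LEVELS[account.role] > ROLE_LEVELS[page.required_role]:
        return False, "A higher level of authorization is required."
    return True, "ok"


curator = Account("alice", "curator", "Breast Cancer")
print(check_access(curator, PageProfile("Breast Cancer", "power_user")))  # (True, 'ok')
print(check_access(curator, PageProfile("Head and Neck", "curator")))
# (False, 'This record belongs to another project.')
```

Under this reading a curator automatically passes any power-user check, which matches the statement that a curator is also a power user by default.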
  • FIG. 8A illustrates an exemplary Curator Person Module Page.
  • a project name is listed in the upper right corner as a variable. In this case, the curator belongs to the Breast Cancer Project group; the horizontal tags show the major curation modules, and each one has a side-link list for sub-modules. Here, the person module side links are shown.
  • This interface illustrates that record 123456 exists in breast cancer group.
  • FIG. 8B illustrates TraM data privacy control.
  • the curator belongs to a Head and Neck Project. Note that the project variable has changed in the upper right corner.
  • the TraM system blocked this attempt.
  • although the curators can use identical interfaces, the viewable data are different.
  • the user can enter commands and information into the computer 301 via an input device (not shown).
  • input devices comprise, but are not limited to, a keyboard, pointing device (e.g., a "mouse"), a microphone, a joystick, a scanner, tactile input devices such as gloves and other body coverings, and the like.
  • these and other input devices can be connected to the computer 301 via a human machine interface 302 that is coupled to the system bus 313, but can be connected by other interface and bus structures, such as a parallel port, game port, an IEEE 1394 port (also known as a FireWire port), a serial port, or a universal serial bus (USB).
  • a display device 311 can also be connected to the system bus 313 via an interface, such as a display adapter 309. It is contemplated that the computer 301 can have more than one display adapter 309 and the computer 301 can have more than one display device 311.
  • a display device can be a monitor, an LCD (Liquid Crystal Display), or a projector.
  • other output peripheral devices can comprise components such as speakers (not shown) and a printer (not shown) which can be connected to the computer 301 via Input/Output Interface 310.
  • Computer readable media can be any available media that can be accessed by a computer.
  • Computer readable media can comprise “computer storage media” and “communications media.”
  • Computer storage media comprise volatile and non-volatile, removable and nonremovable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules, or other data.
  • Exemplary computer storage media comprise, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer.
  • machine learning and iterative learning examples include, but are not limited to, expert systems, case-based reasoning, Bayesian networks, behavior-based AI, neural networks, fuzzy systems, evolutionary computation (e.g., genetic algorithms), swarm intelligence (e.g., ant algorithms), and hybrid intelligent systems (e.g., expert inference rules generated through a neural network or production rules from statistical learning).
  • Provided, and illustrated in FIG. 9, are methods for assembling a continuous translational data system, comprising retrieving, from a plurality of databases, data reflecting a plurality of research domains wherein the data have translational elements at 901, standardizing the data at 902, mapping the translational elements between the data at 903, and aggregating the standardized data into a centralized data structure wherein the mapped translational elements allow for continuous translational dataflow at 904.
  • Data reflecting a plurality of research domains can comprise patient demographic data, patient lifestyle data, patient progenitor and progeny data, clinical data, pathology data, tissue bank samples, and laboratory results.
  • Clinical data can include any data generated in a clinic, for example, clinical patient data, clinical diagnosis data, clinical trial data, clinical treatment data, and the like.
  • Translational elements can comprise the unique identifiers regularly attached to datasets generated in the different domains and stages of a translational workflow.
  • the step of retrieving can comprise retrieving data from a satellite database for incorporation into the TraM system, retrieving data from another TraM database, and the like.
  • Standardizing the data can comprise identifying a mapping relation between source data and an existing TraM data structure, redefining and reconfiguring the data value and concept domain, using consistent terminology to describe the data concept and keeping the metadata record for the definition, and reformatting the value domain expression, converting data from various sources into a consistent format.
  • Standardized data can comprise data that utilizes unique identifiers to determine which identifiers from an original data source provide datasets representing only distinct data entities.
  • Standardized data can further comprise data based on the common data elements (CDE) defined in the TraM system.
  • CDEs accomplish this by providing a core set of requisite data elements that are necessary for the successful deposition of complete datasets into the TraM system.
  • Mapping the translational elements can comprise determining each identifier from an original data source representing a distinct data entity. The data can then be further standardized and normalized based on the common data elements (CDE) defined in TraM. The relationship of the distinct data entities (represented by TEs) can be reevaluated according to translational research logic. TEs can be mapped to consistent TraM identifiers through "equal" or "parent and child" relations constructed in the TraM schema based on an entity relation design/diagram (ERD).
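A minimal sketch of the mapping step, assuming an in-memory dictionary rather than the ERD-based TraM schema: TEs from different sources that are declared "equal" share one TraM identifier, and "parent and child" relations link, for example, a tissue block to its donor. The class names, the `TRAM-000001` ID format, and the sample identifiers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TE:
    """A translational element: an identifier as issued by one source system."""
    source: str
    local_id: str


class TEMapper:
    """Maps TEs to consistent TraM identifiers via "equal" and "parent and child" relations."""

    def __init__(self):
        self._tram_id = {}  # TE -> TraM ID ("equal": same distinct entity)
        self._parent = {}   # child TraM ID -> parent TraM ID
        self._counter = 0

    def equal(self, *tes):
        """Declare that the given TEs denote one distinct entity; return its TraM ID."""
        known = {self._tram_id[te] for te in tes if te in self._tram_id}
        if len(known) > 1:
            raise ValueError(f"TEs already map to different entities: {sorted(known)}")
        if known:
            tram_id = known.pop()
        else:
            self._counter += 1
            tram_id = f"TRAM-{self._counter:06d}"
        for te in tes:
            self._tram_id[te] = tram_id
        return tram_id

    def child_of(self, child, parent):
        """Record that child (e.g. a sample) derives from parent (e.g. a person)."""
        child_id, parent_id = self.equal(child), self.equal(parent)
        if child_id in self.lineage(parent):
            raise ValueError("relation would create a cycle")
        self._parent[child_id] = parent_id
        return child_id

    def lineage(self, te):
        """TraM IDs from te up through its ancestors."""
        chain = [self._tram_id[te]]
        while chain[-1] in self._parent:
            chain.append(self._parent[chain[-1]])
        return chain


m = TEMapper()
person = m.equal(TE("clinic", "MRN-001"), TE("tissue_bank", "DONOR-9"))
sample = m.child_of(TE("tissue_bank", "BLOCK-3"), TE("tissue_bank", "DONOR-9"))
print(person, sample, m.lineage(TE("tissue_bank", "BLOCK-3")))
# TRAM-000001 TRAM-000002 ['TRAM-000002', 'TRAM-000001']
```

Refusing to merge TEs that already map to different entities keeps one TraM identifier per distinct data entity, which is the property the mapped translational elements rely on.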
  • Continuous translational dataflow can comprise the contribution of multiple types of research data from various types of research branches to a research workflow, such data encompassing unique research domains and including but not limited to patient demographic data, patient lifestyle data, patient progenitor and progeny data, clinical data, pathology data, tissue bank samples, and laboratory results.
  • Clinical data can include any data generated in a clinic, for example, clinical patient data, clinical diagnosis data, clinical trial data, clinical treatment data, and the like. Such data can be tracked through generations or compared across individual records.
  • TEs are recorded within the system using consistent TraM identifiers such as "equal" or "parent-child", such recorded elements being referred to as mapped translational elements.
  • a data source can comprise an Institution that provides data to the TraM system.
  • An Institution can be any type of entity, such as an on-going business, such as IBM or Amgen Inc., or a University or College, such as the University of Chicago, a Foundation, such as the Diabetes Foundation, a Research Institute, such as Scripps in San Diego, or any other organization. It is understood that the Institution can be public or private. Also the Institution can be for-profit or a nonprofit.
  • a non data source can comprise an Institution that does not provide data to the centralized TraM database, but utilizes the TraM system.
  • the step of assembling can comprise aggregating data from a plurality of satellite databases, standardizing the aggregated data and mapping translational elements from the standardized aggregated data. Assembling a data system can further comprise owning the mapped translational elements. Providing access to the data system can comprise licensing use of the mapped translational elements.
  • a fee can be any form of compensation. This can be in the form of monetary means, security, such as stock, or in the form of a cross license for example.
  • a flat fee can comprise a fee that is not based on any other factor to determine the fee amount.
  • Providing access can comprise allowing an Institution to contribute, manage, edit, query, and/or retrieve data to/from the data system either locally, remotely, or both.
  • Providing access for a fee can comprise providing access to the data system to the first data source for a fee based on the quantity of data received from the first data source, and providing access to the data system to the second data source for a flat fee.
  • Providing access to the data system for a fee can further comprise providing access to the data system to the first data source for a fee inversely proportional to the quantity of data received from the first data source.
  • Providing access to the data system for a fee can further comprise providing access to the data system to a non data source for a flat fee.
  • Providing access to the data system for a fee can further comprise providing access to the data system for a usage based fee.
  • the methods can further comprise assembling a data system having mapped translational elements wherein data comprising the data system is received from a third data source and providing access only to the data in the data system received from the third data source to the first data source for a fee.
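The fee options above can be expressed as a small schedule function: a contributing data source pays a fee inversely proportional to the quantity of data received (fee = k / quantity), other data sources and non data sources pay a flat fee, and a usage-based plan charges per query. The plan names and all amounts (`k`, `flat_fee`, `per_query`) are hypothetical; the source does not specify any figures.

```python
from enum import Enum


class Plan(Enum):
    CONTRIBUTOR = "contributor"  # data source billed inversely to data received
    FLAT = "flat"                # data source or non data source on a flat fee
    USAGE = "usage"              # usage-based fee


def access_fee(plan, *, records_received=0, queries=0,
               k=50_000.0, flat_fee=5_000.0, per_query=2.0):
    """Illustrative fee schedule; k, flat_fee and per_query are hypothetical amounts."""
    if plan is Plan.CONTRIBUTOR:
        if records_received <= 0:
            raise ValueError("a contributor plan needs a positive quantity of data")
        return k / records_received  # fee = k / quantity (inverse proportionality)
    if plan is Plan.FLAT:
        return flat_fee
    if plan is Plan.USAGE:
        return per_query * queries
    raise ValueError(f"unknown plan {plan!r}")


print(access_fee(Plan.CONTRIBUTOR, records_received=1_000))  # 50.0
print(access_fee(Plan.USAGE, queries=10))                    # 20.0
```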
  • An object can comprise, for example, an animal, a human, biospecimens derived from animals or humans, bio-samples processed from the biospecimens, and the like, from which demographic, genotypic, phenotypic, progeny-related, progenitor-related and other information is obtained.
  • a targeted object can comprise an object that is the focus of a query.
  • the mapped translational elements can represent mapped patient demographic data, patient lifestyle data, patient progenitor and progeny data, clinical data, pathology data, tissue bank samples, and laboratory results.
  • objects sharing genetic data with the targeted object, objects sharing at least one demographic attribute with the targeted object, objects sharing at least one lifestyle attribute with the targeted object, objects sharing at least one progenitor and progeny attribute with the targeted object, objects sharing at least one diagnostic result with the targeted object, objects sharing at least one clinical trial result with the targeted object, objects sharing at least one pathology attribute with the targeted object, objects sharing at least one tissue bank sample attribute with the targeted object, objects sharing at least one laboratory result with the targeted object, and the like.
  • Demographic attributes can comprise gender, age, race, weight, height, and the like.
  • Lifestyle attributes can comprise food consumption, alcohol consumption, medicine consumption, weight, and the like.
  • Querying the data system can further comprise a demographic parameter.
  • the methods can further comprise determining the targeted object's susceptibility for a condition based on the received results, determining a treatment course for a condition of interest of the targeted object based on the received results, and determining a relationship between targeted objects based on the received results.
  • Such determinations are based on the application of translational logic, which regulates the meaningful connections that can be established among a series of research processes conducted in distinct and seemingly disparate research domains.
  • a condition of interest can comprise susceptibility for a disease state, susceptibility for toxicity, a normal or enhanced response to a given treatment course, or a disease state.
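Querying for objects related to a targeted object can be sketched as an attribute comparison over mapped records: an object qualifies if it shares at least a minimum number of the listed attributes (with equal values) with the target. The attribute names, object IDs, and the flat-dictionary representation are hypothetical; in TraM the same comparison would run against the relational schema.

```python
def related_objects(target_id, objects, attributes, min_shared=1):
    """Return {object_id: shared attribute names} for objects that share at least
    min_shared of the listed attributes (same value) with the targeted object."""
    target = objects[target_id]
    related = {}
    for obj_id, attrs in objects.items():
        if obj_id == target_id:
            continue
        shared = sorted(a for a in attributes
                        if a in target and a in attrs and attrs[a] == target[a])
        if len(shared) >= min_shared:
            related[obj_id] = shared
    return related


objects = {
    "T": {"gender": "F", "diagnosis": "breast cancer", "allele": "allele_1"},
    "A": {"gender": "F", "diagnosis": "breast cancer"},
    "B": {"gender": "M", "allele": "allele_1"},
    "C": {"gender": "M"},
}
print(related_objects("T", objects, ["gender", "diagnosis", "allele"]))
# {'A': ['diagnosis', 'gender'], 'B': ['allele']}
```

Raising `min_shared` narrows the result to objects that resemble the target on several dimensions at once, for example both a diagnostic result and a genetic attribute.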

Abstract

Provided are methods and systems for assembling, managing, and using a continuous translational data system.

Description

TRANSLATIONAL DATA MART
CROSS REFERENCE TO RELATED PATENT APPLICATION
[0001] This application claims priority to U.S. Provisional Application No. 60/888,566 filed February 7, 2007, herein incorporated by reference in its entirety.
BACKGROUND
[0002] With knowledge of the complete human genome sequence, scientists are beginning to systematically study the molecular basis of disease and determine individualized medical therapies based on patients' genetic signatures. Researchers are breaking the boundaries of research domains between the patient bedside and experimental laboratories. This new approach to medicine has been termed "translational medicine" or "translational research" and typically refers to the "translation" of basic research into real therapies for real patients. Well-designed interactive and systematic studies facilitate medical applications based on knowledge from basic research and animal model experiments, clinical data, clinical research trial data, patient history and diagnosis data. This personalized, continuous, bidirectional research spectrum termed "translational research" continues to evolve. Translational research has its roots in many different domains, or fields of research, and the connections made among these various domains (domain research operations) may at first appear to be no different from the traditional approaches. However, the overall translational research scheme is not a simple mix of unplanned domain research activities. Instead, interactive multidisciplinary efforts need to be carefully planned by translational researchers and regulated by translational logic. This newly emerging scientific discipline has been adopted in many medical specialties, including cancer, neurological dysfunctions and mental disabilities, immune and metabolic disorders, and a variety of genetically related diseases. In fact, the concept of translational research has been essential to biomarker and pharmaceutical discoveries. As new technologies are developed and scientific insights expand, data increase exponentially, with data types and sources tending to be more heterogeneous than those of the pre-genome era.
[0003] Even modest-sized dataset integration has been nearly impossible because of many factors, including inconsistent descriptors for similar or identical data concepts, fragmentation among and within research data coming from different domains or fields, discontinuity of tracking records for the observed individuals, and irregular formats of variable expressions within databases, even at the primary identifier level. As a consequence, the value of data accumulated at high cost and over a long period of time may be lost. Furthermore, data recollection and verification are expensive, labor intensive, and time consuming, and downstream data analysis can be only as good as the data included in the original research. Thus, inadequate, fragmented, and incomplete datasets have created unnecessary barriers for translational researchers and for "bench to bedside" discovery.
[0004] Because of the inadequacies that currently exist, an overwhelming demand for informatics support for data management applications has arisen. While there have been efforts to provide data gathering, visualization, and sharing platforms, unfortunately there currently is no solution available that allows researchers to actively manage data, assure data quality, synchronize data collection efforts across different research domains, and guarantee data continuity over the full life-cycle of a translational workflow.
SUMMARY
[0005] Provided are methods and systems for assembling a continuous translational data system, comprising retrieving, from a plurality of databases, data reflecting a plurality of research domains wherein the data have translational elements, standardizing the data, mapping the translational elements between the data, and aggregating the standardized data into a centralized data structure wherein the mapped translational elements allow for continuous translational dataflow.
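The assembly steps recited above (retrieve, standardize, map the translational elements, aggregate) can be illustrated with a minimal Python sketch. It assumes that each satellite source is a list of dictionaries accompanied by a hypothetical field map that renames local field names to common ones, and that a single shared identifier serves as the translational element. The names `DataMart`, `standardize`, and `assemble` are illustrative only and are not part of the disclosed system.

```python
from dataclasses import dataclass, field


@dataclass
class DataMart:
    """Centralized structure: translational element (TE) -> {domain: record}."""
    objects: dict = field(default_factory=dict)


def standardize(record, field_map):
    """Rename locally named fields to common names and drop unmapped fields."""
    return {field_map[k]: v for k, v in record.items() if k in field_map}


def assemble(sources, te_field="object_id"):
    """Retrieve, standardize, map on the TE, and aggregate into one DataMart.

    `sources` maps a domain name to (records, field_map). For simplicity this
    sketch keeps one record per object per domain (a later record overwrites).
    """
    mart = DataMart()
    for domain, (records, field_map) in sources.items():
        for raw in records:
            rec = standardize(raw, field_map)
            te = rec.pop(te_field)  # the shared identifier that links domains
            mart.objects.setdefault(te, {})[domain] = rec
    return mart


# Two hypothetical satellite sources that name the same patient ID differently.
sources = {
    "clinical": ([{"MRN": "P1", "dx": "breast cancer"}],
                 {"MRN": "object_id", "dx": "diagnosis"}),
    "biobank": ([{"patient": "P1", "tube": "S-01"}],
                {"patient": "object_id", "tube": "sample_id"}),
}
mart = assemble(sources)
print(mart.objects["P1"])
# {'clinical': {'diagnosis': 'breast cancer'}, 'biobank': {'sample_id': 'S-01'}}
```

The field maps carry the standardization step; once every source speaks the same field names, the translational element alone decides where a record lands in the centralized structure.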
[0006] Also provided are methods and systems for providing a continuous translational data system, comprising assembling a data system having mapped translational elements wherein data comprising the data system is received from a first data source and a second data source and providing access to the data system for a fee.
[0007] Also provided are methods and systems for querying a continuous translational data system, comprising identifying a targeted object, accessing a data system wherein the data system comprises data having mapped translational elements, querying the data system, receiving results associated with the query wherein the results reflect a continuous translational dataflow, and displaying the received results.
[0008] Additional advantages will be set forth in part in the description which follows or may be learned by practice. The advantages will be realized and attained by means of the elements and combinations particularly pointed out in the appended claims. It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
[0009] The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments and together with the description, serve to explain the principles.
Figure 1 is an exemplary translational workflow;
Figure 2 serves as an example of the research domains that can be relevant in a translational workflow in the specific translational research domain of Genomic Medicine;
Figure 3 is a block diagram illustrating an exemplary operating environment for performing the disclosed methods;
Figure 4 is an exemplary questionnaire tree structure;
Figure 5 shows an example of computational data dependency flow control;
Figure 6 is an example of overall data aggregation mechanisms;
Figure 7A illustrates a set of allele data in a Microsoft Excel spreadsheet before standardization;
Figure 7B illustrates the data from FIG. 7A after it has been reconfigured, standardized, and deployed in a TraM database;
Figure 8A illustrates an exemplary Curator Person Module Page;
Figure 8B illustrates TraM data privacy control;
Figure 9 is a flowchart illustrating an exemplary method;
Figure 10 is a flowchart illustrating an exemplary method; and
Figure 11 is a flowchart illustrating an exemplary method.
DETAILED DESCRIPTION
[0010] Before the present methods and systems are disclosed and described, it is to be understood that this invention is not limited to specific synthetic methods, specific components, or to particular compositions, as such may, of course, vary. It is also to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting.
[0011] As used in the specification and the appended claims, the singular forms "a," "an" and "the" include plural referents unless the context clearly dictates otherwise. Ranges may be expressed herein as from "about" one particular value, and/or to "about" another particular value. When such a range is expressed, another embodiment includes from the one particular value and/or to the other particular value. Similarly, when values are expressed as approximations, by use of the antecedent "about," it will be understood that the particular value forms another embodiment. It will be further understood that the endpoints of each of the ranges are significant both in relation to the other endpoint, and independently of the other endpoint.
[0012] "Optional" or "optionally" means that the subsequently described event or circumstance may or may not occur, and that the description includes instances where said event or circumstance occurs and instances where it does not.
[0013] "Object" or "research object" refers to animals or human individuals, biospecimens derived from animals or humans, bio-samples processed from the biospecimens, and the like, from which demographic, genotypic, phenotypic, progeny-related, progenitor-related, and other information is obtained. A targeted object refers to an object that is the focus of a query.
[0014] "Researchers" refers to patient care providers, physicians, clinicians, scientific investigators, clinical and laboratory managers and personnel, operating within an institution, which can be any type of entity, such as an on-going business, a University or College, a Foundation, a Research Institute, or any other organization.
[0015] The present invention may be understood more readily by reference to the following detailed description of preferred embodiments and the Examples included therein and to the Figures and their previous and following description.
[0016] Provided is a continuous translational data system, referred to as a translational data management mart (TraM) system. The TraM system can be used as a stand-alone, streamlined data administration and utilization tool for continuous translational data assembly, management, and use. The TraM system can generally be applied to a plurality of translational research branches, such as cancer research, diabetes research, research on neuronal disorders, and the like, and can provide personalized translational data continuity and integrity. Each branch of translational research can involve multiple types of research data that all contribute to a translational research workflow. As shown in FIG. 1, a translational research workflow encompasses distinct research domains, such as patient demographic data, patient lifestyle data, patient family history data, clinical data, pathology data, biospecimen samples, laboratory results, and the like. Clinical data can include any data generated in a clinic, for example, clinical patient data, clinical diagnosis data, clinical trial data, clinical treatment data, and the like. FIG. 2 serves as an example of the research domains that can be relevant in a translational workflow in the specific translational research domain of Genomic Medicine. The scheme shows how disparate domains of basic medical research and patient data can be effectively combined to produce a translational workflow that more directly connects basic medical research and patient data to patient care. Some of the key features that allow the TraM system to uniquely address translational research are the following. The TraM system is stand-alone in that TraM does not rely on software components that are under construction by third parties. The TraM system is administrative in that TraM provides a curator application interface, which allows users to actively manage the translational data.
The TraM system is streamlined in that TraM focuses on the essential translational research results rather than attempting to process and store all detailed records obtained from distinct local and satellite data management systems. The TraM system is personalized in that TraM sets up a data dependency mechanism to enforce data curation in the context of translational data continuity. For example, such a mechanism allows all patient data for individual human subjects who have participated in a translational research project, or have been treated in a hospital setting, to be tracked over the subjects' lifetimes, as well as tracked through genealogical relationships. The TraM system is generic in that one database system and a single application software package support a plurality of translational research branches. Finally, the TraM system is powerful enough to support user-driven query strategies for global searches of the TraM database. The TraM system can export data for analysis with specialized tools such as SAS, MATLAB, MAPLE, and the like. For data acquisition from domain satellite databases, the TraM system uses an integrated computational approach to build data supply pipelines. There can be, for example, a global TraM system that aggregates data from satellite databases (including local TraM systems) across the globe. The global TraM system can establish data pipelines between satellite databases and remote TraM systems. There can also be a distributed TraM system whereby a plurality of TraM systems exchange data with other, remote TraM systems, making the remote TraM systems, in effect, satellite databases. However, unlike a pipeline from a typical satellite database, a data pipeline between multiple TraM systems can be two-way. This allows the most current data available in the plurality of TraM systems to be updated and retrieved.
The TraM system can aggregate data using any means known in the art for data exchange, for example, a web service. As the data in a TraM database represent the complete life cycle of translational data, the database functions as a "one-stop" data resource (a "data-mart") for translational researchers. In addition, the TraM system can utilize a web service API to exchange data with outside communities, such as the caGrid network, to enable a broader range of translational data curation.
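The two-way pipeline between TraM systems can be sketched as a merge after which each side holds the most current version of every record. This is a hypothetical illustration only: the text does not specify a conflict policy, so the sketch assumes a last-writer-wins rule on a per-record modification timestamp, and the function name `sync` and the record layout are invented for the example.

```python
def sync(site_a, site_b):
    """Two-way merge of {record_id: (modified_at, payload)} stores, in place.

    Assumed policy (not from the source): the version with the later timestamp
    wins; on a tie, site_a's version is kept because max() returns the first
    maximal element. Afterwards both stores hold identical data.
    """
    for rid in set(site_a) | set(site_b):
        versions = [store[rid] for store in (site_a, site_b) if rid in store]
        newest = max(versions, key=lambda version: version[0])
        site_a[rid] = newest
        site_b[rid] = newest


# Hypothetical records; timestamps are plain integers for brevity.
local = {"P1": (5, {"stage": "II"}), "P2": (1, {"stage": "I"})}
remote = {"P1": (7, {"stage": "III"}), "P3": (2, {"stage": "IV"})}
sync(local, remote)
print(local["P1"])  # (7, {'stage': 'III'})
```

A production pipeline would need a more careful policy (for example, per-field merging or curator review of conflicts); last-writer-wins is used here only because it is the simplest rule that yields "most current" data on both sides.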
[0018] Because TraM can be used as a "data-mart" for human subjects, it is important that the system be compliant with the Health Insurance Portability and Accountability Act (HIPAA). The TraM system can use three levels of control for HIPAA compliance: the first level is the network system configuration and firewall settings for TraM servers, the second is authentication and authorization procedures for all TraM users, and the third is a filter for de-identification of patient data. The combination of the second and third levels of control prevents unauthorized TraM users, such as public users, from seeing any HIPAA-protected information.
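The third level of control, the de-identification filter, can be pictured as a role-dependent projection of each record that sits behind the second-level authorization check. In this minimal sketch, the role names and the set of protected fields are hypothetical placeholders; the field list is deliberately incomplete and is not a statement of what HIPAA requires.

```python
# Illustrative subset of protected fields; a real HIPAA de-identification
# rule set (e.g. Safe Harbor) covers many more identifier types.
PHI_FIELDS = {"name", "mrn", "birth_date", "address", "phone"}

# Hypothetical roles granted PHI access by the level-2 authorization step.
AUTHORIZED_ROLES = {"clinician", "curator"}


def view_record(record, role):
    """Level-3 filter: return the record with PHI removed unless role is authorized."""
    if role in AUTHORIZED_ROLES:
        return dict(record)
    return {k: v for k, v in record.items() if k not in PHI_FIELDS}


patient = {"name": "Jane Doe", "mrn": "12345", "tumor_grade": "G2"}
print(view_record(patient, "public"))     # {'tumor_grade': 'G2'}
print(view_record(patient, "clinician"))  # full record, PHI included
```

Keeping the filter as a separate function from authorization mirrors the separation of the second and third levels: changing who is authorized does not require touching which fields are considered protected.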
[0019] In addition, TraM addresses the need for a system that can handle the complex nature of the basic research that must be translated into real therapies for patients. In translational research, translational logic regulates the meaningful connections that can be established among a series of research processes conducted in distinct and seemingly disparate research domains. The datasets generated in the different domains and stages of this translational workflow are complex, but they are regularly attached to unique identifiers, referred to as translational elements (TEs). Outside of a TraM database, the data can be stored in a domain knowledge-specific computational data management system, referred to as a satellite database (SD). Mapping the TEs from various satellite databases allows a visible and meaningful translational dataflow to be assembled, and thus allows meaningful scientific connections to be made between or among targeted objects. The TEs can also contribute to standardizing the data to facilitate the translational workflow. For example, the TEs allow the mapping process to determine which identifiers from an original data source correspond to datasets representing distinct data entities. The data can then be further standardized and normalized based on the common data elements (CDEs) defined in the TraM system. The CDEs accomplish this by providing a core set of requisite data elements that are necessary for the successful deposition of complete datasets into the TraM system. In addition, the relationships among the distinct data entities can be reevaluated according to translational research logic. Translational research logic is defined by the meaningful scientific connections that exist between or among distinct datasets and that can be transformed into medical knowledge for patient treatment.
For example, a patient care provider might apply such logic to a dataset of genetic polymorphisms in a related patient population (familial relations, for example) and determine the susceptibility of certain patients in that population to a disease that has been meaningfully correlated with the presence of that genetic polymorphism. This logic is facilitated in the TraM system because the TraM system is based on an entity relation design/diagram (ERD), whereby TEs can be mapped within the system using consistent TraM relationship identifiers such as "equal" or "parent-child". TEs can be recorded in the TraM system as required TraM identifiers, ensuring that the field is filled in for data tracking purposes. When the "parent-child" relation is required, a foreign key field can be set to NOT NULL to enforce upstream and downstream data continuity. While establishing such requirements for data entry into the TraM system is a translationally powerful process, outside of the TraM system it is not trivial to convert a mixture of data of varying quality, irregular formats, and inconsistent terminology into a standardized, meaningful, and sharable knowledge resource. This is evidenced by the highly fragmented nature of such datasets, even when the domain data have all of the necessary identifiers. Currently, no synchronization mechanism is triggered by work done in individually operated satellite databases that would prompt users to use a relational scheme when entering distinct datasets associated with the same targeted object or a related targeted object. At the same time, there are no regulations that would promote the simultaneous updating of records. The TraM system overcomes these obstacles by providing users with a curator interface that allows them to bridge the gaps, patch the holes, and complete the originally broken datasets as they go into and are maintained by the TraM system.
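The NOT NULL foreign key that enforces a "parent-child" relation can be demonstrated with Python's built-in `sqlite3` module. The `person` and `specimen` tables below are hypothetical stand-ins for TraM entities; the point is only that an orphan specimen, or one pointing at an unknown person, is rejected at insertion time rather than silently breaking the dataflow.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript("""
CREATE TABLE person (
    person_id TEXT PRIMARY KEY
);
CREATE TABLE specimen (
    specimen_id TEXT PRIMARY KEY,
    -- parent-child relation: every specimen must come from a known person
    person_id   TEXT NOT NULL REFERENCES person(person_id)
);
""")

conn.execute("INSERT INTO person VALUES ('P1')")
conn.execute("INSERT INTO specimen VALUES ('S1', 'P1')")  # accepted


def try_insert(specimen_id, person_id):
    """Attempt a specimen insert and report whether the constraints allowed it."""
    try:
        conn.execute("INSERT INTO specimen VALUES (?, ?)", (specimen_id, person_id))
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


print(try_insert("S2", None))  # orphan specimen: rejected by NOT NULL
print(try_insert("S3", "P9"))  # unknown parent: rejected by the foreign key
```

NOT NULL alone would allow any non-empty value, and the foreign key alone would allow NULL; the continuity guarantee described above needs both constraints together.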
All of these features uniquely enable the TraM system to give researchers the necessary tools to reach the goal of truly personalized treatment in medicine. As such, the TraM system needs to keep the personalized translational dataflow intact.
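Deposition gated by common data elements (CDEs) can be sketched as a validation step that rejects a record unless every required element is present and has the expected type. The CDE names and types below are assumptions made for illustration; the actual TraM CDE set is not given here.

```python
# Hypothetical common data elements (CDEs): required field -> expected type.
CDES = {"object_id": str, "domain": str, "collected_on": str}


def cde_problems(record):
    """Return, sorted, the CDEs that are missing or have the wrong type."""
    return sorted(name for name, expected in CDES.items()
                  if not isinstance(record.get(name), expected))


def deposit(store, record):
    """Accept a record into the store only if every CDE is present and valid."""
    problems = cde_problems(record)
    if problems:
        raise ValueError(f"record rejected; missing or invalid CDEs: {problems}")
    store.setdefault(record["object_id"], []).append(record)


store = {}
deposit(store, {"object_id": "P1", "domain": "pathology",
                "collected_on": "2007-02-07", "grade": "G2"})
print(cde_problems({"object_id": "P2"}))  # ['collected_on', 'domain']
```

Extra, non-CDE fields such as `grade` pass through untouched; the CDEs define only the minimum a record must carry to be deposited.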
[0021] The challenge of data management for translational research comes from the nature of the translational workflows described above. For example, to evaluate knowledge derived from basic research laboratories for clinical applications, researchers use patient biomaterials for studies. As illustrated in FIG. 1, the translational dataflow from this research typically includes records from a vulnerable population, genetic and environmental factors, clinical research and/or trials, pathological diagnoses, and a variety of laboratory results. The research objects can be animals or human individuals, biospecimens derived from animals or humans, a variety of bio-samples processed from the biospecimens, and the like. The sample types can range from normal or diseased tissue samples, DNA, RNA, and proteins to primary or transformed cell lines cultured from the specimens. After further analysis, researchers are likely to focus on a set of genes, a group of pathways, or a regulatory system in their studies and/or patient treatments. Currently, such raw data can be stored on paper, in spreadsheets, in binary files (for images), in informally developed in-house databases, and in more professionally designed data management systems.
[0022] The TraM system overcomes such limitations in data storage and analysis by ensuring data continuity per research object (e.g., a person recruited in a study) in the translational dataflow. The TraM system allows data generated across disparate research domains to be accessed and shared. This allows researchers to oversee research efforts and the progress of research in a translational workflow. Through a data dependency mechanism, the TraM system computationally ensures personalized data continuity and keeps the dataflow simultaneously updated. The TraM system recognizes that the copy of domain data in a satellite database does not equal the copy of translational data in the TraM system, owing to differences in data update frequency, data standardization and normalization levels, data continuity and completeness status, and data coverage scope. However, the substance of the information in the area of overlap between the TraM system and the satellite databases is consistent. Although users can curate data via a TraM user interface, high-throughput data aggregation from satellite databases is an important data aggregation method. Simple data aggregation methodology cannot efficiently provide a continuous translational data solution, because such methodology does not improve the data integrity of translational records. For example, a piece of domain data that has all the required mapping identifiers may still be unrelated to the individuals who are recruited to a translational research project. The situation can be likened to an attempt to import the entire human genome dataset into a data-sharing platform and combine it with a massive number of clinical datasets. One cannot make a simple connection between such datasets, as the genomic data are not linked to the individual subjects from whom the clinical records were obtained.
In this context, the quantity of data is not the major obstacle in translational data analysis; instead, it is the quality of the data that is essential to making meaningful connections between or among different datasets. Satellite databases are individually, independently, and often inefficiently operated data management systems. No synchronization mechanism allows satellite databases to link data derived from the same person, so records associated with the same person can be recruited separately within different satellite databases, and as such, no meaningful association exists between these records. Because the TraM system has a single database system with strong data dependency control, any data entered into the TraM database, no matter which research area within a user's organization it comes from, is automatically positioned within a personalized dataflow, so that a researcher can easily tell which part of the domain records is missing or needs to be improved. Satellite databases are often unable to update records for a person simultaneously, so from one satellite database one cannot see the updated information stored in the other satellite databases, and vice versa. However, any domain data updated in the TraM system automatically updates the entire dataflow for an object, and this update can easily be viewed and tracked by TraM users in all research areas. Satellite databases generally include very detailed local operation records, including experimental processes, scheduling, billing, and sample locations. The TraM system does not store all these records, as they are only important to local operations. Instead, the data coverage in TraM is highly streamlined, results-oriented, and continuity-based. Satellite databases may contain data in various formats, named with locally invented terminologies.
The TraM system follows standardized guidelines and nomenclature, such as caBIG, SNOMED, ICD, and NCI Thesaurus and keeps metadata records updated during development. Thus, unlike simple data aggregation platforms, the TraM system allows translational researchers to see domain data from disparate domain satellite databases and connects the data from up- to down-stream in a complete translational research cycle resulting in continuous translational data.
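How a single personalized dataflow makes gaps visible can be sketched in a few lines. Assuming a hypothetical list of workflow domains and a per-person dictionary of domain records, the function below reports, for each person, which domains are still missing. Because every domain writes into the same structure, an update from one domain is reflected immediately in the report that every other user sees.

```python
# Hypothetical ordered domains of a translational workflow (cf. FIG. 1).
WORKFLOW_DOMAINS = ["demographics", "family_history", "clinical",
                    "pathology", "biospecimen", "lab_results"]


def continuity_gaps(dataflow):
    """For each person, list workflow domains that have no record yet.

    `dataflow` maps person_id -> {domain: record}; because every domain writes
    into this single structure, an update is visible to all users at once.
    """
    return {pid: [d for d in WORKFLOW_DOMAINS if d not in records]
            for pid, records in dataflow.items()}


dataflow = {"P1": {"demographics": {"sex": "F"}, "clinical": {"dx": "DCIS"}}}
dataflow["P1"]["pathology"] = {"grade": "G2"}  # an update from one domain
print(continuity_gaps(dataflow)["P1"])
# ['family_history', 'biospecimen', 'lab_results']
```

Contrast this with separate satellite databases, where each system would hold only its own slice and no single query could list what is missing for a given person.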
[0024] Provided is a functional description of an exemplary system; the respective functions can be performed by software, hardware, or a combination of software and hardware. FIG. 3 is a block diagram illustrating an exemplary operating environment for performing the disclosed methods. This exemplary operating environment is only an example of an operating environment and is not intended to suggest any limitation as to the scope of use or functionality of operating environment architecture. Neither should the operating environment be interpreted as having any dependency or requirement relating to any one or combination of components illustrated in the exemplary operating environment.
[0025] The system and method of the present invention can be operational with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known computing systems, environments, and/or configurations that can be suitable for use with the system and method comprise, but are not limited to, personal computers, server computers, laptop devices, and multiprocessor systems. Additional examples comprise set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that comprise any of the above systems or devices, and the like.
[0026] The processing of the disclosed system and method of the present invention can be performed by software components. The disclosed system and method can be described in the general context of computer-executable instructions, such as program modules, being executed by one or more computers or other devices. Generally, program modules comprise computer code, routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The disclosed method can also be practiced in grid-based and distributed computing environments where tasks are performed by remote processing devices that are linked through a communication network. In a distributed computing environment, program modules can be located in both local and remote computer storage media including memory storage devices.
[0027] Further, one skilled in the art will appreciate that the system and method disclosed herein can be implemented via a general-purpose computing device in the form of a computer 301. The components of the computer 301 can comprise, but are not limited to, one or more processors or processing units 303, a system memory 312, and a system bus 313 that couples various system components including the processor 303 to the system memory 312.
[0028] The system bus 313 represents one or more of several possible types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures can comprise an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an Enhanced ISA (EISA) bus, a Video Electronics Standards Association (VESA) local bus, an Accelerated Graphics Port (AGP) bus, and a Peripheral Component Interconnects (PCI) bus also known as a Mezzanine bus. The bus 313, and all buses specified in this description can also be implemented over a wired or wireless network connection and each of the subsystems, including the processor 303, a mass storage device 304, an operating system 305, TraM software 306, TraM data 307, a network adapter 308, system memory 312, an Input/Output Interface 310, a display adapter 309, a display device 311, and a human machine interface 302, can be contained within one or more remote computing devices 314a,b,c at physically separate locations, connected through buses of this form, in effect implementing a fully distributed system.
[0029] The computer 301 typically comprises a variety of computer readable media. Exemplary readable media can be any available media that is accessible by the computer 301 and comprises, for example and not meant to be limiting, both volatile and non-volatile media, removable and non-removable media. The system memory 312 comprises computer readable media in the form of volatile memory, such as random access memory (RAM), and/or non-volatile memory, such as read only memory (ROM). The system memory 312 typically contains data such as TraM data 307 and/or program modules such as operating system 305 and TraM software 306 that are immediately accessible to and/or are presently operated on by the processing unit 303.
[0030] In another aspect, the computer 301 can also comprise other removable/non-removable, volatile/non-volatile computer storage media. By way of example, FIG. 3 illustrates a mass storage device 304 which can provide non-volatile storage of computer code, computer readable instructions, data structures, program modules, and other data for the computer 301. For example and not meant to be limiting, a mass storage device 304 can be a hard disk, a removable magnetic disk, a removable optical disk, magnetic cassettes or other magnetic storage devices, flash memory cards, CD-ROM, digital versatile disks (DVD) or other optical storage, random access memories (RAM), read only memories (ROM), electrically erasable programmable read-only memory (EEPROM), and the like.
[0031] Optionally, any number of program modules can be stored on the mass storage device 304, including by way of example, an operating system 305 and TraM software 306. Each of the operating system 305 and TraM software 306 (or some combination thereof) can comprise elements of the programming and the TraM software 306. TraM data 307 can also be stored on the mass storage device 304. TraM data 307 can be stored in any of one or more databases known in the art. Examples of such databases include DB2®, Microsoft® Access, Microsoft® SQL Server, Oracle®, MySQL, PostgreSQL, and the like. The databases can be centralized or distributed across multiple systems.
[0032] TraM data 307 can be stored in a centralized data structure, such as a standardized relational satellite database or a centralized TraM database. A TraM database can be used for assembling and storing continuous translational research data. The TraM database allows a user to focus on translational element mapping and streamlined data collection. The TraM database system is also designed to provide flexibility across a plurality of translational research branches. For example, the demographic field section of the TraM system is highly generic, easy to maintain, and under user control. The TraM system takes advantage of the discovery that there exists a common logic among various branches of translational research; thus, the resulting system is able to cover a broad range of branch-specific research, treatment, and scientific applications. The TraM system is highly abstract and representative. If branch-specific databases were constructed instead, even minimal project-specific variations could generate unnecessary development work. This in turn can lead to long-term instability and can hinder future improvement and maintenance of such databases. In the TraM system design, however, once the concept of a data element is correctly abstracted, its generic nature stands out. A regular relational model can define a separate attribute for each distinct data property. This approach can provide excellent data clarity but, in some aspects, can require that a new column be created for each new data concept. Ongoing scientific inquiry can produce new concepts and/or classes of new concepts dynamically, which demands a more flexible data model able to capture these concepts in a controlled vocabulary without altering the database structure. Hence, a domain ontology (DO) can be used as an integrated ERM structure for organizing constantly evolving domain knowledge and concepts.
A DO structure alone does not accommodate many-to-many relationships and therefore is not suitable for connecting research objects (person, specimen, and sample) to research data. Furthermore, the hierarchy of a DO cannot connect to other DOs without a higher-order ontology. The TraM system can focus on domain data that are generated during a translational workflow. These data may belong to the concepts classified in a DO. Therefore, the ER model can establish a relationship between a leaf class of a DO and a research object to integrate domain data. In one aspect, if knowledge concepts in a particular research domain are poorly standardized, they can be organized in a DO structure. For example, survey questionnaires from different translational projects are often highly distinctive, which makes defining common data elements for the questions within a questionnaire a significant challenge. This problem can be alleviated by arranging all the questions in a DO structure as shown in FIG. 4, with nodes for branches, categories (in one or more layers), and items. The question itself then becomes a variable in a value domain under the concept of a question item, and each item can have four other fields to describe its properties. These common data elements create a manageable and powerful questionnaire tree structure that can support a variety of questionnaires from various translational domains. If a unique demand emerges, users can add new questionnaires to this interoperable survey design without altering the entity relation structure of the schema. A query overview showing all the survey records per person is also provided.
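Storing questions as rows rather than as columns is what lets new questionnaires be added without schema changes. The following `sqlite3` sketch uses hypothetical table and column names (`question_item`, `answer`) together with the item properties named in the text (unit of measure, data type, description, answer options). Adding a question is an `INSERT`, never an `ALTER TABLE`, and the per-person overview is a single join; the `answer` table is also where the many-to-many link between research objects and leaf items lives.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE question_item (              -- leaf node of the questionnaire tree
    item_id        INTEGER PRIMARY KEY,
    category       TEXT NOT NULL,
    question       TEXT NOT NULL,         -- the question is a value, not a column
    uom            TEXT,
    data_type      TEXT NOT NULL,
    description    TEXT,
    answer_options TEXT
);
CREATE TABLE answer (                     -- links research objects to items
    person_id TEXT    NOT NULL,
    item_id   INTEGER NOT NULL REFERENCES question_item(item_id),
    value     TEXT,
    PRIMARY KEY (person_id, item_id)
);
""")


def add_question(category, question, data_type, uom=None):
    """Register a new question item; no schema change is needed."""
    cur = conn.execute(
        "INSERT INTO question_item (category, question, uom, data_type) "
        "VALUES (?, ?, ?, ?)",
        (category, question, uom, data_type))
    return cur.lastrowid


def survey_overview(person_id):
    """All answers recorded for one person, across every questionnaire."""
    return conn.execute(
        "SELECT q.category, q.question, a.value FROM answer AS a "
        "JOIN question_item AS q ON a.item_id = q.item_id "
        "WHERE a.person_id = ? ORDER BY q.item_id",
        (person_id,)).fetchall()


smoking = add_question("smoking", "Cigarettes per day", "integer", uom="count")
diet = add_question("diet", "Vegetable servings per day", "integer", uom="servings")
conn.executemany("INSERT INTO answer VALUES (?, ?, ?)",
                 [("P1", smoking, "0"), ("P1", diet, "3")])
print(survey_overview("P1"))
```

The trade-off is the one the text names: values of different types share one `value` column, so type checking moves from the schema into the `data_type` property of each item.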
[0034] As shown in FIG. 4, the TraM system makes use of consistent descriptors for the same data concepts. For example, in questionnaire nomenclature, there can be three concept names associated with a question, each of which has a specific meaning and is used consistently. A concept branch name defines a broader area to which a question belongs, such as social behavior or reproductive system. The sub-concepts under it are not the questions themselves but the category names of groups of questions, such as drinking, smoking, and diet. The item concept name denotes the end node, or leaf, of the questionnaire tree; there is no sub-concept under it, only the properties that describe it, such as unit of measure (UOM), data type, description, and answer options.
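The branch, category, and item nomenclature maps naturally onto a small tree type. This Python sketch is an illustration under assumed class names (`Concept`, `Item`); only items carry the descriptive properties, and categories may nest to any depth.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Item:
    """Leaf node: no sub-concepts, only descriptive properties."""
    name: str
    data_type: str
    uom: Optional[str] = None
    description: str = ""
    answer_options: List[str] = field(default_factory=list)


@dataclass
class Concept:
    """Branch or category node; categories may nest in one or more layers."""
    name: str
    children: List[Union["Concept", Item]] = field(default_factory=list)


def leaf_paths(node, prefix=()):
    """Yield (concept path, item) for every item beneath `node`."""
    if isinstance(node, Item):
        yield prefix, node
        return
    for child in node.children:
        yield from leaf_paths(child, prefix + (node.name,))


tree = Concept("social behavior", [
    Concept("smoking", [Item("cigarettes per day", "integer", uom="count")]),
    Concept("drinking", [Item("drinks per week", "integer", uom="drinks",
                              answer_options=["0", "1-7", "8+"])]),
])
for path, item in leaf_paths(tree):
    print(" > ".join(path), "::", item.name, f"[{item.uom}]")
```

Because the question text lives in `Item.name`, a new questionnaire is just new nodes in the tree; no class or table definition has to change.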
[0035] The TraM system utilizes unique identifiers from source data. Sometimes a piece of data has more than one identifier (ID) describing the same object, such as a clinical trial database ID and a hospital ID associated with the same person. In the TraM system, the combination of the original data ID and the site data ID can be used to decide the uniqueness of an object's record. Thus, it is important to define which ID should be considered the original data ID. Original data IDs in the TraM system can be defined as the IDs that were associated with the object's original records. For example, if a person always has a hospital medical record number (MRN) before that person has a clinical trial number (CTN), then the original data ID of the person will be the MRN instead of the CTN, even if the data are transferred directly from a clinical trial database (satellite database). If the clinical trial records also need to be tracked, then the CTN can be the secondary identifier. The secondary identifier field can hold a list of identifiers, all of which have equal rank at the semantic level, as they all refer to the same object. Tracking the secondary identifiers also allows the TraM system to match an object whose original data ID is assigned after its secondary identifier to that original ID once it is issued. Therefore, there is no ambiguity when data are imported from satellite databases.
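Resolving secondary identifiers to the original data ID, including the case in which the original ID is issued later, can be sketched as an alias table. The `ObjectRegistry` class and the `(namespace, value)` identifier pairs are hypothetical, and the sketch omits the site data ID for brevity.

```python
class ObjectRegistry:
    """Map every known identifier of an object to one canonical identifier.

    Identifiers are (namespace, value) pairs such as ("MRN", "12345") or
    ("CTN", "T-77"); the namespaces are illustrative only.
    """

    def __init__(self):
        self._canonical = {}  # identifier -> canonical (original) identifier

    def resolve(self, ident):
        """Return the canonical identifier, or None if unknown."""
        return self._canonical.get(ident)

    def add(self, original, secondary=()):
        """Register an object by its original ID plus equal-rank secondary IDs."""
        self._canonical[original] = original
        for sid in secondary:
            self._canonical[sid] = original
        return original

    def add_provisional(self, secondary):
        """Hold an object known so far only by a secondary ID."""
        return self._canonical.setdefault(secondary, secondary)

    def issue_original(self, original, secondary):
        """The original ID arrives later: re-point the secondary ID, and
        anything already mapped to it, to the newly issued original ID."""
        old = self._canonical.get(secondary, secondary)
        for ident, canon in list(self._canonical.items()):
            if canon == old:
                self._canonical[ident] = original
        return self.add(original, [secondary])


reg = ObjectRegistry()
reg.add(("MRN", "12345"), [("CTN", "T-77")])
reg.add_provisional(("CTN", "T-88"))           # trial record arrives first
reg.issue_original(("MRN", "67890"), ("CTN", "T-88"))
print(reg.resolve(("CTN", "T-88")))            # ('MRN', '67890')
```

Since every alias resolves to the same canonical key, records imported under either identifier end up attached to one object, which is the lack of ambiguity the paragraph describes.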
[0036] Furthermore, the TraM system implements streamlined data coverage so as not to duplicate unrelated records in the TraM database. Streamlined data coverage ignores data that may be necessary in a satellite database for domain operations but are irrelevant to translational research. For example, from a fairly large specimen banking data management system, the required data include sample identifiers and basic property descriptions associated with the translational research subjects. Sample location records at the departmental or institutional level can also be required, as translational research is typically conducted in an interdepartmental or cross-institutional fashion. In terms of data acquisition into the TraM database, this type of data can either be entered down to a particular box and cell, or be left NULL, since such data likely have no direct impact on the translational dataflow. The TraM system also supports project-specific extensions, such as questionnaires that diversify over successive rounds of a longitudinal demographic survey, without breaking the generic architecture of the relational schema.
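Streamlined coverage amounts to projecting each satellite record onto a short list of relevant fields. In this sketch the field lists are hypothetical placeholders; location details such as box and cell are kept when present and stored as NULL (`None`) otherwise, and every other satellite field is dropped.

```python
from typing import Any, Dict, Optional

# Hypothetical streamlined element lists for a specimen-bank satellite.
REQUIRED_FIELDS = ("sample_id", "subject_id", "sample_type")
LOCATION_FIELDS = ("institution", "department", "box", "cell")


def streamline(record: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Keep only translationally relevant fields from a satellite record."""
    missing = [f for f in REQUIRED_FIELDS if record.get(f) in (None, "")]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    out: Dict[str, Optional[Any]] = {f: record[f] for f in REQUIRED_FIELDS}
    # Location details are kept when present and stored as NULL (None) otherwise.
    out.update({f: record.get(f) for f in LOCATION_FIELDS})
    # Every other field (e.g. freezer logs used only for domain operations) is dropped.
    return out
```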
[0037] Data clarity and query efficiency are also taken into consideration in the TraM system. Over-normalization of the schema is avoided to prevent the generation of too many "clean tables" and NULL cell entries, which may affect query performance and add unnecessary work to data deployment. However, in some cases, properties related to an entity can be normalized to a greater extent when the property records are not needed for every instance of the entity. For example, some properties/attributes of a sample, such as the results of sample QA/QC, are normally recorded in a specimen bank facility. When data records are acquired from a specimen bank satellite database, these details can be omitted to avoid unnecessary replication. However, an end table can be attached to the sample entity for QA/QC records in case lab researchers redo the QA/QC procedure and wish to document the results. Separating these records permits two sets of QA/QC results to be kept: one produced in a sample preparation facility and another produced in a research lab.
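The optional QA/QC end table can be illustrated with Python's built-in `sqlite3` module. The table and column names are assumptions for illustration only; the point is that QA/QC results live in a separate child table, so samples without lab-side QA/QC carry no NULL columns, and results from the preparation facility and the research lab can coexist.

```python
import sqlite3
from typing import List, Tuple

SCHEMA = """
CREATE TABLE sample (
    sample_id   TEXT PRIMARY KEY,
    subject_id  TEXT NOT NULL,
    sample_type TEXT
);
-- End table: only samples that actually have QA/QC results get rows here.
CREATE TABLE sample_qaqc (
    qaqc_id      INTEGER PRIMARY KEY AUTOINCREMENT,
    sample_id    TEXT NOT NULL REFERENCES sample(sample_id),
    performed_at TEXT NOT NULL,  -- e.g. 'prep_facility' or 'research_lab'
    result       TEXT NOT NULL
);
"""


def build_db() -> sqlite3.Connection:
    """Create an in-memory database with the sample and QA/QC tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def qaqc_history(conn: sqlite3.Connection, sample_id: str) -> List[Tuple[str, str]]:
    """All QA/QC results for one sample, in the order they were recorded."""
    cur = conn.execute(
        "SELECT performed_at, result FROM sample_qaqc WHERE sample_id = ? ORDER BY qaqc_id",
        (sample_id,),
    )
    return cur.fetchall()
```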
[0038] The TraM database can store both high-resolution data and high-throughput data. However, for high-throughput data stored in satellite databases, such as array or sequence databases, it is more efficient to store data tracking identifiers. For example, the TraM system does not preserve the sequencing data for SNP screening, but can save mutation records to some extent once they are identified and characterized. Similarly, TraM does not preserve expression array data points; those records can go to specialized databases such as caArray and ITTACA. Instead, the TraM system can more efficiently register the experiment number that connects the patient sample to the gene profiling results. Likewise, the TraM system can, but need not, include databases that have special built-in analytical functions, such as image storage databases with 3D-visualization functions, with which researchers can manipulate images in order to study them, and pedigree databases, which store genetic annotation records and display pedigree trees graphically. Having a full copy of such data saved locally in the TraM database does not improve users' ability to utilize them. Instead, storing in the TraM system the experimental log ID that maps to the research object efficiently helps users locate and utilize the data in the source database systems. Because the TraM system holds the data tracking identifiers, researchers can easily follow these leads to retrieve all the information accurately and efficiently.
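Storing tracking identifiers instead of the data can be sketched as a small index of pointers. `ExternalRef`, `TrackingIndex`, and the example experiment numbers are hypothetical; only the idea of registering an experiment number that links a patient sample to a satellite system such as caArray comes from the text above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class ExternalRef:
    """Pointer into a satellite system; the data itself stays at the source."""
    sample_id: str
    system: str          # e.g. "caArray" for expression arrays
    experiment_id: str   # the source system's own experiment or log number


class TrackingIndex:
    def __init__(self) -> None:
        self._refs: List[ExternalRef] = []

    def register(self, sample_id: str, system: str, experiment_id: str) -> ExternalRef:
        """Record a link once; repeated registrations are ignored."""
        ref = ExternalRef(sample_id, system, experiment_id)
        if ref not in self._refs:
            self._refs.append(ref)
        return ref

    def refs_for_sample(self, sample_id: str, system: Optional[str] = None) -> List[ExternalRef]:
        """Leads a researcher can follow back to the source databases."""
        return [r for r in self._refs
                if r.sample_id == sample_id and (system is None or r.system == system)]
```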
[0039] The TraM system allows for querying of TraM data 307 globally. The TraM system can utilize SQL scripts applied against the static TraM schema to assure consistent and controllable query performance, for example. The query strategies can be instantly assembled and dynamically executed, and controlled by users. The TraM system also allows for cross-domain data query, enabling translational researchers to query data across research domains. The TraM database schema provides a foundation for global queries, as all the tables are related to each other. A sophisticated query application can provide all the options to globally query TraM data 307, with constitutively displayed and dynamically displayed fields.
[0040] The TraM system allows for a user driven dynamic query strategy. It allows users to both determine whether they need to add another layer of specific filters, and choose desired conditions to specify the query targets. All of these options can be determined instantly so that users can adjust their query strategies dynamically to reach the most satisfactory results. The TraM system allows read and write processes to happen in parallel.
[0041] The TraM system allows users to query any predefined fields in a field option list. The field option list can be determined by a TraM database designer after discussions with translational researchers. Each predefined field can support certain data type options, such as keywords, a range of values, or Boolean options. There is no limitation on query field selection; which fields are listed depends only on necessity. The TraM system allows users to decide which operator to use, including AND, OR, and NOT. Users can iterate the query selection process until they are satisfied with the search returns.
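A user-driven query over a predefined field option list can be assembled as parameterized SQL. This is a hedged sketch: the field list, its filter kinds (keyword, range, Boolean), the `person` table, and the `Condition`, `build_where`, and `run_query` names are assumptions. Field names are accepted only if they appear in the option list, values are always passed as parameters, and conditions combine with SQL's usual precedence (AND binds more tightly than OR).

```python
import sqlite3
from dataclasses import dataclass
from typing import Any, List, Tuple

# Hypothetical predefined field option list: field name -> filter kind.
FIELD_OPTIONS = {"diagnosis": "keyword", "age": "range", "consented": "boolean"}


@dataclass
class Condition:
    name: str              # must appear in FIELD_OPTIONS
    value: Any             # str (keyword), (low, high) (range), or bool (boolean)
    negate: bool = False   # wrap this condition in NOT
    join: str = "AND"      # operator joining it to the previous condition


def build_where(conds: List[Condition]) -> Tuple[str, list]:
    """Return a WHERE clause and its parameter list."""
    parts: List[str] = []
    params: list = []
    for i, c in enumerate(conds):
        kind = FIELD_OPTIONS.get(c.name)
        if kind is None:
            raise ValueError(f"{c.name!r} is not in the field option list")
        if c.join not in ("AND", "OR"):
            raise ValueError("join must be 'AND' or 'OR'")
        if kind == "keyword":
            expr = f"{c.name} LIKE ?"
            params.append(f"%{c.value}%")
        elif kind == "range":
            expr = f"{c.name} BETWEEN ? AND ?"
            params.extend(c.value)
        else:  # boolean, stored as 0/1
            expr = f"{c.name} = ?"
            params.append(int(bool(c.value)))
        if c.negate:
            expr = f"NOT ({expr})"
        parts.append(expr if i == 0 else f"{c.join} {expr}")
    return (" ".join(parts) or "1=1"), params


def run_query(conn: sqlite3.Connection, conds: List[Condition]) -> List[str]:
    """Run the assembled query against a hypothetical `person` table."""
    where, params = build_where(conds)
    sql = f"SELECT person_id FROM person WHERE {where} ORDER BY person_id"
    return [row[0] for row in conn.execute(sql, params)]
```

Because each run only rebuilds the WHERE clause, a user can add or remove a filter layer and rerun until the returns are satisfactory.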
[0042] The TraM system allows users to control which fields a query returns and the data output formats. To simplify the logic-driven process, query returns are standardized with fixed and dynamic/selected fields when the first results are displayed. For example, a person ID, ethnicity, age, gender, and primary diagnosis (the reason for a person to be enrolled in the study) can be the constitutive fields displayed each time results are returned in an initial query. Dynamic fields can be the search variables captured through the user interface when a user assembles a query. After the initial results are returned, the program allows the user to see the details of each field. For example, from a public person ID, the user can retrieve all the defined information for that person, such as demographic information and family history (Progeny) records. Additionally, the TraM system allows a user to edit which fields are displayed until the user has obtained the desired results. Once the query results are satisfactory, the TraM system allows users to select data output formats for different purposes, e.g., XML for data exchange, Excel/table for data manipulation, and SAS for statistical analysis.
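The split between constitutive and dynamic result fields, and the choice of output format, can be sketched with the standard `csv` and `xml.etree.ElementTree` modules. The constitutive field list mirrors the example above, but the function names are hypothetical, and Excel and SAS exports are not shown. Fields outside the selected columns (such as a private MRN) are simply not written.

```python
import csv
import io
import xml.etree.ElementTree as ET
from typing import Any, Dict, List

CONSTITUTIVE = ["person_id", "ethnicity", "age", "gender", "primary_diagnosis"]


def result_columns(dynamic: List[str]) -> List[str]:
    """Constitutive fields always come first; dynamic fields follow, without duplicates."""
    return CONSTITUTIVE + [f for f in dynamic if f not in CONSTITUTIVE]


def to_csv(rows: List[Dict[str, Any]], columns: List[str]) -> str:
    """Table output for data manipulation; unselected keys are ignored."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=columns, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_xml(rows: List[Dict[str, Any]], columns: List[str]) -> str:
    """XML output for data exchange: one <record> element per result row."""
    root = ET.Element("results")
    for row in rows:
        rec = ET.SubElement(root, "record")
        for col in columns:
            value = row.get(col)
            ET.SubElement(rec, col).text = "" if value is None else str(value)
    return ET.tostring(root, encoding="unicode")
```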
[0043] In a further aspect, the computer 301 can operate in a networked environment using logical connections to one or more remote computing devices 314a,b,c. The one or more remote computing devices can comprise one or more domain satellite databases. By way of example, a remote computing device can be a personal computer, portable computer, a server, a router, a network computer, a peer device or other common network node, and so on. Logical connections between the computer 301 and a remote computing device 314a,b,c can be made via a local area network (LAN) and a general wide area network (WAN). Such network connections can be through a network adapter 308. A network adapter 308 can be implemented in both wired and wireless environments. Such networking environments are conventional and commonplace in offices, enterprise-wide computer networks, intranets, and the Internet 315.
[0044] The TraM system can utilize various methods for data aggregation. One method occurs through a front-end data curation interface that provides a user with read and write capability to enter and manage data directly in the TraM system. Another method, for automated data aggregation, occurs through a data pipeline established between the computer 301 and one or more of the remote computing devices 314a,b,c over a network such as the Internet 315 through the network adapter 308. In the case of communication between two or more TraM systems, a two-way data pipeline can be established between the TraM systems, allowing data to be retrieved and updated between them.
[0045] Regardless of the data aggregation method used, a data dependency control is used to achieve translational data continuity when data are received from the user or from the disparate satellite databases. Data are often fragmented or discontinuous, which creates a barrier to translational analysis. Data in the workflow are often unrelated to each other, as is the case when data do not come from the same object. A simple distributed search engine that aggregates the data from all individual satellite databases does not overcome this barrier. To overcome it, the TraM system uses a more realistic and practical computational regulatory tool to enforce object-related dataflow continuity. The TraM system maps translational elements across the heterogeneous domain data to assure data continuity for a given research object, such as a consented patient. Additionally, researchers commonly lose track of data for individuals in a particular research domain, and simply admonishing curators to always keep upstream identifiers is not effective. As shown in FIG. 4, the TraM system implements a data dependency mechanism to keep data continuity for the objects in the study, and this mechanism is particularly enforced in transitions between major research objects.
[0046] FIG. 5 shows an example of computational data dependency flow control. This exemplary curation flow chart describes the data dependency logic control that applies when a curator needs to enter records in an Allele Map table 501 in a TraM system. Solid lines show an action required before the next step takes place. Arrows show the direction of logic control. Dashed lines show events without strong data dependency, meaning that the event may or may not happen; the data may or may not be required. Diamonds show decisions that force the curator to complete the required information. Note that a public user takes a completely different route (to query home 502) and sees a different interface. For example, if a curator needs to build an allele map for a person, the curator must fill in all related data entities connected by solid lines in FIG. 5. The TraM system requires the curator to always fill in the necessary upstream data before moving on to downstream records, so that the available data values in the parent table can easily be arranged as an option list when a child entity needs to make an association. For legacy data with no way to patch missing records, TraM allows the curator to fill in "unknown" as an option without breaking data dependency. Therefore, even if TraM cannot completely restore data continuity for retrospective data, it at least assures prospective data continuity and integrity. In addition, the TraM curator interface allows missing data to be patched when they become available; "unknown" fields can be updated with known data, so that data quality and continuity are significantly improved.
[0047] The front-end data curation interface utilized by the TraM system allows for TraM system data entry and management. Translational data curators and other users, such as researchers, can view data integrity and continuity over the entire translational workflow. A statistical overview with a built-in trigger can be provided in the curator interface for each project, so that curators and translational researchers can periodically see a global picture of their data collection and research progress in different areas. The TraM system provides both flexibility and regulatory functions for data entry. For example, demographic surveys of research subjects are indispensable elements in translational research. Because each survey has a mixture of questionnaires defined by individual principal investigators, a plurality of descriptors may be generated for the same concepts. A tree structure, an example of which is shown in FIG. 4, can be used to organize the questionnaires and supports almost unlimited flexibility in the question options for various branches. To minimize the redundancy of questions, dictionary data tables can be pre-deployed by an experienced curator or bioinformatician to control annotation ambiguity and redundancy. Dictionary data describe concepts or official nomenclature used across a domain, such as disease concept names defined by SNOMED or ICD. The TraM system can directly adopt these concepts from public domains. This type of data can be relatively static and without upper-level data dependency; a relationship table can be used to establish the connection between research objects and biomedical concepts. This relationship describes when and how an action is taken, for example, a diagnostic date or an experiment date and the diagnostic methods or experimental methods.
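The upstream-before-downstream rule of FIG. 5 can be approximated by a parent map that is checked on every insert. This is a simplified sketch under stated assumptions: the entity chain (project, person, sample, experiment, allele map) and the `CurationStore` class are hypothetical, and FIG. 5's actual graph may differ. Existing parent keys form the option list for a child, "unknown" is accepted for legacy data, and `patch` later replaces an "unknown" link with a real one.

```python
from typing import Dict, Optional, Set

# Hypothetical parent links (child entity -> required parent entity).
PARENT = {
    "person": "project",
    "sample": "person",
    "experiment": "sample",
    "allele_map": "experiment",
}
UNKNOWN = "unknown"


class CurationStore:
    def __init__(self) -> None:
        # entity -> {record key: parent key (None for root records)}
        self.records: Dict[str, Dict[str, Optional[str]]] = {e: {} for e in ["project", *PARENT]}

    def options_for(self, entity: str) -> Set[str]:
        """Parent keys offered to the curator as the option list, plus 'unknown'."""
        parent = PARENT.get(entity)
        return set(self.records[parent]) | {UNKNOWN} if parent else set()

    def add(self, entity: str, key: str, parent_key: Optional[str] = None) -> None:
        """Refuse a downstream record until its upstream record exists."""
        parent = PARENT.get(entity)
        if parent is not None:
            if parent_key is None:
                raise ValueError(f"{entity} requires a {parent} record first")
            if parent_key != UNKNOWN and parent_key not in self.records[parent]:
                raise ValueError(f"{parent} {parent_key!r} must be entered before {entity}")
        self.records[entity][key] = parent_key

    def patch(self, entity: str, key: str, parent_key: str) -> None:
        """Replace an 'unknown' upstream link once the missing record exists."""
        if self.records[entity].get(key) != UNKNOWN:
            raise ValueError(f"{entity} {key!r} has no 'unknown' link to patch")
        parent = PARENT[entity]
        if parent_key not in self.records[parent]:
            raise ValueError(f"{parent} {parent_key!r} must be entered first")
        self.records[entity][key] = parent_key
```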
[0048] The front-end data curation interface described above can be used to conduct translational data management. A complementary solution is provided herein to interact with satellite databases for consistent and high-throughput data aggregation. Several data aggregation mechanisms, such as web services, are available in the art. However, they face several limitations: translational data stored in satellite databases are currently in a mixture of formats; semantic integration of data concepts has not been applied; satellite database systems do not always reside in a web server environment; and satellite database administrators may resist setting up a web service interface for data export. Web services are therefore only one solution for transferring data from a satellite database to the TraM system, and other means for transferring the data are specifically contemplated. The TraM web service allows for data aggregation from the translational community and relies on content compliant with standards (e.g., caBIG, SNOMED). An example of the overall data aggregation mechanisms is diagrammed in FIG. 6.
[0049] Some satellite databases may have been built with an assortment of technologies, constructed according to a mixture of design philosophies, or developed by people with varying qualifications. The systems can also be under various administrative policies. Sometimes there is no straightforward way to retrieve the data from them, or even to access the data remotely. These satellite databases can be treated on a case-by-case basis. If a satellite database does not reside on a web server, the TraM system can access the database through an application programming interface (API) to retrieve the data. This approach can be used to aggregate large amounts of data from, for example, HIPAA-compliant satellite databases that are not supported by web servers and are under restrictive management policies.
[0050] The TraM system can interact with HIPAA-compliant databases and public databases. Data supply pipelines can connect to HIPAA-compliant satellite databases, since TraM users often have no other efficient way to access these satellite databases, nor are they able to make a connection between the satellite databases. There could be a lag between the updating of satellite databases and that of the TraM system. However, there is no need to build a data supply pipeline to public databases, such as dbSNP and Entrez Gene. Using the Internet through direct links to public data can be sufficient for researchers to view the most updated public information. For example, the TraM database can store public data identifiers rather than store the public data.
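Keeping public identifiers rather than public data means links can be generated on demand. The URL templates below are assumptions based on NCBI's current public web layout and may change; the identifier patterns are simplified checks, not official validation rules, and `public_link` is a hypothetical helper.

```python
import re

# Assumed URL templates for two public sources; verify against NCBI before use.
LINK_TEMPLATES = {
    "dbSNP": "https://www.ncbi.nlm.nih.gov/snp/{id}",
    "EntrezGene": "https://www.ncbi.nlm.nih.gov/gene/{id}",
}
# Simplified shape checks: rs-numbers for dbSNP, integer IDs for Entrez Gene.
ID_PATTERNS = {
    "dbSNP": re.compile(r"rs\d+"),
    "EntrezGene": re.compile(r"\d+"),
}


def public_link(source: str, identifier: str) -> str:
    """Build a direct link from a stored public identifier."""
    if source not in LINK_TEMPLATES:
        raise ValueError(f"unsupported public source: {source}")
    if not ID_PATTERNS[source].fullmatch(identifier):
        raise ValueError(f"{identifier!r} does not look like a {source} identifier")
    return LINK_TEMPLATES[source].format(id=identifier)
```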
[0051] Two basic processes are integrated into TraM data aggregation procedures regardless of the aggregation mechanism used. These processes are streamlined data selection and standardization. Keeping the TraM data coverage streamlined makes data transfer manageable. Streamlined data elements are defined in the TraM system, but the selection of these elements from satellite databases is not straightforward. Data element selection requires familiarity with the satellite database schema, and an understanding of the data descriptors and their definitions. Once these data objects and attributes are obtained, the TraM system can perform data standardization.
[0052] A method for data standardization comprises identifying a mapping relation between source data and an existing TraM data structure, redefining and reconfiguring the data value and concept domain, using consistent terminology to describe the data concept and keeping the metadata record for the definition, and reformatting the value domain expression, converting data from various sources into a consistent format.
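The four standardization steps can be illustrated on a single source row. The source column names (`DOB`, `Sex`, `Dx`), the TraM concept names, and the formatting rules are hypothetical examples: each mapping entry records the mapping relation and the reformatter for the value domain, and `METADATA` keeps the definition of each concept.

```python
from datetime import datetime
from typing import Any, Callable, Dict, Tuple


def _iso_date(v: str) -> str:
    """Reformat an assumed US-style source date to ISO 8601."""
    return datetime.strptime(v, "%m/%d/%Y").date().isoformat()


def _gender(v: str) -> str:
    """Collapse free-text gender codes into one controlled vocabulary."""
    return {"m": "male", "f": "female"}.get(v.strip().lower()[:1], "unknown")


# Mapping relation: source column -> (TraM concept, value reformatter).
MAPPING: Dict[str, Tuple[str, Callable[[Any], Any]]] = {
    "DOB": ("birth_date", _iso_date),
    "Sex": ("gender", _gender),
    "Dx": ("primary_diagnosis", lambda v: v.strip().lower()),
}

# Metadata record kept for each concept definition.
METADATA = {
    "birth_date": "Date of birth, ISO 8601 (YYYY-MM-DD)",
    "gender": "Gender: male | female | unknown",
    "primary_diagnosis": "Diagnosis for which the person was enrolled",
}


def standardize(source_row: Dict[str, Any]) -> Dict[str, Any]:
    out: Dict[str, Any] = {}
    for col, value in source_row.items():
        if col not in MAPPING:
            continue  # no mapping relation: not part of the streamlined data
        concept, reformat = MAPPING[col]
        out[concept] = reformat(value)
    return out
```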
[0053] Some datasets are difficult to aggregate for a variety of reasons. For example, some data are on paper or in Excel spreadsheets, and some are in casually built databases. Here the term "casually" describes databases that are not professionally designed, that lack routine database maintenance, or that are no longer in use but still contain data. Datasets stored under such conditions can be particularly corrupted, since the names of data concepts are often locally invented, related data within or between storage methods are not updated simultaneously, and some records are contradictory. In such cases, TraM system users can either curate the data through the user interface that TraM provides or perform the data standardization procedure described above.
[0054] FIG. 7A shows a set of allele data in a Microsoft Excel spreadsheet before standardization. The value domain of the allele, a data element in the table, was misused as the column names, which should instead express a concept domain. The association of these alleles with a patient was treated as Boolean (shown as a check mark), so allele records arranged this way have no flexibility or scalability. In addition, this piece of data has not been integrated into the translational dataflow, so there is no easy way to connect the allele map to patient diagnosis or treatments. FIG. 7B shows the data after they have been reconfigured, standardized, and deployed in a TraM database. Information can now easily be pulled from any field of interest, starting from the listed fields, as long as they are attributes within the TraM database. A dataflow has been developed from a vulnerable population to an individual's genotype, as in the exemplary dataflow shown in FIG. 1. The results shown in FIGS. 7A and 7B confirm that the schema is able to manage data from the full life cycle of translational research.
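The FIG. 7A to FIG. 7B reconfiguration is essentially a wide-to-long reshape: allele values move out of the column headers into a value column, and each check mark becomes one record. This sketch assumes the spreadsheet rows have already been read into dictionaries; the recognized check-mark symbols, the `gene` attribute, and the function name are assumptions.

```python
from typing import Dict, List

# Cell contents treated as "checked" in the source sheet (assumed).
CHECKED = {"x", "X", "✓", "1", "TRUE", "True", "yes"}


def melt_allele_sheet(rows: List[Dict[str, str]], id_column: str, gene: str) -> List[Dict[str, str]]:
    """Turn one-column-per-allele check marks into (person, gene, allele) records."""
    records: List[Dict[str, str]] = []
    for row in rows:
        person = row[id_column]
        for column, mark in row.items():
            if column == id_column:
                continue
            # The column header is a value of the allele concept, not a concept itself.
            if str(mark).strip() in CHECKED:
                records.append({"person_id": person, "gene": gene, "allele": column})
    return records
```

In the long form, a new allele is just another row rather than a new column, and `person_id` can be joined to diagnosis or treatment tables.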
[0055] The TraM system is an efficient tool that allows researchers to focus on their scientific problems, shielded from computational requirements and, to some degree, independent of informatics support. It achieves this by providing an end-user administration interface to a TraM administrator or any other type of special end user, together with user-controllable utilities that make system and data management efficient and satisfactory.
[0056] The TraM system allows for user authentication. For example, user credentials can be stored in an LDAP (Lightweight Directory Access Protocol) server, which is commonly available in a Unix/Linux server environment. If a server or server system supports more than one database at the same time, the authentication procedure can be shared among them. The validity of a credential can be known only from the return value of a comparison, not from the credential itself, which prevents anyone other than the users themselves from uncovering their credentials. The TraM system also allows for role authorization. Authorization sets permissions for users to access curator pages and/or query pages. This can be implemented, for example, as a static HashMap object holding pairs of a string name and an integer number that represent different levels of role authorization.
[0057] The TraM system can provide a "broker function" that allows different users to access different levels of information, or the same level of information but different kinds of records. For example, the administrator, curator, power user, and public user can be represented by levels 0, 1, 2, and 3, respectively. In general, TraM users are grouped and authorized to work with or view the data within their own translational branches. Under each translational branch, TraM can allow, for example, four types of users: 1) TraM administrator, who can assign other users their roles, with different privileges, in TraM applications, so that TraM administration does not depend on a database administrator; 2) TraM curator, who can read and write TraM data and can see both "private" (before de-identification) and "public" (after de-identification) data. The curator can be responsible for inputting patients' original data identifiers to keep track of the data's origins. Private identifiers, such as medical record numbers, can be HIPAA-protected information; 3) TraM public/regular user, who, after logging in, sees read-only data to conduct data queries. Public users see only de-identified data results; 4) Power user, who is usually a principal investigator or authorized researcher. This type of user, like the public user, has read-only permission, but can view private identifiers through a special ID mapping table, which can facilitate tracking of original records if needed. A curator is also a power user by default. All these authorization procedures and role privilege controls have been implemented and tested, and will continue to be tested as new application modules are delivered. When a user requests permission to enter a particular project page after login, the user's signature can be compared to the page-allowed profile, such as project group, user group, and application group, and the system then decides whether the privilege is granted.
The user profile can be kept in an account table in the TraM database. The account table contains all the information about a user profile, including user role, application type, the project the user belongs to, and the user's basic login information. The user profile (in the account table) functions as the user's signature and provides sufficient information to make authorization decisions. The signature can be kept in a session object, and a user's activities can be recorded, especially in the curation process. In this group of physical tables, the project table is the root table; it connects user accounts and persons. If the project matches and the user's role level is higher than or equal to the required role level, the user is permitted to access the page. Otherwise, the user can be denied access and, for example, presented with a message requesting a higher level of authorization.
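The role map and page check described above can be sketched as follows. The dictionary plays the role of the static HashMap (string name to integer level, with 0 as the most privileged level, per the numbering given earlier). "Higher or equal" role level is interpreted here as equal or greater privilege, i.e. a numerically lower or equal level; that reading is an assumption. `Signature`, `PageProfile`, and `may_access` are hypothetical names. The project check also reproduces the FIG. 8B behavior, where a Head and Neck curator cannot open a Breast Cancer record.

```python
from dataclasses import dataclass
from typing import FrozenSet

# Plays the role of the static HashMap: role name -> integer level.
# Lower numbers are more privileged (0 = administrator, 3 = public user).
ROLE_LEVEL = {"administrator": 0, "curator": 1, "power_user": 2, "public_user": 3}


@dataclass(frozen=True)
class Signature:
    """User profile from the account table, kept in the session object."""
    username: str
    role: str
    projects: FrozenSet[str]


@dataclass(frozen=True)
class PageProfile:
    """What a page requires: its project and the least-privileged role allowed."""
    project: str
    required_role: str


def may_access(user: Signature, page: PageProfile) -> bool:
    # Project privacy: records owned by another project group stay hidden.
    if page.project not in user.projects:
        return False
    # Equal or greater privilege means a numerically lower or equal level.
    return ROLE_LEVEL[user.role] <= ROLE_LEVEL[page.required_role]
```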
[0058] In ongoing research, it is common practice for researchers to keep data private until their work is published. The TraM system provides data privacy control: each translational project user group can decide when and how to share its data with others. When multiple translational projects and their users access the TraM database in parallel, each user is confined both by his or her role in application privileges and by the project to which he or she belongs. For example, lung cancer researchers cannot see the data managed and owned by breast cancer researchers, and vice versa. FIG. 8A illustrates an exemplary Curator Person Module Page. A project name is listed as a variable in the upper right corner; in this case, the curator belongs to the Breast Cancer Project group. The horizontal tabs show the major curation modules, and each has a side-link list for submodules; here, the person module side links are shown. This interface shows that record 123456 exists in the breast cancer group. FIG. 8B illustrates TraM data privacy control. In this case, the curator belongs to a Head and Neck Project; note that the project variable in the upper right corner has changed. When this curator attempted to view record 123456 in the Breast Cancer Project, the TraM system blocked the attempt. Although curators can use identical interfaces, the viewable data differ.
[0059] In another aspect, the user can enter commands and information into the computer 301 via an input device (not shown). Examples of such input devices comprise, but are not limited to, a keyboard, a pointing device (e.g., a "mouse"), a microphone, a joystick, a scanner, tactile input devices such as gloves and other body coverings, and the like. These and other input devices can be connected to the processing unit 303 via a human machine interface 302 that is coupled to the system bus 313, but can be connected by other interface and bus structures, such as a parallel port, game port, an IEEE 1394 port (also known as a FireWire port), a serial port, or a universal serial bus (USB).
[0060] In yet another aspect of the present invention, a display device 311 can also be connected to the system bus 313 via an interface, such as a display adapter 309. It is contemplated that the computer 301 can have more than one display adapter 309 and the computer 301 can have more than one display device 311. For example, a display device can be a monitor, an LCD (Liquid Crystal Display), or a projector. In addition to the display device 311, other output peripheral devices can comprise components such as speakers (not shown) and a printer (not shown) which can be connected to the computer 301 via Input/Output Interface 310.
[0061] For purposes of illustration, application programs and other executable program components such as the operating system 305 are illustrated herein as discrete blocks, although it is recognized that such programs and components reside at various times in different storage components of the computing device 301, and are executed by the data processor(s) of the computer. An implementation of TraM software 306 can be stored on or transmitted across some form of computer readable media. Computer readable media can be any available media that can be accessed by a computer. By way of example and not meant to be limiting, computer readable media can comprise "computer storage media" and "communications media." "Computer storage media" comprise volatile and non-volatile, removable and nonremovable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules, or other data. Exemplary computer storage media comprises, but is not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD- ROM, digital versatile disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by a computer.
[0062] The methods and systems of the present invention can employ Artificial Intelligence (AI) techniques such as machine learning and iterative learning. Examples of such techniques include, but are not limited to, expert systems, case-based reasoning, Bayesian networks, behavior-based AI, neural networks, fuzzy systems, evolutionary computation (e.g., genetic algorithms), swarm intelligence (e.g., ant algorithms), and hybrid intelligent systems (e.g., expert inference rules generated through a neural network or production rules from statistical learning).
II. Methods
[0063] Provided are methods for assembling, managing, and using the continuous translational data system described above. Unless otherwise expressly stated, it is in no way intended that any method set forth herein be construed as requiring that its steps be performed in a specific order. Accordingly, where a method claim does not actually recite an order to be followed by its steps, or it is not otherwise specifically stated in the claims or descriptions that the steps are to be limited to a specific order, it is in no way intended that an order be inferred, in any respect. This holds for any possible non-express basis for interpretation, including: matters of logic with respect to arrangement of steps or operational flow; plain meaning derived from grammatical organization or punctuation; and the number or type of embodiments described in the specification.
[0064] Provided, and illustrated in FIG. 9, are methods for assembling a continuous translational data system, comprising retrieving, from a plurality of databases, data reflecting a plurality of research domains wherein the data have translational elements at 901, standardizing the data at 902, mapping the translational elements between the data at 903, and aggregating the standardized data into a centralized data structure wherein the mapped translational elements allow for continuous translational dataflow at 904.
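Steps 901 to 904 can be strung together in a short pipeline. This is a minimal sketch, not the TraM implementation: each source is a callable that returns raw records, each hypothetical per-source standardizer renames its own identifier to a common `person_id`, and the mapping step is reduced to grouping on that shared identifier.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, str]
Central = Dict[str, Dict[str, List[Record]]]  # person_id -> domain -> records


def assemble(sources: Dict[str, Callable[[], Iterable[Record]]],
             standardizers: Dict[str, Callable[[Record], Record]]) -> Central:
    central: Central = {}
    for domain, fetch in sources.items():
        for raw in fetch():                        # 901: retrieve from a source database
            rec = standardizers[domain](raw)       # 902: standardize names and values
            te = rec["person_id"]                  # 903: translational element shared across domains
            central.setdefault(te, {}).setdefault(domain, []).append(rec)  # 904: aggregate
    return central
```

Grouping every domain's records under one translational element is what lets a query walk from, say, a clinical diagnosis to a tissue sample for the same person.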
[0065] Data reflecting a plurality of research domains can comprise patient demographic data, patient lifestyle data, patient progenitor and progeny data, clinical data, pathology data, tissue bank samples, and laboratory results. Clinical data can include any data generated in a clinic, for example, clinical patient data, clinical diagnosis data, clinical trial data, clinical treatment data, and the like.
[0066] Translational elements (TEs) can comprise the unique identifiers regularly attached to datasets generated in the different domains and stages of a translational workflow.
[0067] The step of retrieving can comprise retrieving data from a satellite database for incorporation into the TraM system, retrieving data from another TraM database, and the like.
[0068] Standardizing the data can comprise identifying a mapping relation between source data and an existing TraM data structure, redefining and reconfiguring the data value and concept domain, using consistent terminology to describe the data concept and keeping the metadata record for the definition, and reformatting the value domain expression, converting data from various sources into a consistent format.
[0069] The TraM system incorporates standardization of data in order to facilitate the translational workflow. Standardized data can comprise data that utilizes unique identifiers to determine which identifiers from an original data source provide datasets representing only distinct data entities. Standardized data can further comprise data based on the common data elements (CDE) defined in the TraM system. The CDEs accomplish this by providing a core set of requisite data elements that are necessary for the successful deposition of complete datasets into the TraM system.
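Checking a dataset against a core set of CDEs before deposition can be as simple as the following. The CDE lists and function names are hypothetical placeholders; real CDEs would come from the TraM definitions.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical core CDE sets per dataset type.
CORE_CDES: Dict[str, Set[str]] = {
    "person": {"person_id", "gender", "birth_date"},
    "sample": {"sample_id", "person_id", "sample_type"},
}


def missing_cdes(dataset_type: str, record: Dict[str, object]) -> List[str]:
    """Core elements that are absent or empty in one record, sorted by name."""
    required = CORE_CDES[dataset_type]
    return sorted(e for e in required if record.get(e) in (None, ""))


def ready_for_deposit(dataset_type: str, records: Iterable[Dict[str, object]]) -> bool:
    """A dataset is complete only if every record carries all core CDEs."""
    return all(not missing_cdes(dataset_type, r) for r in records)
```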
[0070] Mapping the translational elements can comprise determining each identifier from an original data source that represents a distinct data entity. The data can then be further standardized and normalized based on the common data elements (CDE) defined in TraM. The relationships among the distinct data entities (represented by TEs) can be reevaluated according to translational research logic. TEs can be mapped to consistent TraM identifiers through "equal" or "parent-child" relations constructed in the TraM schema based on an entity relation design/diagram (ERD).
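The "equal" and "parent-child" relations can be sketched as two maps: source TEs that name the same entity point to one TraM ID, and parent-child links connect TraM IDs (for example, a person to a sample). `TEMap` and the example IDs are hypothetical. `lineage` walks the child links depth-first, so any source identifier leads to the entity and all of its downstream records.

```python
from collections import defaultdict
from typing import DefaultDict, Dict, List, Set


class TEMap:
    def __init__(self) -> None:
        self.equal: Dict[str, str] = {}                                 # source TE -> TraM ID
        self.children: DefaultDict[str, Set[str]] = defaultdict(set)    # parent -> children

    def map_equal(self, source_te: str, tram_id: str) -> None:
        """Record that a source identifier names the same entity as tram_id."""
        current = self.equal.get(source_te)
        if current is not None and current != tram_id:
            raise ValueError(f"{source_te} is already mapped to {current}")
        self.equal[source_te] = tram_id

    def map_child(self, parent_id: str, child_id: str) -> None:
        """Record a parent-child relation between two TraM IDs."""
        self.children[parent_id].add(child_id)

    def lineage(self, source_te: str) -> List[str]:
        """Depth-first walk from the entity a source TE maps to."""
        seen: List[str] = []
        stack = [self.equal[source_te]]
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.append(node)
            # Push children in reverse sorted order so they are visited in sorted order.
            stack.extend(sorted(self.children.get(node, ()), reverse=True))
        return seen
```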
[0071] Continuous translational dataflow can comprise the contribution of multiple types of research data from various types of research branches to a research workflow, such data encompassing unique research domains and including but not limited to patient demographic data, patient lifestyle data, patient progenitor and progeny data, clinical data, pathology data, tissue bank samples, and laboratory results. Clinical data can include any data generated in a clinic, for example, clinical patient data, clinical diagnosis data, clinical trial data, clinical treatment data, and the like. Such data can be tracked through generations or compared across individual records.
[0072] Also provided, and illustrated in FIG. 10, are methods for providing a continuous translational data system, comprising assembling a data system having mapped translational elements wherein data comprising the data system is received from a first data source and a second data source at 1001 and providing access to the data system for a fee at 1002.
[0073] Because the TraM system is based on an entity relation design/diagram (ERD), TEs are recorded within the system against consistent TraM identifiers through relations such as "equal" or "parent-child"; elements recorded in this way are referred to as mapped translational elements.
[0074] A data source can comprise an Institution that provides data to the TraM system. An Institution can be any type of entity: an ongoing business (for example, IBM or Amgen Inc.), a university or college (for example, the University of Chicago), a foundation (for example, the Diabetes Foundation), a research institute (for example, Scripps in San Diego), or any other organization. The Institution can be public or private, and for-profit or nonprofit. A non data source can comprise an Institution that does not provide data to the centralized TraM database but uses the TraM system.
[0075] The step of assembling can comprise aggregating data from a plurality of satellite databases, standardizing the aggregated data and mapping translational elements from the standardized aggregated data. Assembling a data system can further comprise owning the mapped translational elements. Providing access to the data system can comprise licensing use of the mapped translational elements.
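The assembling step in [0075] (aggregate, standardize, map) can be sketched as a single function, assuming each satellite database is exposed as a list of dictionaries and that the standardization and identifier-mapping rules are supplied as callables. The `assemble` function and the example data are hypothetical; recording each record's `source` is an added assumption that makes per-source access and fee rules possible.

```python
from typing import Callable

Record = dict


def assemble(
    satellites: dict[str, list[Record]],
    standardize: Callable[[Record], Record],
    to_tram_id: Callable[[str, Record], str],
) -> dict[str, list[Record]]:
    """Aggregate satellite records into a central store keyed by TraM identifier."""
    central: dict[str, list[Record]] = {}
    for source, records in satellites.items():
        for raw in records:
            rec = standardize(dict(raw))   # copy, then standardize
            rec["source"] = source         # keep provenance for access and fee rules
            central.setdefault(to_tram_id(source, rec), []).append(rec)
    return central


# Hypothetical example: a clinic and a tissue bank describe the same patient.
id_map = {("clinic", "001"): "P1", ("tissue_bank", "TB9"): "P1"}
central = assemble(
    {"clinic": [{"mrn": "001", "sex": "female"}],
     "tissue_bank": [{"donor": "TB9", "sex": "F"}]},
    standardize=lambda r: {**r, "sex": r["sex"][0].upper()},
    to_tram_id=lambda src, r: id_map[(src, r.get("mrn") or r.get("donor"))],
)
```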
[0076] A fee can be any form of compensation, for example a monetary payment, securities such as stock, or a cross license. A flat fee is a fee whose amount is not determined by any other factor.
[0077] Providing access can comprise allowing an Institution to contribute, manage, edit, query, and/or retrieve data to/from the data system either locally, remotely, or both.
[0078] Providing access for a fee can comprise providing access to the data system to the first data source for a fee based on the quantity of data received from the first data source, and providing access to the data system to the second data source for a flat fee.
[0079] Providing access to the data system for a fee can further comprise providing access to the data system to the first data source for a fee inversely proportional to the quantity of data received from the first data source.
[0080] Providing access to the data system for a fee can further comprise providing access to the data system to a non data source for a flat fee. Providing access to the data system for a fee can further comprise providing access to the data system for a usage based fee.
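The fee variants in [0078] to [0080] can be written as simple formulas. With a hypothetical scale constant k and a quantity q of contributed data, a fee inversely proportional to contribution is k / q; a flat fee is a constant; and a usage-based fee is a per-use rate times the number of uses. All amounts below, and the choice of records and queries as the units of quantity and usage, are placeholder assumptions; the document specifies no values.

```python
# Placeholder rates: the document names no amounts or units.
K_CONTRIBUTOR = 10_000.0   # scale constant k in fee = k / q
FLAT_FEE = 5_000.0
PER_QUERY_FEE = 2.0


def contributor_fee(records_contributed: int, k: float = K_CONTRIBUTOR) -> float:
    """Fee inversely proportional to the quantity q of data contributed: k / q."""
    if records_contributed <= 0:
        raise ValueError("a contributing data source must supply at least one record")
    return k / records_contributed


def access_fee(kind: str, records_contributed: int = 0, queries: int = 0) -> float:
    """Select a fee model: 'contributor', 'flat' (e.g. non data sources), or 'usage'."""
    if kind == "contributor":
        return contributor_fee(records_contributed)
    if kind == "flat":
        return FLAT_FEE
    if kind == "usage":
        return PER_QUERY_FEE * queries
    raise ValueError(f"unknown fee model: {kind}")
```

With these placeholder rates, a source contributing 1,000 records pays 10.0 while one contributing 100 records pays 100.0, so larger contributors pay less.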
[0081] The methods can further comprise assembling a data system having mapped translational elements wherein data comprising the data system is received from a third data source, and providing the first data source, for a fee, with access only to the data in the data system received from the third data source.
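Restricting the first data source to the data received from the third data source, as in [0081], can be sketched as a per-record filter on provenance. The sketch assumes every centralized record carries a `source` field and that purchased access is stored in a hypothetical `grants` mapping from requester to permitted sources.

```python
def accessible_records(
    central: dict[str, list[dict]],
    grants: dict[str, set[str]],
    requester: str,
) -> dict[str, list[dict]]:
    """Return only the records whose contributing source the requester paid to see."""
    allowed = grants.get(requester, set())
    visible: dict[str, list[dict]] = {}
    for tram_id, records in central.items():
        kept = [r for r in records if r["source"] in allowed]
        if kept:
            visible[tram_id] = kept
    return visible
```

Filtering per record, rather than per object, lets one object carry data from several sources while each requester sees only the portions it has licensed.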
[0082] Also provided, and illustrated in FIG. 11, are methods for querying a continuous translational data system, comprising identifying a targeted object at 1101, accessing a data system wherein the data system comprises data having mapped translational elements at 1102, querying the data system at 1103, receiving results associated with the query wherein the results reflect a continuous translational dataflow at 1104, and displaying the received results at 1105.
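The query method of FIG. 11 (steps 1101 to 1105) can be sketched over an in-memory store keyed by object identifier. The domain names, their ordering in `DOMAIN_ORDER`, and the record layout are assumptions; ordering results by domain is one simple way to present a continuous translational dataflow for a targeted object.

```python
from typing import Optional

# Hypothetical ordering of research domains along the translational workflow.
DOMAIN_ORDER = ["demographic", "lifestyle", "clinical", "pathology", "tissue_bank", "laboratory"]


def query(central: dict[str, list[dict]], target_id: str,
          domains: Optional[set[str]] = None) -> list[dict]:
    """Steps 1102 to 1104: access the store, query it, and receive ordered results."""
    records = central.get(target_id, [])          # target_id identifies the object (1101)
    if domains is not None:
        records = [r for r in records if r["domain"] in domains]
    return sorted(records, key=lambda r: DOMAIN_ORDER.index(r["domain"]))


def display(target_id: str, results: list[dict]) -> str:
    """Step 1105: render the received results as text."""
    lines = [f"Results for {target_id}:"]
    lines += [f"  {r['domain']:<12} {r['value']}" for r in results]
    return "\n".join(lines)
```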
[0083] An object can comprise, for example, an animal, a human, biospecimens derived from animals or humans, bio-samples processed from the biospecimens, and the like, from which demographic, genotypic, phenotypic, progeny-related, progenitor-related and other information is obtained. A targeted object can comprise an object that is the focus of a query.
[0084] Because the TraM system is based on an entity relation design/diagram (ERD), TEs are recorded within the system against consistent TraM identifiers through relations such as "equal" or "parent-child"; elements recorded in this way are referred to as mapped translational elements.
[0085] The mapped translational elements can represent mapped patient demographic data, patient lifestyle data, patient progenitor and progeny data, clinical data, pathology data, tissue bank samples, and laboratory results. Clinical data can include any data generated in a clinic, for example, clinical patient data, clinical diagnosis data, clinical trial data, clinical treatment data, and the like. For example, the mapped translational elements can represent objects sharing genetic data with the targeted object, or objects sharing at least one demographic attribute, lifestyle attribute, progenitor and progeny attribute, diagnostic result, clinical trial result, pathology attribute, tissue bank sample attribute, or laboratory result with the targeted object, and the like.
[0086] Demographic attributes can comprise gender, age, race, weight, height, and the like.
[0087] Lifestyle attributes can comprise food consumption, alcohol consumption, medicine consumption, weight, and the like.
[0088] Querying the data system can further comprise using a demographic parameter.
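Finding objects that share attributes with a targeted object, optionally restricted by a demographic parameter ([0085] to [0088]), can be sketched as follows. The attribute names, the exact-equality matching rule, and the `related_objects` function are hypothetical; a real system would likely need domain-specific matching (for example, genotype comparison) rather than simple equality.

```python
from typing import Optional


def related_objects(
    target_id: str,
    objects: dict[str, dict[str, str]],
    keys: set[str],
    demographic: Optional[dict[str, str]] = None,
    min_shared: int = 1,
) -> dict[str, set[str]]:
    """Objects sharing at least `min_shared` attributes in `keys` with the target.

    `demographic` is an optional filter, e.g. {"gender": "F"}, applied to
    candidate objects before attribute matching.
    """
    target = objects[target_id]
    demographic = demographic or {}
    matches: dict[str, set[str]] = {}
    for oid, attrs in objects.items():
        if oid == target_id:
            continue
        if any(attrs.get(k) != v for k, v in demographic.items()):
            continue
        shared = {k for k in keys if k in target and attrs.get(k) == target[k]}
        if len(shared) >= min_shared:
            matches[oid] = shared
    return matches
```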
[0089] The methods can further comprise, based on the received results, determining the targeted object's susceptibility for a condition, determining a treatment course for a condition of interest of the targeted object, and determining a relationship between targeted objects. Such determinations are based on the application of translational logic, which governs the meaningful connections that can be established among a series of research processes conducted in distinct and seemingly disparate research domains.
[0090] A condition of interest can comprise susceptibility for a disease state; susceptibility for toxicity, a normal response, or an enhanced response to a given treatment course; or a disease state.
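The document does not say how susceptibility is determined from the received results. As one simple, hedged illustration, the sketch below computes the frequency of a condition among related objects and compares it with the frequency across all objects. This is a descriptive proxy assumed for illustration, not a validated risk model or the method of the document; `condition_rate` and `relative_rate` are hypothetical names.

```python
from typing import Iterable, Optional


def condition_rate(ids: Iterable[str], conditions: dict[str, set[str]],
                   condition: str) -> Optional[float]:
    """Fraction of the given objects with the condition recorded (None if no objects)."""
    ids = list(ids)
    if not ids:
        return None
    affected = sum(1 for oid in ids if condition in conditions.get(oid, set()))
    return affected / len(ids)


def relative_rate(related_ids: Iterable[str], all_ids: Iterable[str],
                  conditions: dict[str, set[str]], condition: str) -> Optional[float]:
    """Condition frequency among related objects divided by the overall frequency."""
    related = condition_rate(related_ids, conditions, condition)
    baseline = condition_rate(all_ids, conditions, condition)
    if related is None or not baseline:
        return None
    return related / baseline
```

A relative rate above 1 only indicates that the condition is more common among related objects in the stored data; it does not by itself establish susceptibility.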
[0091] While this invention has been described in connection with preferred embodiments and specific examples, it is not intended that the scope be limited to the particular embodiments set forth, as the embodiments herein are intended in all respects to be illustrative rather than restrictive.
[0092] It will be apparent to those skilled in the art that various modifications and variations can be made in the present invention without departing from the scope or spirit of the invention. Other embodiments will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit being indicated by the following claims.

Claims
1. A method for assembling a continuous translational data system, comprising: retrieving, from a plurality of satellite databases, data reflecting a plurality of research domains wherein the data have translational elements; standardizing the data; mapping the translational elements between the data; and aggregating the standardized data into a centralized data structure wherein the mapped translational elements allow for continuous translational dataflow.
2. A method for providing a continuous translational data system, comprising: assembling a data system having mapped translational elements wherein data comprising the data system is received from a first data source and a second data source; and providing access to the data system for a fee.
3. The method of claim 2, wherein assembling a data system further comprises owning the mapped translational elements.
4. The method of claim 3, wherein providing access to the data system comprises licensing use of the mapped translational elements.
5. The method of claim 2, wherein providing access to the data system for a fee further comprises: providing access to the data system to the first data source for a fee inversely proportional to the quantity of data received from the first data source.
6. The method of claim 2, wherein providing access to the data system for a fee further comprises providing access to the data system to a non data source for a flat fee.
7. The method of claim 2, wherein providing access to the data system for a fee further comprises providing access to the data system for a usage based fee.
8. The method of claim 2, further comprising: assembling a data system having mapped translational elements wherein data comprising the data system is received from a third data source; and providing access only to the data in the data system received from the third data source to the first data source for a fee.
9. A method for querying a continuous translational data system, comprising: identifying a targeted object; accessing a data system wherein the data system comprises data having mapped translational elements; querying the data system; receiving results associated with the query wherein the results reflect a continuous translational dataflow; and displaying the received results.
10. The method of claim 9, further comprising: wherein the mapped translational elements represent objects sharing genetic data with the targeted object; and determining the targeted object's susceptibility for a condition based on the received results.
11. The method of claim 10, further comprising: wherein the mapped translational elements represent objects sharing at least one demographic attribute with the targeted object; and wherein querying the data system comprises a demographic parameter.
12. The method of claim 9, further comprising: wherein the mapped translational elements represent objects sharing genetic data with the targeted object; and determining a treatment course for a condition of interest of the targeted object based on the received results.
13. The method of claim 12, further comprising: wherein the mapped translational elements represent objects sharing at least one demographic attribute with the targeted object; and wherein querying the data system comprises a demographic parameter.
14. The method of claim 9, further comprising: wherein the mapped translational elements represent objects sharing genetic data with the targeted object; and determining a relationship between the targeted object and the objects sharing genetic data with the targeted object based on the received results.
15. The method of claim 14, further comprising: wherein the mapped translational elements represent objects sharing at least one demographic attribute with the targeted object; and wherein querying the data system comprises a demographic parameter.
16. The method of claim 9, further comprising: wherein the mapped translational elements represent objects sharing at least one demographic attribute with the targeted object; wherein querying the data system comprises a demographic parameter; and determining the targeted object's susceptibility for a condition based on the received results.
17. The method of claim 16, wherein the mapped translational elements represent objects sharing genetic data with the targeted object.
18. The method of claim 9, further comprising: wherein the mapped translational elements represent objects sharing at least one demographic attribute with the targeted object; wherein querying the data system comprises a demographic parameter; and determining a treatment course for a condition of interest of the targeted object based on the received results.
19. The method of claim 18, wherein the mapped translational elements represent objects sharing genetic data with the targeted object.
20. The method of claim 9, further comprising: wherein the mapped translational elements represent objects sharing at least one demographic attribute with the targeted object; wherein querying the data system comprises a demographic parameter; and determining a relationship between the demographic parameter and the condition of interest based on the received results.
21. The method of claim 20, wherein the mapped translational elements represent objects sharing genetic data with the targeted object.
PCT/US2008/053279 2007-02-07 2008-02-07 Translational data mart WO2008098106A2 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US88856607P 2007-02-07 2007-02-07
US60/888,566 2007-02-07

Publications (3)

Publication Number Publication Date
WO2008098106A2 true WO2008098106A2 (en) 2008-08-14
WO2008098106A3 WO2008098106A3 (en) 2008-10-16
WO2008098106A9 WO2008098106A9 (en) 2008-12-04

Family

ID=39682408

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2008/053279 WO2008098106A2 (en) 2007-02-07 2008-02-07 Translational data mart

Country Status (1)

Country Link
WO (1) WO2008098106A2 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9251215B2 (en) 2011-01-14 2016-02-02 Hewlett Packard Enterprise Development Lp Data staging for results of analytics
CN117995332A (en) * 2024-04-07 2024-05-07 北方健康医疗大数据科技有限公司 Value range code standardized conversion system and method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6035403A (en) * 1996-09-11 2000-03-07 Hush, Inc. Biometric based method for software distribution
US20020065758A1 (en) * 2000-03-02 2002-05-30 Henley Julian L. Method and system for provision and acquisition of medical services and products
US20030195770A1 (en) * 2002-04-16 2003-10-16 Yokogawa Electric Corporation Medical data processing system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG E. ET AL.: 'A strategy for detection of known and unknown SNP using a minimum number of oligonucleotides applicable in the clinical settings' JOURNAL OF TRANSLATIONAL MEDICINE vol. 1, 2003, pages 1 - 14, XP021009805 *

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 08729257

Country of ref document: EP

Kind code of ref document: A2

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 08729257

Country of ref document: EP

Kind code of ref document: A2