EP2506160A1 - Unified data structures for scientific data information system - Google Patents
Unified data structures for scientific data information system
- Publication number
- EP2506160A1 (application EP11002765A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- data
- unified
- data structure
- unified data
- content
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
- G06F16/256—Integrating or interfacing systems involving database management systems in federated or virtual databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/211—Schema design and management
- G06F16/212—Schema design and management with details for data modelling support
Definitions
- This disclosure relates to scientific data information systems, and in particular to unified data structures for scientific data information systems.
- GxP regulatory/e.g., GxP
- product e.g., biologic/pharmaceutical product
- GxP compliance requires a system of administrative and information access controls, as well as audit trails of user activities and record alterations.
- the invention provides a computer-implemented method that includes storing a plurality of unified data structures within a database of a scientific data information system.
- the unified data structures can include information corresponding to different data types.
- Implementations may include one or more of the following features.
- each unified data structure includes header information, data streams, structured XML based data and signatures.
- each unified data structure includes primary data elements and supporting data elements.
- the primary data elements can include an admin data element comprising basic fields, an instance data element comprising data specific to the unified data structure in XML form, a stream element comprising maintenance information concerning stored data streams, stream data comprising content of data streams, and a signature data element comprising electronic/digital signatures applied on the unified data structure.
- the supporting data elements can include a catalog comprising redundant searchable data extracted from the instance data element, comments comprising file attachments, hyperlinks to external data sources and/or to other ones of the unified data structures, search keywords, and thumbnails comprising renditions of the unified data structure content or a default data format specific image to be used in hit lists or other areas to provide a first visual impression of the unified data structure content.
- the different data types can be analyses, raw data acquired from laboratory instruments, reports, and/or methods (e.g., analysis methods, signature methods, etc.).
- the unified data structures are generic and do not contain information about any applications which work on the associated data.
- the data structure can include primary data elements and supporting data elements.
- FIG. 1 illustrates a scientific data information system 10 that provides for acquisition of chromatography and mass spectrometry data (within a single software environment), instrument control, data processing and mining, and reporting, with GxP laboratory compatibility that allows for deployment throughout a science-driven organization.
- the scientific data information system includes an instrument systems server (ISS) 20, an application server 30, a database 40, and client software 50.
- the ISS 20 is in communication with laboratory instruments 60a-c and includes computer-executable instructions for handling instrument control and data acquisition.
- the laboratory instruments 60a-c can include, for example, chromatographic instruments 60a, detectors 60b (e.g., UV detectors), and mass spectrometers 60c.
- Exemplary chromatographic instruments include ACQUITY UPLC® H-Class Bio System, available from Waters Corporation of Milford, Massachusetts.
- Exemplary detectors include the ACQUITY UPLC® Tunable UV (TUV) Detector, available from Waters Corporation.
- Exemplary mass spectrometers include the Xevo® G2 Tof mass spectrometer, available from Waters Corporation.
- the ISS 20 performs two functions: (i) system coordination, and (ii) data buffering.
- the ISS 20 can coordinate operation of the instruments based on information (e.g., instrument method and sample set information) received from the application server 30, which allows the ISS 20 to set up the instruments and start an acquisition.
- Instrument methods include instructions for controlling operating parameters of an attached instrument.
- the ISS 20 also provides status information back to the application server 30 during a run.
- the acquired data (e.g., chromatographic and/or mass spectrometry (MS) data) is received by the ISS 20 from the laboratory instruments 60a-c in native instrument format.
- the data is then translated by the ISS 20 to a unified data file format. Converted data is stored by the ISS 20 in a secure file buffer, and a rolling SHA-1 checksum, updated with each data packet, ensures fidelity and security of the data.
- a final checksum is calculated upon acquisition completion and the raw data file is delivered to the database 40 where it is stored and locked.
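- As an illustrative sketch only (not the system's actual implementation), the rolling checksum described above could be computed with Python's standard `hashlib` module. The `RollingChecksum` class and the packet contents are hypothetical names introduced for this example.

```python
import hashlib


class RollingChecksum:
    """Hypothetical helper: a SHA-1 digest updated with each data packet."""

    def __init__(self) -> None:
        self._sha1 = hashlib.sha1()
        self.packets = 0

    def add_packet(self, packet: bytes) -> str:
        """Fold one packet into the checksum and return the running digest."""
        self._sha1.update(packet)
        self.packets += 1
        # copy() reads an intermediate digest without disturbing the running state
        return self._sha1.copy().hexdigest()

    def final(self) -> str:
        """Return the checksum over all packets received so far."""
        return self._sha1.hexdigest()


# Simulated acquisition: three packets arriving from an instrument (made-up data)
packets = [b"scan-0001", b"scan-0002", b"scan-0003"]
rolling = RollingChecksum()
for packet in packets:
    rolling.add_packet(packet)

final_checksum = rolling.final()
# The incremental digest equals the digest of the complete buffered file
matches_whole_file = final_checksum == hashlib.sha1(b"".join(packets)).hexdigest()
```

Because SHA-1 is computed incrementally, the final digest equals the digest of the complete buffered file, so a stored raw data file could later be re-verified in a single pass.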
- the application server 30 is in communication with the ISS 20, the database 40, and the client software 50.
- the application server 30 is a collection of software that handles the business logic (i.e., the functions that the associated software performs on the data).
- the application server 30 retrieves data (from the database 40), processes and presents data to a graphical user interface 70, processes input data (e.g., from the graphical user interface 70), and sends method (e.g., instrument method) and sample set information to the ISS 20 to set up the instruments and start an acquisition.
- the application server 30 and the ISS 20 communicate on a host of configuration and setup issues, such as downloading instrument drivers to the ISS 20, configuring instrument systems, etc. This is driven from the application server 30 to the ISS 20.
- the application server 30 includes computer-executable instructions for providing administrative and information access controls, as well as for providing audit trails of user activities and record alterations in accordance with GxP compliance requirements.
- Each unique user has tunable information access (method, data, results, etc.) limitations and activity restrictions dictated by their assigned roles. Users can include administrators, managers, analysts, and principal scientists.
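- A minimal sketch of role-based access checks combined with an audit trail, assuming a simple role-to-permission mapping. The permission names, the `AuditTrail` class, and the `attempt` function are hypothetical and are not taken from the disclosure; only the role names come from the text above.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Assumed permission sets per role; the source names the roles but not their rights
ROLE_PERMISSIONS = {
    "administrator": {"read", "edit", "sign", "manage_users"},
    "manager": {"read", "edit", "sign"},
    "principal scientist": {"read", "edit", "sign"},
    "analyst": {"read", "edit"},
}


@dataclass
class AuditTrail:
    """Append-only record of user activities (hypothetical structure)."""

    entries: list = field(default_factory=list)

    def record(self, user: str, action: str, allowed: bool) -> None:
        timestamp = datetime.now(timezone.utc)
        self.entries.append((timestamp, user, action, allowed))


def is_allowed(role: str, action: str) -> bool:
    """Return True if the role's permission set contains the action."""
    return action in ROLE_PERMISSIONS.get(role, set())


def attempt(user: str, role: str, action: str, trail: AuditTrail) -> bool:
    """Check the permission and log the attempt, whether or not it succeeds."""
    allowed = is_allowed(role, action)
    trail.record(user, action, allowed)
    return allowed


trail = AuditTrail()
edit_ok = attempt("alice", "analyst", "edit", trail)   # permitted for analysts here
sign_ok = attempt("alice", "analyst", "sign", trail)   # denied, but still logged
```

Logging denied attempts as well as permitted ones is a design choice made for this sketch; it keeps the trail complete regardless of the outcome of the check.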
- the application server 30 also includes computer-executable instructions for performing data processing, e.g., to reduce the raw data acquired from the laboratory instruments 60a-c into usable reports.
- Data (e.g., chromatographic data, spectral (MS) data, and bioinformatics) can be processed by the application server 30 while acquisition is ongoing if processing parameters are specified within a method (e.g., an analysis method).
- Analysis methods can describe expected system hardware configurations, separation and MS parameters, spectral processing and bioinformatics analysis tasks, and links to automated reporting templates, which can be used to automate production of standardized reports.
- a copy of the corresponding analysis method can be stored as part of each results set.
- the application server 30 also relays information, e.g., method and sample set information, to the ISS 20, which then controls the instruments 60a-c according to the information provided.
- the application server 30 can also include a search engine, which can allow users to search the contents of the database 40.
- the database 40 is a relational database. Relational databases enable real-time acquisition, processing, and management of large volumes of data from multiple sources. This can allow for simultaneous processing, review, and acquisition of data and parallel data acquisition from multiple instrument systems. Suitable relational databases include the Oracle® 11gR2 relational database, available from Oracle Corporation of Redwood Shores, California. Information stored in the database 40 can include many different data types (e.g., analyses, raw data, reports, historical data, methods, etc.) which may be stored in a unified data structure (also referred to as a "Content Item"). The use of a unified data structure can help to enable all laboratory functions to work with a common backbone of analytical information. This data standardization can also help to increase the exchange of information within an organization (e.g., between product development and product manufacturing), and, in some cases, even globally (e.g., with third-party partners).
- the client software 50 includes computer-executable instructions (e.g., a Windows Presentation Foundation (or WPF) piece of code) for providing the graphical user interface 70 which displays data and allows the user to interact with the data (via the application server 30). Users can use the graphical user interface to select/define methods (e.g., instrument methods, analysis methods, capture methods, signature methods), to process data, and to electronically review and sign reports. When the user decides to process data, instructions are sent to the application server 30, where the processing takes place.
- the client software 50 also includes a print driver 80 for performing print capture.
- the software generates a print file and moves the file through the application server 30 and stores it within the database 40.
- the print capture feature can be used for bringing auxiliary information into the system.
- the client software 50 can also include a browser that can communicate with the search engine of the application server 30 to allow the user to perform text searches for data within the database 40.
- machine-readable medium and computer-readable medium refer to any computer program product, apparatus and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal.
- machine-readable signal refers to any signal used to provide machine instructions and/or data to a programmable processor.
- the scientific data information system 10 can be implemented in a variety of configurations, from an individual workstation model, to a network model, such as a laboratory-based workgroup or networked enterprise environment.
- in a workstation model, one computer handles both the low-level (e.g., database management and instrument control) and high-level (e.g., data processing and user interface) functions.
- FIG. 2 illustrates a workstation 100 in which the client software 50, the application server 30, the ISS 20, and the database 40 all reside on a single computer 110.
- a suitable computer 110 for the workstation 100 is a Lenovo D20 Workstation configured with dual Xeon E5504 2.0 GHz processors, 8 GB RAM, and an Nvidia Quadro FX 18000 graphics card, running the Windows 7 64-bit operating system.
- the workstation 100 can also include a keyboard and a pointing device (e.g., a mouse or a trackball) for receiving user input, and a display device for displaying the graphical user interface 70.
- the workstation 100 can be physically located next to a laboratory instrument 120, such as a liquid chromatography (LC)/mass spectrometry (MS) system.
- FIG. 3 illustrates a network 200 that includes an information system computer 210, an application server computer 220, a database management computer 230, and one or more client PCs 240a, 240b on which the ISS 20, the application server 30, the database 40, and the client software 50 reside, respectively.
- the application server and the database may reside on a common computer.
- Each of the network computers can include a processor for processing instructions (e.g., stored in memory or on a storage device) for execution within the corresponding computer; a memory (e.g., volatile memory, non-volatile memory, a magnetic disk, an optical disk, etc.) for storing information within the corresponding computer; and a storage device (e.g., a floppy disk device, a hard disk device, an optical disk device, or a tape device, a flash memory, etc.) for providing mass storage for the corresponding computer.
- data is stored within the database 40 in a unified data structure ("Content Item").
- many different data types (e.g., analyses, raw data, reports, methods, etc.) can be stored and managed in a generic way.
- a potentially unlimited number of data types can be supported without the need for type-specific functionality.
- the "content item" data structure is generic and does not contain information about the application-specific use of the maintained data.
- the content item is a composed data object and contains some header information, streams (unstructured data/files), structured XML based data and signatures. Text and XML data of the content item can be stored as Unicode data.
- FIG. 4 illustrates a high level overview of a content item 300.
- Each content item 300 includes primary data elements 310 and supporting data elements 320.
- the primary data elements 310 include an admin data element 311, an instance data element 312, a stream element 313, stream data 314, and a signature data element 315.
- the admin data element 311 contains basic fields that are available for all items. "Fields" refer to metadata that is used to describe or provide information about the corresponding content item. The fields can include standard fields for every content item, which cannot be changed by the user, and custom fields, which are defined by and can be changed by the user.
- the instance data element 312 contains the item specific data in XML form (structured according to a specific XML schema).
- the stream element 313 contains maintenance information about the stored streams (system internal data such as StreamID, message digest, size, etc.).
- the stream data 314 contains the content (file) of the dependent streams.
- the signature data element 315 contains the electronic/digital signatures applied on the content item.
- the supporting data elements 320 include a catalog 321, comments 322, links 323, keywords 324, and thumbnails 325.
- the catalog 321 contains redundant searchable data extracted from the instance data element 312.
- the comments 322 contain file attachments.
- the links 323 provide hyperlinks to external data sources and/or to other content items.
- the keywords 324 provide the ability to add keywords to the content item for search.
- the thumbnails 325 contain renditions of the item content or a default data format specific image to be used in hit lists or other areas to give the user a first visual impression of the item content.
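- The content item layout of FIG. 4 can be sketched as plain Python data classes. This is a hedged illustration, not the patented implementation: the class and field names, the use of SHA-1 for the stream message digest, and the rule that the catalog is built from the text of leaf XML elements are all assumptions made for the example.

```python
import hashlib
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Stream:
    """One stored stream: maintenance info (313) derived from its content (314)."""

    stream_id: str
    content: bytes

    @property
    def digest(self) -> str:
        # Assumption: SHA-1 serves as the message digest
        return hashlib.sha1(self.content).hexdigest()

    @property
    def size(self) -> int:
        return len(self.content)


@dataclass
class ContentItem:
    # Primary data elements (310)
    admin: dict                                       # 311: standard and custom fields
    instance_xml: str                                 # 312: item-specific XML data
    streams: list = field(default_factory=list)       # 313/314
    signatures: list = field(default_factory=list)    # 315
    # Supporting data elements (320)
    catalog: dict = field(default_factory=dict)       # 321
    comments: list = field(default_factory=list)      # 322
    links: list = field(default_factory=list)         # 323
    keywords: list = field(default_factory=list)      # 324
    thumbnails: list = field(default_factory=list)    # 325

    def rebuild_catalog(self) -> dict:
        """Copy searchable text out of the instance XML into the catalog."""
        root = ET.fromstring(self.instance_xml)
        self.catalog = {
            element.tag: element.text.strip()
            for element in root.iter()
            if element.text and element.text.strip()
        }
        return self.catalog


item = ContentItem(
    admin={"name": "Example method", "type": "analysis method"},
    instance_xml="<method><name>Peptide map</name><column>C18</column></method>",
    streams=[Stream("S1", b"raw bytes")],
    keywords=["peptide"],
)
item.rebuild_catalog()
```

The catalog duplicates data that already lives in the instance XML, which matches the description of it as redundant searchable data; in this sketch it can be regenerated from the instance data at any time.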
Landscapes
- Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP11002765A EP2506160A1 (fr) | 2011-04-01 | 2011-04-01 | Unified data structures for scientific data information system |
US13/435,200 US20120254872A1 (en) | 2011-04-01 | 2012-03-30 | Content Items for Scientific Data Information Systems |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP11002765A EP2506160A1 (fr) | 2011-04-01 | 2011-04-01 | Unified data structures for scientific data information system |
Publications (1)
Publication Number | Publication Date |
---|---|
EP2506160A1 (fr) | 2012-10-03 |
Family
ID=44511686
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP11002765A Withdrawn EP2506160A1 (fr) | 2011-04-01 | 2011-04-01 | Unified data structures for scientific data information system |
Country Status (2)
Country | Link |
---|---|
US (1) | US20120254872A1 (fr) |
EP (1) | EP2506160A1 (fr) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170300641A1 (en) * | 2014-09-15 | 2017-10-19 | Leica Biosystems Melbourne Pty Ltd | Instrument management system |
JP7059752B2 (ja) * | 2018-03-29 | 2022-04-26 | ブラザー工業株式会社 | Application program |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB9810574D0 (en) * | 1998-05-18 | 1998-07-15 | Thermo Bio Analysis Corp | Apparatus and method for monitoring and controlling laboratory information and/or instruments |
GB2381340A (en) * | 2001-10-27 | 2003-04-30 | Hewlett Packard Co | Document generation in a distributed information network |
US20050275566A1 (en) * | 2004-06-14 | 2005-12-15 | Nokia Corporation | System and method for transferring content |
US7890285B2 (en) * | 2005-04-29 | 2011-02-15 | Agilent Technologies, Inc. | Scalable integrated tool for compliance testing |
US8843482B2 (en) * | 2005-10-28 | 2014-09-23 | Telecom Italia S.P.A. | Method of providing selected content items to a user |
- 2011-04-01: EP application EP11002765A, published as EP2506160A1 (fr), status: withdrawn (not active)
- 2012-03-30: US application US13/435,200, published as US20120254872A1 (en), status: abandoned (not active)
Non-Patent Citations (3)
Title |
---|
ELMASRI ET AL: "Fundamentals of Database Systems Fourth Edition", FUNDAMENTALS OF DATABASE SYSTEMS, XX, XX, no. ed.4, 1 September 2004 (2004-09-01), pages 25 - 45,207, XP002333626 * |
FLORESCU D ET AL: "Storing and querying XML data using an RDBMS", QUARTERLY BULLETIN OF THE COMPUTER SOCIETY OF THE IEEE TECHNICAL COMMITTEE ON DATA ENGINEERING, THE COMMITTEE, WASHINGTON, DC, US, vol. 22, no. 3, 1 September 1999 (1999-09-01), pages 27 - 34, XP002274305, ISSN: 1053-1238 * |
PARDEDE E ET AL: "New sql standard for object-relational database applications", STANDARDIZATION AND INNOVATION IN INFORMATION TECHNOLOGY, 2003. THE 3R D CONFERENCE ON OCT. 22-24, 2003, PISCATAWAY, NJ, USA,IEEE, 22 October 2003 (2003-10-22), pages 191 - 203, XP010672995, ISBN: 978-0-7803-8172-8, DOI: 10.1109/SIIT.2003.1251207 * |
Also Published As
Publication number | Publication date |
---|---|
US20120254872A1 (en) | 2012-10-04 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase | Free format text: ORIGINAL CODE: 0009012 |
| AK | Designated contracting states | Kind code of ref document: A1; Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
| AX | Request for extension of the european patent | Extension state: BA ME |
| 17P | Request for examination filed | Effective date: 20130403 |
| 17Q | First examination report despatched | Effective date: 20170724 |
| STAA | Information on the status of an ep patent application or granted ep patent | Free format text: STATUS: THE APPLICATION HAS BEEN WITHDRAWN |
| 18W | Application withdrawn | Effective date: 20171108 |