WO2017082813A1 - Methods and systems for generating workflows for analysing large data sets
- Publication number: WO2017082813A1 (international application PCT/SG2015/050446)
- Authority: WO (WIPO, PCT)
- Prior art keywords
- algorithms
- user
- data
- computer
- computer system
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F8/00—Arrangements for software engineering
- G06F8/30—Creation or generation of source code
- G06F8/34—Graphical or visual programming
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
- G16B50/30—Data warehousing; Computing architectures
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
Definitions
- The present invention proposes methods and computer systems for use by a user (typically a scientist or scientific researcher) to analyse a large data set, typically composed of scientific data.
- User: typically a scientist or scientific researcher.
- The invention proposes ways of defining a computational workflow to be performed on the data set.
- NGS: Next Generation Sequencing.
- NGS is just one example of the use of computers to analyse vast data sets generated in a complex scientific field.
- Another such field is phyloinformatics (the use of a data system structured and queried according to the hierarchical relationships of living creatures, as defined by their evolutionary taxa).
- A common problem in such scientific fields is that the user who instigates the data analysis is typically a scientist or researcher trained in the scientific field itself rather than in data analysis.
- Existing data analysis software is often command-line based, so the user has to input complex commands, including values for each of a number of numerical parameters associated with the data analysis. The user must therefore engage, or at least consult, a data analysis expert to set the values of all of the parameters. Once the parameter values are set, it is difficult to tweak individual parameters without consulting the expert again.
- The invention proposes a method of designing a computational process ("workflow") for performing a computational task. A user who wishes to instigate a workflow for analysing a dataset is presented with a graphical user interface (GUI) comprising a plurality of areas (icons) representative of respective data analysis algorithms, each algorithm being characterized by one or more numerical parameters. Default values are defined for some or all of the numerical parameters, and one or more of the parameters may be modified using the GUI.
- The GUI allows the user to select and combine a plurality of the icons (e.g. by drag-and-drop operations) and to set one or more of the modifiable parameters, thereby forming a workflow comprising the corresponding algorithms.
- Functionality Requirements Documents (FRDs), one per algorithm, are populated by a computer programmer with indications of which parameters are interactive (that is, modifiable by the user). Typically, the FRDs contain default values for some or all of the interactive parameters; if the user chooses not to change a default value, the algorithm is later performed using that default. The FRDs also specify the default values of the non-interactive parameters.
- A set of shell scripts generates commands for the algorithms based on the parameter values selected by the user and the default values stored in the FRDs.
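The parameter handling described above (FRD defaults, user overrides limited to interactive parameters, command generation) can be sketched as follows. This is a hypothetical illustration, not the patent's implementation: the `FRD` class, the `resolve_parameters` and `build_command` helpers, the lowercase parameter names and the `--name value` flag convention are all assumptions made for the example, which reuses the Alg_1 parameters given in this document.

```python
import shlex
from dataclasses import dataclass, field


@dataclass
class FRD:
    """Hypothetical in-memory view of a Functionality Requirements Document."""
    executable: str                                      # program implementing the algorithm
    defaults: dict[str, str]                             # default values (interactive or not)
    interactive: set[str] = field(default_factory=set)   # user-modifiable parameter names


def resolve_parameters(frd: FRD, user_selection: dict[str, str]) -> dict[str, str]:
    """Start from the FRD defaults; the user may override interactive parameters only."""
    not_allowed = set(user_selection) - frd.interactive
    if not_allowed:
        raise ValueError(f"not user-modifiable: {sorted(not_allowed)}")
    resolved = {**frd.defaults, **user_selection}
    missing = frd.interactive - set(resolved)
    if missing:
        raise ValueError(f"no default and no user value for: {sorted(missing)}")
    return resolved


def build_command(frd: FRD, params: dict[str, str]) -> str:
    """Render '--name value' pairs (assumed convention) in sorted order, shell-quoted."""
    argv = [frd.executable]
    for name in sorted(params):
        argv += [f"--{name}", params[name]]
    return shlex.join(argv)


# Alg_1: Para 1 defaults to 3, Para 3 defaults to 1; Para 2 and Para 4 are selectable.
alg_1 = FRD("alg_1", {"para1": "3", "para3": "1"}, {"para2", "para4"})
params = resolve_parameters(alg_1, {"para2": "b", "para4": "x y"})
print(build_command(alg_1, params))
# alg_1 --para1 3 --para2 b --para3 1 --para4 'x y'
```

Quoting through `shlex.join` means a user-chosen value containing spaces cannot split into extra shell arguments.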
- The workflow could in principle be implemented on a platform which is a single computer system (such as a single server, or two or more proximate servers operating as a single computer system). However, the implementation is preferably carried out by a platform "in the cloud": a network of preferably geographically distributed data nodes ("cores"), typically implemented by respective servers.
- The data nodes may be coordinated by a master node.
- The master node (which may be the computer which generated the GUI and received the user's choices) may be accessible to a user terminal over a communication network such as the internet.
- The output of the workflow is preferably stored in the cloud (that is, in a logical space ("pool") supported by a plurality of geographically distributed servers), preferably in a format such as XML (Extensible Markup Language) which is suitable for processing using a MapReduce framework, for example using the programming language XQuery or its extension ChuQL.
- The platform preferably includes visualisation tools appropriate to the type of data.
- The visualisation tools may include standard XML utilities bundled with a data storage and/or visualisation tool, which the user may also select using the GUI, for example by positioning an icon representing the tool at the end of the workflow.
- The visualisation tools may access pre-existing databases, such as databases of public-domain data.
- The algorithms may be ones suitable for implementing NGS.
- The visualisation tools may be ones for analysing genetic data.
- The public-domain data may be genome data.
- The invention may be expressed as a system for generating the GUI and receiving the user's choices, so as to define the workflow.
- The system may include the cluster of data nodes which actually implement the workflow. Alternatively, the invention may be defined as a method performed by the system.
- The system may perform the method by running a set of computer program instructions stored in non-transitory form on a tangible data storage device.
- Fig. 1 shows the overall structure of a network of computer units which can cooperate to implement a method which is an embodiment of the invention, and which incorporates a server which is an embodiment of the invention;
- Fig. 2 shows the logical structure of elements for implementing the invention;
- Fig. 3 is a flow diagram of a method according to the invention;
- Fig. 4 shows a first graphical user interface (GUI) presented to a user of the system of Fig. 1 at a first time;
- GUI: graphical user interface.
- Fig. 5 is an expanded view of a portion of Fig. 4, showing a workflow defined using the invention;
- Fig. 6 is an expanded view of one of the portions of the GUI of Fig. 4 at a different time;
- Fig. 7 is a flow diagram of one of the steps of the workflow illustrated in Fig. 5;
- Fig. 8 is a view of a GUI for browsing the results of an NGS analysis;
- Fig. 9 is a view of a GUI for examining the results of an NGS analysis;
- Fig. 10 is an expanded view of a portion of Fig. 4, showing a second workflow defined using the network of Fig. 1.
- The system includes a number of client nodes 1, 2, 3 operated by respective users. Although three client nodes are illustrated, the number of client nodes may be lower, higher or much higher than this.
- The structure of client node 3 is shown in more detail than that of the other client nodes 1, 2. It includes a terminal 3a having a screen and one or more user input devices (not shown) such as a keyboard, mouse etc., and a database 3b storing one or more large datasets, which may be datasets of structured, semi-structured or unstructured data. Each dataset is preferably in XML format.
- The terminal 3a is in read/write communication with the database 3b.
- The datasets are composed of data generated in a scientific field, and the client nodes 1, 2, 3 are operated by respective users who are scientists or researchers in that field.
- The terminal 3a is a "dumb terminal" which has limited computing power.
- The platform 5 may be provided as a single server, or a single cluster of neighbouring servers. More preferably, however, it is provided as a cloud-based network of distributed units as illustrated in Fig. 1. It includes a master node 6 and a number m of data nodes 61, 62, ... 6m which are geographically distributed (e.g. at least one pair of the data nodes is spaced apart by at least 10 km). The master node 6 may not be in physical proximity to the data nodes 61,
- The master node 6 has the task of coordinating the data nodes 61, 62, ... 6m, including passing to them program instructions for the data nodes 61,
- The programming of the master node 6 (both its own programming and the software elements which the master node 6 transfers to the data nodes 61, 62, ... 6m) is preferably controlled by one or more developers operating respective terminals 8 (for simplicity only one such terminal is shown).
- The data in the database 3b can be accessed by the platform 5 via the terminal 3a.
- The platform 5 uses XQuery, a publicly known language for querying XML data, to access and handle one or more of the datasets.
- The platform 5 preferably further uses Hadoop, an open-source software framework written in Java for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware.
- Hadoop consists of a distributed file system, HDFS (Hadoop Distributed File System), and a computational model, MapReduce.
- HDFS is shown in Fig. 1 as element 7, though it is in fact simply a framework used by the master node 6 and the data nodes 61, 62, ... 6m.
- HDFS is responsible for distributing data, with built-in fault tolerance, between the data nodes 61, 62, ... 6m through the master node 6.
- MapReduce is responsible for parallel processing of the data. It has two components: a Map component, which maps the data to the data nodes 61, 62, ... 6m, and a Reduce component, which collects the analysed data and presents a unified picture to the end user.
- The Map component receives records from the data nodes as key/value pairs, and its results are written to an initial output, which is replicated and distributed using HDFS.
- The Reduce function receives all the results and aggregates them. The aggregated results are written to the final output, which is again replicated and distributed.
- MapReduce is tolerant of hardware faults and has the intrinsic ability to distribute load based on hardware availability.
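The Map, shuffle and Reduce data flow described above can be imitated in a single Python process. This sketch is not Hadoop and does no distribution, replication or fault handling; the example job (counting variant lines per chromosome) and all function names are hypothetical, chosen only to show how key/value pairs emitted by the mapper are grouped by key and then aggregated by the reducer.

```python
from collections import defaultdict
from typing import Any, Callable, Iterable, Iterator


def map_reduce(records: Iterable[Any],
               mapper: Callable[[Any], Iterator[tuple[Any, Any]]],
               reducer: Callable[[Any, list[Any]], Any]) -> dict[Any, Any]:
    """Single-process imitation of the MapReduce data flow (not Hadoop itself)."""
    groups: defaultdict[Any, list[Any]] = defaultdict(list)
    for record in records:
        # Map: each record yields zero or more key/value pairs.
        for key, value in mapper(record):
            # Shuffle: collect values by key (Hadoop does this across data nodes).
            groups[key].append(value)
    # Reduce: aggregate the values collected for each key.
    return {key: reducer(key, values) for key, values in groups.items()}


def count_variants_mapper(line: str) -> Iterator[tuple[str, int]]:
    """Emit (chromosome, 1) for each non-header, tab-separated VCF-style line."""
    if line and not line.startswith("#"):
        yield line.split("\t", 1)[0], 1


def sum_reducer(key: str, counts: list[int]) -> int:
    return sum(counts)


lines = ["#CHROM\tPOS", "chr1\t100", "chr2\t5", "chr1\t250"]
print(map_reduce(lines, count_variants_mapper, sum_reducer))  # {'chr1': 2, 'chr2': 1}
```

In a real cluster the mapper and reducer would run on many data nodes, with the framework performing the grouping step between them.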
- The system preferably further employs Docker, a tool for running isolated containers on Linux. This allows rapid development of applications and workflows.
- The system preferably further uses InfiniBand technology to provide network links.
- Fig. 2 shows the logical structure of the operation of the platform 5 in relation to the client node 3.
- The platform 5 comprises a GUI module 51 for generating GUIs for display on the screen of the terminal 3a. It further comprises a shell scripts module 52 comprising a number of shell scripts, and n sections of code 531, 532, ... 53n, which implement n corresponding algorithms referred to as Alg_1, Alg_2, Alg_3, ... Alg_n. It further has access to a storage unit 54, which stores a list 55 of the n algorithms. For each of the algorithms there is a corresponding Functionality Requirements Document (FRD): FRD Alg_1, FRD Alg_2, ... FRD Alg_n.
- Each of the n algorithms accepts input in a certain format ("Input") and generates output in a certain format ("Output"). These formats are included in the corresponding FRD.
- Each FRD also stores the location of the software implementing the algorithm. The locations may be stored as specific paths or as environment variables.
- Each algorithm has a number of parameters, typically including one or more which are predefined (take "default" values) and one or more which are settable by the user.
- The FRD stores the default values.
- For example, the algorithm Alg_1 has four parameters: Para 1, which takes the default value 3; Para 2, which is selectable (i.e. settable by the user, for example by choosing from a plurality of predefined options); Para 3, which takes the default value 1; and Para 4, which is selectable.
- The FRD 58 stores the default values for the algorithm Alg_1.
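The Input/Output formats and software locations kept in each FRD allow two simple checks before a workflow runs: that each algorithm's location resolves, and that each algorithm's Output format matches the next algorithm's Input format. The sketch below shows both under stated assumptions: `FRDRecord`, `check_chain`, the `BWA_HOME` variable and the paths are hypothetical, and the format strings are illustrative labels rather than anything the FRDs are confirmed to contain.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class FRDRecord:
    """Hypothetical subset of an FRD: I/O formats and software location."""
    name: str
    input_format: str    # the algorithm's "Input" format, e.g. "FASTQ"
    output_format: str   # the algorithm's "Output" format, e.g. "BAM"
    location: str        # a literal path, or one using an environment variable

    def resolve_location(self) -> str:
        # expandvars leaves undefined variables unexpanded, so treat a leftover
        # "$" as unresolved (this simple check also rejects literal "$" in paths).
        path = os.path.expandvars(self.location)
        if "$" in path:
            raise LookupError(f"{self.name}: cannot resolve {self.location!r}")
        return path


def check_chain(frds: list[FRDRecord]) -> None:
    """Raise if an algorithm's Output format is not the next algorithm's Input format."""
    for upstream, downstream in zip(frds, frds[1:]):
        if upstream.output_format != downstream.input_format:
            raise ValueError(f"{upstream.name} outputs {upstream.output_format}, "
                             f"but {downstream.name} expects {downstream.input_format}")


os.environ["BWA_HOME"] = "/opt/bwa"          # hypothetical install location
bwa = FRDRecord("BWA", "FASTQ", "BAM", "$BWA_HOME/bwa")
snpeff = FRDRecord("SNPEFF", "VCF", "VCF", "/opt/snpeff/snpEff.jar")
print(bwa.resolve_location())                # /opt/bwa/bwa
try:
    check_chain([bwa, snpeff])
except ValueError as err:
    print(err)                               # BWA outputs BAM, but SNPEFF expects VCF
```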
- The storage unit 54 further stores, for each of the client nodes 1, 2, 3, a corresponding user selection document 56 recording the values the corresponding user has selected (see below) for the settable parameters of one or more of the algorithms the user has chosen.
- The system of Fig. 1 may be operable to provide computing services to users from a number of different scientific fields. Each field may have a different set of applicable algorithms. A given user would initially specify which scientific field he or she is interested in, and Fig. 2 illustrates only the logical elements applicable to the scientific field chosen by the user of the client node 3.
- If the platform is implemented by a single system, such as a server, then all of the elements of Fig. 2 would be present in that server.
- The structure of Fig. 2 may optionally be provided in the master node 6. Alternatively, it may be provided by another unit (not shown) which is able to communicate with the terminal 3a, or its elements may be distributed (e.g. with part on the master node 6 and part elsewhere).
- A method according to the embodiment for defining and running a workflow is shown.
- The GUI module 51 controls the terminal 3a to display a GUI as illustrated in Fig. 4.
- The GUI includes a portion 41 which lists the algorithms from list 55.
- The GUI further includes a workflow space 42, an area 43 for defining the parameters of the algorithms (one algorithm at a time), and a section 44 for displaying information generated when the workflow is implemented. Areas 42, 43 and 44 are initially empty.
- In step 32, the user successively uses the input devices of the client node 3 to select one or more algorithms from the list of available algorithms 41, and drags each algorithm to a corresponding portion of the area 42.
- An icon representing each selected algorithm is shown in the area 42.
- The set of algorithms selected by the user, and the order in which they are performed, define a workflow which is illustrated by the icons. Note that the user may select an algorithm more than once, so that the workflow contains multiple instances of the same algorithm.
- Fig. 5 shows the area 42 after the user has chosen four algorithms, labelled BWA, Gatk, HPC and SNPEFF, and positioned them in area 42 to produce an NGS workflow. The significance of these algorithms is explained below, where the NGS workflow is considered in detail.
- Area 42 contains arrows, defined by the user, which show the order in which the algorithms are performed and/or the flow of data between them.
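The icons and arrows in area 42 amount to a directed acyclic graph, and running the workflow requires an order in which every algorithm comes after the algorithms feeding it. A minimal sketch, assuming the arrows are available as (upstream, downstream) pairs and using Python's standard `graphlib` module (3.9+): the `execution_order` helper and the `"BWA#1"`-style instance IDs, which let one algorithm appear more than once, are hypothetical, and treating Fig. 5 as a simple chain ending in "Print" is an assumption.

```python
from graphlib import CycleError, TopologicalSorter


def execution_order(arrows: list[tuple[str, str]]) -> list[str]:
    """Order algorithm instances so each runs after everything that feeds it."""
    predecessors: dict[str, set[str]] = {}
    for upstream, downstream in arrows:
        predecessors.setdefault(upstream, set())
        predecessors.setdefault(downstream, set()).add(upstream)
    try:
        return list(TopologicalSorter(predecessors).static_order())
    except CycleError as exc:
        # exc.args[1] holds the nodes that form the cycle.
        raise ValueError(f"workflow contains a cycle: {exc.args[1]}") from exc


fig5_arrows = [("BWA#1", "Gatk#1"), ("Gatk#1", "HPC#1"),
               ("HPC#1", "SNPEFF#1"), ("SNPEFF#1", "Print")]
print(execution_order(fig5_arrows))
# ['BWA#1', 'Gatk#1', 'HPC#1', 'SNPEFF#1', 'Print']
```

Rejecting cycles up front means that an arrow the user draws back to an earlier icon is reported before any cloud resources are used.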
- The item "Print" in area 42 is not an algorithm, but rather a graphical representation of a software tool which is invoked at the end of the workflow to visualize the output data.
- The user may have specified the tool by clicking on a corresponding area of the GUI.
- The area labelled "Print" may then be created automatically in the area 42 at the end of the workflow or, more preferably, the user may position it there (e.g. by a drag-and-drop operation).
- The function of the tool is explained in more detail below.
- The "Print" module is also associated with an FRD. This FRD specifies a number of utilities, such as standard XML utilities, for forming graphical representations of the data generated by the selected algorithms of the workflow.
- In step 33, the user selects parameters for the algorithms.
- The user sets the parameters of the algorithms in the workflow algorithm by algorithm. While the user is setting the parameters of a given algorithm, those parameters are shown in area 43.
- The user selects one of the algorithms in the workflow shown in area 42 (in Fig. 5, for example, the algorithm BWA has been selected by clicking on the area 45, which is therefore highlighted).
- The GUI uses the FRD corresponding to that algorithm to display in area 43 the settable parameters of the highlighted algorithm (typically together with their default values), and receives user input specifying a selection for a given settable parameter.
- An example is shown in Fig. 6, where the GUI shows that the parameter Nst of the algorithm Bayes can take the values 1, 2, 6 or "mixed". The user selects one of these options, thereby setting a value for this interactive parameter.
- Nst is a parameter specifying the nucleotide substitution rate of a model which is to be studied using the Bayes algorithm. Its value is set to "mixed" when the input is an amino acid sequence.
- The user may instead be free to enter a numerical value, e.g. by typing; however, providing the user with options has the advantage that the user can make a choice which is likely to be reasonable, even with little insight into the algorithm.
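Offering a fixed set of options, as in the Nst example, means a selection can be validated against the FRD entry before any command is generated. The sketch below is a hypothetical model of such an entry: the `SelectableParameter` class, its fields, and the decision to store options as strings are assumptions, and the Nst entry is given no default because the text does not state one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SelectableParameter:
    """Hypothetical FRD entry for an interactive parameter with fixed options."""
    name: str
    options: tuple[str, ...]
    default: Optional[str] = None

    def select(self, choice: Optional[str] = None) -> str:
        """Return the user's choice (or the default if none was made), validated."""
        value = self.default if choice is None else choice
        if value is None:
            raise ValueError(f"{self.name}: no choice made and no default defined")
        if value not in self.options:
            raise ValueError(f"{self.name}: {value!r} is not one of {self.options}")
        return value


nst = SelectableParameter("Nst", ("1", "2", "6", "mixed"))
print(nst.select("mixed"))   # mixed
```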
- The values of the parameters are stored in the user selection document 56.
- The user may also be able to select the input location (i.e. the location of an input file containing data to be processed by the algorithm) and the output location to which the result of the workflow is written.
- The input and output may be in the same database, in different databases, or in different sections of the same database.
- For the first algorithm in the workflow, the input location may be in the database 3b; similarly, for the last algorithm in the workflow, the output location may be in the database 3b.
- Other inputs and outputs are chosen so as to define the arrows between algorithms in Fig. 5. Note that this can be done after the algorithms themselves are selected. Furthermore, the way data flows between the algorithms can be modified later by moving the arrows.
- In step 34, the workflow is executed.
- The shell scripts module 52 generates commands for the algorithms using the parameter values stored in the FRDs and in the user selection data 56.
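One way to picture the shell scripts module's output is a single bash script that runs the per-algorithm commands in workflow order and stops at the first failure. This is a hypothetical sketch, not the module's actual behaviour: `render_workflow_script` is an invented helper, and the executables `alg_1`/`alg_2` and their flags are placeholders rather than real BWA or GATK command lines.

```python
import shlex


def render_workflow_script(steps: list[tuple[str, list[str]]]) -> str:
    """Build one bash script running each (label, argv) step in order."""
    lines = [
        "#!/usr/bin/env bash",
        "set -euo pipefail  # abort the workflow at the first failing command",
    ]
    for label, argv in steps:
        # Log progress to stderr, then run the step with every argument quoted.
        lines.append(f"echo {shlex.quote('running ' + label)} >&2")
        lines.append(shlex.join(argv))
    return "\n".join(lines) + "\n"


script = render_workflow_script([
    ("Alg_1", ["alg_1", "--para1", "3"]),
    ("Alg_2", ["alg_2", "--input", "my data.xml"]),
])
print(script)
```

Generating the script from argument lists, rather than concatenating strings, keeps user-entered values such as file names with spaces intact.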
- Each shell script module may be provided as a Java wrapper.
- IDEs: integrated development environments.
- The module may be Java-wrapped for processing XML data, for example using the MapReduce framework.
- The implementation of the workflow is managed by the master node 6, which controls a plurality of the geographically distributed data nodes 61, 62, ... 6m in parallel to provide a high performance computing (HPC) environment.
- HPC: high performance computing.
- The user may be able to choose (e.g. in step 33) the number of data nodes.
- For example, a user may choose 10, 100 or 1000 data nodes to run any one of the selected algorithms, such as the computationally expensive BWA algorithm of Fig. 5.
- The area 44 shows any error messages. This allows the user to stop the workflow if there is an error, which is particularly useful when the workflow is run in the cloud, where resources may be charged for on the basis of, for example, the number of processors used multiplied by the number of hours for which each processor is used.
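The charging basis mentioned above is a simple product, which makes it easy to see why stopping a failing run early matters at the node counts the text mentions. The figures below (100 nodes, a 10-hour run, 0.5 currency units per processor-hour, one processor per node) are made up for illustration, and both helper names are hypothetical.

```python
def cloud_cost(processors: int, hours: float, rate_per_processor_hour: float) -> float:
    """Charge = processors used x hours each is used x price per processor-hour."""
    return processors * hours * rate_per_processor_hour


def saving_from_early_stop(processors: int, planned_hours: float,
                           hours_before_stop: float, rate: float) -> float:
    """Charge avoided by stopping a failing run early instead of letting it finish."""
    return cloud_cost(processors, planned_hours - hours_before_stop, rate)


# Hypothetical figures: 100 data nodes (one processor each), 10 hours, 0.5 per node-hour.
print(cloud_cost(100, 10, 0.5))                 # 500.0
print(saving_from_early_stop(100, 10, 1, 0.5))  # 450.0
```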
- The user can "tweak" the workflow, for example by clicking on one of the icons representing the algorithms of the workflow and resetting one or more parameters of the corresponding algorithm using the GUI portion 43.
- The platform 5 and/or the database 3b may store an output file with the results of the processing carried out by the workflow.
- The output file may store the results generated by each of the algorithms. Additionally, it may store the results of running the workflow on different input data.
- The database is in the cloud and in XML format. It may be created by XML utilities.
- In step 36, the data is visualised.
- This step may include accessing one or more previously generated databases, such as public-domain databases, to compare previously generated data with the data generated by the selected algorithms.
- Step 36, too, may be performed using XML utilities.
- XML utilities can be used in all three stages of data integration: execution of the workflow in step 34 (data generation), database creation (step 35), and data visualisation (step 36).
- The selection of exactly which actions the computer system performs in steps 35 and/or 36 may be made during the construction of the workflow, and be represented graphically on the GUI as an area of the workflow.
- The user may input data to the computerized network which is recognised as a selection of at least one tool providing database construction and visualisation.
- A representation of the tool may be displayed in area 42, for example as the final block of the workflow (e.g. the area "Print" in Fig. 5).
- In the NGS workflow, the input is a FASTQ file, which represents a set of fragments of DNA.
- The output of the NGS workflow is an indication of the variants that have been identified. Each variant is a location at which one of the DNA fragments differs from a known genome sequence (the "reference genome").
- The first stage of the NGS workflow is the BWA (Burrows-Wheeler Aligner) algorithm, which aligns sequence fragments to the reference genome. It takes a FASTQ input of DNA fragments (raw sequence data) and outputs a BAM file, i.e. an aligned DNA sequence.
- In step 71, the raw sequence data is divided into m paired-end files.
- In steps 721, 722, ... 72m, these files are processed separately and simultaneously by the data nodes 61, 62, ... 6m.
- Each file may be aligned using a known BWA algorithm, such as BWA-MEM.
- SAM: Sequence Alignment/Map, the format for aligned sequences.
- In step 73, the results are combined to produce a BAM file.
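Steps 71 to 73 follow a split, process-in-parallel, combine pattern. The sketch below reproduces that pattern locally with a thread pool standing in for the data nodes; `align_chunk` is a hypothetical placeholder that merely tags each read pair instead of running BWA-MEM, and all function names are assumptions. The point it illustrates is that chunk results are recombined in their original order.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items: list, m: int) -> list[list]:
    """Divide items into m contiguous chunks whose sizes differ by at most one."""
    if m < 1:
        raise ValueError("m must be at least 1")
    size, extra = divmod(len(items), m)
    chunks, start = [], 0
    for i in range(m):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def align_chunk(chunk: list[tuple[str, str]]) -> list[str]:
    """Hypothetical stand-in for aligning one chunk of read pairs (e.g. with BWA-MEM)."""
    return [f"aligned:{read1}/{read2}" for read1, read2 in chunk]


def parallel_align(pairs: list[tuple[str, str]], m: int) -> list[str]:
    """Steps 71-73 in miniature: split, process concurrently, combine in order."""
    chunks = split_into_chunks(pairs, m)
    with ThreadPoolExecutor(max_workers=m) as pool:
        # pool.map yields results in the order of the input chunks.
        per_chunk = list(pool.map(align_chunk, chunks))
    return [record for result in per_chunk for record in result]


print(parallel_align([("a1", "a2"), ("b1", "b2"), ("c1", "c2")], 2))
# ['aligned:a1/a2', 'aligned:b1/b2', 'aligned:c1/c2']
```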
- In step 74, the BAM file is improved using known tools such as the Picard tools supplied by the Broad Institute of Cambridge, MA.
- In step 75, SAMtools is used for indexing, giving an indexed BAM file.
- Gatk: this refers to the Genome Analysis Toolkit (GATK) developed by the Broad Institute of Cambridge, MA. It takes the aligned DNA and compares it with the reference genome to identify variants. A first stage is a local realignment process, which takes one or more BAM files as input and locally realigns reads such that the number of mismatching bases is minimized across all the reads. Then there is a base recalibration step, followed by a step of Base Quality Score Recalibration (BQSR).
- BQSR: Base Quality Score Recalibration.
- HPC (in Fig. 5): haplotype caller.
- Haplotypes: sets of genes which a progeny tends to inherit together from one parent.
- The output is a raw VCF (Variant Call Format) file, i.e. a text file in a format commonly used for storing gene sequence variations.
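A raw VCF file is tab-separated text whose first eight columns are fixed (CHROM, POS, ID, REF, ALT, QUAL, FILTER, INFO). The minimal parser below reads just those columns so that downstream steps such as annotation or database loading have structured records to work with. It is a sketch, not a complete VCF implementation: sample columns are ignored, header lines are skipped, and the `Variant` class and `parse_vcf` name are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Union


@dataclass
class Variant:
    chrom: str
    pos: int                          # 1-based position on the reference genome
    ref: str
    alts: tuple[str, ...]
    qual: Optional[float]             # None when the QUAL column is "."
    info: dict[str, Union[str, bool]]


def parse_vcf(lines: Iterable[str]) -> Iterator[Variant]:
    """Yield Variant records from VCF text, using only the eight fixed columns."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, "##" meta-lines and the "#CHROM" header
        chrom, pos, _id, ref, alt, qual, _filter, info = line.rstrip("\n").split("\t")[:8]
        info_fields: dict[str, Union[str, bool]] = {}
        for entry in info.split(";"):
            if entry and entry != ".":
                key, sep, value = entry.partition("=")
                info_fields[key] = value if sep else True  # flags carry no value
        yield Variant(chrom, int(pos), ref, tuple(alt.split(",")),
                      None if qual == "." else float(qual), info_fields)


records = list(parse_vcf([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t12345\trs1\tA\tG,T\t50.5\tPASS\tDP=14;DB",
]))
print(records[0].chrom, records[0].pos, records[0].alts, records[0].info)
# chr1 12345 ('G', 'T') {'DP': '14', 'DB': True}
```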
- VCF: Variant Call Format.
- An alternative to the haplotype caller is a unified type caller, which is less accurate but needs only a small amount of memory. Both of these algorithms are preferably available in the section 41 of the GUI for the user to select during the definition of the workflow.
- SNPEFF: this annotates the VCF file.
- The output is an annotated variant list.
- Print: this takes the output of the above algorithms, puts it all in a database file, and then calls a visualization tool to view the data.
- The database file may be stored in the database 3b of the client node 3, or on the platform 5.
- The output of the workflow (in the case of the NGS workflow, the output of SNPEFF) is stored in the cloud, preferably as an XML-format database.
- The data may initially be formatted according to a relational database management system (RDBMS) such as MySQL or PostgreSQL (also called Postgres); if so, it is converted to an XML-format database.
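The RDBMS-to-XML conversion can be sketched with the Python standard library. Here `sqlite3` stands in for MySQL or PostgreSQL (a substitution made only so the example is self-contained), the `variants` table, its columns and the `table_to_xml` helper are hypothetical, and the final query uses ElementTree's limited XPath subset, which is not XQuery.

```python
import sqlite3
import xml.etree.ElementTree as ET


def table_to_xml(conn: sqlite3.Connection, table: str, row_tag: str) -> ET.Element:
    """Copy every row of `table` into an XML element tree (one element per row)."""
    # The table name is interpolated into SQL, so it must come from a trusted source.
    cursor = conn.execute(f"SELECT * FROM {table}")
    columns = [description[0] for description in cursor.description]
    root = ET.Element(table)
    for row in cursor:
        element = ET.SubElement(root, row_tag)
        for column, value in zip(columns, row):
            if value is not None:          # SQL NULL -> attribute omitted
                element.set(column, str(value))
    return root


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE variants (chrom TEXT, pos INTEGER, gene TEXT)")
conn.executemany("INSERT INTO variants VALUES (?, ?, ?)",
                 [("chr1", 100, "GENE_A"), ("chr2", 5, None)])
root = table_to_xml(conn, "variants", "variant")
# ElementTree supports a limited XPath subset, e.g. attribute predicates:
print([v.get("pos") for v in root.findall("./variant[@chrom='chr1']")])  # ['100']
```

Mapping columns to attributes keeps one element per row; a nested-element layout would work equally well if values can be long or structured.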
- RDBMS: relational database management system.
- The XML format of the data makes it possible to analyse the data using XQuery, a programming language for XML data.
- XQuery programs can be implemented in Hadoop using the MapReduce framework.
- The output data generated by the NGS workflow of Fig. 5 can be viewed by the user in various ways.
- One way is for the "Print" module (i.e. the module represented by the last block of the workflow in area 42 of Fig. 5) to call a visualization tool: a browser which opens automatically on the terminal 3a and accesses a website which reads the output database file and displays it in graphical form.
- The browser includes a number of utilities, which are specified by the FRD of the "Print" module. This is shown in Figs. 8 and 9.
- Fig. 8 shows how the website presents a GUI offering a number of options which the user can use to select portions of the output database.
- Fig. 9 shows a screen in which the user can cause the website to display an image visualizing the data.
- Fig. 9 shows a case in which the workflow of Fig. 5 has been run for six sets of data, SS1 to SS6. These are listed at the top of the area 91 on the left-hand side of the GUI. By ticking one or more of the tick boxes in area 91, the user indicates to the website the set(s) of data to be visualised. The results are then shown in the area 92.
- The browser includes bundled utilities (such as standard XML utilities) for automatically analysing the data generated by the workflow. At least some of the utilities preferably do this by accessing pre-existing databases, such as public-domain databases.
- Fig. 10 shows the workflow space 42 generated by another user to define a different workflow.
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Biophysics (AREA)
- Databases & Information Systems (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biotechnology (AREA)
- Evolutionary Biology (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Bioethics (AREA)
- Business, Economics & Management (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- Human Resources & Organizations (AREA)
- Strategic Management (AREA)
- Economics (AREA)
- General Physics & Mathematics (AREA)
- Entrepreneurship & Innovation (AREA)
- Educational Administration (AREA)
- Game Theory and Decision Science (AREA)
- Development Economics (AREA)
- Marketing (AREA)
- Operations Research (AREA)
- Quality & Reliability (AREA)
- Tourism & Hospitality (AREA)
- General Business, Economics & Management (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
- Stored Programmes (AREA)
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
NZ743378A NZ743378A (en) | 2015-11-12 | 2015-11-12 | Methods and systems for generating workflows for analysing large data sets |
GB1813187.0A GB2565439A (en) | 2015-11-12 | 2015-11-12 | Methods and systems for generating workflows for analysing large data sets |
AU2015414467A AU2015414467A1 (en) | 2015-11-12 | 2015-11-12 | Methods and systems for generating workflows for analysing large data sets |
PCT/SG2015/050446 WO2017082813A1 (en) | 2015-11-12 | 2015-11-12 | Methods and systems for generating workflows for analysing large data sets |
AU2022241571A AU2022241571A1 (en) | 2015-11-12 | 2022-09-29 | Methods and systems for generating workflows for analysing large data sets |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/SG2015/050446 WO2017082813A1 (en) | 2015-11-12 | 2015-11-12 | Methods and systems for generating workflows for analysing large data sets |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2017082813A1 (en) | 2017-05-18 |
Family ID: 58695833
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/SG2015/050446 WO2017082813A1 (en) | 2015-11-12 | 2015-11-12 | Methods and systems for generating workflows for analysing large data sets |
Country Status (4)
Country | Link |
---|---|
AU (2) | AU2015414467A1 (en) |
GB (1) | GB2565439A (en) |
NZ (1) | NZ743378A (en) |
WO (1) | WO2017082813A1 (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040117348A1 (en) * | 2002-12-10 | 2004-06-17 | Nec Corporation | Sequence display method and homology search method for facilitating access to information relating to regions of mutation and regions of similarity between plurality of sequences |
US20120005139A1 (en) * | 2010-05-25 | 2012-01-05 | Sony Corporation | Information processing apparatus, information processing method, and program |
WO2013037687A1 (en) * | 2011-09-14 | 2013-03-21 | Siemens Aktiengesellschaft | A system and method for managing development of a test piece of code |
US20140282177A1 (en) * | 2013-03-15 | 2014-09-18 | Palantir Technologies, Inc. | Computer graphical user interface with genomic workflow |
US20150066383A1 (en) * | 2013-09-03 | 2015-03-05 | Seven Bridges Genomics Inc. | Collapsible modular genomic pipeline |
2015
- 2015-11-12 WO PCT/SG2015/050446 patent/WO2017082813A1/en active Application Filing
- 2015-11-12 GB GB1813187.0A patent/GB2565439A/en not_active Withdrawn
- 2015-11-12 NZ NZ743378A patent/NZ743378A/en unknown
- 2015-11-12 AU AU2015414467A patent/AU2015414467A1/en not_active Abandoned
2022
- 2022-09-29 AU AU2022241571A patent/AU2022241571A1/en active Pending
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2024055890A1 (en) * | 2022-09-15 | 2024-03-21 | International Business Machines Corporation | Auto-wrappering tools with guidance from exemplar commands |
US12020007B2 (en) | 2022-09-15 | 2024-06-25 | International Business Machines Corporation | Auto-wrappering tools with guidance from exemplar commands |
Also Published As
Publication number | Publication date |
---|---|
NZ743378A (en) | 2022-10-28 |
GB201813187D0 (en) | 2018-09-26 |
GB2565439A (en) | 2019-02-13 |
AU2015414467A1 (en) | 2018-06-28 |
AU2022241571A1 (en) | 2022-10-27 |
Legal Events
Code | Title | Description |
---|---|---|
121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 15908402; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
ENP | Entry into the national phase | Ref document number: 2015414467; Country of ref document: AU; Date of ref document: 20151112; Kind code of ref document: A |
ENP | Entry into the national phase | Ref document number: 201813187; Country of ref document: GB; Kind code of ref document: A; Free format text: PCT FILING DATE = 20151112 |
WWE | WIPO information: entry into national phase | Ref document number: 1813187.0; Country of ref document: GB |
122 | EP: PCT application non-entry in European phase | Ref document number: 15908402; Country of ref document: EP; Kind code of ref document: A1 |