WO2016175330A1 - Genome analysis device and genome visualization method - Google Patents
Genome analysis device and genome visualization method
- Publication number
- WO2016175330A1 PCT/JP2016/063509
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- data
- genome
- output
- request
- output request
- Prior art date
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B45/00—ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/20—Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
- G16B50/10—Ontologies; Annotations
Definitions
- the present invention relates to a genome analysis device and a genome visualization method.
- the base sequence of the genome to be analyzed has an extremely large amount of data per sample.
- A sequence decoding device called a next-generation sequencer, capable of decoding a genomic base sequence at ultra-high speed and at low cost, has been developed and is in use.
- The next-generation sequencer breaks the DNA or RNA to be analyzed into very short fragments, reads the fragments in parallel at high speed, and analyzes each fragment it has read to determine its base sequence. The determined base sequence information of each fragment is then output as sequence data called read sequences, for example, data in FASTQ format.
- Data in which the read sequences are aligned (mapped) to a known genomic base sequence (hereinafter also referred to as “reference sequence”), for example, data in SAM format or BAM format, is also output (for example, see Patent Document 1).
- Patent Document 1 discloses a technique that enables high-quality alignment through a step of identifying a plurality of high-quality read sequences from a plurality of read sequences, a step of extracting a plurality of unique read sequences from the high-quality read sequences, and a step of comparing the unique read sequences with a reference sequence corresponding to the sample.
- Chromosome sample data (hereinafter generically referred to as “genome data”) in FASTQ format, SAM format, BAM format, or the like output from the next-generation sequencer is used for various analyses such as ChIP-Seq (Chromatin Immunoprecipitation sequencing) and RNA-Seq.
- visualization technology that enables visual grasp of analysis results of ChIP-Seq, RNA-Seq, etc. and the base sequence of the genome has been developed.
- Examples include viewers such as Integrative Genomics Viewer (Broad Institute), Integrated Genome Browser (Affymetrix), UCSC Genome Browser (UCSC), and GBrowse.
- The present invention has been made in view of the circumstances described above, and its object is to provide a genome analysis device and a genome visualization method that enable simple and seamless visualization using a Web browser mechanism.
- The genome analysis device according to the present invention analyzes genome data consisting of a large amount of fragmented genome base sequences and, in response to an output request from a client device connected via a network, transmits output data related to the genome data. The genome analysis device is characterized by comprising: storage means for storing visualization data of a plurality of different layers for the genome data; request receiving means for receiving an output request from the client device; and output data creation means for, when the request receiving means receives the output request, selecting the visualization data of the layer corresponding to the output request from the storage means and creating output data based on the visualization data of the selected layer.
- The genome visualization method according to the present invention is a genome visualization method in a genome analysis device that analyzes genome data consisting of a large amount of fragmented genome base sequences, has a storage unit for storing data related to the genome data, and transmits output data related to the genome data in response to an output request from a client device connected via a network. The storage unit stores visualization data of a plurality of different layers for the genome data. The genome visualization method is characterized by comprising: a request receiving step of receiving an output request from the client device; and an output data creation step of, when the output request is received in the request receiving step, selecting the visualization data of the layer corresponding to the output request from the storage unit and creating output data based on the visualization data of the selected layer.
- FIG. 1 is a diagram showing a system configuration example of a genome analysis system according to the present embodiment.
- a genome analysis system 1 shown in FIG. 1 has a genome analysis device 2 and a client device 3 connected via a network 4 such as the Internet.
- Genome data 11 consists of a large amount of fragmented base sequence information output from the next-generation sequencer, for example, sequence data called read sequences in FASTQ format, or data in which the read sequences are mapped to a reference sequence, such as SAM format or BAM format data. This genome data 11 is input to the genome analysis device 2.
- the genome analysis apparatus 2 is an apparatus that inputs genome data 11 and performs various analyzes such as ChIP-Seq, RNA-Seq, and mutation analysis on the input genome data 11.
- the genome analysis device 2 functions as an application server that performs an analysis related to the analysis request in response to an analysis request from the client device 3 connected via the network 4.
- the genome analysis device 2 creates output data related to the genome data 11 in response to the output request from the client device 3 and transmits it to the client device 3.
- The output data here is either Web page data (hereinafter also referred to as “display data”) or report data in which analysis results and the like are expressed in table format or PDF format.
- When the output data is display data, the genome analysis device 2 functions as a Web server.
- the client device 3 transmits an analysis request input by the user on the device to the genome analysis device 2. Further, the client device 3 transmits an output request to the genome analysis device 2.
- the output request here is a request related to display of the display data (hereinafter also referred to as “display request”) or a request related to output of the report data.
- the client device 3 functions as a Web client and has a Web browser type viewer that displays display data received from the genome analysis device 2.
- the genome analysis device 2 analyzes the genome data 11 and transmits output data indicating the analysis result and the like to the client device 3.
- the genome analysis device 2 may be a server group constructed on the cloud or an on-premises server.
- FIG. 2 is a diagram illustrating a hardware configuration example of the genome analysis apparatus 2 according to the present embodiment.
- the same components as those described above are denoted by the same reference numerals, and redundant description is omitted as appropriate.
- The genome analysis device 2 includes a CPU (Central Processing Unit) 21, a memory 22, an SSD (Solid State Drive) 23, and an interface device 24 connected via a bus 25.
- the CPU 21 is a central processing unit that executes various programs stored in the memory 22.
- the memory 22 is a storage device such as a RAM (Random Access Memory) that stores a program executed by the CPU 21 and data used by the program.
- The SSD 23 is a storage device that stores various data. It may instead be an HDD (Hard Disk Drive).
- the interface device 24 is an interface device for connecting to the network 4 (see FIG. 1).
- the genome analyzer 2 is not limited to a physical computer. It may be configured by combining a plurality of computers, or may be a virtual server that is virtually provided on the cloud by using virtualization technology.
- FIG. 3 is a diagram illustrating a functional configuration example of the genome analysis apparatus 2 according to the present embodiment.
- The genome analysis device 2 includes a data receiving unit 201, a request issuing unit 202, a request receiving unit 203, a task control unit 204, a preprocessing unit 205, an analysis unit 206, an output data creation unit 207, a data transmission unit 208, and a storage unit 209.
- the data receiving unit 201 receives genomic data 11 of a predetermined chromosome sample made up of a large amount of fragmented genomic base sequences.
- The genome data 11 may be received by manual or automatic upload from a computer device (not shown in FIG. 1) that stores the genome data 11 and is connected via the network 4, or by importing genome data 11 that resides on the cloud.
- the request issuing unit 202 internally issues a request for storing the received genomic data 11 in the storage unit 209 when the data receiving unit 201 receives the genomic data 11.
- the request receiving unit 203 receives the analysis request 12 and the output request 13 transmitted from the client device 3.
- the analysis request 12 is a request related to analysis such as ChIP-Seq, RNA-Seq, mutation analysis, and analysis of a predetermined disease such as colorectal cancer or breast cancer.
- the output request 13 is a request related to a display request or report data output.
- the display request describes the designation of display target chromosomes and base coordinates, an instruction for enlargement or reduction, designation of a chromosome sample to be displayed, an instruction for search, and the like.
- the request relating to the output of report data describes the output data format (table format or PDF format), the designation of the gene to be output, and the like.
- The task control unit 204 generates and manages tasks when the request issuing unit 202 issues a request or when the request receiving unit 203 receives an analysis request 12 or an output request 13.
- When the request issuing unit 202 issues a request, a task to be executed by the preprocessing unit 205 is generated.
- When the request receiving unit 203 receives the analysis request 12, a task to be executed by the analysis unit 206 is generated.
- When the request receiving unit 203 receives the output request 13, a task to be executed by the output data creation unit 207 is generated.
- the preprocessing unit 205 performs preprocessing on the genome data 11 by parallel distributed processing.
- the preprocessing here is preprocessing of analysis performed by the analysis unit 206.
- Various data generated as a result of the preprocessing by the preprocessing unit 205 is stored in the storage unit 209.
- For example, the preprocessing unit 205 reads the reference sequence information stored in the sequence DB 212 of the storage unit 209 and, as preprocessing, maps FASTQ-format sequence data to the reference sequence.
- the analysis unit 206 performs the analysis related to the analysis request 12 on the data stored in the storage unit 209 by parallel distributed processing.
- the analysis result by the analysis unit 206 is stored in the storage unit 209.
- the output data creation unit 207 creates output data related to the output request 13 by parallel distributed processing based on the data stored in the storage unit 209.
- the data transmission unit 208 transmits the output data created by the output data creation unit 207 to the client device 3 as a response 14 to the output request 13.
- The storage unit 209 stores data created as a result of preprocessing by the preprocessing unit 205, analysis results from the analysis unit 206, and annotation-related data and mutation-related data acquired in advance from public databases (hereinafter collectively referred to as “annotation data”), among other data.
- The storage unit 209 includes a file DB 211, a sequence DB 212, a coverage DB 213, a various information DB 214, and a cache 215.
- the file DB 211 is storage means for storing file information of the input genome data 11 of a predetermined chromosome sample.
- The file information here includes chromosome sample status information, chromosome information of the chromosome sample, tag information used for management, bookmark information (chromosome and base coordinates), layout information (a data set of chromosome samples), and the like.
- the tag information is information for facilitating the search of the genome data 11.
- Bookmark information is information consisting of a combination of chromosomes and base coordinates. By storing the bookmark information, it is possible to read the genome data 11 of a desired chromosome sample at high speed by specifying the chromosome and base coordinates.
- the layout information is a data set of chromosome samples. By saving the layout information, it is possible to read a data set of chromosome samples to be displayed at a time.
- the sequence DB 212 is a storage unit that stores information of reference sequences (known genome sequences) for each chromosome acquired in advance from a public database or the like. Specifically, for each chromosome, the ATGC base sequence information of the reference sequence is stored, for example, as a byte string continuous at 1 byte per character. Thereby, it is possible to perform a high-speed search by designating the start position and end position of the base coordinates and random access to arbitrary coordinates.
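The byte-string layout described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the class name `SequenceStore`, its methods, and the choice of 1-based inclusive coordinates are assumptions made for the example.

```python
class SequenceStore:
    """Hypothetical in-memory stand-in for the sequence DB 212."""

    def __init__(self):
        # chromosome name -> reference sequence stored at 1 byte per base
        self._chromosomes = {}

    def put(self, chromosome, bases):
        self._chromosomes[chromosome] = bases.encode("ascii")

    def fetch(self, chromosome, start, end):
        """Return bases start..end (assumed 1-based, inclusive)."""
        seq = self._chromosomes[chromosome]
        if not (1 <= start <= end <= len(seq)):
            raise ValueError("coordinates out of range")
        # Fixed-width bytes turn a coordinate into a direct offset,
        # which is what makes random access cheap.
        return seq[start - 1:end].decode("ascii")


store = SequenceStore()
store.put("chr1", "ATGCATGCAA")
print(store.fetch("chr1", 3, 6))  # GCAT
```

Because every base occupies exactly one byte, a coordinate range maps to a byte range without scanning, which is the property the text relies on for high-speed search.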
- the coverage DB 213 is storage means for storing coverage information between the input genome data 11 and the reference sequence of the chromosome corresponding to the genome data 11.
- The coverage gives an overview of the amount of data at each position and is calculated by the preprocessing unit 205.
- coverage is stored using chromosomes and base coordinates as keys. Thereby, high-speed search and random access to arbitrary coordinates are possible.
- the coverage DB 213 will be described later in detail with reference to FIGS.
- the various information DB 214 is a storage means for storing various genome information such as annotation data, mutation information, alignment of individual genome data 11, and the like.
- Annotation data is data generated from public gene information such as RefSeq (Reference Sequence) acquired in advance from a public database or the like.
- the mutation information is public mutation information such as dbSNP (Single Nucleotide Polymorphism) acquired in advance from a public database or the like.
- the alignment is the base coordinates of each piece of fragmented data (hereinafter also referred to as “fragmented data”) constituting the input genome data 11, and the base coordinates are determined by referring to a reference sequence.
- In the various information DB 214, as in the coverage DB 213, the various information is stored using chromosomes and base coordinates as keys. Therefore, high-speed search and random access to arbitrary coordinates are possible.
- the various information DB 214 will be described later in detail with reference to FIG.
- the various information DB 214 also stores new (improved) annotation data generated as a result of analysis performed by the analysis unit 206 using the annotation data stored in the various information DB 214.
- annotation data for the genome data 11 generated by the preprocessing of the preprocessing unit 205 is also stored.
- the cache 215 is a storage unit for caching data necessary when the analysis unit 206 performs analysis or the output data creation unit 207 creates output data. That is, the cache 215 is for accessing data at high speed.
- the data receiving unit 201, the request receiving unit 203, and the data transmitting unit 208 are realized by the CPU 21 and the interface device 24 of FIG.
- the request issuing unit 202, the task control unit 204, the preprocessing unit 205, the analysis unit 206, and the output data creation unit 207 are realized by the CPU 21 in FIG.
- the storage unit 209 is realized by the CPU 21, the memory 22, and the SSD 23 in FIG.
- the coverage DB 213 and various information DBs 214 store data for visualization of a plurality of different layers for the genome data 11 generated by the preprocessing by the preprocessing unit 205 or the like. Then, the output data creation unit 207 selects the visualization data for the layer corresponding to the output request 13, and creates output data based on the visualization data for the selected layer. As a result, seamless visualization can be easily performed using a Web browser mechanism without requiring recalculation of output data or the like.
- The output data creation unit 207 has a so-called prefetch function: when creating display data, it creates display data covering a slightly wider data range than the range currently displayed in the display area 68 (see FIG. 6). Thereby, for example, even when the display range is changed by dragging the mouse (input device 34 in FIG. 4) up, down, left, or right on the display area 68, seamless visualization can follow the change of the display range.
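The prefetch function described above can be sketched as a range-widening step. The function name `prefetch_range` and the `margin_ratio` parameter are hypothetical; the source only says the prepared range is "slightly wider" and does not say by how much.

```python
def prefetch_range(start, end, chrom_length, margin_ratio=0.5):
    """Widen the visible range [start, end] on both sides.

    margin_ratio is an assumed tuning parameter: the fraction of the
    visible width added to each side. Coordinates are 1-based, inclusive.
    """
    width = end - start + 1
    margin = int(width * margin_ratio)
    # Clamp to the chromosome so the widened range stays valid.
    return max(1, start - margin), min(chrom_length, end + margin)


print(prefetch_range(1001, 2000, 100000))  # (501, 2500)
print(prefetch_range(1, 1000, 1200))       # (1, 1200)
```

With data for the widened range already on the client, a small drag can be redrawn without another round trip to the server.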
- FIG. 4 is a diagram illustrating a hardware configuration example of the client device according to the present embodiment.
- The client device 3 includes a CPU 31, a memory 32, an SSD 33, an input device 34, a display device 35, and an interface device 36 connected via a bus 37.
- the CPU 31, the memory 32, the SSD 33, and the interface device 36 are the same as the CPU 21, the memory 22, the SSD 23, and the interface device 24 shown in FIG.
- the input device 34 is a device for the user to input various information, such as a keyboard and a mouse.
- the display device 35 is a display, for example.
- FIG. 5 is a diagram illustrating a functional configuration example of the client device according to the present embodiment.
- The client device 3 includes an input unit 301, a request transmission unit 302, a data receiving unit 303, and an output unit 304.
- the input unit 301 inputs input information for the input device 34 (see FIG. 4).
- Input information here refers to instruction information related to analysis such as ChIP-Seq, RNA-Seq, and mutation analysis; instruction information related to display, such as designation of the chromosome and base coordinates to be displayed, an enlargement or reduction instruction, designation of the chromosome sample to be displayed, and a search instruction; or designation information for report data, such as the output data format (table format or PDF format) and the gene to be output.
- the request transmission unit 302 issues an analysis request 12 and an output request 13 according to the input information in the input unit 301 and transmits the analysis request 12 and the output request 13 to the genome analysis device 2.
- the data receiving unit 303 receives the response 14 transmitted from the genome analysis device 2.
- the output unit 304 analyzes the response 14 received by the data receiving unit 303 and displays the display data on the display device 35 (see FIG. 4) or outputs report data.
- the input unit 301 and the output unit 304 are realized by the CPU 31 in FIG.
- the request transmission unit 302 and the data reception unit 303 are realized by the CPU 31 and the interface device 36 of FIG.
- FIG. 6 is an example of a screen displayed on the client device according to the present embodiment.
- The screen includes a chromosome designation field 61 for designating the chromosome to be displayed, an input field 62 for inputting the start position of the base coordinate range to be displayed, an input field for inputting the end position, an enlargement button 64 for inputting an enlargement instruction, a reduction button 65 for inputting a reduction instruction, a keyword input field 66 for inputting a search keyword, a chromosome sample designation field 67 for designating a chromosome sample to be displayed, and a display area 68 in which display data is displayed.
- the user can input various instruction information related to the display.
- FIG. 7 is a diagram for explaining a task control unit, a preprocessing unit, an analysis unit, and an output data creation unit of the genome analysis apparatus 2 according to the present embodiment.
- parallel distributed processing performed by the task control unit 204, the preprocessing unit 205, the analysis unit 206, and the output data creation unit 207 of FIG. 3 will be described.
- the task control unit 204 includes a request queue 241, a process manager 242, and a task queue 243.
- The request queue 241 is a FIFO queue that stores requests, namely requests issued by the request issuing unit 202, analysis requests 12, and output requests 13 (see FIG. 3).
- the process manager 242 takes out the request stored in the request queue 241 and generates one or more tasks based on the request.
- the generated task includes a parallel task that is executed without waiting for the end of execution of the previous task, and a sequential task that is executed after the end of execution of the previous task.
- the generated task is stored in the task queue 243 which is a FIFO type queue in principle.
- the pre-processing unit 205 includes one or more worker instances 251.
- Each worker instance 251 has a worker process 252 that sequentially takes executable tasks from the task queue 243 and executes them, and a worker manager 253 that monitors the operation of the worker process 252.
- the number of worker instances 251 dynamically increases or decreases according to the number of tasks stored in the task queue 243, and the tasks stored in the task queue 243 are processed in parallel and distributed. The same applies to the worker instance 261, worker process 262, and worker manager 263 of the analysis unit 206, and the worker instance 271, worker process 272, and worker manager 273 of the output data creation unit 207.
- the task control unit 204 generates and manages tasks based on the request, and the preprocessing unit 205, the analysis unit 206, and the output data creation unit 207 perform parallel and distributed processing on the generated tasks. Thereby, high-speed processing is possible.
- each request stored in the request queue 241 is independent, and a plurality of requests are processed in parallel.
- each worker instance 251 is independent and is a simple mechanism that only processes what can be processed by its own instance, so it can be easily scaled out.
- the request queue 241 and the task queue 243 are not limited to FIFO type queues. Other types of queues may be used.
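The request queue, process manager, task queue, and worker arrangement described above can be sketched with Python's standard `queue` and `threading` modules. This is a simplified illustration under stated assumptions: the request dictionaries, task names, and the functions `make_tasks`, `worker`, and `run` are hypothetical, threads stand in for worker instances, and the distinction between parallel and sequential tasks is not modeled.

```python
import queue
import threading


def make_tasks(request):
    """Process-manager step: turn one request into one or more tasks."""
    kinds = {
        "store": ["sort", "index", "coverage", "db_output"],
        "analysis": ["analyze"],
        "output": ["create_output"],
    }
    return [(request["id"], name) for name in kinds[request["type"]]]


def worker(task_queue, done, lock):
    """Worker: take tasks in FIFO order until a stop marker (None) arrives."""
    while True:
        task = task_queue.get()
        if task is None:
            return
        with lock:
            done.append(task)  # stand-in for the real work


def run(requests, n_workers=3):
    request_queue = queue.Queue()  # FIFO, like the request queue 241
    task_queue = queue.Queue()     # FIFO, like the task queue 243
    for request in requests:
        request_queue.put(request)
    while not request_queue.empty():
        for task in make_tasks(request_queue.get()):
            task_queue.put(task)
    done, lock = [], threading.Lock()
    workers = [
        threading.Thread(target=worker, args=(task_queue, done, lock))
        for _ in range(n_workers)
    ]
    for w in workers:
        w.start()
    for _ in workers:
        task_queue.put(None)  # one stop marker per worker
    for w in workers:
        w.join()
    return done


completed = run([{"id": 1, "type": "store"}, {"id": 2, "type": "output"}])
print(sorted(completed))
```

Each worker only pulls whatever task is next in the queue and knows nothing about the others, so changing `n_workers` changes throughput without changing the logic, which mirrors the scale-out argument in the text.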
- FIG. 8 is a flowchart showing an example of the control logic related to the preprocessing of the genome analyzing apparatus 2 according to this embodiment.
- Here, the case where the genome analysis device 2 receives genome data 11 in SAM format or BAM format will be described with reference to FIGS. 3 and 7 as appropriate.
- In step S11, the data receiving unit 201 receives the SAM-format or BAM-format genome data 11 (S11). The request issuing unit 202 then internally issues a request for storing the received genome data 11 in the storage unit 209.
- In step S12, the task control unit 204 (process manager 242) generates four tasks for the genome data 11 based on the request: a sort task, an index assignment task, a coverage calculation task, and a DB output task (S12). The generated tasks are stored in the task queue 243.
- the sort task is a task for rearranging each fragmented data of the input genome data 11 in the order of the base sequence.
- the index assignment task is a task for assigning an index to each fragmented data rearranged by the sort task. These sort task and index assignment task are tasks for speeding up the processing.
- the coverage calculation task is a task for calculating the coverage between the reference sequence (a known genome sequence) and the genome data 11.
- the DB output task is a task for outputting the calculated coverage to the storage unit 209 (coverage DB 213).
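The sort and index assignment tasks listed above can be illustrated with a small sketch. The source does not specify the index format, so the sorted list of start positions used here is an assumption, as are the function names and the `(start, end)` tuple representation of fragmented data.

```python
import bisect


def sort_fragments(fragments):
    """Sort task: order (start, end) fragments by base coordinate."""
    return sorted(fragments)


def build_index(sorted_fragments):
    """Index task (assumed form): the sorted list of start positions."""
    return [start for start, _ in sorted_fragments]


def fragments_starting_in(sorted_fragments, index, start, end):
    """Binary-search the index for fragments whose start lies in [start, end]."""
    lo = bisect.bisect_left(index, start)
    hi = bisect.bisect_right(index, end)
    return sorted_fragments[lo:hi]


frags = sort_fragments([(30, 40), (5, 12), (10, 20), (50, 60)])
idx = build_index(frags)
print(fragments_starting_in(frags, idx, 10, 30))  # [(10, 20), (30, 40)]
```

Once the fragments are sorted, a range lookup becomes two binary searches instead of a full scan, which is the kind of speed-up the text attributes to these two tasks.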
- In step S13, the preprocessing unit 205 (a plurality of worker instances 251) executes the sort process on the genome data 11 (S13), and then proceeds to step S14 to execute the index assignment process (S14).
- In step S15, the preprocessing unit 205 (a plurality of worker instances 251) executes the coverage calculation for the genome data 11 and the output to the storage unit 209 in parallel (S15).
- the genome analysis device 2 calculates the coverage of the input genome data 11 in the SAM format or BAM format and outputs it to the coverage DB 213.
- FIG. 9 is a diagram for explaining an example of the process of step S15 of FIG.
- the base coordinates and the fragmentation data of the chromosome sample X which is the genome data 11 mapped to the reference sequence of a predetermined chromosome, are simply illustrated.
- the leftmost base coordinate is 1 for convenience of explanation.
- the preprocessing unit 205 calculates the coverage when the bin size is 1 (bin_1).
- The bin size is the number of bases per unit for which coverage is calculated. That is, here, the coverage of each individual base is calculated. In the example shown in FIG. 9, the coverage of each base is 0, 0, 0, 0, 1, 2, 3, 4, 4, ...
- Next, the preprocessing unit 205 doubles the bin size and calculates the coverage for a bin size of 2 (bin_2), that is, the coverage for every two bases.
- In doing so, the coverage may be halved, that is, the average value of the coverage may be calculated.
- Such a correction is preferable because it avoids gaps between the coverage values obtained at different bin sizes.
- In the following, it is assumed that the correction of calculating the average value of coverage is performed (the same applies hereinafter). In the example shown in FIG. 9, the coverage for every two bases is 0, 0, 1.5, 3.5, 4, ...
- The preprocessing unit 205 then doubles the bin size again and calculates the coverage for a bin size of 4 (bin_4), that is, the coverage for every four bases. In the example shown in FIG. 9, the coverage for every four bases is 0, 2.5, 4, 5, 5.25, ... Thereafter, the preprocessing unit 205 repeatedly doubles the bin size and calculates the coverage. The coverage calculated in this way is output to the coverage DB 213.
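The repeated doubling with averaging described above can be sketched as follows. The function name `coverage_layers` is hypothetical, and the input uses only the first eight per-base values quoted from FIG. 9, so the layers shown are correspondingly short.

```python
def coverage_layers(per_base, max_bin_size):
    """Build averaged coverage for bin sizes 1, 2, 4, ... up to max_bin_size.

    Each value is the mean per-base coverage inside one bin, which is the
    averaging correction the text describes. A trailing partial bin is
    averaged over the bases it actually contains.
    """
    layers = {}
    size = 1
    while size <= max_bin_size:
        layers[size] = [
            sum(per_base[i:i + size]) / len(per_base[i:i + size])
            for i in range(0, len(per_base), size)
        ]
        size *= 2
    return layers


bin_1 = [0, 0, 0, 0, 1, 2, 3, 4]  # first eight per-base values from FIG. 9
layers = coverage_layers(bin_1, 8)
print(layers[2])  # [0.0, 0.0, 1.5, 3.5]
print(layers[4])  # [0.0, 2.5]
```

The bin_2 and bin_4 results match the leading values given for FIG. 9. Averaging rather than summing keeps the vertical scale comparable across layers, so a histogram does not jump in height when the viewer switches bin size.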
- FIG. 10 is a diagram illustrating an example of the coverage DB of the genome analysis apparatus according to the present embodiment.
- an example of the coverage DB 213 is indicated by a table 100 (hereinafter also referred to as “coverage table 100”).
- The attributes of the coverage table 100 include bin size 101, base coordinates 102A, coverage 102B, base coordinates 103A, coverage 103B, base coordinates 104A, coverage 104B, and so on.
- the bin size 101 is the number of base units for which coverage is calculated. In FIG. 10, for convenience of explanation, the minimum value of the bin size 101 is 512.
- the base coordinates 102A indicate the base coordinates of the coverage calculation target indicated by the coverage 102B by a combination of the start position and the end position.
- The coverage 102B is the calculated coverage. The same applies to the base coordinates 103A, the coverage 103B, the base coordinates 104A, the coverage 104B, and so on.
- the coverage of bases at coordinates 1 to 512 when the bin size is 512 is “XX”, and the coverage of bases at coordinates 4097 to 6144 when the bin size is 2048 is “ ⁇ . ⁇ ”.
- coverage and base coordinates are stored in the coverage table 100 in association with each different bin size.
- Such a coverage table 100 is generated for each chromosome and for each chromosome sample (input genome data 11). Further, the coverage for each bin size stored in the coverage table 100 is an example of the aforementioned “data for visualization of a plurality of different layers for the genome data 11”.
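A coverage table keyed by bin size and base coordinates might be read as in the sketch below. The dictionary layout, the function name `read_coverage`, and the coverage values are illustrative assumptions; only the bin sizes and the 1-based coordinate ranges follow the description of FIG. 10.

```python
def read_coverage(table, bin_size, start, end):
    """Return ((bin_start, bin_end), coverage) rows overlapping [start, end].

    table maps a bin size to a list of coverage values, one per bin,
    for a single chromosome of a single chromosome sample.
    """
    values = table[bin_size]
    first = (start - 1) // bin_size
    last = min((end - 1) // bin_size, len(values) - 1)
    return [
        ((i * bin_size + 1, (i + 1) * bin_size), values[i])
        for i in range(first, last + 1)
    ]


# Made-up coverage values; each coarser layer is the mean of the finer one.
table = {512: [3.0, 4.5, 5.0, 2.0], 1024: [3.75, 3.5], 2048: [3.625]}
print(read_coverage(table, 512, 513, 1536))
# [((513, 1024), 4.5), ((1025, 1536), 5.0)]
```

Because bins have a fixed width, the bin holding a coordinate follows from integer division, so no search is needed. With a bin size of 2048, for instance, coordinates 4097 to 6144 fall in the third bin, as in the example in the text.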
- When creating display data, the output data creation unit 207 selects and reads from the coverage table 100 the coverage of the bin size that corresponds to the chromosome, chromosome sample, and base coordinate range designated for display.
- The output data creation unit 207 creates display data for, for example, histogram display based on the coverage of the selected bin size.
- At this time, the output data creation unit 207 creates display data covering a slightly wider data range than the range currently displayed in the display area 68 (see FIG. 6). Thereby, seamless visualization can follow a change of the display range.
- When the base coordinate range to be displayed is enlarged (or reduced), the output data creation unit 207 reads the coverage with the next smaller bin size (or the next larger bin size) from the coverage table 100 and creates display data for histogram display based on that coverage. Thereby, even when the base coordinate range to be displayed is changed, the displayed coverage can be switched easily without recalculating the display data. Therefore, simple and seamless visualization is possible.
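How a layer might be chosen for a given display range can be sketched as below. The source does not state the selection rule, so the rule used here (the finest bin size that keeps the number of histogram bars under a limit) is an assumption, as are the names `select_bin_size`, `neighbor_bin_size`, and `max_bars`.

```python
def select_bin_size(start, end, bin_sizes, max_bars=1000):
    """Pick the smallest stored bin size that yields at most max_bars bins."""
    width = end - start + 1
    for size in sorted(bin_sizes):
        if width <= size * max_bars:
            return size
    return max(bin_sizes)  # range too wide: fall back to the coarsest layer


def neighbor_bin_size(current, bin_sizes, zoom_in):
    """Step to the next smaller (zoom in) or next larger (zoom out) layer."""
    sizes = sorted(bin_sizes)
    i = sizes.index(current) + (-1 if zoom_in else 1)
    return sizes[min(max(i, 0), len(sizes) - 1)]


sizes = [512, 1024, 2048, 4096]
print(select_bin_size(1, 1_000_000, sizes))          # 1024
print(neighbor_bin_size(1024, sizes, zoom_in=True))  # 512
```

Since every layer was computed during preprocessing, both functions only choose which stored layer to read; no coverage is recalculated when the user zooms.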
- the process of calculating the coverage with the bin size doubled is repeated, but this is not a limitation. For example, it may be three times or more. Further, the preprocessing unit 205 may generate data for visualization of a plurality of different layers by using an index other than the bin size.
- FIG. 11 is a diagram for explaining an example of various information DBs of the genome analysis apparatus according to the present embodiment.
- Here, annotation data, which is one kind of data stored in the various information DB 214, is described as an example.
- For simplicity, the reference sequence is illustrated as having base coordinates 1 to 99999 in total.
- Each node (bin0, bin1, bin2, ...) constituting the N-ary tree data structure 110A holds the base coordinates (start position and end position) of the node and a pointer to the intermediate data structure 110B.
- the intermediate data structure 110B holds a base coordinate position, a base length, and a pointer to the data body 110C.
- the data body 110C holds the data bodies A, B, and C of each annotation data in an arbitrary length and an arbitrary format.
- the base coordinate position, the base length, and the annotation data are stored in association with each node. That is, for each base coordinate range specified by the base coordinate position and the base length from the position, the base coordinate range and annotation data are stored in association with each other.
- N-ary tree data structure 110A an intermediate data structure 110B, and a data body 110C are generated for each chromosome and for each chromosome sample (genome data 11).
- annotation data of each base coordinate range is an example of the above-mentioned “data for visualization of a plurality of different layers for the genome data 11”.
- To display the designated chromosome, the output data creation unit 207 selects and reads, from the various information DB 214, the annotation data of the node corresponding to the chromosome sample and the base coordinate range.
- The output data creation unit 207 creates display data for displaying the annotation data of the selected node.
- The output data creation unit 207 creates display data covering a slightly wider data range than the data range currently displayed in the display area 68 (see FIG. 6). Thus, for example, even when the display range is changed by dragging the mouse (input device 34 in FIG. 4) up, down, left, or right on the display area 68, seamless visualization can be performed as the display range changes.
- The output data creation unit 207 reads the annotation data of a child node of the current node (or of the parent node of the current node) and creates display data for displaying that annotation data.
- The displayed annotation data can therefore be switched easily without recalculating the display data, so seamless visualization is possible.
- In this example, the N-ary tree data structure 110A is a ternary tree, but the present invention is not limited to this case; for example, a binary tree may be used.
- The various information DB 214 may also store data for visualization of a plurality of different layers using a data structure other than the N-ary tree data structure.
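The three-part layout described above (tree nodes 110A, intermediate records 110B, data body 110C) might be modelled roughly as follows. This is a sketch under stated assumptions: the class and function names are hypothetical, the data body is represented by a plain dictionary, and nodes are split into equal-width ranges, which the specification does not prescribe.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class IntermediateRecord:          # plays the role of intermediate data 110B
    position: int                  # base coordinate position
    length: int                    # base length from that position
    body_key: str                  # "pointer" into the data body 110C


@dataclass
class Node:                        # one node (bin) of the N-ary tree 110A
    start: int
    end: int
    records: List[IntermediateRecord] = field(default_factory=list)
    children: List["Node"] = field(default_factory=list)


def build_tree(start: int, end: int, depth: int, fanout: int = 3) -> Node:
    """Split [start, end] into `fanout` equal-width children, `depth` levels deep."""
    node = Node(start, end)
    if depth > 0:
        step = -(-(end - start + 1) // fanout)  # ceiling division
        lo = start
        while lo <= end:
            hi = min(lo + step - 1, end)
            node.children.append(build_tree(lo, hi, depth - 1, fanout))
            lo = hi + 1
    return node


def annotations_in_range(node: Node, start: int, end: int, level: int,
                         data_bodies: Dict[str, str]) -> List[str]:
    """Collect annotation bodies held `level` steps below `node` by nodes
    whose coordinate range overlaps [start, end]."""
    if node.end < start or node.start > end:
        return []
    if level == 0 or not node.children:
        return [data_bodies[r.body_key] for r in node.records]
    found: List[str] = []
    for child in node.children:
        found.extend(annotations_in_range(child, start, end, level - 1, data_bodies))
    return found
```

Moving between a parent and its children corresponds to zooming out and in. Because the tree and the intermediate records only hold keys, replacing an entry in `data_bodies` updates what is displayed without rebuilding the tree.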
- The annotation data held in the data body 110C is public gene information, such as RefSeq, acquired in advance from a public database as described above. Therefore, when the gene information in the public database or the like is updated, only the data body 110C needs to be updated, without changing the structure of the N-ary tree data structure 110A or the intermediate data structure 110B. New (improved) annotation data generated as a result of analysis performed by the analysis unit 206 using the annotation data stored in the various information DB 214 is also stored in the data body 110C, as is annotation data for the genome data 11 generated by the preprocessing of the preprocessing unit 205.
- The various information DB 214 may store, for each gene, the gene and annotation data in association with each other.
- the annotation data for each gene is an example of the “data for visualization of a plurality of different layers for the genome data 11” described above. Details will be described later with reference to FIGS.
- FIG. 12 is a flowchart showing an example of control logic related to output data creation of the genome analyzing apparatus 2 according to the present embodiment.
- An example of the processing performed when the genome analysis apparatus 2 receives the output request 13 will be described, with reference to FIGS. 3 and 7 as appropriate.
- In step S21, the request receiving unit 203 receives the output request 13 (S21). The process then proceeds to step S22, where the task control unit 204 (process manager 242) generates two tasks, a data selection task and an output data creation task, based on the request (S22). The generated tasks are stored in the task queue 243.
- The data selection task is a task for selecting and reading data from the storage unit 209 according to the description content of the output request 13.
- The description content of the output request 13 includes, for example, the chromosome and base coordinates to be displayed, an instruction to enlarge or reduce, a specification of the chromosome sample to be displayed, and a search instruction.
- The description content of the output request 13 may also include a specification of the output data format (table format or PDF format) of the report data to be output, the gene to be output, and the like.
- The output data creation task is a task for creating output data based on the data selected and read by the data selection task.
- In step S23, the output data creation unit 207 (a plurality of worker instances 271) selects and reads data from the storage unit 209 (S23).
- In step S24, output data is created based on the selected and read data (S24).
- In step S25, the data transmission unit 208 transmits the output data created by the output data creation unit 207 to the client device 3 as a response 14 to the output request 13 (S25).
- In this way, the genome analysis apparatus 2 visualizes genome information by transmitting output data related to the genome data 11 in response to the output request 13 from the client device 3.
- At that time, the visualization data of the layer corresponding to the output request 13 is selected from the storage unit 209, and the output data is created based on the visualization data of the selected layer. Therefore, in particular when the output data is display data and the display range is changed after the genome data 11 has once been displayed in a predetermined display range, visualization can be performed easily and seamlessly using the mechanism of the Web browser, without recalculating the display data.
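Steps S21 to S25 can be condensed into a short single-threaded sketch. Everything concrete here is an assumption made for illustration: the request fields (`chromosome`, `sample`, `bin_size`, `start`, `end`), the storage layout keyed by chromosome, sample, and bin size, the 0-based coordinates, and the function names. The specification itself runs the tasks on a plurality of worker instances 271, which this sketch does not model.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple

# Hypothetical storage layout: (chromosome, sample, bin_size) -> coverage per bin.
Storage = Dict[Tuple[str, str, int], List[float]]


def data_selection_task(request: dict, storage: Storage) -> List[float]:
    """S23: select and read the precomputed layer named in the request."""
    key = (request["chromosome"], request["sample"], request["bin_size"])
    return storage[key]


def output_data_creation_task(request: dict, data: List[float]) -> dict:
    """S24: cut the selected layer down to the requested coordinate range."""
    first = request["start"] // request["bin_size"]
    last = request["end"] // request["bin_size"]
    return {"type": "histogram", "bars": data[first:last + 1]}


def handle_output_request(request: dict, storage: Storage) -> dict:
    """S21-S24: queue the two tasks, then run them in order."""
    task_queue: Deque[str] = deque(["select", "create"])  # S22
    data: List[float] = []
    response: dict = {}
    while task_queue:
        task = task_queue.popleft()
        if task == "select":
            data = data_selection_task(request, storage)
        else:
            response = output_data_creation_task(request, data)
    return response  # S25 would transmit this to the client as the response
```

Note that neither task computes coverage; both only read and slice data that was stored before the request arrived.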
- FIGS. 13 to 17 are diagrams showing first to fifth specific examples of display screens provided by the genome analyzing apparatus according to the present embodiment.
- Chromosome A is entered in the chromosome designation field 61, “27,135,000” in the start position input field 62 of the base coordinate range to be displayed, “27,160,000” in the end position input field 63, and chromosome samples X, Y, and Z in the chromosome sample designation field 67.
- The coverage of the chromosome samples X, Y, and Z for chromosome A in the base coordinate range of 27,135,000 to 27,160,000 (25,000 bases) is displayed as histograms. In this way, a large number of chromosome samples can be compared on one screen.
- The output data creation unit 207 in FIG. 3 selects a coverage (see FIG. 10) whose bin size is larger than that of the coverage used in this screen display and creates display data based on the coverage of the selected bin size. As a result, the screen shifts to a screen as shown in FIG.
- The output data creation unit 207 in FIG. 3 selects a coverage (see FIG. 10) whose bin size is smaller than that of the coverage used in this screen display and creates display data based on the coverage of the selected bin size. Alternatively, the display may shift to a screen such as that of FIG. 15 or FIG. 16.
- The output data creation unit 207 in FIG. 3 may create display data indicating how each piece of fragmented data is mapped.
- The bases constituting the fragmented data of the chromosome samples X, Y, and Z are displayed in a distinguishable manner.
- The output data creation unit 207 in FIG. 3 may create display data that shows each base constituting each piece of fragmented data in a distinguishable manner.
- The reference sequence (bottom part) of chromosome A, the base sequences of predetermined fragmented data of the chromosome samples X, Y, and Z, and the annotation data are displayed in a distinguishable manner.
- The output data creation unit 207 in FIG. 3 may create display data that indicates the reference sequence, the base sequences of the fragmented data, and the annotation data in a distinguishable manner. Note that the output data creation unit 207 in FIG. 3 can also create display data in which the annotation data is distinguishable from the reference sequence and the base sequences of the fragmented data, even for a wide-area display as shown in FIGS.
- FIG. 18 and FIG. 19 are diagrams showing first and second specific examples of report data output from the genome analyzing apparatus according to the present embodiment.
- Report data 200 relating to colorectal cancer is shown.
- The report data 200 includes the columns gene name 201, chromosome position 202, exon 203, mutation 204, dbSNP 205, mutation frequency 206 of the target gene, mutation frequency 207 within the target gene, drug responsiveness 208, drug name 209, and source 210.
- The information in each column is annotation data stored in association with the gene “KRAS” in the various information DB 214 (see FIG. 3).
- The first line of the report data 200 describes that the base position in the chromosome of the gene “KRAS” indicated by the gene name 201 is “12p12.1” (chromosome position 202).
- It also describes that the frequency of occurrence of the mutation indicated by the mutation 204 is “36-40%” (mutation frequency 206) and that the mutation frequency within the target gene “KRAS” is “33.5-34.4%” (mutation frequency 207).
- “rs112445441” described in the dbSNP 205 is the identification number of the information on this gene mutation in dbSNP, a database of SNPs (single nucleotide polymorphisms). The other lines are read in the same way as the first line, so their description is omitted.
- The second specific example, shown in FIG. 19, is report data 300 relating to breast cancer.
- The report data 300 includes the columns gene name 301, chromosome position 302, exon 303, mutation 304, dbSNP 305, mutation frequency 306 of the target gene, mutation frequency 307 within the target gene, drug responsiveness 308, drug name 309, and source 310.
- The information in each column is annotation data stored in association with the gene “PIK3CA” in the various information DB 214 (see FIG. 3).
- The first line of the report data 300 describes that the base position in the chromosome of the gene “PIK3CA” indicated by the gene name 301 is “3q26.3” (chromosome position 302).
- It also describes that the frequency of occurrence of the mutation indicated by the mutation 304 is “26%” (mutation frequency 306) and that the mutation frequency within the target gene “PIK3CA” is “ ⁇ 11%” (mutation frequency 307).
- “rs12193273” described in the dbSNP 305 is the identification number of the information on this gene mutation in dbSNP, the SNP database. Since the second and third lines of the report data 300 are read in the same way as the first line, their description is omitted.
- For the mutations of the gene “PIK3CA” shown in the first to third lines of the report data 300, it is described that the two drugs “trastuzumab” and “lapatinib” indicated by the drug name 309 are used in combination and that this combination has no effect.
- The drug responsiveness 308 describes the response to the drug.
- The source of the information on drug responsiveness is a website on the Internet indicated by the source 310.
- The report data 200 and 300 described with reference to FIGS. 18 and 19 are created by the processing of the subsequent steps S22 to S24 when the output request 13 received in the processing of step S21 in FIG. is a request related to the output of report data.
- In step S23, the output data creation unit 207 (a plurality of worker instances 271) selects and reads data from the storage unit 209 (S23).
- At this time, annotation data associated with a gene, for example “KRAS” or “PIK3CA”, is selected and read.
- In step S24, report data such as the report data 200 and 300 is created based on the selected and read data (S24).
- The output data creation unit 207 may create report data such as the report data 200 and 300 shown in FIG. 18 and FIG. 19 when a gene mutation corresponding to a predetermined disease is detected as a result of the analysis by the genome analysis apparatus 2.
- An operator of the client device 3, for example a doctor, can use the created report data for medical diagnosis of a predetermined disease (for example, “colorectal cancer” or “breast cancer”).
- The annotation data for each gene stored in the various information DB 214 is not limited to the data illustrated in FIG. 18 and FIG. For example, it may be data indicating past diagnosis information related to the gene, basic experiment information, or patent document information closely related to drugs.
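Creating table-format report data from per-gene annotation data can be sketched as below. The dictionary layout, the column keys, and the function name `create_report` are assumptions made for this example; only the KRAS values quoted in the description above are used, and columns whose values are not quoted in the text (exon, mutation, drug responsiveness, drug name, source) are left out rather than guessed.

```python
from typing import Dict, List

# Hypothetical column keys, loosely following the columns of report data 200.
COLUMNS = ["gene", "chromosome_position", "dbsnp",
           "mutation_frequency", "frequency_within_gene"]

# Hypothetical per-gene annotation store, filled with the values quoted above.
ANNOTATIONS: Dict[str, List[Dict[str, str]]] = {
    "KRAS": [{
        "chromosome_position": "12p12.1",
        "dbsnp": "rs112445441",
        "mutation_frequency": "36-40%",
        "frequency_within_gene": "33.5-34.4%",
    }],
}


def create_report(gene: str, annotations: Dict[str, List[Dict[str, str]]],
                  columns: List[str] = COLUMNS) -> str:
    """Select the annotation rows stored for `gene` and format them as a table."""
    lines = [" | ".join(columns)]
    for row in annotations.get(gene, []):
        cells = [gene if col == "gene" else row.get(col, "") for col in columns]
        lines.append(" | ".join(cells))
    return "\n".join(lines)
```

A gene with no stored annotation yields a header-only table, which keeps the output format stable for the client.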
Landscapes
- Life Sciences & Earth Sciences (AREA)
- Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Biology (AREA)
- Biophysics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Biotechnology (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Spectroscopy & Molecular Physics (AREA)
- Theoretical Computer Science (AREA)
- Analytical Chemistry (AREA)
- Proteomics, Peptides & Aminoacids (AREA)
- Chemical & Material Sciences (AREA)
- Molecular Biology (AREA)
- Genetics & Genomics (AREA)
- Databases & Information Systems (AREA)
- Bioethics (AREA)
- Data Mining & Analysis (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Description
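2 Genome analysis apparatus
3 Client device
4 Network
11 Genome data
12 Analysis request
13 Output request
14 Response
201 Data receiving unit
202 Request issuing unit
203 Request receiving unit
204 Task control unit
205 Preprocessing unit
206 Analysis unit
207 Output data creation unit
208 Data transmission unit
209 Storage unit
211 File DB
212 Sequence DB
213 Coverage DB
214 Various information DB
215 Cache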
Claims (8)
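- A genome analysis apparatus that analyzes genome data consisting of a large number of fragmented genome base sequences and that transmits output data relating to the genome data in response to an output request from a client device connected via a network, the genome analysis apparatus comprising:
storage means for storing, for the genome data, data for visualization of a plurality of different layers;
request receiving means for receiving the output request from the client device; and
output data creation means for, when the display request receiving means receives the output request, selecting from the storage means the data for visualization of the layer corresponding to the output request and creating output data based on the data for visualization of the selected layer.
- The genome analysis apparatus according to claim 1, wherein the data for visualization of the plurality of different layers is coverage between the base sequence of the genome data and the base sequence of a known genome, calculated for each of different bin sizes.
- The genome analysis apparatus according to claim 2, wherein, when the output request receiving means receives the output request, the output data creation means selects from the storage means the coverage of the bin size corresponding to the output request and creates display data for displaying the selected coverage as a histogram.
- The genome analysis apparatus according to claim 1, wherein the data for visualization of the plurality of different layers associates, for each of different base coordinate ranges, the base coordinate range with annotation data.
- The genome analysis apparatus according to claim 4, wherein, when the output request receiving means receives the output request, the output data creation means selects from the storage means the annotation data of the base coordinate range specified in the output request and creates display data for displaying the selected annotation data.
- The genome analysis apparatus according to claim 1, wherein the data for visualization of the plurality of different layers associates, for each gene, the gene with annotation data.
- The genome analysis apparatus according to claim 6, wherein, when the output request receiving means receives the output request, the output data creation means selects from the storage means the annotation data of the gene specified in the output request and creates report data relating to the selected annotation data.
- A genome visualization method in a genome analysis apparatus that analyzes genome data consisting of a large number of fragmented genome base sequences, that has a storage unit for storing data relating to the genome data, and that transmits output data relating to the genome data in response to an output request from a client device connected via a network, wherein
the storage unit stores, for the genome data, data for visualization of a plurality of different layers, and
the genome visualization method comprises:
a request receiving step of receiving the output request from the client device; and
an output data creation step of, when the output request is received in the display request receiving step, selecting from the storage unit the data for visualization of the layer corresponding to the output request and creating output data based on the data for visualization of the selected layer.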
Priority Applications (5)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP16786606.0A EP3291114B1 (en) | 2015-04-30 | 2016-04-28 | Genome analysis device and genome visualization method |
US15/532,810 US10573405B2 (en) | 2015-04-30 | 2016-04-28 | Genome analysis and visualization using coverages for bin sizes and ranges of genomic base coordinates calculated and stored before an output request |
CN201680003789.3A CN107004069B (zh) | 2015-04-30 | 2016-04-28 | Genome analysis device and genome visualization method |
KR1020177017545A KR102140032B1 (ko) | 2015-04-30 | 2016-04-28 | Genome analysis device and genome visualization method |
JP2017515639A JP6593763B2 (ja) | 2015-04-30 | 2016-04-28 | Genome analysis device and genome visualization method |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2015093739 | 2015-04-30 | ||
JP2015-093739 | 2015-04-30 |
Publications (2)
Publication Number | Publication Date |
---|---|
WO2016175330A1 (ja) | 2016-11-03 |
WO2016175330A9 (ja) | 2017-05-11 |
Family
ID=57199371
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2016/063509 WO2016175330A1 (ja) | 2015-04-30 | 2016-04-28 | Genome analysis device and genome visualization method |
Country Status (6)
Country | Link |
---|---|
US (1) | US10573405B2 (ja) |
EP (1) | EP3291114B1 (ja) |
JP (1) | JP6593763B2 (ja) |
KR (1) | KR102140032B1 (ja) |
CN (1) | CN107004069B (ja) |
WO (1) | WO2016175330A1 (ja) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
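WO2022024221A1 (ja) | 2020-07-28 | 2022-02-03 | 株式会社テンクー | Program, learning model, information processing device, information processing method, and method for generating a learning model |
JP2022180553A (ja) * | 2018-06-29 | 2022-12-06 | シスメックス株式会社 | Analysis method, information processing device, and report provision method |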
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
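JP7320345B2 (ja) * | 2017-10-27 | 2023-08-03 | シスメックス株式会社 | Gene analysis method, gene analysis device, gene analysis system, program, and recording medium |
CN108052800A (zh) * | 2017-12-19 | 2018-05-18 | 石家庄铁道大学 | Visual reconstruction method and terminal for the transmission process of an infectious virus |
CN109326330B (zh) * | 2018-08-30 | 2020-10-16 | 武汉古奥基因科技有限公司 | Method and device for producing a bioinformatics analysis tool, and storable medium |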
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH08110909A (ja) * | 1994-10-13 | 1996-04-30 | Hitachi Ltd | Sequence search method and device |
JP2002091991A (ja) * | 2000-09-20 | 2002-03-29 | Intec Web & Genome Informatics Corp | Gene network research support system and method |
JP2005234697A (ja) * | 2004-02-17 | 2005-09-02 | Hitachi Software Eng Co Ltd | Gene information display method and display device |
JP2011229817A (ja) * | 2010-04-30 | 2011-11-17 | Hitachi Aloka Medical Ltd | Ultrasonic diagnostic apparatus |
WO2013024810A1 (ja) * | 2011-08-12 | 2013-02-21 | 株式会社モーションラボ | High-speed computing device, high-speed computing program, recording medium storing the high-speed computing program, equipment control system, and simulation system |
JP2013126131A (ja) * | 2011-12-15 | 2013-06-24 | Toyota Motor Corp | Radio noise removal device |
JP2014505935A (ja) * | 2010-12-29 | 2014-03-06 | ダウ アグロサイエンシィズ エルエルシー | Data analysis method for DNA sequences |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101949569B1 (ko) | 2011-12-08 | 2019-02-18 | 파이브3 제노믹스, 엘엘씨 | Distributed system providing dynamic indexing and visualization of genomic data |
-
2016
- 2016-04-28 EP EP16786606.0A patent/EP3291114B1/en active Active
- 2016-04-28 WO PCT/JP2016/063509 patent/WO2016175330A1/ja active Application Filing
- 2016-04-28 US US15/532,810 patent/US10573405B2/en active Active
- 2016-04-28 CN CN201680003789.3A patent/CN107004069B/zh active Active
- 2016-04-28 KR KR1020177017545A patent/KR102140032B1/ko active IP Right Grant
- 2016-04-28 JP JP2017515639A patent/JP6593763B2/ja active Active
Non-Patent Citations (1)
Title |
---|
See also references of EP3291114A4 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
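JP2022180553A (ja) * | 2018-06-29 | 2022-12-06 | シスメックス株式会社 | Analysis method, information processing device, and report provision method |
JP7399238B2 (ja) | 2018-06-29 | 2023-12-15 | シスメックス株式会社 | Analysis method, information processing device, and report provision method |
WO2022024221A1 (ja) | 2020-07-28 | 2022-02-03 | 株式会社テンクー | Program, learning model, information processing device, information processing method, and method for generating a learning model |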
Also Published As
Publication number | Publication date |
---|---|
EP3291114B1 (en) | 2024-01-17 |
JP6593763B2 (ja) | 2019-10-23 |
EP3291114A4 (en) | 2018-12-26 |
KR20170087508A (ko) | 2017-07-28 |
CN107004069A (zh) | 2017-08-01 |
US20170372003A1 (en) | 2017-12-28 |
KR102140032B1 (ko) | 2020-07-31 |
WO2016175330A9 (ja) | 2017-05-11 |
EP3291114A1 (en) | 2018-03-07 |
JPWO2016175330A1 (ja) | 2018-03-29 |
CN107004069B (zh) | 2021-12-03 |
US10573405B2 (en) | 2020-02-25 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 16786606 Country of ref document: EP Kind code of ref document: A1 |
|
REEP | Request for entry into the european phase |
Ref document number: 2016786606 Country of ref document: EP |
|
WWE | Wipo information: entry into national phase |
Ref document number: 15532810 Country of ref document: US |
|
ENP | Entry into the national phase |
Ref document number: 20177017545 Country of ref document: KR Kind code of ref document: A |
|
ENP | Entry into the national phase |
Ref document number: 2017515639 Country of ref document: JP Kind code of ref document: A |
|
NENP | Non-entry into the national phase |
Ref country code: DE |