WO2015198074A1 - Methods, applications and systems for processing and presenting gene sequencing information - Google Patents

Publication number
WO2015198074A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
samples
sample
recited
histology
Application number
PCT/GB2015/051880
Other languages
French (fr)
Inventor
Jennifer BECQ
Andrew Warren
Keira CHEETHAM
Original Assignee
Illumina Cambridge Limited
Application filed by Illumina Cambridge Limited

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00 ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B45/00 ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks

Definitions

  • the disclosed embodiments concern methods, apparatus, systems and computer program products for presenting sequence information.
  • this includes obtaining a first sequence and a second sequence, determining a similarity between the first sequence and the second sequence, wherein the similarity is based upon distance between the first sequence and the second sequence, and displaying a block at an intersection point on a matrix plot based on the similarity between the first sequence and the second sequence.
  • Figure 1 illustrates a block diagram of an example of a system that comprises computer applications for processing and presenting gene sequencing information;
  • Figures 2 through 11 show an example of a control panel of the GUI of a space/time application of the system of Figure 1;
  • Figures 12 through 25 show examples of presenting selected information using the space/time application of the system of Figure 1;
  • Figures 26 through 29 show an example of a control panel of the GUI of a hierarchical clustering tree application of the system of Figure 1;
  • Figures 30 through 40 show examples of presenting selected information using the hierarchical clustering tree application of the system of Figure 1;
  • Figures 41 through 43 show examples of presenting sequencing information using the scatter plots application of the system of Figure 1;
  • Figures 44 through 47 show examples of presenting gene sequencing information using the matrix plot application of the system of Figure 1.
  • a system includes a space/time application that provides a map of a sampled organ (e.g., the esophagus) in which each sample is plotted according to level (e.g., depth or length) in centimeters (cm) along the organ.
  • the user has the option to select samples from specified levels, dates, and histological grades and then view those samples in a variety of formats, such as a Manhattan-chart-like format.
  • plots are provided in which the x-axis represents the position along the genome with the chromosomes labeled and separated by lines.
  • the y-axis may represent the variant allele frequency (VAF) (or allele frequency) of each variant for that sample.
  • the individual data points represent variants and may be color-coded according to their consequence provided by annotation.
  • the system includes a tree application that provides, for example, a binary tree, meaning that similar samples are grouped together in pairs, not in clusters.
  • a group may be paired with a sample or another group of pairs. How samples are paired depends on the distance measure, which can be one of binary, maximum, Manhattan, Euclidean or the Pearson correlation, among others.
  • the system may include a scatter plots application that provides, for example, a grid-like representation of samples at different levels and different time points. When selecting two sample rectangles, a scatter plot is shown of the allele frequencies of all variants between those two samples.
  • the samples may be color coded by histology.
  • the variants may be color coded by their consequence according to annotation.
  • the system includes a matrix plot application that provides, for example, a visualization of the correlation between samples by drawing a grid with the samples on the x- and y-axes and a square at each intersection point. Each square may be shown in a darker color if the samples are closely related and in a lighter color if the samples are not closely related.
  • the distance measure can be one of binary, maximum, Manhattan, Euclidean or the Pearson correlation, among others.
  • the grid may be sorted by date, level, and histology, among others.
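The distance measures listed above are standard; a minimal sketch of how they could be computed between two samples follows. It assumes each sample is a list of VAF values over the same ordered targets. The "binary" measure follows the convention of R's `dist(method = "binary")`, the Pearson measure is taken as 1 - r, and all function names are hypothetical rather than taken from the described system.

```python
import math
from typing import Callable, Dict, Sequence

Vector = Sequence[float]


def binary_distance(x: Vector, y: Vector) -> float:
    # Among targets that are non-zero in at least one sample, the fraction
    # that are non-zero in exactly one of them (R's "binary" convention).
    informative = [(a != 0, b != 0) for a, b in zip(x, y) if a != 0 or b != 0]
    if not informative:
        return 0.0
    return sum(1 for a, b in informative if a != b) / len(informative)


def maximum_distance(x: Vector, y: Vector) -> float:
    # Largest absolute VAF difference at any single target.
    return max(abs(a - b) for a, b in zip(x, y))


def manhattan_distance(x: Vector, y: Vector) -> float:
    return sum(abs(a - b) for a, b in zip(x, y))


def euclidean_distance(x: Vector, y: Vector) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson_distance(x: Vector, y: Vector) -> float:
    # 1 - Pearson correlation: 0 for perfectly correlated samples, 2 for anti-correlated ones.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("Pearson correlation is undefined for a constant sample")
    return 1.0 - cov / (sx * sy)


# Lookup table so a GUI control can select the measure by name.
DISTANCES: Dict[str, Callable[[Vector, Vector], float]] = {
    "binary": binary_distance,
    "maximum": maximum_distance,
    "manhattan": manhattan_distance,
    "euclidean": euclidean_distance,
    "pearson": pearson_distance,
}
```

In this sketch, `DISTANCES["euclidean"](vaf_a, vaf_b)` would give the value that both a tree view and a matrix view could share when "Euclidean" is selected.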
  • Figure 1 illustrates a block diagram of an example of a system 100 that, in accordance with some embodiments, comprises computer applications for processing and presenting sequencing information, wherein the sequencing data is presented interactively.
  • System 100 includes a computing device 110.
  • Computing device 110 can be any computing device capable of running software applications and optionally connecting to a network.
  • Computing device 110 can be, for example, a server, a desktop computer, a laptop computer, a tablet device (e.g., Apple iPad, Samsung Galaxy Tab 2, Microsoft™ Surface Pro 3), a smart mobile phone, and the like.
  • One or more applications 115 are installed on computing device 110.
  • the one or more applications 115 are provided for processing and presenting sequencing information, such as the information stored in sequencing data 120.
  • DNA sequencing is the procedure of determining the order of nucleotides in a DNA section.
  • stored in sequencing data 120 is a collection of nucleic acid sequences, protein sequences, or other polymer sequences.
  • stored in sequencing data 120 is a variant call, i.e., a call at an individual base or group of bases made against a reference sequence.
  • the variant call may be given as an allele frequency.
  • the allele frequency may be a continuous value from 0 to 1, or may be a binary 0 or 1.
  • sequencing data 120 may also include relative expression of mRNA, which may be any continuous variable, negative or positive, or log-transformed.
  • sequencing data 120 resides locally at computing device 110.
  • sequencing data 120 resides external to computing device 110 and is accessed via a network, such as a network 180.
  • Network 180 can be, for example, any local area network (LAN) or wide area network (WAN).
  • sequencing data 120 can reside both locally at computing device 110 and external to computing device 110.
  • applications 115 include at least one of a space/time application 130 that has a certain graphical user interface (GUI) 135, a tree application 140 that has a certain GUI 145, a scatter plots application 150 that has a certain GUI 155, and a matrix plot application 160 that has a certain GUI 165.
  • applications 115 may include any combination of space/time application 130, phylogeny tree application 140, scatter plots application 150, and matrix plot application 160.
  • algorithms 170 may reside on computing device 110 for supporting space/time application 130, tree application 140, scatter plots application 150, and/or matrix plot application 160.
  • algorithms 170 include, but are not limited to, a binary distance measure algorithm, a maximum distance measure algorithm, a Manhattan distance measure algorithm, a Euclidean distance measure algorithm, a Pearson distance measure algorithm, a hierarchical clustering algorithm, a neighbor joining algorithm, any other method for calculating correlations of allele frequencies, means of plotting phylogenetic trees, means of plotting graphs, or methods for selecting/dropping/recalculating data.
  • GUI 135 of space/time application 130, GUI 145 of tree application 140, GUI 155 of scatter plots application 150, and GUI 165 of matrix plot application 160 can be presented to the user in, for example, an internet browser (not shown) on computing device 110.
  • Space/time application 130 and its GUI 135, tree application 140 and its GUI 145, scatter plots application 150 and its GUI 155, and matrix plot application 160 and its GUI 165 can be implemented using any programming code.
  • the applications and GUIs may be implemented using JavaScript, such as, but not limited to, the D3.js JavaScript library, the NJS JavaScript Interpreter, and the Phylogenetic tree JavaScript (jsPhyloSVG), which is an open-source JavaScript library.
  • Space/time application 130, tree application 140, scatter plots application 150, and matrix plot application 160 are computer applications for processing and presenting, for example, sequencing data 120 in an interactive fashion.
  • space/time application 130 provides a map of the sampled organ (e.g., the esophagus) in which each sample is plotted according to location (e.g., depth or length) in centimeters (cm) along the organ. In some embodiments, this may include a 2D or 3D map or plot, if appropriate coordinates are available. The user has the option to select samples from specified levels, dates, and histological grades and then view those samples in a Manhattan-chart-like format.
  • plots are provided in which the x-axis represents the location along the reference genome, which, in some embodiments, may be the human genome, with the chromosomes labeled and separated by lines, and the data points may be plotted in order by genomic coordinates.
  • the y-axis represents the variant allele frequency (VAF) (or allele frequency) of each variant for that sample.
  • the individual data points represent variants and are color-coded according to their consequence provided by annotation. Aspects of space/time application 130 are shown and described herein with reference to Figures 2 through 25.
  • tree application 140 provides, for example, a binary tree, meaning that similar samples are grouped together in pairs, not in clusters.
  • a group may be paired with a sample or another group of pairs. How samples are paired depends on the distance measure, which may include binary, maximum, Manhattan, Euclidean or the Pearson correlation, among others.
  • the distance between pairs is represented by the horizontal distance between the pair and their parent node.
  • Each node stores a list of variants that have allele frequencies above 0.01 in all children. This list can be viewed as, for example, a table by clicking on a sample or node. More details of tree application 140 are shown and described herein with reference to Figures 26 through 40.
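The pairing described above can be illustrated with agglomerative (bottom-up) hierarchical clustering that always joins exactly two nodes, which yields a binary tree. This is a minimal sketch: the average-linkage rule, the Euclidean default distance, and the `Node`/`build_tree` names are assumptions; only the 0.01 threshold for each node's variant list comes from the text.

```python
import math
from dataclasses import dataclass
from itertools import combinations
from typing import Dict, FrozenSet, Tuple


@dataclass(frozen=True)
class Node:
    name: str
    leaves: Tuple[str, ...]
    shared_variants: FrozenSet[str]  # variants with AF above the threshold in all children
    children: Tuple["Node", ...] = ()
    height: float = 0.0              # distance at which the two children were joined


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def build_tree(samples: Dict[str, Dict[str, float]], distance=euclidean,
               threshold: float = 0.01) -> Node:
    # samples maps sample name -> {variant id: allele frequency}
    variant_ids = sorted({v for afs in samples.values() for v in afs})
    vectors = {s: [afs.get(v, 0.0) for v in variant_ids] for s, afs in samples.items()}
    leaf_dist = {}
    for a, b in combinations(samples, 2):
        leaf_dist[a, b] = leaf_dist[b, a] = distance(vectors[a], vectors[b])

    def linkage(n1: Node, n2: Node) -> float:
        # Average linkage: mean distance over all leaf pairs across the two nodes.
        ds = [leaf_dist[a, b] for a in n1.leaves for b in n2.leaves]
        return sum(ds) / len(ds)

    nodes = [
        Node(s, (s,), frozenset(v for v, af in afs.items() if af > threshold))
        for s, afs in samples.items()
    ]
    while len(nodes) > 1:
        # Join the closest pair: two samples, a sample and a group, or two groups.
        i, j = min(combinations(range(len(nodes)), 2),
                   key=lambda ij: linkage(nodes[ij[0]], nodes[ij[1]]))
        a, b = nodes[i], nodes[j]
        merged = Node(f"({a.name},{b.name})", a.leaves + b.leaves,
                      a.shared_variants & b.shared_variants, (a, b), linkage(a, b))
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [merged]
    return nodes[0]
```

Here `height` is one possible source for the horizontal distance between a pair and its parent node, and `shared_variants` is the list that could be shown when a node is clicked.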
  • scatter plots application 150 provides, for example, a grid-like representation of samples at different levels and different time points. When selecting two sample rectangles, a scatter plot is shown of the allele frequencies of all variants in those two samples. The samples are color coded by histology. The variants are color coded by consequence of the variant. The scatter plot may be filtered to show only variants with a certain consequence, e.g., non-synonymous coding. Variants may be selected in the plot to build a table of variants. More details of scatter plots application 150 are shown and described herein with reference to Figures 41 through 43.
  • matrix plot application 160 provides, for example, a visualization of the similarity between samples by drawing a grid with the samples on the x- and y-axis. The intersection point is shown in a darker color if the samples are closely related and a lighter color if the samples are not closely related. The same distance measures as are in tree application 140 are available in matrix plot application 160.
  • the grid may be sorted by date, level, and histology. Selecting a square will show the scatter plot of variant allele frequencies for those two samples. More details of matrix plot application 160 are shown and described below with reference to Figures 44 through 47.
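A minimal sketch of the matrix plot logic: order the samples by a chosen key, compute all pairwise distances, and map each distance to a grey level so that closely related pairs are darker. The input layout (dicts with "name", sort fields and "vafs"), the linear grey scale and the function names are assumptions for illustration.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def manhattan(x: Sequence[float], y: Sequence[float]) -> float:
    return sum(abs(a - b) for a, b in zip(x, y))


def matrix_plot_cells(samples: List[Dict], sort_key: Callable[[Dict], object],
                      distance=manhattan) -> Tuple[List[str], List[List[str]]]:
    """Return axis labels and a grid of grey colours, darkest for the most similar pairs.

    Each sample is assumed to be a dict with "name", sort fields such as "date",
    "level" or "histology", and "vafs" (VAF values over the same ordered targets).
    """
    ordered = sorted(samples, key=sort_key)
    dist = [[distance(a["vafs"], b["vafs"]) for b in ordered] for a in ordered]
    max_d = max(max(row) for row in dist) or 1.0  # avoid dividing by zero

    def grey(d: float) -> str:
        # 0 -> black (identical samples), max_d -> white (least related pair).
        level = round(255 * d / max_d)
        return f"#{level:02x}{level:02x}{level:02x}"

    return [s["name"] for s in ordered], [[grey(d) for d in row] for row in dist]
```

Sorting by level and then date would be, for example, `sort_key=lambda s: (s["level"], s["date"])`.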
  • Figures 2 through 11 show an example of a control panel 210 of GUI 135 of space/time application 130 of system 100 of Figure 1 in accordance with some embodiments.
  • 96 samples were taken from a patient who has, for example, Barrett's esophagus.
  • Barrett's esophagus is a condition in which the cells of the lower esophagus become damaged, usually from repeated exposure to stomach acid.
  • the 96 samples were taken along the patient's esophagus over a period of time using known techniques, such as endoscopic mucosal resection (EMR), among others.
  • the 96 samples are sequenced at specific locations and the processed targeted sequencing data is stored in sequencing data 120 of system 100.
  • Space/time application 130 of system 100 is an example of a bioinformatics tool that is designed and constructed using, for example, the D3.js JavaScript library, to allow interrogation and exploration of the TruSeq Custom Amplicon (TSCA) sequencing data in accordance with some embodiments.
  • space/time application 130 loads three data files (in JSON format): the sample metadata (date of sampling, location down the esophagus, histology report, DNA concentration, and TSCA average depth), the TSCA targets metadata (genomic location, base change and annotation, including gene and consequence when relevant) and the variant allele frequency (VAF) (or allele frequency) values for all targets in all samples.
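The three JSON inputs could be joined into one record per sample roughly as follows. This is a minimal sketch; the field names ("id", "gene", "level") and the nesting of the VAF file are hypothetical, since the text names the contents of each file but not its schema.

```python
import json
import math
from typing import Any, Dict


def load_dataset(samples_json: str, targets_json: str, vafs_json: str) -> Dict[str, Any]:
    """Join sample metadata, target metadata and VAF values (hypothetical schema).

    samples_json: [{"id": "S28", "date": "...", "level": "31", ...}, ...]
    targets_json: [{"id": "chr7_154593906_A_C", "gene": "...", "consequence": "..."}, ...]
    vafs_json:    {"S28": {"chr7_154593906_A_C": 0.31, ...}, ...}
    """
    samples = json.loads(samples_json)
    targets = json.loads(targets_json)
    vafs = json.loads(vafs_json)  # Python's json module accepts the NaN token
    dataset = {}
    for sample in samples:
        sample_vafs = vafs.get(sample["id"], {})
        variants = []
        for target in targets:
            value = sample_vafs.get(target["id"])
            # Missing, null or NaN values are plotted as 0.
            if value is None or (isinstance(value, float) and math.isnan(value)):
                value = 0.0
            variants.append({**target, "vaf": value})
        dataset[sample["id"]] = {"meta": sample, "variants": variants}
    return dataset
```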
  • VAF is calculated as the number of reads used by the variant caller that support the variant allele divided by the total number of reads used by the variant caller at that position (i.e., reads supporting the variant allele plus reads matching the reference base).
  • the variant caller, which may be Strelka in some embodiments, may not use all reads. Contextual information may be used, such as reads in a matching normal sample, base and mapping quality, and depth, among others. Individual positions with read depth < 20 are treated as having zero depth, and NaN values are converted to 0.
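A minimal sketch of the VAF rule and the clean-up steps described above, assuming the per-position counts of caller-used reads supporting the variant and reference alleles are already available. The function names and the `min_depth` parameter are hypothetical; the threshold of 20 comes from the text.

```python
import math
from typing import Optional

MIN_DEPTH = 20  # positions with fewer reads are treated as zero depth


def variant_allele_frequency(variant_reads: int, reference_reads: int,
                             min_depth: int = MIN_DEPTH) -> float:
    # Depth here counts only reads the variant caller actually used.
    depth = variant_reads + reference_reads
    if depth < min_depth:
        return 0.0  # zero depth means no evidence, so the value is plotted as 0
    return variant_reads / depth


def clean_vaf(value: Optional[float]) -> float:
    # Missing or NaN values are converted to 0.
    if value is None or math.isnan(value):
        return 0.0
    return value
```

With 6 variant reads and 14 reference reads, for example, the VAF is 6 / 20 = 0.3, whereas 3 and 10 reads (depth 13) fall below the threshold and give 0.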
  • Space/time application 130 allows three main interactive explorations of the data: (1) a location (or level) plot of the samples; (2) visualization of TSCA data for individually selected samples in a Manhattan-like plot, where the Y-axis displays the VAF values of the targets within each given sample ordered on the X-axis according to genomic coordinate; and (3) a pairwise scatterplot of VAF values between two selected samples.
  • the location (or level) plot display is dynamic and interactive.
  • although Figures 2 through 11 depict an example of 96 samples and of processing targeted sequencing data (e.g., about 1,500 targeted variants of a genome), it may be more typical to collect 5-10 samples from the same patient and use space/time application 130 to process the whole genome instead of a targeted subset.
  • space/time application 130 may be used to process sequencing data from multiple patients that have, for example, the same cancer, wherein the sequencing data from the multiple patients is combined in sequencing data 120.
  • a mixture of data types and patients may also be used, as long as such data types can be matched up by genomic coordinates.
  • control panel 210 is subdivided into multiple panels.
  • control panel 210 includes a histology panel 212, a dates panel 214, and a samples panel 216.
  • Dates panel 214 includes a list of all dates on which a sample was acquired from the patient. The user may select/deselect one or more dates in dates panel 214. Dates panel 214 also includes an all dates pushbutton (PB) 215 for selecting/deselecting all of the dates in dates panel 214.
  • the border of each of the dates in dates panel 214 can be color coded to impart information. For example, any date on which EMR treatment was performed is bordered in yellow, while any date on which radiofrequency ablation (RFA) was performed is bordered in red/orange.
  • Histology panel 212 includes a GO PB 218. Histology panel 212 also includes a histology key 220 that indicates a plurality of histology grades as well as an all histologies PB 221. Each of the histology grades is represented by a color-coded pushbutton for selecting/deselecting the histology grade. One or more histology grades can be selected at one time. In histology key 220, the pushbuttons that are selected are enlarged as compared with those that are not selected. Using all histologies PB 221 in histology key 220, all histology grades can be selected/deselected. An example of the histology grades in histology key 220 is shown in Table 1.
  • Control panel 210 of GUI 135 also includes a toggle button 222.
  • Toggle button 222 is used to toggle between a prevailing histology grade and a highest histology grade. For example, a sample may be reported as 20% HGD, 5% cancer. Cancer is the highest histology grade present, while HGD is the prevailing grade. Toggle button 222 may be used to display the colors according to preference. In some embodiments, histology grades may be sorted from least severe to most severe.
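The difference between the prevailing and the highest grade can be sketched as follows, assuming histology reports are strings like "20%HGD, 5%cancer". The severity order IM < LGD < HGD < cancer is an assumption standing in for the full key in Table 1, and the function names are hypothetical.

```python
import re
from typing import Dict

# Hypothetical ordering, least to most severe; the full key is given in Table 1.
SEVERITY = ["IM", "LGD", "HGD", "cancer"]


def parse_report(report: str) -> Dict[str, float]:
    # "20%HGD, 5%cancer" -> {"HGD": 20.0, "cancer": 5.0}
    return {grade: float(pct)
            for pct, grade in re.findall(r"(\d+(?:\.\d+)?)\s*%\s*([A-Za-z]+)", report)}


def prevailing_grade(report: str) -> str:
    # Grade covering the largest share of the sample.
    parts = parse_report(report)
    return max(parts, key=parts.get)


def highest_grade(report: str) -> str:
    # Most severe grade present in any amount (ValueError for grades not in SEVERITY).
    return max(parse_report(report), key=SEVERITY.index)
```

The toggle would then simply switch which of the two functions supplies the color for each sample block.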
  • Samples panel 216 includes a sketch (or image) 224 of the sampled organ.
  • sketch 224 depicts the esophagus, wherein the esophagus is shown in a horizontal orientation within samples panel 216.
  • Sketch 224 is hereafter called esophagus sketch 224.
  • one end of esophagus sketch 224 is the end of the esophagus oriented toward the patient's throat, while the other end of esophagus sketch 224 is the end of the esophagus oriented toward the patient's stomach.
  • a level scale 226 is provided along the length of esophagus sketch 224.
  • level scale 226 indicates physical positions or distances (e.g., in cm) along the length of the esophagus.
  • level scale 226 is numbered 24 through 38, meaning the 24-cm position through the 38-cm position (or distance or depth) along the length of the esophagus.
  • D of level scale 226 indicates the patient's duodenum and ?? of level scale 226 indicates samples for which the level was not known.
  • Each sample taken at any position along level scale 226 is indicated by a sample block 228.
  • Figure 3 shows 96 sample blocks 228 that correlate to 96 samples taken for the patient over a period of time.
  • Each of the sample blocks 228 is color coded according to histology key 220 of histology panel 212.
  • the sample blocks 228 are presented in a stacked fashion. Accordingly, the sample blocks 228 are presented in bar graph fashion in samples panel 216.
  • a string sponge icon 230 is depicted in samples panel 216 with respect to esophagus sketch 224.
  • sponge icon 230 represents two sponge devices that are swallowed by the patient for collecting two samples that are not assigned a level.
  • a cytosponge may be used as part of a clinical trial. The patient may swallow the cytosponge, which is then pulled back through the mouth by a string. Cells stick to the cytosponge, so it samples along the esophagus as it moves. The sequenced sample consists of the cells attached to the sponge; the string is the mechanism for retrieving it.
  • Samples panel 216 also includes a sort by date PB 232 and a sort by histology PB 234.
  • sort by date PB 232 is used to sort sample blocks 228 by date
  • sort by histology PB 234 is used to sort sample blocks 228 by histology grade.
  • each location along level scale 226 is selectable. Namely, the user can select one or more locations along level scale 226 and view only those samples at the selected levels.
  • In this example, in each of the 96 samples, about 1,500 mutations of the genome were targeted.
  • a popup window 238 displays the consequence for that block.
  • One or more blocks in consequences key 236 can be selected at one time.
  • An all consequences PB 237 is provided for selecting all consequences in consequences key 236.
  • a popup window 240 displays the metadata for that sample block 228.
  • Figure 5 shows that level 34 along level scale 226 is selected. In this example, only those sample blocks 228 collected at level 34 are displayed. To show all sample blocks 228 collected at level 34, all dates PB 215 and all histologies PB 221 are selected. Further, sort by date PB 232 is selected, which simply orders the stack of sample blocks 228 by date (e.g., earliest date on bottom, latest date on top).
  • Figure 6 shows the same selections, but with sort by histology PB 234 selected instead of sort by date PB 232, which simply reorders the stack of sample blocks 228 at level 34 by histology grade rather than by date.
  • the all levels PB 225 is selected, which shows sample blocks 228 at all the levels at which at least one sample was collected.
  • the level numbers at which samples exist are highlighted.
  • all dates PB 215, all histologies PB 221, and sort by date PB 232 are selected.
  • the all levels PB 225 is selected as shown in Figure 7, but filtered now by date using dates panel 214. Namely, all histologies PB 221 remains selected, but all dates PB 215 is not selected. Instead one specific date in dates panel 214 is selected. In so doing, only those sample blocks 228 collected on the selected date are shown and all other sample blocks 228 are faded into the background.
  • sample blocks 228 are filtered by histology grade using histology key 220 in histology panel 212. Namely, all dates PB 215 is selected and, for example, the HGD PB in histology key 220 is selected. In so doing, only those sample blocks 228 categorized into the HGD histology grade are shown and all other sample blocks 228 are faded into the background.
  • sample blocks 228 are filtered by both date and histology grade. For example, one specific date in dates panel 214 is selected and the HGD PB in histology key 220 is selected. In so doing, only those sample blocks 228 collected on the selected date and categorized into the HGD histology grade are shown and all other sample blocks 228 are faded into the background. More than one date can be selected at time and more than one histology grade can be selected at a time, as shown in Figure 11.
  • sample blocks 228 are filtered by two dates in dates panel 214 and two histology grades (e.g., IM and HGD) in histology key 220. In so doing, only those sample blocks 228 collected on the two dates and categorized into the two histology grades are shown and all other sample blocks 228 are faded into the background.
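The stacking, sorting and fading behavior described above can be sketched as a single function that returns the highlighted and faded blocks at one level. The `SampleBlock` fields, the use of `None` to stand for the "all" pushbuttons, and the severity order are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple

SEVERITY = ["IM", "LGD", "HGD", "cancer"]  # hypothetical order, least to most severe


@dataclass(frozen=True)
class SampleBlock:
    name: str
    date: str       # ISO 8601 date, so string order equals chronological order
    level: str      # e.g. "34", or "??" when the level is unknown
    histology: str


def stack_at_level(blocks: Iterable[SampleBlock], level: str,
                   dates: Optional[Set[str]] = None,
                   grades: Optional[Set[str]] = None,
                   sort_by: str = "date") -> Tuple[List[SampleBlock], List[SampleBlock]]:
    """Split the stack at one level into highlighted and faded blocks.

    None for dates or grades stands for the "all dates" / "all histologies" pushbutton.
    """
    stack = [b for b in blocks if b.level == level]
    if sort_by == "date":
        stack.sort(key=lambda b: b.date)                      # earliest at the bottom
    else:
        stack.sort(key=lambda b: SEVERITY.index(b.histology))  # least severe at the bottom

    def selected(b: SampleBlock) -> bool:
        # A block is highlighted only if it matches every active filter.
        return (dates is None or b.date in dates) and (grades is None or b.histology in grades)

    return [b for b in stack if selected(b)], [b for b in stack if not selected(b)]
```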
  • Figures 12 through 25 show examples of presenting selected information using space/time application 130 of system 100 of Figure 1.
  • "pipe" charts, scatter plots, and tables that are based on user-selections made in histology panel 212, dates panel 214, and samples panel 216 of control panel 210 are generated using space/time application 130 of system 100 of Figure 1 and presented to the user in GUI 135.
  • level 31 of level scale 226 is selected, sponge icon 230 is selected, all dates PB 215 is selected, all consequences PB 237 is selected, and three histology grades (e.g., LGD, IM, and HGD) in histology key 220 are selected.
  • level 31 of level scale 226 means the 31-cm position (or distance or depth) of the esophagus. Having made these selections, five sample blocks 228 are presented in a stack at level 31 of level scale 226 in samples panel 216. Namely, samples 28, 29, 30, 31, and 60 are presented. These samples are hereafter called S28, S29, S30, S31, and S60, respectively.
  • any information in sequencing data 120 that corresponds to S28, S29, S30, S31, and S60 is retrieved and then presented to the user in GUI 135.
  • a set of pipe charts 250 is automatically generated that correspond to the selected samples (e.g., a pipe chart for each of S28, S29, S30, S31, and S60)
  • two pipe charts 260 are automatically generated that correspond to two samples represented by sponge icon 230.
  • one pipe chart 260 is for S73 and the other pipe chart 260 is for S74, wherein S73 and S74 are samples collected using the two sponge devices.
  • pipe charts 250 are rendered just below control panel 210 and pipe charts 260 are rendered just below pipe charts 250.
  • control panel 210, pipe charts 250, and pipe charts 260 (i.e., GUI 135) can be rendered in an Internet browser. If sponge icon 230 is not selected when the user selects GO PB 218, only pipe charts 250 for the samples are generated and rendered, and not pipe charts 260.
  • Figures 13 and 14 show more details of pipe charts 250. The details of pipe charts 250 are likewise applicable to pipe charts 260.
  • the pipe charts may be shown in multiple "250" panels. For example, if one selects level 31 and level 36, a first row of pipe charts shows all samples of level 31, then a second row shows all samples of level 36. Samples from the same level are surrounded by a black-bordered line, and a new row is started according to the width of the browser. For example, if level 25 and level 27 are selected on a wide screen, all four pipe charts may appear on the same row; on a narrow screen, the two levels will be on different rows. Samples may be ordered by date from left to right within a panel.
  • each pipe chart 250 includes a title bar 252 that shows, for example, the sample collection date, the level, and the histology grade.
  • Title bar 252 is color coded according to the histology grade of the sample. For example, if the histology grade is IM, the color of title bar 252 is yellow.
  • a scatter plot icon 254 and a delete/add icon 256 are also on title bar 252. Using scatter plot icon 254, a scatter plot of the sample (against another sample) is automatically generated, as described in more detail with reference to Figures 15 and 16.
  • Delete/add icon 256 toggles between a minus (-) sign and a plus (+) sign.
  • clicking on the minus (-) sign causes a "Delete Samples" button to appear between panels 210 and 250 and schedules the pipe chart to disappear from panel 250 once "Delete Samples" is clicked, while clicking on the plus (+) sign cancels the scheduled removal.
  • Data is not deleted, just removed from the view. This may allow better visualization of the data by removing the panels of samples that look odd or have bad quality, etc. after having visually inspected them.
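The minus/plus scheduling and the "Delete Samples" button amount to a small piece of view state in which the underlying data is never modified. A minimal sketch, with a hypothetical `PipeChartView` class:

```python
from typing import Iterable, List, Set


class PipeChartView:
    """Tracks which pipe charts are shown; the underlying sample data is never deleted."""

    def __init__(self, sample_names: Iterable[str]):
        self.data: List[str] = list(sample_names)  # all loaded samples, left untouched
        self.hidden: Set[str] = set()              # removed from the view only
        self.scheduled: Set[str] = set()           # marked with the minus (-) icon

    def toggle(self, name: str) -> None:
        # Minus (-) schedules the chart for removal; plus (+) cancels the schedule.
        if name in self.scheduled:
            self.scheduled.remove(name)
        else:
            self.scheduled.add(name)

    @property
    def show_delete_button(self) -> bool:
        # The "Delete Samples" button appears while at least one chart is scheduled.
        return bool(self.scheduled)

    def delete_samples(self) -> None:
        # Clicking "Delete Samples" hides the scheduled charts.
        self.hidden |= self.scheduled
        self.scheduled.clear()

    def visible(self) -> List[str]:
        return [n for n in self.data if n not in self.hidden]
```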
  • hovering the cursor over title bar 252 will display all known metadata of this sample, as shown in Figure 14.
  • This metadata can include, for example, the date and location that the sample was obtained, the histological grading, the quality of DNA used for sequencing, the average depth of sequencing and any other description.
  • Each pipe chart 250 is a plot of data points 258, which are the data points in sequencing data 120 for the sample of interest.
  • hovering the cursor over any data point 258 will display all known metadata of this data point, as shown in Figure 14, including the genomic location and base change of the variant, the gene name (if relevant), the consequence, and the HGVS annotation (HGVS is an official nomenclature for variations/mutations in the human genome; see http://www.hgvs.org/mutnomen/).
  • space/time application 130 is showing the percent of variant base, which is the variant allele frequency (VAF).
  • other data may be used, such as relative expression or copy number, among others.
  • Allele frequency is defined as the proportion of a particular allele (base call at a given genomic coordinate against a reference sequence) among all basecalls seen at that position.
  • a program may use allele frequencies derived from raw counts of A, C, G and T at targeted positions, the only "pre-processing" being the alignment of the sequencing reads to the human reference genome.
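Deriving an allele frequency from raw base counts at one targeted position could look like the sketch below. It assumes the counts are given as a mapping from base to count; the function name and the optional `alt_base` parameter are hypothetical.

```python
from typing import Mapping, Optional

BASES = "ACGT"


def allele_frequency(counts: Mapping[str, int], ref_base: str,
                     alt_base: Optional[str] = None) -> float:
    """Fraction of base calls at one position that differ from the reference.

    If alt_base is given, only calls of that base are counted as the variant allele.
    """
    total = sum(counts.get(b, 0) for b in BASES)
    if total == 0:
        return 0.0  # no coverage at this position
    if alt_base is not None:
        return counts.get(alt_base, 0) / total
    return (total - counts.get(ref_base, 0)) / total
```

For 70 A calls and 30 C calls with reference base A, this gives 0.3, i.e., 30% variant base and 70% reference base.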
  • the y-axis of pipe chart 250 indicates the allele frequency.
  • the y-axis indicates a scale of 0-1, wherein 0 means 0% (i.e., matching the reference sequence) and 1 means 100% (i.e., not matching the reference base). Accordingly, 0.1 means 10%, 0.2 means 20%, 0.3 means 30%, and so on. In one example, a data point 258 on pipe chart 250 at 0.3 means 30% mutation base and 70% reference base.
  • the x-axis of pipe chart 250 indicates the position of the targeted variant along the (human) reference genome, and is separated by vertical lines to indicate the limits of the chromosomes 1, 2, ... 22, X, Y. Accordingly, data points 258 on pipe chart 250 are ordered along the genome. Further, data points 258 are color coded according to consequences key 236 in control panel 210, wherein each color represents a different variant (e.g., mutation or polymorphism) consequence.
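Placing variants along a single genome-wide x-axis usually means laying the chromosomes end to end in the order 1 to 22, X, Y and adding each chromosome's start offset to the variant position. A minimal sketch, assuming chromosome lengths are supplied by the caller (for example from the reference genome's index); the lengths in the usage check are toy values and the function names are hypothetical.

```python
from typing import Dict, List, Tuple

CHROM_ORDER = [str(i) for i in range(1, 23)] + ["X", "Y"]


def chrom_rank(chrom: str) -> int:
    # Accept "chr7" or "7"; rank chromosomes 1..22, X, Y.
    name = chrom[3:] if chrom.startswith("chr") else chrom
    return CHROM_ORDER.index(name)


def genome_layout(chrom_lengths: Dict[str, int]) -> Tuple[Dict[str, int], List[int]]:
    """Return each chromosome's x-offset and the x positions of the separator lines."""
    offsets: Dict[str, int] = {}
    boundaries: List[int] = []
    total = 0
    for chrom in sorted(chrom_lengths, key=chrom_rank):
        offsets[chrom] = total
        total += chrom_lengths[chrom]
        boundaries.append(total)  # vertical line at the end of this chromosome
    return offsets, boundaries


def x_position(chrom: str, pos: int, offsets: Dict[str, int]) -> int:
    return offsets[chrom] + pos
```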
  • a scatter plot of one sample with respect to another sample can be automatically generated using space/time application 130. Namely, by selecting the scatter plot icon 254 on two of the pipe charts 250, the two samples can be plotted against one another to determine how the VAF of all variants compare in one and the other sample. For example, Figure 15 shows the scatter plot icon 254 of S29 and the scatter plot icon 254 of S30 are selected. In so doing, space/time application 130 automatically generates, for example, a scatter plot 262.
  • In GUI 135, scatter plot 262 can be rendered beside or below pipe charts 260. However, if pipe charts 260 are not present, scatter plot 262 can be rendered beside or below pipe charts 250, depending on the width of the browser.
  • the allele frequency of one sample is plotted on the y-axis and the allele frequency of the other sample is plotted on the x-axis.
  • S29 is plotted on the y-axis and S30 is plotted on the x-axis.
  • the VAF in S29 and S30 show the same trend.
  • Figure 16 shows a scatter plot 262 of S60 and S28. In this example, the VAF values do not show the same trend.
  • the correlation between the samples may be calculated, and its significance (e.g., R-squared) may be stated on the plot.
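For a simple least-squares line through the scatter plot, R-squared equals the square of the Pearson correlation coefficient, so one computation can supply both values for the plot annotation. A minimal sketch with a hypothetical function name:

```python
import math
from typing import Sequence, Tuple


def pearson_r_squared(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Pearson correlation r and R-squared between two samples' VAF values."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long vectors with at least two points")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when one sample is constant")
    r = cov / (sx * sy)
    return r, r * r
```

Samples whose VAFs follow the same trend (as S29 and S30 above) would give r close to 1; samples that do not (as S60 and S28) would give a lower R-squared.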
  • Figures 12 through 16 show examples of all consequences being displayed because, for example, the all consequences PB 237 in consequences key 236 is selected.
  • the contents of GUI 135 i.e., pipe charts 250, pipe charts 260, and scatter plot 262 can be filtered in a number of ways.
  • the user may select only certain specific consequences in consequences key 236.
  • One, two, or any number of consequences can be selected at one time.
  • only those variants that correspond to the selected consequences are highlighted in pipe charts 250, pipe charts 260, and scatter plot 262, while all others are faded into the background.
  • Figure 17 shows three consequences in consequences key 236 are selected.
  • UPSTREAM; DOWNSTREAM; and WITHIN_NON_CODING_GENE, SPLICE_SITE are selected. Accordingly, the data points 258 in pipe charts 250, pipe charts 260, and scatter plot 262 that correspond to UPSTREAM; DOWNSTREAM; and WITHIN_NON_CODING_GENE, SPLICE_SITE are highlighted, while all others are faded into the background.
  • Another way to filter the contents of GUI 135 is to manually enter a gene name or chromosome location and filter accordingly.
  • a text entry field 264 is provided just above pipe charts 250, along with a SHOW PB 266.
  • the user types information into text entry field 264 and then selects the SHOW PB 266.
  • the contents of GUI 135 are filtered according to the information entered, an example of which is shown in Figure 18.
  • the user types the targeted variant information (its location and base change) "chr7_154593906_A_C" into text entry field 264 and then clicks SHOW PB 266.
  • the filtering settings can be cleared at any time using a CLEAR SELECTIONS PB 242, also shown in Figure 13.
  • in GUI 135, it may be beneficial to change the background color so that the user can better see certain colors in pipe charts 250, pipe charts 260, and scatter plot 262. Further, it may be beneficial to control the size of pipe charts 250, pipe charts 260, and scatter plot 262 when rendered in GUI 135.
  • Figure 19 shows GUI 135 with a dark background and pipe charts 250 and pipe charts 260 rendered in a small size and repositioned in the view. By contrast, the views shown in Figures 12, 15, 16, 17, and 18 are considered medium size.
  • screen rendering controls 240 are provided in the corner of control panel 210.
  • screen rendering controls 240 include DARK/LIGHT, S, M, and L pushbuttons, wherein S means small size, M means medium size, and L means large size.
  • the DARK/LIGHT pushbutton toggles. For example, when the current view has a light background (e.g., white, see Figure 18) the DARK pushbutton is displayed and can be selected to toggle the background to a dark color (e.g., brown). By contrast, when the current view has a dark background (e.g., brown, see Figure 19) the LIGHT pushbutton is displayed and can be selected to toggle the background to a light color (e.g., white).
  • Space/time application 130 includes other mechanisms for displaying information about the different variants to the user. Namely, space/time application 130 includes mechanisms for listing the information about certain selected data points 258 in pipe charts 250, pipe charts 260, and scatter plot 262. For example and referring now to Figure 20, the user may select a certain group of data points 258 in any one of pipe charts 250 or pipe charts 260. Figure 20 shows a group 251 of data points 258 selected in pipe chart 250 for S29.
  • This selection can be made, for example, by the user holding down the mouse button and dragging the cursor over a group of data points 258 and then releasing the mouse button, or by the user clicking the cursor on one corner of the desired area and then clicking the cursor on the opposite diagonal corner of the desired area.
  • Any variants that fall inside group 251 are highlighted.
  • the same variants in any of the pipe charts 250, pipe charts 260, and scatter plot 262 are also highlighted, as shown in Figure 20.
  • a table 270 is automatically generated and rendered that includes details about each individual variant in the selected group.
  • table 270 includes a delete column 272, a position column 274, a consequence column 276, a gene column 278, an HGVS column 280, a deleteriousness column 282, a sample AF column 284, and a description column 286.
  • Table 270 includes an entry (i.e., a row) for each data point 258 of the selected group (e.g., group 251).
  • Delete column 272 includes a -/+ toggle button on each row, which is used to remove the entry.
  • Position column 274 indicates the position along the genome (i.e., the chromosome, the location, the reference base and the variant base).
  • Consequence column 276 indicates the variant consequence type corresponding to consequences key 236.
  • Gene column 278 indicates the standard gene name if the variant falls within a gene on the reference genome ("null" if not in a gene).
  • HGVS column 280 indicates the HGVS nomenclature for reporting variants that fall within a gene (null if not in a gene).
  • Deleteriousness column 282 indicates the degree of deleteriousness of the variant.
  • Sample allele frequency (AF) column 284 is populated only if the table refers to a single sample. In this display the table always covers multiple samples, so that column is populated with undef.
  • Description column 286 indicates a general description of the gene product. In one example, the source of the consequence, gene, HGVS, deleteriousness, and description information is the Ensembl Variant Effect Predictor (VEP).
  • the deleteriousness score is given as a value from 0 to 1, which indicates the probability of deleteriousness.
  • a score of 0 means the lowest probability of having an effect on the protein.
  • a score of 1 means the highest probability of having an effect on the protein.
  • the deleteriousness score is returned by VEP.
  • the deleteriousness score can be the combination of two scores, such as a combination of Condel, which is a general method for calculating a consensus prediction score, and PolyPhen, which predicts the effect of an amino acid substitution on the structure and function of a protein.
  • the deleteriousness score is not limited to using Condel and PolyPhen only. The deleteriousness score can be generated using other combinations and types of scores.
  • each of position column 274, consequence column 276, gene column 278, HGVS column 280, deleteriousness column 282, and sample AF column 284 has a "sort" PB by which the entries in table 270 can be sorted according to the data type the column contains.
  • table 270 shown in Figure 21 is sorted by position along the reference genome using the sort PB in position column 274.
  • table 270 shown in Figure 22 is sorted by consequence (from most severe to least severe) using the sort PB in consequence column 276.
  • table 270 shown in Figure 23 is sorted by gene name alphabetically using the sort PB in gene column 278.
  • table 270 shown in Figure 24 is sorted by HGVS alphabetically using the sort PB in HGVS column 280.
  • Figure 25 shows, for example, that three data points 258 are selected in pipe chart 250 for S29. Again, the variants and consequences of the three selected data points 258 are also highlighted in all of the pipe charts 250, pipe charts 260, and scatter plot 262.
  • table 270 is generated that has three entries only, i.e., those that correspond to the three selected data points 258.
  • Figures 26 through 29 show an example of a control panel 310 of GUI 145 of tree application 140 of system 100 of Figure 1.
  • a tree 312 is rendered in the main viewing area of control panel 310.
  • tree 312 consists of branches 314, nodes 316, and sample blocks 318.
  • tree application 140 can be used in combination with space/time application 130 or can be used independently of space/time application 130.
  • tree application 140 is used to process information from the same 96 samples that were described with reference to space/time application 130 in Figures 2 through 25. Accordingly, sample blocks 318 of tree 312 correspond to sample blocks 228 of GUI 135 of space/time application 130.
  • tree application 140 of system 100 is another example of a bioinformatics tool that is designed and constructed using, for example, the D3.js JavaScript library, to allow interrogation and exploration of the TSCA sequencing data.
  • space/time application 130 loads three data files (in JSON format): the sample metadata (date of sampling, location down the esophagus, histology report, DNA concentration, and TSCA average depth), the TSCA targets metadata (genomic location, base change and annotation, including gene and consequence when relevant) and the variant allele frequency (VAF) (or allele frequency) values for all targets in all samples.
  • VAF is calculated as the number of reads supporting the variant allele divided by the sum of the reads supporting the variant base and the reads supporting the reference base, i.e., VAF = variant reads / (variant reads + reference reads). Individual positions with read depth < 20 are considered to have zero depth, and NaN values are converted to 0.
  • Tree application 140 allows four main interactive explorations of the data: (1) a hierarchical clustering tree of the samples, wherein the tree display is dynamic and allows the user to choose between five distance metrics between samples; (2) visualization of TSCA data for individually selected samples in a Manhattan-like plot, where the Y-axis displays the VAF values of the targets within each given sample, ordered on the X-axis by genomic coordinate, and from which the user can remove samples from the tree, which is then dynamically regenerated; (3) a pairwise scatterplot of VAF values between two selected samples; and (4) a table of variants corresponding to nodes 316 or blocks 318, from which the user can exclude individual variants from the generation of the tree, which is then dynamically regenerated.
  • tree application 140 is primarily a data analysis tool for sequencing data 120.
  • tree 312 is a binary tree.
  • the horizontal distance between a node 316 and its child nodes 316 represents the calculated distance between its two child nodes 316.
  • Each sample block 318 is color coded by histology grade as described with respect to space/time application 130. Further, the width of each sample block 228 is proportional to the number of variants it shows with an allele frequency greater than 0.01.
  • Control panel 310 also includes a dates panel 320 and a levels panel 322. Dates panel 320 and levels panel 322 are used for filtering the information displayed in phylogeny tree 312. Dates panel 320 includes a list of all dates on which a sample was acquired for the patient. The user may select/deselect one or more dates in dates panel 320. Dates panel 320 also includes an all dates PB for selecting/deselecting all of the dates in dates panel 320. In this example, dates panel 320 corresponds with dates panel 214 of space/time application 130. Further, in this example, levels panel 322 corresponds with level scale 226 of space/time application 130. The user may select/deselect one or more levels in levels panel 322. Levels panel 322 also includes an all levels PB for selecting/deselecting all of the levels in levels panel 322.
  • Control panel 310 includes yet other controls.
  • a toggle button 324 is provided to toggle between a prevailing histology grade and a highest histology grade.
  • a GENOMES PB 326 is provided to select/deselect samples that belong to a specific group (herein named "GENOMES") that cannot be accessed via a specific date or a specific level. Some samples had previously been analyzed by whole genome sequencing and were spread across different dates and levels, so GENOMES PB 326 provides an easy way to highlight this specific subset of samples.
  • a toggle button 328 is provided to toggle between a TABLE presentation and a BLOCKS presentation, which is described in more detail herein below in Figures 39 and 40.
  • Control panel 310 further includes a set of user controls 330.
  • user controls 330 include a help button, a key button, an options button, and a history button.
  • when the help button is selected, a dropdown window displays written help information.
  • when the key button is selected, a dropdown window displays color coded histology grades.
  • the color coded histology grades correspond to histology key 220 of space/time application 130.
  • Sample blocks 318 are color coded accordingly.
  • when the options button is selected, a dropdown window displays options for choosing the distance measure.
  • the options are Binary, Euclidean, Manhattan, Maximum (Max), and Pearson, which are different ways of calculating the distance between nodes 316 and/or sample blocks 318.
  • a description field 332 indicates the selected type of distance measure.
  • tree 312 is editable, meaning that the variants and samples can be removed/added.
  • when the history button is selected, a dropdown window displays a record of changes that have been made to tree 312. It may also display controls for restoring tree 312 to its original state or for undoing certain changes.
  • tree application 140 is used to determine similarity of samples by doing a pairwise comparison of each sample using, for example, the Pearson correlation.
  • Each of the variants has a value, which is the allele frequency in each sample.
  • pairwise correlations between samples are computed to create a matrix of distances between, in this example, all 96 samples.
  • a hierarchical clustering tree of the samples is provided (e.g., tree 312).
  • the display of tree 312 is dynamic and allows the user to remove samples and/or variants and to choose between five distance metrics between samples (using the options button of control panel 310), wherein a smaller distance indicates samples with more similar variants.
  • each distance metric is calculated as described below.
  • a distance matrix is generated by calculating all pairwise distances between samples. In one example, this distance matrix is then used to generate tree 312 by hierarchical clustering with complete linkage of the nodes.
  • hierarchical clustering with complete linkage is one example of algorithms 170 of system 100 of Figure 1.
  • the neighbor joining algorithm is another example of such algorithms.
  • An example of tree 312 calculated according to the Pearson option is shown in Figure 26, as indicated in description field 332 of Figure 26.
  • An example of tree 312 calculated according to the Euclidean option is shown in Figure 28, as indicated in description field 332 of Figure 28.
  • hovering the cursor over a node 316 or a sample block 318 displays its metadata in a popup window. For example, hovering over a node 316 displays the distance between its children and the number of variants with VAF > 0.01 shared by its children.
  • Figures 30 through 40 show examples of presenting selected information using tree application 140 of system 100 of Figure 1.
  • the information displayed in tree 312 can be filtered in various ways.
  • Figure 30 shows phylogeny tree 312 filtered by date.
  • the user may select one or more specific dates in dates panel 320, while the all PB is selected in levels panel 322. In so doing, only those sample blocks 318 collected on the selected date(s) are shown and all other sample blocks 318 are faded into the background.
  • Figure 31 shows phylogeny tree 312 filtered by level.
  • the user may select one or more specific levels in levels panel 322, while the all PB is selected in dates panel 320. In so doing, only those sample blocks 318 collected on the selected level(s) are shown and all other sample blocks 318 are faded into the background.
  • the information displayed in tree 312 can be edited in various ways.
  • Figure 32 shows that the user has selected a certain node 316 in tree 312.
  • a set of pipe charts 350 is automatically generated that correspond to sample blocks 318 to the right of the selected node 316.
  • Pipe charts 350 are substantially the same as pipe charts 250 of space/time application 130.
  • a pipe chart 350 is generated for each of S28, S90, S76, S77, S63, S40, S54, S82, S52, and S92.
  • the user may select a certain group of data points in any one of pipe charts 350.
  • Figure 33 shows a group of data points selected in pipe chart 350 for S28.
  • a table 370 is automatically generated and rendered that includes details about each data point in the selected group.
  • Table 370 is substantially the same as table 270 of space/time application 130. Namely, table 370 may include the same delete column, position column, consequence column, gene column, HGVS column, deleteriousness column, and description column as table 270 of space/time application 130.
  • Table 370 includes an entry (i.e., a row) for each data point of the selected group.
  • a user may wish to further edit tree 312 by removing certain samples (e.g., sample blocks 318). Again, the user selects a certain node 316, and a set of pipe charts 350 is automatically generated that corresponds to sample blocks 318 to the right of the selected node 316. Referring now to Figure 37, the user then selects the -/+ toggle button on each pipe chart 350 to be removed and clicks the Delete Samples PB. In so doing, the corresponding sample blocks 318 are removed from phylogeny tree 312.
  • once certain samples (e.g., sample blocks 318) are removed, phylogeny tree 312 is automatically recalculated and rendered without these sample blocks 318, as shown, for example, in Figure 38; compare phylogeny tree 312 in Figure 35 with phylogeny tree 312 in Figure 38.
  • control panel 310 also includes toggle button 328, which is used to toggle between a TABLE presentation and a BLOCKS presentation.
  • Toggle button 328 determines what happens when the user selects a sample block 318 or a node 316.
  • the default setting is BLOCKS, as shown in Figures 26 through 38. Namely, if set to BLOCKS, then selecting a node or a sample will draw sample blocks 318 for that sample or all descendant samples of that node.
  • the sample blocks 318 are arranged by level and date.
  • Figure 39 shows toggle button 328 switched to TABLE.
  • when toggle button 328 is set to TABLE, selecting a sample block 318 generates a table 370 of all data points for that sample, ordered by consequence (in this case, because a single block rather than a node is selected, the sample AF column is populated with VAF values from 0 to 1); selecting a node 316 generates a table 370 of all data points that have VAF > 0.01 in all children of the selected node 316.
  • table 370 is rendered just below control panel 310.
  • scatter plots such as scatter plots 262 of space/time application 130, can also be generated in GUI 145.
  • Figures 41 through 43 show examples of presenting sequencing information using scatter plots application 150 of system 100 of Figure 1.
  • GUI 155 of scatter plots application 150 includes a panel 410 for displaying a grid-like representation of samples at different levels and different time points.
  • the y-axis of the grid in panel 410 is the levels along the esophagus and the x-axis is the dates on which the samples were collected.
  • Sample blocks 412 which substantially correspond to sample blocks 228 of space/time application 130, are displayed in panel 410 according to level and date. Sample blocks 412 are color coded according to the histology key shown in a panel 414.
  • a toggle button 416 is provided in panel 410 to toggle between a prevailing histology grade and a highest histology grade.
  • when the user selects two sample blocks 412, a scatter plot 462 is automatically rendered, showing the allele frequencies of all variants in those two sample blocks 412.
  • Scatter plot 462 is substantially the same as scatter plot 262 of space/time application 130.
  • the variants are color coded by consequence.
  • Scatter plot 462 can be filtered to show only variants with a certain consequence, e.g., non-synonymous coding.
  • a table 470 is generated of all data points in the selected sample blocks 412 or in scatter plot 462.
  • table 470 is rendered beside panel 410 and scatter plot 462.
  • Table 470 is substantially the same as table 270 of space/time application 130.
  • Figure 44 shows an example of presenting sequencing information using matrix plot application 160 of system 100 of Figure 1.
  • Figure 45 shows a close-up view of a portion of matrix plot application 160 of Figure 44.
  • GUI 165 of matrix plot application 160 includes a panel 510 for displaying a visualization of the similarity between samples by drawing a grid with the samples on the x- and y-axes and a block 512 at each intersection point.
  • Each block 512 represents two samples, showing a darker color if the samples are similar and a lighter color if the samples are divergent. Hovering over a block 512 displays its metadata and highlights the dates of the two samples.
  • the metadata can include, for example, the distance between the two samples, the histology grades of the two samples, the levels of the two samples, and the like.
  • a set of user controls 514 is provided in panel 510.
  • User controls 514 include, for example, a sort by date PB, a sort by histology PB, a sort by level PB, and a distance measure PB.
  • the grid can be sorted by date, histology, and level.
  • the distance measures in matrix plot application 160 can be the same as those in tree application 140.
  • Figures 44 and 45 show the matrix plot when using the binary distance measure, whereas Figure 46 shows the matrix plot when using the Pearson distance measure.
  • Selecting a block 512 generates a scatter plot of variant allele frequencies for the two samples.
  • in some embodiments, selecting a block can include tapping on a visual representation of the block on a mobile device screen.
  • Figure 47 shows a scatter plot 562 that can be generated. Scatter plot 562 is substantially the same as scatter plot 262 of space/time application 130.
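The VAF rule used for the TSCA data (reads supporting the variant allele divided by the sum of variant-supporting and reference-supporting reads, positions with read depth below 20 treated as zero depth, and NaN converted to 0) can be sketched as follows. The applications themselves are described as JavaScript/D3.js tools; this Python version is only an illustration under stated assumptions, not the authors' code. It assumes read depth equals variant reads plus reference reads, and the function name and read counts are hypothetical.

```python
import math

MIN_DEPTH = 20  # depth threshold from the description; lower depths count as zero depth


def variant_allele_frequency(variant_reads: int, reference_reads: int,
                             min_depth: int = MIN_DEPTH) -> float:
    """VAF = variant reads / (variant reads + reference reads).

    Assumption: read depth is taken to be variant + reference reads.
    Positions below min_depth are treated as zero depth; the resulting
    undefined value (NaN) is converted to 0.
    """
    depth = variant_reads + reference_reads
    if depth < min_depth:
        depth = 0  # treated as zero depth
    vaf = variant_reads / depth if depth > 0 else math.nan
    return 0.0 if math.isnan(vaf) else vaf


# Hypothetical read counts for one sample: target -> (variant reads, reference reads)
counts = {"chr7_154593906_A_C": (12, 36), "chr17_7577120_C_T": (3, 9)}
sample_vafs = {t: variant_allele_frequency(v, r) for t, (v, r) in counts.items()}
print(sample_vafs)  # {'chr7_154593906_A_C': 0.25, 'chr17_7577120_C_T': 0.0}
```

The second target has only 12 reads, so it falls under the depth threshold and its VAF is reported as 0 rather than 0.25.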


Abstract

The disclosed embodiments concern methods, apparatus, systems and computer program products for presenting sequence information. In some embodiments, this includes obtaining a first sequence and a second sequence, determining a similarity between the first sequence and the second sequence, wherein the similarity is based upon distance between the first sequence and the second sequence, and displaying a block at an intersection point on a matrix plot based on the similarity between the first sequence and the second sequence.

Description

Methods, Applications, and System for Processing and Presenting Gene Sequencing Information
By Inventors: Jennifer Becq (Cambridge, UK), Andrew Warren (Cambridge, UK) and Keira Cheetham (Cambridge, UK).
Brief Description of the Drawings
Figure 1 illustrates a block diagram of an example of a system that comprises computer applications for processing and presenting gene sequencing information;
Figures 2 through 11 show an example of a control panel of the GUI of a space/time application of the system of Figure 1;
Figures 12 through 25 show examples of presenting selected information using the space/time application of the system of Figure 1;
Figures 26 through 29 show an example of a control panel of the GUI of a hierarchical clustering tree application of the system of Figure 1;
Figures 30 through 40 show examples of presenting selected information using the hierarchical clustering tree application of the system of Figure 1;
Figures 41 through 43 show examples of presenting sequencing information using the scatter plots application of the system of Figure 1; and
Figures 44 through 47 show examples of presenting gene sequencing information using the matrix plot application of the system of Figure 1.
Description
Disclosed herein are methods, applications, and systems for processing and presenting sequencing information, wherein the gene sequencing data is presented interactively. More specifically, the disclosure includes interfaces that facilitate the interrogation and exploration of sequencing data.
In some embodiments, a system includes a space/time application that provides a map of a sampled organ (e.g., the esophagus) in which each sample is plotted according to level (e.g., depth or length) in centimeters (cm) along the organ. The user has the option to select samples from specified levels, dates, and histological grades and then view those samples in a variety of formats, such as a Manhattan-chart-like format. In some embodiments, plots are provided in which the x-axis represents the position along the genome with the chromosomes labeled and separated by lines. The y-axis may represent the variant allele frequency (VAF) (or allele frequency) of each variant for that sample. The individual data points represent variants and may be color-coded according to their consequence provided by annotation.
In some embodiments, the system includes a tree application that provides, for example, a binary tree, meaning that similar samples are grouped together in pairs, not in clusters. A group may be paired with a sample or with another group of pairs. How samples are paired depends on the distance measure, which can be one of binary, maximum, Manhattan, Euclidean or the Pearson correlation, among others.
In some embodiments, the system may include a scatter plots application that provides, for example, a grid-like representation of samples at different levels and different time points. When selecting two sample rectangles, a scatter plot is shown of the allele frequencies of all variants between those two samples. The samples may be color coded by histology. The variants may be color coded by their consequence according to annotation.
In some embodiments, the system includes a matrix plot application that provides, for example, a visualization of the correlation between samples by drawing a grid with the samples on the x- and y-axes and a square at each intersection point. Each square may be shown in a darker color if the two samples are closely related and in a lighter color if they are not. The distance measure can be one of binary, maximum, Manhattan, Euclidean or the Pearson correlation, among others. The grid may be sorted by date, level, and histology, among others.
Figure 1 illustrates a block diagram of an example of a system 100 that, in accordance with some embodiments, comprises computer applications for processing and presenting sequencing information, wherein the sequencing data is presented interactively. System 100 includes a computing device 110. Computing device 110 can be any computing device capable of running software applications and optionally connecting to a network. Computing device 110 can be, for example, a server, a desktop computer, a laptop computer, a tablet device (e.g., Apple iPad, Samsung Galaxy Tab 2, Microsoft™ Surface Pro 3), a smartphone, and the like. One or more applications 115 are installed on computing device 110. The one or more applications 115 are provided for processing and presenting sequencing information, such as the information stored in sequencing data 120.
DNA sequencing is the process of determining the order of nucleotides in a DNA molecule. In some embodiments, sequencing data 120 stores a collection of nucleic acid sequences, protein sequences, or other polymer sequences. In some embodiments, sequencing data 120 stores a call made against a reference sequence at an individual base or group of bases, i.e., a variant call. In some embodiments, the variant call may be given as an allele frequency, which may be a continuous value from 0 to 1 or a binary value of 0 or 1. In some embodiments, sequencing data 120 may also include relative mRNA expression levels, which may be any continuous variable (negative, positive, or log-transformed). In some embodiments, sequencing data 120 resides locally at computing device 110. In other embodiments, sequencing data 120 resides external to computing device 110 and is accessed via a network, such as a network 180. Network 180 can be, for example, any local area network (LAN) or wide area network (WAN). In yet another example, sequencing data 120 can reside both locally at computing device 110 and external to computing device 110.
In some embodiments, applications 115 of system 100 include at least one of a space/time application 130 that has a certain graphical user interface (GUI) 135, a tree application 140 that has a certain GUI 145, a scatter plots application 150 that has a certain GUI 155, and a matrix plot application 160 that has a certain GUI 165. In another example, applications 115 of system 100 include any combination of space/time application 130, phylogeny tree application 140, scatter plots application 150, and matrix plot application 160.
Further, certain algorithms 170 may reside on computing device 110 for supporting space/time application 130, tree application 140, scatter plots application 150, and/or matrix plot application 160. Examples of algorithms 170 include, but are not limited to, a binary distance measure algorithm, a maximum distance measure algorithm, a Manhattan distance measure algorithm, a Euclidean distance measure algorithm, a Pearson distance measure algorithm, a hierarchical clustering algorithm, a neighbor joining algorithm, any other method for calculating correlations of allele frequencies, means of plotting phylogenetic trees, means of plotting graphs, or methods for selecting/dropping/recalculating data.
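The five distance measures listed above can be illustrated with a short Python sketch (the applications themselves are described as JavaScript tools). It assumes each sample is a vector of VAFs over the same targets in the same order. The binary, maximum, Manhattan, and Euclidean definitions follow the conventions of R's `dist()` function; treating the Pearson distance as 1 minus the correlation coefficient is an assumption, since the text does not give the formula. All function names are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]  # VAFs of the same targets, in the same order, for one sample


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Vector, b: Vector) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def maximum(a: Vector, b: Vector) -> float:
    return max(abs(x - y) for x, y in zip(a, b))


def binary(a: Vector, b: Vector) -> float:
    """Share of targets present (VAF > 0) in exactly one sample, among those present in either."""
    either = sum((x > 0) or (y > 0) for x, y in zip(a, b))
    only_one = sum((x > 0) != (y > 0) for x, y in zip(a, b))
    return only_one / either if either else 0.0


def pearson(a: Vector, b: Vector) -> float:
    """1 - Pearson correlation coefficient (assumed convention)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    if sa == 0 or sb == 0:
        return 1.0  # correlation undefined for a constant vector; treated as uncorrelated
    return 1.0 - cov / (sa * sb)


METRICS: Dict[str, Callable[[Vector, Vector], float]] = {
    "binary": binary, "euclidean": euclidean, "manhattan": manhattan,
    "maximum": maximum, "pearson": pearson,
}


def distance_matrix(samples: Dict[str, Vector], metric: str) -> Tuple[List[str], List[List[float]]]:
    """All pairwise distances between samples under the chosen metric."""
    names = sorted(samples)
    f = METRICS[metric]
    return names, [[f(samples[p], samples[q]) for q in names] for p in names]
```

A matrix returned by `distance_matrix` is the kind of input that a hierarchical clustering or neighbor joining step would then consume.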
GUI 135 of space/time application 130, GUI 145 of tree application 140, GUI 155 of scatter plots application 150, and GUI 165 of matrix plot application 160 can be presented to the user in, for example, an internet browser (not shown) on computing device 110. Space/time application 130 and its GUI 135, tree application 140 and its GUI 145, scatter plots application 150 and its GUI 155, and matrix plot application 160 and its GUI 165 can be implemented using any programming language. For example, in some embodiments, the applications and GUIs may be implemented using JavaScript, such as, but not limited to, the D3.js JavaScript library, the NJS JavaScript Interpreter, and the Phylogenetic tree JavaScript (jsPhyloSVG), which is an open-source JavaScript library.
Space/time application 130, tree application 140, scatter plots application 150, and matrix plot application 160 are computer applications for processing and presenting, for example, sequencing data 120 in an interactive fashion. In some embodiments, space/time application 130 provides a map of the sampled organ (e.g., the esophagus) in which each sample is plotted according to location (e.g., depth or length) in centimeters (cm) along the organ. In some embodiments, this map may be a 2D or 3D plot, if appropriate coordinates are available. The user has the option to select samples from specified levels, dates, and histological grades and then view those samples in a Manhattan-chart-like format. In some embodiments, plots are provided in which the x-axis represents the location along the reference genome (in some embodiments, the human reference genome), with the chromosomes labeled and separated by lines, and the data points are plotted in order of genomic coordinate. The y-axis represents the variant allele frequency (VAF) (or allele frequency) of each variant for that sample. The individual data points represent variants and are color-coded according to their consequence provided by annotation. Aspects of space/time application 130 are shown and described herein with reference to Figures 2 through 25.
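As a sketch of how the Manhattan-chart-like plot might order its data points, the following Python code sorts one sample's targets by chromosome (1 to 22, then X and Y) and position, and records where the chromosome boundary lines fall. It assumes target identifiers follow the `chr7_154593906_A_C` pattern used in the text-entry filter example, and it uses each target's rank as its x coordinate instead of a true cumulative genomic coordinate; that simplification, and the function names, are hypothetical.

```python
from typing import Dict, List, Tuple

CHROM_ORDER = [str(i) for i in range(1, 23)] + ["X", "Y"]
CHROM_RANK = {c: i for i, c in enumerate(CHROM_ORDER)}


def parse_target(target_id: str) -> Tuple[str, int, str, str]:
    """Split an identifier such as 'chr7_154593906_A_C' into (chromosome, position, ref, alt)."""
    chrom, pos, ref, alt = target_id.split("_")
    if chrom.startswith("chr"):
        chrom = chrom[3:]
    return chrom, int(pos), ref, alt


def genomic_key(target_id: str) -> Tuple[int, int]:
    """Sort key: chromosome rank first, then position (so chr10 follows chr9, not chr1)."""
    chrom, pos, _, _ = parse_target(target_id)
    return CHROM_RANK[chrom], pos


def pipe_chart_points(vafs: Dict[str, float]) -> Tuple[List[Tuple[int, float, str]], List[float]]:
    """Order one sample's targets along the genome.

    Returns (x, VAF, target) points, with x the ordinal rank of the target,
    and the x positions of chromosome boundaries (where vertical lines go).
    """
    ordered = sorted(vafs, key=genomic_key)
    points: List[Tuple[int, float, str]] = []
    boundaries: List[float] = []
    for x, target in enumerate(ordered):
        if x > 0 and parse_target(target)[0] != parse_target(ordered[x - 1])[0]:
            boundaries.append(x - 0.5)  # halfway between the last and first target of adjacent chromosomes
        points.append((x, vafs[target], target))
    return points, boundaries
```

Each point can then be colored by its consequence label before being drawn.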
In some embodiments, tree application 140 provides, for example, a binary tree, meaning that similar samples are grouped together in pairs, not in clusters. A group may be paired with a sample or with another group of pairs. How samples are paired depends on the distance measure, which may include binary, maximum, Manhattan, Euclidean, or Pearson correlation, among others. The distance between pairs is represented by the horizontal distance between the pair and their parent node. Each node stores a list of variants that have allele frequencies above 0.01 in all children. This list can be viewed, for example, as a table by clicking on a sample or node. More details of tree application 140 are shown and described herein with reference to Figures 26 through 40.
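The distance measures named above have standard definitions. The sketch below assumes the conventions of R's `dist()` for binary, maximum, Manhattan, and Euclidean distance, and uses 1 - r as the Pearson correlation distance; both choices are assumptions, not confirmed by the text. It illustrates one pairing step and the per-node variant list (allele frequency above 0.01 in every child). It is not tree application 140's implementation, and all function names are hypothetical.

```python
import math
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Set, Tuple

Vector = Sequence[float]


def binary(x: Vector, y: Vector) -> float:
    """Share of positions non-zero in exactly one sample, among positions non-zero in either."""
    either = sum(1 for a, b in zip(x, y) if a != 0 or b != 0)
    if either == 0:
        return 0.0
    only_one = sum(1 for a, b in zip(x, y) if (a != 0) != (b != 0))
    return only_one / either


def maximum(x: Vector, y: Vector) -> float:
    return max(abs(a - b) for a, b in zip(x, y))


def manhattan(x: Vector, y: Vector) -> float:
    return sum(abs(a - b) for a, b in zip(x, y))


def euclidean(x: Vector, y: Vector) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def pearson(x: Vector, y: Vector) -> float:
    """1 - Pearson r; a constant vector is treated as maximally dissimilar (1.0)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 1.0
    return 1.0 - sxy / math.sqrt(sxx * syy)


METRICS: Dict[str, Callable[[Vector, Vector], float]] = {
    "binary": binary, "maximum": maximum, "manhattan": manhattan,
    "euclidean": euclidean, "pearson": pearson,
}


def closest_pair(samples: Dict[str, Vector], metric: str) -> Tuple[str, str]:
    """One agglomerative step: the two samples with the smallest distance."""
    dist = METRICS[metric]
    return min(combinations(sorted(samples), 2),
               key=lambda p: dist(samples[p[0]], samples[p[1]]))


def node_variants(children: List[Dict[str, float]], threshold: float = 0.01) -> Set[str]:
    """Variants whose allele frequency exceeds `threshold` in every child."""
    if not children:
        return set()
    sets = [{v for v, af in child.items() if af > threshold} for child in children]
    return set.intersection(*sets)
```

Repeating `closest_pair` on the remaining samples and merged groups yields the binary tree. How a merged group's vector is formed (e.g., averaging) is left open here.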
In some embodiments, scatter plots application 150 provides, for example, a grid-like representation of samples at different levels and different time points. When two sample rectangles are selected, a scatter plot of the allele frequencies of all variants in those two samples is shown. The samples are color coded by histology. The variants are color coded by consequence of the variant. The scatter plot may be filtered to show only variants with a certain consequence, e.g., non-synonymous coding. Variants may be selected in the plot to build a table of variants. More details of scatter plots application 150 are shown and described herein with reference to Figures 41 through 43.
In some embodiments, matrix plot application 160 provides, for example, a visualization of the similarity between samples by drawing a grid with the samples on the x- and y-axes. The intersection point is shown in a darker color if the samples are closely related and in a lighter color if they are not. The same distance measures as in tree application 140 are available in matrix plot application 160. The grid may be sorted by date, level, and histology. Selecting a square shows the scatter plot of variant allele frequencies for those two samples. More details of matrix plot application 160 are shown and described below with reference to Figures 44 through 47.
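One way to realize the "darker if closely related" rule is to normalize each pairwise distance by the largest distance in the grid and map it to a grey level. The sketch below assumes Euclidean distance and a linear 0 to 255 grey scale; the scale, the metadata field names (`date`, `level`), and the function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def euclidean(x: Sequence[float], y: Sequence[float]) -> float:
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def sort_samples(meta: Dict[str, dict], keys: Sequence[str]) -> List[str]:
    """Order sample IDs by the given metadata fields, e.g. ("date", "level")."""
    return sorted(meta, key=lambda s: tuple(meta[s][k] for k in keys))


def distance_matrix(samples: Dict[str, Sequence[float]],
                    order: List[str]) -> List[List[float]]:
    """Square matrix of pairwise distances in the given axis order."""
    return [[euclidean(samples[a], samples[b]) for b in order] for a in order]


def grey_levels(matrix: List[List[float]]) -> List[List[int]]:
    """0 (dark) for identical samples up to 255 (light) for the most distant pair."""
    dmax = max(max(row) for row in matrix)
    if dmax == 0:
        return [[0 for _ in row] for row in matrix]
    return [[round(255 * d / dmax) for d in row] for row in matrix]
```

Re-sorting the grid by date, level, or histology only changes `order`; the distances themselves are unchanged.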
Figures 2 through 11 show an example of a control panel 210 of GUI 135 of space/time application 130 of system 100 of Figure 1 in accordance with some embodiments. In this example, 96 samples were taken from a patient who has, for example, Barrett's esophagus. Barrett's esophagus is a condition in which the cells of the lower esophagus become damaged, usually from repeated exposure to stomach acid. The 96 samples were taken along the patient's esophagus over a period of time using known techniques, such as endoscopic mucosal resection (EMR), among others. The 96 samples are sequenced at specific locations, and the processed targeted sequencing data (i.e., allele frequencies of variant calls) for the 96 samples is stored in sequencing data 120 of system 100. Space/time application 130 of system 100 is an example of a bioinformatics tool that is designed and constructed using, for example, the D3.js JavaScript library, to allow interrogation and exploration of the TruSeq Custom Amplicon (TSCA) sequencing data in accordance with some embodiments. In some embodiments, space/time application 130 loads three data files (in JSON format): the sample metadata (date of sampling, location down the esophagus, histology report, DNA concentration, and TSCA average depth), the TSCA targets metadata (genomic location, base change and annotation, including gene and consequence when relevant), and the variant allele frequency (VAF) (or allele frequency) values for all targets in all samples. VAF is calculated as the number of reads used by the variant caller that support the variant allele, divided by the total number of reference-base-matching reads used by the variant caller. The variant caller, which may be Strelka in some embodiments, may not use all reads. The variant caller may also use contextual information, such as reads in a matched normal sample, base and mapping quality, and depth, among others. Individual positions with read depth <20 were treated as having zero depth, and NaN values were converted to 0.
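The three JSON inputs can be combined into a complete sample-by-target VAF table while applying the two cleanup rules stated above. In the following sketch, the JSON field names (`id`, `vaf`, `depth`) and the per-target `depth` field are hypothetical, because the text lists only the contents of each file. Setting the VAF to 0 at low-depth positions is one interpretation of "treated as having zero depth".

```python
import json
import math
from typing import Dict, Tuple

MIN_DEPTH = 20  # positions below this read depth are treated as zero depth


def clean_vaf(entry: dict) -> float:
    """Return a usable VAF: missing or NaN -> 0, low-depth positions -> 0."""
    vaf = entry.get("vaf")
    if vaf is None or math.isnan(vaf):
        return 0.0
    if entry.get("depth", 0) < MIN_DEPTH:
        return 0.0
    return float(vaf)


def load_dataset(samples_json: str, targets_json: str, vaf_json: str
                 ) -> Tuple[Dict[str, dict], Dict[str, dict], Dict[str, Dict[str, float]]]:
    """Parse the three payloads and build a complete sample x target VAF table."""
    samples = {s["id"]: s for s in json.loads(samples_json)}
    targets = {t["id"]: t for t in json.loads(targets_json)}
    raw = json.loads(vaf_json)  # Python's json module accepts bare NaN tokens
    vaf = {sid: {tid: clean_vaf(raw.get(sid, {}).get(tid, {})) for tid in targets}
           for sid in samples}
    return samples, targets, vaf
```

Targets missing from a sample's VAF record are filled with 0, so every pipe chart is drawn over the same set of targets.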
Space/time application 130 allows three main interactive explorations of the data: (1) a location (or level) plot of the samples; (2) visualization of TSCA data for individually selected samples in a Manhattan-like plot, where the y-axis displays the VAF values of the targets within each given sample, ordered on the x-axis according to genomic coordinate; and (3) a pairwise scatter plot of VAF values between two selected samples. The location (or level) plot display is dynamic and interactive.
While Figures 2 through 11 depict an example of 96 samples and of processing a targeted genome (e.g., about 1,500 variants of a genome), it may be more typical to collect 5-10 samples on the same patient and use space/time application 130 to process the whole genome instead of a targeted genome. Further, in some embodiments, space/time application 130 may be used to process sequencing data from multiple patients that have, for example, the same cancer, wherein the sequencing data from the multiple patients is combined in sequencing data 120. A mixture of data types and patients may also be used, as long as such data types can be matched up by genomic coordinates.
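Combining data from several patients or data types amounts to a join on genomic coordinates. The sketch below assumes each source is a hypothetical mapping from `(chromosome, position)` to a value such as a VAF. It shows one possible merge and the set of coordinates shared by all sources. It is an illustration, not the method used by space/time application 130.

```python
from typing import Dict, List, Tuple

Coord = Tuple[str, int]  # (chromosome, position)


def merge_by_coordinate(tables: Dict[str, Dict[Coord, float]]) -> Dict[Coord, Dict[str, float]]:
    """Combine per-source values (patients or data types) keyed by genomic coordinate."""
    merged: Dict[Coord, Dict[str, float]] = {}
    for source, table in tables.items():
        for coord, value in table.items():
            merged.setdefault(coord, {})[source] = value
    return merged


def shared_coordinates(tables: Dict[str, Dict[Coord, float]]) -> List[Coord]:
    """Coordinates observed in every source, in sorted order."""
    if not tables:
        return []
    common = set.intersection(*(set(t) for t in tables.values()))
    return sorted(common)
```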
Referring now to Figure 2, control panel 210 is subdivided into multiple panels. For example, control panel 210 includes a histology panel 212, a dates panel 214, and a samples panel 216. Dates panel 214 includes a list of all dates on which a sample was acquired from the patient. The user may select/deselect one or more dates in dates panel 214. Dates panel 214 also includes an all dates pushbutton (PB) 215 for selecting/deselecting all of the dates in dates panel 214. The border of each of the dates in dates panel 214 can be color coded to impart information. For example, any date on which EMR treatment was performed is bordered in yellow, while any date on which radiofrequency ablation (RFA) was performed is bordered in red/orange.
Histology panel 212 includes a GO PB 218. Histology panel 212 also includes a histology key 220 that indicates a plurality of histology grades, as well as an all histologies PB 221. Each of the histology grades is represented by a color-coded pushbutton for selecting/deselecting the histology grade. One or more histology grades can be selected at one time. In histology key 220, the pushbuttons that are selected are enlarged as compared with those that are not selected. Using all histologies PB 221 in histology key 220, all histology grades can be selected/deselected. An example of the histology grades in histology key 220 is shown in Table 1.
[Table 1: Histology grades of histology key 220 (image not reproduced)]
Control panel 210 of GUI 135 also includes a toggle button 222. Toggle button 222 is used to toggle between a prevailing histology grade and a highest histology grade. For example, a sample may be reported as 20% HGD, 5% cancer. Cancer is the highest histology grade present, while HGD is the prevailing grade. Toggle button 222 may be used to display the colors according to the user's preference. In some embodiments, histology grades may be sorted from least severe to most severe.
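The prevailing/highest distinction can be made concrete by parsing a report such as "20%HGD, 5%cancer". The sketch assumes a hypothetical severity order (IM < LGD < HGD < cancer), because Table 1 is not reproduced here. It considers only the grades listed in the report, and its function names are illustrative.

```python
import re
from typing import List, Tuple

# Assumed ordering, least to most severe; Table 1 defines the actual grades.
SEVERITY = ["IM", "LGD", "HGD", "cancer"]
_PART = re.compile(r"(\d+(?:\.\d+)?)\s*%\s*([A-Za-z]+)")


def parse_report(report: str) -> List[Tuple[float, str]]:
    """'20%HGD, 5%cancer' -> [(20.0, 'HGD'), (5.0, 'cancer')]"""
    return [(float(pct), grade) for pct, grade in _PART.findall(report)]


def prevailing_grade(report: str) -> str:
    """Grade with the largest reported percentage."""
    return max(parse_report(report), key=lambda part: part[0])[1]


def highest_grade(report: str) -> str:
    """Most severe grade mentioned, regardless of percentage."""
    return max((grade for _, grade in parse_report(report)), key=SEVERITY.index)


def display_grade(report: str, mode: str = "prevailing") -> str:
    """Analogue of toggle button 222: choose which grade drives the color."""
    return highest_grade(report) if mode == "highest" else prevailing_grade(report)
```

A grade missing from `SEVERITY` raises `ValueError` in `highest_grade`, which makes an incomplete severity table easy to spot.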
Referring now to Figure 3, more details of samples panel 216 are shown and described. Samples panel 216 includes a sketch (or image) 224 of the sampled organ. In some embodiments, sketch 224 depicts the esophagus, wherein the esophagus is shown in a horizontal orientation within samples panel 216. Sketch 224 is hereafter called esophagus sketch 224. In this example, one end of esophagus sketch 224 is the end of the esophagus oriented toward the patient's throat, while the other end of esophagus sketch 224 is the end of the esophagus oriented toward the patient's stomach. A level scale 226 is provided along the length of esophagus sketch 224. The numbers along level scale 226 indicate physical positions or distance (e.g., in cm) along the length of the esophagus. For example, level scale 226 is numbered 24 through 38, meaning the 24-cm position through the 38-cm position (or distance or depth) along the length of the esophagus. "D" of level scale 226 indicates the patient's duodenum, and "??" of level scale 226 indicates samples for which the level was not known.
Each sample taken at any position along level scale 226 is indicated by a sample block 228. For example, Figure 3 shows 96 sample blocks 228 that correlate to 96 samples taken for the patient over a period of time. Each of the sample blocks 228 is color coded according to histology key 220 of histology panel 212. When multiple samples are taken at the same location along level scale 226, the sample blocks 228 are presented in a stacked fashion. Accordingly, the sample blocks 228 are presented in bar graph fashion in samples panel 216.
A string sponge icon 230 is depicted in samples panel 216 with respect to esophagus sketch 224. In this example, sponge icon 230 represents two sponge devices that are swallowed by the patient for collecting two samples that are not assigned a level. A cytosponge may be used as part of a clinical trial. The patient may swallow the cytosponge, which is then pulled back through the mouth by a string. Cells stick to the cytosponge, so it samples along the esophagus as it moves. The sequenced sample consists of the cells that were attached to the sponge; the string is the mechanism for retrieving it. Samples panel 216 also includes a sort by date PB 232 and a sort by histology PB 234. With respect to presenting sample blocks 228 to the user in samples panel 216, sort by date PB 232 is used to sort sample blocks 228 by date, while sort by histology PB 234 is used to sort sample blocks 228 by histology grade. Further, each location along level scale 226 is selectable. Namely, the user can select one or more locations along level scale 226 and view only those samples at the selected levels. Additionally, an all levels PB 225 is provided by which the user can quickly select/deselect all locations along level scale 226.
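The stacked bar layout of sample blocks 228 can be modeled as grouping samples by level and ordering each stack by date or by histology grade. In the sketch below, the dictionary fields (`id`, `level`, `date`, `histology`), the ISO date format, and the severity order are all assumptions made for illustration.

```python
from collections import defaultdict
from typing import Dict, List

SEVERITY = ["IM", "LGD", "HGD", "cancer"]  # assumed order, least to most severe


def stacks_by_level(samples: List[dict], sort_by: str = "date") -> Dict[object, List[str]]:
    """Group sample IDs by level; each list runs from the bottom to the top of the stack."""
    groups: Dict[object, List[dict]] = defaultdict(list)
    for s in samples:
        groups[s["level"]].append(s)
    if sort_by == "date":
        key = lambda s: s["date"]  # ISO dates sort chronologically as strings
    else:
        key = lambda s: SEVERITY.index(s["histology"])
    return {level: [s["id"] for s in sorted(group, key=key)]
            for level, group in groups.items()}
```

Because the level is used only as a dictionary key, samples with an unknown level ("??") can be stacked the same way.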
Additionally, samples panel 216 includes a consequences key 236, which is a key of the different variant (e.g., mutation or polymorphism) consequences. In this example, about 1,500 mutations of the genome were targeted in each of the 96 samples. Consequences key 236 includes, for example, a line of color-coded blocks, wherein each block represents a different variant consequence. In some embodiments, the Variant Effect Predictor (VEP) provides a list of positions on the genome and their corresponding annotations (i.e., a list of consequences). Another method of predicting the effects of variants is Ensembl, developed by the Ensembl project. More details are found at
In some embodiments, the consequences are, in order from left to right, as follows:
STOP_GAINED;
STOP_LOST;
ESSENTIAL_SPLICE_SITE;
NON_SYNONYMOUS_CODING, SPLICE_SITE;
SYNONYMOUS_CODING;
SYNONYMOUS_CODING, SPLICE_SITE;
SPLICE_SITE, INTRONIC;
REGULATORY_REGION;
SYNONYMOUS_CODING;
5PRIME_UTR, SPLICE_SITE;
5PRIME_UTR;
3PRIME_UTR;
UPSTREAM;
DOWNSTREAM;
WITHIN_NON_CODING_GENE, SPLICE_SITE;
WITHIN_NON_CODING_GENE, SPLICE_SITE, INTRONIC;
WITHIN_NON_CODING_GENE, INTRONIC;
WITHIN_NON_CODING_GENE;
STOP_GAINED,NMD_TRANSCRIPT;
ESSENTIAL_SPLICE_SITE,NMD_TRANSCRIPT;
NMD_TRANSCRIPT,NON_SYNONYMOUS_CODING;
NMD_TRANSCRIPT, SYNONYMOUS_CODING, SPLICE_SITE;
3PRIME_UTR,NMD_TRANSCRIPT, SPLICE_SITE;
NMD_TRANSCRIPT, SPLICE_SITE, INTRONIC;
3PRIME_UTR,NMD_TRANSCRIPT;
5PRIME_UTR,NMD_TRANSCRIPT;
NMD_TRANSCRIPT, INTRONIC;
INTRONIC;
INTERGENIC;
By hovering the cursor over any block of consequences key 236, a popup window 238 displays the consequence for that block. One or more blocks in consequences key 236 can be selected at one time. An all consequences PB 237 is provided for selecting all consequences in consequences key 236. Further and referring now to Figure 4, by hovering the cursor over any sample block 228, a popup window 240 displays the metadata for that sample block 228.
Referring now to Figures 5 through 11, various examples are shown of using the different sorting and filtering features of control panel 210 of GUI 135 of space/time application 130 of system 100. For example, Figure 5 shows that level 34 along level scale 226 is selected. In this example, only those sample blocks 228 collected at level 34 are displayed. To show all sample blocks 228 collected at level 34, all dates PB 215 and all histologies PB 221 are selected. Further, sort by date PB 232 is selected, which simply orders the stack of sample blocks 228 by date (e.g., earliest date on bottom, latest date on top). By contrast, Figure 6 shows the same selections, but with sort by histology PB 234 selected instead of sort by date PB 232, which simply reorders the stack of sample blocks 228 at level 34 by histology grade rather than by date.
Referring now to Figure 7, the all levels PB 225 is selected, which shows sample blocks 228 at all the levels at which at least one sample was collected. The level numbers at which samples exist are highlighted. Again, all dates PB 215, all histologies PB 221, and sort by date PB 232 are selected.
Referring now to Figure 8, the all levels PB 225 is selected as shown in Figure 7, but filtered now by date using dates panel 214. Namely, all histologies PB 221 remains selected, but all dates PB 215 is not selected. Instead one specific date in dates panel 214 is selected. In so doing, only those sample blocks 228 collected on the selected date are shown and all other sample blocks 228 are faded into the background.
Referring now to Figure 9 and with the all levels PB 225 still selected, instead of filtering by date as shown in Figure 8, sample blocks 228 are filtered by histology grade using histology key 220 in histology panel 212. Namely, all dates PB 215 is selected and, for example, the HGD PB in histology key 220 is selected. In so doing, only those sample blocks 228 categorized into the HGD histology grade are shown and all other sample blocks 228 are faded into the background.
Referring now to Figure 10 and with the all levels PB 225 still selected, sample blocks 228 are filtered by both date and histology grade. For example, one specific date in dates panel 214 is selected and the HGD PB in histology key 220 is selected. In so doing, only those sample blocks 228 collected on the selected date and categorized into the HGD histology grade are shown, and all other sample blocks 228 are faded into the background. More than one date and more than one histology grade can be selected at a time, as shown in Figure 11. In this example and with the all levels PB 225 still selected, sample blocks 228 are filtered by two dates in dates panel 214 and two histology grades (e.g., IM and HGD) in histology key 220. In so doing, only those sample blocks 228 collected on the two dates and categorized into the two histology grades are shown, and all other sample blocks 228 are faded into the background.
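The filtering behavior in Figures 5 through 11 combines selections in two ways. Values within one panel are alternatives (any selected date), while the panels themselves must all match (date and histology). Samples at unselected levels are not displayed at all, whereas non-matching samples at displayed levels are faded. The sketch below encodes that reading. The field names are hypothetical, and `None` stands for an "all" pushbutton being on.

```python
from typing import List, Optional, Set, Tuple


def partition(samples: List[dict],
              levels: Optional[Set] = None,
              dates: Optional[Set[str]] = None,
              grades: Optional[Set[str]] = None) -> Tuple[List[str], List[str]]:
    """Split sample IDs into (highlighted, faded); None means 'all' is selected.

    Samples at unselected levels are dropped. Within the displayed levels, a
    sample is highlighted only if its date is one of the selected dates and
    its histology grade is one of the selected grades.
    """
    shown, faded = [], []
    for s in samples:
        if levels is not None and s["level"] not in levels:
            continue
        date_ok = dates is None or s["date"] in dates
        grade_ok = grades is None or s["histology"] in grades
        (shown if date_ok and grade_ok else faded).append(s["id"])
    return shown, faded
```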
Once all selections are made in histology panel 212, dates panel 214, and samples panel 216 of control panel 210, the user selects the GO PB 218. By selecting the GO PB 218, any information in sequencing data 120 that corresponds to the selections in control panel 210 is retrieved and processed, and then presented to the user in GUI 135. For example, Figures 12 through 25 show examples of presenting selected information using space/time application 130 of system 100 of Figure 1. Namely, "pipe" charts, scatter plots, and tables that are based on user selections made in histology panel 212, dates panel 214, and samples panel 216 of control panel 210 are generated using space/time application 130 of system 100 of Figure 1 and presented to the user in GUI 135.
In some embodiments and referring now to Figure 12, level 31 of level scale 226 is selected, sponge icon 230 is selected, all dates PB 215 is selected, all consequences PB 237 is selected, and three histology grades (e.g., LGD, IM, and HGD) in histology key 220 are selected. In this example, level 31 of level scale 226 means the 31-cm position (or distance or depth) along the esophagus. Having made these selections, five sample blocks 228 are presented in a stack at level 31 of level scale 226 in samples panel 216, namely samples 28, 29, 30, 31, and 60. These samples are hereafter called S28, S29, S30, S31, and S60, respectively.
If the user wishes to explore the sequencing data that corresponds to S28, S29, S30, S31, and S60, the user selects the GO PB 218. By selecting the GO PB 218, any information in sequencing data 120 that corresponds to S28, S29, S30, S31, and S60 is retrieved and then presented to the user in GUI 135. For example, once GO PB 218 is selected, (1) a set of pipe charts 250 that corresponds to the selected samples is automatically generated (e.g., a pipe chart for each of S28, S29, S30, S31, and S60), and (2) two pipe charts 260 are automatically generated that correspond to the two samples represented by sponge icon 230. In one example, one pipe chart 260 is for S73 and the other pipe chart 260 is for S74, wherein S73 and S74 are samples collected using the two sponge devices. In some embodiments, pipe charts 250 are rendered just below control panel 210 and pipe charts 260 are rendered just below pipe charts 250. Again, control panel 210, pipe charts 250, and pipe charts 260 (i.e., GUI 135) can be rendered in an Internet browser. If sponge icon 230 is not selected when the user selects GO PB 218, only pipe charts 250 for the samples are generated and rendered, and not pipe charts 260. Figures 13 and 14 show more details of pipe charts 250. The details of pipe charts 250 are likewise applicable to pipe charts 260. In some embodiments, if multiple levels are selected in control panel 210, the pipe charts are shown in multiple "250" panels. For example, if the user selects level 31 and level 36, a first row of pipe charts is shown for all samples at level 31, then a second row for all samples at level 36. Samples from the same level are enclosed by a black bordered line, and where a new row starts depends on the width of the browser. For example, if the user selects level 25 and level 27, on a wide screen all four pipe charts appear on the same row, while on a narrow screen the two levels appear on different rows. Samples may be ordered by date from left to right within a panel.
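The browser-width behavior can be approximated by a greedy layout. Each level's group of pipe charts is placed left to right, and a new row starts when the next group does not fit. The sketch assumes a fixed, hypothetical pipe width in pixels and keeps each level's group intact, as the level 25/27 example suggests. It does not model wrapping inside a group that is wider than the browser.

```python
from typing import List, Tuple


def layout_rows(groups: List[Tuple[str, int]], pipe_width: int,
                browser_width: int) -> List[List[str]]:
    """Greedy left-to-right placement of per-level pipe groups into rows.

    `groups` is a list of (level, number_of_pipes). A group that does not fit
    in the remaining width starts a new row; a group wider than the browser
    still gets a row of its own.
    """
    rows: List[List[str]] = []
    current: List[str] = []
    used = 0
    for level, n_pipes in groups:
        width = n_pipes * pipe_width
        if current and used + width > browser_width:
            rows.append(current)
            current, used = [], 0
        current.append(level)
        used += width
    if current:
        rows.append(current)
    return rows
```

With two pipe charts at each of levels 25 and 27 and 200-pixel pipes, a 1000-pixel browser places both levels on one row, while a 500-pixel browser splits them into two rows.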
Referring now to Figures 13 and 14, each pipe chart 250 includes a title bar 252 that shows, for example, the sample collection date, the level, and the histology grade. Title bar 252 is color coded according to the histology grade of the sample. For example, if the histology grade is IM, the color of title bar 252 is yellow. A scatter plot icon 254 and a delete/add icon 256 are also on title bar 252. Using scatter plot icon 254, a scatter plot of the sample (against another sample) is automatically generated, which is described in more detail in Figures 15 and 16. Delete/add icon 256 toggles between a minus (-) sign and a plus (+) sign. For example, clicking on the minus (-) sign causes a "Delete Samples" button to appear between panels 210 and 250 and schedules the pipe chart to disappear from panel 250 once the "Delete Samples" button is clicked; clicking on the plus (+) sign cancels this, so the pipe chart is no longer scheduled to disappear from panel 250. Data is not deleted, just removed from the view. This may allow better visualization of the data by removing the panels of samples that, after visual inspection, look odd or are of poor quality.
In some embodiments, hovering the cursor over title bar 252 displays all known metadata of this sample, as shown in Figure 14. This metadata can include, for example, the date and location at which the sample was obtained, the histological grading, the quality of DNA used for sequencing, the average depth of sequencing, and any other description. Each pipe chart 250 is a plot of data points 258, which are the data points in sequencing data 120 for the sample of interest. In some embodiments, hovering the cursor over any data point 258 displays all known metadata of this data point, as shown in Figure 14, including the genomic location and base change of the variant, the gene name (if relevant), the consequence, and the HGVS annotation (HGVS is an official nomenclature for variations/mutations in the human genome; see http://www.hgvs.org/mutnomen/).
In some embodiments, for each variant in pipe chart 250, space/time application 130 shows the percentage of the variant base, which is the variant allele frequency (VAF). In some embodiments, other data may be used, such as relative expression or copy number, among others. Allele frequency is defined as the proportion of a particular allele (base call at a given genomic coordinate against a reference sequence) among all base calls seen at that position. In some embodiments, a program may use allele frequencies derived from raw counts of A, C, G, and T at targeted positions, the only "pre-processing" being the alignment of the sequencing reads to the human genome reference. The y-axis of pipe chart 250 indicates the allele frequency on a scale of 0-1, wherein 0 means 0% (i.e., matching the reference sequence) and 1 means 100% (i.e., not matching the reference base). Accordingly, 0.1 means 10%, 0.2 means 20%, 0.3 means 30%, and so on. In one example, a data point 258 on pipe chart 250 at 0.3 means 30% variant base and 70% reference base.
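The definition above (an allele's share of all base calls at a position, from raw A/C/G/T counts) reduces to a single division. The following sketch takes a hypothetical `counts` mapping and returns 0 for a position with no base calls.

```python
from typing import Mapping

BASES = "ACGT"


def allele_frequency(counts: Mapping[str, int], allele: str) -> float:
    """Proportion of base calls at one position that equal `allele`."""
    total = sum(counts.get(b, 0) for b in BASES)
    return counts.get(allele, 0) / total if total else 0.0
```

For counts of 70 A (reference) and 30 C (variant), `allele_frequency(counts, "C")` gives 0.3, matching the 30% variant base / 70% reference base example.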
The x-axis of pipe chart 250 indicates the position of the targeted variant along the (human) reference genome, and is separated by vertical lines to indicate the limits of the chromosomes 1, 2, ... 22, X, Y. Accordingly, data points 258 on pipe chart 250 are ordered along the genome. Further, data points 258 are color coded according to consequences key 236 in control panel 210, wherein each color represents a different variant (e.g., mutation or polymorphism) consequence.
A scatter plot of one sample with respect to another sample can be automatically generated using space/time application 130. Namely, by selecting the scatter plot icon 254 on two of the pipe charts 250, the two samples can be plotted against one another to determine how the VAF values of all variants compare between the two samples. For example, Figure 15 shows the scatter plot icon 254 of S29 and the scatter plot icon 254 of S30 selected. In so doing, space/time application 130 automatically generates, for example, a scatter plot 262. In GUI 135, scatter plot 262 can be rendered beside or below pipe charts 260. However, if pipe charts 260 are not present, scatter plot 262 can be rendered beside or below pipe charts 250, depending on the width of the browser.
In scatter plot 262, the allele frequency of one sample is plotted on the y-axis and the allele frequency of the other sample is plotted on the x-axis. In the example shown in Figure 15, S29 is plotted on the y-axis and S30 is plotted on the x-axis. In this example, the VAF values in S29 and S30 show the same trend. In another example, Figure 16 shows a scatter plot 262 of S60 and S28. In this example, the VAF values do not show the same trend. In some embodiments, the correlation between the two samples may be calculated and reported on the plot, e.g., as an R-squared value.
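An R-squared value for scatter plot 262 can be computed as the square of the Pearson correlation between the two samples' VAF vectors, taken over the same variants in the same order. The sketch below is one straightforward way to do this, with a hypothetical function name. It returns NaN when either sample has constant VAF, because the correlation is undefined in that case.

```python
import math
from typing import Sequence


def r_squared(x: Sequence[float], y: Sequence[float]) -> float:
    """Square of the Pearson correlation between two samples' VAF vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length vectors with at least 2 points")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return math.nan  # undefined when one sample is constant
    return sxy * sxy / (sxx * syy)
```

Samples that "show the same trend", as S29 and S30 do, would give a value near 1, and unrelated samples a value near 0.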
Figures 12 through 16 show examples of all consequences being displayed because, for example, the all consequences PB 237 in consequences key 236 is selected. However, the contents of GUI 135 (i.e., pipe charts 250, pipe charts 260, and scatter plot 262) can be filtered in a number of ways. For example, the user may select only certain specific consequences in consequences key 236. One, two, or any number of consequences can be selected at one time. Then, only those variants that correspond to the selected consequences are highlighted in pipe charts 250, pipe charts 260, and scatter plot 262, while all others are faded into the background. For example, Figure 17 shows three consequences in consequences key 236 are selected. Namely, UPSTREAM; DOWNSTREAM; and WITHIN_NON_CODING_GENE, SPLICE_SITE are selected. Accordingly, the data points 258 in pipe charts 250, pipe charts 260, and scatter plot 262 that correspond to UPSTREAM; DOWNSTREAM; and WITHIN_NON_CODING_GENE, SPLICE_SITE are highlighted, while all others are faded into the background.
In some embodiments, using consequences key 236 is one way to filter the contents of GUI 135. Another way is to manually enter a gene name or chromosome location and filter accordingly. Referring again to Figure 13, a text entry field 264 is provided just above pipe charts 250, along with a SHOW PB 266. The user types information into text entry field 264 and then selects the SHOW PB 266. In doing so, the contents of GUI 135 are filtered according to the information entered, an example of which is shown in Figure 18. In this example, the user types the targeted variant information (its location and base change) "chr7_154593906_A_C" into text entry field 264 and then clicks on SHOW PB 266. In doing so, the data points 258 in pipe charts 250, pipe charts 260, and scatter plot 262 that correspond to the "chr7_154593906_A_C" variant are highlighted, while all others are faded into the background. Regardless of the filtering selections and methods, the filtering settings can be cleared at any time using a CLEAR SELECTIONS PB 242, also shown in Figure 13.
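The variant identifier format in the example, "chr7_154593906_A_C", encodes chromosome, position, reference base, and variant base separated by underscores. The sketch below parses that format and decides whether a typed query should highlight a variant. Exact-ID, chromosome-only, and gene-name matching follow the text. The `chrN:start-end` region syntax is a hypothetical addition, not a documented feature of text entry field 264.

```python
from typing import Optional, Tuple


def parse_variant_id(vid: str) -> Tuple[str, int, str, str]:
    """'chr7_154593906_A_C' -> ('chr7', 154593906, 'A', 'C')"""
    chrom, pos, ref, alt = vid.split("_")
    return chrom, int(pos), ref, alt


def matches_query(variant_id: str, gene: Optional[str], query: str) -> bool:
    """True if a variant should be highlighted for the text typed in the filter."""
    q = query.strip()
    chrom, pos, _, _ = parse_variant_id(variant_id)
    if q == variant_id:
        return True
    if ":" in q:  # hypothetical region syntax, e.g. 'chr7:154000000-155000000'
        q_chrom, span = q.split(":", 1)
        start, end = (int(v) for v in span.split("-"))
        return q_chrom == chrom and start <= pos <= end
    if q == chrom:
        return True
    return gene is not None and gene.upper() == q.upper()
```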
Other screen rendering controls are provided in GUI 135. For example, it may be beneficial to change the color of the background in order for the user to better see certain colors in pipe charts 250, pipe charts 260, and scatter plot 262. Further, it may be beneficial to control the size of pipe charts 250, pipe charts 260, and scatter plot 262 when rendered in GUI 135. For example, Figure 19 shows GUI 135 with a dark background and pipe charts 250 and pipe charts 260 rendered in a small size and repositioned in the view. By contrast, the views shown in Figures 12, 15, 16, 17, and 18 are considered medium size.
Referring now again to Figure 3, more details of the screen rendering controls are shown. For example, screen rendering controls 240 are provided in the corner of control panel 210. For example, screen rendering controls 240 include DARK/LIGHT, S, M, and L pushbuttons, wherein S means small size, M means medium size, and L means large size. The DARK/LIGHT pushbutton toggles. For example, when the current view has a light background (e.g., white, see Figure 18), the DARK pushbutton is displayed and can be selected to toggle the background to a dark color (e.g., brown). By contrast, when the current view has a dark background (e.g., brown, see Figure 19), the LIGHT pushbutton is displayed and can be selected to toggle the background to a light color (e.g., white).
Space/time application 130 includes other mechanisms for displaying information about the different variants to the user. Namely, space/time application 130 includes mechanisms for listing the information about certain selected data points 258 in pipe charts 250, pipe charts 260, and scatter plot 262. For example and referring now to Figure 20, the user may select a certain group of data points 258 in any one of pipe charts 250 or pipe charts 260. Figure 20 shows a group 251 of data points 258 selected in pipe chart 250 for S29.
This selection can be made, for example, by the user holding down the mouse button, dragging the cursor over a group of data points 258, and then releasing the mouse button, or by the user clicking the cursor on one corner of the desired area and then clicking the cursor on the diagonally opposite corner of the desired area. Any variants that fall inside group 251 are highlighted. The same variants in any of the pipe charts 250, pipe charts 260, and scatter plot 262 are also highlighted, as shown in Figure 20. Further, a table 270 that includes details about each individual variant in the selected group is automatically generated and rendered.
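Both selection gestures described above come down to the same operation: two opposite corners define a box, and the variants inside that box are highlighted in every chart. The sketch below assumes hypothetical `(variant_id, x, y)` tuples in plot coordinates and a mapping from chart name to the variant IDs it draws. It does not reproduce the D3.js code used by space/time application 130.

```python
from typing import Dict, Iterable, List, Set, Tuple

Point = Tuple[str, float, float]  # (variant_id, x, y)


def in_rectangle(points: Iterable[Point], corner1: Tuple[float, float],
                 corner2: Tuple[float, float]) -> List[str]:
    """IDs of points inside the box spanned by two opposite corners (any order)."""
    xmin, xmax = sorted((corner1[0], corner2[0]))
    ymin, ymax = sorted((corner1[1], corner2[1]))
    return [vid for vid, x, y in points if xmin <= x <= xmax and ymin <= y <= ymax]


def highlight_across(charts: Dict[str, List[str]], selected: Set[str]) -> Dict[str, List[str]]:
    """For each chart (pipe chart or scatter plot), the variants to highlight."""
    return {name: [vid for vid in ids if vid in selected] for name, ids in charts.items()}
```

The selected IDs, in order, are also the rows that table 270 would list.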
Referring now to Figure 21, more details of table 270 are presented. Figure 21 shows only a portion of table 270, i.e., not all entries are shown. In this example, table 270 includes a delete column 272, a position column 274, a consequence column 276, a gene column 278, an HGVS column 280, a deleteriousness column 282, a sample AF column 284, and a description column 286. Table 270 includes an entry (i.e., a row) for each data point 258 of the selected group (e.g., group 251). Delete column 272 includes a -/+ toggle button on each row, which is used to remove the entry. Position column 274 indicates the position along the genome (i.e., the chromosome, the location, the reference base, and the variant base). Consequence column 276 indicates the variant consequence type corresponding to consequences key 236. Gene column 278 indicates the standard gene name if the variant falls within a gene on the reference genome ("null" if not in a gene). HGVS column 280 indicates the HGVS nomenclature for reporting variants that fall within a gene ("null" if not in a gene). Deleteriousness column 282 indicates the degree of deleteriousness of the variant. Sample AF column 284 indicates the sample allele frequency and is populated only if the table refers to a single sample; in this display, the table always refers to multiple samples, so that column is populated with "undef". Description column 286 indicates a general description of the gene product. In one example, the source of the consequence, gene, HGVS, deleteriousness, and description information is VEP.
With respect to deleteriousness column 282, the deleteriousness score is given as a value from 0 to 1, which indicates the probability of deleteriousness. A score of 0 means the lowest probability of having an effect on the protein. A score of 1 means the highest probability of having an effect on the protein. In one example, the deleteriousness score is returned by VEP. For example, the deleteriousness score can be the combination of two scores, such as a combination of Condel, which is a general method for calculating a consensus prediction score, and PolyPhen, which predicts the effect of an amino acid substitution on the structure and function of a protein. However, the deleteriousness score is not limited to using Condel and PolyPhen only. The deleteriousness score can be generated using other combinations and types of scores.
In some embodiments, each of position column 274, consequence column 276, gene column 278, HGVS column 280, deleteriousness column 282, and sample AF column 284 has a "sort" PB by which the entries in table 270 can be sorted according to the data type the column contains. In one example, table 270 shown in Figure 21 is sorted by position along the reference genome using the sort PB in position column 274. In another example, table 270 shown in Figure 22 is sorted by consequence (from most severe to least severe) using the sort PB in consequence column 276. In yet another example, table 270 shown in Figure 23 is sorted by gene name alphabetically using the sort PB in gene column 278. In still another example, table 270 shown in Figure 24 is sorted by HGVS alphabetically using the sort PB in HGVS column 280.
Rather than selecting a group of data points 258 (i.e., variants and consequences) as shown in Figure 20, the user may select individual data points 258, and the corresponding information is automatically generated and displayed in table 270, as shown in the example of Figure 25. Figure 25 shows three data points 258 selected in pipe chart 250 for S29. Again, the variants and consequences of the three selected data points 258 are also highlighted in all of pipe charts 250, pipe charts 260, and scatter plot 262. In this example, the generated table 270 has only three entries, corresponding to the three selected data points 258.
Figures 26 through 29 show an example of a control panel 310 of GUI 145 of tree application 140 of system 100 of Figure 1. A tree 312 is rendered in the main viewing area of control panel 310. Tree 312 consists of branches 314, nodes 316, and sample blocks 318. Tree application 140 can be used in combination with space/time application 130 or independently of it. In the example shown in Figures 26 through 29 and subsequent figures, tree application 140 is used to process information from the same 96 samples that were described with reference to space/time application 130 in Figures 2 through 25. Accordingly, sample blocks 318 of tree 312 correspond to sample blocks 228 of GUI 135 of space/time application 130. Tree application 140 of system 100 is another example of a bioinformatics tool that is designed and constructed using, for example, the D3.js JavaScript library, to allow interrogation and exploration of the TSCA sequencing data. In one example, space/time application 130 loads three data files (in JSON format): the sample metadata (date of sampling, location down the esophagus, histology report, DNA concentration, and TSCA average depth), the TSCA targets metadata (genomic location, base change and annotation, including gene and consequence when relevant), and the variant allele frequency (VAF) (or allele frequency) values for all targets in all samples. The VAF is calculated as the number of reads supporting the variant base divided by the sum of the reads supporting the variant base and the reads supporting the reference base. Individual positions with a read depth below 20 are treated as having zero depth, and NaN values are converted to 0.
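The VAF rule and the low-depth handling can be written as a small function. This is a sketch under assumptions: read depth is taken to mean the variant reads plus the reference reads (the text does not say whether reads with other bases count), missing counts are assumed to arrive as NaN, and the helper names are hypothetical.

```python
import math


def variant_allele_frequency(alt_reads: float, ref_reads: float, min_depth: int = 20) -> float:
    """VAF = variant reads / (variant reads + reference reads).

    Missing counts (NaN) are treated as 0. Positions whose depth is below
    min_depth are treated as zero depth, and the undefined VAF that would
    result is reported as 0.
    """
    alt = 0.0 if math.isnan(alt_reads) else float(alt_reads)
    ref = 0.0 if math.isnan(ref_reads) else float(ref_reads)
    depth = alt + ref  # assumption: depth counts variant + reference reads only
    if depth < min_depth:
        return 0.0
    return alt / depth


def vaf_table(counts):
    """{sample: {target: (alt_reads, ref_reads)}} -> {sample: {target: vaf}}"""
    return {
        sample: {target: variant_allele_frequency(a, r) for target, (a, r) in targets.items()}
        for sample, targets in counts.items()
    }
```

For example, 5 variant reads and 45 reference reads give a VAF of 0.1, while 3 and 10 reads (depth 13) give 0.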
Tree application 140 allows four main interactive explorations of the data: (1) a hierarchical clustering tree of the samples, whose display is dynamic and allows the user to choose among five distance metrics between samples; (2) visualization of TSCA data for individually selected samples in a Manhattan-like plot, in which the Y-axis displays the VAF values of the targets within a given sample and the X-axis orders those targets by genomic coordinate, and from which the user can remove samples from the tree, which is then dynamically regenerated; (3) a pairwise scatter plot of VAF values between two selected samples; and (4) a table of variants corresponding to nodes 316 or blocks 318, from which the user can remove individual variants from the generation of the tree, which is then dynamically regenerated.
Whereas space/time application 130 is primarily an exploratory tool for sequencing data 120, tree application 140 is primarily a data analysis tool for sequencing data 120. Tree 312 is a binary tree. The horizontal distance between a node 316 and its child nodes 316 represents the calculated distance between its two children. Each sample block 318 is color coded by histology grade as described with respect to space/time application 130. Further, the width of each sample block 318 is proportional to the number of variants in that sample with an allele frequency greater than 0.01.
Control panel 310 also includes a dates panel 320 and a levels panel 322. Dates panel 320 and levels panel 322 are used for filtering the information displayed in phylogeny tree 312. Dates panel 320 includes a list of all dates on which a sample was acquired for the patient. The user may select/deselect one or more dates in dates panel 320. Dates panel 320 also includes an all dates PB for selecting/deselecting all of the dates in dates panel 320. In this example, dates panel 320 corresponds with dates panel 214 of space/time application 130. Further, in this example, levels panel 322 corresponds with level scale 226 of space/time application 130. The user may select/deselect one or more levels in levels panel 322. Levels panel 322 also includes an all levels PB for selecting/deselecting all of the levels in levels panel 322.
In some embodiments, control panel 310 includes further controls. For example, a toggle button 324 is provided to toggle between a prevailing histology grade and a highest histology grade. Further, a GENOMES PB 326 is provided to select/deselect samples that belong to a specific group (herein named "GENOMES") that cannot be accessed via a specific date or a specific level. Some samples had previously undergone whole genome sequencing, and these samples were spread across all dates and all levels; GENOMES PB 326 provides an easy way to highlight this specific subset of samples. Further, a toggle button 328 is provided to toggle between a TABLE presentation and a BLOCKS presentation, which is described in more detail below with reference to Figures 39 and 40.
Control panel 310 further includes a set of user controls 330. For example, user controls 330 include a help button, a key button, an options button, and a history button. Referring now to Figure 27, more details of user controls 330 are presented. Clicking the help button opens a dropdown window that displays written help information. Clicking the key button opens a dropdown window that displays the color-coded histology grades. In this example, the color-coded histology grades correspond to histology key 220 of space/time application 130, and sample blocks 318 are color coded accordingly. Clicking the options button opens a dropdown window that displays options for choosing the distance measure. In one example, the options are Binary, Euclidean, Manhattan, Maximum (Max), and Pearson, which are different ways of calculating the distance between nodes 316 and/or sample blocks 318.
Further, a description field 332 indicates the selected type of distance measure. Tree 312 is editable, meaning that variants and samples can be removed/added. Clicking the history button opens a dropdown window that displays a record of the changes that have been made to tree 312. It may also display controls for restoring tree 312 to its original state or for undoing certain changes. Tree application 140 determines the similarity of samples by a pairwise comparison of the samples using, for example, the Pearson correlation. Each variant has a value in each sample, namely its allele frequency in that sample. A sample-to-sample correlation is computed to create a matrix of distances between, in this example, all 96 samples.
Using tree application 140, a hierarchical clustering tree of the samples is provided (e.g., tree 312). The display of tree 312 is dynamic and allows the user to remove samples and/or variants and to choose among five distance metrics between samples (using the options button of control panel 310), wherein the distance reflects how similar the variants of the samples are. The greater the similarity of the variants between samples, the smaller the distance; the lower the similarity, the greater the distance. Namely, the closer the nodes 316, the more similar the samples.
For example, each distance metric is calculated as described below. Each metric calculates the distance $D(A,B)$ between samples A and B using the variant allele frequencies $VAF_i$ of all mutations $i = 1, \dots, n$ in both samples.
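Binary: $D(A,B) = 1 - \dfrac{\#\{i : VAF_i^A > 0.01 \text{ and } VAF_i^B > 0.01\} + \#\{i : VAF_i^A < 0.01 \text{ and } VAF_i^B < 0.01\}}{n}$
Euclidean: $D(A,B) = \sqrt{\sum_{i=1}^{n} \left(VAF_i^A - VAF_i^B\right)^2}$
Manhattan: $D(A,B) = \sum_{i=1}^{n} \left|VAF_i^A - VAF_i^B\right|$
Max: $D(A,B) = \max_{i=1,\dots,n} \left|VAF_i^A - VAF_i^B\right|$
Pearson: $D(A,B) = 1 - \dfrac{\sum_{i=1}^{n} \left(VAF_i^A - \overline{VAF^A}\right)\left(VAF_i^B - \overline{VAF^B}\right)}{\sqrt{\sum_{i=1}^{n} \left(VAF_i^A - \overline{VAF^A}\right)^2} \sqrt{\sum_{i=1}^{n} \left(VAF_i^B - \overline{VAF^B}\right)^2}}$, where $\overline{VAF^A}$ and $\overline{VAF^B}$ are the mean VAFs of samples A and B.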
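The five measures translate directly into code. The following is a plain-Python sketch of the formulas, assuming each sample is given as a list of VAFs over the same n mutations in the same order; the function names are illustrative, and returning 1.0 from `pearson` when a sample has constant VAFs (so the correlation is undefined) is an assumption the text does not address.

```python
import math
from typing import Callable, Dict, Sequence

THRESHOLD = 0.01  # VAF cut-off used by the binary measure


def binary(a: Sequence[float], b: Sequence[float]) -> float:
    n = len(a)
    both_present = sum(1 for x, y in zip(a, b) if x > THRESHOLD and y > THRESHOLD)
    both_absent = sum(1 for x, y in zip(a, b) if x < THRESHOLD and y < THRESHOLD)
    return 1 - (both_present + both_absent) / n


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def maximum(a: Sequence[float], b: Sequence[float]) -> float:
    return max(abs(x - y) for x, y in zip(a, b))


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm = math.sqrt(sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b))
    if norm == 0:
        # r is undefined for constant VAFs; treating r as 0 is an assumption.
        return 1.0
    return 1 - cov / norm


METRICS: Dict[str, Callable[[Sequence[float], Sequence[float]], float]] = {
    "Binary": binary,
    "Euclidean": euclidean,
    "Manhattan": manhattan,
    "Max": maximum,
    "Pearson": pearson,
}
```

The Pearson distance ranges from 0 (perfectly correlated VAFs) to 2 (perfectly anti-correlated VAFs).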
A distance matrix is generated by calculating all pairwise distances between samples. In one example, this distance matrix is then used to generate tree 312 by hierarchical clustering with complete linkage of the nodes. Hierarchical clustering with complete linkage is one example of algorithms 170 of system 100 of Figure 1; the neighbor-joining algorithm is another example of such algorithms.
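The clustering step can be sketched without external libraries. The code below is a naive illustration of agglomerative clustering with complete linkage, not the application's implementation; in practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` with `method='complete'` would typically be used. The names `distance_matrix` and `complete_linkage` and the `(left, right, height)` node representation are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple, Union

# A tree node is either a sample name (leaf) or a (left, right, height) tuple.
Node = Union[str, tuple]


def distance_matrix(vafs: Dict[str, Sequence[float]], metric: Callable) -> Dict[Tuple[str, str], float]:
    """All pairwise distances between samples, stored symmetrically."""
    d = {}
    for a, b in combinations(vafs, 2):
        d[(a, b)] = d[(b, a)] = metric(vafs[a], vafs[b])
    return d


def complete_linkage(names: List[str], d: Dict[Tuple[str, str], float]) -> Node:
    """Naive O(n^3) agglomerative clustering with complete linkage.

    At every step the two clusters whose largest pairwise sample distance is
    smallest are merged; that linkage distance becomes the new node's height.
    """
    clusters = [(name, frozenset([name])) for name in names]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Complete linkage: cluster distance = maximum member-to-member distance.
            link = max(d[(a, b)] for a in clusters[i][1] for b in clusters[j][1])
            if best is None or link < best[0]:
                best = (link, i, j)
        link, i, j = best
        merged = ((clusters[i][0], clusters[j][0], link), clusters[i][1] | clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]
```

The height stored at each internal node is what the horizontal node spacing of tree 312 would represent.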
An example of tree 312 calculated according to the Pearson option is shown in Figure 26, as indicated in description field 332 of Figure 26. An example of tree 312 calculated according to the Euclidean option is shown in Figure 28, as indicated in description field 332 of Figure 28.
Referring now to Figure 29, by hovering the cursor over a node 316 or a sample block 318, its metadata is displayed via a popup window. For example, hovering over a node 316 will display the distance between its children, and the number of variants with VAF > 0.01 shared by its children.
Figures 30 through 40 show examples of presenting selected information using tree application 140 of system 100 of Figure 1. Referring now to Figures 30 and 31, the information displayed in tree 312 can be filtered in various ways. In one example, Figure 30 shows phylogeny tree 312 filtered by date. In this example, the user may select one or more specific dates in dates panel 320, while the all PB is selected in levels panel 322. In so doing, only those sample blocks 318 collected on the selected date(s) are shown and all other sample blocks 318 are faded into the background. In another example, Figure 31 shows phylogeny tree 312 filtered by level. In this example, the user may select one or more specific levels in levels panel 322, while the all PB is selected in dates panel 320. In so doing, only those sample blocks 318 collected on the selected level(s) are shown and all other sample blocks 318 are faded into the background.
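The date and level filters can be pictured as a partition of the sample blocks into a shown set and a faded set. The sketch below is a minimal illustration under stated assumptions: `SampleBlock` and `split_by_filter` are hypothetical names, dates are plain strings, and passing `None` stands in for the all PB being selected.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class SampleBlock:
    name: str
    date: str    # sampling date as a string, e.g. "2011-01-15" (format assumed)
    level: int   # sampling level along the esophagus (units assumed)


def split_by_filter(
    blocks: Iterable[SampleBlock],
    selected_dates: Optional[Set[str]] = None,
    selected_levels: Optional[Set[int]] = None,
) -> Tuple[List[SampleBlock], List[SampleBlock]]:
    """Partition blocks into (shown, faded); None means the all PB is selected."""
    shown, faded = [], []
    for block in blocks:
        date_ok = selected_dates is None or block.date in selected_dates
        level_ok = selected_levels is None or block.level in selected_levels
        (shown if date_ok and level_ok else faded).append(block)
    return shown, faded
```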
Referring now to Figures 32 through 38, the information displayed in tree 312 can be edited in various ways. In one example, Figure 32 shows that the user has selected a certain node 316 in tree 312. In so doing, a set of pipe charts 350 is automatically generated that correspond to sample blocks 318 to the right of the selected node 316. Pipe charts 350 are substantially the same as pipe charts 250 of space/time application 130. In this example, a pipe chart 350 is generated for each of S28, S90, S76, S77, S63, S40, S54, S82, S52, and S92. Referring now to Figure 33, the user may select a certain group of data points in any one of pipe charts 350. Figure 33 shows a group of data points selected in pipe chart 350 for S28. Any variants that fall inside this selected area are highlighted, and the same variants are also highlighted in each of the other pipe charts 350, as shown in Figure 33. Further, a table 370 is automatically generated and rendered that includes details about each data point in the selected group. Table 370 is substantially the same as table 270 of space/time application 130. Namely, table 370 may include the same delete column, position column, consequence column, gene column, HGVS column, deleteriousness column, and description column as table 270 of space/time application 130. Table 370 includes an entry (i.e., a row) for each data point of the selected group.
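The brushing and cross-highlighting behaviour of pipe charts 350 can be sketched as two small functions: one collects the variants inside a selected rectangle of one chart, and the other picks out the same variants in every chart. This is a hypothetical model, not the application's D3.js code; each pipe chart is assumed to be a list of `(variant_id, x, vaf)` points, with `x` the target's index in genomic order.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical point model: (variant_id, x, vaf), x = index in genomic order.
Point = Tuple[str, int, float]


def brush(points: List[Point], x0: float, x1: float, y0: float, y1: float) -> Set[str]:
    """Return the ids of variants whose point falls inside the brushed rectangle."""
    return {vid for vid, x, y in points if x0 <= x <= x1 and y0 <= y <= y1}


def highlight(charts: Dict[str, List[Point]], selected: Set[str]) -> Dict[str, List[Point]]:
    """For every chart (keyed by sample), return the points of the selected variants."""
    return {sample: [p for p in pts if p[0] in selected] for sample, pts in charts.items()}
```

The selected ids would also supply the rows of table 370.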
Referring now to Figure 34, if the user wishes, for example, to remove the selected data points from tree 312, the user selects the -/+ toggle button on each row and then clicks PB Delete at the top of table 370 to remove the entries. In so doing, the VAF values for those variants are removed from the distance measure and these data points are removed from tree 312. Namely, tree 312 is automatically recalculated and rendered without these data points, as shown, for example, in Figure 35; compare tree 312 in Figure 26 with phylogeny tree 312 in Figure 35.
Referring now to Figure 36, in addition to removing certain data points, a user may wish to further edit tree 312 by removing certain samples (e.g., sample blocks 318). Again, the user selects a certain node 316, and a set of pipe charts 350 is automatically generated that correspond to sample blocks 318 to the right of the selected node 316. Referring now to Figure 37, the user then selects the -/+ toggle button on each pipe chart 350 to be removed and clicks PB Delete Samples. In so doing, the corresponding sample blocks 318 are removed from phylogeny tree 312. Namely, phylogeny tree 312 is automatically recalculated and rendered without these sample blocks 318, as shown, for example, in Figure 38; compare phylogeny tree 312 in Figure 35 with phylogeny tree 312 in Figure 38.
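Removing variants or samples, recalculating the tree, and keeping a history for undo can be modeled together. The class below is a hypothetical sketch, not the application's implementation: it stores VAFs as `{sample: {variant: vaf}}`, snapshots the data before each edit so that a change can be undone or the original restored, and exposes the remaining data as a sample-by-variant matrix that could be fed back into a distance measure and the clustering step.

```python
class EditableTree:
    """Hypothetical model of tree editing: VAF data plus an undo history."""

    def __init__(self, vafs):
        # vafs: {sample: {variant: vaf}}; copied so the caller's data is untouched.
        self.vafs = {s: dict(v) for s, v in vafs.items()}
        self.history = []  # list of (description, snapshot-before-edit) pairs

    def _snapshot(self, description):
        self.history.append((description, {s: dict(v) for s, v in self.vafs.items()}))

    def remove_variants(self, variant_ids):
        """Drop variants from every sample, so they no longer enter the distances."""
        self._snapshot(f"removed variants {sorted(variant_ids)}")
        for sample_vafs in self.vafs.values():
            for vid in variant_ids:
                sample_vafs.pop(vid, None)

    def remove_samples(self, sample_ids):
        """Drop whole samples, so their blocks disappear from the tree."""
        self._snapshot(f"removed samples {sorted(sample_ids)}")
        for sid in sample_ids:
            self.vafs.pop(sid, None)

    def undo(self):
        if self.history:
            _, self.vafs = self.history.pop()

    def restore_original(self):
        if self.history:
            self.vafs = self.history[0][1]
            self.history.clear()

    def matrix(self):
        """Return (samples, variants, rows) for recomputing distances and the tree."""
        samples = sorted(self.vafs)
        variants = sorted({vid for v in self.vafs.values() for vid in v})
        rows = [[self.vafs[s].get(vid, 0.0) for vid in variants] for s in samples]
        return samples, variants, rows
```

Whole-data snapshots keep the sketch simple; a real implementation might store only the removed entries.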
Figures 39 and 40 illustrate the function of toggle button 328, which is used to toggle between a TABLE presentation and a BLOCKS presentation. Toggle button 328 determines what happens when the user selects a sample block 318 or a node 316. The default setting is BLOCKS, as shown in Figures 26 through 38. If set to BLOCKS, selecting a node or a sample draws sample blocks 318 for that sample or for all descendant samples of that node, arranged by level and date. Figure 39 shows toggle button 328 switched to TABLE. When toggle button 328 is set to TABLE, selecting a sample block 318 generates a table 370 of all data points for that sample, ordered by consequence; because a single sample is selected, the sample AF column is populated with VAF values from 0 to 1. Alternatively, selecting a node 316 generates a table 370 of all data points that have VAF > 0.01 in all children of the selected node 316. In one example, table 370 is rendered just below control panel 310.
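The two TABLE behaviours can be sketched as follows, assuming a tree whose leaves are sample names and whose internal nodes are hypothetical `(left, right, height)` tuples, VAFs stored as `{sample: {variant: vaf}}`, and a caller-supplied consequence severity ranking. Only the 0.01 threshold comes from the text; the function names and data layout are illustrative.

```python
from typing import Dict, List, Tuple


def descendant_samples(node) -> List[str]:
    """Collect the leaf sample names under a node of a (left, right, height) tree."""
    if isinstance(node, str):
        return [node]
    left, right, _height = node
    return descendant_samples(left) + descendant_samples(right)


def node_table(node, vafs: Dict[str, Dict[str, float]], threshold: float = 0.01) -> List[str]:
    """Variants with VAF > threshold in every descendant sample of the node."""
    per_sample = (
        {vid for vid, vaf in vafs[s].items() if vaf > threshold}
        for s in descendant_samples(node)
    )
    return sorted(set.intersection(*per_sample))


def block_table(sample: str, vafs, consequences: Dict[str, str],
                severity_rank: Dict[str, int]) -> List[Tuple[str, str, float]]:
    """All variants of one sample as (variant, consequence, VAF), most severe first."""
    rows = [(vid, consequences[vid], vaf) for vid, vaf in vafs[sample].items()]
    # Unranked consequences go last; ties keep their original order (stable sort).
    return sorted(rows, key=lambda r: severity_rank.get(r[1], len(severity_rank)))
```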
Additionally, in tree application 140, although not shown in Figures 26 through 40, scatter plots, such as scatter plots 262 of space/time application 130, can also be generated in GUI 145.
Figures 41 through 43 show examples of presenting sequencing information using scatter plots application 150 of system 100 of Figure 1. Referring now to Figure 41, GUI 155 of scatter plots application 150 includes a panel 410 for displaying a grid-like representation of samples at different levels and different time points. For example, if using the same 96 samples that are described with reference to space/time application 130 in Figures 2 through 25, the y-axis of the grid in panel 410 is the levels along the esophagus and the x-axis is the dates on which the samples were collected. Sample blocks 412, which substantially correspond to sample blocks 228 of space/time application 130, are displayed in panel 410 according to level and date. Sample blocks 412 are color coded according to the histology key shown in a panel 414. Further, a toggle button 416 is provided in panel 410 to toggle between a prevailing histology grade and a highest histology grade.
Referring now to Figure 42, when two sample blocks 412 are selected, a scatter plot 462 is automatically rendered, showing the allele frequencies of all variants in those two sample blocks 412. Scatter plot 462 is substantially the same as scatter plot 262 of space/time application 130. The variants are color coded by consequence. Scatter plot 462 can be filtered to show only variants with a certain consequence, e.g., non-synonymous coding. Additionally, and referring now to Figure 43, a table 470 is generated of all data points in the selected sample blocks 412 or in scatter plot 462. In one example, table 470 is rendered beside panel 410 and scatter plot 462. Table 470 is substantially the same as table 270 of space/time application 130.
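A minimal sketch of the data behind scatter plot 462, assuming VAFs are held per sample as `{variant: vaf}` dictionaries and that a variant absent from one sample plots at VAF 0 in that sample (an assumption). The optional `only` argument mimics filtering to a single consequence such as non-synonymous coding; the function name and consequence labels are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def scatter_points(
    vafs_a: Dict[str, float],
    vafs_b: Dict[str, float],
    consequences: Dict[str, str],
    only: Optional[str] = None,
) -> List[Tuple[float, float, str]]:
    """(x = VAF in sample A, y = VAF in sample B, consequence) for each variant."""
    points = []
    for vid in sorted(set(vafs_a) | set(vafs_b)):
        cons = consequences.get(vid, "unknown")
        if only is not None and cons != only:
            continue  # consequence filter
        points.append((vafs_a.get(vid, 0.0), vafs_b.get(vid, 0.0), cons))
    return points
```

The consequence in each tuple is what the plot would map to a color.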
Figure 44 shows an example of presenting sequencing information using matrix plot application 160 of system 100 of Figure 1. Figure 45 shows a close-up view of a portion of the matrix plot of Figure 44. GUI 165 of matrix plot application 160 includes a panel 510 that visualizes the similarity between samples by drawing a grid with the samples along the x- and y-axes and a block 512 at each intersection point. Each block 512 represents two samples and is shown in a darker color if the samples are similar and in a lighter color if the samples are divergent. Hovering over a block 512 displays its metadata and highlights the dates of the two samples. The metadata can include, for example, the distance between the two samples, the histology grades of the two samples, the levels of the two samples, and the like.
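One way to realize the darker-is-more-similar rule is a linear mapping from each pair's distance onto a grey level, scaled between the smallest and largest distance in the matrix. This is a sketch under assumptions: the grey palette, the linear scaling, and the helper names are not taken from the text, and the distance dictionary is expected to include each sample paired with itself.

```python
from typing import Dict, List, Tuple


def grey_level(distance: float, d_min: float, d_max: float) -> int:
    """0 (black) for the most similar pair, 255 (white) for the most divergent."""
    if d_max == d_min:
        return 0
    return round(255 * (distance - d_min) / (d_max - d_min))


def matrix_colors(samples: List[str], dist: Dict[Tuple[str, str], float]) -> Dict[Tuple[str, str], str]:
    """Hex color for every block 512 (one per ordered pair of samples)."""
    values = [dist[(a, b)] for a in samples for b in samples]
    lo, hi = min(values), max(values)
    colors = {}
    for a in samples:
        for b in samples:
            level = grey_level(dist[(a, b)], lo, hi)
            colors[(a, b)] = f"#{level:02x}{level:02x}{level:02x}"
    return colors
```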
A set of user controls 514 is provided in panel 510. User controls 514 include, for example, a sort by date PB, a sort by histology PB, a sort by level PB, and a distance measure PB. Using user controls 514, the grid can be sorted by date, histology, or level. The distance measures in matrix plot application 160 can be the same as those in tree application 140. Figures 44 and 45 show the matrix plot when using the binary distance measure, whereas Figure 46 shows the matrix plot when using the Pearson distance measure.
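Sorting the grid by date, histology, or level amounts to choosing a new sample order and permuting both the rows and the columns of the distance matrix with it. In this hypothetical sketch, histology is encoded as an integer grade (higher meaning more advanced), dates are ISO strings so that they sort chronologically, and the secondary tie-breaking keys are chosen arbitrarily.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Sample:
    name: str
    date: str       # ISO date string, so string order is chronological
    histology: int  # hypothetical numeric grade, higher = more advanced
    level: int


SORT_KEYS = {
    "date": lambda s: (s.date, s.level),
    "histology": lambda s: (s.histology, s.date),
    "level": lambda s: (s.level, s.date),
}


def order_samples(samples: Sequence[Sample], by: str) -> List[str]:
    """Sample names in the order selected by a sort PB."""
    return [s.name for s in sorted(samples, key=SORT_KEYS[by])]


def reorder(matrix: List[List[float]], names: List[str], order: List[str]) -> List[List[float]]:
    """Permute rows and columns of a square matrix to follow a new sample order."""
    idx = [names.index(n) for n in order]
    return [[matrix[i][j] for j in idx] for i in idx]
```

Permuting rows and columns together keeps each block at the intersection of its own two samples.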
Selecting a block 512 generates a scatter plot of variant allele frequencies for the two samples. Selecting a block, in some embodiments, can include tapping on a visual representation of the block on a mobile device screen. For example, Figure 47 shows a scatter plot 562 that can be generated. Scatter plot 562 is substantially the same as scatter plot 262 of space/time application 130.
The present disclosure may be embodied in other specific forms without departing from its spirit or essential characteristics. The described embodiments are to be considered in all respects only as illustrative and not restrictive. The scope of the disclosure is, therefore, indicated by the appended claims rather than by the foregoing description. All changes which come within the meaning and range of equivalency of the claims are to be embraced within their scope.
What is claimed is:


1. A method for presenting sequence information, the method comprising:
(a) obtaining a first sequence and a second sequence;
(b) determining a similarity between the first sequence and the second sequence, wherein the similarity is based upon distance between the first sequence and the second sequence; and
(c) displaying a block at an intersection point on a matrix plot based on the similarity between the first sequence and the second sequence.
2. The method of claim 1, wherein the distance information is calculated using binary distance measure.
3. The method of claim 1, wherein the distance information is calculated using Pearson distance measure.
4. The method of claim 1, further comprising displaying metadata of the data point upon hovering over the data point.
5. The method of claim 4, wherein metadata includes at least one from the group comprising: the date the first sequence was taken, the date the second sequence was taken, the distance between the first sequence and the second sequence, the histology grades of the first sequence and the second sequence samples, and the levels of the first sequence and the second sequence.
6. The method as recited in claim 1, wherein the block is a darker color if the first sequence and the second sequence are similar.
7. The method as recited in claim 1, wherein the block is a lighter color if the first sequence and the second sequence are dissimilar.
8. The method as recited in claim 1, further comprising generating a scatter plot of variant allele frequencies for the first sequence and the second sequence upon selecting the block.
9. The method as recited in claim 8, wherein selecting the block includes tapping on a visual representation of the block displayed on a mobile device screen.
10. A method for presenting sequence information, the method comprising:
(a) displaying a visual representation of an anatomical feature of an organism;
(b) sectoring the visual representation of the anatomical feature of the organism;
(c) obtaining a first sequence at a first sector and a second sequence at a second sector from a sample at the first sector and a sample at the second sector;
(d) determining variants based on the first sequence and the second sequence; and
(e) displaying variant information for the first sequence at the first sector and for the second sequence at the second sector.
11. The method as recited in claim 10, wherein the anatomical feature includes an esophagus.
12. The method as recited in claim 10, further comprising, upon hovering over the first sector, loading at least one from the group comprising: sample metadata, variant allele frequency, and TruSeq Custom Amplicon metadata.
13. The method as recited in claim 12, wherein sample metadata includes at least one from the group comprising: date of sampling, histology report, average depth, and DNA concentration.
14. The method as recited in claim 13, wherein TruSeq Custom Amplicon metadata includes at least one from the group comprising: base change and annotation, gene and consequence.
15. The method as recited in claim 13, further comprising assigning a depth of 0 if the average depth was less than 20.
PCT/GB2015/051880 2014-06-27 2015-06-26 Methods, applications and systems for processing and presenting gene sequencing information WO2015198074A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201462018515P 2014-06-27 2014-06-27
US62/018,515 2014-06-27

Publications (1)

Publication Number Publication Date
WO2015198074A1 true WO2015198074A1 (en) 2015-12-30

Family

ID=53719789

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/GB2015/051880 WO2015198074A1 (en) 2014-06-27 2015-06-26 Methods, applications and systems for processing and presenting gene sequencing information

Country Status (1)

Country Link
WO (1) WO2015198074A1 (en)

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15741584

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15741584

Country of ref document: EP

Kind code of ref document: A1