US10832797B2 - Method and system for quantifying sequence alignment - Google Patents
- Publication number
- US10832797B2 (application US14/517,419)
- Authority
- US
- United States
- Prior art keywords
- node
- sequence
- sequence read
- data structure
- alignment
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active, expires
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/10—Sequence alignment; Homology search
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B45/00—ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2537/00—Reactions characterised by the reaction format or use of a specific feature
- C12Q2537/10—Reactions characterised by the reaction format or use of a specific feature the purpose or use of
- C12Q2537/165—Mathematical modelling, e.g. logarithm, ratio
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/20—Sequence assembly
Definitions
- the invention relates to methods and systems for aligning sequences (e.g., nucleic acid sequences, amino acid sequences) to each other to produce a continuous sequence read corresponding to a sample (e.g., genetic sample, protein sample), and methods and systems for evaluating the resulting alignment.
- NGS sequencing uses massive parallelization on smaller nucleic acid sequences that together make up a larger body of genetic information, e.g., a chromosome or a genome.
- the nucleic acids e.g., DNA
- the nucleic acids are broken up, amplified, and read with extreme speed.
- sequence alignment methods use massive computing power to align overlapping reads to a reference to produce a sequence that can be probed for important genetic or structural information (e.g., biomarkers for disease).
- the goal of sequence alignment is to combine the set of nucleic acid reads produced by the sequencer to achieve a longer read (i.e., a contig) or even the entire genome of the subject based upon a genetic sample from that subject. Because the sequence data from next generation sequencers often comprises millions of shorter sequences that together represent the totality of the target sequence, aligning the reads is complex and computationally expensive.
- each portion of the probed sequence is sequenced multiple times (e.g., 2 to 100 times, or more) to minimize the influence of any random sequencing errors on the final alignments and output sequences generated.
- the reads are aligned against a single reference sequence, e.g., GRCh37, in order to determine all (or some part of) the subject's sequence.
- sequence alignment is constructed by aggregating pairwise alignments between two linear strings of sequence information.
- two strings S1 (SEQ ID NO. 17: AGCTACGTACACTACC) and S2 (SEQ ID NO. 18: AGCTATCGTACTAGC) can be aligned against each other.
- S1 typically corresponds to a read and S2 corresponds to a portion of the reference sequence.
- the differences between S1 and S2 can consist of substitutions, deletions, and insertions.
- a substitution occurs when a letter or sequence in S2 is replaced by a different letter or sequence of the same length in S1
- a deletion occurs when a letter or sequence in S2 is “skipped” in the corresponding section of S1
- an insertion occurs when a letter or sequence occurs in S1 between two positions that are adjacent in S2.
- the two sequences S1 and S2 can be aligned as below. The alignment below represents thirteen matches, a deletion of length one, an insertion of length two, and one substitution:
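- The counts stated above can be checked with a short sketch. The gapped layout used here is an assumption: it is one alignment of S1 and S2 that is consistent with the stated counts, not a reproduction of the original figure, and the function name `classify_columns` is hypothetical.

```python
from collections import Counter

# Assumed gapped alignment of S1 (read) and S2 (reference); "-" marks a gap.
S1_ALIGNED = "AGCTA-CGTACACTACC"
S2_ALIGNED = "AGCTATCGTAC--TAGC"


def classify_columns(read, ref):
    """Count matches, substitutions, and gap columns in a pairwise alignment."""
    if len(read) != len(ref):
        raise ValueError("aligned strings must have equal length")
    counts = Counter()
    for r, s in zip(read, ref):
        if r == "-":
            counts["deleted_bases"] += 1    # base of S2 skipped in S1
        elif s == "-":
            counts["inserted_bases"] += 1   # base of S1 with no partner in S2
        elif r == s:
            counts["matches"] += 1
        else:
            counts["substitutions"] += 1
    return counts


counts = classify_columns(S1_ALIGNED, S2_ALIGNED)
print(dict(counts))
```

- Removing the gap characters from the two aligned strings recovers S1 and S2 exactly, which is a quick sanity check on any proposed alignment.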
- the Smith-Waterman (SW) algorithm aligns linear sequences by rewarding overlap between bases in the sequences, and penalizing gaps between the sequences.
- Smith-Waterman also differs from Needleman-Wunsch, in that SW does not require the shorter sequence to span the string of letters describing the longer sequence. That is, SW does not assume that one sequence is a read of the entirety of the other sequence.
- because SW is not obligated to find an alignment that stretches across the entire length of the strings, a local alignment can begin and end anywhere within the two sequences.
- $H_{ij} = \max\{H_{i-1,j-1} + s(a_i, b_j),\ H_{i-1,j} - W_{in},\ H_{i,j-1} - W_{del},\ 0\}$ (for $1 \le i \le n$ and $1 \le j \le m$) (1)
- the resulting matrix has many elements that are zero. This representation makes it easier to backtrace from high-to-low, right-to-left in the matrix, thus identifying the alignment.
- the SW algorithm performs a backtrack to determine the alignment.
- the algorithm will backtrack based on which of the three values (H i-1,j-1 , H i-1,j ; or H i,j-1 ) was used to compute the final maximum value for each cell.
- the backtracking stops when a zero is reached. See, e.g., FIG. 3 part (B), which does not represent the prior art, but illustrates the concept of a backtrack, and the corresponding local alignment when the backtrack is read.
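- The recurrence and the backtrack described above can be sketched as follows. This is a minimal illustration with linear gap costs, not the patented method; the function name `smith_waterman` and the scoring values (match 2, mismatch -1, gap costs of 2) are assumptions chosen only for the example.

```python
def smith_waterman(a, b, match=2, mismatch=-1, w_in=2, w_del=2):
    """Local alignment of strings a and b with linear gap costs (illustrative values)."""
    n, m = len(a), len(b)
    H = [[0] * (m + 1) for _ in range(n + 1)]  # row 0 and column 0 stay zero
    best, best_cell = 0, (0, 0)

    def s(i, j):
        # Substitution score for a_i versus b_j (1-based indices).
        return match if a[i - 1] == b[j - 1] else mismatch

    # Fill the matrix with the four-way maximum of equation (1).
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            H[i][j] = max(
                H[i - 1][j - 1] + s(i, j),
                H[i - 1][j] - w_in,
                H[i][j - 1] - w_del,
                0,
            )
            if H[i][j] > best:
                best, best_cell = H[i][j], (i, j)

    # Backtrack from the highest-scoring cell until a zero is reached.
    i, j = best_cell
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        if H[i][j] == H[i - 1][j - 1] + s(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] - w_in:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


score, aligned_a, aligned_b = smith_waterman("ACGT", "TTACGTTT")
print(score, aligned_a, aligned_b)
```

- Because the zero floor resets poorly scoring prefixes, the backtrack recovers only the locally matching region rather than an end-to-end alignment.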
- the “best alignment,” as determined by the algorithm, may contain more than the minimum possible number of insertions and deletions, but will contain far fewer than the maximum possible number of substitutions.
- when applied as SW or SW-Gotoh, the techniques use a dynamic programming algorithm to perform local sequence alignment of the two strings, S and A, of sizes m and n, respectively.
- This dynamic programming technique employs tables or matrices to preserve match scores and avoid recomputation for successive cells.
- the optimum alignment can be represented as B[j,k] in equation (2) below:
- $B[j,k] = \max(p[j,k],\ i[j,k],\ d[j,k],\ 0)$ (for $0 < j \le m$, $0 < k \le n$) (2)
- the arguments of the maximum function, B[j,k] are outlined in equations (3)-(5) below, wherein MISMATCH_PENALTY, MATCH_BONUS, INSERTION_PENALTY, DELETION_PENALTY, and OPENING_PENALTY are all constants, and all negative except for MATCH_BONUS.
- the match argument, p[j,k] is given by equation (3), below:
- the scoring parameters are somewhat arbitrary, and can be adjusted to achieve the desired behavior of the computations.
- One example of the scoring parameter settings (Huang, Chapter 3 : Bio - Sequence Comparison and Alignment , ser. Curr Top Comp Mol Biol . Cambridge, Mass.: The MIT Press, 2002) for DNA would be:
- MISMATCH_PENALTY: -20
- the relationship between the gap penalties (INSERTION_PENALTY, OPENING_PENALTY) above helps limit the number of gap openings, i.e., favors grouping gaps together, by setting the gap insertion penalty higher than the gap opening cost.
- other relationships between MISMATCH_PENALTY, MATCH_BONUS, INSERTION_PENALTY, OPENING_PENALTY and DELETION_PENALTY are possible.
- the aligned sequences can be assembled to produce a sequence that can be compared to a reference (i.e., a genetic standard) to identify variants.
- the variants can provide insight regarding diseases, stages of disease, recurrence and the like.
- the assembled amino acid sequences can be compared to a standard to determine evolutionary information about the protein, or functional information about the protein. This standard method of disease comparison is time consuming, however, because many of the variants are not necessarily correlated with a disease. For example, when the genetic standard is from a population having an ancestry different from the sample, many of the called variants are due to differences in things like hair color, skin color, etc.
- the invention provides algorithms and methods for their implementation that transform linear, local sequence alignment processes such as, for example, Smith-Waterman-Gotoh, into multi-dimensional alignment algorithms that provide increased parallelization, increased speed, increased accuracy, and the ability to align reads through an entire genome.
- Algorithms of the invention provide for a “look-back” type analysis of sequence information (as in Smith-Waterman), however, in contrast to known linear methods, the look back of the invention is conducted through a multi-dimensional space that includes multiple pathways and multiple nodes in order to provide more accurate alignment of complex and lengthy sequence reads, while achieving lower overall rates of mismatches, deletions, and insertions.
- the invention is implemented by aligning sequence reads to a series of directed, acyclic sequences spanning branch points that account for all, or nearly-all, of the possible sequence variation in the alignment, including insertions, deletions, substitutions, and structural variants.
- Such reference sequence constructs, often represented as directed acyclic graphs (DAGs), can be easily assembled from available sequence databases, including “accepted” reference sequences and variant call format (VCF) entries.
- the quality of an alignment using these methods can be quickly assessed by monitoring the number of overlapping bases or amino acids between a sequence read and a reference sequence construct, thereby allowing certain alignments to be quickly discarded.
- the number of overlapping bases or amino acids can be used to determine the confidence of a genotype of a sample or a disease diagnosis.
- the invention also provides methods and systems for efficiently genotyping sequence reads by aligning the reads directly to a reference sequence construct that simultaneously accounts for multiple alleles at multiple loci in the genome of the organism. Furthermore, the methods and systems of the invention make it possible to deal with structural variations in an efficient way, greatly reducing the computing power necessary to genotype genetic samples using next generation sequencing (N.G.S.). Additionally, because the reference sequence construct accounts for the various possible alleles within the construct, it is possible to directly genotype the sample by merely aligning the reads of the sample to the construct. Particular patterns of alignment are only possible for particular genotypes, thus it is not necessary to compare the assembled sequence to a reference sequence and then compare the variations to mutation files associated with that reference.
- the invention additionally provides methods to make specific base calls at specific loci using a reference sequence construct, e.g., a DAG that represents known variants at each locus of the genome. Because the sequence reads are aligned to the DAG during alignment, the subsequent step of comparing a mutation, vis-à-vis the reference genome, to a table of known mutations can be eliminated. Using the disclosed methods, it is merely a matter of identifying a nucleic acid read as being located at a known mutation represented on the DAG and calling that mutation. Alternatively, when a mutation is not known (i.e., not represented in the reference sequence construct), an alignment will be found and the variant identified as a new mutation.
- the method also makes it possible to associate additional information, such as specific disease risk or disease progression, with known mutations that are incorporated into the reference sequence construct. Furthermore, in addition to having the potential to find all genetically relevant results during alignment, the disclosed methods reduce the computational resources required to make the alignments while allowing for simultaneous comparison to multiple reference sequences.
- the invention additionally includes methods for constructing a directed acyclic graph data structure (DAG) that represents known variants at positions within the sequence of an organism.
- the DAG may include multiple sequences at thousands of positions, and may include multiple variants at each position, including deletions, insertions, translations, inversions, and single-nucleotide polymorphisms (SNPs).
- the variants will be scored, weighted, or correlated with other variants to reflect the prevalence of that variant as a marker for disease.
- a system comprises a distributed network of processors and storage capable of comparing a plurality of sequences (i.e., nucleic acid sequences, amino acid sequences) to a reference sequence construct (e.g., a DAG) representing observed variation in a genome or a region of a genome.
- the system is additionally capable of aligning the nucleic acid reads to produce a continuous sequence using an efficient alignment algorithm. Because the reference sequence construct compresses a great deal of redundant information, and because the alignment algorithm is so efficient, the reads can be tagged and assembled on an entire genome using commercially-available resources.
- the system comprises a plurality of processors that simultaneously execute a plurality of comparisons between a plurality of reads and the reference sequence construct.
- the comparison data may be accumulated and provided to a health care provider. Because the comparisons are computationally tractable, analyzing sequence reads will no longer represent a bottleneck between NGS sequencing and a meaningful discussion of a patient's genetic risks.
- FIGS. 1 (A) and (B) depict the construction of a directed acyclic graph (DAG) representing genetic variation in a reference sequence.
- FIG. 1(A) shows the starting reference sequence and the addition of a deletion.
- FIG. 1(B) shows the addition of an insertion and a SNP, thus arriving at the Final DAG used for alignment;
- FIG. 2 depicts three variant call format (VCF) entries represented as directed acyclic graphs
- FIG. 3 shows a pictorial representation of aligning a nucleic acid sequence read against a construct that accounts for an insertion event as well as the reference sequence.
- FIG. 3 also shows the matrices and the backtrack used to identify the proper location of the nucleic acid sequence read “ATCGAA”;
- FIG. 4 depicts two sequences differing by a 15 base pair insertion.
- a reference sequence construct can be constructed to account for the insertion by creating two separate pathways, one of which incorporates the insertion;
- FIG. 5 demonstrates how the alignment of a sequence read to the reference sequence construct may be evaluated by determining the smallest number of overlapping bases (p) between the sequence read and a portion of the reference sequence construct;
- FIG. 6 depicts an associative computing model for parallel processing
- FIG. 7 depicts an architecture for parallel computation.
- the invention includes methods for aligning sequences (e.g., nucleic acid sequences, amino acid sequences) to a reference sequence construct, methods for building the reference sequence construct, and systems that use the alignment methods and constructs to produce alignments and assemblies.
- the reference sequence construct may be a directed acyclic graph (DAG), as described below, however the reference sequence can be any representation reflecting genetic variability in the sequences of different organisms within a species, provided the construct is formatted for alignment. The genetic variability may also be between different tissues or cells within an organism.
- the reference sequence construct will comprise portions that are identical and portions that vary between sampled sequences.
- the constructs can be thought of as having positions (i.e., according to some canonical ordering) that comprise the same sequence(s) and some positions that comprise alternative sequences, reflecting genetic variability.
- the application additionally discloses methods for identifying a disease or a genotype based upon alignment of a nucleic acid read to a location in the construct. The methods are broadly applicable to the fields of genetic sequencing and mutation screening.
- the invention additionally details methods for assessing the quality of the alignment between one or more sequence reads and the reference sequence construct.
- the number of amino acids or nucleic acids overlapping between the sequence read and the reference sequence construct are used as a metric. For example, the smallest number of overlaps between the sequence read and portions of the reference sequence construct can be assessed, and the lowest number compared to a threshold, below which the alignment will be rejected. Assessing the alignment quality makes it easier to evaluate the likelihood that secondary information gleaned from the alignment, such as genotype, or disease status, is correct. Alternative methods, such as ranking the highest number of overlaps are also possible, and are intended to be covered by the invention.
- the invention uses a construct that can account for the variability in genetic sequences within a species, population, or even among different cells in a single organism.
- Representations of the genetic variation can be presented as directed acyclic graphs (DAGs) (discussed above), row-column alignment matrices, or deBruijn graphs, and these constructs can be used with the alignment methods of the invention provided that the parameters of the alignment algorithms are set properly (discussed below).
- the construct is a directed acyclic graph (DAG), i.e., having a direction and having no cyclic paths. (That is, a sequence path cannot travel through a position on the reference construct more than once.)
- genetic variation in a sequence is represented as alternate nodes.
- the nodes can be a section of conserved sequence, or a gene, or simply a nucleic acid.
- the different possible paths through the construct represent known genetic variation.
- a DAG may be constructed for an entire genome of an organism, or the DAG may be constructed only for a portion of the genome, e.g., a chromosome, or smaller segment of genetic information.
- the DAG represents greater than 1000 nucleic acids, e.g., greater than 10,000 nucleic acids, e.g., greater than 100,000 nucleic acids, e.g., greater than 1,000,000 nucleic acids.
- a DAG may represent a species (e.g., Homo Sapiens ) or a selected population (e.g., women having breast cancer), or even smaller subpopulations, such as genetic variation among different tumor cells in the same individual.
- the methods and systems of the invention can be used to align amino acid reads to an amino acid reference sequence construct.
- the amino acid sequence reads may be obtained through mass spectrometry or by using Edman degradation and the reference sequence construct, e.g., the DAG, may represent a protein.
- a “sequence read” is intended to encompass any ordered listing of nucleic acids or amino acids.
- the amino acids and nucleic acids in the sequences are typically “natural” amino acids and nucleic acids, however, non-natural amino acids and nucleic acids can be easily included with the methods of the invention by using other symbols to represent the non-natural amino acids or nucleic acids.
- A simple example of DAG construction is shown in FIGS. 1(A) and 1(B).
- the DAG begins with a reference sequence, shown in FIG. 1(A) as SEQ ID NO. 1: CATAGTACCTAGGTCTTGGAGCTAGTC.
- the reference sequence is often much longer, and may be an entire genome.
- the sequence is typically stored as a FASTA or FASTQ file. (FASTQ has become the default format for sequence data produced from next generation sequencers).
- the reference sequence may be a standard reference, such as GRCh37.
- each letter (or symbol) in the sequence actually corresponds to a nucleotide (e.g., a deoxyribonucleotide or a ribonucleotide) or an amino acid (e.g., histidine, leucine, lysine, etc.).
- a variant is added to the reference sequence, as shown in the bottom image of FIG. 1(A) .
- the variant is the deletion of the sequence “AG” from the reference between the lines in the figure, i.e., SEQ ID NO. 2.
- this deletion is represented by breaking the reference sequence into nodes before and after the deletion, and inserting two strings between the nodes.
- One path between the nodes represents the reference sequence, while the other path represents the deletion.
- the variants are called to the DAG by applying the entries in a variant call format (VCF) file, such as can be found at the 1000 Genomes Project website.
- because each VCF file is keyed to a specific reference genome, it is not difficult to identify where the strings should be located.
- each entry in a VCF file can be thought of as combining with the reference to create a separate graph, as displayed in FIG. 2.
- a second VCF entry corresponding to an insertion “GG” at a specific position is added to produce an expanded DAG, i.e., including SEQ ID NO. 3 and SEQ ID NO. 4.
- a third VCF entry can be added to the expanded DAG to account for a SNP earlier in the reference sequence, i.e., including SEQ ID NOS. 5-8.
- a DAG has been created against which nucleic acid reads can be aligned (as discussed below.)
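- The idea of combining one variant entry with the reference to create alternate paths can be sketched as below. This is a simplified illustration, not a VCF parser: the variant is passed as a plain (position, reference allele, alternate allele) triple, the position is 0-based, and the location chosen for the “AG” deletion (index 10) is a hypothetical choice, since the text does not state where the deletion falls. The helper names are also hypothetical.

```python
REFERENCE = "CATAGTACCTAGGTCTTGGAGCTAGTC"  # SEQ ID NO. 1 from the text


def branch_at_variant(reference, pos, ref_allele, alt_allele):
    """Split the reference around one variant (pos is 0-based, an assumption)."""
    end = pos + len(ref_allele)
    if reference[pos:end] != ref_allele:
        raise ValueError("ref allele does not match the reference at pos")
    return {
        "before": reference[:pos],            # node upstream of the variant
        "alleles": (ref_allele, alt_allele),  # two alternative paths
        "after": reference[end:],             # node downstream of the variant
    }


def spell_paths(branch):
    """Spell out the sequence of every path through the branch point."""
    return [branch["before"] + allele + branch["after"] for allele in branch["alleles"]]


# Hypothetical deletion of "AG" at 0-based index 10.
deletion = branch_at_variant(REFERENCE, 10, "AG", "")
reference_path, deletion_path = spell_paths(deletion)
print(reference_path)
print(deletion_path)
```

- One path reproduces the reference and the other omits the deleted bases, which mirrors the two paths between the nodes described for FIG. 1(A).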
- the DAGs are represented in computer memory (hard disk, FLASH, cloud memory, etc.) as a set of nodes, S, wherein each node is defined by a string, a set of parent nodes, and a position.
- the string is the node's “content,” i.e., sequence; the parent nodes define the node's position with respect to the other nodes in the graph; and the position of the node is relative to some canonical ordering in the system, e.g., the reference genome. While it is not strictly necessary to define the graph with respect to a reference sequence, it does make manipulation of the output data simpler. Of course, a further constraint on S is that it cannot include loops.
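- A node defined by a string, a set of parent nodes, and a position might be represented as in the sketch below. The class and function names are hypothetical, the example sequence is invented, and two simplifications are assumed: parents are stored as indices into the node list, and a deletion is modeled as a direct edge that skips a node. Requiring every parent index to be smaller than the child's index is one simple way to guarantee that the set contains no loops.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    sequence: str                                # the node's "content"
    position: int                                # offset in the canonical ordering
    parents: list = field(default_factory=list)  # indices of parent nodes


def is_loop_free(nodes):
    """True if every parent precedes its child in the list (so no loops exist)."""
    return all(p < i for i, node in enumerate(nodes) for p in node.parents)


def spell_paths(nodes):
    """Return the sequence spelled by every source-to-sink path, sorted."""
    has_child = {p for node in nodes for p in node.parents}
    sinks = [i for i in range(len(nodes)) if i not in has_child]

    def upstream(i):
        node = nodes[i]
        if not node.parents:
            return [node.sequence]
        return [prefix + node.sequence for p in node.parents for prefix in upstream(p)]

    return sorted(seq for i in sinks for seq in upstream(i))


# Hypothetical graph: reference "CATAGTACC" plus a deletion of "AG".
nodes = [
    Node("CAT", 0),
    Node("AG", 3, parents=[0]),
    Node("TACC", 5, parents=[0, 1]),  # reachable with or without "AG"
]
print(is_loop_free(nodes), spell_paths(nodes))
```

- Storing the position relative to the reference keeps output coordinates easy to interpret, which is the convenience the text describes.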
- the nodes comprise a plurality of characters, as shown in FIGS. 1(A) and 1(B) , however it is possible that a node may be a single character, e.g., representing a single base, as shown in FIG. 2 .
- when a node represents a string of characters, all of the characters in the node can be aligned with a single comparison step, rather than character-by-character calculations, as is done with conventional Smith-Waterman techniques.
- the computational burden is greatly reduced as compared to state-of-the-art methods. The reduced computational burden allows the alignment to be completed more quickly, and with fewer resources.
- it is possible to construct DAGs that incorporate thousands of VCF entries representing the known variation in genetic sequences for a given region of a reference. Nonetheless, as a DAG becomes bulkier, the computations do take longer, and for many applications a smaller DAG is used that may only represent a portion of the sequence, e.g., a chromosome.
- a DAG may be made smaller by reducing the size of the population that is covered by the DAG, for instance going from a DAG representing variation in breast cancer to a DAG representing variation in triple negative breast cancer.
- longer DAGs can be used that are customized based upon easily identified genetic markers that will typically result in a large portion of the DAG being consistent between samples.
- aligning a set of nucleic acid reads from an African-ancestry female will be quicker against a DAG created with VCF entries from women of African ancestry as compared to a DAG accounting for all variations known in humans over the same sequence.
- the DAGs of the invention are dynamic constructs in that they can be modified over time to incorporate newly identified mutations. Additionally, algorithms in which the alignment results are recursively added to the DAG are also possible.
- the gap penalties can be adjusted to make gap insertions even more costly, thus favoring an alignment to a sequence rather than opening a new gap in the overall sequence.
- with the improvements in the DAG discussed above, the incidence of gaps should decrease even further because mutations are accounted for in the DAG.
- an algorithm is used to align sequence reads against a directed acyclic graph (DAG).
- the alignment algorithm identifies the maximum value for C i,j by identifying the maximum score with respect to each sequence contained at a position on the DAG (e.g., the reference sequence construct). In fact, by looking “backwards” at the preceding positions, it is possible to identify the optimum alignment across a plurality of possible paths.
- the algorithm of the invention is carried out on a read (a.k.a. “string”) and a directed acyclic graph (DAG), discussed above.
- S denotes the read (string) being aligned, and D denotes the directed acyclic graph to which S is aligned.
- each letter of the sequence of a node will be represented as a separate element, d.
- a predecessor of d is defined as:
- the algorithm seeks the value of M[j,d], the score of the optimal alignment of the first j elements of S with the portion of the DAG preceding (and including) d. This step is similar to finding H i,j in equation 1 in the Background section. Specifically, determining M[j,d] involves finding the maximum of a, i, e, and 0, as defined below:
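- $M[j,d] = \max\{a, i, e, 0\}$
- $e = \max\{M[j,p^*] + \mathrm{DELETE\_PENALTY}\}$ for $p^*$ in $P[d]$
- $i = M[j-1,d] + \mathrm{INSERT\_PENALTY}$
- $a = \max\{M[j-1,p^*] + \mathrm{MATCH\_SCORE}\}$ for $p^*$ in $P[d]$, if $S[j] = d$; $a = \max\{M[j-1,p^*] + \mathrm{MISMATCH\_PENALTY}\}$ for $p^*$ in $P[d]$, if $S[j] \neq d$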
- e is the highest of the alignments of the first j characters of S with the portions of the DAG up to, but not including, d, plus an additional DELETE_PENALTY. Accordingly, if d is not the first letter of the sequence of the node, then there is only one predecessor, p, and the alignment score of the first j characters of S with the DAG (up-to-and-including p) is equivalent to M[j,p]+DELETE_PENALTY.
- i is the alignment of the first j − 1 characters of the string S with the DAG up-to-and-including d, plus an INSERT_PENALTY, which is similar to the definition of the insertion argument in SW (see equation 1).
- a is the highest of the alignments of the first j − 1 characters of S with the portions of the DAG up to, but not including, d, plus either a MATCH_SCORE (if the jth character of S is the same as the character d) or a MISMATCH_PENALTY (if the jth character of S is not the same as the character d).
- when d has a single predecessor p, a is the alignment score of the first j − 1 characters of S with the DAG (up-to-and-including p), i.e., M[j − 1,p], with either a MISMATCH_PENALTY or MATCH_SCORE added, depending upon whether d and the jth character of S match.
- if d is the first letter of the sequence of its node, there can be multiple possible predecessors.
- in that case, maximizing {M[j − 1, p*] + MISMATCH_PENALTY or MATCH_SCORE} is the same as choosing the predecessor with the highest alignment score with the first j − 1 characters of S (i.e., the highest of the candidate M[j − 1,p*] arguments) and adding either a MISMATCH_PENALTY or a MATCH_SCORE depending on whether d and the jth character of S match.
- the penalties e.g., DELETE_PENALTY, INSERT_PENALTY, MATCH_SCORE and MISMATCH_PENALTY, can be adjusted to encourage alignment with fewer gaps, etc.
- the algorithm finds the maximum value for each read by calculating not only the insertion, deletion, and match scores for that element, but looking backward (against the direction of the DAG) to any prior nodes on the DAG to find a maximum score.
- the algorithm is able to traverse the different paths through the DAG, which contain the known mutations. Because the graphs are directed, the backtracks, which move against the direction of the graph, follow the preferred variant sequence toward the origin of the graph, and the maximum alignment score identifies the most likely alignment within a high degree of certainty. While the equations above are represented as “maximum” values, “maximum” is intended to cover any form of optimization, including, for example, switching the signs on all of the equations and solving for a minimum value.
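- The look-back over predecessors can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `align_to_dag`, the graph encoding (a topologically ordered list of non-empty `(sequence, parent_indices)` pairs), the example graph, and the penalty values are all hypothetical. A predecessor of a letter is taken to be the previous letter in the same node or, for the first letter of a node, the last letter of each parent node.

```python
def align_to_dag(read, nodes, MATCH_SCORE=1, MISMATCH_PENALTY=-1,
                 INSERT_PENALTY=-2, DELETE_PENALTY=-2):
    """Score a read against a DAG; returns (best score, reference letters used)."""
    # Flatten nodes into letters d with predecessor lists P[d].
    chars, P, last = [], [], []
    for seq, parents in nodes:
        for k, ch in enumerate(seq):
            P.append([last[p] for p in parents] if k == 0 else [len(chars) - 1])
            chars.append(ch)
        last.append(len(chars) - 1)

    n = len(read)
    M = [[0] * len(chars) for _ in range(n + 1)]  # M[j][d]; row 0 is the empty prefix
    best, best_j, best_d = 0, 0, None
    for d, ch in enumerate(chars):                # letters in topological order
        for j in range(1, n + 1):
            s = MATCH_SCORE if read[j - 1] == ch else MISMATCH_PENALTY
            a = max((M[j - 1][p] for p in P[d]), default=0) + s
            e = max((M[j][p] for p in P[d]), default=0) + DELETE_PENALTY
            i = M[j - 1][d] + INSERT_PENALTY
            M[j][d] = max(a, i, e, 0)
            if M[j][d] > best:
                best, best_j, best_d = M[j][d], j, d

    # Backtrack against the direction of the graph until a zero is reached.
    path, j, d = [], best_j, best_d
    while d is not None and j > 0 and M[j][d] > 0:
        s = MATCH_SCORE if read[j - 1] == chars[d] else MISMATCH_PENALTY
        p_diag = max(P[d], key=lambda p: M[j - 1][p], default=None)
        diag = M[j - 1][p_diag] if p_diag is not None else 0
        if M[j][d] == diag + s:                   # match or mismatch step
            path.append(chars[d])
            j, d = j - 1, p_diag
        elif M[j][d] == M[j - 1][d] + INSERT_PENALTY:
            j -= 1                                # read letter absent from the graph
        else:                                     # graph letter skipped by the read
            path.append(chars[d])
            d = max(P[d], key=lambda p: M[j][p])
    return best, "".join(reversed(path))


# Hypothetical graph: "TTGGA" then an optional insertion "TCG", then "AATT".
nodes = [("TTGGA", []), ("TCG", [0]), ("AATT", [0, 1])]
score, path = align_to_dag("ATCGAA", nodes)
print(score, path)
```

- In this example the read aligns without gaps only along the path that contains the insertion, so the branching graph yields a higher score than the same read would get against the insertion-free sequence alone.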
- FIG. 3 shows a pictorial representation of the read being compared to the DAG while FIG. 3 part (B) shows the actual matrices that correspond to the comparison.
- the algorithm of the invention identifies the highest score and performs a backtrack to identify the proper location of the read.
- FIGS. 4 and 5 exemplify the methods for evaluating the quality of an alignment between a sequence read and a reference sequence construct.
- two reference sequences i.e., #1 and #2, can be assembled into a reference sequence construct, whereby alternative paths through the construct account for a 15 bp insertion, as shown in FIG. 4 .
- the reference sequence construct comprises four portions: a first portion corresponding to the conserved region CCCAGAACGTTG; a first alternative portion corresponding to the insertion, i.e., CTATGCAACAAGGGA; a second alternative portion corresponding to the horizontal arrow running between G and C; and a second portion corresponding to the second conserved region CATCGTAGACGAGTTTCAGCATT.
- Read #1, #2, and #3 can be aligned to the reference sequence construct of FIG. 4 , as shown in FIG. 5 .
- the overlap between the sequence read and each portion of the construct can be evaluated to determine the quality of the alignment.
- Read #1 (SEQ ID NO. 14) aligns completely to the insert portion, i.e., the first alternative portion. Because there are 10 nucleic acids in Read #1, and all of the nucleic acids align to the same portion, the smallest overlap value, p, is 10.
- Read #2 (SEQ ID NO. 15) aligns directly to reference sequence #1, and thus aligns to the first portion and the second portion of the reference sequence construct.
- The utility of the smallest overlap is illustrated in FIG. 5. Focusing on Read #3, it is evident that, among Reads #1, #2, and #3, Read #3 has the greatest likelihood of mis-alignment. In fact, as shown in FIG. 5, Read #3 is only a single base away from matching with reference #1 instead of reference #2. In some instances, it may be beneficial to discount such reads for fear that, through either amplification or sequencing error, the read was modified and the alignment is incorrect. Thus, by setting a threshold, e.g., of three overlaps, and comparing the smallest overlap number to this threshold, alignments such as Read #3 can be flagged, discounted, or removed from the process. Such quality control will minimize the likelihood that a sequence is called for the wrong alignment, which may, for example, lead to an incorrect genotype or even the mis-diagnosis of a disease.
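- The threshold comparison can be sketched as follows. This is an illustrative sketch only: the function names, the dictionary layout (portion name mapped to the number of read bases aligned to it), and the choice to flag when the smallest overlap is strictly below the threshold are assumptions, and `edge_read` is a made-up read rather than Read #3 of FIG. 5.

```python
from typing import Dict


def smallest_overlap(overlaps: Dict[str, int]) -> int:
    """Smallest number of read bases aligned to any portion the read touches."""
    touched = [n for n in overlaps.values() if n > 0]
    if not touched:
        raise ValueError("read does not overlap any portion")
    return min(touched)


def flag_alignment(overlaps: Dict[str, int], threshold: int = 3) -> bool:
    """Return True when the smallest overlap falls below the threshold.

    Assumption: 'below' means strictly less than; the text only says the
    smallest overlap is compared to the threshold.
    """
    return smallest_overlap(overlaps) < threshold


# Read #1 in the text: all 10 bases fall within the insertion portion.
read_1 = {"insertion": 10}
# Hypothetical read that reaches only one base into a second portion.
edge_read = {"conserved_1": 9, "insertion": 1}

if __name__ == "__main__":
    print(smallest_overlap(read_1), flag_alignment(read_1))
    print(smallest_overlap(edge_read), flag_alignment(edge_read))
```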
- SWAMP Smith-Waterman using Associative Massive Parallelism
- Rognes and Seeberg (Bioinformatics (Oxford, England), 16(8):699-706, 2000) use the Intel Pentium processor with MMX SIMD instructions, the predecessor of SSE, for their implementation.
- the approach that developed out of the work of Rognes and Seeberg ( Bioinformatics, 16(8):699-706, 2000) for ParAlign does not use the wavefront approach (Rognes, Nuc Acids Res, 29(7):1647-52, 2001; Saebo et al., Nuc Acids Res, 33(suppl 2):W535-W539, 2005). Instead, they align the SIMD registers parallel to the query sequence, computing eight values at a time, using a pre-computed query-specific score matrix.
- small-scale vector parallelization (8, 16 or 32-way parallelism) can be used to make the calculations accessible via GPU implementations that align multiple sequences in parallel.
- the theoretical peak speedup for the calculations is a factor of m, which is optimal.
- the main parallel model used to develop and extend Smith-Waterman sequence alignment is the ASsociative Computing (ASC) model (Potter et al., Computer, 27(11):19-25, 1994). Efficient parallel versions of the Smith-Waterman algorithm are described herein. This model and one other model are described in detail in this section.
- MIMD multiple-instruction, multiple-data
- SIMD single-instruction multiple-data
- MIMD Multiple Instruction, Multiple Data
- the multiple-instruction, multiple-data (MIMD) model describes the majority of parallel systems currently available, and includes the currently popular cluster of computers.
- the MIMD processors have a full-fledged central processing unit (CPU), each with its own local memory (Quinn, Parallel Computing: Theory and Practice, 2nd ed., New York: McGraw-Hill, 1994).
- each of the MIMD processors stores and executes its own program asynchronously.
- the MIMD processors are connected via a network that allows them to communicate, but the network used can vary widely, ranging from Ethernet to Myrinet and InfiniBand connections between machines (cluster nodes). The communications tend to employ a much looser communications structure than SIMDs, going outside of a single unit.
- the data is moved along the network asynchronously by individual processors under the control of the individual programs they are executing.
- communication is handled by one of several different parallel languages that support message-passing.
- a very common library for this is known as the Message Passing Interface (MPI).
- Parallel computations by MIMDs usually require extensive communication and frequent synchronizations unless the various tasks being executed by the processors are highly independent (i.e. the so-called “embarrassingly parallel” or “pleasingly parallel” problems).
- the work presented in Section 8 uses an AMD Opteron cluster connected via InfiniBand.
- the worst-case time required for the message-passing is difficult or impossible to predict.
- the message-passing execution time for MIMD software is determined using the average case estimates, which are often determined by trial, rather than by a worst case theoretical evaluation, which is typical for SIMDs. Since the worst case for MIMD software is often very bad and rarely occurs, average case estimates are much more useful.
- the communication time required for a MIMD on a particular problem can be and is usually significantly higher than for a SIMD. This leads to the important goal in MIMD programming (especially when message-passing is used) to minimize the number of inter-processor communications required and to maximize the amount of time between processor communications. This is true even at a single card acceleration level, such as using graphics processors or GPUs.
- Data-parallel programming is also an important technique for MIMD programming, but here all the tasks perform the same operation on different data and are only synchronized at various critical points.
- SPMD Single-Program, Multiple-Data
- Each processor has its own copy of the same program, executing the sections of the code specific to that processor or core on its local data.
- the popularity of the SPMD paradigm stems from the fact that it is quite difficult to write a large number of different programs that will be executed concurrently across different processors and still be able to cooperate on solving a single problem.
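- The SPMD idea can be illustrated with a small sketch in which every worker runs the same function and uses its rank to select its own share of the data. This is an assumption-laden illustration, not the cluster code referred to in the text: threads stand in for cluster nodes, and `spmd_program` and `run_spmd` are hypothetical names. Real SPMD programs on clusters would typically use a message-passing library such as MPI.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def spmd_program(rank: int, size: int, data: List[int]) -> int:
    """The single program every worker runs; rank selects its local data."""
    local = data[rank::size]              # this worker's share of the data
    return sum(x * x for x in local)      # same operation, different data


def run_spmd(data: List[int], size: int = 4) -> int:
    """Launch `size` copies of the same program and combine their results."""
    with ThreadPoolExecutor(max_workers=size) as pool:
        partials = list(pool.map(lambda r: spmd_program(r, size, data), range(size)))
    return sum(partials)                  # the only synchronization point


if __name__ == "__main__":
    print(run_spmd(list(range(10))))
```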
- Another approach used for memory-intensive but not compute-intensive problems is to create a virtual memory server, as is done with JumboMem, using the work presented in Section 8. This uses MPI in its underlying implementation.
- SIMD Single Instruction, Multiple Data
- the SIMD model consists of multiple, simple arithmetic processing elements called PEs. Each PE has its own local memory that it can fetch and store from, but it does not have the ability to compile or execute a program.
- parallel memory refers to the local memories, collectively, in a computing system.
- a parallel memory can be the collective of local memories in a SIMD computer system (e.g., the local memories of PEs), the collective of local memories of the processors in a MIMD computer system (e.g., the local memories of the central processing units) and the like.
- control unit or front end
- the control unit is connected to all PEs, usually by a bus.
- All active PEs execute the program instructions received from the control unit synchronously in lockstep. “In any time unit, a single operation is in the same state of execution on multiple processing units, each manipulating different data” (Quinn, Parallel Computing: Theory and Practice, 2nd ed., New York: McGraw-Hill, 1994, at page 79). While the same instruction is executed at the same time in parallel by all active PEs, some PEs may be allowed to skip any particular instruction (Baker, SIMD and MASC: Course notes from CS 6/73301: Parallel and Distributed Computing, PowerPoint slides, 2004). This is usually accomplished using an “if-else” branch structure where some of the PEs execute the “if” instructions and the remaining PEs execute the “else” part. This model is ideal for problems that are “data-parallel” in nature and that have at most a small number of if-else branching structures that can occur simultaneously, such as image processing and matrix operations.
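- The lockstep if-else behavior can be emulated in ordinary code with an activity mask. The sketch below is a software illustration under stated assumptions, not a description of any particular SIMD hardware: each list element plays the role of one PE's local value, and `simd_if_else` is a hypothetical helper.

```python
from typing import Callable, List


def simd_if_else(
    data: List[int],
    predicate: Callable[[int], bool],
    if_op: Callable[[int], int],
    else_op: Callable[[int], int],
) -> List[int]:
    """Emulate a lockstep if-else over one value per processing element."""
    mask = [predicate(x) for x in data]   # every PE evaluates the condition
    # Phase 1: PEs whose mask is set execute the "if" instructions;
    # the remaining PEs sit idle and keep their value.
    out = [if_op(x) if m else x for x, m in zip(data, mask)]
    # Phase 2: the remaining PEs execute the "else" instructions.
    out = [x if m else else_op(x) for x, m in zip(out, mask)]
    return out


if __name__ == "__main__":
    print(simd_if_else([1, 2, 3, 4], lambda x: x % 2 == 0,
                       lambda x: x // 2, lambda x: 3 * x + 1))
```

- Both branches take a full pass over the array, which is why the text notes that the model suits problems with few simultaneous branching structures.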
- Data can be broadcast to all active PEs by the control unit and the control unit can also obtain data values from a particular PE using the connection (usually a bus) between the control unit and the PEs.
- the set of PEs is connected by an interconnection network, such as a linear array, 2-D mesh, or hypercube, that provides parallel data movement between the PEs. Data is moved through this network in synchronous parallel fashion by the PEs, which execute the instructions, including data movement, in lockstep. It is the control unit that broadcasts the instructions to the PEs.
- the SIMD network does not use the message-passing paradigm used by most parallel computers today. An important advantage of this is that SIMD network communication is extremely efficient and the maximum time required for the communication can be determined by the worst-case time of the algorithm controlling that particular communication.
- the ASsociative Computing (ASC) model is an extended SIMD based on the STARAN associative SIMD computer, designed by Dr. Kenneth Batcher at Goodyear Aerospace, and its successor, the ASPRO, which was heavily used by the Navy.
- ASC is an algorithmic model for associative computing (Potter et al., Computer, 27(11):19-25, 1994) (Potter, Associative Computing: A Programming Paradigm for Massively Parallel Computers , Plenum Publishing, 1992).
- the ASC model grew out of work on the STARAN and MPP, associative processors built by Goodyear Aerospace. Although it is not currently supported in hardware, current research efforts are being made to both efficiently simulate and design a computer for this model.
- ASC uses synchronous data-parallel programming, avoiding both multi-tasking and asynchronous point-to-point communication routing. Multi-tasking is unnecessary since only one task is executed at any time, with multiple instances of this task executed in lockstep on all active processing elements (PEs).
- ASC programmers, like SIMD programmers, avoid problems involving load balancing, synchronization, and dynamic task scheduling, issues that must be explicitly handled in MPI and other MIMD cluster paradigms.
- FIG. 6 shows a conceptual model of an ASC computer.
- a single control unit also known as an instruction stream (IS)
- the control unit and PE array are connected through a broadcast/reduction network and the PEs are connected together through a PE data interconnection network.
- a PE has access to data located in its own local memory.
- the data remains in place and responding (active) PEs process their local data in parallel.
- the reference to the word associative is related to the use of searching to locate data by content rather than memory addresses.
- the ASC model does not employ associative memory; instead, it is an associative processor where the general cycle is to search-process-retrieve. An overview of the model is available in Potter et al., Computer, 27(11):19-25, 1994.
- the associative operations are executed in constant time (Jin et al., 15th International Parallel and Distributed Processing Symposium (IPDPS '01) Workshops, San Francisco, p. 193, 2001), due to additional hardware required by the ASC model. These operations can be performed efficiently (but less rapidly) by any SIMD-like machine, and have been successfully adapted to run efficiently on several SIMD hardware platforms (Yuan et al., Parallel and Distributed Computing Systems (PDCS), Cambridge, MA, 2009; Trahan et al., J. of Parallel and Distributed Computing (JPDC), 2009). SWAMP and other ASC algorithms can therefore be efficiently implemented on other systems that are closely related to SIMDs, including vector machines, which is why the model is used as a paradigm.
- the control unit fetches and decodes program instructions and broadcasts control signals to the PEs.
- the PEs under the direction of the control unit, execute these instructions using their own local data. All PEs execute instructions in a lockstep manner, with an implicit synchronization between instructions.
- ASC has several relevant high-speed global operations: associative search, maximum/minimum search, and responder selection/detection. These are described in the following section.
- the basic operation in an ASC algorithm is the associative search.
- An associative search simultaneously locates the PEs whose local data matches a given search key. Those PEs that have matching data are called responders and those with non-matching data are called non-responders. After performing a search, the algorithm can then restrict further processing to only affect the responders by disabling the non-responders (or vice versa). Performing additional searches may further refine the set of responders.
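- An associative search and its refinement can be sketched with boolean responder masks. This is a sequential software emulation under stated assumptions: each dictionary stands in for one PE's local memory, the field names (`base`, `diag`) are hypothetical, and on ASC hardware the comparison would happen in all PEs at once rather than in a loop.

```python
from typing import Any, Dict, List, Optional


def associative_search(
    pes: List[Dict[str, Any]],
    field: str,
    key: Any,
    active: Optional[List[bool]] = None,
) -> List[bool]:
    """Return a responder mask: True where an active PE's field equals key."""
    if active is None:
        active = [True] * len(pes)        # by default every PE takes part
    return [a and pe.get(field) == key for pe, a in zip(pes, active)]


# Hypothetical local data held by four PEs.
pes = [
    {"base": "A", "diag": 2},
    {"base": "C", "diag": 3},
    {"base": "A", "diag": 3},
    {"base": "G", "diag": 3},
]

# First search: which PEs hold a cell on diagonal 3?
on_diag = associative_search(pes, "diag", 3)
# Second search restricted to the responders of the first one.
refined = associative_search(pes, "base", "A", active=on_diag)

if __name__ == "__main__":
    print(on_diag, refined)
```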
- Associative search is heavily utilized by SWAMP+ in selecting which PEs are active within a parallel act within a diagonal.
- an associative computer can also perform global searches, where data from the entire PE array is combined together to determine the set of responders.
- the most common type of global search is the maximum/minimum search, where the responders are those PEs whose data is the maximum or minimum value across the entire PE array. The maximum value is used by SWAMP+ in every diagonal it processes to track the highest value calculated so far. Use of the maximum search occurs frequently, once in a logical parallel act, m+n times per alignment.
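- The per-diagonal maximum search can be illustrated with a Smith-Waterman fill that proceeds one anti-diagonal at a time. The sketch below is not SWAMP+ itself: it uses a simple linear gap penalty, runs sequentially, and the function name and scoring defaults are assumptions. It only shows that cells on one anti-diagonal are independent of each other and that one maximum search per diagonal is enough to track the best score.

```python
from typing import Tuple


def sw_by_antidiagonals(
    query: str, ref: str, match: int = 2, mismatch: int = -1, gap: int = 1
) -> Tuple[int, Tuple[int, int], int]:
    """Smith-Waterman (linear gap) filled one anti-diagonal at a time.

    Returns (best score, (i, j) of the best cell, number of diagonals).
    """
    m, n = len(query), len(ref)
    if m == 0 or n == 0:
        return 0, (0, 0), 0
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best, best_cell, diagonals = 0, (0, 0), 0
    for d in range(2, m + n + 1):         # cells with i + j == d
        cells = [(i, d - i) for i in range(max(1, d - n), min(m, d - 1) + 1)]
        # Cells on one anti-diagonal depend only on earlier diagonals, so a
        # SIMD or ASC machine could update all of them in one lockstep step.
        values = []
        for i, j in cells:
            s = match if query[i - 1] == ref[j - 1] else mismatch
            values.append(max(0, H[i - 1][j - 1] + s,
                              H[i - 1][j] - gap, H[i][j - 1] - gap))
        for (i, j), v in zip(cells, values):
            H[i][j] = v
        # One maximum search per diagonal keeps the running best score.
        diag_max = max(values)
        if diag_max > best:
            best = diag_max
            best_cell = cells[values.index(diag_max)]
        diagonals += 1
    return best, best_cell, diagonals


if __name__ == "__main__":
    print(sw_by_antidiagonals("ACGT", "ACGT"))
```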
- An associative search can result in multiple responders and an associative algorithm can process those responders in one of three different modes: parallel, sequential, or single selection.
- Parallel responder processing performs the same set of operations on each responder simultaneously.
- Sequential responder processing selects each responder individually, allowing a different set of operations for each responder.
- Single responder selection also known as pickOne
- pickOne selects one, arbitrarily chosen, responder to undergo processing.
- an associative search may also result in no responders; the ASC model can detect whether there were any responders to a search and perform a separate set of actions in that case (known as anyResponders).
- In SWAMP, multiple responders that contain characters to be aligned are selected and processed in parallel, based on the associative searches mentioned above.
- Single responder selection occurs if and when there are multiple values that have the exact same maximum value when using the maximum/minimum search.
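- The responder-handling modes described above can be sketched in a few helper functions. These are hypothetical software stand-ins for the ASC operations: in particular, `pick_one` returns the lowest-index responder, which is only one possible way to make the arbitrary choice the model allows.

```python
from typing import Callable, List, Optional, Sequence


def any_responders(mask: List[bool]) -> bool:
    """anyResponders: did the search produce at least one responder?"""
    return any(mask)


def pick_one(mask: List[bool]) -> Optional[int]:
    """pickOne: select a single responder (lowest index, by assumption)."""
    for idx, is_responder in enumerate(mask):
        if is_responder:
            return idx
    return None


def max_search(values: List[int]) -> List[bool]:
    """Global maximum search: responders are the PEs holding the maximum."""
    top = max(values)
    return [v == top for v in values]


def process_parallel(values: List[int], mask: List[bool],
                     op: Callable[[int], int]) -> List[int]:
    """Parallel mode: apply the same operation to every responder."""
    return [op(v) if m else v for v, m in zip(values, mask)]


def process_sequential(values: List[int], mask: List[bool],
                       ops: Sequence[Callable[[int], int]]) -> List[int]:
    """Sequential mode: visit responders one by one, each with its own op."""
    out = list(values)
    responders = [i for i, m in enumerate(mask) if m]
    for i, op in zip(responders, ops):
        out[i] = op(out[i])
    return out


if __name__ == "__main__":
    scores = [3, 7, 7, 1]
    tied = max_search(scores)             # two PEs share the maximum
    print(tied, pick_one(tied))
```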
- associative processors include some type of PE interconnection network to allow parallel data movement within the array.
- the ASC model itself does not specify any particular interconnection network and, in fact, many useful associative algorithms do not require one.
- associative processors implement simple networks such as 1D linear arrays or 2D meshes. These networks are simple to implement and allow data to be transferred quickly in a synchronous manner.
- the 1D linear array is sufficient for the explicit communication between PEs in the SWAMP algorithms, for example.
- A generalized parallel processing architecture is shown in FIG. 7. While each component is shown as having a direct connection, it is to be understood that the various elements may be geographically separated but connected via a network, e.g., the internet. While hybrid configurations are possible, the main memory in a parallel computer is typically either shared between all processing elements in a single address space, or distributed, i.e., each processing element has its own local address space. (Distributed memory refers to the fact that the memory is logically distributed, but often implies that it is physically distributed as well.) Distributed shared memory and memory virtualization combine the two approaches, where the processing element has its own local memory and access to the memory on non-local processors. Accesses to local memory are typically faster than accesses to non-local memory.
- UMA Uniform Memory Access
- NUMA Non-Uniform Memory Access
- Distributed memory systems have non-uniform memory access.
- Processor-processor and processor-memory communication can be implemented in hardware in several ways, including via shared (either multiported or multiplexed) memory, a crossbar switch, a shared bus or an interconnect network of a myriad of topologies including star, ring, tree, hypercube, fat hypercube (a hypercube with more than one processor at a node), or n-dimensional mesh.
- Parallel computers based on interconnected networks must incorporate routing to enable the passing of messages between nodes that are not directly connected.
- the medium used for communication between the processors is likely to be hierarchical in large multiprocessor machines. Such resources are commercially available for purchase for dedicated use, or these resources can be accessed via “the cloud,” e.g., Amazon Cloud Computing.
- a computer generally includes a processor coupled to a memory via a bus.
- Memory can include RAM or ROM and preferably includes at least one tangible, non-transitory medium storing instructions executable to cause the system to perform functions described herein.
- systems of the invention include one or more processors (e.g., a central processing unit (CPU), a graphics processing unit (GPU), etc.), computer-readable storage devices (e.g., main memory, static memory, etc.), or combinations thereof which communicate with each other via a bus.
- a processor may be any suitable processor known in the art, such as the processor sold under the trademark XEON E7 by Intel (Santa Clara, Calif.) or the processor sold under the trademark OPTERON 6200 by AMD (Sunnyvale, Calif.).
- Memory may refer to a computer-readable storage device and can include any machine-readable medium on which is stored one or more sets of instructions (e.g., software embodying any methodology or function found herein), data (e.g., embodying any tangible physical objects such as the genetic sequences found in a patient's chromosomes), or both. While the computer-readable storage device can in an exemplary embodiment be a single medium, the term “computer-readable storage device” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the one or more sets of instructions or data.
- a computer-readable storage device shall accordingly be taken to include, without limit, solid-state memories (e.g., subscriber identity module (SIM) card, secure digital card (SD card), micro SD card, or solid-state drive (SSD)), optical and magnetic media, and any other tangible storage media.
- a computer-readable storage device includes a tangible, non-transitory medium.
- Such non-transitory media excludes, for example, transitory waves and signals.
- “Non-transitory memory” should be interpreted to exclude computer readable transmission media, such as signals, per se.
- Input/output devices may include a video display unit (e.g., a liquid crystal display (LCD) or a cathode ray tube (CRT) monitor), an alphanumeric input device (e.g., a keyboard), a cursor control device (e.g., a mouse or trackpad), a disk drive unit, a signal generation device (e.g., a speaker), a touchscreen, an accelerometer, a microphone, a cellular radio frequency antenna, and a network interface device, which can be, for example, a network interface card (NIC), Wi-Fi card, or cellular modem.
- the invention includes methods for producing sequences (e.g., nucleic acid sequences, amino acid sequences) corresponding to nucleic acids recovered from biological samples.
- the resulting information can be used to identify mutations present in nucleic acid material obtained from a subject.
- a sample, i.e., nucleic acids (e.g., DNA or RNA), is obtained from a subject, the nucleic acids are processed (lysed, amplified, and/or purified), and the nucleic acids are sequenced using a method described below.
- the result of the sequencing is not a linear nucleic acid sequence, but a collection of thousands or millions of individual short nucleic acid reads that must be re-assembled into a sequence for the subject.
- the aligned sequence can be compared to reference sequences to identify mutations that may be indicative of disease, for example.
- the subject may be identified with particular mutations based upon the alignment of the reads against a reference sequence construct, i.e., a directed acyclic graph (“DAG”) as described above.
- the biological samples may, for example, comprise samples of blood, whole blood, blood plasma, tears, nipple aspirate, serum, stool, urine, saliva, circulating cells, tissue, biopsy samples, hair follicle or other samples containing biological material of the patient.
- One issue in conducting tests based on such samples is that, in most cases only a tiny amount of DNA or RNA containing a mutation of interest may be present in a sample. This is especially true in non-invasive samples, such as a buccal swab or a blood sample, where the mutant nucleic acids are present in very small amounts.
- the nucleic acid fragments may be naturally short, that is, random shearing of relevant nucleic acids in the sample can generate short fragments.
- the nucleic acids are purposely fragmented for ease of processing or because the sequencing techniques can only sequence reads of less than 1000 bases, e.g., less than 500 bases, e.g., less than 200 bases, e.g., less than 100 bases, e.g., less than 50 bases.
- the majority of the plurality of nucleic acid reads will follow from the sequencing method and comprise less than 1000 bases, e.g., less than 500 bases, e.g., less than 200 bases, e.g., less than 100 bases, e.g., less than 50 bases.
- Nucleic acids may be obtained by methods known in the art. Generally, nucleic acids can be extracted from a biological sample by a variety of techniques such as those described by Maniatis, et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor, N.Y., pp. 280-281, (1982), the contents of which are incorporated by reference herein in their entirety.
- Extracts may be prepared using standard techniques in the art, for example, by chemical or mechanical lysis of the cell. Extracts then may be further treated, for example, by filtration and/or centrifugation and/or with chaotropic salts such as guanidinium isothiocyanate or urea or with organic solvents such as phenol and/or HCCl₃ to denature any contaminating and potentially interfering proteins.
- the sample may comprise RNA, e.g., mRNA, collected from a subject sample, e.g., a blood sample.
- Methods for RNA extraction from paraffin embedded tissues are disclosed, for example, in Rupp and Locker, Lab Invest. 56:A67 (1987), and De Andres et al., BioTechniques 18:42-44 (1995). The contents of each of these references are incorporated by reference herein in their entirety.
- RNA isolation can be performed using a purification kit, buffer set and protease from commercial manufacturers, such as Qiagen, according to the manufacturer's instructions.
- total RNA from cells in culture can be isolated using Qiagen RNeasy mini-columns.
- Other commercially available RNA isolation kits include MASTERPURE Complete DNA and RNA Purification Kit (EPICENTRE, Madison, Wis.), and Paraffin Block RNA Isolation Kit (Ambion, Inc.).
- Total RNA from tissue samples can be isolated using RNA Stat-60 (Tel-Test).
- RNA prepared from tumor can be isolated, for example, by cesium chloride density gradient centrifugation.
- Sequencing may be by any method known in the art.
- DNA sequencing techniques include classic dideoxy sequencing reactions (Sanger method) using labeled terminators or primers and gel separation in slab or capillary, sequencing by synthesis using reversibly terminated labeled nucleotides, pyrosequencing, 454 sequencing, allele specific hybridization to a library of labeled oligonucleotide probes, sequencing by synthesis using allele specific hybridization to a library of labeled clones that is followed by ligation, real time monitoring of the incorporation of labeled nucleotides during a polymerization step, polony sequencing, and SOLiD sequencing.
- nucleic acids are amplified using polymerase chain reactions (PCR) techniques known in the art.
- Illumina sequencing, e.g., the MiSeq™ platform
- Illumina sequencing for DNA is based on the amplification of DNA on a solid surface using fold-back PCR and anchored primers. Genomic DNA is fragmented, and adapters are added to the 5′ and 3′ ends of the fragments. DNA fragments that are attached to the surface of flow cell channels are extended and bridge amplified. The fragments become double stranded, and the double stranded molecules are denatured.
- RNA fragments are being isolated and amplified in order to determine the RNA expression of the sample.
- sequences may be output in a data file, such as a FASTQ file, which is a text-based format for storing biological sequence and quality scores (see discussion above).
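- A FASTQ file of the kind mentioned above can be read with a few lines of code. The sketch below assumes the common four-line record layout and Phred+33 quality encoding (an assumption; older instruments used other offsets), and `FastqRecord` and `parse_fastq` are hypothetical names rather than part of any library.

```python
from typing import Iterator, List, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    sequence: str
    qualities: List[int]


def parse_fastq(text: str, offset: int = 33) -> Iterator[FastqRecord]:
    """Yield records from four-line FASTQ text (no line-wrapped sequences)."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) % 4 != 0:
        raise ValueError("expected a multiple of four non-empty lines")
    for k in range(0, len(lines), 4):
        header, seq, plus, qual = lines[k:k + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed record starting at line {k + 1}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header}")
        # Each quality character encodes a Phred score as ord(char) - offset.
        yield FastqRecord(header[1:], seq, [ord(c) - offset for c in qual])


example = "@read1\nACGT\n+\nII5!\n"

if __name__ == "__main__":
    for record in parse_fastq(example):
        print(record)
```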
- Another example of a DNA sequencing technique that may be used in the methods of the provided invention is Ion Torrent™ sequencing, offered by Life Technologies. See U.S. patent application numbers 2009/0026082, 2009/0127589, 2010/0035252, 2010/0137143, 2010/0188073, 2010/0197507, 2010/0282617, 2010/0300559, 2010/0300895, 2010/0301398, and 2010/0304982, the content of each of which is incorporated by reference herein in its entirety.
- In Ion Torrent™ sequencing, DNA is sheared into fragments of approximately 300-800 base pairs, and the fragments are blunt ended. Oligonucleotide adaptors are then ligated to the ends of the fragments.
- the adaptors serve as primers for amplification and sequencing of the fragments.
- the fragments can be attached to a surface at a resolution such that the fragments are individually resolvable. Addition of one or more nucleotides releases a proton (H⁺), which is detected as a signal and recorded in a sequencing instrument. The signal strength is proportional to the number of nucleotides incorporated. Ion Torrent data may also be output as a FASTQ file.
- 454™ sequencing is a sequencing-by-synthesis technology that also utilizes pyrosequencing. 454™ sequencing of DNA involves two steps. In the first step, DNA is sheared into fragments of approximately 300-800 base pairs, and the fragments are blunt ended. Oligonucleotide adaptors are then ligated to the ends of the fragments. The adaptors serve as primers for amplification and sequencing of the fragments.
- the fragments can be attached to DNA capture beads, e.g., streptavidin-coated beads using, e.g., Adaptor B, which contains a 5′-biotin tag.
- the fragments attached to the beads are PCR amplified within droplets of an oil-water emulsion. The result is multiple copies of clonally amplified DNA fragments on each bead.
- the beads are captured in wells (pico-liter sized). Pyrosequencing is performed on each DNA fragment in parallel. Addition of one or more nucleotides generates a light signal that is recorded by a CCD camera in a sequencing instrument. The signal strength is proportional to the number of nucleotides incorporated.
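- Because the signal strength is proportional to the number of nucleotides incorporated, a per-flow signal can be turned into a homopolymer length by rounding. The sketch below is a simplified illustration, not vendor base-calling software: it assumes signals are already normalized so that 1.0 corresponds to one incorporated nucleotide, and the cyclic flow order `TACG` and the name `call_flowgram` are assumptions.

```python
from typing import List


def call_flowgram(signals: List[float], flow_order: str = "TACG") -> str:
    """Turn per-flow signal intensities into a base sequence.

    Each flow introduces one nucleotide type; the rounded signal gives the
    number of consecutive copies of that nucleotide that were incorporated.
    """
    bases = []
    for k, signal in enumerate(signals):
        nucleotide = flow_order[k % len(flow_order)]
        count = max(0, int(signal + 0.5))     # round half up, never negative
        bases.append(nucleotide * count)
    return "".join(bases)


if __name__ == "__main__":
    print(call_flowgram([1.02, 0.05, 2.1, 0.9, 0.0, 3.04]))
```

- Rounding becomes less reliable as homopolymer runs grow longer, since adjacent integer levels are separated by a shrinking relative difference in signal.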
- Pyrosequencing makes use of pyrophosphate (PPi) which is released upon nucleotide addition. PPi is converted to ATP by ATP sulfurylase in the presence of adenosine 5′ phosphosulfate. Luciferase uses ATP to convert luciferin to oxyluciferin, and this reaction generates light that is detected and analyzed.
- pyrosequencing is used to measure gene expression. Pyrosequencing of RNA is applied similarly to pyrosequencing of DNA, and is accomplished by attaching applications of partial rRNA gene sequencing to microscopic beads and then placing the attachments into individual wells. The attached partial rRNA sequences are then amplified in order to determine the gene expression profile. See Sharon Marsh, Pyrosequencing® Protocols in Methods in Molecular Biology, Vol. 373, 15-23 (2007).
- SOLiD™ technology is a ligation-based sequencing technology that may be utilized to run massively parallel next generation sequencing of both DNA and RNA.
- genomic DNA is sheared into fragments, and adaptors are attached to the 5′ and 3′ ends of the fragments to generate a fragment library.
- internal adaptors can be introduced by ligating adaptors to the 5′ and 3′ ends of the fragments, circularizing the fragments, digesting the circularized fragment to generate an internal adaptor, and attaching adaptors to the 5′ and 3′ ends of the resulting fragments to generate a mate-paired library.
- clonal bead populations are prepared in microreactors containing beads, primers, template, and PCR components. Following PCR, the templates are denatured and beads are enriched to separate the beads with extended templates. Templates on the selected beads are subjected to a 3′ modification that permits bonding to a glass slide.
- the sequence can be determined by sequential hybridization and ligation of partially random oligonucleotides with a central determined base (or pair of bases) that is identified by a specific fluorophore. After a color is recorded, the ligated oligonucleotide is cleaved and removed and the process is then repeated.
- SOLiD™ Serial Analysis of Gene Expression is used to measure gene expression.
- Serial analysis of gene expression is a method that allows the simultaneous and quantitative analysis of a large number of gene transcripts, without the need of providing an individual hybridization probe for each transcript.
- a short sequence tag (about 10-14 bp) is generated that contains sufficient information to uniquely identify a transcript, provided that the tag is obtained from a unique position within each transcript.
- many transcripts are linked together to form long serial molecules, that can be sequenced, revealing the identity of the multiple tags simultaneously.
- the expression pattern of any population of transcripts can be quantitatively evaluated by determining the abundance of individual tags, and identifying the gene corresponding to each tag. For more details see, e.g., Velculescu et al., Science 270:484-487 (1995); and Velculescu et al., Cell 88:243-51 (1997), the contents of each of which are incorporated by reference herein in their entirety.
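- Determining tag abundance and assigning tags to genes amounts to counting. The following is a minimal sketch under stated assumptions: the tags have already been extracted from the sequenced molecules, and the tag-to-gene lookup table, the example tags, and the name `tag_abundance` are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def tag_abundance(tags: Iterable[str], tag_to_gene: Dict[str, str]) -> Dict[str, int]:
    """Count tags and total the counts per gene.

    Tags missing from the lookup table are grouped under 'unassigned'.
    """
    per_gene: Counter = Counter()
    for tag, count in Counter(tags).items():
        per_gene[tag_to_gene.get(tag, "unassigned")] += count
    return dict(per_gene)


if __name__ == "__main__":
    observed = ["ACGTACGTAC", "ACGTACGTAC", "TTTTGGGGCC", "GGGGGGGGGG"]
    lookup = {"ACGTACGTAC": "geneA", "TTTTGGGGCC": "geneB"}
    print(tag_abundance(observed, lookup))
```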
- tSMS Helicos True Single Molecule Sequencing
- a DNA sample is cleaved into strands of approximately 100 to 200 nucleotides, and a polyA sequence is added to the 3′ end of each DNA strand.
- Each strand is labeled by the addition of a fluorescently labeled adenosine nucleotide.
- the DNA strands are then hybridized to a flow cell, which contains millions of oligo-T capture sites that are immobilized to the flow cell surface.
- the templates can be at a density of about 100 million templates/cm².
- the flow cell is then loaded into an instrument, e.g., a HeliScope™ sequencer, and a laser illuminates the surface of the flow cell, revealing the position of each template.
- a CCD camera can map the position of the templates on the flow cell surface.
- the template fluorescent label is then cleaved and washed away.
- the sequencing reaction begins by introducing a DNA polymerase and a fluorescently labeled nucleotide.
- the oligo-T nucleic acid serves as a primer.
- the polymerase incorporates the labeled nucleotides to the primer in a template directed manner. The polymerase and unincorporated nucleotides are removed.
- the templates that have directed incorporation of the fluorescently labeled nucleotide are detected by imaging the flow cell surface. After imaging, a cleavage step removes the fluorescent label, and the process is repeated with other fluorescently labeled nucleotides until the desired read length is achieved. Sequence information is collected with each nucleotide addition step. Further description of tSMS is shown for example in Lapidus et al. (U.S. Pat. No. 7,169,560), Lapidus et al. (U.S. patent application number 2009/0191565), Quake et al. (U.S. Pat. No. 6,818,395), Harris (U.S. Pat. No. 7,282,337), Quake et al. (U.S. patent application number 2002/0164629), and Braslavsky, et al., PNAS (USA), 100: 3960-3964 (2003), the contents of each of these references is incorporated by reference herein in its entirety.
- SMRT single molecule, real-time
- each of the four DNA bases is attached to one of four different fluorescent dyes. These dyes are phospholinked.
- a single DNA polymerase is immobilized with a single molecule of template single stranded DNA at the bottom of a zero-mode waveguide (ZMW).
- ZMW zero-mode waveguide
- a ZMW is a confinement structure which enables observation of incorporation of a single nucleotide by DNA polymerase against the background of fluorescent nucleotides that rapidly diffuse in and out of the ZMW (in microseconds).
- RNA polymerase is replaced with a reverse transcriptase in the ZMW, and the process is followed accordingly.
- a nanopore is a small hole, of the order of 1 nanometer in diameter. Immersion of a nanopore in a conducting fluid and application of a potential across it results in a slight electrical current due to conduction of ions through the nanopore. The amount of current which flows is sensitive to the size of the nanopore. As a DNA molecule passes through a nanopore, each nucleotide on the DNA molecule obstructs the nanopore to a different degree. Thus, the change in the current passing through the nanopore as the DNA molecule passes through the nanopore represents a reading of the DNA sequence.
- a sequencing technique that can be used in the methods of the provided invention involves using a chemical-sensitive field effect transistor (chemFET) array to sequence DNA (for example, as described in US Patent Application Publication No. 20090026082).
- chemFET chemical-sensitive field effect transistor
- DNA molecules can be placed into reaction chambers, and the template molecules can be hybridized to a sequencing primer bound to a polymerase.
- Incorporation of one or more triphosphates into a new nucleic acid strand at the 3′ end of the sequencing primer can be detected by a change in current by a chemFET.
- An array can have multiple chemFET sensors.
- single nucleic acids can be attached to beads, and the nucleic acids can be amplified on the bead, and the individual beads can be transferred to individual reaction chambers on a chemFET array, with each chamber having a chemFET sensor, and the nucleic acids can be sequenced.
- Another example of a sequencing technique that can be used in the methods of the provided invention involves using an electron microscope (Moudrianakis E. N. and Beer M. Proc Natl Acad Sci USA. 1965 March; 53:564-71).
- individual DNA molecules are labeled using metallic labels that are distinguishable using an electron microscope. These molecules are then stretched on a flat surface and imaged using an electron microscope to measure sequences.
- Additional detection methods can utilize binding to microarrays for subsequent fluorescent or non-fluorescent detection, barcode mass detection using mass spectrometric methods, detection of emitted radiowaves, detection of scattered light from aligned barcodes, and fluorescence detection using quantitative PCR or digital PCR methods.
- a comparative nucleic acid hybridization array is a technique for detecting copy number variations within the patient's sample DNA.
- the sample DNA and a reference DNA are differently labeled using distinct fluorophores, for example, and then hybridized to numerous probes. The fluorescent intensity of the sample and reference is then measured, and the fluorescent intensity ratio is then used to calculate copy number variations.
- Microarray detection may not produce a FASTQ file directly; however, programs are available to convert the data produced by the microarray sequencers to a FASTQ, or similar, format.
- FISH fluorescent in situ hybridization
- In Situ Hybridization Protocols Ian Darby ed., 2000.
- FISH is a molecular cytogenetic technique that detects specific chromosomal rearrangements such as mutations in a DNA sequence and copy number variances.
- a DNA molecule is chemically denatured and separated into two strands.
- a single stranded probe is then incubated with a denatured strand of the DNA.
- the single stranded probe is selected depending on the target sequence portion and has a high affinity to the complementary sequence portion.
- Probes may include a repetitive sequence probe, a whole chromosome probe, and locus-specific probes. While incubating, the combined probe and DNA strand are hybridized. The results are then visualized and quantified under a microscope in order to assess any variations.
- a MassARRAY™-based gene expression profiling method is used to measure gene expression.
- in the MassARRAY™-based gene expression profiling method, developed by Sequenom, Inc. (San Diego, Calif.), following the isolation of RNA and reverse transcription, the obtained cDNA is spiked with a synthetic DNA molecule (competitor), which matches the targeted cDNA region in all positions except a single base, and serves as an internal standard.
- the cDNA/competitor mixture is PCR amplified and is subjected to a post-PCR shrimp alkaline phosphatase (SAP) enzyme treatment, which results in the dephosphorylation of the remaining nucleotides.
- SAP post-PCR shrimp alkaline phosphatase
- the PCR products from the competitor and cDNA are subjected to primer extension, which generates distinct mass signals for the competitor- and cDNA-derived PCR products. After purification, these products are dispensed on a chip array, which is pre-loaded with components needed for matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF MS) analysis.
- MALDI-TOF MS matrix-assisted laser desorption ionization time-of-flight mass spectrometry
- the cDNA present in the reaction is then quantified by analyzing the ratios of the peak areas in the mass spectrum generated. For further details see, e.g., Ding and Cantor, Proc. Natl. Acad. Sci. USA 100:3059-3064 (2003).
- PCR-based techniques include, for example, differential display (Liang and Pardee, Science 257:967-971 (1992)); amplified fragment length polymorphism (iAFLP) (Kawamoto et al., Genome Res. 12:1305-1312 (1999)); BeadArray™ technology (Illumina, San Diego, Calif.; Oliphant et al., Discovery of Markers for Disease (Supplement to Biotechniques), June 2002; Ferguson et al., Analytical Chemistry 72:5618 (2000)); Beads Array for Detection of Gene Expression (BADGE), using the commercially available Luminex100 LabMAP system and multiple color-coded microspheres (Luminex Corp., Austin, Tex.) in a rapid assay for gene expression (Yang et al., Genome Res.
- iAFLP amplified fragment length polymorphism
- BeadArray™ technology Illumina, San Diego, Calif.; Oliphant et al., Discovery of Mark
- variances in gene expression can also be identified or confirmed using microarray techniques, including nylon membrane arrays, microchip arrays and glass slide arrays, e.g., such as available commercially from Affymetrix (Santa Clara, Calif.).
- a microarray technique including nylon membrane arrays, microchip arrays and glass slide arrays, e.g., such as available commercially from Affymetrix (Santa Clara, Calif.).
- RNA samples are isolated and converted into labeled cDNA via reverse transcription.
- the labeled cDNA is then hybridized onto either a nylon membrane, microchip, or a glass slide with specific DNA probes from cells or tissues of interest.
- the hybridized cDNA is then detected and quantified, and the resulting gene expression data may be compared to controls for analysis.
- the methods of labeling, hybridization, and detection vary depending on whether the microarray support is a nylon membrane, microchip, or glass slide.
- Nylon membrane arrays are typically hybridized with P-dNTP labeled probes.
- Glass slide arrays typically involve labeling with two distinct fluorescently labeled nucleotides.
- Methods for making microarrays and determining gene product expression are shown in Yeatman et al. (U.S. patent application number 2006/0195269), the content of which is incorporated by reference herein in its entirety.
- mass spectrometry (MS) analysis can be used alone or in combination with other methods (e.g., immunoassays or RNA measuring assays) to determine the presence and/or quantity of the one or more biomarkers disclosed herein in a biological sample.
- the MS analysis includes matrix-assisted laser desorption/ionization (MALDI) time-of-flight (TOF) MS analysis, such as for example direct-spot MALDI-TOF or liquid chromatography MALDI-TOF mass spectrometry analysis.
- the MS analysis comprises electrospray ionization (ESI) MS, such as for example liquid chromatography (LC) ESI-MS.
- ESI electrospray ionization
- Mass analysis can be accomplished using commercially-available spectrometers.
- Methods for utilizing MS analysis, including MALDI-TOF MS and ESI-MS, to detect the presence and quantity of biomarker peptides in biological samples are known in the art. See for example U.S. Pat. Nos. 6,925,389; 6,989,100; and 6,890,763 for further guidance, each of which is incorporated by reference herein in its entirety.
- Protein sequences for use with the methods, sequence constructs, and systems of the invention can be determined using a number of techniques known to those skilled in the relevant art.
- amino acid sequences and amino acid sequence reads may be produced by analyzing a protein or a portion of a protein with mass spectrometry or using Edman degradation.
- Mass spectrometry may include, for example, matrix-assisted laser desorption/ionization (MALDI) time-of-flight (TOF) MS analysis, such as for example direct-spot MALDI-TOF or liquid chromatography MALDI-TOF mass spectrometry analysis, electrospray ionization (ESI) MS, such as for example liquid chromatography (LC) ESI-MS, or other techniques such as MS-MS.
- MALDI matrix-assisted laser desorption/ionization
- TOF time-of-flight
- ESI electrospray ionization
- MS-MS liquid chromatography
- Edman degradation analysis may be performed using commercial instruments such as the Model 49X Procise protein/peptide sequencer (Applied Biosystems/Life Technologies).
- the sequenced amino acid sequences i.e., polypeptides, i.e., proteins, may be at least 10 amino acids in length, e.g., at least 20 amino acids in length, e.g., at least 50 amino acids in length.
Abstract
Description
(S1) (SEQ ID NO. 17) AGCTA-CGTACACTACC
(S2) (SEQ ID NO. 18) AGCTATCGTAC--TAGC
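$$H_{k0} = H_{0l} = 0 \quad (\text{for } 0 \le k \le n \text{ and } 0 \le l \le m)$$
$$H_{ij} = \max\{H_{i-1,j-1} + s(a_i, b_j),\; H_{i-1,j} - W_{in},\; H_{i,j-1} - W_{del},\; 0\} \quad (\text{for } 1 \le i \le n \text{ and } 1 \le j \le m) \tag{1}$$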
In the equations above, $s(a_i,b_j)$ represents either a match bonus (when $a_i=b_j$) or a mismatch penalty (when $a_i \ne b_j$), and insertions and deletions are given the penalties $W_{in}$ and $W_{del}$, respectively. In most instances, the resulting matrix has many elements that are zero. This representation makes it easier to backtrace from high-to-low, right-to-left in the matrix, thus identifying the alignment.
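The recurrence of equation (1) and the backtrace it enables can be sketched in a few lines of Python. This is a minimal illustration, not the claimed implementation: the function name `smith_waterman` and the default scores (match bonus 1, mismatch penalty -1, gap penalties of 1) are assumptions chosen only for the example.

```python
def smith_waterman(a, b, match=1, mismatch=-1, w_in=1, w_del=1):
    """Fill the matrix H of equation (1), then backtrace the best local alignment.

    All scoring values are illustrative assumptions, not values from the text.
    """
    n, m = len(a), len(b)
    # Boundary condition: H[k][0] = H[0][l] = 0.
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(H[i - 1][j - 1] + s,   # match or mismatch
                          H[i - 1][j] - w_in,    # gap penalised by W_in
                          H[i][j - 1] - w_del,   # gap penalised by W_del
                          0)                     # restart the local alignment
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Backtrace from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] - w_in:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))
```

Calling `smith_waterman("TTACGTT", "GGACGGG")` returns the score together with the two aligned substrings; the zero floor in the recurrence is what lets the alignment start at the shared `ACG` rather than at the beginning of either string.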
B[j,k]=max(p[j,k],i[j,k],d[j,k],0)(for 0<j≤m,0<k≤n) (2)
The arguments of the maximum function, B[j,k], are outlined in equations (3)-(5) below, wherein MISMATCH_PENALTY, MATCH_BONUS, INSERTION_PENALTY, DELETION_PENALTY, and OPENING_PENALTY are all constants, and all negative except for MATCH_BONUS. The match argument, p[j,k], is given by equation (3), below:
the insertion argument i[j,k], is given by equation (4), below:
i[j,k]=max(p[j−1,k]+OPENING_PENALTY,i[j−1,k],d[j−1,k]+OPENING_PENALTY)+INSERTION_PENALTY (4)
and the deletion argument d[j,k], is given by equation (5), below:
d[j,k]=max(p[j,k−1]+OPENING_PENALTY,i[j,k−1]+OPENING_PENALTY,d[j,k−1])+DELETION_PENALTY (5)
For all three arguments, the [0,0] element is set to zero to assure that the backtrack goes to completion, i.e., p[0,0]=i[0,0]=d[0,0]=0.
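The three-matrix recurrence of equations (2), (4), and (5) can be sketched as follows. Several things here are assumptions made only for illustration: the numeric values of the constants, the zero initialisation of the whole first row and column (the text fixes only the [0,0] element), and the form of the match argument, since equation (3) is not reproduced above. The sketch assumes p[j,k] is the best of the three diagonal predecessors plus MATCH_BONUS or MISMATCH_PENALTY.

```python
# Illustrative constants (assumed values); all negative except MATCH_BONUS.
MATCH_BONUS = 2
MISMATCH_PENALTY = -1
INSERTION_PENALTY = -1
DELETION_PENALTY = -1
OPENING_PENALTY = -2


def affine_gap_best_score(S, A):
    """Return (best B[j,k], (j, k)) for sequences S (length m) and A (length n)."""
    m, n = len(S), len(A)
    # Assumption: every boundary cell is zero, which includes p[0,0]=i[0,0]=d[0,0]=0.
    p = [[0] * (n + 1) for _ in range(m + 1)]
    ins = [[0] * (n + 1) for _ in range(m + 1)]
    dele = [[0] * (n + 1) for _ in range(m + 1)]
    best, best_cell = 0, (0, 0)
    for j in range(1, m + 1):
        for k in range(1, n + 1):
            # Assumed form of the match argument (equation (3) is not shown above).
            diag = max(p[j - 1][k - 1], ins[j - 1][k - 1], dele[j - 1][k - 1])
            bonus = MATCH_BONUS if S[j - 1] == A[k - 1] else MISMATCH_PENALTY
            p[j][k] = diag + bonus
            # Equation (4): insertion argument.
            ins[j][k] = max(p[j - 1][k] + OPENING_PENALTY,
                            ins[j - 1][k],
                            dele[j - 1][k] + OPENING_PENALTY) + INSERTION_PENALTY
            # Equation (5): deletion argument.
            dele[j][k] = max(p[j][k - 1] + OPENING_PENALTY,
                             ins[j][k - 1] + OPENING_PENALTY,
                             dele[j][k - 1]) + DELETION_PENALTY
            # Equation (2): B[j,k] is the maximum of the three arguments and zero.
            B = max(p[j][k], ins[j][k], dele[j][k], 0)
            if B > best:
                best, best_cell = B, (j, k)
    return best, best_cell
```

Keeping separate matrices for the match, insertion, and deletion arguments is what allows the opening penalty to be charged only when a gap starts, while an existing gap is extended at the lower per-position cost.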
- (i) If d is not the first letter of the sequence of its node, the letter preceding d in its node is its (only) predecessor;
- (ii) If d is the first letter of the sequence of its node, the last letter of the sequence of any node that is a parent of d's node is a predecessor of d.
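Rules (i) and (ii) translate directly into a lookup over a graph whose nodes each hold a string. The sketch below is a hypothetical representation chosen for illustration: `sequences` maps a node identifier to its sequence, `parents` maps a node identifier to the identifiers of its parent nodes, a letter is addressed as a `(node, offset)` pair, and every node is assumed to hold a non-empty sequence.

```python
def predecessors(node, offset, sequences, parents):
    """Return the predecessors of the letter at `offset` within `node`.

    `sequences` and `parents` are hypothetical containers used only in this sketch.
    """
    if offset > 0:
        # Rule (i): the preceding letter in the same node is the only predecessor.
        return [(node, offset - 1)]
    # Rule (ii): the last letter of every parent node is a predecessor.
    return [(parent, len(sequences[parent]) - 1) for parent in parents.get(node, [])]


# A small example graph: n1 branches into n2 and n3, which rejoin at n4.
sequences = {"n1": "AGC", "n2": "T", "n3": "GA", "n4": "CC"}
parents = {"n2": ["n1"], "n3": ["n1"], "n4": ["n2", "n3"]}
```

With this graph, the first letter of `n4` has two predecessors (the last letters of `n2` and `n3`), which is the case in which an alignment recurrence has to take a maximum over several previous cells rather than a single one.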
Claims (20)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/517,419 US10832797B2 (en) | 2013-10-18 | 2014-10-17 | Method and system for quantifying sequence alignment |
US17/087,385 US20210280272A1 (en) | 2013-10-18 | 2020-11-02 | Methods and systems for quantifying sequence alignment |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201361892666P | 2013-10-18 | 2013-10-18 | |
US14/517,419 US10832797B2 (en) | 2013-10-18 | 2014-10-17 | Method and system for quantifying sequence alignment |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/087,385 Continuation US20210280272A1 (en) | 2013-10-18 | 2020-11-02 | Methods and systems for quantifying sequence alignment |
Publications (2)
Publication Number | Publication Date |
---|---|
US20150199473A1 (en) | 2015-07-16 |
US10832797B2 (en) | 2020-11-10 |
Family
ID=52828743
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/517,419 Active 2036-10-23 US10832797B2 (en) | 2013-10-18 | 2014-10-17 | Method and system for quantifying sequence alignment |
US17/087,385 Pending US20210280272A1 (en) | 2013-10-18 | 2020-11-02 | Methods and systems for quantifying sequence alignment |
Family Applications After (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/087,385 Pending US20210280272A1 (en) | 2013-10-18 | 2020-11-02 | Methods and systems for quantifying sequence alignment |
Country Status (2)
Country | Link |
---|---|
US (2) | US10832797B2 (en) |
WO (1) | WO2015058095A1 (en) |
Families Citing this family (35)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9116866B2 (en) | 2013-08-21 | 2015-08-25 | Seven Bridges Genomics Inc. | Methods and systems for detecting sequence variants |
US9898575B2 (en) | 2013-08-21 | 2018-02-20 | Seven Bridges Genomics Inc. | Methods and systems for aligning sequences |
WO2015058120A1 (en) | 2013-10-18 | 2015-04-23 | Seven Bridges Genomics Inc. | Methods and systems for aligning sequences in the presence of repeating elements |
US10832797B2 (en) | 2013-10-18 | 2020-11-10 | Seven Bridges Genomics Inc. | Method and system for quantifying sequence alignment |
KR20160062763A (en) | 2013-10-18 | 2016-06-02 | 세븐 브릿지스 지노믹스 인크. | Methods and systems for genotyping genetic samples |
AU2014337093B2 (en) | 2013-10-18 | 2020-07-30 | Seven Bridges Genomics Inc. | Methods and systems for identifying disease-induced mutations |
US9092402B2 (en) | 2013-10-21 | 2015-07-28 | Seven Bridges Genomics Inc. | Systems and methods for using paired-end data in directed acyclic structure |
EP3092317B1 (en) | 2014-01-10 | 2021-04-21 | Seven Bridges Genomics Inc. | Systems and methods for use of known alleles in read mapping |
US9817944B2 (en) | 2014-02-11 | 2017-11-14 | Seven Bridges Genomics Inc. | Systems and methods for analyzing sequence data |
WO2016141294A1 (en) | 2015-03-05 | 2016-09-09 | Seven Bridges Genomics Inc. | Systems and methods for genomic pattern analysis |
US10395759B2 (en) | 2015-05-18 | 2019-08-27 | Regeneron Pharmaceuticals, Inc. | Methods and systems for copy number variant detection |
US10229519B2 (en) | 2015-05-22 | 2019-03-12 | The University Of British Columbia | Methods for the graphical representation of genomic sequence data |
US20160364523A1 (en) * | 2015-06-11 | 2016-12-15 | Seven Bridges Genomics Inc. | Systems and methods for identifying microorganisms |
US10793895B2 (en) | 2015-08-24 | 2020-10-06 | Seven Bridges Genomics Inc. | Systems and methods for epigenetic analysis |
US10584380B2 (en) | 2015-09-01 | 2020-03-10 | Seven Bridges Genomics Inc. | Systems and methods for mitochondrial analysis |
US10724110B2 (en) | 2015-09-01 | 2020-07-28 | Seven Bridges Genomics Inc. | Systems and methods for analyzing viral nucleic acids |
US11347704B2 (en) | 2015-10-16 | 2022-05-31 | Seven Bridges Genomics Inc. | Biological graph or sequence serialization |
US9811391B1 (en) * | 2016-03-04 | 2017-11-07 | Color Genomics, Inc. | Load balancing and conflict processing in workflow with task dependencies |
US10853130B1 (en) | 2015-12-02 | 2020-12-01 | Color Genomics, Inc. | Load balancing and conflict processing in workflow with task dependencies |
US20170199960A1 (en) | 2016-01-07 | 2017-07-13 | Seven Bridges Genomics Inc. | Systems and methods for adaptive local alignment for graph genomes |
US10364468B2 (en) | 2016-01-13 | 2019-07-30 | Seven Bridges Genomics Inc. | Systems and methods for analyzing circulating tumor DNA |
US10460829B2 (en) | 2016-01-26 | 2019-10-29 | Seven Bridges Genomics Inc. | Systems and methods for encoding genetic variation for a population |
US10262102B2 (en) | 2016-02-24 | 2019-04-16 | Seven Bridges Genomics Inc. | Systems and methods for genotyping with graph reference |
US10790044B2 (en) | 2016-05-19 | 2020-09-29 | Seven Bridges Genomics Inc. | Systems and methods for sequence encoding, storage, and compression |
US10600499B2 (en) | 2016-07-13 | 2020-03-24 | Seven Bridges Genomics Inc. | Systems and methods for reconciling variants in sequence data relative to reference sequence data |
US11289177B2 (en) | 2016-08-08 | 2022-03-29 | Seven Bridges Genomics, Inc. | Computer method and system of identifying genomic mutations using graph-based local assembly |
US11250931B2 (en) | 2016-09-01 | 2022-02-15 | Seven Bridges Genomics Inc. | Systems and methods for detecting recombination |
US10241970B2 (en) | 2016-11-14 | 2019-03-26 | Microsoft Technology Licensing, Llc | Reduced memory nucleotide sequence comparison |
US10319465B2 (en) | 2016-11-16 | 2019-06-11 | Seven Bridges Genomics Inc. | Systems and methods for aligning sequences to graph references |
US11347844B2 (en) | 2017-03-01 | 2022-05-31 | Seven Bridges Genomics, Inc. | Data security in bioinformatic sequence analysis |
US10726110B2 (en) | 2017-03-01 | 2020-07-28 | Seven Bridges Genomics, Inc. | Watermarking for data security in bioinformatic sequence analysis |
CN107895104B (en) * | 2017-11-13 | 2020-07-07 | 深圳华大基因科技服务有限公司 | Method and device for evaluating and verifying sequence assembly result of third-generation sequencing |
CN110021365B (en) * | 2018-06-22 | 2021-01-22 | 深圳市达仁基因科技有限公司 | Method, device, computer equipment and storage medium for determining detection target point |
US11726757B2 (en) * | 2019-08-14 | 2023-08-15 | Nvidia Corporation | Processor for performing dynamic programming according to an instruction, and a method for configuring a processor for dynamic programming via an instruction |
WO2023170091A1 (en) | 2022-03-09 | 2023-09-14 | Politecnico Di Milano | Methods for the alignment of sequence reads to non-acyclic genome graphs on heterogeneous computing systems |
Citations (145)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US4683202A (en) | 1985-03-28 | 1987-07-28 | Cetus Corporation | Process for amplifying nucleic acid sequences |
US4683195A (en) | 1986-01-30 | 1987-07-28 | Cetus Corporation | Process for amplifying, detecting, and/or-cloning nucleic acid sequences |
US4988617A (en) | 1988-03-25 | 1991-01-29 | California Institute Of Technology | Method of detecting a nucleotide change in nucleic acids |
US5234809A (en) | 1989-03-23 | 1993-08-10 | Akzo N.V. | Process for isolating nucleic acid |
US5242794A (en) | 1984-12-13 | 1993-09-07 | Applied Biosystems, Inc. | Detection of specific sequences in nucleic acids |
US5494810A (en) | 1990-05-03 | 1996-02-27 | Cornell Research Foundation, Inc. | Thermostable ligase-mediated DNA amplifications system for the detection of genetic disease |
US5511158A (en) | 1994-08-04 | 1996-04-23 | Thinking Machines Corporation | System and method for creating and evolving directed graphs |
US5583024A (en) | 1985-12-02 | 1996-12-10 | The Regents Of The University Of California | Recombinant expression of Coleoptera luciferase |
US5701256A (en) | 1995-05-31 | 1997-12-23 | Cold Spring Harbor Laboratory | Method and apparatus for biological sequence comparison |
US6054278A (en) | 1997-05-05 | 2000-04-25 | The Perkin-Elmer Corporation | Ribosomal RNA gene polymorphism based microorganism identification |
US6210891B1 (en) | 1996-09-27 | 2001-04-03 | Pyrosequencing Ab | Method of sequencing DNA |
US6223128B1 (en) | 1998-06-29 | 2001-04-24 | Dnstar, Inc. | DNA sequence assembly system |
US6306597B1 (en) | 1995-04-17 | 2001-10-23 | Lynx Therapeutics, Inc. | DNA sequencing by parallel oligonucleotide extensions |
US20020164629A1 (en) | 2001-03-12 | 2002-11-07 | California Institute Of Technology | Methods and apparatus for analyzing polynucleotide sequences by asynchronous base extension |
US20020190663A1 (en) | 2000-07-17 | 2002-12-19 | Rasmussen Robert T. | Method and apparatuses for providing uniform electron beams from field emission displays |
US6582938B1 (en) | 2001-05-11 | 2003-06-24 | Affymetrix, Inc. | Amplification of nucleic acids |
US20040023209A1 (en) | 2001-11-28 | 2004-02-05 | Jon Jonasson | Method for identifying microorganisms based on sequencing gene fragments |
US6818395B1 (en) | 1999-06-28 | 2004-11-16 | California Institute Of Technology | Methods and apparatus for analyzing polynucleotide sequences |
US6828100B1 (en) | 1999-01-22 | 2004-12-07 | Biotage Ab | Method of DNA sequencing |
US6833246B2 (en) | 1999-09-29 | 2004-12-21 | Solexa, Ltd. | Polynucleotide sequencing |
US20050089906A1 (en) | 2003-09-19 | 2005-04-28 | Nec Corporation Et Al. | Haplotype estimation method |
US6890763B2 (en) | 2001-04-30 | 2005-05-10 | Syn X Pharma, Inc. | Biopolymer marker indicative of disease state having a molecular weight of 1350 daltons |
US6925389B2 (en) | 2000-07-18 | 2005-08-02 | Correlogic Systems, Inc., | Process for discriminating between biological states based on hidden patterns from biological data |
US6989100B2 (en) | 2002-05-09 | 2006-01-24 | Ppd Biomarker Discovery Sciences, Llc | Methods for time-alignment of liquid chromatography-mass spectrometry data |
US20060024681A1 (en) | 2003-10-31 | 2006-02-02 | Agencourt Bioscience Corporation | Methods for producing a paired tag from a nucleic acid sequence and methods of use thereof |
US20060195269A1 (en) | 2004-02-25 | 2006-08-31 | Yeatman Timothy J | Methods and systems for predicting cancer outcome |
US20060292611A1 (en) | 2005-06-06 | 2006-12-28 | Jan Berka | Paired end sequencing |
US7169560B2 (en) | 2003-11-12 | 2007-01-30 | Helicos Biosciences Corporation | Short cycle methods for sequencing polynucleotides |
US20070114362A1 (en) | 2005-11-23 | 2007-05-24 | Illumina, Inc. | Confocal imaging methods and apparatus |
US7232656B2 (en) | 1998-07-30 | 2007-06-19 | Solexa Ltd. | Arrayed biomolecules and their use in sequencing |
US20070166707A1 (en) | 2002-12-27 | 2007-07-19 | Rosetta Inpharmatics Llc | Computer systems and methods for associating genes with traits using cross species data |
WO2007086935A2 (en) | 2005-08-01 | 2007-08-02 | 454 Life Sciences Corporation | Methods of amplifying and sequencing nucleic acids |
US7282337B1 (en) | 2006-04-14 | 2007-10-16 | Helicos Biosciences Corporation | Methods for increasing accuracy of nucleic acid sequencing |
US20080003571A1 (en) | 2005-02-01 | 2008-01-03 | Mckernan Kevin | Reagents, methods, and libraries for bead-based sequencing |
US7321623B2 (en) | 2002-10-01 | 2008-01-22 | Avocent Corporation | Video compression system |
US20080077607A1 (en) | 2004-11-08 | 2008-03-27 | Seirad Inc. | Methods and Systems for Compressing and Comparing Genomic Data |
US20080251711A1 (en) | 2004-09-30 | 2008-10-16 | U.S. Department Of Energy | Ultra High Mass Range Mass Spectrometer Systems |
US20080281463A1 (en) | 2006-01-18 | 2008-11-13 | Suh Suk Hwan | Method of Non-Linear Process Planning and Internet-Based Step-Nc System Using the Same |
US20080294403A1 (en) | 2004-04-30 | 2008-11-27 | Jun Zhu | Systems and Methods for Reconstructing Gene Networks in Segregating Populations |
US7483585B2 (en) | 2004-12-01 | 2009-01-27 | Ati Technologies Ulc | Image compression using variable bit size run length encoding |
US20090026082A1 (en) | 2006-12-14 | 2009-01-29 | Ion Torrent Systems Incorporated | Methods and apparatus for measuring analytes using large scale FET arrays |
US20090119313A1 (en) | 2007-11-02 | 2009-05-07 | Ioactive Inc. | Determining structure of binary data using alignment algorithms |
US20090127589A1 (en) | 2006-12-14 | 2009-05-21 | Ion Torrent Systems Incorporated | Methods and apparatus for measuring analytes using large scale FET arrays |
US20090164135A1 (en) | 2007-12-21 | 2009-06-25 | Brodzik Andrzej K | Quaternionic algebra approach to dna and rna tandem repeat detection |
US7577554B2 (en) | 2001-07-03 | 2009-08-18 | I2 Technologies Us, Inc. | Workflow modeling using an acyclic directed graph data structure |
US7580918B2 (en) | 2006-03-03 | 2009-08-25 | Adobe Systems Incorporated | System and method of efficiently representing and searching directed acyclic graph structures in databases |
US20090233809A1 (en) | 2008-03-04 | 2009-09-17 | Affymetrix, Inc. | Resequencing methods for identification of sequence variants |
US7598035B2 (en) | 1998-02-23 | 2009-10-06 | Solexa, Inc. | Method and compositions for ordering restriction fragments |
US7620800B2 (en) | 2002-10-31 | 2009-11-17 | Src Computers, Inc. | Multi-adaptive processing systems and techniques for enhancing parallelism and performance of computational functions |
US20090300781A1 (en) | 2006-03-31 | 2009-12-03 | Ian Bancroft | Prediction of heterosis and other traits by transcriptome analysis |
US20090318310A1 (en) | 2008-04-21 | 2009-12-24 | Softgenetics Llc | DNA Sequence Assembly Methods of Short Reads |
US20090325145A1 (en) | 2006-10-20 | 2009-12-31 | Erwin Sablon | Methodology for analysis of sequence variations within the hcv ns5b genomic region |
US20100010992A1 (en) | 2008-07-10 | 2010-01-14 | Morris Robert P | Methods And Systems For Resolving A Location Information To A Network Identifier |
WO2010010992A1 (en) | 2008-07-25 | 2010-01-28 | Korea Research Institute Of Bioscience And Biotechnology | Bio information analysis process auto design system and thereof |
US20100035252A1 (en) | 2008-08-08 | 2010-02-11 | Ion Torrent Systems Incorporated | Methods for sequencing individual nucleic acids under tension |
US20100041048A1 (en) | 2008-07-31 | 2010-02-18 | The Johns Hopkins University | Circulating Mutant DNA to Assess Tumor Dynamics |
US20100137143A1 (en) | 2008-10-22 | 2010-06-03 | Ion Torrent Systems Incorporated | Methods and apparatus for measuring analytes |
US20100169026A1 (en) * | 2008-11-20 | 2010-07-01 | Pacific Biosciences Of California, Inc. | Algorithms for sequence determination |
US7776616B2 (en) | 1997-09-17 | 2010-08-17 | Qiagen North American Holdings, Inc. | Apparatuses and methods for isolating nucleic acid |
US20100240046A1 (en) | 2009-03-20 | 2010-09-23 | Siemens Corporation | Methods and Systems for Identifying PCR Primers Specific to One or More Target Genomes |
US7809509B2 (en) | 2001-05-08 | 2010-10-05 | Ip Genesis, Inc. | Comparative mapping and assembly of nucleic acid sequences |
US20100285578A1 (en) | 2009-02-03 | 2010-11-11 | Network Biosystems, Inc. | Nucleic Acid Purification |
US20100282617A1 (en) | 2006-12-14 | 2010-11-11 | Ion Torrent Systems Incorporated | Methods and apparatus for detecting molecular interactions using fet arrays |
US7835871B2 (en) | 2007-01-26 | 2010-11-16 | Illumina, Inc. | Nucleic acid sequencing system and method |
US20100300559A1 (en) | 2008-10-22 | 2010-12-02 | Ion Torrent Systems, Inc. | Fluidics system for sequential delivery of reagents |
US20100300895A1 (en) | 2009-05-29 | 2010-12-02 | Ion Torrent Systems, Inc. | Apparatus and methods for performing electrochemical reactions |
US20100301398A1 (en) | 2009-05-29 | 2010-12-02 | Ion Torrent Systems Incorporated | Methods and apparatus for measuring analytes |
US20100304982A1 (en) | 2009-05-29 | 2010-12-02 | Ion Torrent Systems, Inc. | Scaffolded nucleic acid polymer particles and methods of making and using |
US20110004413A1 (en) | 2009-04-29 | 2011-01-06 | Complete Genomics, Inc. | Method and system for calling variations in a sample polynucleotide sequence with respect to a reference polynucleotide sequence |
US7885840B2 (en) | 2003-01-07 | 2011-02-08 | Sap Aktiengesellschaft | System and method of flexible workflow management |
US7917302B2 (en) | 2000-09-28 | 2011-03-29 | Torbjorn Rognes | Determination of optimal local sequence alignment similarity score |
US20110098193A1 (en) | 2009-10-22 | 2011-04-28 | Kingsmore Stephen F | Methods and Systems for Medical Sequencing Analysis |
US20110096193A1 (en) | 2009-10-26 | 2011-04-28 | Kabushiki Kaisha Toshiba | Solid-state imaging device |
US7957913B2 (en) | 2006-05-03 | 2011-06-07 | Population Diagnostics, Inc. | Evaluating genetic disorders |
US7960120B2 (en) | 2006-10-06 | 2011-06-14 | Illumina Cambridge Ltd. | Method for pair-wise sequencing a plurality of double stranded target polynucleotides |
US20110207135A1 (en) | 2008-11-07 | 2011-08-25 | Sequenta, Inc. | Methods of monitoring conditions by sequence analysis |
US20110257889A1 (en) | 2010-02-24 | 2011-10-20 | Pacific Biosciences Of California, Inc. | Sequence assembly and consensus sequence determination |
WO2011139797A2 (en) | 2010-04-27 | 2011-11-10 | Spiral Genetics Inc. | Method and system for analysis and error correction of biological sequences and inference of relationship for multiple samples |
US20120030566A1 (en) | 2010-07-28 | 2012-02-02 | Victor B Michael | System with touch-based selection of data items |
US20120040851A1 (en) | 2008-09-19 | 2012-02-16 | Immune Disease Institute, Inc. | miRNA TARGETS |
US20120041727A1 (en) | 2008-12-24 | 2012-02-16 | New York University | Method, computer-accessible medium and systems for score-driven whole-genome shotgun sequence assemble |
US20120045771A1 (en) | 2008-12-11 | 2012-02-23 | Febit Holding Gmbh | Method for analysis of nucleic acid populations |
US8146099B2 (en) | 2007-09-27 | 2012-03-27 | Microsoft Corporation | Service-oriented pipeline based architecture |
US8165821B2 (en) | 2007-02-05 | 2012-04-24 | Applied Biosystems, Llc | System and methods for indel identification using short read sequencing |
US20120157322A1 (en) | 2010-09-24 | 2012-06-21 | Samuel Myllykangas | Direct Capture, Amplification and Sequencing of Target DNA Using Immobilized Primers |
US8209130B1 (en) | 2012-04-04 | 2012-06-26 | Good Start Genetics, Inc. | Sequence assembly |
WO2012096579A2 (en) | 2011-01-14 | 2012-07-19 | Keygene N.V. | Paired end random sequence based genotyping |
WO2012098515A1 (en) | 2011-01-19 | 2012-07-26 | Koninklijke Philips Electronics N.V. | Method for processing genomic data |
US20120239706A1 (en) | 2011-03-18 | 2012-09-20 | Los Alamos National Security, Llc | Computer-facilitated parallel information alignment and analysis |
WO2012142531A2 (en) | 2011-04-14 | 2012-10-18 | Complete Genomics, Inc. | Processing and analysis of complex nucleic acid sequence data |
US20120330566A1 (en) | 2010-02-24 | 2012-12-27 | Pacific Biosciences Of California, Inc. | Sequence assembly and consensus sequence determination |
US20130029879A1 (en) | 2011-07-29 | 2013-01-31 | Ginkgo Bioworks | Methods and Systems for Cell State Quantification |
US20130035904A1 (en) | 2010-01-14 | 2013-02-07 | Daniel Kuhn | 3d plant modeling systems and methods |
US20130059738A1 (en) | 2011-04-28 | 2013-03-07 | Life Technologies Corporation | Methods and compositions for multiplex pcr |
WO2013035904A1 (en) | 2011-09-08 | 2013-03-14 | 한국과학기술정보연구원 | System and method for processing bio information analysis pipeline |
US20130073214A1 (en) | 2011-09-20 | 2013-03-21 | Life Technologies Corporation | Systems and methods for identifying sequence variation |
US20130124100A1 (en) | 2009-06-15 | 2013-05-16 | Complete Genomics, Inc. | Processing and Analysis of Complex Nucleic Acid Sequence Data |
KR101282798B1 (en) | 2011-09-08 | 2013-07-04 | 한국과학기술정보연구원 | System and method for processing bio information analysis pipeline |
WO2013106737A1 (en) | 2012-01-13 | 2013-07-18 | Data2Bio | Genotyping by next-generation sequencing |
US20130232480A1 (en) | 2012-03-02 | 2013-09-05 | Vmware, Inc. | Single, logical, multi-tier application blueprint used for deployment and management of multiple physical applications in a cloud environment |
US20130289099A1 (en) | 2010-12-17 | 2013-10-31 | Universite Pierre Et Marie Curie (Paris 6) | Abcg1 gene as a marker and a target gene for treating obesity |
US20130311106A1 (en) | 2012-03-16 | 2013-11-21 | The Research Institute At Nationwide Children's Hospital | Comprehensive Analysis Pipeline for Discovery of Human Genetic Variation |
US20130332081A1 (en) | 2010-09-09 | 2013-12-12 | Omicia Inc | Variant annotation, analysis and selection tool |
WO2013184643A1 (en) | 2012-06-04 | 2013-12-12 | Good Start Genetics, Inc. | Determining the clinical significance of variant sequences |
US20130345066A1 (en) | 2012-05-09 | 2013-12-26 | Life Technologies Corporation | Systems and methods for identifying sequence variation |
US20140012866A1 (en) | 2012-07-03 | 2014-01-09 | International Business Machines Corporation | Using annotators in genome research |
US20140025312A1 (en) | 2012-07-13 | 2014-01-23 | Pacific Biosciences Of California, Inc. | Hierarchical genome assembly method using single long insert library |
US8639847B2 (en) | 2003-03-18 | 2014-01-28 | Microsoft Corporation | Systems and methods for scheduling data flow execution based on an arbitrary graph describing the desired data flow |
US20140066317A1 (en) | 2012-09-04 | 2014-03-06 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
US20140129201A1 (en) | 2012-11-07 | 2014-05-08 | Good Start Genetics, Inc. | Validation of genetic tests |
US20140136120A1 (en) | 2007-11-21 | 2014-05-15 | Cosmosid Inc. | Direct identification and measurement of relative populations of microorganisms with direct dna sequencing and probabilistic methods |
US20140200147A1 (en) | 2013-01-17 | 2014-07-17 | Personalis, Inc. | Methods and Systems for Genetic Analysis |
US20140280360A1 (en) | 2013-03-15 | 2014-09-18 | James Webber | Graph database devices and methods for partitioning graphs |
US20140278590A1 (en) | 2013-03-13 | 2014-09-18 | Airline Tariff Publishing Company | System, method and computer program product for providing a fare analytic engine |
US20140281708A1 (en) | 2013-03-14 | 2014-09-18 | International Business Machines Corporation | Generating fault tolerant connectivity api |
US20140323320A1 (en) | 2011-12-31 | 2014-10-30 | Bgi Tech Solutions Co., Ltd. | Method of detecting fused transcripts and system thereof |
US20150020061A1 (en) | 2013-07-11 | 2015-01-15 | Oracle International Corporation | Forming an upgrade recommendation in a cloud computing environment |
US20150057946A1 (en) | 2013-08-21 | 2015-02-26 | Seven Bridges Genomics Inc. | Methods and systems for aligning sequences |
US20150056613A1 (en) | 2013-08-21 | 2015-02-26 | Seven Bridges Genomics Inc. | Methods and systems for detecting sequence variants |
US8972201B2 (en) | 2011-12-24 | 2015-03-03 | Tata Consultancy Services Limited | Compression of genomic data file |
US20150066383A1 (en) | 2013-09-03 | 2015-03-05 | Seven Bridges Genomics Inc. | Collapsible modular genomic pipeline |
US20150094212A1 (en) | 2013-10-01 | 2015-04-02 | Life Technologies Corporation | Systems and Methods for Detecting Structural Variants |
WO2015048753A1 (en) | 2013-09-30 | 2015-04-02 | Seven Bridges Genomics Inc. | Methods and system for detecting sequence variants |
WO2015058097A1 (en) | 2013-10-18 | 2015-04-23 | Seven Bridges Genomics Inc. | Methods and systems for identifying disease-induced mutations |
US20150112602A1 (en) | 2013-10-21 | 2015-04-23 | Seven Bridges Genomics Inc. | Systems and methods for using paired-end data in directed acyclic structure |
US20150110754A1 (en) | 2013-10-15 | 2015-04-23 | Regeneron Pharmaceuticals, Inc. | High Resolution Allele Identification |
WO2015058093A1 (en) | 2013-10-18 | 2015-04-23 | Seven Bridges Genomics Inc. | Methods and systems for genotyping genetic samples |
WO2015058095A1 (en) | 2013-10-18 | 2015-04-23 | Seven Bridges Genomics Inc. | Methods and systems for quantifying sequence alignment |
WO2015058120A1 (en) | 2013-10-18 | 2015-04-23 | Seven Bridges Genomics Inc. | Methods and systems for aligning sequences in the presence of repeating elements |
WO2015105963A1 (en) | 2014-01-10 | 2015-07-16 | Seven Bridges Genomics Inc. | Systems and methods for use of known alleles in read mapping |
US20150227685A1 (en) | 2014-02-11 | 2015-08-13 | Seven Bridges Genomics Inc. | Systems and methods for analyzing sequence data |
US20150293994A1 (en) | 2012-11-06 | 2015-10-15 | Hewlett-Packard Development Company, L.P. | Enhanced graph traversal |
US20150344970A1 (en) | 2010-02-18 | 2015-12-03 | The Johns Hopkins University | Personalized Tumor Biomarkers |
US20150356147A1 (en) | 2013-01-24 | 2015-12-10 | New York University | Systems, methods and computer-accessible mediums for utilizing pattern matching in stringomes |
US20160259880A1 (en) | 2015-03-05 | 2016-09-08 | Seven Bridges Genomics Inc. | Systems and methods for genomic pattern analysis |
US20160364523A1 (en) | 2015-06-11 | 2016-12-15 | Seven Bridges Genomics Inc. | Systems and methods for identifying microorganisms |
US20170058341A1 (en) | 2015-09-01 | 2017-03-02 | Seven Bridges Genomics Inc. | Systems and methods for mitochondrial analysis |
US20170058320A1 (en) | 2015-08-24 | 2017-03-02 | Seven Bridges Genomics Inc. | Systems and methods for epigenetic analysis |
US20170058365A1 (en) | 2015-09-01 | 2017-03-02 | Seven Bridges Genomics Inc. | Systems and methods for analyzing viral nucleic acids |
WO2017066753A1 (en) | 2015-10-16 | 2017-04-20 | Seven Bridges Genomics Inc. | Biological graph or sequence serialization |
US20170193351A1 (en) | 2015-12-30 | 2017-07-06 | Micron Technology, Inc. | Methods and systems for vector length management |
WO2017120128A1 (en) | 2016-01-07 | 2017-07-13 | Seven Bridges Genomics Inc. | Systems and methods for adaptive local alignment for graph genomes |
US20170199959A1 (en) | 2016-01-13 | 2017-07-13 | Seven Bridges Genomics Inc. | Genetic analysis systems and methods |
WO2017123864A1 (en) | 2016-01-13 | 2017-07-20 | Seven Bridges Genomics Inc. | Systems and methods for analyzing circulating tumor dna |
US20170242958A1 (en) | 2016-02-24 | 2017-08-24 | Seven Bridges Genomics Inc. | Systems and methods for genotyping with graph reference |
2014
- 2014-10-17 US US14/517,419 (US10832797B2), status: Active
- 2014-10-17 WO PCT/US2014/061158 (WO2015058095A1), status: Application Filing
2020
- 2020-11-02 US US17/087,385 (US20210280272A1), status: Pending
Patent Citations (182)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5242794A (en) | 1984-12-13 | 1993-09-07 | Applied Biosystems, Inc. | Detection of specific sequences in nucleic acids |
US4683202B1 (en) | 1985-03-28 | 1990-11-27 | Cetus Corp | |
US4683202A (en) | 1985-03-28 | 1987-07-28 | Cetus Corporation | Process for amplifying nucleic acid sequences |
US5583024A (en) | 1985-12-02 | 1996-12-10 | The Regents Of The University Of California | Recombinant expression of Coleoptera luciferase |
US5700673A (en) | 1985-12-02 | 1997-12-23 | The Regents Of The University Of California | Recombinantly produced Coleoptera luciferase and fusion proteins thereof |
US5674713A (en) | 1985-12-02 | 1997-10-07 | The Regents Of The University Of California | DNA sequences encoding coleoptera luciferase activity |
US4683195A (en) | 1986-01-30 | 1987-07-28 | Cetus Corporation | Process for amplifying, detecting, and/or-cloning nucleic acid sequences |
US4683195B1 (en) | 1986-01-30 | 1990-11-27 | Cetus Corp | |
US4988617A (en) | 1988-03-25 | 1991-01-29 | California Institute Of Technology | Method of detecting a nucleotide change in nucleic acids |
US5234809A (en) | 1989-03-23 | 1993-08-10 | Akzo N.V. | Process for isolating nucleic acid |
US5494810A (en) | 1990-05-03 | 1996-02-27 | Cornell Research Foundation, Inc. | Thermostable ligase-mediated DNA amplifications system for the detection of genetic disease |
US5511158A (en) | 1994-08-04 | 1996-04-23 | Thinking Machines Corporation | System and method for creating and evolving directed graphs |
US6306597B1 (en) | 1995-04-17 | 2001-10-23 | Lynx Therapeutics, Inc. | DNA sequencing by parallel oligonucleotide extensions |
US5701256A (en) | 1995-05-31 | 1997-12-23 | Cold Spring Harbor Laboratory | Method and apparatus for biological sequence comparison |
US6210891B1 (en) | 1996-09-27 | 2001-04-03 | Pyrosequencing Ab | Method of sequencing DNA |
US6054278A (en) | 1997-05-05 | 2000-04-25 | The Perkin-Elmer Corporation | Ribosomal RNA gene polymorphism based microorganism identification |
US7776616B2 (en) | 1997-09-17 | 2010-08-17 | Qiagen North American Holdings, Inc. | Apparatuses and methods for isolating nucleic acid |
US7598035B2 (en) | 1998-02-23 | 2009-10-06 | Solexa, Inc. | Method and compositions for ordering restriction fragments |
US6223128B1 (en) | 1998-06-29 | 2001-04-24 | Dnstar, Inc. | DNA sequence assembly system |
US7232656B2 (en) | 1998-07-30 | 2007-06-19 | Solexa Ltd. | Arrayed biomolecules and their use in sequencing |
US6828100B1 (en) | 1999-01-22 | 2004-12-07 | Biotage Ab | Method of DNA sequencing |
US6911345B2 (en) | 1999-06-28 | 2005-06-28 | California Institute Of Technology | Methods and apparatus for analyzing polynucleotide sequences |
US6818395B1 (en) | 1999-06-28 | 2004-11-16 | California Institute Of Technology | Methods and apparatus for analyzing polynucleotide sequences |
US6833246B2 (en) | 1999-09-29 | 2004-12-21 | Solexa, Ltd. | Polynucleotide sequencing |
US20020190663A1 (en) | 2000-07-17 | 2002-12-19 | Rasmussen Robert T. | Method and apparatuses for providing uniform electron beams from field emission displays |
US6925389B2 (en) | 2000-07-18 | 2005-08-02 | Correlogic Systems, Inc. | Process for discriminating between biological states based on hidden patterns from biological data |
US7917302B2 (en) | 2000-09-28 | 2011-03-29 | Torbjorn Rognes | Determination of optimal local sequence alignment similarity score |
US20020164629A1 (en) | 2001-03-12 | 2002-11-07 | California Institute Of Technology | Methods and apparatus for analyzing polynucleotide sequences by asynchronous base extension |
US6890763B2 (en) | 2001-04-30 | 2005-05-10 | Syn X Pharma, Inc. | Biopolymer marker indicative of disease state having a molecular weight of 1350 daltons |
US7809509B2 (en) | 2001-05-08 | 2010-10-05 | Ip Genesis, Inc. | Comparative mapping and assembly of nucleic acid sequences |
US6582938B1 (en) | 2001-05-11 | 2003-06-24 | Affymetrix, Inc. | Amplification of nucleic acids |
US7577554B2 (en) | 2001-07-03 | 2009-08-18 | I2 Technologies Us, Inc. | Workflow modeling using an acyclic directed graph data structure |
US20040023209A1 (en) | 2001-11-28 | 2004-02-05 | Jon Jonasson | Method for identifying microorganisms based on sequencing gene fragments |
US6989100B2 (en) | 2002-05-09 | 2006-01-24 | Ppd Biomarker Discovery Sciences, Llc | Methods for time-alignment of liquid chromatography-mass spectrometry data |
US7321623B2 (en) | 2002-10-01 | 2008-01-22 | Avocent Corporation | Video compression system |
US7620800B2 (en) | 2002-10-31 | 2009-11-17 | Src Computers, Inc. | Multi-adaptive processing systems and techniques for enhancing parallelism and performance of computational functions |
US20070166707A1 (en) | 2002-12-27 | 2007-07-19 | Rosetta Inpharmatics Llc | Computer systems and methods for associating genes with traits using cross species data |
US7885840B2 (en) | 2003-01-07 | 2011-02-08 | Sap Aktiengesellschaft | System and method of flexible workflow management |
US8639847B2 (en) | 2003-03-18 | 2014-01-28 | Microsoft Corporation | Systems and methods for scheduling data flow execution based on an arbitrary graph describing the desired data flow |
US20050089906A1 (en) | 2003-09-19 | 2005-04-28 | Nec Corporation Et Al. | Haplotype estimation method |
US20060024681A1 (en) | 2003-10-31 | 2006-02-02 | Agencourt Bioscience Corporation | Methods for producing a paired tag from a nucleic acid sequence and methods of use thereof |
US7169560B2 (en) | 2003-11-12 | 2007-01-30 | Helicos Biosciences Corporation | Short cycle methods for sequencing polynucleotides |
US20090191565A1 (en) | 2003-11-12 | 2009-07-30 | Helicos Biosciences Corporation | Short cycle methods for sequencing polynucleotides |
US20060195269A1 (en) | 2004-02-25 | 2006-08-31 | Yeatman Timothy J | Methods and systems for predicting cancer outcome |
US20080294403A1 (en) | 2004-04-30 | 2008-11-27 | Jun Zhu | Systems and Methods for Reconstructing Gene Networks in Segregating Populations |
US20080251711A1 (en) | 2004-09-30 | 2008-10-16 | U.S. Department Of Energy | Ultra High Mass Range Mass Spectrometer Systems |
US20080077607A1 (en) | 2004-11-08 | 2008-03-27 | Seirad Inc. | Methods and Systems for Compressing and Comparing Genomic Data |
US8340914B2 (en) | 2004-11-08 | 2012-12-25 | Gatewood Joe M | Methods and systems for compressing and comparing genomic data |
US7483585B2 (en) | 2004-12-01 | 2009-01-27 | Ati Technologies Ulc | Image compression using variable bit size run length encoding |
US20080003571A1 (en) | 2005-02-01 | 2008-01-03 | Mckernan Kevin | Reagents, methods, and libraries for bead-based sequencing |
US20060292611A1 (en) | 2005-06-06 | 2006-12-28 | Jan Berka | Paired end sequencing |
WO2007086935A2 (en) | 2005-08-01 | 2007-08-02 | 454 Life Sciences Corporation | Methods of amplifying and sequencing nucleic acids |
US20070114362A1 (en) | 2005-11-23 | 2007-05-24 | Illumina, Inc. | Confocal imaging methods and apparatus |
US20080281463A1 (en) | 2006-01-18 | 2008-11-13 | Suh Suk Hwan | Method of Non-Linear Process Planning and Internet-Based Step-Nc System Using the Same |
US7580918B2 (en) | 2006-03-03 | 2009-08-25 | Adobe Systems Incorporated | System and method of efficiently representing and searching directed acyclic graph structures in databases |
US20090300781A1 (en) | 2006-03-31 | 2009-12-03 | Ian Bancroft | Prediction of heterosis and other traits by transcriptome analysis |
US7282337B1 (en) | 2006-04-14 | 2007-10-16 | Helicos Biosciences Corporation | Methods for increasing accuracy of nucleic acid sequencing |
US7957913B2 (en) | 2006-05-03 | 2011-06-07 | Population Diagnostics, Inc. | Evaluating genetic disorders |
US7960120B2 (en) | 2006-10-06 | 2011-06-14 | Illumina Cambridge Ltd. | Method for pair-wise sequencing a plurality of double stranded target polynucleotides |
US20090325145A1 (en) | 2006-10-20 | 2009-12-31 | Erwin Sablon | Methodology for analysis of sequence variations within the hcv ns5b genomic region |
US20100188073A1 (en) | 2006-12-14 | 2010-07-29 | Ion Torrent Systems Incorporated | Methods and apparatus for measuring analytes using large scale fet arrays |
US20100282617A1 (en) | 2006-12-14 | 2010-11-11 | Ion Torrent Systems Incorporated | Methods and apparatus for detecting molecular interactions using fet arrays |
US20090127589A1 (en) | 2006-12-14 | 2009-05-21 | Ion Torrent Systems Incorporated | Methods and apparatus for measuring analytes using large scale FET arrays |
US20090026082A1 (en) | 2006-12-14 | 2009-01-29 | Ion Torrent Systems Incorporated | Methods and apparatus for measuring analytes using large scale FET arrays |
US20100197507A1 (en) | 2006-12-14 | 2010-08-05 | Ion Torrent Systems Incorporated | Methods and apparatus for measuring analytes using large scale fet arrays |
US20110009278A1 (en) | 2007-01-26 | 2011-01-13 | Illumina, Inc. | Nucleic acid sequencing system and method |
US7835871B2 (en) | 2007-01-26 | 2010-11-16 | Illumina, Inc. | Nucleic acid sequencing system and method |
US8165821B2 (en) | 2007-02-05 | 2012-04-24 | Applied Biosystems, Llc | System and methods for indel identification using short read sequencing |
US8146099B2 (en) | 2007-09-27 | 2012-03-27 | Microsoft Corporation | Service-oriented pipeline based architecture |
US20090119313A1 (en) | 2007-11-02 | 2009-05-07 | Ioactive Inc. | Determining structure of binary data using alignment algorithms |
US20140136120A1 (en) | 2007-11-21 | 2014-05-15 | Cosmosid Inc. | Direct identification and measurement of relative populations of microorganisms with direct dna sequencing and probabilistic methods |
US20090164135A1 (en) | 2007-12-21 | 2009-06-25 | Brodzik Andrzej K | Quaternionic algebra approach to dna and rna tandem repeat detection |
US20090233809A1 (en) | 2008-03-04 | 2009-09-17 | Affymetrix, Inc. | Resequencing methods for identification of sequence variants |
US20090318310A1 (en) | 2008-04-21 | 2009-12-24 | Softgenetics Llc | DNA Sequence Assembly Methods of Short Reads |
US20100010992A1 (en) | 2008-07-10 | 2010-01-14 | Morris Robert P | Methods And Systems For Resolving A Location Information To A Network Identifier |
WO2010010992A1 (en) | 2008-07-25 | 2010-01-28 | Korea Research Institute Of Bioscience And Biotechnology | Bio information analysis process auto design system and thereof |
US20100041048A1 (en) | 2008-07-31 | 2010-02-18 | The Johns Hopkins University | Circulating Mutant DNA to Assess Tumor Dynamics |
US20100035252A1 (en) | 2008-08-08 | 2010-02-11 | Ion Torrent Systems Incorporated | Methods for sequencing individual nucleic acids under tension |
US20120040851A1 (en) | 2008-09-19 | 2012-02-16 | Immune Disease Institute, Inc. | miRNA TARGETS |
US20100300559A1 (en) | 2008-10-22 | 2010-12-02 | Ion Torrent Systems, Inc. | Fluidics system for sequential delivery of reagents |
US20100137143A1 (en) | 2008-10-22 | 2010-06-03 | Ion Torrent Systems Incorporated | Methods and apparatus for measuring analytes |
US20110207135A1 (en) | 2008-11-07 | 2011-08-25 | Sequenta, Inc. | Methods of monitoring conditions by sequence analysis |
US20100169026A1 (en) * | 2008-11-20 | 2010-07-01 | Pacific Biosciences Of California, Inc. | Algorithms for sequence determination |
US8370079B2 (en) | 2008-11-20 | 2013-02-05 | Pacific Biosciences Of California, Inc. | Algorithms for sequence determination |
US20120045771A1 (en) | 2008-12-11 | 2012-02-23 | Febit Holding Gmbh | Method for analysis of nucleic acid populations |
US20120041727A1 (en) | 2008-12-24 | 2012-02-16 | New York University | Method, computer-accessible medium and systems for score-driven whole-genome shotgun sequence assemble |
US20100285578A1 (en) | 2009-02-03 | 2010-11-11 | Network Biosystems, Inc. | Nucleic Acid Purification |
US20100240046A1 (en) | 2009-03-20 | 2010-09-23 | Siemens Corporation | Methods and Systems for Identifying PCR Primers Specific to One or More Target Genomes |
US20110004413A1 (en) | 2009-04-29 | 2011-01-06 | Complete Genomics, Inc. | Method and system for calling variations in a sample polynucleotide sequence with respect to a reference polynucleotide sequence |
US20100301398A1 (en) | 2009-05-29 | 2010-12-02 | Ion Torrent Systems Incorporated | Methods and apparatus for measuring analytes |
US20100304982A1 (en) | 2009-05-29 | 2010-12-02 | Ion Torrent Systems, Inc. | Scaffolded nucleic acid polymer particles and methods of making and using |
US20100300895A1 (en) | 2009-05-29 | 2010-12-02 | Ion Torrent Systems, Inc. | Apparatus and methods for performing electrochemical reactions |
US20140051588A9 (en) | 2009-06-15 | 2014-02-20 | Complete Genomics, Inc. | Sequencing Small Amounts of Complex Nucleic Acids |
US20130059740A1 (en) | 2009-06-15 | 2013-03-07 | Complete Genomics, Inc. | Sequencing Small Amounts of Complex Nucleic Acids |
US20130124100A1 (en) | 2009-06-15 | 2013-05-16 | Complete Genomics, Inc. | Processing and Analysis of Complex Nucleic Acid Sequence Data |
US20110098193A1 (en) | 2009-10-22 | 2011-04-28 | Kingsmore Stephen F | Methods and Systems for Medical Sequencing Analysis |
US20110096193A1 (en) | 2009-10-26 | 2011-04-28 | Kabushiki Kaisha Toshiba | Solid-state imaging device |
US20130035904A1 (en) | 2010-01-14 | 2013-02-07 | Daniel Kuhn | 3d plant modeling systems and methods |
US20150344970A1 (en) | 2010-02-18 | 2015-12-03 | The Johns Hopkins University | Personalized Tumor Biomarkers |
US20120330566A1 (en) | 2010-02-24 | 2012-12-27 | Pacific Biosciences Of California, Inc. | Sequence assembly and consensus sequence determination |
US20110257889A1 (en) | 2010-02-24 | 2011-10-20 | Pacific Biosciences Of California, Inc. | Sequence assembly and consensus sequence determination |
WO2011139797A2 (en) | 2010-04-27 | 2011-11-10 | Spiral Genetics Inc. | Method and system for analysis and error correction of biological sequences and inference of relationship for multiple samples |
US20120030566A1 (en) | 2010-07-28 | 2012-02-02 | Victor B Michael | System with touch-based selection of data items |
US20130332081A1 (en) | 2010-09-09 | 2013-12-12 | Omicia Inc | Variant annotation, analysis and selection tool |
US20120157322A1 (en) | 2010-09-24 | 2012-06-21 | Samuel Myllykangas | Direct Capture, Amplification and Sequencing of Target DNA Using Immobilized Primers |
US20130289099A1 (en) | 2010-12-17 | 2013-10-31 | Universite Pierre Et Marie Curie (Paris 6) | Abcg1 gene as a marker and a target gene for treating obesity |
WO2012096579A2 (en) | 2011-01-14 | 2012-07-19 | Keygene N.V. | Paired end random sequence based genotyping |
WO2012098515A1 (en) | 2011-01-19 | 2012-07-26 | Koninklijke Philips Electronics N.V. | Method for processing genomic data |
US20120239706A1 (en) | 2011-03-18 | 2012-09-20 | Los Alamos National Security, Llc | Computer-facilitated parallel information alignment and analysis |
WO2012142531A2 (en) | 2011-04-14 | 2012-10-18 | Complete Genomics, Inc. | Processing and analysis of complex nucleic acid sequence data |
US20130059738A1 (en) | 2011-04-28 | 2013-03-07 | Life Technologies Corporation | Methods and compositions for multiplex pcr |
US20130029879A1 (en) | 2011-07-29 | 2013-01-31 | Ginkgo Bioworks | Methods and Systems for Cell State Quantification |
WO2013035904A1 (en) | 2011-09-08 | 2013-03-14 | 한국과학기술정보연구원 | System and method for processing bio information analysis pipeline |
KR101282798B1 (en) | 2011-09-08 | 2013-07-04 | 한국과학기술정보연구원 | System and method for processing bio information analysis pipeline |
WO2013043909A1 (en) | 2011-09-20 | 2013-03-28 | Life Technologies Corporation | Systems and methods for identifying sequence variation |
US20130073214A1 (en) | 2011-09-20 | 2013-03-21 | Life Technologies Corporation | Systems and methods for identifying sequence variation |
US8972201B2 (en) | 2011-12-24 | 2015-03-03 | Tata Consultancy Services Limited | Compression of genomic data file |
US20140323320A1 (en) | 2011-12-31 | 2014-10-30 | Bgi Tech Solutions Co., Ltd. | Method of detecting fused transcripts and system thereof |
WO2013106737A1 (en) | 2012-01-13 | 2013-07-18 | Data2Bio | Genotyping by next-generation sequencing |
US20130232480A1 (en) | 2012-03-02 | 2013-09-05 | Vmware, Inc. | Single, logical, multi-tier application blueprint used for deployment and management of multiple physical applications in a cloud environment |
US20130311106A1 (en) | 2012-03-16 | 2013-11-21 | The Research Institute At Nationwide Children's Hospital | Comprehensive Analysis Pipeline for Discovery of Human Genetic Variation |
US8209130B1 (en) | 2012-04-04 | 2012-06-26 | Good Start Genetics, Inc. | Sequence assembly |
US20130345066A1 (en) | 2012-05-09 | 2013-12-26 | Life Technologies Corporation | Systems and methods for identifying sequence variation |
WO2013184643A1 (en) | 2012-06-04 | 2013-12-12 | Good Start Genetics, Inc. | Determining the clinical significance of variant sequences |
US20140012866A1 (en) | 2012-07-03 | 2014-01-09 | International Business Machines Corporation | Using annotators in genome research |
US20140025312A1 (en) | 2012-07-13 | 2014-01-23 | Pacific Biosciences Of California, Inc. | Hierarchical genome assembly method using single long insert library |
US20140066317A1 (en) | 2012-09-04 | 2014-03-06 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
US20150293994A1 (en) | 2012-11-06 | 2015-10-15 | Hewlett-Packard Development Company, L.P. | Enhanced graph traversal |
US20140129201A1 (en) | 2012-11-07 | 2014-05-08 | Good Start Genetics, Inc. | Validation of genetic tests |
US20140200147A1 (en) | 2013-01-17 | 2014-07-17 | Personalis, Inc. | Methods and Systems for Genetic Analysis |
US20150356147A1 (en) | 2013-01-24 | 2015-12-10 | New York University | Systems, methods and computer-accessible mediums for utilizing pattern matching in stringomes |
US20140278590A1 (en) | 2013-03-13 | 2014-09-18 | Airline Tariff Publishing Company | System, method and computer program product for providing a fare analytic engine |
US20140281708A1 (en) | 2013-03-14 | 2014-09-18 | International Business Machines Corporation | Generating fault tolerant connectivity api |
US20140280360A1 (en) | 2013-03-15 | 2014-09-18 | James Webber | Graph database devices and methods for partitioning graphs |
US20150020061A1 (en) | 2013-07-11 | 2015-01-15 | Oracle International Corporation | Forming an upgrade recommendation in a cloud computing environment |
US20150056613A1 (en) | 2013-08-21 | 2015-02-26 | Seven Bridges Genomics Inc. | Methods and systems for detecting sequence variants |
WO2015027050A1 (en) | 2013-08-21 | 2015-02-26 | Seven Bridges Genomics Inc. | Methods and systems for aligning sequences |
US20160306921A1 (en) | 2013-08-21 | 2016-10-20 | Seven Bridges Genomics Inc. | Methods and systems for detecting sequence variants |
US20150347678A1 (en) | 2013-08-21 | 2015-12-03 | Seven Bridges Genomics Inc. | Methods and systems for detecting sequence variants |
US9390226B2 (en) | 2013-08-21 | 2016-07-12 | Seven Bridges Genomics Inc. | Methods and systems for detecting sequence variants |
US9116866B2 (en) | 2013-08-21 | 2015-08-25 | Seven Bridges Genomics Inc. | Methods and systems for detecting sequence variants |
US20150057946A1 (en) | 2013-08-21 | 2015-02-26 | Seven Bridges Genomics Inc. | Methods and systems for aligning sequences |
US20150066383A1 (en) | 2013-09-03 | 2015-03-05 | Seven Bridges Genomics Inc. | Collapsible modular genomic pipeline |
WO2015048753A1 (en) | 2013-09-30 | 2015-04-02 | Seven Bridges Genomics Inc. | Methods and system for detecting sequence variants |
US20150094212A1 (en) | 2013-10-01 | 2015-04-02 | Life Technologies Corporation | Systems and Methods for Detecting Structural Variants |
US20150110754A1 (en) | 2013-10-15 | 2015-04-23 | Regeneron Pharmaceuticals, Inc. | High Resolution Allele Identification |
WO2015058120A1 (en) | 2013-10-18 | 2015-04-23 | Seven Bridges Genomics Inc. | Methods and systems for aligning sequences in the presence of repeating elements |
WO2015058095A1 (en) | 2013-10-18 | 2015-04-23 | Seven Bridges Genomics Inc. | Methods and systems for quantifying sequence alignment |
US20150199473A1 (en) | 2013-10-18 | 2015-07-16 | Seven Bridges Genomics Inc. | Methods and systems for quantifying sequence alignment |
US20150199472A1 (en) | 2013-10-18 | 2015-07-16 | Seven Bridges Genomics Inc. | Methods and systems for genotyping genetic samples |
US20150197815A1 (en) | 2013-10-18 | 2015-07-16 | Seven Bridges Genomics Inc. | Methods and systems for identifying disease-induced mutations |
US20150199474A1 (en) | 2013-10-18 | 2015-07-16 | Seven Bridges Genomics Inc. | Methods and systems for aligning sequences in the presence of repeating elements |
WO2015058093A1 (en) | 2013-10-18 | 2015-04-23 | Seven Bridges Genomics Inc. | Methods and systems for genotyping genetic samples |
WO2015058097A1 (en) | 2013-10-18 | 2015-04-23 | Seven Bridges Genomics Inc. | Methods and systems for identifying disease-induced mutations |
US9063914B2 (en) | 2013-10-21 | 2015-06-23 | Seven Bridges Genomics Inc. | Systems and methods for transcriptome analysis |
WO2015061099A1 (en) | 2013-10-21 | 2015-04-30 | Seven Bridges Genomics Inc. | Systems and methods for transcriptome analysis |
US20150112602A1 (en) | 2013-10-21 | 2015-04-23 | Seven Bridges Genomics Inc. | Systems and methods for using paired-end data in directed acyclic structure |
US9092402B2 (en) | 2013-10-21 | 2015-07-28 | Seven Bridges Genomics Inc. | Systems and methods for using paired-end data in directed acyclic structure |
US20150112658A1 (en) | 2013-10-21 | 2015-04-23 | Seven Bridges Genomics Inc. | Systems and methods for transcriptome analysis |
US20150302145A1 (en) | 2013-10-21 | 2015-10-22 | Seven Bridges Genomics Inc. | Systems and methods for transcriptome analysis |
US20150310167A1 (en) | 2013-10-21 | 2015-10-29 | Seven Bridges Genomics Inc. | Systems and methods for using paired-end data in directed acyclic structure |
WO2015061103A1 (en) | 2013-10-21 | 2015-04-30 | Seven Bridges Genomics Inc. | Systems and methods for using paired-end data in directed acyclic structure |
WO2015105963A1 (en) | 2014-01-10 | 2015-07-16 | Seven Bridges Genomics Inc. | Systems and methods for use of known alleles in read mapping |
US20150199475A1 (en) | 2014-01-10 | 2015-07-16 | Seven Bridges Genomics Inc. | Systems and methods for use of known alleles in read mapping |
US20150227685A1 (en) | 2014-02-11 | 2015-08-13 | Seven Bridges Genomics Inc. | Systems and methods for analyzing sequence data |
WO2015123269A1 (en) | 2014-02-11 | 2015-08-20 | Seven Bridges Genomics Inc. | System and methods for analyzing sequence data |
US9817944B2 (en) | 2014-02-11 | 2017-11-14 | Seven Bridges Genomics Inc. | Systems and methods for analyzing sequence data |
US20160259880A1 (en) | 2015-03-05 | 2016-09-08 | Seven Bridges Genomics Inc. | Systems and methods for genomic pattern analysis |
WO2016141294A1 (en) | 2015-03-05 | 2016-09-09 | Seven Bridges Genomics Inc. | Systems and methods for genomic pattern analysis |
US20160364523A1 (en) | 2015-06-11 | 2016-12-15 | Seven Bridges Genomics Inc. | Systems and methods for identifying microorganisms |
WO2016201215A1 (en) | 2015-06-11 | 2016-12-15 | Seven Bridges Genomics Inc. | Systems and methods for identifying microorganisms |
US20170058320A1 (en) | 2015-08-24 | 2017-03-02 | Seven Bridges Genomics Inc. | Systems and methods for epigenetic analysis |
US20170058365A1 (en) | 2015-09-01 | 2017-03-02 | Seven Bridges Genomics Inc. | Systems and methods for analyzing viral nucleic acids |
US20170058341A1 (en) | 2015-09-01 | 2017-03-02 | Seven Bridges Genomics Inc. | Systems and methods for mitochondrial analysis |
WO2017066753A1 (en) | 2015-10-16 | 2017-04-20 | Seven Bridges Genomics Inc. | Biological graph or sequence serialization |
US20170193351A1 (en) | 2015-12-30 | 2017-07-06 | Micron Technology, Inc. | Methods and systems for vector length management |
WO2017120128A1 (en) | 2016-01-07 | 2017-07-13 | Seven Bridges Genomics Inc. | Systems and methods for adaptive local alignment for graph genomes |
US20170199960A1 (en) | 2016-01-07 | 2017-07-13 | Seven Bridges Genomics Inc. | Systems and methods for adaptive local alignment for graph genomes |
US20170199959A1 (en) | 2016-01-13 | 2017-07-13 | Seven Bridges Genomics Inc. | Genetic analysis systems and methods |
WO2017123864A1 (en) | 2016-01-13 | 2017-07-20 | Seven Bridges Genomics Inc. | Systems and methods for analyzing circulating tumor dna |
US20170242958A1 (en) | 2016-02-24 | 2017-08-24 | Seven Bridges Genomics Inc. | Systems and methods for genotyping with graph reference |
WO2017147124A1 (en) | 2016-02-24 | 2017-08-31 | Seven Bridges Genomics Inc. | Systems and methods for genotyping with graph reference |
Non-Patent Citations (291)
Title |
---|
Abouelhoda, 2012, Tavaxy: integrating Taverna and Galaxy workflows with cloud computing support, BMC Bioinformatics 13:77. |
Agarwal, 2013, SINNET: Social Interaction Network Extractor from Text, Proc IJCNLP 33-36. |
Aguiar, 2012, HapCompass: A fast cycle basis algorithm for accurate haplotype assembly of sequence data, J Comp Biol 19(6):577-590. |
Aguiar, 2013, Haplotype assembly in polyploid genomes and identical by descent shared tracts, BioInformatics 29(13):i352-i360. |
Airoldi, 2008, Mixed membership stochastic blockmodels, JMLR 9:1981-2014. |
Albers, 2011, Dindel: Accurate indel calls from short-read data, Genome Research 21:961-973. |
Alioto et al., A comprehensive assessment of somatic mutation detection in cancer using whole-genome sequencing, Nature Communications, Dec. 9, 2015. |
Altera, 2007, Implementation of the Smith-Waterman algorithm on reconfigurable supercomputing platform, White Paper ver 1.0 (18 pages). |
Altschul et al., Optimal Sequence Alignment Using Affine Gap Costs, Bulletin of Mathematical Biology vol. 48, No. 5/6, pp. 603-616, 1986. |
Ayguade et al. (SW Algorithm, Oct. 2007, pp. 1-18) (Year: 2007). * |
Bansal, 2008, An MCMC algorithm for haplotype assembly from whole-genome sequence data, Genome Res 18:1336-1346. |
Bao et al., 2013, BRANCH: boosting RNA-Seq assemblies with partial or related genomic sequences, Bioinformatics 29(10):1250-1259. |
Barbieri, 2013, Exome sequencing identifies recurrent SPOP, FOXA1 and MED12 mutations in prostate cancer, Nature Genetics 44:6 685-689. |
BCF2 Quick Reference (r198), available at http://samtools.github.io/hts-specs/BCFv2_qref.pdf. |
Beerenwinkel, 2007, Conjunctive Bayesian Networks, Bernoulli 13(4), 893-909. |
Berlin, 2014, Assembling large genomes with single-molecule sequencing and locality sensitive hashing, bioRxiv preprint (35 pages); retrieved from the internet on Jan. 29, 2015, at. |
Bertone et al., 2004, Global identification of human transcribed sequences with genome tiling arrays, Science 306:2242-2246. |
Bertrand et al., 2009, Genetic map refinement using a comparative genomic approach, J Comp Biol 16(10):1475-1486. |
Black, 2005, A simple answer for a splicing conundrum, PNAS 102:4927-8. |
Boyer, 1977, A Fast String Searching Algorithm, Comm ACM 20(10):762-772. |
Browning et al, Haplotype phasing: existing methods and new developments, 2011, vol. 12, Nature Reviews Genetics. |
Buhler, 2001, Search algorithms for biosequences using random projection, dissertation, University of Washington (203 pages); retrieved from the internet on Jun. 3, 2016, at. |
Caboche et al, Comparison of mapping algorithms used in high-throughput sequencing: application to Ion Torrent data, 2014, vol. 15, BMC Genomics. |
Carrington et al., 1985, Polypeptide ligation occurs during post-translational modification of concanavalin A, Nature 313:64-67. |
Cartwright, DNA assembly with gaps (DAWG): simulating sequence evolution, 2005, pp. iii31-iii38, vol. 21, Oxford University Press. |
Chang et al., 2005, The application of alternative splicing graphs in quantitative analysis of alternative splicing form from EST database, Int J. Comp. Appl. Tech 22(1): 14. |
Chen et al., 2012, Transient hypermutability, chromothripsis and replication-based mechanisms in the generation of concurrent clustered mutations, Mutat Res 750(1):52-59. |
Chin et al., 2013, Nonhybrid finished microbial genome assemblies from long-read SMRT sequencing data, Nat Meth 10(6):563-569. |
Chin et al., Nonhybrid, finished microbial genome assemblies from long-read SMRT sequencing data, Nature Methods vol. 10 No. 6, Jun. 2013, pp. 563-571. |
Chuang, 2001, Gene recognition based on DAG shortest paths, Bioinformatics 17(Suppl. 1):556-564. |
Clark, 2014, Illumina announces landmark $1,000 human genome sequencing, Wired, Jan. 15, 2014. |
Cock, 2013, Galaxy tools and workflows for sequence analysis with applications in molecular plant pathology, Peer J 1: e167. |
Cohen-Boulakia, 2014, Distilling structure in Taverna scientific workflows: a refactoring approach, BMC Bioinformatics 15(Suppl 1):S12. |
Compeau et al., How to apply de Bruijn graphs to genome assembly, Nature Biotechnology vol. 29 No. 11, pp. 987-991. |
Costa et al., 2010, Uncovering the Complexity of Transcriptomes with RNA-Seq, Journal of Biomedicine and Biotechnology Article ID 853916:1-19. |
Craig, 1990, Ordering of cosmid clones covering the Herpes simplex virus type 1 (HSV-I) genome: a test case for fingerprinting by hybridisation; Nucleic Acids Research 18:9 pp. 2653-2660. |
Danecek et al., 2011, The variant call format and VCFtools, Bioinformatics 27(15):2156-2158. |
Delcher et al., 1999, Alignment of whole genomes, Nucleic Acids Research, 27(11):2369-2376. |
Denoeud, 2004, Identification of polymorphic tandem repeats by direct comparison of genome sequence from different bacterial strains: a web-based resource, BMC Bioinformatics 5:4 pp. 1-12. |
DePristo, et al., 2011, A framework for variation discovery and genotyping using next-generation DNA sequencing data, Nature Genetics 43:491-498. |
Dinov et al., 2011, Applications of the pipeline environment for visual informatics and genomic computations, BMC Bioinformatics 12:304. |
Duan et al., Optimizing de novo common wheat transcriptome assembly using short-read RNA-Seq data. (2012) pp. 1-12, vol. 13, BMC Genomics. |
Dudley and Butte, 2009, A quick guide for developing effective bioinformatics programming skills, PLoS Comput Biol 5 (12):e1000589. |
Dudley and Butte, A quick guide for developing effective bioinformatics programming skills, PLoS Comput Biol 5(12): e1000589 (2009). |
Durbin, 2014, Efficient haplotype matching and storage using the positional Burrows-Wheeler transform (PBWT), Bioinformatics 30(9):1266-1272. |
Durham et al., 2005, EGene: a configurable pipeline system for automated sequence analysis, Bioinformatics 21 (12):2812-2813. |
Durham, et al., EGene: a configurable pipeline system for automated sequence analysis, Bioinformatics 21 (12):2812-2813 (2005). |
EESR issued in EP 14847490.1. |
EESR issued in EP 14854801.9. |
Endelman, 2011, New algorithm improves fine structure of the barley consensus SNP map, BMC Genomics 12 (1):407 (and whole document). |
Exam Report issued in EP14803268.3. |
Examination Report issued in SG 11201601124Y. |
Extended European Search Report issued in EP 14837955.5. |
Farrar et al., Striped Smith-Waterman speeds database searches six times over other SIMD implementations, vol. 23 No. 2 2007, pp. 156-161. |
Fiers 2008. High-throughput Bioinformatics with the Cyrille2 Pipeline System, BMC Bioinformatics 9:96. |
Fitch, 1970, Distinguishing homologous from analogous proteins, Systematic Zoology 19:99-113. |
Flicek, 2009, Sense from sequence reads: methods for alignment and assembly, Nat Meth Suppl 6(11s):s6-s12. |
Florea et al., 2005, Gene and alternative splicing annotation with AIR, Genome Research 15:54-66. |
Florea et al., Gene and alternative splicing annotation with AIR, Genome Res. 2005 15: 54-66. |
Florea, 2013, Genome-guided transcriptome assembly in the age of next-generation sequencing, IEEE/ACM Trans Comp Biol Bioinf 10(5):1234-1240. |
Garber et al., 2011, Computational methods for transcriptome annotation and quantification using RNA-Seq, Nat Meth 8(6):469-477. |
Gerlinger, 2012, Intratumor Heterogeneity and Branched Evolution Revealed by Multiregion Sequencing, 366:10 883-892. |
Glusman, 2014, Whole-genome haplotyping approaches and genomic medicine, Genome Med 6:73. |
Golub, 1999, Molecular classification of cancer: class discovery and class prediction by gene expression monitoring, Science 286, pp. 531-537. |
Goto et al., 2010, BioRuby: bioinformatics software for the Ruby programming language, Bioinformatics 26 (20):2617-2619. |
Goto, et al., 2010, BioRuby: bioinformatics software for the Ruby programming language, Bioinformatics 26 (20):2617-9. |
Gotoh et al., An Improved Algorithm for Matching Biological Sequences, J. Mol. Biol. (1982) 162, 705-708. |
Gotoh, 1999, Multiple sequence alignment: algorithms and applications, Adv Biophys 36:159-206. |
Grabherr et al., 2011, Full-length transcriptome assembly from RNA-Seq data without a reference genome, Nature Biotechnology 29(7):644-654. |
Grasso, 2004, Combining partial order alignment and progressive multiple sequence alignment increases alignment speed and scalability to very large alignment problems, Bioinformatics 20(10):1546-1556. |
Guttman et al., 2010, Ab initio reconstruction of cell type-specific transcriptomes in mouse reveals the conserved multi-exonic structure of lincRNAs, Nature Biotechnology 28(5):503-510. |
Haas et al., 2004, DAGchainer: a tool for mining segmental genome duplications and synteny, Bioinformatics 20 (18):3643-3646. |
Harenberg, 2014, Community detection in large-scale networks: a survey and empirical evaluation, WIREs Comp Stat 6:426-439. |
Harrow et al., 2012, Gencode: The reference human genome annotation for the Encode Project, Genome Res 22:1760-1774. |
He, 2010, Optimal algorithms for haplotype assembly from whole-genome sequence data, Bioinformatics 26:i183-i190. |
Heber et al., 2002, Splicing graphs and EST assembly problems, Bioinformatics 18 Suppl:181-188. |
Hein et al., A New Method That Simultaneously Aligns and Reconstructs Ancestral Sequences for Any Number of Homologous Sequences, When the Phylogeny is Given. Mol. Biol. Evol. 6(6):649-668. 1989. |
Hein et al., A Tree Reconstruction Method That is Economical in the Number of Pairwise Comparisons Used, Mol. Biol. Evol. 6(6):649-668. 1989. |
Hokamp, 2003, Wrapping up BLAST and Other Applications for Use on Unix Clusters, Bioinformatics 19(3)441-42. |
Holland et al., 2008, BioJava: an open-source framework for bioinformatics, Bioinformatics 24(18):2096-2097. |
Holland, et al., 2008, BioJava: an open-source framework for bioinformatics, Bioinformatics 24(18):2096-97. |
Homer et al., 2010, Improved variant discovery through local re-alignment of short-read next generation sequencing data using SRMA, Genome Biology 11(10):R99. |
Hoon et al., 2003, Biopipe: A flexible framework for protocol-based bioinformatics analysis, Genome Research 13 (8):1904-1915. |
Hoon, et al., Biopipe: A flexible framework for protocol-based bioinformatics analysis, Genome Research 13 (8):1904-1915 (2003). |
Horspool, 1980, Practical Fast Searching in Strings, Software-Practice & Experience 10:501-506. |
Huang, Chapter 3: Bio-Sequence Comparison and Alignment, ser. Curr Top Comp Mol Biol. Cambridge, Mass.: The MIT Press, 2002. |
Hull, 2006, Taverna: a tool for building and running workflows of services, Nucl Acids Res 34(Web Server issue): W729-32. |
Hutchinson, 2014, Allele-specific methylation occurs at genetic variants associated with complex diseases, PLoS One 9(6):e98464. |
International HapMap Consortium, 2005, A haplotype map of the human genome. Nature 437:1299-1320. |
International Preliminary Report on Patentability issued in application No. PCT/US2014/052065 dated Feb. 23, 2016. |
International Search Report and Written Opinion dated Apr. 19, 2017 for international Patent Application No. PCT/US2017/012015, (14 Pages). |
International Search Report and Written Opinion dated Apr. 7, 2017, for International Patent Application No. PCT/US17/13329, filed Jan. 13, 2017, (9 pages). |
International Search Report and Written Opinion dated Aug. 31, 2017, for International Application No. PCT/US2017/018830 with International Filing Date Feb. 22, 2017, (11 pages). |
International Search Report and Written Opinion dated Dec. 11, 2014, for International Patent Application No. PCT/US14/52065, filed Aug. 21, 2014, (18 pages). |
International Search Report and Written Opinion dated Dec. 30, 2014, for International Patent Application No. PCT/US14/58328, filed Sep. 30, 2014 (22 pages). |
International Search Report and Written Opinion dated Dec. 30, 2014, for PCT/US14/58328, with International Filing Date Sep. 30, 2014 (15 pages). |
International Search Report and Written Opinion dated Feb. 10, 2015, for International Patent Application No. PCT/US2014/060690, filed Oct. 15, 2014, PCT/US2014/060690 (11 pages). |
International Search Report and Written Opinion dated Feb. 17, 2015, for International Patent Application No. PCT/US2014/061156, filed Oct. 17, 2014 (19 pages). |
International Search Report and Written Opinion dated Feb. 4, 2015, for International Patent Application No. PCT/US2014/061198, filed Oct. 17, 2014, (8 pages). |
International Search Report and Written Opinion dated Feb. 4, 2015, for Patent Application No. PCT/US2014/061158, filed Oct. 17, 2014, (11 pages). |
International Search Report and Written Opinion dated Jan. 10, 2017, for International Patent Application No. PCT/US16/57324 with International Filing Date Oct. 17, 2016, (7 pages). |
International Search Report and Written Opinion dated Jan. 27, 2015, for International Patent Application No. PCT/US2014/060680, filed Oct. 15, 2014, (11 pages). |
International Search Report and Written Opinion dated Jan. 5, 2016, for International Patent Application PCT/US2015/054461 with International Filing Date Oct. 7, 2015 (7 pages). |
International Search Report and Written Opinion dated Mar. 19, 2015, for International Application No. PCT/US2014/061162 with International Filing Date Oct. 17, 2014 (12 pages). |
International Search Report and Written Opinion dated Mar. 31, 2015 for International Application No. PCT/US2015/010604 filed Jan. 8, 2015 (13 pages). |
International Search Report and Written Opinion dated May 11, 2015, for PCT/US2015/015375, with International Filing Date Feb. 11, 2015 (13 pages). |
International Search Report and Written Opinion dated May 5, 2016, for International Patent Application No. PCT/US2016/020899, with International Filing Date Mar. 4, 2016 (12 pages). |
International Search Report and Written Opinion dated Sep. 2, 2016, for International Patent Application No. PCT/US2016/033201 with International Filing Date May 19, 2016 (14 pages). |
International Search Report and Written Opinion dated Sep. 7, 2016, for International Application No. PCT/US2016/036873 with International filing date Jun. 10, 2016 (8 pages). |
International Search Report and Written Opinion of the International Searching Authority dated Nov. 17, 2015 for International Application No. PCT/US2015/048891 (11 Pages). |
Kano, 2010, Text mining meets workflow: linking U-Compare with Taverna, Bioinformatics 26(19):2486-7. |
Katoh, 2005, MAFFT version 5: improvement in accuracy of multiple sequence alignment, Nucl Acids Res 33 (2):511-518. |
Kawas, 2006, BioMoby extensions to the Taverna workflow management and enactment software, BMC Bioinformatics 7:523. |
Kehr, 2014, Genome alignment with graph data structures: a comparison, BMC Bioinformatics 15:99. |
Kent, 2002, BLAT-The Blast-Like Alignment Tool, Genome Research 4:656-664. |
Kim et al., 2005, ECgene: Genome-based EST clustering and gene modeling for alternative splicing, Genome Research 15:566-576. |
Kim, 2008, A Scaffold Analysis Tool Using Mate-Pair Information in Genome Sequencing, Journal of Biomedicine and Biotechnology 8(3):195-197. |
Kim, 2013, TopHat2: accurate alignment of transcriptomes in the presence of insertions, deletions and gene fusions, Genome Biol 14(4):R36. |
Koolen, 2008, Clinical and Molecular Delineation of the 17q21.31 Microdeletion Syndrome, J Med Gen 45(11):710-720. |
Krabbenhoft, 2008, Integrating ARC grid middleware with Taverna workflows, Bioinformatics 24(9):1221-2. |
Kuhn, 2010, CDK-Taverna: an open workflow environment for cheminformatics, BMC Bioinformatics 11:159. |
Kumar et al., 2010, Comparing de novo assemblers for 454 transcriptome data, BMC Genomics 11:571. |
Kurtz et al., 2004, Versatile and open software for comparing large genomes, Genome Biology, 5:R12. |
LaFramboise, 2009, Single nucleotide polymorphism arrays: a decade of biological, computational and technological advance, Nucleic Acids Res 37(13):4181-4193. |
Lam et al., 2008, Compressed indexing and local alignment of DNA, Bioinformatics 24(6):791-97. |
Langmead et al., 2009, Ultrafast and memory-efficient alignment of short DNA sequences to the human genome, Genome Biology 10:R25. |
Lanzen, 2008, The Taverna Interaction Service: enabling manual interaction in workflows, Bioinformatics 24 (8):1118-20. |
Larkin et al., 2007, Clustal W and Clustal X version 2.0, Bioinformatics 23(21):2947-2948. |
Layer, 2015, Efficient compression and analysis of large variation datasets, Biorxiv available at http://biorxiv.org/content/early/2015/04/20/018259. |
Layer, 2015, Efficient genotype compression and analysis of large genetic-variation data sets, Nat Meth 13(1):63-65. |
Lecca, 2015, Defining order and timing of mutations during cancer progression: the TO-DAG probabilistic graphical model, Frontiers in Genetics, vol. 6 Article 309 1-17. |
Lee and Wang, 2005, Bioinformatics analysis of alternative splicing, Brief Bioinf 6(1):23-33. |
Lee et al. (Bioinformatics, 2002, vol. 18, No. 3, pp. 452-464). * |
Lee et al. Accurate read mapping using a graph-based human pan-genome. (May 2015) American Society of Human Genetics 64th Annual Meeting Platform Abstracts; Abstract 41. |
Lee et al., 2005, Bioinformatics analysis of alternative splicing, Brief Bioinf 6(1):23-33. |
Lee, 2003, Generating consensus sequences from partial order multiple sequence alignment graphs, Bioinformatics 19 (8):999-1008. |
Lee, 2014, Accurate read mapping using a graph-based human pan-genome, ASHG 2014 Abstracts. |
Lee, 2014, MOSAIK: A hash-based algorithm for accurate next-generation sequencing short-read mapping, PLoS One 9(3):e90581. |
Lee, et al., 2002, Multiple sequence alignment using partial order graphs, Bioinformatics 18(3): 452-464. |
LeGault and Dewey, 2013, Inference of alternative splicing from RNA-Seq data with probabilistic splice graphs, Bioinformatics 29(18):2300-2310. |
LeGault et al., 2010, Learning Probabilistic Splice Graphs from RNA-Seq data, pages.cs.wisc.edu/~legault/cs760_writeup.pdf; retrieved from the internet on Apr. 6, 2014. |
LeGault et al., 2013, Inference of alternative splicing from RNA-Seq data with probabilistic splice graphs, Bioinformatics 29(18):2300-2310. |
Leipzig et al., 2004, The alternative splicing gallery (ASG): Bridging the gap between genome and transcriptome, Nucleic Acids Res., 23(13):3977-3983. |
Leipzig, et al., 2004, The alternative splicing gallery (ASG): Bridging the gap between genome and transcriptome, Nucl Ac Res 23(13):3977-3983. |
Li et al., 2008, SOAP: short oligonucleotide alignment program, Bioinformatics 24(5):713-14. |
Li et al., 2009, SOAP2: an improved ultrafast tool for short read alignment, Bioinformatics 25(15): 1966-67. |
Li et al., 2010, A survey of sequence alignment algorithms for next-generation sequencing, Briefings in Bioinformatics 11(5):473-483. |
Li, 2008, Automated manipulation of systems biology models using libSBML within Taverna workflows, Bioinformatics 24(2):287-9. |
Li, 2008, Performing statistical analyses on quantitative data in Taverna workflows: an example using R and maxdBrowse to identify differentially-expressed genes from microarray data, BMC Bioinformatics 9:334. |
Li, 2009, Fast and accurate short read alignment with Burrows-Wheeler Transform. Bioinformatics 25:1754-60. |
Li, 2015, BGT: efficient and flexible genotype query across many samples, arXiv 1506.08452 [q-bio.GN]. |
Li, 2015, Towards Better Understanding of Artifacts in Variant Calling from High-Coverage Samples, arXiv:1404.0929 [q-bio.GN]. |
Li, et al., 2009, The Sequence Alignment/Map format and SAMtools, Bioinformatics 25(16):2078-9. |
Life Technologies, 2013, Rapid Exome Sequencing Using the Ion Proton System and Ion Ampliseq Technology, Application Note (5 Pages). |
Lindgreen, 2012, AdapterRemoval: easy cleaning of next-generation sequence reads, BMC Res Notes 5:337. |
Lipman and Pearson, 1985, Rapid and sensitive protein similarity searches, Science 227(4693):1435-41. |
Lucking, 2011 PICS-Ord: unlimited coding of ambiguous regions by pairwise identity and cost scores ordination, BMC Bioinf 12:10. |
Lupski, 2005, Genomic disorders: Molecular mechanisms for rearrangements and conveyed phenotypes, PLoS Genetics 1(6):e49. |
Ma et al., 2010, Multiple genome alignment based on longest path in directed acyclic graphs, IJBRA 6(4):366-383. |
Ma et al., Multiple genome alignment based on longest path in directed acyclic graphs. Int. J. Bioinformatics Research and Applications, vol. 6, No. 4, 2010. |
Machine translation of KR 10-1282798 B1 generated on Jan. 6, 2016, by the website of the European Patent Office (23 pages). |
Machine translation produced on Jun. 1, 2015, by Espacenet of WO 2010/010992 A1 (11 pages). |
Machine translation produced on Jun. 1, 2015, by WIPO website of WO 2013/035904 (10 pages). |
Mamoulis, 2004, Non-contiguous sequence pattern queries, in Advances in Database Technology-EDBT 2004: 9th International Conference on Extending Database Technology, Heraklion, Crete, Greece, Mar. 14-18, 2004, Proceedings (18 pages); retrieved from the Internet on Jun. 3, 2016, at. |
Manolio, et al., 2010, Genome wide association studies and assessment of the risk of disease, NEJM 363(2):166-76. |
Mardis, 2010, The $1,000 genome, the $1,000 analysis?, Genome Med 2:84-85. |
Margulies et al., 2005, Genome sequencing in microfabricated high-density picolitre reactors, Nature 437:376-380. |
Margulies et al., 2005, Genome sequencing in micro-fabricated high-density picotiter reactors, Nature, 437:376-380. |
Marth et al., 1999-A general approach to single-nucleotide polymorphism discovery, pp. 452-456, vol. 23, Nature Genetics. |
Marth, 1999, A general approach to single-nucleotide polymorphism discovery, Nature Genetics 23:452-456. |
Mazrouee, 2014, FastHap: fast and accurate single individual haplotype reconstructions using fuzzy conflict graphs, Bioinformatics 30:i371-i378. |
McKenna, et al., 2010, The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data, Genome Res 20:1297-303. |
McSherry, 2001, Spectral partitioning of random graphs, Proc 42nd IEEE Symp Found Comp Sci 529-537. |
Miller et al., 2010, Assembly Algorithms for Next-Generation Sequencing Data, Genomics 95(6): 315-327. |
Misra, 2011, Anatomy of a hash-based long read sequence mapping algorithm for next generation DNA sequencing, Bioinformatics 27(2):189-195. |
Missier, 2010, Taverna, reloaded, Proc. Scientific and Statistical Database Management, 22nd Int Conf, Heidelberg, Germany, Jun./ Jul. 2010, Gertz & Ludascher, Eds., Springer. |
Moudrianakis, 1965, Base sequence determination in nucleic acids with electron microscope III: chemistry and microscopy of guanine-labelled DNA, PNAS 53:564-71. |
Mount et al., Multiple Sequence Alignment, Bioinformatics, 2001, Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, pp. 139-204. |
Mourad, 2012, A hierarchical Bayesian network approach for linkage disequilibrium modeling and data-dimensionality reduction prior to genome-wide association studies, BMC Bioinformatics 12:16 1-20. |
Myers, The Fragment Assembly String Graph, Bioinformatics, 2005, pp. ii79-ii85, vol. 21. |
Nagalakshmi et al., RNA-Seq: A Method for Comprehensive Transcriptome Analysis, Current Protocols in Molecular Biology 4.11.1.13, Jan. 2010, 13 pages. |
Nagarajan & Pop, 2013, Sequence assembly demystified, Nat Rev 14:157-167. |
Najafi, 2016, Fundamental limits of pooled-DNA sequencing, arXiv:1604.04735. |
Nakao et al., 2005, Large-scale analysis of human alternative protein isoforms: pattern classification and correlation with subcellular localization signals, Nucl Ac Res 33(8):2355-2363. |
Needleman & Wunsch, 1970, A general method applicable to the search for similarities in the amino acid sequence of two proteins, J Mol Biol, 48(3):443-453. |
Nenadic, 2010, Nested Workflows, The Taverna Knowledge Blog, Dec. 13, 2010. Retrieved on Feb. 25, 2016 from http://taverna.knowledgeblog.org/2010/12/13/nested-workflows/. |
Newman, 2013, Community detection and graph partitioning, Europhys Lett 103(2):28003, arXiv:1305.4974v1. |
Newman, 2014, An ultrasensitive method for quantitating circulating tumor DNA with broad patient coverage, Nature Medicine 20:5 1-11. |
NIH Public Access Author Manuscript, Guttman et al., 2010, Ab initio reconstruction of transcriptomes of pluripotent and lineage committed cells reveals gene structures of thousands of lincRNAs, NIH-PA Author Manuscript. |
NIH Public Access Author Manuscript, Trapnell et al., 2010, Transcript assembly and abundance estimation from RNA-Seq reveals thousands of new transcripts and switching among isoforms, NIH-PA Author Manuscript. |
Ning, 2001, SSAHA: a fast search method for large DNA databases, Genome Res 11(10):1725-9. |
Oinn, 2004, Taverna: a tool for the composition and enactment of bioinformatics workflows, Bioinformatics 20 (17):3045-54. |
Oinn, 2006, Taverna: lessons in creating a workflow environment for the life sciences, Concurrency and Computation: Practice and Experience 18(10):1067-1100. |
Olsson, 2015, Serial monitoring of circulating tumor DNA in patients with primary breast cancer for detection of occult metastatic disease, EMBO Molecular Medicine 7:8 1034-1047. |
O'Rawe, 2013, Low Concordance of Multiple Variant-Calling Pipelines: Practical Implications for Exome and Genome Sequencing, Genome Med 5:28. |
Oshlack et al., From RNA-seq reads to differential expression results. Genome Bio 2010, 11:220, pp. 1-10. |
Pabinger et al., 2013, A survey of tools for variant analysis of next-generation genome sequencing data, Brief Bioinf. |
Parks, 2015, Detecting non-allelic homologous recombination from high-throughput sequencing data, Genome Biol 16:17. |
Paterson, 2009, An XML transfer schema for exchange of genomic and genetic mapping data: implementation as a web service in a Taverna workflow, BMC Bioinformatics 10:252. |
Pearson et al., 1988, Improved tools for biological sequence comparison, PNAS 85(8):2444-8. |
Pe'er, et al., 2006, Evaluating and improving power in whole-genome association studies using fixed marker sets. Nat. Genet., 38, 663-667. |
Peixoto, 2014, Efficient Monte Carlo and greedy heuristic for the inference of stochastic block models, Phys. Rev. E 89, 012804. |
Pop et al., 2004, Comparative genome assembly, Briefings in Bioinformatics vol. 5, pp. 237-248. |
Pope, 2014, ROVER Variant Caller: Read-Pair Overlap Considerate Variant-Calling Software Applied to PCR-Based Massively Parallel Sequencing Datasets, Source Code Bio Med 9:3. |
Popitsch, 2013, NGC: lossless and lossy compression of aligned high-throughput sequencing data, Nucl Acids Res, 41 (1):e27. |
Posada and Crandall, 1998, Model Test: testing the model of DNA substitution, Bioinformatics 14(9):817-8. |
Potter et al., 2004, The Ensembl analysis pipeline, Genome Res 14:934-941. |
Potter et al., ASC: An Associative-Computing Paradigm, Computer , 27(11):19-25, 1994. |
Pruesse, 2012, SINA: Accurate high-throughput multiple sequence alignment of ribosomal RNA genes, Bioinformatics 28:14 1823-1829. |
Quail, et al. 2012, A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers, BMC Genomics 13:341. |
Rajaram, 2013, Pearl millet [Pennisetum glaucum (L.) R. Br.] consensus linkage map constructed using four RIL mapping populations and newly developed EST-SSRs, BMC Genomics 14(1):159. |
Ramirez-Gonzalez, 2011, Gee Fu: a sequence version and web-services database tool for genomic assembly, genome feature and NGS data, Bioinformatics 27(19):2754-2755. |
Raphael, 2004, A novel method for multiple alignment of sequences with repeated and shuffled elements, Genome Res 14:2336-2346. |
Robertson et al., 2010, De novo assembly and analysis of RNA-seq data, Nat Meth 7(11):909. |
Rodelsperger, 2008, Syntenator: Multiple gene order alignments with a gene-specific scoring function, Alg Mol Biol 3:14. |
Rognes et al., Faster Smith-Waterman database searches with inter-sequence SIMD parallelisation, Bioinformatics 2011, 12:221. |
Rognes et al., ParAlign: a parallel sequence alignment algorithm for rapid and sensitive database searches, Nucleic Acids Research, 2001, vol. 29, No. 7 1647-1652. |
Rognes et al., Six-fold speed-up of Smith-Waterman sequence database searching using parallel processing on common microprocessors, Bioinformatics vol. 16 No. 8 2000, pp. 699-706. |
Ronquist, et al., 2012, MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space, Syst Biol 61(3):539-42. |
Rothberg, et al., 2011, An integrated semiconductor device enabling non-optical genome sequencing, Nature 475:348-352. |
Saebo et al., PARALIGN: rapid and sensitive sequence similarity searches powered by parallel computing technology, Nucleic Acids Research, 2005, vol. 33, Web Server issue W535-W539. |
Sato et al., 2008, Directed acyclic graph kernels for structural RNA analysis, BMC (BioMed Central) Bioinformatics 9 (318). |
Schenk et al., 2013, A pipeline for comprehensive and automated processing of electron diffraction data in IPLT, J Struct Biol 182(2):173-185. |
Schneeberger et al., 2009, Simultaneous alignment of short reads against multiple genomes, Genome Biology 10(9): R98.2-R98.12. |
Schwikowski & Vingron, 2002, Weighted sequence graphs: boosting iterated dynamic programming using locally suboptimal solutions, Disc Appl Mat 127:95-117. |
Shao et al., 2006, Bioinformatic analysis of exon repetition, exon scrambling and trans-splicing in humans, Bioinformatics 22:692-698. |
Sievers et al., 2011, Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega, Mol Syst Biol 7:539. |
Slater & Birney, 2005, Automated generation of heuristics for biological sequence comparison, BMC Bioinformatics 6:31. |
Slater et al., 2005, Automated generation of heuristics for biological sequence comparison, BMC Bioinformatics 6:31. |
Smith & Waterman, 1981, Identification of common molecular subsequences, J Mol Biol, 147(1):195-197. |
Smith et al., Identification of Common Molecular Subsequences, J. Mol. Biol. (1981) 147, 195-197. |
Smith et al., Multiple insert size paired-end sequencing for deconvolution of complex transcriptions, RNA Bio 9:5, 596-609; May 2012. |
Soni and Meller, 2007, Progress toward ultrafast DNA sequencing using solid-state nanopores, Clin Chem 53 (11):1996-2001. |
Sosa, 2012, Next-Generation Sequencing of Human Mitochondrial Reference Genomes Uncovers High Heteroplasmy Frequency, PLoS One 8(10):e1002737. |
Sroka, 2006, XQTav: an XQuery processor for Taverna environment, Bioinformatics 22(10):1280-1. |
Sroka, 2010, A formal semantics for the Taverna 2 workflow model, J Comp Sys Sci 76(6):490-508. |
Sroka, 2011, CalcTav-integration of a spreadsheet and Taverna workbench, Bioinformatics 27(18):2618-9. |
Steinfadt (Dissertation, Kent State University, May 2010, pp. 1-156) (Year: 2010). * |
Stephens, et al., 2001, A new statistical method for haplotype reconstruction from population data, Am J Hum Genet 68:978-989. |
Stewart, et al., 2011, A comprehensive map of mobile element insertion polymorphisms in humans, PLoS Genetics 7 (8):1-19. |
Sturgeon, RCDA: a highly sensitive and specific alternatively spliced transcript assembly tool featuring upstream consecutive exon structures, Genomics, Dec. 2012, 100(6): 357-362. |
Subramanian, 2008, DIALIGN-TX: greedy and progressive approaches for segment-based multiple sequence alignment, Alg Mol Biol 3(1):1-11. |
Sudmant, 2015, An integrated map of structural variation in 2,504 human genomes, Nature 526:75-81. |
Sun, 2006, Pairwise Comparison Between Genomic Sequences and Optical maps, dissertation, New York University (131 pages); retrieved from the internet on Jun. 3, 2016, at. |
Szalkowski, 2012, Fast and robust multiple sequence alignment with phylogeny-aware gap placement, BMC (BioMed Central) Bioinformatics 13(129). |
Szalkowski, 2013, Graph-based modeling of tandem repeats improves global multiple sequence alignment, Nucl Ac Res 41(17):e162. |
Tan, 2010, A Comparison of Using Taverna and BPEL in Building Scientific Workflows: the case of caGrid, Concurr Comput 22(9):1098-1117. |
Tan, 2010, CaGrid Workflow Toolkit: a Taverna based workflow tool for cancer grid, BMC Bioinformatics 11:542. |
Tarhio, 1993, Approximate Boyer-Moore String Matching, SIAM J Comput 22(2):243-260. |
Tewhey, 2011, The importance of phase information for human genomics, Nat Rev Gen 12:215-223. |
The 1000 Genomes Project, 2015, A global reference for human genetic variation, Nature 526:68-74. |
The Variant Call Format (VCF) Version 4.2 Specification (Jan. 26, 2015), available at https://samtools.github.io/hts-specs/VCFv4.2.pdf. |
Thomas, 2014, Community-wide effort aims to better represent variation in human reference genome, Genome Web (11 pages). |
Torri et al., 2012, Next generation sequence analysis and computational genomics using graphical pipeline workflows, Genes (Basel) 3(3):545-575. |
Trapnell et al., 2009, TopHat: discovering splice junctions with RNA-Seq, Bioinformatics 25:1105-1111. |
Trapnell et al., 2010, Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation, Nature Biotechnology 28(5):511-515. |
Truszkowski, 2011, New developments on the cheminformatics open workflow environment CDK-Taverna, J Cheminform 3:54. |
Turi, 2007, Taverna Workflows: Syntax and Semantics, IEEE Int Conf on e-Science and Grid Computing 441-448. |
Uchiyama et al., CGAT: a comparative genome analysis tool for visualizing alignments in the analysis of complex evolutionary changes between closely related genomes, 2006, e-pp. 1-17, vol. 7:472; BMC Bioinformatics. |
Wallace, 2005, Multiple sequence alignments, Curr Op Struct Biol 15(3):261-266. |
Wang et al., 2009, RNA-Seq: a revolutionary tool for transcriptomics, Nat Rev Genet 10(1):57-63. |
Wang, et al., 2011, Next generation sequencing has lower sequence coverage and poorer SNP-detection capability in the regulatory regions, Scientific Reports 1:55. |
Wassink, 2009, Using R in Taverna: RShell v1.2. BMC Res Notes 2:138. |
Waterman, et al., 1976, Some biological sequence metrics, Adv. in Math. 20(3):367-387. |
Wellcome Trust Case Control Consortium, 2007, Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls, Nature 447:661-678. |
Wolstencroft, 2005, Panoply of Utilities in Taverna, Proc 2005 1st Int Conf e-Science and Grid Computing 156-162. |
Wolstencroft, 2013, The Taverna Workflow Suite: Designing and Executing Workflows of Web Services on the Desktop, Web or in the Cloud, Nucl Acids Res 41(W1):W556-W561. |
Written Opinion issued in SG 11201601124Y. |
Written Opinion issued in SG 11201602903X. |
Written Opinion issued in SG 11201603039P. |
Written Opinion issued in SG 11201603044S. |
Written Opinion issued in SG 11201605506Q. |
Wu et al., Fast and SNP-tolerant detection of complex variants and splicing in short reads, Bioinformatics, vol. 26 No. 7 2010, pp. 873-881. |
Xing et al., 2006, An expectation-maximization algorithm for probabilistic reconstructions of full-length isoforms from splice graphs, Nucleic Acids Research, 34:3150-3160. |
Yang, 2013, Leveraging reads that span multiple single nucleotide polymorphisms for haplotype inference from sequencing data, Bioinformatics 29(18):2245-2252. |
Yang, 2014, Community detection in networks with node attributes, proc IEEE ICDM '13, arXiv:1401.7267. |
Yanovsky et al., 2008, Read mapping algorithms for single molecule sequencing data, Alg Bioinf 38-49, Springer Berlin. |
Yanovsky, et al., 2008, Read mapping algorithms for single molecule sequencing data, Procs of the 8th Int Workshop on Algorithms in Bioinformatics 5251:38-49. |
Yildiz, 2014, BiFI: a Taverna plugin for a simplified and user-friendly workflow platform, BMC Res Notes 7:740. |
Yu et al., 2007, A tool for creating and parallelizing bioinformatics pipelines, DOD High Performance Computing Conf., 417-420. |
Yu et al., The construction of a tetraploid cotton genome wide comprehensive reference map, Genomics 95 (2010) 230-240. |
Zeng, 2013, PyroHMMvar: a sensitive and accurate method to call short indels and SNPs for Ion Torrent and 454 data, Bioinformatics 29(22):2859-2868. |
Zhang et al., Construction of a high-density genetic map for sesame based on large scale marker development by specific length amplified fragment (SLAF) sequencing. (2013) pp. 1-12, vol. 13, BMC Plant Biology. |
Zhang, 2013, Taverna Mobile: Taverna workflows on Android, EMBnet J 19(8):43-45. |
Zhao, 2012, Why Workflows Break-Understanding and Combating Decay in Taverna Workflows, eScience 2012, Chicago, Oct. 2012. |
Also Published As
Publication number | Publication date |
---|---|
US20210280272A1 (en) | 2021-09-09 |
US20150199473A1 (en) | 2015-07-16 |
WO2015058095A1 (en) | 2015-04-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11837328B2 (en) | | Methods and systems for detecting sequence variants |
US20210280272A1 (en) | | Methods and systems for quantifying sequence alignment |
US20210398616A1 (en) | | Methods and systems for aligning sequences in the presence of repeating elements |
US11211146B2 (en) | | Methods and systems for aligning sequences |
US20220411881A1 (en) | | Methods and systems for identifying disease-induced mutations |
US20190272891A1 (en) | | Methods and systems for genotyping genetic samples |
EP3053073B1 (en) | | Methods and system for detecting sequence variants |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | AS | Assignment | Owner name: SEVEN BRIDGES GENOMICS INC., MASSACHUSETTS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:KURAL, DENIZ;REEL/FRAME:036092/0783 Effective date: 20150401 |
| | AS | Assignment | Owner name: BROWN RUDNICK, MASSACHUSETTS Free format text: NOTICE OF ATTORNEY'S LIEN;ASSIGNOR:SEVEN BRIDGES GENOMICS INC.;REEL/FRAME:044174/0113 Effective date: 20171011 |
| | AS | Assignment | Owner name: MJOLK HOLDING BV, NETHERLANDS Free format text: SECURITY INTEREST;ASSIGNOR:SEVEN BRIDGES GENOMICS INC.;REEL/FRAME:044305/0871 Effective date: 20171013 |
| | AS | Assignment | Owner name: SEVEN BRIDGES GENOMICS INC., MASSACHUSETTS Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:MJOLK HOLDING BV;REEL/FRAME:045928/0013 Effective date: 20180412 |
| | AS | Assignment | Owner name: SEVEN BRIDGES GENOMICS INC., MASSACHUSETTS Free format text: TERMINATION AND RELEASE OF NOTICE OF ATTORNEY'S LIEN;ASSIGNOR:BROWN RUDNICK LLP;REEL/FRAME:046943/0683 Effective date: 20180907 |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: NOTICE OF ALLOWANCE MAILED -- APPLICATION RECEIVED IN OFFICE OF PUBLICATIONS |
| | STPP | Information on status: patent application and granting procedure in general | Free format text: PUBLICATIONS -- ISSUE FEE PAYMENT VERIFIED |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |
| | FEPP | Fee payment procedure | Free format text: ENTITY STATUS SET TO UNDISCOUNTED (ORIGINAL EVENT CODE: BIG.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | AS | Assignment | Owner name: IMPERIAL FINANCIAL SERVICES B.V., NETHERLANDS Free format text: SECURITY INTEREST;ASSIGNOR:SEVEN BRIDGES GENOMICS INC.;REEL/FRAME:059554/0165 Effective date: 20220330 |
| | AS | Assignment | Owner name: IMPERIAL FINANCIAL SERVICES B.V., NETHERLANDS Free format text: SECURITY INTEREST;ASSIGNOR:SEVEN BRIDGES GENOMICS INC.;REEL/FRAME:060173/0803 Effective date: 20220520 Owner name: SEVEN BRIDGES GENOMICS INC., MASSACHUSETTS Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:IMPERIAL FINANCIAL SERVICES B.V.;REEL/FRAME:060173/0792 Effective date: 20220523 |
| | AS | Assignment | Owner name: SEVEN BRIDGES GENOMICS INC., MASSACHUSETTS Free format text: RELEASE BY SECURED PARTY;ASSIGNOR:IMPERIAL FINANCIAL SERVICES B.V.;REEL/FRAME:061055/0078 Effective date: 20220801 |
| | AS | Assignment | Owner name: ORBIMED ROYALTY & CREDIT OPPORTUNITIES III, LP, NEW YORK Free format text: SECURITY INTEREST;ASSIGNORS:PIERIANDX, INC.;SEVEN BRIDGES GENOMICS INC.;REEL/FRAME:061084/0786 Effective date: 20220801 |