WO2008154317A1 - Methods and processes for calling bases in sequence by incorporation methods - Google Patents
- Publication number
- WO2008154317A1 (PCT/US2008/065996)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- pulse
- signal
- spectral
- optical signal
- data
Classifications
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6869—Methods for sequencing
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
- G16B30/10—Sequence alignment; Homology search
Definitions
- the invention is generally directed to processes, and particularly computer implemented processes for analyzing fluorescent signals from sequence by incorporation systems, and for ultimately identifying sequence information of a target nucleic acid sequence. Consequently, the invention is also directed to systems that carry out these processes.
- logic systems and methods such as described herein can include a variety of different components and different functions in a modular fashion. Different embodiments of the invention can include different mixtures of elements and functions and may group various functions as parts of various elements. For purposes of clarity, the invention is described in terms of systems that include many different innovative components and innovative combinations of innovative components and known components. No inference should be taken to limit the invention to combinations containing all of the innovative components listed in any illustrative embodiment in this specification.
- Figure 1 provides a schematic illustration of an overall system used, inter alia, for sequencing by incorporation analyses.
- Figure 2 is an EMCCD derived image of fluorescent signal pulses from an array of sequencing reactions.
- Figure 3 provides a schematic flow-chart of the process used in monitoring sequencing reactions and identifying bases in the sequence.
- Figure 4 is a high level flow chart breakdown of an overall real time sequence by incorporation process.
- Figure 5 is a detailed flow chart illustrating a number of the process steps included in the overall process shown in Figure 4.
- Figure 6 is an EMCCD image of a trans-illuminated array of zero mode waveguides.
- Figure 7 is a close up image of a subset of the trans-illuminated waveguides in the array which also shows the centroid determination for each imaged waveguide.
- Figure 8 illustrates an exemplary 4 channel signal trace derived from an imaged waveguide.
- Figure 9 provides a schematic flow-chart of a pulse recognition process.
- Figure 10 illustrates an integrated trace following base classification of a pulse trace, including base calls.
- Figure 11 is a spectral reference pixel image of a trans-illuminated array of zero mode waveguides according to specific embodiments of the invention.
- Figure 12 is a flow chart illustrating an example method of base calling from a sequencing array using a logic processing system according to specific embodiments of the invention.
- Figure 13 is a diagram illustrating an example method of base calling from a sequencing array using a logic processing system according to specific embodiments of the invention.
- Figure 14 is a flow chart illustrating an example method of determining a reference spectral centroid for a reaction location image according to specific embodiments of the invention.
- Figure 15 is a diagram illustrating an initial gridding step and corrected centroids according to specific embodiments of the invention.
- Figure 16 is a flow chart illustrating an example method of determining an alignment for reaction locations and a central axis for individual reaction locations from multiple reaction location images according to specific embodiments of the invention.
- Figure 17 is a flow chart illustrating an example method of determining high-resolution dye spectral templates and down-sampled individualized spectral templates according to specific embodiments of the invention.
- Figure 18 is a diagram illustrating stitching high-resolution dye spectral templates from multiple captured spectral calibration data according to specific embodiments of the invention.
- Figure 19 is a flow chart illustrating an example method of determining background noise for reaction location images according to specific embodiments of the invention.
- Figure 20 is a flow chart illustrating an example method of data reduction for a reaction location image according to specific embodiments of the invention.
- Figure 21 illustrates an example of four superimposed spectral images representing different dyes imaged from two different reaction locations according to specific embodiments of the invention.
- Figure 22 is a flow chart illustrating an example method of dye trace extraction from a time series of reaction location images according to specific embodiments of the invention.
- Figure 23 is a diagram illustrating using an individualized set of spectral templates for a ZMW to extract four spectral traces using a flux calculation according to specific embodiments of the invention.
- Figure 24 is a diagram illustrating a set of dye-weighted spectral traces (top) and a set of multi-component spectral traces (bottom) extracted from captured data for a reaction location according to specific embodiments of the invention.
- Figure 25 is a flow chart illustrating an example method of pulse detection in a dye trace according to specific embodiments of the invention.
- Figure 26 is a diagram illustrating an example of analysis of one pulse in one spectral trace according to specific embodiments of the invention.
- Figure 27 is a diagram illustrating a number of pulses associated with incorporation events and showing that some pulses with low amplitude are likely to be indicative of actual incorporation events.
- Figure 28 is a flow chart illustrating an example method of multi-timescale pulse detection according to specific embodiments of the invention.
- Figure 29 is a diagram illustrating increasing a time window and reducing a threshold for pulse detection according to specific embodiments of the invention.
- Figure 30 is a diagram illustrating algorithmic branch detection according to specific embodiments of the invention.
- Figure 31 is a flow chart illustrating an example method of a pulse merging analysis according to specific embodiments of the invention.
- Figure 32 is a flow chart illustrating an example method of pulse classification according to specific embodiments of the invention.
- Figure 33 is a diagram illustrating an example method of pulse classification according to specific embodiments of the invention.
- Figure 34 is a flow chart illustrating an example method of determining pulse metrics according to specific embodiments of the invention.
- Figure 35 is a flow chart illustrating an example method of consensus determination according to specific embodiments of the invention.
- Figure 36 is a diagram illustrating an example HMM model used in consensus calling according to specific embodiments of the invention.
- Figure 37 is a diagram illustrating an example HMM model with a branch prediction probability according to specific embodiments of the invention.
- Figure 38 is a diagram illustrating training an HMM with a set of possibly predictive error parameters and a known read sequence of a known DNA sequence template.
- Figure 39 is a diagram illustrating employing an HMM with a set of predictive error parameters, a known read sequence, and an unknown DNA sequence template.
- Figure 40 is a block diagram showing a representative example logic device in which various aspects of the present invention may be embodied.
- the present invention is generally directed to novel processes, and particularly computer implemented processes, software and systems for monitoring and characterizing optical signals from analytical systems, and particularly systems that produce signals related to the sequence of nucleic acids in a target or template nucleic acid sequence, using a sequencing by incorporation process.
- the present invention is also generally directed to novel processes for analyzing optical and associated data from sequencing by incorporation processes to ultimately determine a nucleotide base sequence (also referred to herein as "base calling").
- the present invention is also generally directed to novel processes for analyzing sequencing by incorporation processes from many reaction locations in real time.
- the identity of the sequence of nucleotides in a template nucleic acid sequence is determined by identifying each complementary base that is added to a nascent strand being synthesized against the template sequence, as such bases are added. While detection of added bases may be a result of detecting a byproduct of the synthesis or extension reaction, e.g., detecting released pyrophosphate, in many systems and processes, added bases are labeled with fluorescent dyes that permit their detection. By uniquely labeling each base with a distinguishable fluorescent dye, one attaches a distinctive detectable characteristic to each dye that is incorporated, and as a result provides a basis for identification of an incorporated base, and by extension, its complementary base upon the template sequence.
- a number of sequencing by incorporation methods utilize a solid phase immobilized synthesis complex that includes a DNA polymerase enzyme, a template nucleic acid sequence, and a primer sequence that is complementary to a portion of the template sequence.
- the fluorescently labeled nucleotides are then added to the immobilized complex and, if complementary to the next base in the template adjacent to the primer sequence, they are incorporated onto the 3' end of the primer as an extension reaction product.
- the labeled bases are added under conditions that prevent more than a single nucleotide addition. Typically, this is accomplished through the inclusion of a removable extension terminating group on the 3' position of the added nucleotide, which prevents any further extension reactions.
- the removable terminating group may include the fluorescent label.
- the immobilized complex is interrogated with one or more labeled nucleotide analogs. When a labeled analog is added, the extension reaction stops. The complex is then washed to eliminate all unincorporated labeled nucleotides. Incorporation is then determined based upon the presence of a particular fluorescent label with the immobilized complex, indicating incorporation of the base that was so labeled.
- the removable chain terminating group and the label which in some cases may comprise the same group, are then removed from the extension product and the reaction and interrogation is repeated, stepwise, along the template sequence.
- incorporation events are detected in real-time as the bases are incorporated into the extension product. Briefly, this is accomplished by providing the complex immobilized within an optically confined space or otherwise resolved as an individual molecular complex. Nucleotide analogs that include fluorescent labels coupled to the polyphosphate chain of the analog are then exposed to the complex. Upon incorporation, the nucleotide along with its label is retained by the complex for a time and in a manner that permits its detection that is distinguishable from detection of random diffusion of unincorporated bases. Upon completion of incorporation, all but the alpha phosphate group of the nucleotide is cleaved away, liberating the label from retention by the complex, and diffusing the signal from that label.
- a complementary nucleotide analog including its fluorescent labels is effectively "immobilized" for a time at the incorporation site, and then the fluorescent label is subsequently released and diffuses away when incorporation is completed.
- detecting the localized "pulses" of fluorescent tags at the incorporation site, and distinguishing those pulses from a variety of other signals and background noise as described below, allows the invention to effectively call bases in real time as they are being incorporated.
- the signal resulting from incorporation can have one or more of increased intensity and duration as compared to random diffusion events and/or other non-incorporation events.
- optical signal data is required to be processed to indicate real incorporation events as compared to other signals derived from non-incorporation events, and to identify the bases that are incorporated in those real incorporation events.
- the processing requirements become even greater where multiple different colored labels are used in interrogating larger and larger numbers of immobilized complexes arrayed over reaction substrates.
- the processes and systems will be described with reference to detection of incorporation events in a real time, sequence by incorporation process, e.g., as described in U.S. Patent Nos. 7,056,661, 7,052,847, 7,033,764 and 7,056,676 (the full disclosures of which are incorporated herein by reference in their entirety for all purposes), when carried out in arrays of discrete reaction regions or locations.
- An exemplary sequencing system for use in conjunction with the invention is shown in Figure 1.
- the system 100 includes a substrate 102 that includes a plurality of discrete sources of optical signals, e.g., reaction wells or optical confinements or reaction locations 104.
- reaction locations 104 are regularly spaced and thus substrate 102 can also be understood as an array 102 of reaction locations 104.
- An excitation light source e.g., laser 106
- Emitted signals from source 104 are then collected by the optical components, e.g., objective lens 110, and passed through additional optical elements, e.g., dichroic 108, prism 112 and lens 114, until they are directed to and impinge upon an optical detection system, e.g., detector array (or camera) 116.
- the signals are then detected by detector array 116, and the data from that detection is transmitted to an appropriate data processing unit, e.g., computer 118, where the data is subjected to interpretation, analysis, and ultimately presented in a user ready format, e.g., on display 120, printout 122 from printer 124, or the like, or may be stored in an appropriate database, transmitted to another computer system, or recorded onto tangible media for further analysis and/or later review.
- Connection of the detector to the computer may take on a variety of different forms.
- the detector is coupled to an appropriate Analog to Digital (A/D) converter that is then coupled to an appropriate connector in the computer.
- Such connections may be standard USB connections, FireWire® connections, Ethernet connections or other high speed data connections.
- the detector or camera may be formatted to provide output in a digital format and be readily connected to the computer without any intermediate components.
- each base incorporation event results in a prolonged illumination (or localization) of one of four differentially labeled nucleotides being incorporated, so as to yield a recognizable pulse that carries a distinguishable spectral profile or color.
- the present invention is generally directed to machine or computer implemented processes, and/or software incorporated onto a computer readable medium instructing such processes, as set forth in greater detail below.
- signal data generated by the reactions and optical systems described above is input or otherwise received into a computer or other data processor, and subjected to one or more of the various process steps or components set forth below.
- the resulting output of the computer implemented processes may be produced in a tangible or observable format, e.g., printed in a user readable report, displayed upon a computer display, or it may be stored in one or more databases for later evaluation, processing, reporting or the like, or it may be retained by the computer or transmitted to a different computer for use in configuring subsequent reactions or data processes.
- Computers for use in carrying out the processes of the invention can range from personal computers such as PC or Macintosh® type computers running Intel Pentium or DuoCore processors, to workstations, laboratory equipment, or high speed servers, running UNIX, LINUX, Windows®, or other systems.
- Logic processing of the invention may be performed entirely by general purpose logic processors (such as CPUs) executing software and/or firmware logic instructions; or entirely by special purpose logic processing circuits (such as ASICs) incorporated into laboratory or diagnostic systems or camera systems which may also include software or firmware elements; or by a combination of general purpose and special purpose logic circuits.
- Data formats for the signal data may comprise any convenient format, including digital image based data formats, such as JPEG, GIF, BMP, TIFF, or other convenient formats; video based formats, such as avi, mpeg, mov, rmv, or other video formats, may also be employed.
- the software processes of the invention may generally be programmed in a variety of programming languages including, e.g., Matlab, C, C++, C#, .NET, Visual Basic, Python, JAVA, CGI, and the like.
- Figure 2 provides a single captured image frame of signal data on a portion of an EMCCD, where light portions correspond to optical signals that are incident upon the detector. Over time, these light areas appear as a series of flashes. Where those flashes have the appropriate characteristics, they are identified as an incorporation based signal, and assigned a color, based upon their spectral composition, e.g., as determined from their location on the detector.
- the present invention is directed to automated processes, and machine readable software that instructs such processes, for deciphering the signal data from a detection system that is optically coupled to any of the foregoing reactions, and particularly where such processes identify the incorporation of a nucleotide or nucleotide analog in a template dependent fashion, and identify the label associated with the incorporated analog and by extension, the analog and its complementary base in the template sequence.
- a general flow chart illustrating the processing of signal data is provided in Figure 3.
- signal data is received by the processor at step 300.
- a number of initial calibration operations may be applied at step 302. Some of these initial calibration steps may be performed just once at the beginning of a run or on a more continuous basis during the run. These initial calibration steps can include such things as centroid determination, alignment, gridding, drift correction, initial background subtraction, noise parameter adjustment, frame-rate adjustment, etc. Some of these initial calibration steps, such as binning, may involve communication from the processor back to the detector/camera, as discussed further below.
- spectral trace determination/spectral trace extraction/spectral filters are applied to the initial signal data at step 302. Some or all of this filter step may optionally be carried out at a later point in the process, e.g., after the pulse identification step 304.
- the spectral trace extraction/spectral filters may include a number of noise reduction and other filters as set forth elsewhere herein.
- Spectral trace determination is performed at this stage for many of the example systems discussed herein because the initial signal data received are the light levels, or photon counts, captured by a series of adjacent pixel detectors. For example, in one example system, 14 pixels (or intensity levels) from 14 positions are captured for an individual waveguide at each frame.
- spectral trace extraction may be performed using various analyses, as discussed below, that provide the highest signal-to-noise ratio for each spectral trace.
- methods of the invention may also analyze a single signal derived from the intensity levels at the multiple pixel positions (this may be referred to as a summed spectral signal or a gray-scale spectral signal or an intensity level signal).
- a method according to the invention may analyze the multiple captured pixel data using a statistical model such as a Hidden Markov Model. In present systems, however, determining multiple (e.g., four) spectral traces from the initial signal data has proven a preferred method.
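- As a minimal sketch of the two options described above, the following assumes each frame for a reaction location is a list of pixel intensities and that a per-dye spectral template (relative response at each pixel) is already available from calibration. The function names, the template values, and the use of a simple normalized weighted sum are illustrative assumptions, not the extraction method of the specification.

```python
from typing import Dict, List, Sequence


def summed_signal(frames: Sequence[Sequence[float]]) -> List[float]:
    """Collapse each frame's pixel intensities into one gray-scale value."""
    return [sum(frame) for frame in frames]


def dye_weighted_traces(
    frames: Sequence[Sequence[float]],
    templates: Dict[str, Sequence[float]],
) -> Dict[str, List[float]]:
    """Project each frame onto each dye's normalized spectral template.

    `templates` maps a hypothetical dye name to its relative response at
    each pixel position; every template must have one entry per pixel.
    """
    traces: Dict[str, List[float]] = {}
    for dye, template in templates.items():
        total = sum(template)
        if total <= 0:
            raise ValueError(f"template for {dye!r} has no positive weight")
        weights = [w / total for w in template]
        traces[dye] = [
            sum(w * p for w, p in zip(weights, frame)) for frame in frames
        ]
    return traces


# Toy example with 4 pixels per frame (the text mentions 14 in one system).
example_frames = [[10, 10, 0, 0], [0, 0, 4, 6]]
example_templates = {"dye1": [1, 1, 0, 0], "dye2": [0, 0, 1, 1]}
example_traces = dye_weighted_traces(example_frames, example_templates)
```

- A weighted sum of this kind does not correct for spectral overlap between dyes; where templates overlap, a fit that accounts for all dyes at once would likely be needed.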
- Whether the signal can be categorized as a significant signal pulse or event is determined at step 304.
- various statistical analysis techniques may be performed in determining whether a significant pulse has been detected.
- a further optional spectral profile comparison may be performed to verify the spectral assignment.
- This spectral profile comparison is optional in embodiments where spectral traces are determined prior to or during pulse identification.
- Once a color is assigned to a given incorporation signal, that assignment is used to call either the base incorporated, or its complement in the template sequence, at step 308.
- the compilation of called bases is then subjected to additional processing at step 310, to provide linear sequence information, e.g., the successive sequence of nucleotides in the template sequence, assemble sequence fragments into longer contigs, or the like.
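- The step from an assigned color to a called base, and from there to the template sequence, can be sketched as follows. The channel-to-base mapping is hypothetical (the actual dye assignment depends on the labeled analogs used), and the helper names are illustrative only.

```python
from typing import Dict, Sequence

COMPLEMENT: Dict[str, str] = {"A": "T", "C": "G", "G": "C", "T": "A"}

# Hypothetical assignment of spectral channels to labeled bases.
CHANNEL_TO_BASE: Dict[int, str] = {0: "A", 1: "C", 2: "G", 3: "T"}


def call_bases(
    pulse_channels: Sequence[int],
    channel_to_base: Dict[int, str] = CHANNEL_TO_BASE,
) -> str:
    """Translate the channel assigned to each pulse into an incorporated base."""
    return "".join(channel_to_base[channel] for channel in pulse_channels)


def complement_in_order(incorporated: str) -> str:
    """Template bases in the order they were read (template 3' to 5')."""
    return "".join(COMPLEMENT[base] for base in incorporated)


def reverse_complement(incorporated: str) -> str:
    """Template sequence written in the conventional 5' to 3' direction."""
    return complement_in_order(incorporated)[::-1]
```

- For example, pulses assigned to channels 0, 1, 1, 3 would be called as the nascent strand `ACCT` under this mapping, corresponding to template bases `TGGA` in read order.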
- the signal data is input into the processing system, e.g., an appropriately programmed computer or other processor.
- Signal data may be input directly from a detection system, e.g., for real time signal processing, or it may be input from a signal data storage file or database. In some cases, e.g., where one is seeking immediate feedback on the performance of the detection system, adjusting detection or other experimental parameters, real-time signal processing will be employed.
- signal data is stored from the detection system in an appropriate file or database and is subject to processing in post reaction or non-real time fashion.
- the signal data used in conjunction with the present invention may be in a variety of forms.
- the data may be numerical data representing intensity values for optical signals received at a given detector or detection point of an array based detector.
- Signal data may comprise image data from an imaging detector, such as a CCD, EMCCD, ICCD or CMOS sensor.
- signal data used according to specific embodiments of the invention generally includes both intensity level information and spectral information. In the context of separate detector elements, such spectral information will generally include identification of the location or position of the detector portion (e.g., a pixel) upon which an intensity is detected.
- the spectral image data will typically be the data derived from the image data that correlates with the calibrated spectral image data for the imaging system and detector when the system includes spectral resolution of overall signals, e.g., as shown in Figure 1.
- the spectral data may be obtained from the image data that is extracted from the detector, or alternatively, the derivation of spectral data may occur on the detector such that spectral data will be extracted from the detector.
- optical signal that is detected by the detection system that is not the result of a signal from an incorporation event.
- such signal, referred to hereafter as “noise”, may derive from a number of sources that may be internal to the monitored reaction, internal to the detection system and/or external to all of the above.
- noise internal to the reaction being monitored includes, e.g.: presence of fluorescent labels that are not associated with a detection event, e.g., liberated labels, labels associated with unincorporated bases diffusing in solution, bases associated with the complex but not incorporated; presence of multiple complexes in an individual observation volume or region; non-specific adsorption of dyes or nucleotides to the substrate or enzyme complex within an observation volume; contaminated nucleotide analogs, e.g., contaminated with other fluorescent components; other reaction components that may be weakly fluorescent; spectrally shifting dye components, e.g., as a result of reaction conditions; and the like.
- Sources of noise internal to the detection system, but outside of the reaction mixture can include, e.g., reflected excitation radiation that bleeds through the filtering optics; scattered excitation or fluorescent radiation from the substrate or any of the optical components; spatial cross-talk of adjacent signal sources; auto-fluorescence of any or all of the optical components of the system; read noise from the detector, e.g., CCDs, gain register noise, e.g., for EMCCD cameras, and the like.
- Other system derived noise contributions can come from data processing issues, such as background correction errors, focus drift errors, autofocus errors, pulse frequency resolution, alignment errors, and the like. Still other noise contributions can derive from sources outside of the overall system, including ambient light interference, dust, and the like.
- noise components contribute to the background photons underlying any signal pulses that may be associated with an incorporation event. As such, the noise level will typically form the limit against which any signal pulses may be determined to be statistically significant.
- Identification of noise contribution to overall signal data may be carried out by a number of methods, including, for example, signal monitoring in the absence of the reaction of interest, where any signal data is determined to be irrelevant.
- a baseline signal is estimated and subtracted from the signal data that is produced by the system, so that the noise measurement is made upon and contemporaneously with the measurements on the reaction of interest.
- Generation and application of the baseline may be carried out by a number of means, which are described in greater detail below.
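- One simple way to estimate and subtract a baseline from the same trace that carries the pulses is a running median, sketched below. This is an illustrative choice rather than the method of the specification; the window size is an assumed parameter and must be long relative to the pulses so that the median tracks background rather than signal.

```python
from statistics import median
from typing import List, Sequence


def running_median_baseline(trace: Sequence[float], half_window: int) -> List[float]:
    """Estimate the baseline at each frame as the median of nearby frames."""
    n = len(trace)
    baseline = []
    for i in range(n):
        lo = max(0, i - half_window)          # clip the window at the trace edges
        hi = min(n, i + half_window + 1)
        baseline.append(median(trace[lo:hi]))
    return baseline


def subtract_baseline(trace: Sequence[float], half_window: int = 50) -> List[float]:
    """Return the trace with the running-median baseline removed."""
    baseline = running_median_baseline(trace, half_window)
    return [value - base for value, base in zip(trace, baseline)]
```

- With a flat background of 10 counts and a single-frame spike to 30, the corrected trace is zero everywhere except for a value of 20 at the spike.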
- signal processing methods distinguish between noise, as broadly applied to all non-significant pulse based signal events, and significant signal pulses that may, with a reasonable degree of confidence, be considered to be associated with, and thus can be tentatively identified as, an incorporation event.
- a signal event is first classified as to whether it constitutes a significant signal pulse based upon whether such signal event meets any of a number of different pulse criteria. Once identified or classified as a significant pulse, the signal pulse may be further assessed to determine whether the signal pulse constitutes an incorporation event and may be called as a particular incorporated base.
- the basis for calling a particular signal event as a significant pulse, and ultimately as an incorporation event will be subject to a certain amount of error, based upon a variety of parameters as generally set forth herein.
- the aspects of the invention that involve classification of signal data as a pulse, and ultimately as an incorporation event or an identified base are subject to the same or similar errors, and such nomenclature is used for purposes of discussion and as an indication that it is expected with a certain degree of confidence that the base called is the correct base in the sequence, and not as an indication of absolute certainty that the base called is actually the base in a given position in a given sequence.
- One such signal pulse criterion is the ratio of the signals associated with the signal event in question to the level of all background noise ("signal to noise ratio" or "SNR"), which provides a measure of the confidence or statistical significance with which one can classify a signal event as a significant signal pulse.
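- A signal to noise ratio of this kind could be computed as sketched below. The particular definition used here (mean pulse level above mean background, divided by the standard deviation of the background) is one common convention and is an assumption; the text does not fix a formula.

```python
from statistics import mean, pstdev
from typing import Sequence


def pulse_snr(
    pulse_samples: Sequence[float],
    background_samples: Sequence[float],
) -> float:
    """Mean pulse height above background, in units of background noise."""
    noise = pstdev(background_samples)
    if noise == 0:
        raise ValueError("background has zero variance; SNR is undefined")
    return (mean(pulse_samples) - mean(background_samples)) / noise
```

- For a background alternating between 8 and 12 counts (mean 10, standard deviation 2) and a pulse at 30 counts, this gives an SNR of 10.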
- In distinguishing a significant pulse signal from systematic or other noise components, the signal generally must exceed a signal threshold level in one or more of a number of metrics, including, for example, signal intensity, signal duration, temporal signal pulse shape, pulse spacing, and pulse spectral characteristics.
- signal data may be input into the processing system. If the signal data exceeds a signal threshold value in one or more of signal intensity and signal duration, it may be deemed a significant pulse signal.
- the signal may be compared against such metrics in identifying a particular signal event as a significant pulse.
- this comparison will typically involve at least one of the foregoing metrics, and preferably at least two such thresholds, and in many cases three or all four of the foregoing thresholds in identifying significant pulses.
- Signal threshold values whether in terms of signal intensity, signal duration, pulse shape, spacing or pulse spectral characteristics, or a combination of these, will generally be determined based upon expected signal profiles from prior experimental data, although in some cases, such thresholds may be identified from a percentage of overall signal data, where statistical evaluation indicates that such thresholding is appropriate. In particular, in some cases, a threshold signal intensity and/or signal duration may be set to exclude all but a certain fraction or percentage of the overall signal data, allowing a real-time setting of a threshold. Again, however, identification of the threshold level, in terms of percentage or absolute signal values, will generally correlate with previous experimental results. In alternative aspects, the signal thresholds may be determined in the context of a given evaluation.
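- Setting a threshold so that all but a chosen fraction of the signal data is excluded can be sketched as a simple rank-based cut. The function name and the `keep_fraction` parameter are illustrative assumptions; the fraction itself would, per the text, generally be chosen from prior experimental results.

```python
import math
from typing import Sequence


def fraction_threshold(samples: Sequence[float], keep_fraction: float) -> float:
    """Return the intensity at or above which `keep_fraction` of samples lie."""
    if not samples:
        raise ValueError("no samples supplied")
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")
    ordered = sorted(samples, reverse=True)
    n_keep = max(1, math.floor(keep_fraction * len(ordered)))
    return ordered[n_keep - 1]
```

- Samples greater than or equal to the returned value are retained; tied values at the cut can make the retained fraction slightly larger than requested.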
- a pulse intensity threshold may be based upon an absolute signal intensity, but such threshold would not take into account variations in signal background levels, e.g., through reagent diffusion, that might impact the threshold used, particularly in cases where the signal is relatively weak compared to the background level.
- the methods of the invention determine the background fluorescence of the particular reaction in question, including, in particular, the contribution of freely diffusing dyes or dye labeled analogs into a zero mode waveguide, and set the signal threshold above that actual background by the desired level, e.g., as a ratio of pulse intensity to background fluorophore diffusion, or by statistical methods, e.g., 5 sigma, or the like.
- By reaction background is meant the level of background signal specifically associated with the reaction of interest and that would be expected to vary depending upon reaction conditions, as opposed to systemic contributions to background, e.g., autofluorescence of system or substrate components, laser bleedthrough, or the like.
- identification of a significant signal pulse may rely upon a signal profile that traverses thresholds in both signal intensity and signal duration. For example, when a signal is detected that crosses a lower intensity threshold in an increasing direction, ensuing signal data from the same set of detection elements, e.g., pixels, are monitored until the signal intensity crosses the same or a different intensity threshold in the decreasing direction. Once a peak of appropriate intensity is detected, the duration of the period during which it exceeded the intensity threshold or thresholds is compared against a duration threshold. Where a peak comprises a sufficiently intense signal of sufficient duration, it is called as a significant signal pulse.
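- The two-threshold test described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name, the parameter names, and the choice to measure duration in frames are assumptions made for the example.

```python
def find_pulses(trace, rise_threshold, fall_threshold, min_frames):
    """Return (start, end) frame index pairs for significant pulses.

    A candidate opens when the signal reaches rise_threshold and closes when
    it drops below fall_threshold (which may equal rise_threshold). The
    candidate is kept only if it lasted at least min_frames frames.
    """
    pulses = []
    start = None
    for i, value in enumerate(trace):
        if start is None:
            if value >= rise_threshold:
                start = i  # upward crossing of the intensity threshold
        elif value < fall_threshold:
            if i - start >= min_frames:  # duration threshold
                pulses.append((start, i))
            start = None
    # A pulse still open at the end of the trace is closed at the last frame.
    if start is not None and len(trace) - start >= min_frames:
        pulses.append((start, len(trace)))
    return pulses


trace = [0, 1, 6, 7, 6, 1, 0, 8, 0, 5, 5, 5, 5]
print(find_pulses(trace, rise_threshold=5, fall_threshold=3, min_frames=3))
# prints [(2, 5), (9, 13)]; the single-frame spike at index 7 is rejected
```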
- Figure 3 provides a schematic of the process flow for an exemplary pulse classification aspect of the invention.
- pulse classification may employ a number of other signal parameters in classifying pulses as significant.
- signal parameters include, e.g., pulse shape, spectral profile of the signal, e.g., pulse spectral centroid, pulse height, pulse diffusion ratio, pulse spacing, total signal levels, and the like.
- signal data may be correlated to a particular signal type.
- this typically denotes a particular spectral profile of the signal giving rise to the signal data.
- the optical detection systems used in conjunction with the methods and processes of the invention are generally configured to receive optical signals that have distinguishable spectral profiles, where each spectrally distinguishable signal profile may generally be correlated to a different reaction event.
- each spectrally distinguishable signal may be correlated to or indicative of a specific nucleotide incorporated or present at a given position of a nucleic acid sequence.
- the detection systems include optical trains that receive such signals and separate the signals based upon their spectra. The different signals are then directed to different detectors, to different locations on a single array based detector, or are differentially imaged upon the same imaging detector (See, e.g., U.S. Patent Publication No. 2007/0036511, which is incorporated herein by reference in its entirety for all purposes).
- the detection systems used in conjunction with the invention utilize an imaging detector upon which all or at least several of the different spectral components of the overall signal are imaged in a manner that allows distinction between different spectral components.
- multiple signal components are directed to the same overall detector, but may be incident upon wholly or partly different regions of the detector, e.g., imaged upon different sets of pixels in an imaging detector, and give rise to distinguishable spectral images (and associated image data).
- spectra or spectral image generally indicates a pixel image or frame (optionally data reduced to one dimension) that has multiple intensities caused by the spectral spread of an optical signal received from a reaction location.
- the spectral classification, or identification of a color, associated with a given signal image on a detector may be accomplished by a number of methods.
- a spectral image associated with a given signal (which may or may not be an incorporation event signal) is compared to a standard set of spectral image profiles associated with the signal events for which the system is being interrogated. Restated, a standard set of spectral image profiles are determined for the labels associated with the four different nucleotides and/or incorporation of those nucleotides, and those standards are used as comparators in identifying to which color a given unknown spectral image corresponds.
- a signal source such as a reaction region is illuminated while containing only the individual fluorescent labels or fluorescently labeled nucleotide analogs of one dye color that give rise to signals during the monitored reaction, e.g., in the absence of the reaction complex.
- the spectral image for each color of dye is then stored for use in the later comparison with the spectral images from actual reaction derived signals. This standard set is then used as the comparator in identifying whether the spectral image from an actual signal event can be assigned to a given color with an acceptable level of confidence, and if so, what that color is.
- the spectral profiles may be determined based upon theoretical models of the optical system and the emission spectra of the signal producing reagents, e.g., labeled nucleotides, without the need for empirical determination of the standard spectral images.
- the comparison of a given signal's spectral image to the standard spectral image profiles for the various colors of signals will assess the confidence with which a color may be assigned to a given signal event, based upon a number of parameters.
- whether a given spectral image is identified as matching one of the standard spectral image profiles may be determined by subjecting the comparison to any of a variety of statistical correlation evaluations including, e.g., cross-correlation tests, χ², least squares fit, and the like.
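- As one illustration of such a comparison, the sketch below scores an observed spectrum against each standard profile with a chi-square statistic and assigns a color only when the best fit is acceptable. The four-pixel templates, the per-pixel variances, the acceptance cutoff and all names are hypothetical values chosen for the example.

```python
def chi_square(observed, template, variance):
    """Chi-square of the best-scaled template against the observed spectrum."""
    # Least-squares amplitude that scales the template to the data.
    num = sum(o * t / v for o, t, v in zip(observed, template, variance))
    den = sum(t * t / v for t, v in zip(template, variance))
    amplitude = num / den
    return sum((o - amplitude * t) ** 2 / v
               for o, t, v in zip(observed, template, variance))


def classify_spectrum(observed, standards, variance, max_chi2):
    """Return (label, chi2) of the best standard, or (None, chi2) if no fit is acceptable."""
    scores = {label: chi_square(observed, tpl, variance)
              for label, tpl in standards.items()}
    best = min(scores, key=scores.get)
    if scores[best] > max_chi2:
        return None, scores[best]
    return best, scores[best]


# Hypothetical standard spectral profiles, one per dye color.
standards = {
    "A": [0.7, 0.3, 0.0, 0.0],
    "C": [0.1, 0.6, 0.3, 0.0],
    "G": [0.0, 0.2, 0.6, 0.2],
    "T": [0.0, 0.0, 0.3, 0.7],
}
variance = [0.01, 0.01, 0.01, 0.01]

label, chi2 = classify_spectrum([0.0, 0.4, 1.2, 0.4], standards, variance, max_chi2=10.0)
print(label)  # the observed spectrum is an exact multiple of the "G" profile
```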
- the steps of incorporation signal identification and color assignment may be performed in either order and are not dependent upon each other. Restated, one may first assign a color to the signal before categorizing it as a significant pulse, or alternatively, one may first categorize a signal as a significant pulse and then assign a color to that pulse.
- Once a particular signal is identified as a significant pulse and is assigned a particular spectrum, the spectrally assigned pulse may be further assessed to determine whether the pulse can be called an incorporation event and, as a result, call the base incorporated in the nascent strand, or its complement in the template sequence. Calling of bases from color assigned pulse data will typically employ tests that again identify the confidence level with which a base is called.
- Such tests will take into account the data environment in which a signal was received, including a number of the same data parameters used in identifying significant pulses, etc.
- such tests may include considerations of background signal levels, adjacent pulse signal parameters (spacing, intensity, duration, etc.), spectral image resolution, and a variety of other parameters.
- Such data may be used to assign a score to a given base call for a color assigned signal pulse, where such scores are correlative of a probability that the base called is incorrect, e.g., 1 in 100 (99% accurate), 1 in 1000 (99.9% accurate), 1 in 10,000 (99.99% accurate), 1 in 100,000 (99.999% accurate), or even greater.
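- The text states only that scores correlate with the probability of an incorrect call. Purely as an assumption for illustration, the sketch below uses a logarithmic score of the form -10·log10(p), under which an error rate of 1 in 100 maps to 20 and 1 in 1000 maps to 30, and then filters calls by a minimum score. The function names and the example calls are hypothetical.

```python
import math


def error_probability_to_score(p_error):
    """Map an error probability to a log-scaled score (assumed convention)."""
    return -10.0 * math.log10(p_error)


def filter_calls(calls, min_score):
    """Keep the bases whose score meets min_score.

    `calls` is a list of (base, p_error) pairs.
    """
    return [base for base, p_error in calls
            if error_probability_to_score(p_error) >= min_score]


calls = [("A", 0.001), ("C", 0.05), ("G", 0.0001)]
print(filter_calls(calls, min_score=20.0))  # "C" is dropped: 1 in 20 is worse than 1 in 100
```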
- scores may be used to provide an indication of accuracy for sequencing data and/or filter out sequence information of insufficient accuracy.
- the optically confined complexes are provided within the observation volumes of discrete zero mode waveguide (ZMW) cores in an arrayed format.
- the methods of the invention in whole or in part may be applicable to other types of sequencing by incorporation reactions and particularly those based upon immobilized reaction complexes, and more particularly, those employing optically resolvable single molecule complexes, e.g., including a single polymerase/template/primer complex.
- An example of an overall sequence process comprised of three general process categories is generally shown in Figure 4.
- the process includes an initial calibration step 400, in which the system is calibrated from both an instrument standpoint and a reaction standpoint, e.g., to adjust for consistent noise sources, and calibrated for color identification, e.g., using standard dye or label sets.
- initial signal processing e.g., to identify significant signal events or pulses and to extract spectral signals from the signal data.
- classified pulses are then assessed at step 404 to classify signal data as corresponding to a given spectral signal event, and consequently as corresponding to incorporation of a particular base, or its complement.
- This exemplary, overall process is schematically illustrated in greater detail in the flow chart of Figure 5.
- the system is initially calibrated to provide identification of the signals associated with each discrete signal source, to identify within that signal the most relevant signal portion, and to associate spectral information with different signal profiles.
- This calibration process, described in greater detail below, is indicated at step 500, and typically precedes and supplements the processing of signal data from actual sequencing runs.
- the signal image or movie files for a given run are converted to spectral data at step 504 by comparing the overall signal data to the spectral standards created in step 500.
- signals received from each waveguide are converted to two dimensional time-series or one dimensional spectral time series traces.
- the output of the conversion or extraction step is a series of individual movies or traces that indicate the different spectral signal components over time, e.g., as a series of n signal traces. For a typical four-color sequencing process, this will typically result in four different traces, where each trace represents the signal spectrum correlated with a different standard spectral image profile.
- the pulse recognition process identifies significant signal pulses (e.g., pulses that meet criteria of significance for assessment to determine if they are associated with an incorporation event) in each trace, and distinguishes those from background or noise signals, e.g., those resulting from normal diffusion of unincorporated label molecules or labeled nucleotides into the observation volume, non-specific adsorption of labels or analogs within or near the observation volume, or the like.
- the pulse recognition process identifies significant pulses based upon a number of signal characteristics as described above, including whether such signals meet signal thresholds described above (intensity, duration, temporal pulse shape, pulse spacing and spectral characteristics).
- the time collapsed spectrum for a given significant pulse is extracted and classified at step 508 by correlating the pulse spectrum to the standard spectral image for the various signal possibilities, e.g., dye colors, by comparing the pulse spectrum to the standard spectra, based upon one or a number of different pulse metrics, as set forth elsewhere herein.
- the statistical significance of the fit of the pulse spectral image may be calculated against those spectral images for the 4 different standard dye images, e.g., using a χ² test, or the like.
- the pulse is then subjected to the base classification process at step 510, where the spectrum assigned pulse data is further filtered based upon one or more of a number of signal parameters, which provide a basis of classification of the signal as a particular base (also referred to herein as a base classifier).
- the base classifier will typically comprise an algorithm that assesses the one or more signal parameters in order to classify the particular pulse as being correlative of a given base incorporation event.
- such algorithms will typically comprise a multi-parameter fit process to determine whether a spectrum assigned signal pulse corresponds to an incorporation event within a selected probability range, as described in greater detail, below.
- the processes of the invention are particularly useful in processing signal data from arrays of optically confined sequencing or other optically monitored reactions.
- the systems and processes of the invention are particularly preferred for use with arrays of zero mode waveguides in which polymerase mediated, template directed primer extension reactions are occurring, where the addition of a nucleotide to an extending primer gives rise to a fluorescent signaling event.
- the signals emanating from the various signal sources on the array are then imaged onto an imaging detector, such as a CCD, ICCD, EMCCD or CMOS based detector array.
- In locating the image of the different signal sources on the detector, the array is typically illuminated so as to provide an imaged signal associated with it on the detector.
- In the case of an array of ZMWs (zero mode waveguides), the array is trans-illuminated through the waveguide using a reference light source.
- the reference light source may be a broad band light source imaged onto the detector through a narrow band-pass filter, e.g., 543 nm, as shown in step 552 in Figure 5, or may be an emitter of narrow-band light.
- Such trans-illumination provides for high signal to noise levels for the image, allowing for more accurate centroid estimation for gridding, more accurate point spread function estimation, and provides a fixed spectral reference point for each imaged waveguide.
- the imaged signals are then aligned to the known spacing of image sources on the waveguide array optionally employing registration marks incorporated into the array.
- registration marks might include regularly spaced image sources that are separate from the waveguides, but are at known locations and spacing relative to the waveguides in the array to permit alignment of the image to the array.
- image sources may include apertures like waveguides, or may include fluorescent or luminescent marks that provide a signal event that can be used for alignment.
- the gridding step also permits the identification and calibration of the system to take into account any artifacts in a given waveguide array, e.g., blank waveguides other than registration blanks, irregularly spaced waveguides, or the like.
- Figure 6 is an image of a trans-illuminated waveguide array that includes four rows of approximately 100 waveguides per row. Alignment may additionally be aided by feature recognition software that correlates signal peaks in the trans-illuminated array image with the known relative locations of waveguides on an array.
- each waveguide image in the intra-row dimension is determined.
- a multi-point spread function is fit to each image to identify the image centroids in the intra-row dimension (See step 504 in Figure 5).
- Figure 7 provides a close-up image of a series of trans-illuminated waveguide images in a given row, showing the identified centroids for each image.
- Also shown in Figure 7 is a blank space that is used in registration of the overall array image, as discussed above.
- the image positions are then used in subsequent processing to associate a given set of signal data to a given waveguide, and consequently a given extension reaction, and to permit collapsing of image data from a given location to spectral images from that location.
- the range of the full imaged spectra for each waveguide is identified, and this range may be communicated to detector 116 to allow binning and other data reduction operations to be performed prior to extraction from the detector.
- an image of each ZMW typically has its narrowest dimension along the axis of the specific row in which the waveguide is disposed (and that is orthogonal to the axis of the spectrally separated image).
- the row axis is termed the spatial axis
- the axis that runs through the elongated image of the spectral components of the waveguide is termed the spectral axis.
- the spatial axis dimension of an image will fall within a 5 pixel range, while the spectral axis will typically fall within a pixel range of from about 12 pixels to about 20 or more pixels, depending upon the extent of spectral separation of the image, and the size of the image in the first instance.
- the pixels corresponding to the full spectral image of a given waveguide may range from 60 to 100 or more pixels in a rectangular area.
- These pixels that are associated with each waveguide are optionally combined (binned) by the detector 116 prior to further analysis. This combination may optionally be performed upon full image data extracted from a detector, e.g., a software process, or it may be performed within the detector, e.g., a firmware process, and output to the software process.
- the software based process has a number of advantages, including, e.g.: minimizing data loss during the acquisition of the image data or movie; maximizing the signal to noise ratio of pulses based on establishing flux distribution around each waveguide in spatial dimension and noise characteristics of detector and doing appropriate weighted sum of pixel intensities; detecting and compensating for any instrument drift during movie acquisition; allowance for algorithmically distinguishing some instrument systematic artifacts such as Clock Induced Charge (CIC) noise and cosmic ray events on the CCD from the signals of interest based upon the two dimensional images being processed; the ability to estimate and potentially correct spatial cross-talk between ZMWs.
- Certain disadvantages of the software process include, e.g.: a decrease in the maximum frame rate of the detector camera, as more pixels are read out from the camera, reducing the ability to detect shorter pulses; and increased instrument noise compared with firmware processes described below, resulting from read-noise that is associated with each pixel that is read out.
- a weighted sum of pixels along the spatial axis of the ZMW is performed. Weights that maximize the SNR are determined by the inverse variance of each pixel.
- the first step is to estimate ZMW flux distribution shape in the spatial dimension for each line of ZMWs. This shape for individual ZMW signals in the transillumination phase is identical (governed by the instrument Point Spread Function (PSF)). This will typically provide a good estimate of the PSF for subsequent analyses, e.g., in a sequencing movie.
- the regular nature of the grid allows for accurate estimates in the spatial dimension by summing lines centered along a line of ZMWs. By subtracting adjacent lines an accurate local background correction can be made to leave a one dimensional intensity profile of ZMWs whose shapes are governed entirely by the instrument point spread function.
- the instrument PSF can be modeled (e.g., by a Gaussian or Moffat function). Fits to the one dimensional profile of a line of ZMWs may solve for all ZMW amplitudes, the ZMW spacing, and a PSF width. These fits can also solve for more parameters (e.g. a polynomial model of PSF width as a function of FOV position) in order to account for second order effects, such as variation of the optical PSF across the field of view (FOV) of the camera or variations in chip geometry.
- the variance of the pixel signal intensity for a given camera is also determined.
- this relationship is predictable and measurable being governed by the Shot noise on the detected photons and the CCD read-noise, as well as variances from the gain register for, e.g., EMCCDs.
- These CCD parameters are typically estimated from the calibration data using static signal data taken at different intensities, but they can also be measured from stable pixels (pulse free) in a sequencing movie. Using the PSF estimate and the signal-variance relationship the CCD pixels are weighted-summed by their inverse variance to maximize the SNR in the collapsed spectrum.
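- A minimal sketch of such a weighted sum follows. It assumes a spatial profile normalized to unit sum (standing in for the PSF estimate) and a simple variance model of shot noise, scaled by an excess-noise factor, plus read noise. Each pixel is weighted by its expected share of the flux divided by its variance, which is one standard way to realize an inverse-variance weighted sum. The function names, parameters and noise model are assumptions for illustration, not taken from the text.

```python
def pixel_variance(expected_signal, read_noise=1.0, excess_noise=1.0):
    """Assumed noise model: scaled shot noise plus read noise, in counts squared."""
    return excess_noise * max(expected_signal, 0.0) + read_noise ** 2


def collapse_spatial(image, profile, read_noise=1.0, excess_noise=1.0):
    """Collapse each row (one spectral pixel) of `image` to a single flux value.

    `image` is a list of rows, each row holding the pixels along the spatial
    axis; `profile` is the PSF along that axis, normalized to sum to 1.
    """
    spectrum = []
    for row in image:
        rough_flux = sum(row)  # first-pass estimate, used only for the variances
        num = den = 0.0
        for value, share in zip(row, profile):
            var = pixel_variance(rough_flux * share, read_noise, excess_noise)
            num += share * value / var
            den += share * share / var
        spectrum.append(num / den)
    return spectrum


profile = [0.1, 0.2, 0.4, 0.2, 0.1]
image = [[10.0, 20.0, 40.0, 20.0, 10.0],   # noise-free pulse with total flux 100
         [0.0, 0.0, 0.0, 0.0, 0.0]]        # empty spectral pixel
print(collapse_spatial(image, profile))
```

- For noise-free input that follows the profile exactly, the estimate returns the total flux. With real data the weights suppress the low-signal wings, where read noise dominates.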
- this binning or data reduction process reduces a two-dimensional pixel image for each ZMW into a one-dimensional line of pixel values.
- the differences in pixels along this line are due to spectral refraction as described herein.
- each pixel of this line is at times herein referred to as a spectral pixel of a ZMW and the line of pixels is at times herein referred to as a spectra of the ZMW.
- the binning of spectral images from each waveguide is carried out on the detector (e.g., a camera chip in a firmware controlled process).
- a high resolution calibration image is taken and used to establish a map for on-chip binning in the spatial axis of the CCD in hardware to essentially read out spectra directly from the camera.
- this process provides benefits of: reading out fewer pixels, allowing for increased maximum frame rate, for increased sensitivity to the shorter timescale pulses; fewer pixels imaged per ZMW and therefore less instrument read-noise; and less data storage of raw output from camera.
- certain disadvantages of this process include, e.g.: potential for data loss during acquisition from pixels not binned; lower signal to noise ratios per pulse (if pixels are not in regime where read-noise dominates); instrument stability and/or dynamic drift correction must be done by the instrument during acquisition, rather than as a software correction; a reduced ability to distinguish instrument artifacts such as CIC noise and cosmic rays based on spatial profile; and reduced ability to account for and remediate spatial cross-talk between ZMWs, due to loss of spatial information in the image.
- the location of ZMW signals is determined from a full illumination frame, and on-camera (or "on-chip") binning sums (in the spatial direction) only those CCD lines associated with a line of ZMW holes which contains the majority of the signal and reads out only those lines during the actual movie acquisition.
- the optimal binning strategy is the one that maximizes the SNR of pulses from each hole.
- the system is also calibrated for the image spectra from each source.
- signals associated with each of the different incorporated bases have a distinguishable spectrum.
- the system used in the preferred sequencing process e.g., as schematically illustrated in Figure 1, through the use of a wedge prism or other similar optical train component(s), directs each spectrally different signal component from a given waveguide, to a different spot on the detector, where each type of signal from a given waveguide, is separated from the others in one dimension.
- a wedge prism will deflect spectrally different signal components to a greater or lesser degree, depending upon their wavelengths, and image those components on different portions of the detector (See, e.g., U.S. Patent Publication No. 2007/0036511, previously incorporated herein by reference).
- In order to identify a signal from a sequencing reaction as corresponding to a given label, the system must first be calibrated to the shape and location on the detector at which each spectrally different signal component from each waveguide will be imaged, e.g., as shown at step 556 of Figure 5.
- the waveguide array is provided with a standard reference label that may include each pure dye, a pure labeled nucleotide, or another relevant pure component, e.g., a polymerase/template/primer complexed labeled nucleotide.
- the signal of each pure label compound is then imaged upon the detector and its location is mapped. This is repeated for each different label that is to be used in a reaction. For a typical sequencing operation, this would include the four different labels used in identifying each of the four different nucleotides. The result is a spectral template or map for the overall array and for the various different labels to be used in a sequencing operation.
- the calibration spectra will be taken at different locations on a waveguide array, e.g., from different waveguides, than analytical reads, e.g., sequencing movies.
- the positions of the spectral images are measured relative to the reference wavelength used in the transillumination phase, above, which can then be used to correlate spectral images from different waveguide locations obtained during a sequencing movie.
- These spectral templates can then be aligned to the different locations on the CCD as given by the centroids of the transillumination image in the sequencing movie.
- the spectra as seen on the CCD are coarsely sampled and the spectral shape is sensitive to the subpixel centroid location of the ZMW within the image.
- the calibration spectra are taken at multiple subpixel centroids, e.g., 0.1 pixel samplings. These can then be combined into a much higher resolution spectrum than a single image can provide. With a subpixel spectral reference centroid estimated from a ZMW's transillumination image, this high resolution spectrum can then be accurately downsampled to account for any pixelation of the camera.
- Due to potential distortions in the optics, e.g., coma, chromatic and spherical aberration, one may also obtain calibration spectra across the field of view of the chip. In this way, a unique spectrum is used from the calibration data for a ZMW's position, thereby accounting for spatially dependent effects that may arise from the optics.
- the detector is calibrated by providing for an imaging step while the shutter is closed, to ascertain and calibrate for any noise that may be deriving from the detector itself.
- an overall system noise calibration step may be performed in the absence of any fluorescent or other labeling components within the waveguides in the array, to ascertain and calibrate for noise that derives from the system as a whole, e.g., auto-fluorescence of the optical train components, the array substrate, etc. (See step 558 of Figure 5).
- noise sources will be factored into one or more of the filtering steps in an overall process, e.g., subtracted from overall signal levels, or in one of the other calibration steps, e.g., in identification of the image locations and/or generation of the spectral template.
- the system is also typically calibrated for additional noise sources deriving from the reaction itself, e.g., resulting from nonspecific adsorption of dyes within an observation volume, presence of multiple complexes, and the like, as shown at step 560 of Figure 5.
- Images or movies of signal data deriving from an actual sequencing reaction are processed initially based upon the calibration of the system, as set forth above.
- signal data is associated with a particular signal source, e.g., waveguide, in the array based upon the positional data obtained during the calibration process.
- the result of the calibration process, above, is a time series of spectra for each waveguide, which is stored as an image with dimensions of the number of scans and number of spectral pixels (See, e.g., Figure 8).
- the next step is to identify and classify pulse spectra from this image. Since the spectral templates for experimentally represented dyes for each ZMW are known through the above calibration process, these 2D images can be converted to 1D time series signals (one for each dye).
- This conversion can be achieved by a number of methods. For example, one may employ a linear matrix inversion in which the dye calibration templates are used to decompose the spectrum into the individual intensities of the pure dye spectra at each time point. Alternatively, or additionally, one may employ a weighted sum of spectral pixels using the dye templates to maximize the SNR of a pure dye sum in its trace.
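- The linear decomposition mentioned above can be sketched as an unweighted least-squares fit of the dye templates to each frame's spectrum, solved here through the normal equations. The two four-pixel templates and all function names are hypothetical, and a production implementation would more likely use a numerical library and per-pixel weights.

```python
def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def unmix_frame(spectrum, templates):
    """Least-squares dye intensities for one frame's spectrum."""
    k = len(templates)
    gram = [[sum(x * y for x, y in zip(templates[i], templates[j]))
             for j in range(k)] for i in range(k)]
    rhs = [sum(x * s for x, s in zip(templates[i], spectrum)) for i in range(k)]
    return solve_linear(gram, rhs)


def unmix_movie(frames, templates):
    """Convert a time series of spectra into one intensity trace per dye."""
    per_frame = [unmix_frame(frame, templates) for frame in frames]
    return [list(trace) for trace in zip(*per_frame)]


# Hypothetical calibration templates for two dyes over four spectral pixels.
templates = [[0.6, 0.3, 0.1, 0.0],
             [0.0, 0.1, 0.3, 0.6]]
# First frame is 2 x dye 0 plus 5 x dye 1; second frame is empty.
frames = [[1.2, 1.1, 1.7, 3.0],
          [0.0, 0.0, 0.0, 0.0]]
traces = unmix_movie(frames, templates)
print([[round(v, 6) for v in trace] for trace in traces])
```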
- the signals for each waveguide are compared to the spectral template and for each located signal source, each spectral component is then collapsed into an individual trace.
- the signal intensity at the image location that corresponds to a particular spectral signal from a particular signal source is plotted and/or monitored as a function of time, to provide a time resolved trace of signal activity of a given color for a given waveguide.
- four different traces will be generated that reflect the intensity of the different signal components over time.
- An example of trace data from four spectral traces from a single waveguide is shown in Figure 8.
- the signal data represented in each trace is an aggregate signal of the particular pixels associated with a given spectral component of the signal.
- an image location may include a plurality of pixels in the detector, in order to yield the most accurate data.
- the overall image can be aggregated and processed as a single data unit. Aggregation of the pixel data can be accomplished in the processor but is preferably carried out in the detector, itself, as an initial process, to minimize the amount of data created by the system and subject to further processing.
- the traces are subjected to the pulse recognition process.
- the pulse recognition process is schematically illustrated in the flow chart of Figure 9.
- the baseline may comprise signal contributions from a number of background sources (depending on the details of the spectral and trace extraction steps), e.g., global (out-of-focus) background (e.g., auto-fluorescence and large scale spatial cross-talk from the optics) and diffusion (in-focus) background from the individual waveguides considered.
- These backgrounds are generally stable on the timescales of pulses, but still may vary slowly over longer timescales.
- Baseline removal comprises any number of techniques, ranging from, e.g.: a median of the trace, running lowest-percentile with bias correction, polynomial and/or exponential fits, or low-pass filtering with an FFT. Generally these methods will attempt to be robust to the presence of pulses in the trace and may actually be arrived at through iterative methods that make multiple passes at identifying pulses and removing them from consideration of baseline estimation.
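- A minimal sketch of the iterative approach follows, assuming a constant baseline over the trace. Each pass takes the median of the retained samples, estimates the scatter from the median absolute deviation, and drops samples that sit far above the median as presumed pulses. The function names and the 3-sigma cutoff are illustrative assumptions; a slowly varying baseline would need the same logic applied in a running window.

```python
import statistics


def estimate_baseline(trace, n_sigma=3.0, max_passes=5):
    """Median baseline after iteratively excluding samples that look like pulses."""
    kept = list(trace)
    for _ in range(max_passes):
        center = statistics.median(kept)
        mad = statistics.median([abs(v - center) for v in kept])
        sigma = 1.4826 * mad  # MAD scaled to a Gaussian standard deviation
        if sigma == 0:
            break
        trimmed = [v for v in kept if v - center <= n_sigma * sigma]
        if len(trimmed) == len(kept):
            break  # nothing more to exclude
        kept = trimmed
    return statistics.median(kept)


def remove_baseline(trace, n_sigma=3.0, max_passes=5):
    """Subtract the estimated baseline from every sample."""
    baseline = estimate_baseline(trace, n_sigma, max_passes)
    return [v - baseline for v in trace]


trace = [10, 11, 9, 10, 50, 52, 51, 10, 9, 11, 10, 10]
print(estimate_baseline(trace))  # the three-frame pulse near 50 does not bias the estimate
print(remove_baseline(trace))
```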
- Other baselining functions include correction for drift or decay of overall signal levels. For example, photobleaching of organic material sometimes present on the back of the waveguide array is believed to cause decay in the level of background, and thus result in a decreasing baseline over time. This same global background decay is present on portions of the substrate at which there is no waveguide, thus allowing the traces derived from these locations to be used in combination with the two dimensional global background image to estimate the contribution of this signal to every trace/channel across the chip. This component of variability can then be subtracted from each trace and is usually very effective at removing this decay. Typically, this is carried out prior to the baselining processes described above.
- each trace's baseline is established at step 900.
- the traces are subjected to noise suppression filtering to maximize pulse detection (step 902).
- the noise filter is a 'matched filter' that has the width and shape of the pulse of interest. While pulse timescales (and thus, pulse widths) are expected to vary among different dye labeled nucleotides, the preferred filters will typically look for pulses that have a general "top-hat" shape with varying overall duration. As such, a boxcar filter that looks for a pulse of prolonged duration, e.g., from about 10 ms to 100 or more ms, provides a suitable filter.
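- A boxcar filter of this kind is simply a running mean over a window matched to the expected pulse duration. The sketch below is illustrative only: the function names are hypothetical, and the frame rate used to convert a duration in milliseconds to a window width is an assumed acquisition parameter.

```python
def frames_for_duration(duration_ms, frame_rate_hz):
    """Window width in frames for a pulse of the given duration (at least 1)."""
    return max(1, round(duration_ms * frame_rate_hz / 1000.0))


def boxcar_filter(trace, width):
    """Running mean over `width` frames; output has the same length as the input.

    Windows are truncated at the ends of the trace.
    """
    half = width // 2
    out = []
    for i in range(len(trace)):
        lo = max(0, i - half)
        hi = min(len(trace), i - half + width)
        window = trace[lo:hi]
        out.append(sum(window) / len(window))
    return out


print(boxcar_filter([0, 0, 3, 3, 3, 0, 0], width=3))
# prints [0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0]; a pulse as wide as the window keeps its height
print(boxcar_filter([0, 0, 9, 0, 0], width=3))
# prints [0.0, 3.0, 3.0, 3.0, 0.0]; a one-frame spike is reduced to a third of its height
```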
- This filtering is generally performed in the time-domain through convolution or low-pass frequency domain filtering.
- Other filtering techniques include median filtering (which has the additional effect of removing short timescale pulses completely from the trace, depending on the timescale used) and Savitzky-Golay filtering (which tends to preserve the shape of the pulse, again depending on the parameters used in the filter).
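- A boxcar filter of the kind described above can be sketched as a moving average in the time domain. The function name `boxcar_filter` and the edge handling (the window is simply truncated at the ends of the trace) are assumptions for the example. At 100 frames per second, a width of 1 to 10 frames corresponds to the 10 ms to 100 ms range mentioned above.

```python
def boxcar_filter(trace, width):
    """Moving average over `width` frames (a top-hat filter); output has the input's length."""
    n = len(trace)
    half = width // 2
    out = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i - half + width)
        window = trace[lo:hi]  # truncated at the trace edges
        out.append(sum(window) / len(window))
    return out

# A 3-frame pulse survives a 3-frame boxcar at full height; a 1-frame spike is attenuated.
smoothed_pulse = boxcar_filter([0, 0, 0, 6, 6, 6, 0, 0, 0], 3)
smoothed_spike = boxcar_filter([0, 0, 9, 0, 0], 3)
```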
- spectral traces may have different characteristics, and thus may be subjected to trace specific filtering protocols.
- a given dye labeled analog e.g., A
- T another different dye labeled analog
- the filtering process for the spectral trace corresponding to the A analog will have different filtering metrics on the longer duration pulses, than for the trace corresponding to the T analog incorporation.
- identifying pulses on a filtered trace a number of different criteria may be used. For example, one could use absolute pulse height, either with or without normalization.
- a number of signal parameters may be and generally are used in pulse identification (as well as in pulse classification).
- the process illustrated in the flow chart of Figure 9 focuses primarily on the use of two main pulse metrics, namely pulse intensity and pulse width.
- the process steps at step 906 and 908 may generally include any one or more of the various pulse metric comparisons set forth elsewhere herein.
- standard deviation of the baselines is determined at step 904.
- Preferred methods for determining the standard deviation of a trace include robust standard deviation determinations including, e.g., being based upon the median absolute difference about the baseline, a Gaussian or Poisson fit to the histogram of baselined intensities, or an iterative sigma-clip estimate in which extreme outliers are excluded.
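- Of the robust estimates listed above, the iterative sigma-clip is sketched below under stated assumptions: the function name `sigma_clip_std`, the 3-sigma clip level, and the pass limit are illustrative choices rather than values from this description.

```python
from statistics import mean, pstdev

def sigma_clip_std(values, clip=3.0, max_passes=10):
    """Standard deviation after repeatedly discarding samples beyond `clip` sigma."""
    kept = list(values)
    for _ in range(max_passes):
        mu, sd = mean(kept), pstdev(kept)
        survivors = [v for v in kept if abs(v - mu) <= clip * sd]
        # Stop when nothing more is rejected, or too few samples would remain.
        if len(survivors) == len(kept) or len(survivors) < 2:
            break
        kept = survivors
    return pstdev(kept)

noise = [-1, 0, 1] * 6 + [0]        # 19 baselined samples
with_pulse = noise + [100]          # one extreme outlier, e.g. a pulse sample
robust = sigma_clip_std(with_pulse)
naive = pstdev(with_pulse)
```

- In this toy trace the single outlier inflates the naive standard deviation to roughly 22, while the clipped estimate stays below 1, which is why a robust estimate is preferred before thresholding.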
- a pulse is identified if it exceeds some preset number of standard deviations from the baseline, at step 906.
- the number of standard deviations that constitute a significant pulse may vary depending upon a number of factors, including, for example, the desired degree of confidence in identification or classification of significant pulses, the signal to noise ratio for the system, the amount of other noise contributions to the system, and the like.
- the up-threshold for an incorporation event, e.g., at the initiation of a pulse in the trace, is set at about 5 standard deviations or greater, while the down-threshold (the point at which the pulse is determined to have ended) is set at 1.25 standard deviations.
- the pulse width is then determined from the time between the up and down thresholds at step 910. Once significant pulses are initially identified, they are subjected to further processing to determine whether the pulse can be called as a particular base incorporation event at step 912, and as described in greater detail, below.
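- The two-threshold scheme above can be sketched as a single pass over a baselined trace. The function name `find_pulses` and the tuple layout are assumptions for the example; the 5 and 1.25 standard deviation thresholds and the 10 ms frame interval are the example values given in this description.

```python
def find_pulses(trace, sigma, up=5.0, down=1.25, frame_ms=10.0):
    """Return (start_frame, end_frame, width_ms) for each pulse in a baselined trace."""
    pulses = []
    start = None
    for i, v in enumerate(trace):
        if start is None:
            if v >= up * sigma:          # up-threshold: pulse begins
                start = i
        elif v < down * sigma:           # down-threshold: pulse ends
            pulses.append((start, i, (i - start) * frame_ms))
            start = None
    if start is not None:                # pulse still open at the end of the trace
        pulses.append((start, len(trace), (len(trace) - start) * frame_ms))
    return pulses

pulses = find_pulses([0, 1, 6, 7, 3, 2, 1, 0, 0, 8, 8, 0], sigma=1.0)
```

- Using a lower down-threshold than up-threshold keeps a pulse from being split when its intensity dips briefly, as in frames 4 and 5 of the toy trace.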
- multiple passes are made through traces examining pulses at different timescales, from which a list of non-redundant pulses detected at such different time thresholds may be created.
- This typically includes analysis of unfiltered traces in order to minimize potential pulse overlap in time, thereby maximizing sensitivity to pulses with width at or near the highest frame rate of the camera.
- This allows the application of pulse shape or other metrics to pulses that inherently operate on different timescales.
- an analysis at longer timescales may establish trends not identifiable at shorter timescales, for example, identifying that multiple short timescale pulses actually correspond to a single longer, discrete pulse.
- some pulses may be removed from consideration/evaluation, where they may have been identified as the result of systematic errors, such as through spatial cross-talk of adjacent waveguides, or spectral cross-talk between detection channels for a given waveguide (to the extent such issues have not been resolved in the calibration processes, supra).
- the calibration process will identify spectral and spatial cross-talk coefficients for each waveguide, and thus allow such components to be corrected.
- Pulse recognition e.g., on the one dimensional traces, as described above, may provide sufficient distinction to classify pulses as corresponding to particular dyes, and consequently, particular bases, based purely on their peak height.
- the pulses identified for each waveguide are used to return to the waveguide's spectra to extract individual waveguide's spectra for each pulse for additional pulse metrics and to identify any interfering signal components, such as whether a detected pulse in a trace is due to spectral cross-talk.
- Classification of an extracted pulse spectrum is then carried out by comparing the extracted spectrum to the spectra of the standard dye sets established in the calibration process.
- a number of comparative methods may be used to generate a comparative metric for this process. For example, in preferred aspects, a χ² test is used to establish the goodness of fit of the comparison.
- S 1 extracted pulse spectrum
- P 1 pure dye calibration spectrum
- the classification of a given pulse spectrum is then identified based upon calculating values for each of the four different dyes. The lowest χ² value (and the highest probability fit) assigns the pulse to that particular dye spectrum, and the pulse is called as corresponding to that dye.
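- A minimal sketch of the lowest-value classification follows. The exact form of the statistic is not reproduced in this text, so the sketch assumes χ² = Σᵢ (Sᵢ − Pᵢ)² / Vᵢ computed on spectra normalised to unit sum, with per-pixel variances Vᵢ defaulting to 1. The function names and the dye labels `dyeA` to `dyeD` are hypothetical.

```python
def normalise(spectrum):
    """Scale a spectrum so that its pixel values sum to 1."""
    total = sum(spectrum)
    return [v / total for v in spectrum]

def chi_square(s, p, var=None):
    """Assumed statistic: sum over pixels of (S_i - P_i)^2 / V_i."""
    var = var or [1.0] * len(s)
    return sum((si - pi) ** 2 / vi for si, pi, vi in zip(s, p, var))

def classify_pulse(pulse_spectrum, dye_templates):
    """Return the dye whose calibration spectrum gives the lowest chi-square, plus all scores."""
    s = normalise(pulse_spectrum)
    scores = {dye: chi_square(s, normalise(t)) for dye, t in dye_templates.items()}
    return min(scores, key=scores.get), scores

templates = {
    "dyeA": [5, 3, 1, 1],
    "dyeB": [1, 5, 3, 1],
    "dyeC": [1, 1, 5, 3],
    "dyeD": [1, 1, 3, 5],
}
best, scores = classify_pulse([11, 5, 2, 2], templates)
```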
- a number of other pulse metrics may be employed in addition to a straight spectral comparison in classifying a pulse as correlating to a given dye/nucleotide.
- signals associated with incorporation of a given dye labeled nucleotide typically have a number of other characteristics that can be used in further confirming a given pulse classification.
- different dye labeled nucleotides may have different characteristics such as pulse arrival time (following a prior pulse), pulse width, signal intensity or integrated counts (also referred to as pulse area), signal to noise ratio, power to noise ratio, pulse to diffusion ratio (ratio of pulse signal to the diffusion background signal in each waveguide), spectral fit (e.g., using a minimum χ² test, or the like), spectrum centroid, correlation coefficient against a pulse's classified dye, time interval to end of preceding pulse, time interval to the ensuing pulse, pulse shape, polarization of the pulse, and the like.
- a plurality of these various pulse metrics are used in addition to the spectral comparison, in classifying a pulse to a given dye, with particularly preferred processes comparing two, three, five, 10 or more different pulse metrics in classifying a pulse to a particular dye/nucleotide.
- the pulse spectrum is classified as corresponding to a particular dye spectrum, that correlation is then used to assign a base classification to the pulse.
- the base classification or "calling" may be configured to identify directly the dye labeled base added to the extended primer sequence in the reaction, or it may be set to call the complementary base to that added (and for which the pulse spectrum best matches the dye spectrum). In either case, the output will be the assignment of a base to each recognized and classified pulse.
- An illustration of bases being called or assigned to different pulses is shown in Figure 10, which shows such pulses on a collapsed timescale.
- a base could simply be called on the basis of that information.
- signal traces include a substantial amount of signal noise, false positive pulses, e.g., resulting from nonspecifically adsorbed analogs or dyes or the like.
- pulse classification can in many cases involve a more complex analysis.
- base classification typically relies upon a plurality of different signal characteristics in assigning a base to a particular identified significant pulse. In many cases, two, three, five, ten or more different signal characteristics may be compared in order to call a base from a given significant pulse.
- Such characteristics include those used in identifying significant pulses as described above, such as pulse width, pulse intensity, signal to noise ratio, power to noise ratio, integrated counts in pulse peak, shape of and distance/time to neighboring pulses, spectral signature of the pulse, pulse centroid location, and the like.
- comparison will be based upon standard pattern recognition of the metrics used as compared to patterns of known base classifications, yielding base calls for the closest pattern fit between the significant pulse and the pattern of the standard base profile.
- Comparison of pulse metrics against representative metrics from pulses associated with a known base identity will typically employ predictive or machine learning processes.
- a "training" database of "N previously solved cases" is created that includes the various metrics set forth above. From that database, a learning procedure is applied to the data in order to extract a predicting function from the data.
- a wide variety of learning procedures are known in the art and are readily applicable to the database of pulse metrics. These include, for example, linear/logistic regression algorithms, neural networks, kernel methods, decision trees, multivariate splines (MARS), and support vector machines. Further, machine-learned meta-algorithms for performing supervised learning, or "boosting", may be applied to any of the foregoing processes or any combinations of those.
- such boosting incrementally adds to the current learned function.
- a weak learner i.e., one that yields an accuracy only slightly greater than chance
- boosting algorithms include, for example, AdaBoost, LPBoost, TotalBoost, and the like.
- assignment or classification of a particular pulse as incorporation of a particular base will typically be based, at least partially, on a desired probability score, e.g., probability that the called base is accurate.
- the probability scores for base calling will typically take into account the closeness of fit of a pattern of signal metrics to a standard signal profile, based upon a plurality of different signal characteristics that include those elements described elsewhere herein, including the signal environment around a given pulse being called as a particular base, including adjacent pulses, adjacent called bases, signal background levels, pulse shape (height or intensity, width or duration, etc.), signal to noise ratios, and other signal contributors.
- preferred base calls will be made at greater than the 90% probability level (90% probability that the called base is correct), based upon the probability evaluation, preferably, greater than 95% probability level, more preferably greater than 99% probability, and even more preferably, greater than 99.9% or even 99.99% probability level.
- the processes of the invention will typically be integrated with sequence arrangement processes for arranging and outputting the individual called bases into a linear sequence, and outputting such data to the user in any of a variety of convenient formats. Additionally, such processes will optionally verify and correct such sequence data based upon iterative sequencing of a given template, multiple sampling of overall sequence fragments through the sequencing of overlapping templates, and the like, to provide higher confidence in sequence data obtained.
- a number of other filtering processes may be used in the overall evaluation of data from sequencing by incorporation reactions as discussed herein.
- a number of filtering processes may be employed to identify signal sources or waveguides that are yielding the highest quality level of data, e.g., resulting from a single fully functional polymerase/template/primer complex, immobilized on the bottom surface of the waveguide.
- These filters may rely upon a number of the metrics described above.
- these filters may employ holistic characteristics associated with a long time scale showing a large number of pulses, and determining whether the longer timescale metrics of the traces have characteristics of a typical sequence by incorporation trace, e.g., relatively regular, high confidence (based upon one or a number of relevant pulse metrics) pulses coming out over the course of the trace, yielding a "picket fence" appearance to the trace.
- additional components may be introduced to the reactants, e.g., labeling of the complexes, to facilitate their identification in the filtering process. As such, the existence of the indicator would be an initial filter to apply to any waveguide's data traces.
- Substrate 102 is an ordered arrangement (e.g., an array) of reaction locations and/or reaction wells and/or optical confinements and/or reaction optical sources 104.
- Detector/camera 116 is an ordered array of optical signal collectors or detectors, such as CCDs, EMCCDs, or other devices able to detect optical signals and report intensity values.
- Detector 116 will typically include an array of addressable areas, each able to make an intensity measure and data output. The separate areas are commonly referred to as pixels. Each pixel generally has an address or coordinate (e.g., x, y) and outputs one or more intensity levels for a given interval or frame. Pixels that output only one intensity level are sometimes referred to as gray-scale or monochrome pixels.
- a static pixel array of light intensity values (e.g., generally for one interval) is commonly referred to as an image or frame.
- a time sequence of frames is referred to as a movie.
- a detector/camera such as 116 may be capable of only the most basic functions necessary to capture and output intensity levels.
- a detector/camera such as 116 may include or be associated with logic circuitry able to perform various optical adjustments and/or data collection and/or data manipulation functions such as adjusting frame rate, correcting for noise and/or background, adjusting alignment or performing tracking, adjusting pixel size, combining indicated pixels prior to output, ignoring or filtering indicated pixels, etc.
- the raw data available from a detector 116 typically can be understood as a sequence of 2-dimensional arrays of pixel values at a particular frame rate. In an example system as in Figure 1, the raw pixel data is monochrome.
- Figure 2 is an example of a single frame or image of monochrome raw data captured by detector 116.
- detector 116 captures and outputs 1 frame each 10 milliseconds (or 100 frames per second (f.p.s.)).
- the optical signal (or light) from one location 102 will pass through an optical train including a spectral spreading or refracting component such as prism 112 and lens 114.
- the optical signal from one location 102 will generally be imaged on and detected by a rectangular to nearly rectangular area of pixels on detector 116.
- One dimension or axis (typically the longer dimension) is primarily due to spectral refraction and is herein referred to as the spectral axis.
- the other dimension (typically the shorter dimension) is referred to as the spatial axis.
- This axis is defined as the axis orthogonal to the spectral axis and is primarily due to the point source spread onto detector 116 through the optical train from location 102.
- the spatial axis will be reduced to one pixel or a few pixels using one or more known combination techniques, such as a point spread function (PSF) analysis. This reduction may be performed before or after data collection from the optical system as described below.
- the raw-data spectral dimension for one imaged location 102 is about 8 pixels to about 20 or more pixels. However, this value can vary widely as a result of the minimum size of pixels available in a detector 116 or other optical component and could feasibly range in the 100s or 1000s.
- the raw-data spatial dimension for one imaged location 102 is about 3 pixels to about 5 pixels. However, this value can also vary widely as a result of the minimum size of pixels available in a detector 116 or other optical component and could feasibly range in the 100s or 1000s.
- the frame-rate for raw captured images is about 100 frames per second (f.p.s.). However, this value could also vary widely depending on desired characteristics of the system and available computational and/or optical components.
- data capture and data analysis according to the present invention includes many novel elements related to analyzing a large number of individual sequencing reactions located in an array of reaction locations or optical confinements.
- the invention addresses the difficulties that arise in such a system and takes advantage of the unique properties of the data arising from such a system.
- Figure 12 is a flow chart illustrating an example method of base calling from a sequencing array using a logic processing system according to specific embodiments of the invention. This figure illustrates a number of steps that are explained in greater detail below. Not all of these steps will be performed in all embodiments.
- data capture and analysis involves: capturing pixels from a pixel detector for multiple reaction locations of a sequencing array (Step A1); determining correlations (gridding) between pixel sets and reaction locations (Step A2); performing one or more individualized calibrations and storing one or more individualized reaction location calibration parameters and/or data sets (Step A3); performing one or more collective array calibrations and storing one or more collective calibration parameters (Step A4); determining one or more data reduction parameters (Step A5); capturing a time sequence of pixel images from a pixel detector for multiple reaction locations during multiple incorporation reactions (Step A6); performing one or more data reductions on captured pixels (Step A7); for individual reaction locations, extracting separated dye spectral traces (Step A8); analyzing extracted dye separated traces to determine significant trace pulses and trace pulse characteristics or optionally to exclude locations with traces indicating poor quality reaction data (Step A9); using trace pulse start and end times (optionally from multiple traces) to identify a pulse in captured pixel data (Step A10);
- Analysis of sequencing-by-incorporation reactions on an array of reaction locations according to specific embodiments of the invention is also illustrated graphically in Figure 13.
- data captured by a CAMERA is represented as a MOVIE, which is also a time sequence of SPECTRA.
- Spectral CALIBRATION templates are used to extract TRACES from the spectra. Pulses identified in the traces are then used to return to the SPECTRA data and from that data produce a temporally averaged PULSE SPECTRUM for each pulse.
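- The return from a detected pulse to the SPECTRA data can be sketched as a per-pixel average over the frames spanned by the pulse. The function name `pulse_spectrum` and the half-open frame convention (start included, end excluded) are assumptions for the example.

```python
def pulse_spectrum(spectra, start, end):
    """Temporally average the per-frame spectra over frames start..end-1."""
    frames = spectra[start:end]
    if not frames:
        raise ValueError("pulse spans no frames")
    n = len(frames)
    # zip(*frames) groups the values of each spectral pixel across frames.
    return [sum(column) / n for column in zip(*frames)]

# Four frames of a 3-pixel spectrum; the pulse occupies frames 1 and 2.
movie = [[0, 0, 0], [2, 4, 6], [4, 6, 8], [0, 0, 0]]
averaged = pulse_spectrum(movie, 1, 3)
```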
- the Spectral CALIBRATION templates are then also used to classify PULSE SPECTRUM to a particular base. Base classifications and pulse and trace metrics are then stored or passed to other logic for further analysis.
Calibrations
- various adjustments or calibrations are made in digital imaging systems both prior to and during image capture. These adjustments can include such things as determining and correcting for background noise or various distortions caused by the optical and/or digital capture components, adjusting frame or shutter speed based on intensity levels, adjusting contrast in reported intensity levels, etc.
- Various such calibrations or adjustments may be made according to specific embodiments of the invention so long as the adjustments do not interfere with the data analysis as described below. Calibrations particular to specific embodiments of the present invention are described in more detail herein.
- Some of these calibration steps described herein may be performed periodically (such as once a week or once a day), other calibrations may be performed once at the beginning of a sequencing reaction data capture and analysis, and some calibrations are performed on a more continuous basis, throughout or at intervals during a reaction capture and analysis.
- These calibration steps can include such things as centroid determination, alignment, gridding, drift correction, initial background subtraction, noise parameter adjustment, spectral calibration, frame-rate adjustment, etc.
- Some calibration steps, such as binning, may involve communication from the processor back to the detector/camera, as discussed further below.
- An initial step in analyzing data from a system such as illustrated in Figure 1 is determining which sets of pixels of detector 116 correspond to individual reaction locations 104. (In some implementations, this step could be addressed in the construction of the system, so that detectors and reaction locations are manufactured to have a fixed association.)
- Gridding, in particular embodiments, is an initial identification of particular reaction locations with particular areas of pixels in an image.
- imaged signals are correlated to the known spacing of image sources on the waveguide array.
- one or more registration marks incorporated into the array can be used. For example, in preferred aspects, rows of waveguides in an array will include one or more blank spaces in place of a waveguide, where the blanks will be spaced at regular, known intervals.
- registration marks might include regularly spaced image sources that are separate from the waveguides, but are at known locations and spacing relative to the waveguides in the array to permit alignment of the image to the array.
- image sources may include apertures like waveguides, or may include fluorescent or luminescent marks that provide a signal. Gridding is generally accomplished with an illuminated reference frame.
- an individualized reference centroid is determined and stored for each or nearly each ZMW.
- This centroid is determined by finding the geometric center or Gaussian center from a known spectrum, high SNR, narrow band light source that is imaged on detector 116 through generally the same optical train as sequencing reaction optical signals.
- the illumination is directed through the partially transparent waveguides 104 and then through the optical train. Note that, while pixel address locations are generally integer values, formulas for determining a Gaussian center provide a decimal result.
- transillumination is provided by a light source with a narrow band-pass filter (e.g., 543 nm) or by a narrow band light source, such as a 730 nm light source.
- the subpixel reference centroid is rounded to a closest 0.1 interval. This subpixel reference centroid may then be used to estimate subpixel centroids for one or more detection signals as described herein.
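- A minimal sketch of a sub-pixel reference centroid follows, using the intensity-weighted (geometric) center rather than a Gaussian fit. The function name `subpixel_centroid` and its parameters are assumptions for the example; the 0.1 pixel rounding interval is the one given above.

```python
def subpixel_centroid(intensities, first_pixel=0, step=0.1):
    """Intensity-weighted centroid along one axis, rounded to the nearest `step` pixel."""
    total = sum(intensities)
    centre = sum((first_pixel + i) * v for i, v in enumerate(intensities)) / total
    # Snap to the 0.1 pixel grid, then trim floating-point residue.
    return round(round(centre / step) * step, 10)

# Narrow-band transillumination profile over pixels 275..279 (illustrative values).
centroid = subpixel_centroid([0, 1, 3, 2, 0], first_pixel=275)
```

- The integer pixel addresses give a decimal result here (about 277.17 before rounding), which is then stored at 0.1 pixel resolution.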
- Figure 14 is a flow chart illustrating an example method of determining a reference spectral centroid for a reaction location image according to specific embodiments of the invention.
- Figure 15 is a diagram illustrating an initial gridding step and corrected centroids according to specific embodiments of the invention.
- the central white line down the center of pixel 277 indicates centroid locations as determined by an initial gridding step.
- the individual dots to the left of this line indicate individually determined centroids for each ZMW as discussed above.
- the spline through these individual dots indicates an optional spline fitting or similar step that can be used to smooth centroid locations and provide a smoothed individualized sub-pixel centroid.
- Figure 16 is a flow chart illustrating an example method of determining an alignment for reaction locations and a central axis for individual reaction locations from multiple reaction location images according to specific embodiments of the invention. After the initial gridding step and determination of a sub-pixel centroid effectively in the spatial axis, an alignment along a spatial axis may be performed to more accurately determine a location of a sub-pixel centroid in the spectral axis.
- Figure 17 is a flow chart illustrating an example method of determining high-resolution dye spectral templates and down-sampled individualized spectral templates according to specific embodiments of the invention.
- a high resolution spectral template is optionally produced using multiple ZMWs.
- the waveguide array is provided with a standard reference label that may include each pure dye, a pure labeled nucleotide, or another relevant pure component, e.g., a polymerase/template/primer-complexed labeled nucleotide.
- the signal of each pure label compound is then imaged upon the detector using an optical train and its location is mapped.
- Calibration spectra may be taken at different locations on a waveguide array than analytical reads and are then aligned to the different locations on the detector using the centroids of the transillumination image. Typically however the spectra as seen on the detector are coarsely sampled and the spectral shape is sensitive to the subpixel centroid location of the ZMW within the image.
- the calibration spectra are taken at multiple subpixel centroids, e.g., at 0.1 pixel samplings. These can then be combined into a much higher resolution spectrum than a single image can provide. With a subpixel spectral reference centroid estimated from a ZMW's transillumination image, this high resolution spectrum can then be accurately downsampled to account for pixelation of the camera. In addition, due to potential distortions in the optics (e.g., coma, chromatic and spherical aberration), one may also obtain calibration spectra across the field of view of the chip.
- FIG. 18 is a diagram illustrating stitching high-resolution dye spectral templates from multiple captured spectral calibration data according to specific embodiments of the invention.
- a vertical row of 100 ZMWs each 10 pixels wide along the spectral axis and effectively 1 pixel high in the spatial dimension (although more spatial pixels could be used).
- One means of averaging a spectral calibration template of the 100 ZMWs would be to simply average each of the 10 pixel locations separately down the 100 ZMWs. This would provide an averaged spectral template, but not a higher resolution one.
- the next sub-pixel, pixel_02, will be the average of a sliding window of 10 ZMWs (e.g., ZMW_2 through ZMW_11 for pixel_02, ZMW_3 through ZMW_12 for pixel_03, etc.).
- toward the end of the row, a sub-pixel will be an average of pixel_1 for 9 ZMWs (e.g., ZMW_91 through ZMW_99) and the second pixel for ZMW_100. In this way, in this example, 100 ZMWs with 10 points of spectral pixel resolution are averaged into one spectral template with 100 points of spectral pixel resolution.
- the spectra taken for ZMW_1 through ZMW_100 will in general have their spectral reference wavelength placed at varying sub-pixel shifts, relative to the centroid pixel of the ZMW, because the centroids of the ZMWs will vary more or less uniformly across approximately 1 pixel.
- the ZMW spectra can therefore be characterized as arising from a re-binning of spectra at 10x the resolution, where the higher resolution bin offset is known.
- the high-resolution spectrum can then be estimated by placing each ZMW spectrum in its corresponding high-resolution bin locations (shift and pitch), and then averaging the values in each bin.
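- The bin placement and averaging described above can be sketched as follows. The function name `stitch_high_res`, the representation of each ZMW's sub-pixel shift as an integer offset in units of 1/`factor` pixel, and the sign convention of that offset are assumptions for the example.

```python
def stitch_high_res(spectra, offsets, factor=10):
    """Combine coarse spectra with known sub-pixel offsets into one high-resolution spectrum.

    spectra: one list of coarse pixel values per ZMW (all the same length).
    offsets: per-ZMW shift, an integer in 0..factor-1 (units of 1/factor pixel).
    Returns factor * len(spectrum) bins; a bin that received no data is None.
    """
    n_bins = len(spectra[0]) * factor
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for spectrum, offset in zip(spectra, offsets):
        for j, value in enumerate(spectrum):
            b = j * factor + offset  # pitch = factor bins, shift = offset bins
            sums[b] += value
            counts[b] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]

# Three 2-pixel spectra at 2x oversampling (tiny numbers to keep the example readable).
high_res = stitch_high_res([[1, 2], [3, 4], [5, 6]], offsets=[0, 1, 0], factor=2)
```

- With 100 ZMWs of 10 pixels each and `factor=10`, the same call would fill a 100-point template, provided the offsets cover all ten sub-pixel positions.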
- the generation of high resolution spectral calibration templates is done periodically, such as once a day or once a week, as it involves generally four different rounds of exposing an array 102 to four different dyes, with the overhead of preparation of the array for each of the four different reactions.
- the high resolution spectral templates are then individualized (and optionally downsampled) using a spectral subpixel centroid, generally during each run.
- a spectral template can be determined for each ZMW for each sequencing reaction by including a series of known bases in a known sequence. In such a case, spectral calibration data for each dye is collected for each ZMW and averaged to provide an individualized ZMW spectral template, optionally using additional relevant data as provided herein.
- Figure 19 is a flow chart illustrating an example method of determining background noise for reaction location images according to specific embodiments of the invention.
- FIG. 20 is a flow chart illustrating an example method of data reduction for a reaction location image according to specific embodiments of the invention.
- SNR signal to noise ratio
- a weighted sum of pixels along the spatial axis of a ZMW (or reaction location) image area is performed. Weights that maximize the SNR are determined by the inverse variance of each pixel.
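- A minimal sketch of the inverse-variance weighted reduction along the spatial axis follows. For brevity it assumes one noise variance per spatial row rather than per pixel, and it normalises the weights so the result is a weighted average; both choices, and the function name `collapse_spatial`, are assumptions for the example.

```python
def collapse_spatial(rows, variances):
    """Reduce several spatial rows to one spectrum using inverse-variance weights.

    rows: one list of spectral pixel values per spatial row.
    variances: one noise variance per row (same order as rows).
    """
    weights = [1.0 / v for v in variances]
    norm = sum(weights)
    # zip(*rows) walks the spectral axis, yielding the spatial column at each pixel.
    return [sum(w * px for w, px in zip(weights, column)) / norm
            for column in zip(*rows)]

# Three spatial rows, two spectral pixels; the central row is the least noisy.
spectrum = collapse_spatial([[2, 4], [4, 8], [6, 12]], variances=[1.0, 0.5, 1.0])
```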
- optical signal detected from multiple locations can be used to determine a line of pixels through multiple locations, and this line of pixels can be used as the central line for a PSF function.
- the instrument PSF can be modeled (e.g., by a Gaussian or Moffat function). Fits to the one dimensional profile of a line of ZMWs may solve for all ZMW amplitudes, the ZMW spacing, and a PSF width. These fits can also solve for more parameters (e.g. a polynomial model of PSF width as a function of FOV position) in order to account for second order effects, such as variation of the optical PSF across the field of view (FOV) of the camera or variations in chip (e.g., array substrate 102) geometry.
- binning to derive spectral images from each location 104 waveguide is carried out on the camera chip (in a firmware controlled process).
- the location of ZMW signals is determined from a full illumination frame, and on-camera (or "on-chip") binning sums (in the spatial direction) only those CCD lines associated with a line of ZMW holes which contains the majority of the signal and reads out only those lines during the actual movie acquisition.
- the optimal binning strategy is the one that maximizes the SNR of pulses from each reaction location.
- the optical signal data for an individual ZMW or location 104 is a sequence or movie of a small area (e.g., an area of about 1 x 14 pixels) of monochrome spectral-images also referred to herein as spectra.
- the optical signal data for an individual ZMW can also be understood or represented as a time series of arrays (or vectors) of intensity values (e.g., a sequence of 1 by 14 intensity values).
- one or more spectral traces are extracted for further analysis, e.g., pulse detection.
- a spectral trace, as used herein, is a time series of generally a single intensity value.
- Figure 21 illustrates an example of four superimposed spectral images representing different dyes imaged from two different reaction locations according to specific embodiments of the invention.
- Figure 21 provides some illustration of the nature of the spectral-spread position data and the difficulties in extracting spectral traces that are overcome according to specific embodiments of the invention.
- the top portion shows four superimposed graphs of 14 pixel values. Each of the four superimposed graphs represents "training data" or "calibration data" from four different known dye spectra at a particular ZMW. Represented numerically, the data at the top portion would approximately be as shown in the table below. For ease of reading, the intensity values shown in the figure are multiplied by 100 in the table below.
- the top graph in Figure 21 indicates one difficulty that is addressed according to specific embodiments of the invention.
- the figure indicates that even in a situation of "best case" training data, there can be substantial overlap between the pixel data captured from different dyes, e.g., the 555 and 568 nm dye spectra. In this example, this is to be expected due to the closeness of the wavelengths of the fluorescent signals. However, choice of dye may be constrained for a variety of reasons, and extraction of even closer spectra may be desired in some situations.
- the bottom graph illustrates another complexity. This figure represents the same data as above, but captured at a different ZMW location.
- Figure 22 is a flow chart illustrating an example method of dye trace extraction from a time series of reaction location images according to specific embodiments of the invention.
- the method involves: at a captured frame, performing a flux matching to a first dye spectrum template (Step G1); optionally, using a down-sampled high-resolution dye spectral template, optionally individualized to a subpixel centroid (Step G2); storing or outputting an intensity value from the flux analysis as the frame intensity for the dye (Step G3); repeating the spectral template comparison on the same captured frame for each dye spectrum template (Step G4); repeating the above steps at successive frames to generate one or more dye spectral traces for each ZMW (Step G5); optionally, repeating all of the above steps using a multi-component flux matching to generate a second set of trace intensities corrected for cross talk (Step G6). Details of this method are described further below.
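- Steps G1 through G5 can be sketched as a loop over frames and dye templates. The following is a minimal illustration only: the helper names `match_flux` and `extract_traces`, the 4-pixel templates, and the use of an unweighted least-squares amplitude as the "flux matching" are all assumptions made for this example and are not taken from the specification.

```python
from typing import Dict, List


def match_flux(frame: List[float], template: List[float]) -> float:
    """Least-squares amplitude of a single dye template in one frame."""
    numerator = sum(p * d for p, d in zip(template, frame))
    denominator = sum(p * p for p in template)
    return numerator / denominator


def extract_traces(frames: List[List[float]],
                   templates: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Steps G1-G5: one intensity per dye per frame, collected into traces."""
    traces: Dict[str, List[float]] = {dye: [] for dye in templates}
    for frame in frames:                          # Step G5: successive frames
        for dye, template in templates.items():   # Step G4: each dye template
            traces[dye].append(match_flux(frame, template))  # Steps G1, G3
    return traces


# Hypothetical 4-pixel templates and frames (the source uses about 14 pixels).
TEMPLATES = {"dye_a": [1.0, 2.0, 1.0, 0.0], "dye_b": [0.0, 1.0, 2.0, 1.0]}
FRAMES = [[0.0, 0.0, 0.0, 0.0], [2.0, 4.0, 2.0, 0.0]]
TRACES = extract_traces(FRAMES, TEMPLATES)
```

- In this toy input the second frame contains only `dye_a` at amplitude 2, yet the `dye_b` trace is also nonzero because the two templates overlap. This is the kind of cross talk that the optional multi-component matching of Step G6 is intended to correct.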
- Figure 23 is a diagram illustrating using an individualized set of spectral templates for a ZMW to extract four spectral traces using a flux calculation according to specific embodiments of the invention.
- four spectral traces will be extracted.
- one or more of the spectral values may be combined (such as 555 and 568 as indicated above) to derive fewer traces.
- one or more additional traces may be derived.
- Extraction of the four spectral intensity values (or spectral pixel values) from the 14 raw values is accomplished according to specific embodiments of the invention as follows. With individualized spectral templates determined for each dye for each ZMW, trace extraction can be achieved by a number of methods, as described herein.
- the signal intensity at the image location that corresponds to a particular spectral signal from a particular signal source is plotted and/or analyzed as a function of time.
- Various techniques for spectral extraction from data similar to the spectral-spread image data are known and can be used for spectral extraction according to specific embodiments of the invention.
- Horne (1986) discusses a number of spectral extraction techniques used in CCD-based astronomical spectroscopy.
- spectral extraction is performed for the four intensities using the reference spectra and the spectral-spread image, according to the following equation: where F is the summed flux at each spectral pixel that maximizes S/N. This technique is used for spectral extraction according to specific embodiments of the invention.
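- The equation itself is not reproduced in this text. As a hedged illustration, the sketch below uses the variance-weighted estimator commonly used for optimal extraction in CCD spectroscopy, F = Σ[P(D − S)/V] / Σ[P²/V], where P is a normalized reference spectrum, D the captured pixel values, S the background, and V the per-pixel variance. That this is the form intended here is an assumption, and the function name `optimal_flux` is hypothetical.

```python
from typing import List


def optimal_flux(data: List[float], profile: List[float],
                 background: List[float], variance: List[float]) -> float:
    """Variance-weighted flux estimate (assumed form, see lead-in).

    profile is expected to be normalized so that it sums to 1.
    """
    numerator = sum(p * (d - s) / v
                    for p, d, s, v in zip(profile, data, background, variance))
    denominator = sum(p * p / v for p, v in zip(profile, variance))
    return numerator / denominator


# Hypothetical 3-pixel example: a true flux of 8 spread over the profile,
# sitting on a flat background of 1 count per pixel.
PROFILE = [0.25, 0.5, 0.25]
BACKGROUND = [1.0, 1.0, 1.0]
DATA = [3.0, 5.0, 3.0]
FLUX_EQUAL_VAR = optimal_flux(DATA, PROFILE, BACKGROUND, [1.0, 1.0, 1.0])
FLUX_UNEQUAL_VAR = optimal_flux(DATA, PROFILE, BACKGROUND, [1.0, 2.0, 1.0])
```

- For noise-free data the estimate recovers the same flux regardless of the variance weights; the weights only matter once noise is present, where they down-weight the noisier pixels.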
- methods of the invention may also analyze a single signal derived from the intensity levels at the multiple pixel positions (this may be referred to as a summed spectral signal or a gray-scale spectral signal or an intensity level signal).
- a method according to the invention may analyze the multiple captured pixel data using a statistical model such as a Hidden Markov Model. In present systems, however, determining multiple (e.g., four) spectral traces from the initial signal data has proven a preferred method.
- Figure 24 is a diagram illustrating a set of dye-weighted spectral traces (top) and a set of multi-component spectral traces (bottom) extracted from captured data for a reaction location according to specific embodiments of the invention.
- Figure 25 is a flow chart illustrating an example method of pulse detection in a dye trace according to specific embodiments of the invention.
- Figure 26 is a diagram illustrating an example of analysis of one pulse in one spectral trace according to specific embodiments of the invention.
- FIG. 27 is a diagram illustrating a number of pulses associated with incorporation events and showing that some pulses with low amplitude are likely to be indicative of actual incorporation events.
- the figure illustrates an example of captured data frames for a number of zero mode waveguides according to specific embodiments of the invention.
- as an example, assume that an incorporation event produces an optical signal intensity at a normalized intensity value of 1 during each capture interval (e.g., frame). Further, assume that a frame is 10 milliseconds. Then, an incorporation event that happens to begin at a frame boundary, and that happens to last for one frame, will produce a pulse with a 10 millisecond (1 frame) width and an amplitude of 1. However, it is equally likely that a 10 millisecond incorporation event will begin in the middle of one frame and complete in the middle of the next frame. In such a case, the same incorporation event will produce a pulse with a 20 millisecond (2 frame) width and an amplitude of 0.5.
- an incorporation event may also last only 5 milliseconds (e.g., ½ a frame). If such an incorporation event occurs entirely during a single frame, it will produce a 1 frame pulse (1 frame being the shortest pulse width detectable) with an amplitude of 0.5. There is also a probability that a 5 millisecond pulse will fall exactly across 2 frames. In such a case, the same incorporation event will produce a pulse with a 20 millisecond (2 frame) width and an amplitude of 0.25. With this understanding, it will be understood that the lower amplitude pulses circled in Figure 27 can have an equal probability of being valid significant pulses as the higher amplitude pulses.
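- The frame arithmetic in the preceding examples can be reproduced with a short calculation. The sketch below assumes an idealized camera that simply averages a unit-intensity signal over each 10 millisecond frame; the function name `frame_amplitudes` is hypothetical.

```python
from typing import List


def frame_amplitudes(start_ms: float, duration_ms: float,
                     frame_ms: float = 10.0, n_frames: int = 3) -> List[float]:
    """Per-frame amplitude of a unit-intensity event averaged over each frame."""
    end_ms = start_ms + duration_ms
    amplitudes = []
    for k in range(n_frames):
        frame_start, frame_end = k * frame_ms, (k + 1) * frame_ms
        # Time during which the event is "on" inside this frame.
        overlap = max(0.0, min(end_ms, frame_end) - max(start_ms, frame_start))
        amplitudes.append(overlap / frame_ms)
    return amplitudes


ALIGNED_10MS = frame_amplitudes(0.0, 10.0)    # starts on a frame boundary
STRADDLE_10MS = frame_amplitudes(5.0, 10.0)   # starts mid-frame
INSIDE_5MS = frame_amplitudes(2.0, 5.0)       # entirely within one frame
STRADDLE_5MS = frame_amplitudes(7.5, 5.0)     # split evenly over two frames
```

- The four cases give amplitudes of 1, 0.5 and 0.5, 0.5, and 0.25 and 0.25 respectively, matching the values stated above.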
- a pulse detection algorithm can assign confidence levels and make adjustments to account for stochastic false positive (FP) rates (e.g., detecting a pulse as a result of noise alone), stochastic false negative (FN) rates (e.g., failing to detect a pulse because it is masked by background noise), and mismatch (MM) errors (e.g., incorrectly classifying the spectra of a pulse due to its detected width and intensity).
- the invention can determine that optimal pulse calling thresholds are around 3.5 sigma for each channel based on the kinetic parameters of the incorporation reaction and the frame capture parameters.
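- A sigma-based calling threshold of this kind can be sketched as follows. This is an illustration only: it assumes the baseline level and noise standard deviation of the trace are already known, and the function name `detect_pulses` and the sample trace are hypothetical.

```python
from typing import List, Tuple


def detect_pulses(trace: List[float], baseline: float, sigma: float,
                  k: float = 3.5) -> List[Tuple[int, int]]:
    """Return (start, end) frame index pairs, end exclusive, where the trace
    stays above baseline + k * sigma."""
    threshold = baseline + k * sigma
    pulses = []
    start = None
    for i, value in enumerate(trace):
        if value > threshold and start is None:
            start = i                      # rising edge
        elif value <= threshold and start is not None:
            pulses.append((start, i))      # falling edge
            start = None
    if start is not None:                  # pulse still open at end of trace
        pulses.append((start, len(trace)))
    return pulses


TRACE = [0.0, 0.1, 5.0, 6.0, 0.2, 0.0, 4.9, 0.0]
PULSES = detect_pulses(TRACE, baseline=0.0, sigma=1.0)
```

- Lowering `k` reduces false negatives at the cost of more false positives, which is the trade-off the threshold analysis above is balancing.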
- such an analysis shows that increasing the frame rate can increase the SNR of sub-frame pulse-width (PW) pulses by roughly sqrt (f.p.s.), and therefore while the FN rate is initially reduced with an increased FPS, it increases again as pulse peak SNR degrades at inverse sqrt(f.p.s.). Detection of one significant pulse as two (referred to as algorithmic branching) also is found to be an increasing problem at higher frame rates.
- a pulse detection algorithm is optimized to reduce FN rates while allowing an increase in fps rate.
- Figure 29 is a diagram illustrating increasing a time window and reducing a threshold for pulse detection according to specific embodiments of the invention.
- a pulse detection uses merge heuristics as follows. For consecutive pulses of the same base, for each pair, estimate a mean pulse height and an interpulse baseline. Using a noise model and/or actual noise data, estimate the statistical significance of the inter-pulse region relative to the peak height, and merge consecutive peaks if the statistical significance does not exceed 3 standard deviations.
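- One possible reading of this merge heuristic is sketched below. The significance measure used here (the drop from mean pulse height to the inter-pulse mean, divided by the standard error of the inter-pulse mean under Gaussian noise) is an assumption chosen for illustration, as are the function name `merge_pulses` and the sample trace. All pulses are taken to come from a single spectral trace, i.e., the same base.

```python
import math
from typing import List, Tuple


def _mean(values: List[float]) -> float:
    return sum(values) / len(values)


def merge_pulses(trace: List[float], pulses: List[Tuple[int, int]],
                 sigma: float, max_sigma: float = 3.0) -> List[Tuple[int, int]]:
    """Merge consecutive (start, end) pulses, end exclusive, when the dip
    between them is not statistically significant."""
    if not pulses:
        return []
    merged = [pulses[0]]
    for start, end in pulses[1:]:
        prev_start, prev_end = merged[-1]
        gap = trace[prev_end:start]
        if not gap:                        # adjacent pulses: always merge
            merged[-1] = (prev_start, end)
            continue
        height = _mean(trace[prev_start:prev_end] + trace[start:end])
        # Significance of the dip, in units of the gap mean's standard error.
        z = (height - _mean(gap)) / (sigma / math.sqrt(len(gap)))
        if z <= max_sigma:
            merged[-1] = (prev_start, end)
        else:
            merged.append((start, end))
    return merged


TRACE = [0, 0, 5, 5, 4, 5, 5, 0, 0, 0, 0, 5, 5, 0]
MERGED = merge_pulses(TRACE, [(2, 4), (5, 7), (11, 13)], sigma=1.0)
```

- In this toy trace the one-frame dip to 4 between the first two pulses is only 1 standard deviation below the pulse height, so they are merged, while the four-frame return to baseline before the third pulse is highly significant and keeps it separate.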
- Figure 30 illustrates two consecutive pulses between frames 20 and 34 that would be merged as discussed above according to specific embodiments of the invention, thus eliminating one FP that would occur without the merging.
- Figure 31 is a flow chart illustrating an example method of a pulse merging analysis according to specific embodiments of the invention.
- the invention uses the ratio of pulse height to diffusion background, or DBR (diffusional background ratio), in determining significant pulses. While at times this may be referred to as a component of SNR (signal to noise ratio), the term DBR is used to avoid confusion with more traditional usages of SNR.
- Calling pulses by the DBR makes the "intensity" component of the pulse less variable even in the presence of laser excitation variations and regardless of background noise envelope fluctuations (due to concentration variations, extremely sticky ZMWs, etc). Furthermore, setting the pulse intensity threshold according to the intensity contributions of freely diffusing fluorophores in the ZMW provides a theoretical framework for locating a single molecule event in a ZMW and provides some immunity from other sources of signal variations and error. There are several methods of obtaining an estimate of the DBR intensity per ZMW.
- pulse intensity is described not in absolute counts measured against a threshold, but as a ratio against the background diffusion of fluorophores in and above an individual ZMW.
- DBR = (Intensity − dcOffset) / (dcOffset − NDB), where (Intensity − dcOffset) is the average intensity of a pulse above baseline, and "NDB" is the portion of the baseline (dcOffset) that is not diffusion background (e.g., baseline from autofluorescence, base clamping, etc.).
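- The ratio can be computed directly from the three quantities defined above. The sketch below is illustrative; the function name `diffusion_background_ratio` and the numeric values are hypothetical.

```python
def diffusion_background_ratio(intensity: float, dc_offset: float,
                               ndb: float) -> float:
    """DBR = (intensity - dc_offset) / (dc_offset - ndb)."""
    diffusion_background = dc_offset - ndb
    if diffusion_background <= 0:
        raise ValueError("dc_offset must exceed the non-diffusion baseline")
    return (intensity - dc_offset) / diffusion_background


# Hypothetical counts: pulse mean 130, baseline 50, of which 30 is not diffusion.
DBR_NOMINAL = diffusion_background_ratio(130.0, 50.0, 30.0)
# Idealized case where doubling the excitation doubles all three quantities.
DBR_DOUBLED = diffusion_background_ratio(260.0, 100.0, 60.0)
```

- Under the idealization that all three quantities scale together with excitation power, the ratio is unchanged, which illustrates why a DBR threshold is less sensitive to laser variations than a threshold on absolute counts.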
- the NDB is determined from a sample movie of the array (or a neutral substrate, such as a solid aluminum film) with the same laser and camera conditions, which provides values for NDB(ZMW).
- the DBR method of pulse calling provides additional information about where in the ZMW a particular pulse originated. This information is used in specific embodiments to determine if multiple polymerases are sequencing in a ZMW, in which case data from that specific ZMW may be excluded from further data analysis. The location of a fluorophore within a ZMW can also be used as one of the parameters in the data analysis as described herein.
- the maximum values of the DBR of pulses from a single ZMW also allow estimation of the ZMW's effective diameter according to specific embodiments of the invention. In a particular example implementation, this method was used to estimate the ZMW diameters in an array to vary within 13 nm.
- DBR thresholding in some embodiments may be vulnerable to diameter variations of the ZMWs themselves across the array (because more diffusion will occur into larger diameter ZMWs). In specific embodiments, this is accounted for on a per-ZMW basis, for example by transmission light analysis prior to sequencing. With the size of each ZMW known or accurately analyzed, the DBR method is generally preferable to sigma-calling or intensity-calling of pulses.
- analysis of individual ZMWs includes repeated evaluation of whether a ZMW should be excluded from further analysis. Because large numbers of reaction locations are being prepared and monitored, it is expected that in some systems some percentage of reaction locations will not provide useful data. This may occur if no reaction enzyme becomes located in a particular ZMW, if more than one reaction enzyme is located in a ZMW or if a reaction enzyme is otherwise producing problematic data. Rejection of particular reaction location data streams may be performed at multiple points during the analysis where the captured data does not match expected data criteria.
- FIG. 32 is a flow chart illustrating an example method of pulse classification according to specific embodiments of the invention.
- the invention retrieves pulse start and end times (Step K1); these times are combined to determine a plurality of pulses in the pre-extraction captured data (Step K2); optionally, for each pulse, determine proximate background correction values from temporally close captured pixel values that are not within a pulse (Step K3); optionally, for each pulse, examine multi-component traces to avoid false pulse combinations due to dye cross-talk (Step K4); for each pulse, temporally combine captured pixel values into a temporally averaged pulse frame (Step K5); for each pulse, compare flux values to a plurality of dye spectral templates to determine a best match, optionally using proximate background correction values (Step K6); store or output the best match as the dye classification for a significant pulse (Step K7); optionally store or output a match probability for
- a number of comparative methods may be used to generate a comparative metric for this process.
- a χ² test is used to establish the goodness of fit of the comparison.
- the amplitude (A) of the fit of an individual dye spectral shape, as measured from the pure dye calibration spectrum, P_i, is the only variable to solve and will have a χ² value of:
- the probability that the pure dye spectrum fits with the extracted spectrum is then derived from the χ² probability distribution (with a number of degrees of freedom for the number of data points used, ν).
- the classification of a given pulse spectrum is then identified based upon calculating values for each of the four different dyes. The lowest χ² value (and the highest probability fit) assigns the pulse to that particular dye spectrum, and the pulse is called as corresponding to that dye.
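- The χ² expression is not reproduced in this text. The sketch below assumes the standard weighted least-squares form, χ² = Σ (S_i − A·P_i)² / σ_i², with S_i the pulse spectrum, P_i the pure dye calibration spectrum, σ_i the per-pixel noise, and A solved in closed form; this form, the function names, and the 4-pixel templates are assumptions for illustration. Conversion of χ² to a probability is omitted.

```python
from typing import Dict, List, Tuple


def chi_square_fit(spectrum: List[float], template: List[float],
                   sigma: List[float]) -> Tuple[float, float]:
    """Best-fit amplitude A of one template and the resulting chi-square."""
    weights = [1.0 / (s * s) for s in sigma]
    amplitude = (sum(w * x * p for w, x, p in zip(weights, spectrum, template))
                 / sum(w * p * p for w, p in zip(weights, template)))
    chi2 = sum(w * (x - amplitude * p) ** 2
               for w, x, p in zip(weights, spectrum, template))
    return amplitude, chi2


def classify_pulse(spectrum: List[float], templates: Dict[str, List[float]],
                   sigma: List[float]) -> Tuple[str, Dict[str, Tuple[float, float]]]:
    """Assign the pulse to the dye whose template gives the lowest chi-square."""
    fits = {dye: chi_square_fit(spectrum, t, sigma) for dye, t in templates.items()}
    best = min(fits, key=lambda dye: fits[dye][1])
    return best, fits


# Hypothetical 4-pixel templates for two overlapping dyes.
TEMPLATES = {"555": [1.0, 2.0, 1.0, 0.0], "568": [0.0, 1.0, 2.0, 1.0]}
SPECTRUM = [0.1, 2.9, 6.1, 3.0]     # roughly 3 x the "568" template plus noise
BEST, FITS = classify_pulse(SPECTRUM, TEMPLATES, sigma=[1.0, 1.0, 1.0, 1.0])
```

- Here `BEST` is "568": its χ² is far lower than that of the "555" template even though the two templates overlap in two of the four pixels.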
- a number of other pulse metrics may be employed in classifying a pulse as correlating to a given dye/nucleotide.
- signals associated with incorporation of a given dye labeled nucleotide typically have a number of other characteristics that can be used in further confirming a given pulse classification.
- different dye labeled nucleotides may have different characteristics such as pulse arrival time (following a prior pulse), pulse width, signal intensity or integrated counts (also referred to as pulse area), signal to noise ratio, power to noise ratio, pulse to diffusion ratio (ratio of pulse signal to the diffusion background signal in each waveguide), spectral fit (e.g., using a minimum χ² test, or the like), spectrum centroid, correlation coefficient against a pulse's classified dye, time interval to end of preceding pulse, time interval to the ensuing pulse, pulse shape, polarization of the pulse, and the like.
- a plurality of these various pulse metrics are used in addition to the spectral comparison, in classifying a pulse to a given dye, with particularly preferred processes comparing two, three, five, 10 or more different pulse metrics in classifying a pulse to a particular dye/nucleotide.
- extraction from spectra to multiple spectral traces may be performed according to an algorithm that maximizes the flux in each trace.
- this approach will result in a single incorporation pulse being detected in two traces. Because traces are used generally to determine start and end times from the captured data, this situation does not present a problem in most cases.
- a secondary spectral trace extraction is performed that attempts to increase separation between spectral template matches. This secondary trace extraction is then used to confirm that start and end times of pulses represent a pulse in one spectral color and not in two overlapping colors.
- Electrophoretic DNA sequencing often involves trace data from four different dyes that are used to label four bases.
- PHRED is a base-calling program for automated sequencer traces that outputs at each base generally one of five base identifiers (A C T G and N for not identifiable) and often a quality score for each base.
- in PHRED processing of DNA traces, predicted peak locations in terms of migration times are determined; observed peaks are identified in the trace and are matched to predicted peak locations, sometimes omitting some peaks and splitting others.
- Unmatched observed peaks may be checked for any peak that appears to represent a base but could not be assigned to a predicted peak in the third phase and if found, the corresponding base is inserted into the read sequence. Peaks in a PHRED analysis may be difficult to distinguish in regions where the peaks are not well resolved, noisy, or displaced (as in compressions).
- the PHRED algorithm typically assigns quality values to the bases, and writes the base calls and quality values to output files.
- PHRED can evaluate the trace surrounding each called base using four or five quality value parameters to quantify the trace quality.
- PHRED can use dye chemistry parameter data to do such tasks as identifying loop/stem sequence motifs that tend to result in CC and GG merged peak compressions.
- PHRAP is a sequence assembly program often used together with PHRED.
- PHRAP uses PHRED quality scores to determine highly accurate consensus sequences and to estimate the quality of the consensus sequences.
- PHRAP also uses PHRED quality scores to estimate whether discrepancies between two overlapping sequences are more likely to arise from random errors, or from different copies of a repeated sequence.
- Various expert analysis and similar systems have been proposed for analyzing such data. See, for example, U.S. Patent 6236944, Expert system for analysis of DNA sequencing electropherograms.
- software methods of the present invention include techniques for generating consensus DNA sequence information of high accuracy from a collection of less accurate reads generated by a real-time sequencing by incorporation system.
- two features of data typical of some such systems that motivate these techniques are: (1) the errors in the raw data are mostly insertions or deletions of base symbols from the correct sequence, rather than 'mismatches' or misidentified bases; (2) a relatively large number (e.g., 1000 or more) of data points are collected in real time for each base symbol in the raw read.
- a signal intensity and signal spectrum is measured through time. This results in a large collection of data features associated with each base in the raw read sequence.
- the time series data are summarized by finding regions of high signal intensity 'pulses', and measuring a series of features of those pulses, such as their duration, average intensity, average spectrum, time until the following pulse, and best reference spectra match.
- Observable pulses are generated when nucleotides are productively incorporated by the polymerase ("incorporation pulses"), as well as by interfering processes that introduce errors into the observed sequence, such as incorrect bases that stick temporarily but are not incorporated, or correct bases that become illuminated temporarily but are not fully incorporated, and then are incorporated and produce a second pulse (branching).
- a predictive HMM observation distribution model is extended to cover not only the identity of the called base, but also the features of the associated pulse.
- each class of microscopic event (true incorporations as well as interfering events) generates pulses with different but overlapping probability distributions in the space of pulse features.
- the distribution over pulse feature space for each pulse type is learned from experimental data and used to generate an approximate observation distribution via density estimation techniques.
- a most likely template (sequence) is discovered by constructing a series of trial models that maximize the likelihood of the observed data under the model, via an expectation maximization procedure.
- a probability model according to specific embodiments of the invention must decide among a number of competing hypotheses about the true template. For example, in attempting to decide between a T and an A at the highlighted position the model asks which event is more likely, that a T base generates an emission that is called as an A, or that an A base is called as a T. While a standard alignment approach of choosing the template that maximizes the likelihood still applies, in the present invention the ⁇ j (O) that models the probability of an observation is a function not solely of the base identity, but is also extended to return a measure of the probability of observing a pulse and various of its associated features on that transition.
- if the T pulse has stored or associated with it observed features indicating it was a higher intensity, longer pulse (and therefore less likely to be misclassified), while the A pulse was weaker and briefer, these features would be included in the probability model with other alignment probabilities to determine whether T or A was more probable. Other features being equal, the probability that the bright T pulse was a misclassified emission from an A template location would be much smaller than the probability that the weak, brief A pulse was generated from a T base; therefore the model would call T as the consensus in that position. Because the data analyzed during the consensus alignment phase includes a number of different physical parameters of identified pulses and overall reaction parameters, rather than just a single quality score, many different characteristics of a real-time incorporation sequencing reaction can be used in the predictive model.
- the predictive model can thus be trained to account for the probability that a detected pulse was due to a branch or a stick of a labeled nucleotide analog, probabilities of which will vary for different bases, as well as account for overall reaction quality features such as overall noise detected at a reaction location or overall confidence of spectral classifications at a reaction location.
- each state of an example HMM models a location along the template DNA strand where the synthesizing polymerase will reside between incorporation events.
- Two classes of transitions that can occur from this state are (1) a "move" transition where the polymerase incorporates a base and proceeds one position along the template, with a probability denoted by a_{i,i+1}, and (2) a "stay" transition where the polymerase binds a nucleotide but unbinds before the incorporation event (a "branch"), or a labeled nucleotide "sticks" transiently to the surface of the ZMW, inside the illumination region, and the polymerase does not move along the template, with probability given by a_{i,i}.
- a branch generally emits the symbol corresponding to the current template location while a stick generates a random symbol.
- the probability of branching and sticking are modeled as a function of the observation symbols (A C T G and null), and optionally modeled as a function of symbols for pulse metrics, such as intensity, duration, forward interval, subsequent interval, etc.
- One method of scoring such a model during training is determining parameters that result in a maximum alignment length as is understood in the art.
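- The move/stay structure described above can be illustrated with a small forward-algorithm sketch that scores a read against a candidate template. This is a simplified illustration, not the model of the specification: the parameter values, the function name `read_likelihood`, the uniform mismatch model, and the omission of deletions (null symbols) and of pulse-feature observations are all assumptions made for brevity.

```python
def read_likelihood(read: str, template: str, a_move: float = 0.9,
                    p_branch: float = 0.5, mismatch: float = 0.05) -> float:
    """Probability that `read` is emitted while traversing all of `template`.

    State i means i template bases have been incorporated so far.
    """
    a_stay = 1.0 - a_move
    forward = [0.0] * (len(template) + 1)
    forward[0] = 1.0
    for symbol in read:
        nxt = [0.0] * (len(template) + 1)
        for i, base in enumerate(template):
            if forward[i] == 0.0:
                continue
            # Move: incorporation, usually emitting the template base.
            move_emit = (1.0 - mismatch) if symbol == base else mismatch / 3.0
            # Stay: a branch emits the current template base, a stick emits
            # a uniformly random symbol.
            branch_emit = 1.0 if symbol == base else 0.0
            stay_emit = p_branch * branch_emit + (1.0 - p_branch) * 0.25
            nxt[i + 1] += forward[i] * a_move * move_emit
            nxt[i] += forward[i] * a_stay * stay_emit
        forward = nxt
    return forward[-1]


# Hypothetical reads of one template, one of them with an extra (branch) pulse.
READS = ["ACTG", "ACTTG", "ACTG"]
SCORE_T = [read_likelihood(r, "ACTG") for r in READS]
SCORE_A = [read_likelihood(r, "ACAG") for r in READS]
```

- Comparing the per-read likelihoods for the two candidate templates shows the kind of decision described above: every read is more probable under "ACTG" than under "ACAG", so T would be called at the third position. Extending the emission terms to depend on pulse features such as intensity and duration would follow the same structure.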
- Figure 40 is a block diagram showing a representative example logic device in which various aspects of the present invention may be embodied.
- the invention can be implemented in hardware and/or software.
- different aspects of the invention can be implemented in either client-side logic or server-side logic.
- the invention or components thereof may be embodied in a fixed media program component containing logic instructions and/or data that when loaded into an appropriately configured computing device cause that device to perform according to the invention.
- a fixed media containing logic instructions may be delivered to a viewer on a fixed media for physically loading into a viewer's computer, or a fixed media containing logic instructions may reside on a remote server that a viewer accesses through a communication medium in order to download a program component.
- Figure 40 shows an information appliance (or digital device) 700 that may be understood as a logical apparatus that can read instructions from media 717 and/or network port 719, which can optionally be connected to server 720 having fixed media 722. Apparatus 700 can thereafter use those instructions to direct server or client logic, as understood in the art, to embody aspects of the invention.
- One type of logical apparatus that may embody the invention is a computer system as illustrated in 700, containing CPU 707, optional input devices 709 and 711, disk drives 715 and optional monitor 705.
- Fixed media 717, or fixed media 722 over port 719 may be used to program such a system and may represent a disk-type optical or magnetic media, magnetic tape, solid state dynamic or static memory, etc.
- the invention may be embodied in whole or in part as software recorded on this fixed media.
- Communication port 719 may also be used to initially receive instructions that are used to program such a system and may represent any type of communication connection.
- the invention also may be embodied in whole or in part within the circuitry of an application specific integrated circuit (ASIC) or a programmable logic device (PLD).
- the invention may be embodied in a computer understandable descriptor language, which may be used to create an ASIC, or PLD that operates as herein described.
- a viewer digital information appliance has generally been illustrated as a personal computer.
- the digital computing device is meant to be any information appliance for interacting with a remote data application, and could include such devices as a digitally enabled television, cell phone, personal digital assistant, etc.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
AU2008261935A AU2008261935B2 (en) | 2007-06-06 | 2008-06-05 | Methods and processes for calling bases in sequence by incorporation methods |
CA2689626A CA2689626C (en) | 2007-06-06 | 2008-06-05 | Methods and processes for calling bases in sequence by incorporation methods |
EP08770244.5A EP2155855B1 (en) | 2007-06-06 | 2008-06-05 | Methods and processes for calling bases in sequence by incorporation methods |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US93339907P | 2007-06-06 | 2007-06-06 | |
US60/933,399 | 2007-06-06 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2008154317A1 true WO2008154317A1 (en) | 2008-12-18 |
Family
ID=40130137
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/US2008/065996 WO2008154317A1 (en) | 2007-06-06 | 2008-06-05 | Methods and processes for calling bases in sequence by incorporation methods |
Country Status (5)
Country | Link |
---|---|
US (1) | US8182993B2 (en) |
EP (1) | EP2155855B1 (en) |
AU (1) | AU2008261935B2 (en) |
CA (1) | CA2689626C (en) |
WO (1) | WO2008154317A1 (en) |
Cited By (25)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8133672B2 (en) | 2008-03-31 | 2012-03-13 | Pacific Biosciences Of California, Inc. | Two slow-step polymerase enzyme systems and methods |
US8257954B2 (en) | 2008-03-31 | 2012-09-04 | Pacific Biosciences Of California, Inc. | Generation of modified polymerases for improved accuracy in single molecule sequencing |
US8420366B2 (en) | 2008-03-31 | 2013-04-16 | Pacific Biosciences Of California, Inc. | Generation of modified polymerases for improved accuracy in single molecule sequencing |
US8530164B2 (en) | 2008-09-05 | 2013-09-10 | Pacific Biosciences Of California, Inc. | Method for sequencing using branching fraction of incorporatable nucleotides |
US8652781B2 (en) | 2008-02-12 | 2014-02-18 | Pacific Biosciences Of California, Inc. | Cognate sampling kinetics |
US8986930B2 (en) | 2010-07-12 | 2015-03-24 | Pacific Biosciences Of California, Inc. | Sequencing reactions with alkali metal cations for pulse width control |
US8999676B2 (en) | 2008-03-31 | 2015-04-07 | Pacific Biosciences Of California, Inc. | Recombinant polymerases for improved single molecule sequencing |
US8999674B2 (en) | 2009-03-27 | 2015-04-07 | Life Technologies Corporation | Methods and apparatus for single molecule sequencing using energy transfer detection |
EP2893040A1 (en) * | 2012-09-04 | 2015-07-15 | Guardant Health Inc. | Systems and methods to detect rare mutations and copy number variation |
WO2015135718A1 (en) * | 2014-03-14 | 2015-09-17 | Unisense Fertilitech A/S | Methods and apparatus for analysing embryo development |
EP2831283A4 (en) * | 2012-03-30 | 2015-11-04 | Pacific Biosciences California | Methods and composition for sequencing modified nucleic acids |
US9399766B2 (en) | 2012-10-01 | 2016-07-26 | Pacific Biosciences Of California, Inc. | Recombinant polymerases for incorporation of protein shield nucleotide analogs |
US9902992B2 (en) | 2012-09-04 | 2018-02-27 | Guardant Helath, Inc. | Systems and methods to detect rare mutations and copy number variation |
US9920366B2 (en) | 2013-12-28 | 2018-03-20 | Guardant Health, Inc. | Methods and systems for detecting genetic variants |
US10480027B2 (en) | 2012-06-08 | 2019-11-19 | Pacific Biosciences Of California, Inc. | Nanopore sequencing methods |
US10704085B2 (en) | 2014-03-05 | 2020-07-07 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
EP3882362A1 (en) * | 2013-03-15 | 2021-09-22 | Guardant Health, Inc. | Methods for sequencing of cell free polynucleotides |
US11210554B2 (en) | 2019-03-21 | 2021-12-28 | Illumina, Inc. | Artificial intelligence-based generation of sequencing metadata |
US11242569B2 (en) | 2015-12-17 | 2022-02-08 | Guardant Health, Inc. | Methods to determine tumor gene copy number by analysis of cell-free DNA |
US11347965B2 (en) | 2019-03-21 | 2022-05-31 | Illumina, Inc. | Training data generation for artificial intelligence-based sequencing |
US11515010B2 (en) | 2021-04-15 | 2022-11-29 | Illumina, Inc. | Deep convolutional neural networks to predict variant pathogenicity using three-dimensional (3D) protein structures |
US11593649B2 (en) | 2019-05-16 | 2023-02-28 | Illumina, Inc. | Base calling using convolutions |
US11749380B2 (en) | 2020-02-20 | 2023-09-05 | Illumina, Inc. | Artificial intelligence-based many-to-many base calling |
US11913065B2 (en) | 2012-09-04 | 2024-02-27 | Guardent Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
US12106828B2 (en) | 2019-05-16 | 2024-10-01 | Illumina, Inc. | Systems and devices for signal corrections in pixel-based sequencing |
Families Citing this family (115)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2009020682A2 (en) | 2007-05-08 | 2009-02-12 | The Trustees Of Boston University | Chemical functionalization of solid-state nanopores and nanopore arrays and applications thereof |
US8703422B2 (en) | 2007-06-06 | 2014-04-22 | Pacific Biosciences Of California, Inc. | Methods and processes for calling bases in sequence by incorporation methods |
US7973146B2 (en) * | 2008-03-26 | 2011-07-05 | Pacific Biosciences Of California, Inc. | Engineered fluorescent dye labeled nucleotide analogs for DNA sequencing |
US8143030B2 (en) | 2008-09-24 | 2012-03-27 | Pacific Biosciences Of California, Inc. | Intermittent detection during analytical reactions |
AU2009229157B2 (en) | 2008-03-28 | 2015-01-29 | Pacific Biosciences Of California, Inc. | Compositions and methods for nucleic acid sequencing |
US20090247426A1 (en) * | 2008-03-31 | 2009-10-01 | Pacific Biosciences Of California, Inc. | Focused library generation |
US8795961B2 (en) * | 2008-09-05 | 2014-08-05 | Pacific Biosciences Of California, Inc. | Preparations, compositions, and methods for nucleic acid sequencing |
CA2738626C (en) | 2008-09-30 | 2017-08-08 | Pacific Biosciences Of California, Inc. | Ultra-high multiplex analytical systems and methods |
US8379090B1 (en) * | 2008-11-06 | 2013-02-19 | Target Brands, Inc. | Virtual visits |
US8370079B2 (en) | 2008-11-20 | 2013-02-05 | Pacific Biosciences Of California, Inc. | Algorithms for sequence determination |
WO2010068289A2 (en) | 2008-12-11 | 2010-06-17 | Pacific Biosciences Of California, Inc. | Classification of nucleic acid templates |
US20230148447A9 (en) * | 2008-12-11 | 2023-05-11 | Pacific Biosciences Of California, Inc. | Classification of nucleic acid templates |
US9175338B2 (en) | 2008-12-11 | 2015-11-03 | Pacific Biosciences Of California, Inc. | Methods for identifying nucleic acid modifications |
EP3020830A1 (en) * | 2009-01-20 | 2016-05-18 | The Board Of Trustees Of The Leland Stanford Junior University | Single cell gene expression for diagnosis, prognosis and identification of drug targets |
WO2010117420A2 (en) * | 2009-03-30 | 2010-10-14 | Pacific Biosciences Of California, Inc. | Fret-labeled compounds and uses therefor |
WO2011040996A1 (en) | 2009-09-30 | 2011-04-07 | Quantapore, Inc. | Ultrafast sequencing of biological polymers using a labeled nanopore |
EP3943920B1 (en) * | 2010-02-19 | 2024-04-03 | Pacific Biosciences Of California, Inc. | Integrated analytical system and method for fluorescence measurement |
TW201140139A (en) * | 2010-03-11 | 2011-11-16 | Pacific Biosciences California | Micromirror arrays having self aligned features |
CA2796822C (en) | 2010-05-07 | 2021-10-05 | The Board Of Trustees Of The Leland Stanford Junior University | Measurement and comparison of immune diversity by high-throughput sequencing |
AU2011226792A1 (en) | 2010-06-11 | 2012-01-12 | Life Technologies Corporation | Alternative nucleotide flows in sequencing-by-synthesis methods |
WO2012015628A2 (en) * | 2010-07-30 | 2012-02-02 | Ge Healthcare Bio-Sciences Corp. | Method for reducing image artifacts produced by a cmos camera |
US8465922B2 (en) | 2010-08-26 | 2013-06-18 | Pacific Biosciences Of California, Inc. | Methods and systems for monitoring reactions |
US10273540B2 (en) | 2010-10-27 | 2019-04-30 | Life Technologies Corporation | Methods and apparatuses for estimating parameters in a predictive model for use in sequencing-by-synthesis |
WO2012058459A2 (en) | 2010-10-27 | 2012-05-03 | Life Technologies Corporation | Predictive model for use in sequencing-by-synthesis |
WO2012118555A1 (en) | 2010-12-29 | 2012-09-07 | Life Technologies Corporation | Time-warped background signal for sequencing-by-synthesis operations |
EP2658999B1 (en) | 2010-12-30 | 2019-03-13 | Life Technologies Corporation | Models for analyzing data from sequencing-by-synthesis operations |
US10241075B2 (en) | 2010-12-30 | 2019-03-26 | Life Technologies Corporation | Methods, systems, and computer readable media for nucleic acid sequencing |
US20130060482A1 (en) | 2010-12-30 | 2013-03-07 | Life Technologies Corporation | Methods, systems, and computer readable media for making base calls in nucleic acid sequencing |
WO2012121756A1 (en) * | 2011-03-04 | 2012-09-13 | Quantapore, Inc. | Apparatus and methods for performing optical nanopore detection or sequencing |
US9611510B2 (en) | 2011-04-06 | 2017-04-04 | The University Of Chicago | Composition and methods related to modification of 5-methylcytosine (5-mC) |
EP3366782B1 (en) | 2011-04-08 | 2021-03-10 | Life Technologies Corporation | Phase-protecting reagent flow orderings for use in sequencing-by-synthesis |
US10704164B2 (en) | 2011-08-31 | 2020-07-07 | Life Technologies Corporation | Methods, systems, computer readable media, and kits for sample identification |
US9347900B2 (en) | 2011-10-14 | 2016-05-24 | Pacific Biosciences Of California, Inc. | Real-time redox sequencing |
CA2852949A1 (en) | 2011-10-19 | 2013-04-25 | Nugen Technologies, Inc. | Compositions and methods for directional nucleic acid amplification and sequencing |
EP2807292B1 (en) | 2012-01-26 | 2019-05-22 | Tecan Genomics, Inc. | Compositions and methods for targeted nucleic acid sequence enrichment and high efficiency library generation |
US9238836B2 (en) | 2012-03-30 | 2016-01-19 | Pacific Biosciences Of California, Inc. | Methods and compositions for sequencing modified nucleic acids |
WO2013163207A1 (en) | 2012-04-24 | 2013-10-31 | Pacific Biosciences Of California, Inc. | Identification of 5-methyl-c in nucleic acid templates |
US9646132B2 (en) | 2012-05-11 | 2017-05-09 | Life Technologies Corporation | Models for analyzing data from sequencing-by-synthesis operations |
GB2518078B (en) | 2012-06-18 | 2015-04-29 | Nugen Technologies Inc | Compositions and methods for negative selection of non-desired nucleic acid sequences |
US20150011396A1 (en) | 2012-07-09 | 2015-01-08 | Benjamin G. Schroeder | Methods for creating directional bisulfite-converted nucleic acid libraries for next generation sequencing |
US10777301B2 (en) | 2012-07-13 | 2020-09-15 | Pacific Biosciences Of California, Inc. | Hierarchical genome assembly method using single long insert library |
US10329608B2 (en) | 2012-10-10 | 2019-06-25 | Life Technologies Corporation | Methods, systems, and computer readable media for repeat sequencing |
US9651539B2 (en) | 2012-10-28 | 2017-05-16 | Quantapore, Inc. | Reducing background fluorescence in MEMS materials by low energy ion beam treatment |
CN105008878B (en) * | 2012-12-05 | 2017-09-19 | 吉恩波克公司 | Optical interrogation device |
US9562269B2 (en) | 2013-01-22 | 2017-02-07 | The Board Of Trustees Of The Leland Stanford Junior University | Haplotying of HLA loci with ultra-deep shotgun sequencing |
US20140296080A1 (en) | 2013-03-14 | 2014-10-02 | Life Technologies Corporation | Methods, Systems, and Computer Readable Media for Evaluating Variant Likelihood |
US9146248B2 (en) | 2013-03-14 | 2015-09-29 | Intelligent Bio-Systems, Inc. | Apparatus and methods for purging flow cells in nucleic acid sequencing instruments |
US9822408B2 (en) | 2013-03-15 | 2017-11-21 | Nugen Technologies, Inc. | Sequential sequencing |
US9591268B2 (en) | 2013-03-15 | 2017-03-07 | Qiagen Waltham, Inc. | Flow cell alignment methods and systems |
EP2994544B1 (en) | 2013-05-06 | 2019-10-02 | Pacific Biosciences Of California, Inc. | Real-time electronic sequencing |
TWI498789B (en) * | 2013-05-20 | 2015-09-01 | Lite On Singapore Pte Ltd | Proximity sensing method, proximity sensing device, and electronic device |
US9862997B2 (en) | 2013-05-24 | 2018-01-09 | Quantapore, Inc. | Nanopore-based nucleic acid analysis with mixed FRET detection |
US9926597B2 (en) | 2013-07-26 | 2018-03-27 | Life Technologies Corporation | Control nucleic acid sequences for use in sequencing-by-synthesis and methods for designing the same |
WO2015021079A1 (en) | 2013-08-05 | 2015-02-12 | Pacific Biosciences Of California, Inc. | Protected fluorescent reagent compounds |
US9879318B2 (en) | 2013-09-06 | 2018-01-30 | Pacific Biosciences Of California, Inc. | Methods and compositions for nucleic acid sample preparation |
WO2015051338A1 (en) | 2013-10-04 | 2015-04-09 | Life Technologies Corporation | Methods and systems for modeling phasing effects in sequencing using termination chemistry |
EP3068883B1 (en) | 2013-11-13 | 2020-04-29 | Nugen Technologies, Inc. | Compositions and methods for identification of a duplicate sequencing read |
EP3077430A4 (en) | 2013-12-05 | 2017-08-16 | Centrillion Technology Holdings Corporation | Modified surfaces |
CN106460032B (en) | 2013-12-05 | 2019-12-24 | 生捷科技控股公司 | Preparation of patterned arrays |
EP3077545B1 (en) | 2013-12-05 | 2020-09-16 | Centrillion Technology Holdings Corporation | Methods for sequencing nucleic acids |
US9745614B2 (en) | 2014-02-28 | 2017-08-29 | Nugen Technologies, Inc. | Reduced representation bisulfite sequencing with diversity adaptors |
US11060139B2 (en) | 2014-03-28 | 2021-07-13 | Centrillion Technology Holdings Corporation | Methods for sequencing nucleic acids |
US10102337B2 (en) | 2014-08-06 | 2018-10-16 | Nugen Technologies, Inc. | Digital measurements from targeted sequencing |
ES2789000T3 (en) | 2014-10-10 | 2020-10-23 | Quantapore Inc | Nanopore-based polynucleotide analysis with mutually inactivating fluorescent labels |
WO2016060974A1 (en) | 2014-10-13 | 2016-04-21 | Life Technologies Corporation | Methods, systems, and computer-readable media for accelerated base calling |
JP6757316B2 (en) | 2014-10-24 | 2020-09-16 | クアンタポール, インコーポレイテッド | Efficient optical analysis of polymers using nanostructured arrays |
US9859394B2 (en) | 2014-12-18 | 2018-01-02 | Agilome, Inc. | Graphene FET devices, systems, and methods of using the same for sequencing nucleic acids |
US9618474B2 (en) | 2014-12-18 | 2017-04-11 | Edico Genome, Inc. | Graphene FET devices, systems, and methods of using the same for sequencing nucleic acids |
WO2016100049A1 (en) | 2014-12-18 | 2016-06-23 | Edico Genome Corporation | Chemically-sensitive field effect transistor |
US10020300B2 (en) | 2014-12-18 | 2018-07-10 | Agilome, Inc. | Graphene FET devices, systems, and methods of using the same for sequencing nucleic acids |
US9857328B2 (en) | 2014-12-18 | 2018-01-02 | Agilome, Inc. | Chemically-sensitive field effect transistors, systems and methods for manufacturing and using the same |
US10006910B2 (en) | 2014-12-18 | 2018-06-26 | Agilome, Inc. | Chemically-sensitive field effect transistors, systems, and methods for manufacturing and using the same |
US10302972B2 (en) | 2015-01-23 | 2019-05-28 | Pacific Biosciences Of California, Inc. | Waveguide transmission |
WO2016126941A1 (en) | 2015-02-04 | 2016-08-11 | Pacific Biosciences Of California, Inc. | Multimeric protected fluorescent reagents |
WO2016179437A1 (en) * | 2015-05-07 | 2016-11-10 | Pacific Biosciences Of California, Inc. | Multiprocessor pipeline architecture |
EP4220645A3 (en) | 2015-05-14 | 2023-11-08 | Life Technologies Corporation | Barcode sequences, and related systems and methods |
WO2016191380A1 (en) | 2015-05-26 | 2016-12-01 | Pacific Biosciences Of California, Inc. | De novo diploid genome assembly and haplotype sequence reconstruction |
EP3527672B1 (en) | 2015-06-09 | 2022-10-05 | Centrillion Technology Holdings Corporation | Oligonucleotide arrays for sequencing nucleic acids |
EP3332033B1 (en) | 2015-08-06 | 2021-04-21 | Pacific Biosciences of California, Inc. | Single-molecule nanofet sequencing systems and methods |
CN108140240B (en) | 2015-08-12 | 2022-05-31 | 分子装置有限公司 | System and method for automated analysis of phenotypic responses of cells |
US10676788B2 (en) | 2015-11-20 | 2020-06-09 | Pacific Biosciences Of California, Inc. | Modified nucleotide reagents |
WO2017087974A1 (en) | 2015-11-20 | 2017-05-26 | Pacific Biosciences Of California, Inc. | Protected dye-labeled reagents |
US10781483B2 (en) | 2015-11-20 | 2020-09-22 | Pacific Biosciences Of California, Inc. | Labeled nucleotide analogs, reaction mixtures, and methods and systems for sequencing |
US10619205B2 (en) | 2016-05-06 | 2020-04-14 | Life Technologies Corporation | Combinatorial barcode sequences, and related systems and methods |
WO2017201081A1 (en) | 2016-05-16 | 2017-11-23 | Agilome, Inc. | Graphene fet devices, systems, and methods of using the same for sequencing nucleic acids |
KR102425257B1 (en) * | 2016-06-01 | 2022-07-27 | 퀀텀-에스아이 인코포레이티드 | Pulse Caller and Base Caller |
WO2017223515A1 (en) | 2016-06-23 | 2017-12-28 | F. Hoffmann-La Roche Ag | Formation and calibration of nanopore sequencing cells |
US11124827B2 (en) | 2016-06-23 | 2021-09-21 | Roche Sequencing Solutions, Inc. | Period-to-period analysis of AC signals from nanopore sequencing |
US10823721B2 (en) | 2016-07-05 | 2020-11-03 | Quantapore, Inc. | Optically based nanopore sequencing |
EP3497233B1 (en) | 2016-08-08 | 2021-11-10 | F. Hoffmann-La Roche AG | Basecalling for stochastic sequencing processes |
US10190155B2 (en) | 2016-10-14 | 2019-01-29 | Nugen Technologies, Inc. | Molecular tag attachment and transfer |
US11099202B2 (en) | 2017-10-20 | 2021-08-24 | Tecan Genomics, Inc. | Reagent delivery system |
EP3704521A4 (en) | 2017-11-03 | 2021-07-07 | Pacific Biosciences of California, Inc. | Systems, devices, and methods for improved optical waveguide transmission and alignment |
CN111512155B (en) | 2017-12-28 | 2022-07-05 | 豪夫迈·罗氏有限公司 | Measuring and removing noise in random signals from an alternating signal driven nanopore DNA sequencing system |
KR20200115590A (en) * | 2018-01-26 | 2020-10-07 | 퀀텀-에스아이 인코포레이티드 | Machine-learnable pulse and base calls for sequencing devices |
EP3765632A4 (en) | 2018-03-13 | 2021-12-08 | Sarmal, Inc. | Methods for single molecule sequencing |
EP3782158A1 (en) * | 2018-04-19 | 2021-02-24 | Omniome, Inc. | Improving accuracy of base calls in nucleic acid sequencing methods |
CN113454217A (en) | 2018-12-07 | 2021-09-28 | 奥科坦特公司 | System for screening protein-protein interaction |
JP7282880B2 (en) * | 2019-05-22 | 2023-05-29 | 株式会社日立ハイテク | Analysis device and analysis method |
MA56037A (en) | 2019-05-28 | 2022-04-06 | Octant Inc | TRANSCRIPTION RELAY SYSTEM |
GB2604481A (en) | 2019-10-10 | 2022-09-07 | 1859 Inc | Methods and systems for microfluidic screening |
CN112257734B (en) * | 2019-11-15 | 2024-08-20 | 北京沃东天骏信息技术有限公司 | Information processing method and device and storage medium |
US12059674B2 (en) | 2020-02-03 | 2024-08-13 | Tecan Genomics, Inc. | Reagent storage system |
US11188778B1 (en) * | 2020-05-05 | 2021-11-30 | Illumina, Inc. | Equalization-based image processing and spatial crosstalk attenuator |
CN111647506B (en) * | 2020-05-18 | 2023-11-03 | 深圳市真迈生物科技有限公司 | Positioning method, positioning device and sequencing system |
WO2022008641A1 (en) | 2020-07-08 | 2022-01-13 | Roche Sequencing Solutions, Inc. | Split-pool synthesis apparatus and methods of performing split-pool synthesis |
JP2023545478A (en) | 2020-10-15 | 2023-10-30 | カパ バイオシステムズ,インコーポレイティド | Electrophoretic devices and methods for next generation sequencing library preparation |
US11361194B2 (en) | 2020-10-27 | 2022-06-14 | Illumina, Inc. | Systems and methods for per-cluster intensity correction and base calling |
WO2022194764A1 (en) | 2021-03-15 | 2022-09-22 | F. Hoffmann-La Roche Ag | Targeted next-generation sequencing via anchored primer extension |
CN117098854A (en) | 2021-03-26 | 2023-11-21 | 豪夫迈·罗氏有限公司 | Hybridization buffer formulations |
WO2022208171A1 (en) | 2021-03-31 | 2022-10-06 | UCL Business Ltd. | Methods for analyte detection |
WO2022207682A1 (en) | 2021-04-01 | 2022-10-06 | F. Hoffmann-La Roche Ag | Immune cell counting of sars-cov-2 patients based on immune repertoire sequencing |
US11455487B1 (en) | 2021-10-26 | 2022-09-27 | Illumina Software, Inc. | Intensity extraction and crosstalk attenuation using interpolation and adaptation for base calling |
WO2024003332A1 (en) | 2022-06-30 | 2024-01-04 | F. Hoffmann-La Roche Ag | Controlling for tagmentation sequencing library insert size using archaeal histone-like proteins |
WO2024046992A1 (en) | 2022-09-02 | 2024-03-07 | F. Hoffmann-La Roche Ag | Improvements to next-generation target enrichment performance |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20030064366A1 (en) * | 2000-07-07 | 2003-04-03 | Susan Hardin | Real-time sequence determination |
US20030096302A1 (en) * | 2001-02-23 | 2003-05-22 | Genicon Sciences Corporation | Methods for providing extended dynamic range in analyte assays |
US20040009586A1 (en) * | 1998-05-16 | 2004-01-15 | Oldham Mark F. | Instrument for monitoring nucleic acid sequence amplification reaction |
US20050233363A1 (en) * | 2003-09-19 | 2005-10-20 | Harding Ian A | Whole genome expression analysis system |
US20060014151A1 (en) * | 2002-12-25 | 2006-01-19 | Jun Ogura | Optical dna sensor, dna reading apparatus, identification method of dna and manufacturing method of optical dna sensor |
US20060019267A1 (en) * | 2004-02-19 | 2006-01-26 | Stephen Quake | Methods and kits for analyzing polynucleotide sequences |
US20060063264A1 (en) * | 2004-09-17 | 2006-03-23 | Stephen Turner | Apparatus and method for performing nucleic acid analysis |
US20070036511A1 (en) * | 2005-08-11 | 2007-02-15 | Pacific Biosciences Of California, Inc. | Methods and systems for monitoring multiple optical signals from a single source |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1072006A1 (en) * | 1998-04-16 | 2001-01-31 | Northeastern University | Expert system for analysis of dna sequencing electropherograms |
US7056661B2 (en) * | 1999-05-19 | 2006-06-06 | Cornell Research Foundation, Inc. | Method for sequencing nucleic acid molecules |
ES2388722T3 (en) * | 2005-12-21 | 2012-10-18 | F. Hoffmann-La Roche Ag | Sequencing and genotyping using reversibly 2'-modified nucleotides |
2008
- 2008-06-05 CA: application CA2689626A, published as CA2689626C (active)
- 2008-06-05 US: application US12/134,186, published as US8182993B2 (active)
- 2008-06-05 AU: application AU2008261935A, published as AU2008261935B2 (active)
- 2008-06-05 EP: application EP08770244.5A, published as EP2155855B1 (active)
- 2008-06-05 WO: application PCT/US2008/065996, published as WO2008154317A1 (application filing)
Non-Patent Citations (1)
Title |
---|
See also references of EP2155855A4 * |
Cited By (120)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8652781B2 (en) | 2008-02-12 | 2014-02-18 | Pacific Biosciences Of California, Inc. | Cognate sampling kinetics |
US9719073B2 (en) | 2008-03-31 | 2017-08-01 | Pacific Biosciences Of California, Inc. | Recombinant polymerases for improved single molecule sequencing |
US8133672B2 (en) | 2008-03-31 | 2012-03-13 | Pacific Biosciences Of California, Inc. | Two slow-step polymerase enzyme systems and methods |
US10167455B2 (en) | 2008-03-31 | 2019-01-01 | Pacific Biosciences Of California, Inc. | Recombinant polymerases for improved single molecule sequencing |
US8257954B2 (en) | 2008-03-31 | 2012-09-04 | Pacific Biosciences Of California, Inc. | Generation of modified polymerases for improved accuracy in single molecule sequencing |
US8658365B2 (en) | 2008-03-31 | 2014-02-25 | Pacific Biosciences Of California, Inc. | Nucleic acid synthesis compositions and methods and systems for using same |
US8420366B2 (en) | 2008-03-31 | 2013-04-16 | Pacific Biosciences Of California, Inc. | Generation of modified polymerases for improved accuracy in single molecule sequencing |
US8999676B2 (en) | 2008-03-31 | 2015-04-07 | Pacific Biosciences Of California, Inc. | Recombinant polymerases for improved single molecule sequencing |
US10975362B2 (en) | 2008-03-31 | 2021-04-13 | Pacific Biosciences Of California, Inc. | Recombinant polymerases for improved single molecule sequencing |
US11746338B2 (en) | 2008-03-31 | 2023-09-05 | Pacific Biosciences Of California, Inc. | Recombinant polymerases for improved single molecule sequencing |
US9279155B2 (en) | 2008-03-31 | 2016-03-08 | Pacific Biosciences Of California, Inc. | Step-wise nucleic acid sequencing with catalytic and non-catalytic metals |
US8530164B2 (en) | 2008-09-05 | 2013-09-10 | Pacific Biosciences Of California, Inc. | Method for sequencing using branching fraction of incorporatable nucleotides |
US9365839B2 (en) | 2009-03-27 | 2016-06-14 | Life Technologies Corporation | Polymerase compositions and methods |
US9365838B2 (en) | 2009-03-27 | 2016-06-14 | Life Technologies Corporation | Conjugates of biomolecules to nanoparticles |
US11542549B2 (en) | 2009-03-27 | 2023-01-03 | Life Technologies Corporation | Labeled enzyme compositions, methods and systems |
US10093974B2 (en) | 2009-03-27 | 2018-10-09 | Life Technologies Corporation | Methods and apparatus for single molecule sequencing using energy transfer detection |
US9695471B2 (en) | 2009-03-27 | 2017-07-04 | Life Technologies Corporation | Methods and apparatus for single molecule sequencing using energy transfer detection |
US9567629B2 (en) | 2009-03-27 | 2017-02-14 | Life Technologies Corporation | Labeled enzyme compositions, methods and systems |
US11008612B2 (en) | 2009-03-27 | 2021-05-18 | Life Technologies Corporation | Methods and apparatus for single molecule sequencing using energy transfer detection |
US10093973B2 (en) | 2009-03-27 | 2018-10-09 | Life Technologies Corporation | Polymerase compositions and methods |
US10093972B2 (en) | 2009-03-27 | 2018-10-09 | Life Technologies Corporation | Conjugates of biomolecules to nanoparticles |
US8999674B2 (en) | 2009-03-27 | 2015-04-07 | Life Technologies Corporation | Methods and apparatus for single molecule sequencing using energy transfer detection |
US11015220B2 (en) | 2009-03-27 | 2021-05-25 | Life Technologies Corporation | Conjugates of biomolecules to nanoparticles |
US11453909B2 (en) | 2009-03-27 | 2022-09-27 | Life Technologies Corporation | Polymerase compositions and methods |
US9932573B2 (en) | 2009-03-27 | 2018-04-03 | Life Technologies Corporation | Labeled enzyme compositions, methods and systems |
US10184148B2 (en) | 2010-07-12 | 2019-01-22 | Pacific Biosciences Of California, Inc. | Sequencing reactions with monovalent cations for pulse width control |
US10612087B2 (en) | 2010-07-12 | 2020-04-07 | Pacific Biosciences Of California, Inc. | Sequencing reactions with monovalent cations for pulse width control |
US8986930B2 (en) | 2010-07-12 | 2015-03-24 | Pacific Biosciences Of California, Inc. | Sequencing reactions with alkali metal cations for pulse width control |
US9650671B2 (en) | 2010-07-12 | 2017-05-16 | Pacific Biosciences Of California, Inc. | Sequencing reactions with lithium for pulse width control |
EP2831283A4 (en) * | 2012-03-30 | 2015-11-04 | Pacific Biosciences California | Methods and composition for sequencing modified nucleic acids |
US10480027B2 (en) | 2012-06-08 | 2019-11-19 | Pacific Biosciences Of California, Inc. | Nanopore sequencing methods |
US10947600B2 (en) | 2012-09-04 | 2021-03-16 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
EP2893040A1 (en) * | 2012-09-04 | 2015-07-15 | Guardant Health Inc. | Systems and methods to detect rare mutations and copy number variation |
US12116624B2 (en) | 2012-09-04 | 2024-10-15 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
US12110560B2 (en) | 2012-09-04 | 2024-10-08 | Guardant Health, Inc. | Methods for monitoring residual disease |
US10457995B2 (en) | 2012-09-04 | 2019-10-29 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
US9902992B2 (en) | 2012-09-04 | 2018-02-27 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
US12054783B2 (en) | 2012-09-04 | 2024-08-06 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
US10494678B2 (en) | 2012-09-04 | 2019-12-03 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
US10501810B2 (en) | 2012-09-04 | 2019-12-10 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
US10501808B2 (en) | 2012-09-04 | 2019-12-10 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
EP3591073A1 (en) * | 2012-09-04 | 2020-01-08 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
US11319597B2 (en) | 2012-09-04 | 2022-05-03 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
US11319598B2 (en) | 2012-09-04 | 2022-05-03 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
US10683556B2 (en) | 2012-09-04 | 2020-06-16 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
US12049673B2 (en) | 2012-09-04 | 2024-07-30 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
US11913065B2 (en) | 2012-09-04 | 2024-02-27 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
US10738364B2 (en) | 2012-09-04 | 2020-08-11 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
US10793916B2 (en) | 2012-09-04 | 2020-10-06 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
EP4036247A1 (en) * | 2012-09-04 | 2022-08-03 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
US10822663B2 (en) | 2012-09-04 | 2020-11-03 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
US10837063B2 (en) | 2012-09-04 | 2020-11-17 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
US11879158B2 (en) | 2012-09-04 | 2024-01-23 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
US10876152B2 (en) | 2012-09-04 | 2020-12-29 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
US10876172B2 (en) | 2012-09-04 | 2020-12-29 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
US10876171B2 (en) | 2012-09-04 | 2020-12-29 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
US11773453B2 (en) | 2012-09-04 | 2023-10-03 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
US10041127B2 (en) | 2012-09-04 | 2018-08-07 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
US10894974B2 (en) | 2012-09-04 | 2021-01-19 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
US9840743B2 (en) | 2012-09-04 | 2017-12-12 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
US10961592B2 (en) | 2012-09-04 | 2021-03-30 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
US9834822B2 (en) | 2012-09-04 | 2017-12-05 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
US11434523B2 (en) | 2012-09-04 | 2022-09-06 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
US10995376B1 (en) | 2012-09-04 | 2021-05-04 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
US11001899B1 (en) | 2012-09-04 | 2021-05-11 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
US9598731B2 (en) | 2012-09-04 | 2017-03-21 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
EP2893040A4 (en) * | 2012-09-04 | 2016-04-27 | Guardant Health Inc | Systems and methods to detect rare mutations and copy number variation |
EP3842551A1 (en) * | 2012-09-04 | 2021-06-30 | Guardant Health, Inc. | Methods of analysing cell free polynucleotides |
US9399766B2 (en) | 2012-10-01 | 2016-07-26 | Pacific Biosciences Of California, Inc. | Recombinant polymerases for incorporation of protein shield nucleotide analogs |
US11198906B2 (en) | 2012-10-01 | 2021-12-14 | Pacific Biosciences Of California, Inc. | Recombinant polymerases for incorporation of protein shield nucleotide analogs |
US11891659B2 (en) | 2012-10-01 | 2024-02-06 | Pacific Biosciences Of California, Inc. | Recombinant polymerases for incorporation of protein shield nucleotide analogs |
US10626456B2 (en) | 2012-10-01 | 2020-04-21 | Pacific Biosciences Of California, Inc. | Recombinant polymerases for incorporation of protein shield nucleotide analogs |
US9873911B2 (en) | 2012-10-01 | 2018-01-23 | Pacific Biosciences Of California, Inc. | Recombinant polymerases for incorporation of protein shield nucleotide analogs |
EP3882362A1 (en) * | 2013-03-15 | 2021-09-22 | Guardant Health, Inc. | Methods for sequencing of cell free polynucleotides |
US11767555B2 (en) | 2013-12-28 | 2023-09-26 | Guardant Health, Inc. | Methods and systems for detecting genetic variants |
US12024745B2 (en) | 2013-12-28 | 2024-07-02 | Guardant Health, Inc. | Methods and systems for detecting genetic variants |
US9920366B2 (en) | 2013-12-28 | 2018-03-20 | Guardant Health, Inc. | Methods and systems for detecting genetic variants |
US11149307B2 (en) | 2013-12-28 | 2021-10-19 | Guardant Health, Inc. | Methods and systems for detecting genetic variants |
US11149306B2 (en) | 2013-12-28 | 2021-10-19 | Guardant Health, Inc. | Methods and systems for detecting genetic variants |
US12098421B2 (en) | 2013-12-28 | 2024-09-24 | Guardant Health, Inc. | Methods and systems for detecting genetic variants |
US11118221B2 (en) | 2013-12-28 | 2021-09-14 | Guardant Health, Inc. | Methods and systems for detecting genetic variants |
US11434531B2 (en) | 2013-12-28 | 2022-09-06 | Guardant Health, Inc. | Methods and systems for detecting genetic variants |
US12098422B2 (en) | 2013-12-28 | 2024-09-24 | Guardant Health, Inc. | Methods and systems for detecting genetic variants |
US12054774B2 (en) | 2013-12-28 | 2024-08-06 | Guardant Health, Inc. | Methods and systems for detecting genetic variants |
US12024746B2 (en) | 2013-12-28 | 2024-07-02 | Guardant Health, Inc. | Methods and systems for detecting genetic variants |
US11959139B2 (en) | 2013-12-28 | 2024-04-16 | Guardant Health, Inc. | Methods and systems for detecting genetic variants |
US10801063B2 (en) | 2013-12-28 | 2020-10-13 | Guardant Health, Inc. | Methods and systems for detecting genetic variants |
US10883139B2 (en) | 2013-12-28 | 2021-01-05 | Guardant Health, Inc. | Methods and systems for detecting genetic variants |
US11767556B2 (en) | 2013-12-28 | 2023-09-26 | Guardant Health, Inc. | Methods and systems for detecting genetic variants |
US11639525B2 (en) | 2013-12-28 | 2023-05-02 | Guardant Health, Inc. | Methods and systems for detecting genetic variants |
US11639526B2 (en) | 2013-12-28 | 2023-05-02 | Guardant Health, Inc. | Methods and systems for detecting genetic variants |
US11649491B2 (en) | 2013-12-28 | 2023-05-16 | Guardant Health, Inc. | Methods and systems for detecting genetic variants |
US11667967B2 (en) | 2013-12-28 | 2023-06-06 | Guardant Health, Inc. | Methods and systems for detecting genetic variants |
US10889858B2 (en) | 2013-12-28 | 2021-01-12 | Guardant Health, Inc. | Methods and systems for detecting genetic variants |
US11091796B2 (en) | 2014-03-05 | 2021-08-17 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
US10982265B2 (en) | 2014-03-05 | 2021-04-20 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
US11667959B2 (en) | 2014-03-05 | 2023-06-06 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
US11091797B2 (en) | 2014-03-05 | 2021-08-17 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
US10870880B2 (en) | 2014-03-05 | 2020-12-22 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
US10704085B2 (en) | 2014-03-05 | 2020-07-07 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
US11447813B2 (en) | 2014-03-05 | 2022-09-20 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
US10704086B2 (en) | 2014-03-05 | 2020-07-07 | Guardant Health, Inc. | Systems and methods to detect rare mutations and copy number variation |
US10282842B2 (en) | 2014-03-14 | 2019-05-07 | Unisense Fertilitech A/S | Methods and apparatus for analysing embryo development |
WO2015135718A1 (en) * | 2014-03-14 | 2015-09-17 | Unisense Fertilitech A/S | Methods and apparatus for analysing embryo development |
AU2015230291B2 (en) * | 2014-03-14 | 2019-11-21 | Unisense Fertilitech A/S | Methods and apparatus for analysing embryo development |
US11242569B2 (en) | 2015-12-17 | 2022-02-08 | Guardant Health, Inc. | Methods to determine tumor gene copy number by analysis of cell-free DNA |
US11783917B2 (en) | 2019-03-21 | 2023-10-10 | Illumina, Inc. | Artificial intelligence-based base calling |
US11210554B2 (en) | 2019-03-21 | 2021-12-28 | Illumina, Inc. | Artificial intelligence-based generation of sequencing metadata |
US11908548B2 (en) | 2019-03-21 | 2024-02-20 | Illumina, Inc. | Training data generation for artificial intelligence-based sequencing |
US12119088B2 (en) | 2019-03-21 | 2024-10-15 | Illumina, Inc. | Deep neural network-based sequencing |
US11436429B2 (en) | 2019-03-21 | 2022-09-06 | Illumina, Inc. | Artificial intelligence-based sequencing |
US11676685B2 (en) | 2019-03-21 | 2023-06-13 | Illumina, Inc. | Artificial intelligence-based quality scoring |
US11347965B2 (en) | 2019-03-21 | 2022-05-31 | Illumina, Inc. | Training data generation for artificial intelligence-based sequencing |
US11961593B2 (en) | 2019-03-21 | 2024-04-16 | Illumina, Inc. | Artificial intelligence-based determination of analyte data for base calling |
US11593649B2 (en) | 2019-05-16 | 2023-02-28 | Illumina, Inc. | Base calling using convolutions |
US11817182B2 (en) | 2019-05-16 | 2023-11-14 | Illumina, Inc. | Base calling using three-dimentional (3D) convolution |
US12106828B2 (en) | 2019-05-16 | 2024-10-01 | Illumina, Inc. | Systems and devices for signal corrections in pixel-based sequencing |
US12106829B2 (en) | 2020-02-20 | 2024-10-01 | Illumina, Inc. | Artificial intelligence-based many-to-many base calling |
US11749380B2 (en) | 2020-02-20 | 2023-09-05 | Illumina, Inc. | Artificial intelligence-based many-to-many base calling |
US11515010B2 (en) | 2021-04-15 | 2022-11-29 | Illumina, Inc. | Deep convolutional neural networks to predict variant pathogenicity using three-dimensional (3D) protein structures |
Also Published As
Publication number | Publication date |
---|---|
EP2155855B1 (en) | 2016-10-12 |
CA2689626A1 (en) | 2008-12-18 |
US8182993B2 (en) | 2012-05-22 |
CA2689626C (en) | 2016-10-25 |
US20090024331A1 (en) | 2009-01-22 |
EP2155855A4 (en) | 2015-05-06 |
AU2008261935B2 (en) | 2013-05-02 |
AU2008261935A1 (en) | 2008-12-18 |
EP2155855A1 (en) | 2010-02-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CA2689626C (en) | | Methods and processes for calling bases in sequence by incorporation methods |
US10023911B2 (en) | | Methods and processes for calling bases in sequence by incorporation methods |
US20120015825A1 (en) | | Analytical systems and methods with software mask |
EP3969884B1 (en) | | Systems and methods for characterization and performance analysis of pixel-based sequencing |
US12106828B2 (en) | | Systems and devices for signal corrections in pixel-based sequencing |
CN113012757B (en) | | Method and system for identifying bases in nucleic acids |
CN109564189B (en) | | Electropherogram analysis |
EP3387613B1 (en) | | Background compensation |
CN116994246A (en) | | Base recognition method and device based on multitasking combination, gene sequencer and medium |
CN117392673B (en) | | Base recognition method and device, gene sequencer and medium |
CN117237198B (en) | | Super-resolution sequencing method and device based on deep learning, sequencer and medium |
US20070177799A1 (en) | | Image analysis |
US7732217B2 (en) | | Apparatus and method for reading fluorescence from bead arrays |
EP3387616A1 (en) | | Object classification in digital images |
US20240177807A1 (en) | | Cluster segmentation and conditional base calling |
CN117523559B (en) | | Base recognition method and device, gene sequencer and storage medium |
US20240100518A1 (en) | | Flow cell based motion system calibration and control methods |
CN117274739A (en) | | Base recognition method, training set construction method thereof, gene sequencer and medium |
CN118429966A (en) | | Base recognition method and system |
Cysewska-Sobusiak | | Optoelectronic identification of changes in DNA structure |
Khojasteh Lakelayeh | | Quality filtering and normalization for microarray-based CGH data |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 08770244; Country of ref document: EP; Kind code of ref document: A1 |
| | WWE | Wipo information: entry into national phase | Ref document number: 2008261935; Country of ref document: AU |
| | WWE | Wipo information: entry into national phase | Ref document number: 2689626; Country of ref document: CA |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | REEP | Request for entry into the european phase | Ref document number: 2008770244; Country of ref document: EP |
| | WWE | Wipo information: entry into national phase | Ref document number: 2008770244; Country of ref document: EP |
| | ENP | Entry into the national phase | Ref document number: 2008261935; Country of ref document: AU; Date of ref document: 20080605; Kind code of ref document: A |