WO2021072394A1 - Identification of genomic structural variants using long-read sequencing - Google Patents
Identification of genomic structural variants using long-read sequencing
- Publication number
- WO2021072394A1 (PCT/US2020/055293)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- target
- interest
- genomic
- grnas
- genomic region
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6876—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes
- C12Q1/6883—Nucleic acid products used in the analysis of nucleic acids, e.g. primers or probes for diseases caused by alterations of genetic material
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6811—Selection methods for production or design of target specific oligonucleotides or binding molecules
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6869—Methods for sequencing
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6813—Hybridisation assays
- C12Q1/6827—Hybridisation assays for detection of mutation or polymorphism
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12N—MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
- C12N2310/00—Structure or type of the nucleic acid
- C12N2310/10—Type of nucleic acid
- C12N2310/20—Type of nucleic acid involving clustered regularly interspaced short palindromic repeats [CRISPR]
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6806—Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6844—Nucleic acid amplification reactions
- C12Q1/6858—Allele-specific amplification
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q2600/00—Oligonucleotides characterized by their use
- C12Q2600/156—Polymorphic or mutational markers
Definitions
- a genetic abnormality or genomic variation in the genetic makeup of an individual can cause a genetic disease or disorder in the individual.
- the genetic abnormality or genomic variation can range from a discrete mutation in a single base (e.g., single nucleotide variant) to a chromosomal abnormality or structural variant (SV) (e.g., copy number variant, segmental inversion, etc.) comprising the rearrangement, addition or deletion of one or more genes.
- SV structural variant
- sickle cell disease is caused by a single nucleotide mutation in the beta-globin gene
- Fragile X syndrome is caused by expansion of a CGG trinucleotide tandem repeat to over 200 copies
- Down Syndrome is commonly caused by complete duplication of chromosome 21.
- Short-read sequencing technologies can identify small genomic variations such as single nucleotide variants, insertions and deletions, with high accuracy. However, these technologies are unable to identify structural variants larger than a few hundred base pairs with good accuracy.
- Several methods have emerged to detect structural variants, but each has limitations. For example, microscopy using fluorescent probes is low-throughput, expensive, and low-resolution.
- Quantitative PCR (qPCR) and microarray assays are high-throughput and inexpensive but cannot identify unknown structural variants.
- Short-read sequencers, which are high-throughput and inexpensive, have difficulty resolving SVs and frequently are coupled with another technology, such as optical mapping or linked-read sequencing, to identify SVs accurately.
- Whole genome sequencing using long-read sequencers can be used to detect large structural variants; however, whole genome sequencing is expensive, and some long-read sequencers have difficulty resolving very large structural variants. As such, there is an immediate need, especially in the clinical setting, for a fast, high-throughput yet cost-effective method to identify genomic structural variants, in particular, de novo structural variants.
- a method for identifying a set of guide RNAs (gRNAs) that are hybridizable to a genomic region of interest in a genome comprising designing a plurality of gRNAs, wherein at least one gRNA is hybridizable to a target site within the genomic region of interest and is configured to produce a genomic variant that comprises at least 1000 bp; and said plurality of gRNAs comprises a plurality of CRISPR RNAs (crRNAs), wherein said plurality of crRNAs comprises a GC content of at least about 20% to about 80%.
- crRNAs CRISPR RNAs
- a method for identifying a set of guide RNAs (gRNAs) that are hybridizable to a genomic region of interest in a genome comprising designing a plurality of gRNAs, wherein at least one gRNA is hybridizable to a target site within the genomic region of interest and is configured to produce a genomic variant that comprises at least 1000 bp; and said plurality of gRNAs comprises a plurality of CRISPR RNAs (crRNAs), wherein said plurality of crRNAs comprises a self-complementarity score of zero.
- a method for identifying a set of guide RNAs (gRNAs) that are hybridizable to a genomic region of interest in a genome comprising designing a plurality of gRNAs, wherein at least one gRNA is hybridizable to a target site within the genomic region of interest and is configured to produce a genomic variant that comprises at least 1000 bp; and said plurality of gRNAs comprises a plurality of CRISPR RNAs (crRNAs), wherein said plurality of crRNAs comprises an efficiency score of about 0.2.
- a method for identifying a set of guide RNAs (gRNAs) that are hybridizable to a genomic region of interest in a genome comprising designing a plurality of gRNAs, wherein at least one gRNA is hybridizable to a target site within the genomic region of interest and is configured to produce a genomic variant that comprises at least 1000 bp; and said plurality of gRNAs comprises a plurality of CRISPR RNAs (crRNAs), wherein said plurality of crRNAs comprises a mismatch profile of MM0 ⁇ 2, MM1 ⁇ 3, MM2 ⁇ 3, and MM3 ⁇ 21.
- the plurality of crRNAs comprises a mismatch profile of MM3 ⁇ 5.
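The crRNA design criteria listed above (GC content, self-complementarity score, efficiency score) can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the applicants' implementation: the `Candidate` record, the `passes_filters` helper, and the example sequences and scores are hypothetical; the self-complementarity and efficiency scores are assumed to be supplied by an external design tool; and treating 0.2 as a minimum efficiency score is an assumption. The mismatch-profile criterion is left out because its comparison operators are not legible in this text.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one crRNA candidate."""
    sequence: str               # spacer sequence, 5'->3'
    self_complementarity: int   # score from a design tool (assumed input)
    efficiency: float           # predicted efficiency score (assumed input)


def gc_fraction(sequence: str) -> float:
    """Return the fraction of G and C bases in a sequence."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "GC" for base in seq) / len(seq)


def passes_filters(c: Candidate, gc_min: float = 0.20, gc_max: float = 0.80,
                   min_efficiency: float = 0.2) -> bool:
    """Apply the GC, self-complementarity, and efficiency criteria.

    Using 0.2 as a lower bound on efficiency is an assumption.
    """
    return (gc_min <= gc_fraction(c.sequence) <= gc_max
            and c.self_complementarity == 0
            and c.efficiency >= min_efficiency)


# Made-up candidates for illustration only.
candidates = [
    Candidate("GACGTTACCGGATTCAGCTA", 0, 0.55),  # 50% GC, passes
    Candidate("ATATATATATTTAAATATAT", 0, 0.40),  # 0% GC, fails GC filter
    Candidate("GGCATCGATCGGATCCATGC", 2, 0.60),  # self-complementarity != 0
]
selected = [c for c in candidates if passes_filters(c)]
```

The thresholds are keyword arguments so that the narrower range of about 40% to about 80% GC, which is also recited, can be applied by passing `gc_min=0.40`.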
- a method of detecting a genomic variant in a sample comprising enriching said sample for a genomic region of interest comprising said genomic variant using a gene-editing based approach; and sequencing said enriched sample comprising said genomic region of interest using long-read sequencing.
- said genomic variant comprises a structural variant. In some cases, said genomic variant comprises at least 50 bp. In some cases, said genomic variant comprises at least 1000 bp.
- said gene-editing based approach comprises use of a clustered regularly interspaced short palindromic repeats (CRISPR)-Cas system. In some cases, said CRISPR-Cas system comprises Cas9.
- step (a) of enriching of said sample further comprises amplification of said genomic region of interest. In some embodiments, step (a) of enriching said sample does not require amplification of said genomic region of interest.
- step (a) of enriching of said sample further comprises coupling a sequence of dAMPs to said genomic variant. In some embodiments, step (a) of enriching of said sample further comprises coupling a plurality of barcode molecules to said genomic variant. In some embodiments, step (a) of enriching of said sample further comprises coupling said genomic variant to a magnetic bead.
- said long-read sequencing comprises nanopore sequencing. In some embodiments, said long-read sequencing comprises single molecule, real-time (SMRT) sequencing.
- SMRT single molecule, real-time
- said CRISPR-Cas system further comprises a crRNA comprising a sequence of Tables 1-117.
- said genomic region of interest comprises two or more repeat regions. In some embodiments, said genomic region of interest comprises a GC content of greater than 30%.
- said sample comprises at least 10 genomic regions of interest.
- said genomic variant is associated with a disorder.
- the disorder is selected from the group consisting of acute lymphoblastic leukemia (ALL), alpha-thalassemia, ataxia-telangiectasia (AT), autosomal recessive deafness 16, autosomal recessive deafness 22, beta-thalassemia, breast cancer, Canavan disease, cancer, celiac disease, chronic myeloid leukemia (CML), cystic fibrosis, cystinosis, deafness infertility syndrome (DIS), Duchenne muscular dystrophy, Ehlers-Danlos syndrome type III and IV, Ellis-van Creveld syndrome, Fabry disease, familial adenomatous polyposis (FAP), familial cutaneous melanoma, Fragile X, gastric cancer (including hereditary diffuse gastric cancer), Gaucher disease, hereditary predis
- ALL acute lymphoblastic leukemia
- AT ataxia-telangiectasia
- the fourth set of candidates comprises a mismatch profile of MM3 ⁇ 5.
- said designing comprises using CHOPCHOP.
- said first set of candidates have a GC content of about 40% to about 80%.
- said nucleic acid probe of interest comprises a crRNA.
- the probability of said crRNA cutting said genomic region of interest is greater than or equal to 80%.
- the method further comprises estimating on-target value of said crRNA. In some embodiments, the method further comprises estimating off-target value of said crRNA.
- kits comprising a set of guide RNAs (gRNAs) that are hybridizable to a genomic region of interest in a genome comprising designing a plurality of gRNAs, wherein at least one gRNA is hybridizable to a target site within the genomic region of interest and is configured to produce a genomic variant that comprises at least 1000 bp; and said plurality of gRNAs comprises a plurality of CRISPR RNAs (crRNAs), wherein said plurality of crRNAs comprises a GC content of at least about 40% to about 80%.
- gRNAs guide RNAs
- kits comprising a set of guide RNAs (gRNAs) that are hybridizable to a genomic region of interest in a genome comprising designing a plurality of gRNAs, wherein at least one gRNA is hybridizable to a target site within the genomic region of interest and is configured to produce a genomic variant that comprises at least 1000 bp; and said plurality of gRNAs comprises a plurality of CRISPR RNAs (crRNAs), wherein said plurality of crRNAs comprises a self-complementarity score of zero.
- kits comprising a set of guide RNAs (gRNAs) that are hybridizable to a genomic region of interest in a genome comprising designing a plurality of gRNAs, wherein at least one gRNA is hybridizable to a target site within the genomic region of interest and is configured to produce a genomic variant that comprises at least 1000 bp; and said plurality of gRNAs comprises a plurality of CRISPR RNAs (crRNAs), wherein said plurality of crRNAs comprises an efficiency score of about 0.2.
- FIG. 1 provides exemplary genomic abnormalities and variants.
- FIG. 2 provides an exemplary target enrichment sample preparation approach, in accordance with the embodiments provided herein.
- FIG. 3 provides an exemplary design approach for crRNA probes, in accordance with the embodiments provided herein.
- FIGS. 4A and 4B provide exemplary coverage of a crRNA probe embodiment, in accordance with the embodiments provided herein.
- FIG. 5 provides an exemplary computer control system that is programmed to implement the methods provided, in accordance with the embodiments provided herein.
- FIG. 6 provides an exemplary design approach for crRNA probes, in accordance with the embodiments provided herein.
- the term “subject,” as used herein, generally refers to an animal, such as a mammal (e.g., human) or avian (e.g., bird), or other organism, such as a plant.
- the subject can be a vertebrate, a mammal, a rodent (e.g., a mouse), a primate, a simian, or a human.
- Animals may include, but are not limited to, farm animals, sport animals, and pets.
- a subject may be a healthy or asymptomatic individual, an individual that has or is suspected of having a disease (e.g., a genetic disorder) or a pre-disposition to a disease, and/or an individual that is in need of therapy or suspected of needing therapy.
- a subject can be a patient.
- the term “genome,” as used herein, generally refers to genomic information from a subject, which may be, for example, at least a portion or an entirety of a subject’s hereditary information.
- a genome can be encoded either in DNA or in RNA.
- a genome can include the sequence of all chromosomes together in an organism.
- the human genome ordinarily has a total of 46 chromosomes. The sequence of all these together may constitute a human genome.
- the term “sequencing,” as used herein, generally refers to determining the sequence of nucleotide bases in one or more polynucleotides. The polynucleotides can be, for example, nucleic acid molecules such as deoxyribonucleic acid (DNA) or ribonucleic acid (RNA), including variants or derivatives thereof (e.g., single stranded DNA).
- Sequencing can be performed by various systems currently available, such as, without limitation, sequencing systems by Illumina®, Pacific Biosciences (PacBio®), Oxford Nanopore®, Life Technologies (Ion Torrent®), Roche®, Genapsys®, and MGI Tech®. Sequencing may be performed without using nucleic acid amplification.
- sequencing may be performed using nucleic acid amplification, polymerase chain reaction (PCR) (e.g. digital PCR, quantitative PCR, or real time PCR), or isothermal amplification.
- PCR polymerase chain reaction
- Such systems may provide a plurality of raw genetic data corresponding to the genetic information of a subject (e.g., human), as generated by the systems from a sample provided by the subject.
- In some examples, such systems provide sequencing reads (also “reads” herein).
- a read may include a string of nucleic acid bases corresponding to a sequence of a nucleic acid molecule that has been sequenced.
- systems and methods provided herein may be used with proteomic information.
- the term “sample,” as used herein, generally refers to a biological sample of a subject.
- the biological sample may comprise any number of macromolecules, for example, cellular macromolecules.
- the sample may be a cell sample.
- the sample may be a cell line or cell culture sample.
- the sample can include one or more cells.
- the sample may include one or more microbes.
- the biological sample may be a nucleic acid sample or protein sample.
- the biological sample may also be a carbohydrate sample or a lipid sample.
- the biological sample may be derived from another sample.
- the sample may be a tissue sample, such as a biopsy, core biopsy, needle aspirate, or fine needle aspirate.
- the sample may be a fluid sample, such as blood sample, urine sample, or saliva sample.
- the sample may be a skin sample.
- the sample may be a cheek swab.
- the sample may be a plasma or serum sample.
- the sample may include cells or may be cell-free.
- a cell-free sample may include extracellular polynucleotides. Extracellular polynucleotides may be isolated from a bodily sample that may be selected from the group consisting of blood, plasma, serum, urine, saliva, mucosal excretions, sputum, stool and tears.
- the term “short read,” as used herein, generally refers to a read length of a DNA or RNA polynucleotide of about 100 to about 600 bp.
- the term “long read,” as used herein, generally refers to a read length of a DNA or RNA polynucleotide of greater than 1 Kbp.
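As a minimal sketch of the two read-length definitions above, the following Python function labels a read by its length. The function name and the "unclassified" label are hypothetical, and treating "about 100" and "about 600" as exact bounds is a simplifying assumption.

```python
def classify_read(length_bp: int) -> str:
    """Label a read as 'short', 'long', or 'unclassified' by length in bp.

    Short: about 100 to about 600 bp. Long: greater than 1 Kbp.
    The hard cut-offs used here are an assumption for illustration.
    """
    if length_bp > 1000:
        return "long"
    if 100 <= length_bp <= 600:
        return "short"
    return "unclassified"  # falls between or below the stated ranges


# Example read lengths (made up): a typical short read and a long read.
labels = [classify_read(n) for n in (150, 15_000, 800)]
```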
- a ribonucleoprotein, as used herein, is a ribonucleic acid (RNA)-protein complex.
- the term “CRISPR-Cas system” generally refers to the clustered regularly interspaced short palindromic repeats (CRISPR) system, which comprises an array of two types of DNA sequences: (i) repetitive, flanking DNA sequences; and (ii) spacer sequences that are endogenously derived from a virus. The spacers can be used to target DNA or RNA sequences for cleavage by the CRISPR-associated (Cas) enzyme (ribonucleoprotein) complex, which cleaves sites that are complementary to the spacer regions.
- Cas CRISPR-associated enzyme
- barcoding is the ligation of known, unique sequences to target DNA molecules, between the adapter and the ROI, to enable recognition of the target sequence in downstream analysis, i.e., after basecalling.
- multiplexing is the running of multiple samples in a single flow cell, identifying each sample’s DNA molecules through unique ‘barcode’ molecules that have been attached to the DNA ends.
- the decoded sequences of a sample’s DNA will be identified downstream once the sequences have been basecalled.
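The barcoding and multiplexing steps described above imply a downstream demultiplexing step after basecalling. A minimal Python sketch follows, assuming that each barcode sits at the very start of the basecalled read and matches exactly; real reads also carry adapter sequence and basecalling errors, so production tools use tolerant matching. The barcode sequences, sample names, and `demultiplex` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical barcode assignments for two samples on one flow cell.
BARCODES = {"sample_A": "ACGTAC", "sample_B": "TGCATG"}


def demultiplex(reads, barcodes):
    """Group reads by sample using an exact barcode prefix match.

    The barcode is trimmed from each assigned read. Reads with no
    matching barcode are collected under 'unassigned'.
    """
    bins = defaultdict(list)
    for read in reads:
        for sample, barcode in barcodes.items():
            if read.startswith(barcode):
                bins[sample].append(read[len(barcode):])
                break
        else:  # no barcode matched this read
            bins["unassigned"].append(read)
    return dict(bins)


reads = ["ACGTACGGGTTT", "TGCATGAAACCC", "NNNNNNGGG"]
by_sample = demultiplex(reads, BARCODES)
```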
- crRNAs are the RNA sequences that recognize the target site. Together with the tracrRNA, a crRNA forms a single guide RNA (sgRNA); several used together are referred to as gRNAs.
- sgRNA single guide RNA
- tracrRNA refers to the trans-activating crRNA specific to the Type II CRISPR/Cas system. It is used to process the pre-crRNA along with an RNase III. The tracrRNA provides structural support to the ribonucleoprotein and anneals to the pre-crRNA for processing via the internal endonuclease activity of the Cas protein.
- Non-limiting examples of Cas enzymes can include Cas1, Cas1B, Cas2, Cas3, Cas4, Cas5, Cas6, Cas7, Cas8, Cas9 (also known as Csn1 or Csx12), Cas10, Csy1, Csy2, Csy3, Cse1, Cse2, Csc1, Csc2, Csa5, Csn2, Csm2, Csm3, Csm4, Csm5, Csm6, Cmr1, Cmr3, Cmr4, Cmr5, Cmr6, Csb1, Csb2, Csb3, Csx17, Csx14, Csx10,
- a catalytically dead Cas protein can be used, for example a dCas9.
- An unmodified CRISPR enzyme can have DNA cleavage activity, such as Cas9.
- protospacer refers to a sequence acquired from a pathogenic organism’s DNA molecule. The sequence is converted into DNA and forms the gene of the crRNA, which, along with the PAM in the substrate sequence, directs the Cas-crRNA-tracrRNA ternary complex to cleave the target molecule.
- PAM protospacer adjacent motif
- UTR untranslated region
- FIG. 1 shows examples of genomic variations.
- a single nucleotide variant is a substitution of a single nucleotide at a specific position in the genome.
- a deletion is a loss of one or more nucleotides in the genome, ranging from a single base to an entire chromosome.
- an insertion is the addition of one or more nucleotides to the genome.
- a tandem repeat consists of two or more adjacent copies of a sequence of at least two nucleotides in length.
- a tandem duplication occurs when a nucleotide sequence, which itself can contain a repeated sequence, is copied into two adjacent copies.
- Interspersed duplication differs from tandem duplication or repeat in that the repeated sequence is dispersed throughout the genome and is nonadjacent to the original copy.
- Inversion is a chromosome rearrangement in which a segment of a gene, structural element or chromosome is reversed end to end.
- Translocation is the unusual rearrangement of chromosomes.
- Copy number variants are a type of structural variant in which one or more parts of the genome are repeated.
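The tandem repeat definition above (two or more adjacent copies of a sequence at least two nucleotides long) can be checked with a short Python sketch. The function names and the example sequence are hypothetical; the sketch counts exact, uninterrupted copies only, which is a simplification of how repeats appear in real reads.

```python
import re


def longest_tandem_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back exact copies of motif."""
    if not motif:
        raise ValueError("motif must be non-empty")
    unit = re.escape(motif.upper())
    # A lookahead lets runs be measured from every start position,
    # so overlapping candidates are not skipped.
    pattern = re.compile(f"(?=((?:{unit})+))")
    runs = [len(m.group(1)) // len(motif)
            for m in pattern.finditer(sequence.upper())]
    return max(runs, default=0)


def is_tandem_repeat(sequence: str, motif: str) -> bool:
    """Apply the definition: motif of >= 2 nt with >= 2 adjacent copies."""
    return len(motif) >= 2 and longest_tandem_run(sequence, motif) >= 2


# Made-up sequence with a run of five CGG copies and a shorter run of two.
example = "ATG" + "CGG" * 5 + "TTACGGCGG"
copies = longest_tandem_run(example, "CGG")
```

For a locus such as the CGG repeat mentioned earlier for Fragile X syndrome, the returned copy number could be compared with the stated figure of over 200.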
- the types of genomic variants can be categorized based on the number of nucleotides involved.
- Single nucleotide variants affect a single nucleotide or base pair.
- Small insertions and deletions, commonly called indels, are shorter than 50 nucleotides in length.
- Structural variants are changes in the structure of a chromosome and generally affect 50 or more nucleotides.
- the typical human genome has about 8 million bases that differ from a reference due to SNVs and indels.
- the typical human genome has about 20,000 structural variants that differ from the reference and affect about 10 million bases.
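The size-based categories above (single nucleotide variants, indels shorter than 50 nucleotides, structural variants of 50 or more nucleotides) can be sketched in Python as follows. The function name and the reference/alternate allele representation are assumptions for illustration; balanced events such as inversions and translocations change no length and are not captured by this size rule.

```python
def classify_variant(ref: str, alt: str) -> str:
    """Classify a variant from its reference and alternate alleles.

    Uses the net length change as the event size, which is a
    simplification suitable for plain insertions and deletions.
    """
    if len(ref) == 1 and len(alt) == 1 and ref != alt:
        return "SNV"
    size = abs(len(alt) - len(ref))
    if 0 < size < 50:
        return "indel"
    if size >= 50:
        return "structural variant"
    return "other"  # e.g. identical alleles or same-length substitutions


# Made-up alleles: a substitution, a 3-nt insertion, a 50-nt insertion.
examples = [
    classify_variant("A", "G"),
    classify_variant("A", "ATTT"),
    classify_variant("A", "A" + "T" * 50),
]
```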
- systems and methods to detect one or more genomic variants in a sample comprising (a) preparing a sample for sequencing using a non-amplification-based, gene-editing based approach, and (b) long-read sequencing, as described herein elsewhere.
- kits and methods for conducting a diagnostic assay for a genetic disorder comprising (a) preparing a sample for sequencing using a non-amplification-based, gene-editing based approach, and (b) long-read sequencing, as described herein elsewhere.
- the one or more genomic variants comprise one or more structural variants. In some cases, the one or more genomic variants comprise at least one structural variant. In some cases, the structural variant is about 30 bp to about 1,000 bp. In some cases, the structural variant is about 30 bp to about 50 bp, about 30 bp to about 100 bp, about 30 bp to about 500 bp, about 30 bp to about 750 bp, about 30 bp to about 1,000 bp, about 50 bp to about 100 bp, about 50 bp to about 500 bp, about 50 bp to about 750 bp, about 50 bp to about 1,000 bp, about 100 bp to about 500 bp, about 100 bp to about 750 bp, about 100 bp to about 1,000 bp, about 500 bp to about 750 bp, about 100 bp to about 1,000 bp, about 500 bp to about 750 bp, about 100 bp to about 1,000
- the structural variant is about 30 bp, about 50 bp, about 100 bp, about 500 bp, about 750 bp, or about 1,000 bp. In some cases, the structural variant is at least about 30 bp, about 50 bp, about 100 bp, about 500 bp, or about 750 bp. In some cases, the structural variant is at most about 50 bp, about 100 bp, about 500 bp, about 750 bp, or about 1,000 bp. In some cases, the structural variant is about 1 Kbp to about 1,000 Kbp.
- the structural variant is about 1 Kbp to about 50 Kbp, about 1 Kbp to about 100 Kbp, about 1 Kbp to about 250 Kbp, about 1 Kbp to about 500 Kbp, about 1 Kbp to about 750 Kbp, about 1 Kbp to about 1,000 Kbp, about 50 Kbp to about 100 Kbp, about 50 Kbp to about 250 Kbp, about 50 Kbp to about 500 Kbp, about 50 Kbp to about 750 Kbp, about 50 Kbp to about 1,000 Kbp, about 100 Kbp to about 250 Kbp, about 100 Kbp to about 500 Kbp, about 100 Kbp to about 750 Kbp, about 100 Kbp to about 1,000 Kbp, about 250 Kbp to about 500 Kbp, about 250 Kbp to about 750 Kbp, about 250 Kbp to about 1,000 Kbp, about 500 Kbp to about 750 Kbp, about 500 Kbp to about 1,000 Kbp, or about
- the structural variant is about 1 Kbp, about 50 Kbp, about 100 Kbp, about 250 Kbp, about 500 Kbp, about 750 Kbp, or about 1,000 Kbp. In some cases, the structural variant is at least about 1 Kbp, about 50 Kbp, about 100 Kbp, about 250 Kbp, about 500 Kbp, or about 750 Kbp. In some cases, the structural variant is at most about 50 Kbp, about 100 Kbp, about 250 Kbp, about 500 Kbp, about 750 Kbp, or about 1,000 Kbp. In some cases, the structural variant is about 1 Mbp to about 10 Mbp. In some cases, the structural variant is at least about 1 Mbp.
- the structural variant is at most about 10 Mbp. In some cases, the structural variant is about 1 Mbp to about 2 Mbp, about 1 Mbp to about 3 Mbp, about 1 Mbp to about 4 Mbp, about 1 Mbp to about 5 Mbp, about 1 Mbp to about 6 Mbp, about 1 Mbp to about 7 Mbp, about 1 Mbp to about 8 Mbp, about 1 Mbp to about 9 Mbp, about 1 Mbp to about 10 Mbp, about 2 Mbp to about 3 Mbp, about 2 Mbp to about 4 Mbp, about 2 Mbp to about 5 Mbp, about 2 Mbp to about 6 Mbp, about 2 Mbp to about 7 Mbp, about 2 Mbp to about 8 Mbp, about 2 Mbp to about 9 Mbp, about 2 Mbp to about 10 Mbp, about 3 Mbp to about 4 Mbp, about 3 Mbp to about 5 Mbp, about 1 Mbp to
- the structural variant is about 1 Mbp, about 2 Mbp, about 3 Mbp, about 4 Mbp, about 5 Mbp, about 6 Mbp, about 7 Mbp, about 8 Mbp, about 9 Mbp, or about 10 Mbp.
- the one or more target genomic variants may comprise one or more structural variants. In some cases, the one or more target genomic variants may comprise at least one structural variant. In some cases, the sample comprises about 1 target genomic variant to about 100 target genomic variants.
- the sample comprises RNA transcripts.
- the sample comprises genomic DNA (gDNA).
- the sample comprises gDNA and RNA transcripts.
- the sample comprises one or more target genomic variants.
- the sample comprises about 1 target genomic variant to about 2 target genomic variants, about 1 target genomic variant to about 4 target genomic variants, about 1 target genomic variant to about 6 target genomic variants, about 1 target genomic variant to about 8 target genomic variants, about 1 target genomic variant to about 10 target genomic variants, about 1 target genomic variant to about 20 target genomic variants, about 1 target genomic variant to about 30 target genomic variants, about 1 target genomic variant to about 40 target genomic variants, about 1 target genomic variant to about 50 target genomic variants, about 1 target genomic variant to about 75 target genomic variants, about 1 target genomic variant to about 100 target genomic variants, about 2 target genomic variants to about 4 target genomic variants, about 2 target genomic variants to about 6 target genomic variants, about 2 target genomic variants to about 8 target genomic variants, about 2 target genomic variants to about 10 target genomic variants, about 2 target genomic variants to about 20 target genomic variants, about 2 target genomic variants to about 30 target genomic variants, about
- the sample comprises about 1 target genomic variant, about 2 target genomic variants, about 4 target genomic variants, about 6 target genomic variants, about 8 target genomic variants, about 10 target genomic variants, about 20 target genomic variants, about 30 target genomic variants, about 40 target genomic variants, about 50 target genomic variants, about 75 target genomic variants, or about 100 target genomic variants.
- the sample comprises at least about 1 target genomic variant, about 2 target genomic variants, about 4 target genomic variants, about 6 target genomic variants, about 8 target genomic variants, about 10 target genomic variants, about 20 target genomic variants, about 30 target genomic variants, about 40 target genomic variants, about 50 target genomic variants, or about 75 target genomic variants.
- the sample comprises at most about 2 target genomic variants, about 4 target genomic variants, about 6 target genomic variants, about 8 target genomic variants, about 10 target genomic variants, about 20 target genomic variants, about 30 target genomic variants, about 40 target genomic variants, about 50 target genomic variants, about 75 target genomic variants, or about 100 target genomic variants.
- the target enrichment sample preparation approach described herein may comprise one or more genome editing technologies.
- the genome editing technology is an endonuclease-based genome editing technology.
- the endonuclease-based genome editing technology comprises zinc-finger nucleases (ZFNs), homing nucleases, transcription activator-like effector nucleases (TALENs), and/or clustered regularly interspaced short palindromic repeats (CRISPR)-Cas systems.
- ZFNs zinc-finger nucleases
- TALENs transcription activator-like effector nucleases
- CRISPR clustered regularly interspaced short palindromic repeats
- the target enrichment sample preparation approach may further comprise DNA amplification.
- the target enrichment sample preparation approach may not comprise DNA amplification.
- the target enrichment sample preparation approach comprises preparing a sample for sequencing using a non-amplification-based, gene-editing based approach.
- the sample preparation comprises Cas-mediated PCR-free enrichment of said sample as shown in FIG. 2.
- Cas-mediated PCR-free enrichment of said sample may comprise extracting genomic DNA (gDNA) from said sample; dephosphorylating 5’ ends of the DNA to reduce ligation of sequencing adapters to non-target strands; adding Cas9 ribonucleoproteins (RNPs) comprising bound crRNA and tracrRNA to the gDNA to bind and cleave the region of interest (ROI); cleaving of gDNA by Cas9 to reveal blunt ends with ligatable 5’ phosphates; dA-tailing of gDNA in said sample to prepare blunt ends for sequencing adapter ligation; and ligating sequencing adapters to the Cas9 cut sites, wherein the Cas9 cut sites are 3’ dA-tailed and 5’ phosphorylated.
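The logic of the enrichment steps above is that, after dephosphorylation, only ends newly created by Cas9 cleavage carry the 5’ phosphate needed for adapter ligation. The following Python sketch is a toy model of that bookkeeping, not a simulation of the chemistry: the `Fragment` class, the helper functions, and the coordinates are hypothetical, and each duplex end is reduced to a single true/false phosphate flag.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Fragment:
    """Toy model of a linear DNA fragment and its two ends."""
    start: int
    end: int
    left_5p_phosphate: bool
    right_5p_phosphate: bool


def dephosphorylate(frag: Fragment) -> Fragment:
    """Remove the 5' phosphate from both existing ends."""
    return replace(frag, left_5p_phosphate=False, right_5p_phosphate=False)


def cas9_cut(frag: Fragment, position: int):
    """Split a fragment; both new ends carry a ligatable 5' phosphate."""
    if not frag.start < position < frag.end:
        raise ValueError("cut position must lie inside the fragment")
    left = Fragment(frag.start, position, frag.left_5p_phosphate, True)
    right = Fragment(position, frag.end, True, frag.right_5p_phosphate)
    return left, right


def ligatable_ends(frag: Fragment) -> int:
    """Count ends to which a sequencing adapter could be ligated."""
    return int(frag.left_5p_phosphate) + int(frag.right_5p_phosphate)


# A 50 kb piece of gDNA with an ROI spanning positions 20,000 to 30,000.
background = dephosphorylate(Fragment(0, 50_000, True, True))
upstream, rest = cas9_cut(background, 20_000)
roi, downstream = cas9_cut(rest, 30_000)
```

In this model the excised ROI has two ligatable ends, each flanking piece has one, and uncut background DNA has none.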
- two RNP complexes (each a ribonucleoprotein complex comprising Cas9, crRNA and tracrRNA), designed to excise a ROI, bind to sequences on the (+) and (-) strands, upstream and downstream of the ROI, respectively.
- the crRNAs confer specificity and ‘program’ the RNPs to bind to the specific sequences. Background DNA has been dephosphorylated (i.e. carries 5’-hydroxyl groups).
- the duplex DNA is locally melted.
- crRNA hybridizes to the non-target DNA strand, which is complementary to the crRNA. Cas9 cleaves both of the DNA strands within the target site, 3 bp upstream of the PAM.
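- The cut position relative to the PAM can be sketched in a few lines. The example assumes an NGG PAM (as used by Streptococcus pyogenes Cas9, which the text does not name explicitly), a 20-nucleotide protospacer, and scans only the strand that is passed in; the function name `find_cut_sites` is hypothetical.

```python
import re


def find_cut_sites(seq, guide_len=20):
    """Return (protospacer, pam, cut_index) tuples for NGG PAMs on the given strand.

    cut_index is the 0-based index of the first base to the right of the cut,
    i.e. the cut falls 3 bp upstream of the PAM.
    """
    seq = seq.upper()
    sites = []
    # A lookahead is used so that overlapping PAM candidates are all reported.
    for match in re.finditer(r"(?=([ACGT]GG))", seq):
        pam_start = match.start()
        if pam_start < guide_len:
            continue  # not enough sequence upstream for a full protospacer
        protospacer = seq[pam_start - guide_len:pam_start]
        sites.append((protospacer, match.group(1), pam_start - 3))
    return sites


example = "AAAAA" + "GATTACAGATTACAGATTAC" + "TGG" + "AAAA"
for protospacer, pam, cut in find_cut_sites(example):
    print(protospacer, pam, cut, example[:cut], example[cut:])
```

- For the example sequence the single site has its cut at index 22, so the right-hand fragment starts with the last three protospacer bases followed by the PAM.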
- an alternative to Cas9 may be used in the CRISPR-Cas system, wherein the alternative to Cas9 may be Cas3, Cas4, Cas5, Cas8a, Cas8b, Cas8c, Cas10, Cas10d, Cas13a, Cas13b, Cas13c, Cse1, Cse2, Csy1, Csy2, Csy3, Csm2, Cmr5, Csx10, Csf1, Csn2, Cpf1, C2c1, or C2c3.
- the target enrichment sample preparation comprises preparing a sample for sequencing using the PacBio® sequencing system.
- genomic DNA (gDNA) is subjected to Cas-mediated PCR-free enrichment as described herein.
- SMRTbell® adapters are ligated to the blunt template ends, forming SMRTbell® templates.
- unligated DNA is eliminated by exonuclease digestion, and the SMRTbell® templates are then prepared for sequencing by annealing to the sequencing primers and binding to the polymerase.
- the target enrichment sample preparation comprises preparing a sample for sequencing using the Illumina® sequencing system.
- gDNA is dephosphorylated and then filled in using biotinylated nucleotides.
- the gDNA is then subjected to Cas-mediated PCR-free enrichment as described herein.
- non-target gDNA is removed using streptavidin beads.
- the target gDNA is then fragmented to the appropriate size, end-repaired, and dA-tailed.
- Illumina® adapters are ligated to the end-repaired, dA-tailed target gDNA, which is then ready for sequencing.
- preliminary crRNA probes are designed using available guide RNA (gRNA) tools.
- gRNA design tools include the CHOPCHOP program, based on ONT recommended design options, and the Broad Institute sgRNA Designer.
- the preliminary crRNA probes are designed using the Benchling probe design tool and/or the CRISPOR probe design tool.
- the preliminary crRNA probes are filtered using one or more approaches as shown in Fig. 3 and Fig. 6.
- One filter approach is to retain preliminary crRNA probes with a GC content between about 40% and about 80%. If no candidates are obtained, the lower limit of the range is lowered to a GC content between about 20% and about 80%.
- Another filter approach is to retain preliminary crRNA probes with a self-complementarity score of zero. If no candidates are obtained, the self-complementarity score is increased to 1.
- Another filter approach is to retain preliminary crRNA probes with an efficiency score greater than 0.3. If no candidates are obtained, the efficiency score is lowered to greater than 0.2.
- the stringency of the mismatches is decreased in the following order: MM0 ≤ 1, MM1 ≤ 2, MM2 ≤ 2 and MM3 ≤ 21, until candidates are produced.
- candidates are further filtered by retaining candidates without any single nucleotide polymorphisms (SNPs).
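- The GC-content, self-complementarity, efficiency and SNP filters can be sketched as a cascade in which each filter falls back to its relaxed threshold only when the strict one leaves no candidates. This is a minimal sketch and not the disclosed implementation: the `Probe` fields and function names are hypothetical, applying the filters sequentially in this order is an assumption, the relaxed self-complementarity rule is read as "score of at most 1", and the mismatch filter is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Probe:
    sequence: str
    self_complementarity: int
    efficiency: float
    overlaps_snp: bool = False


def gc_fraction(seq):
    """Fraction of G and C bases in a non-empty sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def filter_with_fallback(probes, strict, relaxed):
    """Keep probes passing `strict`; if none pass, retry with `relaxed`."""
    kept = [p for p in probes if strict(p)]
    return kept if kept else [p for p in probes if relaxed(p)]


def filter_probes(probes):
    probes = filter_with_fallback(
        probes,
        lambda p: 0.40 <= gc_fraction(p.sequence) <= 0.80,
        lambda p: 0.20 <= gc_fraction(p.sequence) <= 0.80,
    )
    probes = filter_with_fallback(
        probes,
        lambda p: p.self_complementarity == 0,
        lambda p: p.self_complementarity <= 1,
    )
    probes = filter_with_fallback(
        probes,
        lambda p: p.efficiency > 0.3,
        lambda p: p.efficiency > 0.2,
    )
    # Final step: discard probes that overlap a known SNP.
    return [p for p in probes if not p.overlaps_snp]


candidates = [
    Probe("GCGCATATGCGCATATGCGC", 0, 0.50),
    Probe("ATATATATATATATATATAT", 0, 0.90),  # 0% GC, fails both GC ranges
    Probe("GCGCATATGCGCATATGCAT", 0, 0.25),  # passes only the relaxed efficiency
    Probe("GGCCATATGGCCATATGGCC", 0, 0.60, overlaps_snp=True),
]
print([p.sequence for p in filter_probes(candidates)])
```

- With these made-up candidates only the first probe survives, because the strict efficiency threshold already yields candidates and the relaxed one is never consulted.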
- ambiguous bases are introduced at any position to increase on-target performance.
- RNA check tools include IDT CRISPR-Cas9 gRNA checker, Cas-OFFinder, Dharmacon’s CRISPR specificity analysis tool, Synthego’s CRISPR specificity analysis tool, or a combination thereof.
- Candidate crRNA probes obtained using the methods provided herein are more likely to cut the target genomic region of interest than crRNA probes obtained using other methods.
- the probability that a candidate crRNA probe will cut a target is about 60 % to about 99.9 %. In some cases, the probability that a candidate crRNA probe will cut a target is at least about 60 %. In some cases, the probability that a candidate crRNA probe will cut a target is at most about 99.9 %.
- the probability that a candidate crRNA probe will cut a target is about 60 % to about 65 %, about 60 % to about 70 %, about 60 % to about 75 %, about 60 % to about 80 %, about 60 % to about 85 %, about 60 % to about 90 %, about 60 % to about 95 %, about 60 % to about 99.9 %, about 65 % to about 70 %, about 65 % to about 75 %, about 65 % to about 80 %, about 65 % to about 85 %, about 65 % to about 90 %, about 65 % to about 95 %, about 65 % to about 99.9 %, about 70 % to about 75 %, about 70 % to about 80 %, about 70 % to about 85 %, about 70 % to about 90 %, about 70 % to about 95 %, about 70 % to about 99.9 %, about 75 %, about 70 % to about 80 %, about 70 % to about 85
- the probability that a candidate crRNA probe will cut a target is about 60 %, about 65 %, about 70 %, about 75 %, about 80 %, about 85 %, about 90 %, about 95 %, or about 99.9 %.
- a guide RNA can target a nucleic acid sequence of or of about 20 nucleotides.
- a target nucleic acid can be less than or less than about 20 nucleotides.
- a target nucleic acid can be at least or at least about 5, 10, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30 or more nucleotides.
- a target nucleic acid can be at most or at most about 5, 10, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 30 or more nucleotides in length.
- a target nucleic acid sequence can be or can be about 20 bases immediately 5’ of the first nucleotide of the PAM.
- a guide RNA can target the nucleic acid sequence.
- a guiding polynucleic acid, such as a guide RNA can bind to a genomic sequence with at least or at least about 50%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 98%, or up to about 100% sequence identity and/or sequence similarity to any of the sequences of the tables below.
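- Percent sequence identity between a guide and a genomic site can be computed in several ways; the text does not specify one. The sketch below assumes the simplest case, an ungapped position-by-position comparison of two equal-length sequences, with U in the RNA treated as T. The function name is hypothetical.

```python
def percent_identity(guide, target):
    """Ungapped percent identity between two equal-length sequences."""
    guide = guide.upper().replace("U", "T")
    target = target.upper().replace("U", "T")
    if not guide or len(guide) != len(target):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(guide, target))
    return 100.0 * matches / len(guide)


print(percent_identity("GAUUACAGAUUACAGAUUAC", "GATTACAGATTACAGATTAC"))  # 100.0
print(percent_identity("GATTACAGATTACAGATTAC", "GATTACAGATTACAGATTAG"))  # 95.0
```

- A single mismatch in a 20-nucleotide target therefore corresponds to 95% identity under this definition.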
- Table 16 Target sequences for CDH1 gene
- Table 17 Target sequences for CDK4 gene
- Table 27 Target sequences for CRX gene
- Table 28 Target sequences for CTNS gene
- Table 37 Target sequences for EPCAM gene
- Table 38 Target sequences for ERG gene
- Table 59 Target sequences for HBB gene
- Table 60 Target sequences for HEXA gene
- Table 62 Target sequences for HLA-C gene
- Table 63 Target sequences for HTT gene
- Table 70 Target sequences for KRAS gene
- Table 90 Target sequences for PKD1 gene
- Table 104 Target sequences for SHOX gene
- Table 105 Target sequences for SLC6A4 gene
- the highly fragmented gDNA samples can be sequenced to detect genomic variations.
- short-read sequencing is used.
- long-read sequencing is used.
- the sample contains highly fragmented RNA samples.
- the sample contains full-length RNA transcripts.
- the long-read sequencing platform may be single molecule real time sequencing (SMRT) (e.g. Pacific Biosciences long-read sequencing technology), or a variation thereof.
- Single-molecule real-time sequencing (SMRT) is a parallelized single molecule DNA sequencing method.
- Single-molecule real-time sequencing utilizes a zero-mode waveguide (ZMW).
- a single DNA polymerase enzyme is affixed at the bottom of a ZMW with a single molecule of DNA as a template.
- the ZMW is a structure that creates an illuminated observation volume that is small enough to observe only a single nucleotide of DNA being incorporated by DNA polymerase.
- Each of the four DNA bases is attached to one of four different fluorescent dyes.
- the long-read sequencing platform may be nanopore sequencing (e.g. Oxford Nanopore long-read sequencing technology), or a variation thereof. Nanopore sequencing uses electrophoresis to transport an unknown sample through an orifice of about 10⁻⁹ meters in diameter. A nanopore system can contain an electrolytic solution; when a constant electric field is applied, an electric current can be observed in the system.
- the magnitude of the electric current density across a nanopore surface depends on the nanopore's dimensions and the composition of DNA or RNA molecule that is occupying the nanopore. Sequencing is made possible because, while traversing through the nanopore, samples cause characteristic changes in electric current density across nanopore surfaces.
- the total charge flowing through a nanopore channel is equal to the surface integral of electric current density flux across the nanopore unit normal surfaces between times t1 and t2.
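- Written out, the relation between current density, current and total charge takes the following standard form. The symbols are assumptions introduced here for illustration: J is the electric current density, S the nanopore cross-section with unit normal n̂, I(t) the resulting current, and t_1 and t_2 the two time limits.

```latex
I(t) = \iint_{S} \mathbf{J}(\mathbf{r}, t) \cdot \hat{\mathbf{n}} \, dA,
\qquad
Q = \int_{t_1}^{t_2} I(t) \, dt
  = \int_{t_1}^{t_2} \iint_{S} \mathbf{J}(\mathbf{r}, t) \cdot \hat{\mathbf{n}} \, dA \, dt
```

- The measured signal is I(t); the characteristic changes in I(t) as the molecule traverses the pore are what make base calling possible.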
- long-read sequencing requires application of the sample. In other cases, long-read sequencing does not require application of the sample.
- the systems and methods described herein can be used in clinical settings to detect and diagnose genetic diseases or disorders.
- the systems and methods described herein can be used in the detection, treatment and/or monitoring of hereditary breast cancer-related disorders by detecting genetic variations in relevant genes such as BRCA1, BRCA2, MLH1, MSH2, and STK11.
- the systems and methods described herein can be used in the detection, treatment and/or monitoring of hereditary colon cancer-related disorders by detecting genetic variations in relevant genes such as MLH1, MSH2, EPCAM, SMAD4, and STK11.
- the systems and methods described herein can be used in the detection, treatment and/or monitoring of hereditary neuroendocrine tumor disorders by detecting genetic variations in relevant genes such as SDHB, SDHC, SDHD, and VHL.
- the systems and methods described herein can be used in the detection, treatment and/or monitoring of Cowden Syndrome by detecting genetic variations in relevant genes such as PTEN.
- the systems and methods described herein can be used in the detection, treatment and/or monitoring of neuromuscular disorders such as Duchenne Muscular Dystrophy and Spinal Muscular Atrophy by detecting genetic variations in relevant genes such as DMD, SMN1, and SMN2.
- the systems and methods described herein can be used in the detection, treatment and/or monitoring of Fragile X Syndrome by detecting genetic variations in relevant genes such as FMR1.
- the systems and methods described herein can be used in the detection, treatment and/or monitoring of cardiovascular disorders such as aortic dysfunction and dilation, and cardiac ion channelopathies, by detecting genetic variations in relevant genes such as TGFBR1, TGFBR2, MYH11, COL3A1, KCNH2 and KCNQ1.
- the systems and methods described herein can be used in the detection, treatment and/or monitoring of movement disorders such as Parkinson Disease, Hereditary Ataxia, and Dystonia 5, by detecting genetic variations in relevant genes such as SNCA, PARK2, PARK7, PINK1, SCA1 (ATXN1), SCA10 (ATXN10), SCA17 (TBP), SCA2 (ATXN2), SCA3 (MJD/ATXN3), SCA6 (CACNA1A), SCA7 (ATXN7), SCA8 (ATXN8OS) and GCH1.
- the systems and methods described herein can be used in the detection, treatment and/or monitoring of renal disorders (e.g. Alport Syndrome and Polycystic Kidney Disease) by detecting genetic variations in relevant genes such as COL4A5, PKD1 and PKD2.
- the systems and methods described herein can be used in the detection, treatment and/or monitoring of adrenal disorders (e.g. Congenital Adrenal Hyperplasia) by detecting genetic variations in relevant genes such as CYP21A2.
- the systems and methods described herein can be used in the detection, treatment and/or monitoring of neurodevelopmental disorders (e.g. Rett Syndrome) by detecting genetic variations in relevant genes such as FOXG1 and MECP2.
- the systems and methods described herein can be used in the detection, treatment and/or monitoring of cerebrovascular disorders (e.g. Cerebral Cavernous Malformations) by detecting genetic variations in relevant genes such as KRIT1 and PDCD10.
- the systems and methods described herein can be used in the detection, treatment and/or monitoring of neuro-oncological disorders (e.g. Neurofibromatosis Type 1 and Neurofibromatosis Type 2) by detecting genetic variations in relevant genes such as NF1 and NF2.
- the systems and methods described herein can be used in the detection, treatment and/or monitoring of epilepsy (e.g. Unverricht-Lundborg disease) by detecting genetic variations in relevant genes such as CSTB.
- the systems and methods described herein can be used in the detection, treatment and/or monitoring of peripheral neuropathy by detecting genetic variations in relevant genes such as GJB1 and PMP22.
- a sample can be analyzed using short-read sequencing to detect SNVs and indels, and long-read sequencing to detect SVs.
- kits are described herein.
- the kit may comprise a plurality of crRNA probes disclosed herein. Further, the kit may comprise a plurality of tracrRNA molecules.
- the kit may comprise reagents that can be used to perform dA-tailing and adapter ligation. Moreover, the kit may comprise any buffer that can be used in performing needed experiments.
- the kit may comprise instructions for performing any experiments and procedures described herein.
- FIG. 5 shows an example computer system 501 that can be programmed or otherwise configured to, for example, process and/or analyze a metabolite, control addition of reagents to reaction mixtures, control partition generation, control reagent addition to partitions, provide conditions sufficient to conduct reactions, obtain and process sequencing data, output sequencing results to a user, and provide an interface for user input to control devices coupled to the computer processor.
- the computer system 501 can regulate various aspects of the present disclosure, such as, for example, fluid flow, delivery of reagents, partition generation, and modulation of reaction conditions.
- the computer system 501 can be an electronic device of a user or a computer system that is remotely located with respect to the electronic device.
- the electronic device can be a mobile electronic device.
- the computer system 501 includes a central processing unit (CPU, also “processor” and “computer processor” herein) 505, which can be a single core or multi core processor, or a plurality of processors for parallel processing.
- the computer system 501 also includes memory or memory location 510 (e.g., random-access memory, read-only memory, flash memory), electronic storage unit 515 (e.g., hard disk), communication interface 520 (e.g., network adapter) for communicating with one or more other systems, and peripheral devices 525, such as cache, other memory, data storage and/or electronic display adapters.
- the memory 510, storage unit 515, interface 520 and peripheral devices 525 are in communication with the CPU 505 through a communication bus (solid lines), such as a motherboard.
- the storage unit 515 can be a data storage unit (or data repository) for storing data.
- the computer system 501 can be operatively coupled to a computer network (“network”) 530 with the aid of the communication interface 520.
- the network 530 can be the Internet, an internet and/or extranet, or an intranet and/or extranet that is in communication with the Internet.
- the network 530 in some cases is a telecommunication and/or data network.
- the network 530 can include one or more computer servers, which can enable distributed computing, such as cloud computing.
- the network 530, in some cases with the aid of the computer system 501, can implement a peer-to-peer network, which may enable devices coupled to the computer system 501 to behave as a client or a server.
- the CPU 505 can execute a sequence of machine-readable instructions, which can be embodied in a program or software.
- the instructions may be stored in a memory location, such as the memory 510.
- the instructions can be directed to the CPU 505, which can subsequently program or otherwise configure the CPU 505 to implement methods of the present disclosure. Examples of operations performed by the CPU 505 can include fetch, decode, execute, and writeback.
- the CPU 505 can be part of a circuit, such as an integrated circuit.
- One or more other components of the system 501 can be included in the circuit.
- the circuit is an application specific integrated circuit (ASIC).
- the storage unit 515 can store files, such as drivers, libraries and saved programs.
- the storage unit 515 can store user data, e.g., user preferences and user programs.
- the computer system 501 in some cases can include one or more additional data storage units that are external to the computer system 501, such as located on a remote server that is in communication with the computer system 501 through an intranet or the Internet.
- the computer system 501 can communicate with one or more remote computer systems through the network 530.
- the computer system 501 can communicate with a remote computer system of a user (e.g., operator).
- remote computer systems include personal computers (e.g., portable PC), slate or tablet PC’s (e.g., Apple® iPad,
- the user can access the computer system 501 via the network 530.
- Methods as described herein can be implemented by way of machine (e.g., computer processor) executable code stored on an electronic storage location of the computer system 501, such as, for example, on the memory 510 or electronic storage unit 515.
- the machine executable or machine readable code can be provided in the form of software.
- the code can be executed by the processor 505.
- the code can be retrieved from the storage unit 515 and stored on the memory 510 for ready access by the processor 505.
- the electronic storage unit 515 can be precluded, and machine-executable instructions are stored on memory 510.
- the code can be pre-compiled and configured for use with a machine having a processor adapted to execute the code, or can be compiled during runtime.
- the code can be supplied in a programming language that can be selected to enable the code to execute in a pre-compiled or as-compiled fashion.
- aspects of the systems and methods provided herein can be embodied in programming.
- Various aspects of the technology may be thought of as “products” or “articles of manufacture” typically in the form of machine (or processor) executable code and/or associated data that is carried on or embodied in a type of machine readable medium.
- Machine-executable code can be stored on an electronic storage unit, such as memory (e.g., read-only memory, random-access memory, flash memory) or a hard disk.
- “Storage” type media can include any or all of the tangible memory of the computers, processors or the like, or associated modules thereof, such as various semiconductor memories, tape drives, disk drives and the like, which may provide non-transitory storage at any time for the software programming. All or portions of the software may at times be communicated through the Internet or various other telecommunication networks. Such communications, for example, may enable loading of the software from one computer or processor into another, for example, from a management server or host computer into the computer platform of an application server.
- another type of media that may bear the software elements includes optical, electrical and electromagnetic waves, such as used across physical interfaces between local devices, through wired and optical landline networks and over various air-links.
- Non-volatile storage media include, for example, optical or magnetic disks, such as any of the storage devices in any computer(s) or the like, such as may be used to implement the databases, etc. shown in the drawings.
- Volatile storage media include dynamic memory, such as main memory of such a computer platform.
- Tangible transmission media include coaxial cables; copper wire and fiber optics, including the wires that comprise a bus within a computer system.
- Carrier-wave transmission media may take the form of electric or electromagnetic signals, or acoustic or light waves such as those generated during radio frequency (RF) and infrared (IR) data communications.
- Computer-readable media therefore include, for example: a floppy disk, a flexible disk, hard disk, magnetic tape, any other magnetic medium, a CD-ROM, DVD or DVD-ROM, any other optical medium, punch cards, paper tape, any other physical storage medium with patterns of holes, a RAM, a ROM, a PROM and EPROM, a FLASH-EPROM, any other memory chip or cartridge, a carrier wave transporting data or instructions, cables or links transporting such a carrier wave, or any other medium from which a computer may read programming code and/or data. Many of these forms of computer readable media may be involved in carrying one or more sequences of one or more instructions to a processor for execution.
- the computer system 501 can include or be in communication with an electronic display 535 that comprises a user interface (UI) 540 for providing, for example, monitoring of sample preparation, monitoring of reactions and/or reaction conditions, monitoring of sequencing, results of sequencing, and permitting user inputs for sample preparation, reactions, sequencing and/or sequencing analysis.
- UIs include, without limitation, a graphical user interface (GUI) and web-based user interface.
- Methods and systems of the present disclosure can be implemented by way of one or more algorithms.
- An algorithm can be implemented by way of software upon execution by the central processing unit 505.
- the algorithm can, for example, implement sample preparation protocols, reaction protocols, sequencing protocols, data analysis protocols and system or device operation protocols.
- Devices, systems, compositions and methods of the present disclosure may be used for various applications, such as, for example, processing a single analyte (e.g., RNA, DNA, or protein) or multiple analytes (e.g., DNA and RNA, DNA and protein, RNA and protein, or RNA, DNA and protein) from a cell.
- a biological particle or analyte carrier (e.g., a cell or cell bead)
- a partition (e.g., droplet)
- multiple analytes from the biological particle or analyte carrier are processed for subsequent processing.
- the multiple analytes may be from the cell. This may enable, for example, simultaneous proteomic, transcriptomic and genomic analysis of the cell.
- An exemplary target enrichment protocol begins with preparing the Cas9 ribonucleoprotein complexes (RNPs). Prior to guide RNA assembly, all crRNAs are pooled into an equimolar mix, with a total concentration of 50-100 µM. The crRNA mix and tracrRNA are then combined such that the tracrRNA concentration and the total crRNA concentration are both 5-10 µM. The gRNA duplexes are formed by denaturation at 95 °C and then cooling to room temperature. RNPs are constructed by combining the gRNA duplexes with Cas9 nucleases and then incubating at room temperature.
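- The concentration arithmetic for the duplex-annealing mix follows from C1·V1 = C2·V2. The sketch below is illustrative only: the function names, the stock concentrations and the 10-unit reaction volume are assumptions, and all concentrations must simply be expressed in the same unit.

```python
def dilution_volume(stock_conc, final_conc, final_volume):
    """Volume of stock needed so that C1 * V1 = C2 * V2."""
    if final_conc > stock_conc:
        raise ValueError("final concentration cannot exceed stock concentration")
    return final_conc * final_volume / stock_conc


def duplex_mix(crrna_pool_conc, tracrrna_stock_conc, final_conc, final_volume):
    """Volumes of crRNA pool, tracrRNA and buffer giving equal final concentrations."""
    v_crrna = dilution_volume(crrna_pool_conc, final_conc, final_volume)
    v_tracr = dilution_volume(tracrrna_stock_conc, final_conc, final_volume)
    v_buffer = final_volume - v_crrna - v_tracr
    if v_buffer < 0:
        raise ValueError("stocks are too dilute for the requested final volume")
    return {"crRNA_pool": v_crrna, "tracrRNA": v_tracr, "buffer": v_buffer}


def per_guide_conc(pool_total_conc, n_guides):
    """Concentration of each crRNA in an equimolar pool of n_guides."""
    return pool_total_conc / n_guides


# Assumed example: pool and tracrRNA stocks at 100, target 10, volume 10
print(duplex_mix(100, 100, 10, 10))
print(per_guide_conc(100, 4))
```

- With these assumed values, one volume unit each of crRNA pool and tracrRNA is topped up with eight units of buffer, and each of four pooled crRNAs is present at a quarter of the pool's total concentration.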
- the next stage comprises dephosphorylating the genomic DNA. Between one and four genomic DNA samples can be pooled into the dephosphorylation reaction, for a total of 1-5 µg of gDNA in each dephosphorylation reaction.
- the input DNA is dephosphorylated using Calf Intestinal Phosphatase or Shrimp Alkaline Phosphatase.
- the next stage comprises cleaving and dA-tailing target DNA.
- RNPs are added to the dephosphorylated gDNA along with dATP and Taq DNA polymerase.
- the sample is then incubated at 37 °C for Cas9 cleavage followed by 72 °C for dA-tailing.
- the reaction is then cleaned up using SPRI beads.
- the next stage is barcode ligation. Barcodes are ligated to the dA-tailed ends of the gDNA using ligase.
- the reaction is incubated at room temperature and then cleaned up using SPRI beads.
- the next stage is sequencing adapter ligation and clean-up. All the barcoded DNA samples are pooled together in equimolar amounts. Sequencing adapters are ligated to the pool of barcoded DNA using ligase. The DNA is then cleaned up using SPRI beads, and then eluted in elution buffer.
- the next stage is priming and loading the Flow Cell. Libraries are prepared for sequencing by adding the following to the eluate: Sequencing Buffer, Loading Beads, and Flush Tether. The sequencing libraries are then loaded onto the flow cell for sequencing.
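- Equimolar pooling requires converting mass concentrations to molar ones. The sketch below uses the common approximation of 660 g/mol per base pair of double-stranded DNA; the sample names, concentrations and mean fragment lengths are made-up illustrative values, and the function names are hypothetical.

```python
def fmol_per_ul(ng_per_ul, mean_length_bp, g_per_mol_per_bp=660.0):
    """Convert a dsDNA mass concentration (ng/uL) to fmol/uL."""
    return ng_per_ul * 1e6 / (g_per_mol_per_bp * mean_length_bp)


def equimolar_volumes(samples, fmol_each):
    """Volume (uL) of each sample that contributes `fmol_each` to the pool.

    `samples` maps a sample name to (ng_per_ul, mean_length_bp).
    """
    return {
        name: fmol_each / fmol_per_ul(conc, length)
        for name, (conc, length) in samples.items()
    }


barcoded = {"BC01": (6.6, 10_000), "BC02": (13.2, 10_000)}
for name, volume in equimolar_volumes(barcoded, fmol_each=10).items():
    print(f"{name}: {volume:.2f} uL")
```

- Here the second sample is twice as concentrated as the first, so half the volume is needed to contribute the same number of molecules.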
- Example 2 BRCA1 crRNA probe design
- the CHOPCHOP design program yielded a total of 5567 possible crRNA probes along the entire length of the BRCA1 genomic locus. These crRNA sequences were then filtered using the filtering scheme described in [0041], reducing the number to 233 crRNA probes. The crRNA sequences were then checked using a second design checker tool, e.g. IDT CRISPR-Cas9 guide RNA design checker tool. The number of candidate crRNA probes was reduced to 86 probes. The final set of crRNA probes was chosen based upon the location of the target sites.
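- The attrition across the design stages in this example can be summarized with a short calculation. The counts are those reported above; the stage labels are paraphrases introduced here.

```python
stages = [
    ("CHOPCHOP candidates", 5567),
    ("after filtering", 233),
    ("after design check", 86),
]

# Retention at each stage relative to the stage before it
for (_, previous), (name, count) in zip(stages, stages[1:]):
    print(f"{name}: {count} probes ({100 * count / previous:.1f}% of previous stage)")

overall = 100 * stages[-1][1] / stages[0][1]
print(f"overall retention: {overall:.2f}%")
```

- Roughly 4% of the initial candidates pass the filters and about 1.5% remain after the design check, before the final selection by target-site location.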
Priority Applications (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| JP2022506030A JP2022551551A (ja) | 2019-10-11 | 2020-10-12 | ロングリードシークエンシングを用いたゲノム構造バリアントの同定 |
| US17/774,345 US20230028445A1 (en) | 2019-10-11 | 2020-10-12 | Identification of genomic structural variants using long-read sequencing |
| CN202080071378.4A CN114555824A (zh) | 2019-10-11 | 2020-10-12 | 使用长读测序鉴定基因组结构变体 |
| EP20874075.3A EP4041916A4 (en) | 2019-10-11 | 2020-10-12 | IDENTIFICATION OF GENOMIC STRUCTURAL VARIANTS USING LONG READ SEQUENCING |
Applications Claiming Priority (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US201962913886P | 2019-10-11 | 2019-10-11 | |
| US62/913,886 | 2019-10-11 | ||
| US202062981146P | 2020-02-25 | 2020-02-25 | |
| US62/981,146 | 2020-02-25 |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2021072394A1 true WO2021072394A1 (en) | 2021-04-15 |
Family
ID=75436767
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2020/055293 Ceased WO2021072394A1 (en) | 2019-10-11 | 2020-10-12 | Identification of genomic structural variants using long-read sequencing |
Country Status (5)
| Country | Link |
|---|---|
| US (1) | US20230028445A1 |
| EP (1) | EP4041916A4 |
| JP (1) | JP2022551551A |
| CN (1) | CN114555824A |
| WO (1) | WO2021072394A1 |
Cited By (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114752668A (zh) * | 2022-05-13 | 2022-07-15 | 深圳市优圣康生物科技有限公司 | Crispr、cas9靶向捕获长片段dna的贫血筛查试剂盒及其方法 |
| WO2025049355A1 (en) * | 2023-08-26 | 2025-03-06 | Duke University | Compositions comprising nanoparticles for targeted delivery and methods of using the same |
| WO2025235828A3 (en) * | 2024-05-08 | 2026-01-15 | The General Hospital Corporation | Engineered prime editors for treating genetic deafness |
Families Citing this family (2)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN115232866A (zh) * | 2022-08-08 | 2022-10-25 | 南方医科大学皮肤病医院(广东省皮肤病医院、广东省皮肤性病防治中心、中国麻风防治研究中心) | 一种基于纳米孔测序靶向富集细菌16S rRNA基因的测序方法 |
| CN119662814B (zh) * | 2025-02-20 | 2025-06-27 | 北京贝瑞和康生物技术有限公司 | 检测戈谢病gba1基因及其假基因gbap1多种突变的引物组、试剂盒及其应用 |
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| WO2016028887A1 (en) * | 2014-08-19 | 2016-02-25 | Pacific Biosciences Of California, Inc. | Compositions and methods for enrichment of nucleic acids |
Family Cites Families (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| BR112017008877A2 (pt) * | 2014-10-29 | 2018-07-03 | 10X Genomics Inc | métodos e composições para sequenciamento de ácido nucleico-alvo |
| CN107250445A (zh) * | 2015-02-27 | 2017-10-13 | 富鲁达公司 | 用于高通量研究的单个细胞核酸 |
| KR102888521B1 (ko) * | 2015-04-06 | 2025-11-19 | 더 보드 어브 트러스티스 어브 더 리랜드 스탠포드 주니어 유니버시티 | Crispr/cas-매개 유전자 조절을 위한 화학적으로 변형된 가이드 rna |
-
2020
- 2020-10-12 CN CN202080071378.4A patent/CN114555824A/zh active Pending
- 2020-10-12 WO PCT/US2020/055293 patent/WO2021072394A1/en not_active Ceased
- 2020-10-12 EP EP20874075.3A patent/EP4041916A4/en not_active Withdrawn
- 2020-10-12 JP JP2022506030A patent/JP2022551551A/ja active Pending
- 2020-10-12 US US17/774,345 patent/US20230028445A1/en not_active Abandoned
Non-Patent Citations (4)
| Title |
|---|
| ANONYMOUS: " Instructions", CHOPCHOP, pages 1 - 5, XP055904354, Retrieved from the Internet <URL:https://web.archive.org/web/20190811181724/https://chopchop.cbu.uib.no/instructions> [retrieved on 20220323] * |
| GILPATRICK, T. ET AL.: "Targeted Nanopore Sequencing with Cas9 for studies of methylation, structural variants, and mutations", BIORXIV, 4 June 2019 (2019-06-04), XP055816504, DOI: https://doi.org/10.1101/604173 * |
| LIU, X. ET AL.: "Sequence features associated with the cleavage efficiency of CRISPR/Cas9 system", SCIENTIFIC REPORTS, vol. 6, no. 19675, 2016, XP055543782 * |
| LABUN, K. ET AL.: "CHOPCHOP v3: expanding the CRISPR web toolbox beyond genome editing", NUCLEIC ACIDS RESEARCH, vol. 47, 20 May 2019 (2019-05-20), pages W171 - W174, XP055647063 * |
Cited By (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114752668A (zh) * | 2022-05-13 | 2022-07-15 | 深圳市优圣康生物科技有限公司 | Crispr、cas9靶向捕获长片段dna的贫血筛查试剂盒及其方法 |
| CN114752668B (zh) * | 2022-05-13 | 2025-08-01 | 深圳市优圣康生物科技有限公司 | Crispr、cas9靶向捕获长片段dna的贫血筛查试剂盒及其方法 |
| WO2025049355A1 (en) * | 2023-08-26 | 2025-03-06 | Duke University | Compositions comprising nanoparticles for targeted delivery and methods of using the same |
| WO2025235828A3 (en) * | 2024-05-08 | 2026-01-15 | The General Hospital Corporation | Engineered prime editors for treating genetic deafness |
Also Published As
| Publication number | Publication date |
|---|---|
| EP4041916A1 (en) | 2022-08-17 |
| CN114555824A (zh) | 2022-05-27 |
| US20230028445A1 (en) | 2023-01-26 |
| EP4041916A4 (en) | 2023-11-01 |
| JP2022551551A (ja) | 2022-12-12 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US20240327901A1 (en) | Methods of Generating Libraries of Nucleic Acid Sequences for Detection via Flourescent in Situ Sequ | |
| US20230028445A1 (en) | Identification of genomic structural variants using long-read sequencing | |
| EP3469079B1 (en) | Enrichment of mutated cell free nucleic acids for cancer detection | |
| Clark et al. | Recurrent somatic mutations in POLR2A define a distinct subset of meningiomas | |
| Macaulay et al. | G&T-seq: parallel sequencing of single-cell genomes and transcriptomes | |
| Tsai et al. | Amplification-free, CRISPR-Cas9 targeted enrichment and SMRT sequencing of repeat-expansion disease causative genomic regions | |
| Langevin et al. | Peregrine: a rapid and unbiased method to produce strand-specific RNA-Seq libraries from small quantities of starting material | |
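| CN118638898A (zh) | Methods for targeted nucleic acid sequence enrichment and applications in error-corrected nucleic acid sequencing | |
| JP2022505050A (ja) | Methods and reagents for efficient genotyping of large numbers of samples via pooling | |
| JP2021530219A (ja) | Methods and reagents for characterizing genome editing, clonal expansion, and related applications | |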
| Haile et al. | Evaluation of protocols for rRNA depletion-based RNA sequencing of nanogram inputs of mammalian total RNA | |
| US20160098516A1 (en) | Methods and systems for detection of a genetic mutation | |
| JP2024056984A (ja) | Methods, compositions and systems for calibrating epigenetic partitioning assays | |
| JP2026035733A (ja) | Methods, compositions and systems for improving the binding of methylated polynucleotides | |
| Jaksik et al. | RNA-seq library preparation for comprehensive transcriptome analysis in cancer cells: the impact of insert size | |
| JP2022514010A (ja) | Methods, compositions, and systems for improving the recovery of nucleic acid molecules | |
| US20240233871A9 (en) | Methods for the non-invasive detection and monitoring of therapeutic nucleic acid constructs | |
| Do et al. | Transcriptome analysis of non‐coding RNAs in livestock species: elucidating the ambiguity | |
| US12168801B1 (en) | Hybrid/capture probe designs for full-length cDNA | |
| HK40070564A (en) | Identification of genomic structural variants using long-read sequencing | |
| Nachmanson et al. | Targeted genome fragmentation with CRISPR/Cas9 improves hybridization capture, reduces PCR bias, and enables efficient high-accuracy sequencing of small targets | |
| US20240425930A1 (en) | Methods for selective sequencing of cancer dna | |
| US20230144221A1 (en) | Methods and systems for detecting alternative splicing in sequencing data |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20874075; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2022506030; Country of ref document: JP; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | ENP | Entry into the national phase | Ref document number: 2020874075; Country of ref document: EP; Effective date: 20220511 |