WO2024037449A1 - Method and kit for high-throughput construction of RNA sequencing libraries - Google Patents

Method and kit for high-throughput construction of RNA sequencing libraries

Info

Publication number
WO2024037449A1
Authority
WO
WIPO (PCT)
Prior art keywords
sequence
probe
sequencing
gene
cells
Prior art date
Application number
PCT/CN2023/112554
Inventor
邵伟
肖梅
陈军
王栋
许俊泉
Original Assignee
格物致和生物科技(北京)有限公司
格物智造科技(成都)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 格物致和生物科技(北京)有限公司 and 格物智造科技(成都)有限公司
Publication of WO2024037449A1

Classifications

    • C: CHEMISTRY; METALLURGY
      • C40: COMBINATORIAL TECHNOLOGY
        • C40B: COMBINATORIAL CHEMISTRY; LIBRARIES, e.g. CHEMICAL LIBRARIES
          • C40B50/00: Methods of creating libraries, e.g. combinatorial synthesis
            • C40B50/06: Biochemical methods, e.g. using enzymes or whole viable microorganisms
          • C40B40/00: Libraries per se, e.g. arrays, mixtures
            • C40B40/04: Libraries containing only organic compounds
              • C40B40/06: Libraries containing nucleotides or polynucleotides, or derivatives thereof
      • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
        • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
          • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
            • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
              • C12Q1/6806: Preparing nucleic acids for analysis, e.g. for polymerase chain reaction [PCR] assay
              • C12Q1/6844: Nucleic acid amplification reactions
                • C12Q1/6851: Quantitative amplification
              • C12Q1/6869: Methods for sequencing
          • C12Q2600/00: Oligonucleotides characterized by their use
            • C12Q2600/106: Pharmacogenomics, i.e. genetic variability in individual responses to drugs and drug metabolism
            • C12Q2600/158: Expression markers
            • C12Q2600/178: Oligonucleotides characterized by their use miRNA, siRNA or ncRNA
    • G: PHYSICS
      • G01: MEASURING; TESTING
        • G01N: INVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
          • G01N33/00: Investigating or analysing materials by specific methods not covered by groups G01N1/00 - G01N31/00
            • G01N33/48: Biological material, e.g. blood, urine; Haemocytometers
              • G01N33/50: Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing
                • G01N33/5005: Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving human or animal cells
                  • G01N33/5008: Chemical analysis of biological material, e.g. blood, urine; Testing involving biospecific ligand binding methods; Immunological testing involving human or animal cells for testing or evaluating the effect of chemical or biological compounds, e.g. drugs, cosmetics

Definitions

  • The present disclosure belongs to the field of molecular biology, in particular targeted RNA sequencing.
  • the present disclosure relates to methods, gene probe combinations and kits for constructing sequencing libraries.
  • RNA is the central link in gene expression and participates in many functional processes related to it. Identifying and quantifying gene expression is important for understanding the normal physiological functions of organisms and the pathological processes of diseases. There are many methods for identifying RNA and quantifying gene expression. Traditionally, they are mainly based on fluorescence quantitative PCR, gene expression chips and conventional RNA-seq (RNA sequencing). In recent years, new methods have been developed, such as NanoString digital single-molecule gene expression profiling. These technological developments have expanded the range of application scenarios.
  • RNA library construction methods include: 1) total RNA extraction; 2) mRNA isolation and fragmentation; 3) cDNA first-strand synthesis; 4) cDNA second-strand synthesis; 5) end repair of double-stranded cDNA; 6) adapter ligation; 7) PCR enrichment of the library.
  • Targeted RNA capture is an accurate and efficient RNA enrichment and detection method. Probes can be designed in batches for genes or transcripts of interest. It is an ideal and effective method for detecting gene expression, gene fusions, splicing variation, single-nucleotide variation, insertions/deletions, and the like.
  • NanoString's digital gene expression profiling technology designs molecular hybridization probes for each mRNA molecule (the length of the probe is approximately 35-50nt). Paired probes are designed to target unique regions of each gene or transcript.
  • the 5' end of the reporter probe contains a fluorescent molecular barcode label, and the 3' end of the capture probe contains a biotin label. After hybridization between the reporter probe and the mRNA, a complex is formed. After separation and purification, the hybridization product is fixed on the sample plate.
  • the digital analyzer then identifies, scans, and counts the fluorescent molecular barcode of each sample to analyze the gene expression level in the sample.
  • US patent US9938566 reports a method for detecting gene expression based on probe targeted capture.
  • the characteristic of this method is that a pair of probes is designed for the detection gene, and the probe contains a gene-matching region and a universal linker region.
  • the probes are matched with the target template, and the ligase connects the paired matched probes into complete fragments, which are then amplified by PCR in the universal linker region to analyze the gene expression level in the sample.
  • This method successfully detected gene expression and gene mutations in samples.
  • The problems with this method are that the detection throughput is low and that the ligated probes are not corrected after PCR signal amplification, so the accuracy with which gene expression is characterized is limited.
  • In drug treatment experiments on trace amounts of primary cells, cultured cells, tumor tissue cells or organoid cells, 384-well or 96-well microplates are usually used, and the number of cells in each well ranges from a few hundred to tens of thousands. At these cell numbers, the amount of RNA that can be extracted by conventional methods is small and insufficient for subsequent large-scale gene expression detection. In addition, the isolation and purification of mRNA further increases the loss of mRNA, especially low-abundance mRNA, resulting in quantitative deviations in gene expression. At the same time, technologies such as fluorescence quantitative PCR, gene expression chips and conventional RNA-seq require the conversion of RNA into cDNA, which further affects the accuracy of experiments.
  • NanoString technology can be applied to micro-volume cell lysates, but the six fluorescent molecular barcodes of NanoString reporter probes distinguish the probes through the arrangement and combination of bead colors. Due to limitations of subsequent detection instruments and chips, NanoString can only detect 12 samples per experiment, and each sample detects the mRNA of 800 genes, resulting in limitations in NanoString's detection throughput and number of detected genes.
  • the present inventor provides a library construction method for trace amounts of complex samples.
  • RNA expression levels can be successfully analyzed from nanogram (ng) level RNA in the lysate of hundreds of cells. In theory, it can even analyze the expression level of the entire genome RNA. Comparative experiments with fluorescence quantitative PCR show that accurate quantification can be performed using the library construction method of the present disclosure.
  • the present disclosure provides a method for constructing a sequencing library, including the following steps: 1) providing a sample solution containing mRNA; 2) enriching the mRNA in the sample solution with magnetic beads; 3) adding at least one gene probe combination, each of which includes a first probe sequence and a second probe sequence; 4) adding a blocking sequence; 5) annealing so that the gene probe combination hybridizes to the mRNA; 6) adding nucleic acid ligase to connect the first probe sequence and the second probe sequence in each gene probe combination hybridized to the mRNA, forming a nucleic acid molecule; 7) adding elution buffer to separate the nucleic acid molecule from the mRNA; 8) using the nucleic acid molecule as a template, extending the blocking sequence under the action of DNA polymerase to complement the template into a DNA double strand; 9) using the DNA double strand as a template, performing PCR amplification to construct a sequencing library; wherein the first probe sequence targets the 5'
  • the sample solution is a cell lysate, and the cells are selected from primary cells, cultured cells, tumor tissue cells and organoid cells.
  • the sample solution is a trace cell lysate. In one embodiment, the trace cell lysate is a lysate of hundreds (e.g., 100 to 900) of cells.
  • the trace cell lysate is a lysate of a single cell.
  • the magnetic beads are oligo-dT magnetic beads or streptavidin magnetic beads.
  • the magnetic beads are in excess relative to the sample solution.
  • steps 2) to 4) are performed in any order or simultaneously.
  • the annealing is performed at a temperature of 37°C to 45°C.
  • the method of constructing a sequencing library further includes the steps of using a nucleic acid ligase buffer to suspend the magnetic beads and using a magnetic stand to adsorb the magnetic beads after step 5) and before step 6).
  • the method of constructing a sequencing library further includes the steps of using a washing buffer to suspend the magnetic beads and using a magnetic stand to adsorb the magnetic beads after step 6) and before step 7).
  • the nucleic acid ligase is a ligase with the ability to catalyze single-strand ligation in hybrid strands, such as T4 DNA ligase or SplintR ligase.
  • in step 9), PCR amplification is performed using a pair of primers that are respectively complementary to the 3' end universal adapter sequence and the 5' end universal adapter sequence, and at least one (preferably both) of the primers contains an index sequence.
  • the length of the 5' end specific complementary sequence or the 3' end specific complementary sequence is 20-25 bp.
  • the first probe sequence, the second probe sequence and/or the blocking sequence are single-stranded nucleotides containing natural nucleotides or modified nucleotides.
  • the length of the region blocked by the blocking sequence is less than or equal to the full length of the 3' end universal linker sequence.
  • the UMI molecule sequence is a random sequence of 4 bp or more (preferably 4-6 bp).
  • the extension in step 8) is performed by PCR.
  • the library construction method of the present disclosure directly uses magnetic beads to target and capture mRNA from lysed cells for subsequent library construction. There is no need for routine isolation, extraction and purification of RNA, nor for reverse transcription of the mRNA into cDNA; the target RNA is targeted directly.
  • the library construction method of the present disclosure can effectively build a library from the lysate of trace amounts of cells (as few as 400 cells, or even a single cell) to characterize RNA expression in the sample, and is very suitable for large-scale drug screening and evaluation.
  • the disclosed library construction method can be combined with sequencing methods such as second-generation sequencing to achieve ultra-multiplexed digital gene expression detection. It can simultaneously detect the expression of hundreds or thousands of genes (even the entire genome), and detect the expression levels of genes in samples specifically and without bias.
  • the disclosed library construction method has a simple process and a short turnaround time, and is suitable for use with automated instruments to achieve automated library construction. Taking a 384-head automated workstation as an example, one instrument is calculated to process four 384-well plates per day, that is, to complete library construction for 1536 samples, which greatly improves the efficiency of library construction and reduces its cost.
  • the present disclosure provides a sequencing library constructed by the method according to the first aspect.
  • the present disclosure provides a gene probe combination, which includes a first probe sequence and a second probe sequence.
  • the first probe sequence includes a 3' end universal linker sequence, a UMI molecule sequence and a 5' end-specific complementary sequence;
  • the second probe sequence includes a 3' end-specific complementary sequence, a UMI molecule sequence and a 5' end universal linker sequence;
  • the first probe sequence also includes a blocking sequence, which is bound to the 3' end universal adapter sequence.
  • the length of the 5' end specific complementary sequence or the 3' end specific complementary sequence is 20-25 bp.
  • the first probe sequence, the second probe sequence and/or the blocking sequence are single-stranded nucleotides containing natural nucleotides or modified nucleotides.
  • the length of the region blocked by the blocking sequence is less than or equal to the full length of the 3' end universal linker sequence or the 5' end universal linker sequence.
  • the UMI molecule sequence is a random sequence of 4 bp or more (preferably 4-6 bp).
  • in conventional practice, a UMI molecule sequence is usually about 10 bp long.
  • the UMI molecule sequences used in the methods of the present disclosure are shorter and synthesized directly into the probe sequence.
  • the universal adapter sequence (including the 3' end universal adapter sequence and the 5' end universal adapter sequence) is compatible with sequencers, including but not limited to Illumina sequencers (such as the Illumina Novaseq6000), MGI sequencers (such as the MGI DNBSEQ-T7) and Thermo Fisher Scientific sequencers (such as the Thermo Fisher Scientific Ion S5).
  • the blocking sequence is the reverse complement of the universal linker sequence.
  • the present disclosure provides a kit comprising the gene probe combination according to the third aspect.
  • the kit further includes a pair of primers that are respectively complementary to the 3' end universal linker sequence and the 5' end universal linker sequence, and at least one (preferably both) of the primers contains an index sequence.
  • the probe structure of the present disclosure includes: specific complementary sequence, UMI molecule sequence and universal linker sequence.
  • the UMI molecule sequence (located between the universal linker sequence and the specific complementary sequence) is directly synthesized into the probe structure, which can effectively eliminate the bias caused by PCR amplification, make gene expression analysis more accurate, and increase data reliability. Blocking sequences reduce probe-to-probe interference, increase the proportion of target library bands, and enable multiplexed gene detection.
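The probe structure described above can be illustrated with a minimal Python sketch. The 5'-to-3' layout of each probe, the adapter strings, and all function names here are assumptions made for illustration; they are not sequences or code from the disclosure. The only relationships taken from the text are that the UMI sits between the universal linker and the specific complementary sequence, and that the blocking sequence is the reverse complement of the 3' end universal linker.

```python
import random
from dataclasses import dataclass

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def random_umi(length: int, rng: random.Random) -> str:
    """Draw a random UMI of the given length (the text prefers 4-6 bases)."""
    return "".join(rng.choice("ACGT") for _ in range(length))


@dataclass
class ProbePair:
    first: str     # assumed layout: 5'-[specific]-[UMI]-[3' universal adapter]-3'
    second: str    # assumed layout: 5'-[5' universal adapter]-[UMI]-[specific]-3'
    blocking: str  # reverse complement of the 3' universal adapter


def build_probe_pair(first_specific, second_specific, adapter_5p, adapter_3p,
                     umi_len=4, rng=None):
    """Assemble both probes and the blocking sequence from their parts."""
    rng = rng or random.Random()
    first = first_specific + random_umi(umi_len, rng) + adapter_3p
    second = adapter_5p + random_umi(umi_len, rng) + second_specific
    return ProbePair(first, second, reverse_complement(adapter_3p))


def ligate(pair: ProbePair) -> str:
    """Join the 3' end of the second probe to the 5' end of the first probe."""
    return pair.second + pair.first
```

Under these assumptions the ligated molecule carries a universal adapter at each end and one UMI on each side of the joined target-specific region, so the blocking sequence can anneal to its 3' end and be extended by a polymerase.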
  • the gene probe combination of the present disclosure and the kit containing it can be used to construct second-generation sequencing libraries, enabling fast, accurate, efficient and low-cost detection of gene expression in trace amounts of primary cells, cultured cells or organoid cell systems.
  • the present disclosure provides applications of the gene probe combination according to the third aspect or the kit according to the fourth aspect in quantitative PCR or library construction.
  • the quantitative PCR is fluorescence quantitative PCR.
  • the present disclosure provides a method for measuring mRNA content in a sample, including: constructing a sequencing library according to the method described in the first aspect, and then quantifying the mRNA content in the sample by second-generation sequencing, third-generation sequencing, fluorescence spectroscopy or quantitative PCR.
  • the present disclosure provides a sequencing method, including: constructing a sequencing library by the method according to the first aspect, and then performing sequencing using the sequencing library.
  • the sequencing method is a sequencing method for non-diagnostic purposes, eg, for research purposes.
  • the library construction method of the present disclosure can be used to construct high-throughput libraries, to sequence and analyze the effects of drugs or compounds on cellular gene expression, and to analyze the biological functions of drugs or compounds.
  • the present disclosure provides a drug screening or drug evaluation method, including: inoculating cells; treating the cells with candidate drugs; constructing a sequencing library by the method according to the first aspect; performing sequencing using the sequencing library; and constructing a gene expression profile of the candidate drugs.
  • cells are seeded into a 384 microwell plate.
  • sequencing is performed by second-generation sequencing or third-generation sequencing.
  • a gene expression profile of the candidate drug is constructed through bioinformatics analysis to obtain the gene expression pattern after treatment with the candidate drug.
  • Using gene expression profiles to screen compounds or drugs is a new and unique high-throughput drug screening model that can reveal the correlations among genes, diseases and drugs, accelerate the screening of disease-related candidate compounds, accelerate the study of drug mechanisms of action, and uncover new uses for existing drugs. Further integrating the data into a database of target gene expression associated with disease-related active compounds or drugs will greatly promote drug screening and drug evaluation.
  • Figure 1 shows a schematic diagram of the library construction method of the present disclosure. It schematically shows that, under the action of nucleic acid ligase, the first probe sequence hybridized to mRNA (with a blocking sequence bound to its 3' end universal adapter sequence) and the second probe sequence are ligated to form a nucleic acid molecule (that is, step 6) of the disclosed method).
  • Figure 2 shows the expression of the GAPDH gene using matching probes of different lengths targeting the hGAPDH gene.
  • Figure 3 shows the expression of ACTB gene using matching probes of different lengths for hACTB gene.
  • Figure 4 shows the fragment size distribution diagram of the library construction product without adding blocking sequence.
  • Figure 5 shows the fragment size distribution diagram of library construction products with added blocking sequences.
  • Figure 6 shows a comparison chart of the amount of library construction products with different starting amounts of library construction.
  • Figure 7 shows a correlation analysis diagram of gene expression amounts (UMI-count) with UMI linkers between Sample 1 and Sample 2.
  • Figure 8 shows the correlation analysis diagram of gene expression levels (count) between sample 1 and sample 2 without UMI linker.
  • Figure 9 shows the correlation analysis diagram of the detection results of qPCR and this method.
  • nucleic acid or polynucleotide sequences listed herein are in single-stranded form and are oriented from 5' to 3', left to right.
  • the nucleotides provided herein are in the format recommended by the IUPAC-IUB Commission on Biochemical Nomenclature.
  • "polynucleotide" is a synonym for "nucleic acid" and refers to a polymeric form of nucleotides of any length, including deoxyribonucleotides or ribonucleotides, mixed sequences or analogs thereof. Polynucleotides may include modified nucleotides, such as methylated or capped nucleotides and nucleotide analogs.
  • a “specifically complementary” sequence refers to a base sequence that matches a target nucleic acid.
  • UMI (Unique Molecular Identifier): a molecular barcode, also known as a molecular tag. Its purpose is to quantify the number of starting molecules more accurately and to reduce the non-uniformity caused by PCR amplification.
  • Molecular barcodes usually consist of random sequences of about 10 nt (such as NNNNNNN), or degenerate bases (NNNRNYN).
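To make the difference between fully random and degenerate barcodes concrete, the following Python sketch counts how many distinct sequences a barcode pattern can encode and checks whether a read matches a pattern. The function names are illustrative assumptions; the base codes are the standard IUPAC nucleotide codes.

```python
# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def umi_diversity(pattern: str) -> int:
    """Number of distinct sequences that a degenerate pattern can encode."""
    total = 1
    for code in pattern:
        total *= len(IUPAC[code])
    return total


def matches(sequence: str, pattern: str) -> bool:
    """True if the sequence is one of the sequences encoded by the pattern."""
    return len(sequence) == len(pattern) and all(
        base in IUPAC[code] for base, code in zip(sequence, pattern)
    )
```

For example, a fully random 4-base UMI gives 4^4 = 256 possible tags and a 6-base UMI gives 4^6 = 4096, while the degenerate pattern NNNRNYN encodes 4^5 x 2 x 2 = 4096 tags.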
  • Index sequence is a molecular sequence added to the DNA fragment during the PCR amplification stage of each sample in order to achieve simultaneous sequencing of multiple samples. It is used as a sample label for sequence splitting.
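Sequence splitting by index can be sketched as a lookup from index sequence to sample name. This is a simplified illustration that requires an exact index match; the index strings and sample names are hypothetical, and real demultiplexing tools usually also tolerate sequencing errors in the index.

```python
from collections import defaultdict


def demultiplex(reads, index_to_sample):
    """Group reads by sample using the index sequence attached to each read.

    reads: iterable of (index_sequence, insert_sequence) tuples.
    index_to_sample: dict mapping an index sequence to a sample name.
    Reads whose index is not in the table are collected under "undetermined".
    """
    by_sample = defaultdict(list)
    for index_seq, insert_seq in reads:
        sample = index_to_sample.get(index_seq, "undetermined")
        by_sample[sample].append(insert_seq)
    return dict(by_sample)
```

Because each well receives its own index during PCR, many wells (for example the 384 wells of a plate) can be pooled into one sequencing run and separated again afterwards.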
  • the present disclosure relates to a method of constructing a nucleic acid library, comprising the following steps:
  • using nucleic acid ligase buffer, together with the magnetic stand, to replace the washing buffer;
  • adding nucleic acid ligase to connect the probes in the gene probe combination that hybridize to the same nucleic acid template, forming nucleic acid molecules;
  • separating the ligated probe from the template;
  • the blocking sequence is extended from the 5' end to the 3' end under the action of DNA polymerase, and the template is complemented into a DNA double strand;
  • sequencing double-end universal adapter primer pairs are identical or complementary to the 3'-end universal adapter sequence and the 5'-end universal adapter sequence respectively.
  • the primer contains the index sequence used for sequence splitting;
  • the present disclosure relates to a method for detecting nucleic acid expression, comprising the following steps:
  • using nucleic acid ligase buffer, together with the magnetic stand, to replace the washing buffer;
  • adding nucleic acid ligase to connect the probes in the gene probe combination that hybridize to the same nucleic acid template, forming nucleic acid molecules;
  • the blocking sequence is extended from the 5' end to the 3' end under the action of DNA polymerase, and the template is complemented into a DNA double strand;
  • using PCR primers that match the universal adapter sequence at the 3' end of the first probe sequence and the universal adapter sequence at the 5' end of the second probe sequence to perform PCR amplification and obtain a PCR product;
  • the nucleic acid template-containing solution is a cell lysate, such as a lysate of primary cells, cultured cells, tumor tissue cells or organoid cells.
  • the nucleic acid template is a ribonucleic acid (RNA) template, such as extracted RNA, RNA released by lysis of cells, tissues or FFPE samples, or RNA expressed from exogenous genes in cells.
  • RNA is bound to oligo-dT magnetic beads via a polyA tail or to streptavidin beads via oligo-dT-biotin.
  • the universal adapter sequence is compatible with the sequencer.
  • sequencers include but are not limited to Illumina Novaseq6000 sequencer, MGI DNBSEQ-T7 sequencer, Thermo Fisher Scientific Ion S5 sequencer.
  • the hybridization buffer contains a high concentration of salt, such as NaCl at a concentration of 500mM to 1M.
  • the nucleic acid ligase is a ligase with the ability to catalyze single-strand ligation in hybrid strands, such as T4 DNA ligase, SplintR ligase, etc.
  • the elution buffer contains a low concentration of salt, such as NaCl at a concentration of 100mM to 500mM, which can effectively dissociate the probe from the nucleic acid template and release it into the liquid.
  • the universal adapter sequence is the Truseq adapter sequence of the Illumina platform.
  • the sequencing double-end universal adapter primer pairs are Illumina P5 (containing i5 index) + read1 sequence and Illumina P7 (containing i7 index) + read2 sequence respectively.
  • PCR amplification is performed using a high-fidelity polymerase, such as Pfu enzyme, KOD enzyme, KAPA high-fidelity enzyme, etc.
  • purification of PCR products is performed using magnetic beads to remove primer-dimer bands.
  • the universal linker sequence at the 3' end of the first probe sequence is previously blocked by a blocking sequence.
  • the disclosed method breaks through the limitation of pre-isolation and purification of nucleic acids (such as RNA) in conventional methods. Instead, cells are directly lysed to obtain cell lysate, and then the designed targeting probe is directly mixed with the cell lysate for incubation and hybridization. Each pair of probes is uniquely designed with three parts: the complementary sequence of the targeting nucleic acid (specific complementary sequence), the UMI molecule sequence, and the universal linker sequence. Each nucleic acid-specific targeted quantitation is achieved by at least one pair of probes via complementary sequences and UMI molecule sequences.
  • This method can quickly and efficiently label each nucleic acid molecule with a unique sequence tag without the need for PCR or ligase connection processes, increasing quantitative accuracy. It is especially suitable for lysate systems of trace amounts of primary cells, cultured cells, tumor tissue cells or organoid cells.
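How a unique sequence tag improves quantitative accuracy can be shown with a small Python sketch that compares raw read counts with UMI counts. This is a simplified illustration rather than the analysis pipeline of the disclosure: it assumes each read has already been assigned to a gene and that its UMI (for instance the UMIs from both probes joined together) has been extracted; the function name and example data are hypothetical.

```python
from collections import defaultdict


def count_reads_and_umis(reads):
    """Return (read counts, UMI counts) per gene.

    reads: iterable of (gene, umi) tuples, one per sequenced read.
    Reads sharing the same gene and UMI are treated as PCR copies of a
    single ligated probe molecule, so they add only one to the UMI count.
    """
    read_counts = defaultdict(int)
    umis_per_gene = defaultdict(set)
    for gene, umi in reads:
        read_counts[gene] += 1
        umis_per_gene[gene].add(umi)
    umi_counts = {gene: len(umis) for gene, umis in umis_per_gene.items()}
    return dict(read_counts), umi_counts
```

If one molecule is amplified more efficiently than another, its read count grows while its UMI count stays at one, which is why a UMI count is less sensitive to PCR amplification bias than a plain read count.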
  • One aspect of the present disclosure relates to a sequencing method, including: constructing a sequencing library by the method of the present disclosure, and then using the sequencing library to perform sequencing.
  • breast cancer recurrence and metastasis are the main causes of death in breast cancer patients, and patients with recurrence and metastasis generally have a poor prognosis.
  • the multi-gene expression profile of breast cancer tumor tissues can provide guidance for breast cancer prognosis evaluation and efficacy prediction.
  • commonly used multi-gene expression profiles include the 21-gene expression recurrence risk assessment, the MammaPrint 70-gene test, the PAM50 gene test, etc., for prognostic evaluation of early breast cancer patients within 5 years.
  • the disclosed sequencing method can perform targeted capture and detection of more than 100 genes related to breast cancer recurrence and metastasis, including Ki-67, STK15, Survivin, Cyclin B1, MYBL2, GRB7, HER2, ER, PR, Bcl-2 and SCUBE2, obtain gene expression profile information, and build a gene expression-related scoring system to distinguish groups at high and low risk of recurrence and metastasis within five years, providing more precise clinical medical advice.
  • One aspect of the present disclosure relates to a drug screening or drug evaluation method.
  • second-generation sequencing is performed to generate expression information of each target candidate gene for each sample; the generated gene expression information is divided into positively regulated and negatively regulated gene groups, and a differential expression profile of characteristic genes is constructed.
  • the core algorithm is used to calculate the similarity of gene expression patterns and give a corresponding score (a value between -1 and 1). The closer the score is to 1, the more positively correlated the drug molecules used to treat the different samples are, indicating that they have similar cellular effects. Conversely, the closer the score is to -1, the more negatively correlated the drug molecules are, indicating that they have antagonistic cellular effects.
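The text does not specify the core algorithm. As one possible stand-in that also yields a score between -1 and 1, the Python sketch below uses the Pearson correlation between two expression-change profiles (for example, log2 fold changes of the same gene panel after two different treatments). The choice of Pearson correlation, the function name and the input format are all assumptions made for illustration.

```python
import math


def expression_similarity(profile_a, profile_b):
    """Pearson correlation between two equally ordered expression profiles.

    Each profile is a list of per-gene values (e.g. log2 fold change versus
    an untreated control), with genes in the same order in both lists.
    """
    if len(profile_a) != len(profile_b) or len(profile_a) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    n = len(profile_a)
    mean_a = sum(profile_a) / n
    mean_b = sum(profile_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(profile_a, profile_b))
    var_a = sum((a - mean_a) ** 2 for a in profile_a)
    var_b = sum((b - mean_b) ** 2 for b in profile_b)
    if var_a == 0 or var_b == 0:
        raise ValueError("a profile with no variation has no defined correlation")
    return cov / math.sqrt(var_a * var_b)
```

With this stand-in, two treatments that shift the same genes in the same direction score close to 1, and treatments that shift them in opposite directions score close to -1, matching the interpretation given above.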
  • Probe sequence (as shown in Table 1, the matching length of the first probe sequence (D) and the second probe sequence (A) with the template is 20-25 bp). Probes of the same matching length targeting the same gene are paired as a probe pair.
  • the underlined part represents the universal linker sequence, the bolded part represents the specific complementary sequence, and the UMI molecule sequence is in the middle.
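The way a probe pair matches its template can be sketched in Python by deriving the two specific complementary sequences from adjacent windows of a target mRNA. The orientation is an assumption made for illustration: the first probe is taken to match the window on the 5' side of the ligation junction and the second probe the window on the 3' side, and the function names are hypothetical. Only the 20-25 base matching length comes from the text.

```python
# Map each RNA (or DNA) base to its complementary DNA base.
_TO_DNA_COMPLEMENT = str.maketrans("ACGUT", "TGCAA")


def dna_reverse_complement(rna: str) -> str:
    """DNA sequence, written 5' to 3', that base-pairs with the given RNA."""
    return rna.translate(_TO_DNA_COMPLEMENT)[::-1]


def probe_specific_sequences(mrna: str, junction: int, match_len: int = 25):
    """Return (first_specific, second_specific) for one probe pair.

    mrna: target sequence written 5' to 3'.
    junction: index in mrna where the two probes meet after hybridization.
    match_len: bases matched by each probe (20-25 in the text).
    """
    if not 20 <= match_len <= 25:
        raise ValueError("match_len is expected to be 20-25")
    if junction - match_len < 0 or junction + match_len > len(mrna):
        raise ValueError("target windows fall outside the mRNA sequence")
    upstream = mrna[junction - match_len:junction]    # 5' side of the junction
    downstream = mrna[junction:junction + match_len]  # 3' side of the junction
    return dna_reverse_complement(upstream), dna_reverse_complement(downstream)
```

Under this assumed orientation, joining the second specific sequence to the first one gives the reverse complement of the whole 40-50 base target window, which is the condition for the two probes to sit side by side on the mRNA so that a ligase can join them.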
  • Take 0.5 μg of RNA and add 100 pmol of probe pair, 2 pmol of oligo-dT-biotin, and 5 μl of streptavidin magnetic beads.
  • Capture the magnetic beads on the magnetic stand and discard the supernatant. Resuspend the beads in wash buffer and capture them on the magnetic stand again; repeat twice. This bead purification removes excess unpaired probes.
  • the ligation product is complemented into double-stranded DNA by DNA polymerase.
  • probe pairs (first probe sequence and second probe sequence) targeting multiple different genes were designed, and a blocking sequence was designed for the universal adapter sequence at the 3' end of the first probe sequence.
  • One set of experiments used the first probe sequence and the second probe sequence to construct the library. The other set first blocked the 3'-end universal adapter of the first probe sequence with the blocking sequence and then constructed the library with the second probe sequence. All other experimental materials, reagents, and library construction procedures were identical between the two sets.
  • probe sequences and blocking sequences used are shown in Table 3 below.
  • the experimental materials, procedures, etc. are the same as those in Example 1.
  • PCR products were purified using magnetic beads. Fragment size was detected using an Agilent 4150 TapeStation system.
  • the underlined part represents the universal adapter sequence, the bolded part represents the specific complementary sequence, and the UMI molecular sequence lies between them.
  • a gradient of sample amounts was set.
  • SW620 cells were selected as the cell sample for library construction.
  • The cell gradient was 3200, 1600, 800, and 400 cells per library construction reaction.
  • The experimental procedures and the like are the same as those in Example 1.
  • the probe sequence and blocking sequence are the same as in Example 2.
  • the gene expression level of the sample was relatively quantified by fluorescence quantitative PCR.
  • SW620 cells were selected as the cell sample for library construction.
  • the cell usage is 4000 cells per library construction reaction.
  • the experimental procedures, etc. are the same as those in Example 1.
  • the probe sequence combination was selected as probes for 1300 genes (not listed), and the blocking sequence was the same as in Example 2.
  • the library construction product is sequenced on an Illumina NovaSeq 6000, and bioinformatic analysis resolves the RNA expression in the sample.
  • the "count" value is the number of reads detected for each gene, and "UMI_count" is the number of reads after PCR amplification effects are removed according to the UMI molecular sequence, so that each RNA molecule retains only one read.
  • to study the RNA detection accuracy of the disclosed method, two cell types were used: HepG2 cells and MDA-MB-231 cells.
  • the fluorescence quantitative PCR method (primer sequences are shown in Table 4) and the disclosed method (the experimental procedures are the same as those in Example 1, and the probe sequence and blocking sequence are the same as those in Example 2) were used to detect the RNA content in the sample simultaneously, and the consistency of RNA expression measured by the two methods was compared.

Landscapes

  • Chemical & Material Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Organic Chemistry (AREA)
  • Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Biochemistry (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Immunology (AREA)
  • Zoology (AREA)
  • Wood Science & Technology (AREA)
  • Microbiology (AREA)
  • Analytical Chemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Biotechnology (AREA)
  • Chemical Kinetics & Catalysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Genetics & Genomics (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Medicinal Chemistry (AREA)
  • Biophysics (AREA)
  • Urology & Nephrology (AREA)
  • Hematology (AREA)
  • General Chemical & Material Sciences (AREA)
  • Toxicology (AREA)
  • Tropical Medicine & Parasitology (AREA)
  • Cell Biology (AREA)
  • Food Science & Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Pathology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

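The present disclosure relates to a method for constructing a sequencing library, a gene probe combination, and a kit. Combined with sequencing approaches such as next-generation sequencing, the library construction method of the present disclosure enables ultra-multiplexed digital gene expression detection: it can detect the expression of hundreds to thousands of genes (or even the whole genome) simultaneously, and it measures gene expression levels in a sample specifically and without bias. The gene probe combination (targeting probes) of the present disclosure can be used to capture RNA directly for next-generation sequencing library construction.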

Description

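Method and kit for high-throughput construction of RNA sequencing libraries
Technical Field
The present disclosure belongs to the field of targeted RNA sequencing in molecular biology. The present disclosure relates to a method for constructing a sequencing library, a gene probe combination, and a kit.
Background
Gene expression patterns directly affect cell function. RNA is the central link in gene expression and takes part in many of the functional processes associated with it. Identifying and quantifying gene expression is essential for understanding both the normal physiological functions of an organism and the pathological processes of disease. There are many RNA-based methods for identifying and quantifying gene expression. Traditional ones rely mainly on fluorescence quantitative PCR, gene expression microarrays, and conventional RNA-seq (RNA sequencing); newer methods developed in recent years include NanoString digital single-molecule gene expression profiling. These technical advances have broadened the range of application scenarios.
Conventional RNA library construction comprises: 1) total RNA extraction; 2) mRNA isolation and fragmentation; 3) first-strand cDNA synthesis; 4) second-strand cDNA synthesis; 5) end repair of the double-stranded cDNA; 6) adapter ligation; 7) PCR enrichment of the library.
Targeted RNA capture is an accurate and efficient method for enriching and detecting RNA. Probes can be designed in batches against genes or transcripts of interest, which makes it an effective way to detect gene expression levels, gene fusions, splice variants, single-nucleotide variants, insertions/deletions, and the like. For example, NanoString's digital gene expression profiling technology designs molecular hybridization probes (about 35-50 nt long) for each mRNA molecule. A probe pair is designed against a unique region of each gene or transcript: the 5' end of the reporter probe carries a fluorescent molecular barcode, and the 3' end of the capture probe carries a biotin label. The reporter probe hybridizes with the mRNA to form a complex, which is separated and purified, and the hybridization products are immobilized on a sample plate. A digital analyzer then identifies, scans, and counts the fluorescent molecular barcodes of each sample to resolve the gene expression levels in the sample.
US patent US9938566 reports a method for measuring gene expression by probe-based targeted capture. In this method a pair of probes is designed for each gene to be detected, and each probe contains a region that pairs with the gene and a universal adapter region. The probes pair with the target template, a ligase joins the paired probes into a complete fragment, and the fragment is then amplified by PCR via the universal adapter regions to resolve gene expression in the sample. The method successfully detected gene expression levels and gene mutations in samples. However, its detection throughput is low, and the ligated probes are not corrected after PCR signal amplification, so its accuracy in representing gene expression levels is limited.
In drug treatment experiments on trace amounts of primary cells, cultured cells, tumor tissue cells, organoid cells, and the like, 384-well or 96-well microplates are typically used, with each well holding from a few hundred to tens of thousands of cells. With such cell numbers, the amount of RNA that can be extracted by conventional methods is small and insufficient for subsequent large-scale gene expression detection. In addition, isolating and purifying mRNA causes further mRNA loss, which is especially severe for low-abundance mRNA and biases gene expression quantification. Moreover, fluorescence quantitative PCR, gene expression microarrays, conventional RNA-seq, and similar techniques require reverse transcription of RNA into cDNA, which further affects experimental accuracy.
Using cell lysate directly simplifies downstream experiments, but the effect on those experiments of the large amounts of cell debris, protein, genomic DNA, ribosomal RNA, and other components released on lysis must be considered. At the same time, evaluating the biological function of many types of compounds requires detecting as many RNA species as possible. NanoString technology can be applied to trace cell lysates, but the 6 fluorescent molecular barcodes of NanoString reporter probes distinguish probes through permutations of bead colors. Owing to the limits of the downstream detection instrument and chip, each NanoString experiment can test only 12 samples, with the mRNA of 800 genes per sample, so both the detection throughput and the number of detectable genes are limited.
Therefore, parallel multi-sample detection of the expression of thousands of genes from trace complex samples in 384-well or 96-well microplate format is a problem in urgent need of a solution in this field.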
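Summary
To solve the above technical problems, the present inventors provide a library construction method for trace complex samples. Using the library construction method of the present disclosure, the expression levels of up to several thousand RNAs were successfully resolved from nanogram (ng) quantities of RNA in the lysate of a few hundred cells; in theory, expression of whole-genome RNA could be resolved. Comparison experiments with fluorescence quantitative PCR show that the library construction method of the present disclosure allows accurate quantification.
Accordingly, in a first aspect, the present disclosure provides a method for constructing a sequencing library, comprising the following steps: 1) providing a sample solution containing mRNA; 2) enriching the mRNA in the sample solution with magnetic beads; 3) adding at least one gene probe combination, each of which comprises a first probe sequence and a second probe sequence; 4) adding a blocking sequence; 5) annealing, so that the gene probe combination hybridizes with the mRNA; 6) adding a nucleic acid ligase to ligate the first probe sequence and the second probe sequence of each gene probe combination hybridized to the mRNA, forming a nucleic acid molecule; 7) adding an elution buffer to separate the nucleic acid molecule from the mRNA; 8) using the nucleic acid molecule as a template, extending the blocking sequence with a DNA polymerase so that the template is complemented into double-stranded DNA; 9) performing PCR amplification with the double-stranded DNA as a template to construct the sequencing library; wherein the first probe sequence targets the 5' end of the mRNA and comprises a 3'-end universal adapter sequence, a UMI molecular sequence, and a 5'-end specific complementary sequence; the second probe sequence targets the 3' end of the mRNA and comprises a 3'-end specific complementary sequence, a UMI molecular sequence, and a 5'-end universal adapter sequence; and the blocking sequence binds to the 3'-end universal adapter sequence of the first probe.
In one embodiment, the sample solution is a cell lysate, the cells being selected from primary cells, cultured cells, tumor tissue cells, and organoid cells. In one embodiment, the sample solution is a trace cell lysate. In one embodiment, the trace cell lysate is the lysate of a few hundred (e.g. 100-900) cells. In one embodiment, the trace cell lysate is the lysate of a single cell.
In one embodiment, the magnetic beads are oligo-dT magnetic beads or streptavidin magnetic beads.
In one embodiment, the magnetic beads are in excess relative to the sample solution.
In one embodiment, steps 2) to 4) are performed in any order or simultaneously.
In one embodiment, annealing is performed at a temperature of 37 °C-45 °C.
In one embodiment, the method for constructing a sequencing library further comprises, after step 5) and before step 6), resuspending the magnetic beads in nucleic acid ligase buffer and capturing the beads on a magnetic stand.
In one embodiment, the method for constructing a sequencing library further comprises, after step 6) and before step 7), resuspending the magnetic beads in wash buffer and capturing the beads on a magnetic stand.
In one embodiment, the nucleic acid ligase is a ligase capable of catalyzing the joining of single strands within a hybrid duplex, such as T4 DNA ligase or SplitR ligase.
In one embodiment, in step 9), PCR amplification is performed with a pair of primers complementary to the 3'-end universal adapter sequence and the 5'-end universal adapter sequence respectively, at least one (preferably both) of which contains an index sequence. With the preferred dual-end indexes, more samples can be pooled on the same sequencing chip, which greatly reduces the sequencing cost of the library products.
In one embodiment, the 5'-end specific complementary sequence or the 3'-end specific complementary sequence is 20-25 bp long.
In one embodiment, the first probe sequence, the second probe sequence, and/or the blocking sequence are single-stranded nucleotides containing natural or modified nucleotides.
In one embodiment, the length of the region blocked by the blocking sequence is less than or equal to the full length of the 3'-end universal adapter sequence.
In one embodiment, the UMI molecular sequence is a random sequence of 4 bp or more (preferably 4-6 bp).
In one embodiment, the extension in step 8) is performed by PCR.
The library construction method of the present disclosure uses magnetic beads to capture mRNA directly from lysed cells for subsequent library construction. It requires neither conventional isolation, extraction, and purification of RNA nor reverse transcription of mRNA into cDNA by reverse transcriptase; instead, it targets the RNA of interest directly.
The library construction method of the present disclosure can build libraries that effectively characterize RNA expression from the lysate of trace numbers of cells (as few as 400 cells, or even a single cell), which makes it well suited to large-scale drug screening and evaluation.
Combined with sequencing approaches such as next-generation sequencing, the library construction method of the present disclosure enables ultra-multiplexed digital gene expression detection: it can detect the expression of hundreds to thousands of genes (or even the whole genome) simultaneously, and it measures gene expression levels in a sample specifically and without bias.
The workflow of the library construction method of the present disclosure is simple and fast, and it lends itself to automated library construction on automated instruments. Taking an automated workstation with a 384-channel head as an example, one instrument is estimated to process four 384-well plates per day, i.e. 1536 sample libraries, which greatly improves library construction efficiency while reducing cost.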
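In a second aspect, the present disclosure provides a sequencing library constructed by the method according to the first aspect.
In a third aspect, the present disclosure provides a gene probe combination comprising a first probe sequence and a second probe sequence, wherein the first probe sequence comprises a 3'-end universal adapter sequence, a UMI molecular sequence, and a 5'-end specific complementary sequence; the second probe sequence comprises a 3'-end specific complementary sequence, a UMI molecular sequence, and a 5'-end universal adapter sequence; and the first probe sequence further comprises a blocking sequence that binds to the 3'-end universal adapter sequence.
In one embodiment, the 5'-end specific complementary sequence or the 3'-end specific complementary sequence is 20-25 bp long.
In one embodiment, the first probe sequence, the second probe sequence, and/or the blocking sequence are single-stranded nucleotides containing natural or modified nucleotides.
In one embodiment, the length of the region blocked by the blocking sequence is less than or equal to the full length of the 3'-end universal adapter sequence or the 5'-end universal adapter sequence.
In one embodiment, the UMI molecular sequence is a random sequence of 4 bp or more (preferably 4-6 bp). UMI molecular sequences are conventionally about 10 bp long. The UMI molecular sequences used in the method of the present disclosure are shorter and are synthesized directly into the probe sequences. When the UMI molecular sequence is 4 bp, the dual-UMI design of the gene probe combination yields 4⁴ × 4⁴ = 65,536 combinations, which is enough to cover the number of transcripts of an RNA in a cell. It should be understood that the length of the UMI molecular sequence can be adjusted as needed to cover the gene expression levels of the sample.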
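The combination count above follows from treating each of the two UMIs as an independent random string over the four bases. A minimal sketch of that arithmetic is given below; the function name `umi_pair_combinations` is a hypothetical helper introduced for illustration and is not part of the disclosure.

```python
def umi_pair_combinations(umi_length: int, alphabet_size: int = 4) -> int:
    """Count distinct (UMI 1, UMI 2) pairs for two random UMIs of equal length.

    Assumes every position is a fully random base (N), so one UMI of length
    L has alphabet_size ** L possible sequences.
    """
    if umi_length < 1:
        raise ValueError("umi_length must be a positive integer")
    per_umi = alphabet_size ** umi_length
    return per_umi * per_umi


# Pair diversity over the preferred 4-6 bp range named in the text.
for length in (4, 5, 6):
    print(f"UMI length {length} bp: {umi_pair_combinations(length)} pairs")
```

Each extra base per UMI multiplies the pair diversity by 16, which is why a short UMI on each probe can stand in for a single longer one.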
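In one embodiment, the universal adapter sequences (including the 3'-end universal adapter sequence and the 5'-end universal adapter sequence) are compatible with a sequencer, including but not limited to sequencers from Illumina (e.g. Illumina NovaSeq 6000), MGI (e.g. MGI DNBSEQ-T7), and Thermo Fisher Scientific (e.g. Thermo Fisher Scientific Ion S5).
In one embodiment, the blocking sequence is the reverse complement of the universal adapter sequence.
In a fourth aspect, the present disclosure provides a kit comprising the gene probe combination according to the third aspect.
In one embodiment, the kit further comprises a pair of primers complementary to the 3'-end universal adapter sequence and the 5'-end universal adapter sequence respectively, at least one (preferably both) of which contains an index sequence.
The probe structure of the present disclosure comprises a specific complementary sequence, a UMI molecular sequence, and a universal adapter sequence. The UMI molecular sequence (located between the universal adapter sequence and the specific complementary sequence) is introduced into the probe structure by direct synthesis, which effectively eliminates the bias introduced by PCR amplification, makes gene expression analysis more precise, and increases data reliability. The blocking sequence reduces mutual interference between probes, increases the proportion of the target library band, and makes multiplexed gene detection feasible. The gene probe combination of the present disclosure and kits containing it can be used to construct next-generation sequencing libraries, enabling fast, accurate, efficient, and low-cost measurement of gene expression in systems of trace primary cells, cultured cells, or organoid cells.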
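The three-part probe layout described above can be sketched in code. The example below is an illustration under stated assumptions, not the applicant's design tool: the function names, the even split of the target region, and the adapter strings are all hypothetical placeholders (they are not real sequencer adapters). Sequences are written 5' to 3', and `N` stands for a random UMI base.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_probe_pair(target_sense: str, adapter_5p: str, adapter_3p: str,
                      umi_length: int = 4) -> dict:
    """Assemble a first/second probe pair for one target region.

    target_sense: mRNA target region, 5'->3' (U is treated as T).
    adapter_5p / adapter_3p: placeholder universal adapter sequences.
    The region is split evenly; each half must be 20-25 nt long, matching
    the specific-complementary length given in the text.
    """
    target = target_sense.upper().replace("U", "T")
    if len(target) % 2 != 0:
        raise ValueError("target region must have an even length")
    half = len(target) // 2
    if not 20 <= half <= 25:
        raise ValueError("each specific complementary sequence must be 20-25 nt")

    antisense = reverse_complement(target)  # pairs with the mRNA
    umi = "N" * umi_length

    # Second probe: 5' adapter + UMI + 3' specific part (pairs with the 3' half of the target).
    second_probe = adapter_5p + umi + antisense[:half]
    # First probe: 5'-phosphorylated specific part (pairs with the 5' half) + UMI + 3' adapter.
    first_probe = antisense[half:] + umi + adapter_3p
    return {
        "first_probe": first_probe,
        "second_probe": second_probe,
        # Ligation joins the 3' end of the second probe to the 5' end of the first.
        "ligated": second_probe + first_probe,
        # Blocking sequence: reverse complement of the first probe's 3' adapter.
        "blocking_sequence": reverse_complement(adapter_3p),
    }


pair = design_probe_pair("A" * 20 + "C" * 20, "GGAATTCCGG", "TTGGCCAATT")
print(pair["first_probe"])
print(pair["second_probe"])
print(pair["blocking_sequence"])
```

Because the blocking sequence is the reverse complement of the 3'-end adapter, it anneals at the 3' end of the ligated product and can later serve as the primer that is extended to make the second strand.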
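In a fifth aspect, the present disclosure provides use of the gene probe combination according to the third aspect or the kit according to the fourth aspect in quantitative PCR or library construction.
In one embodiment, the quantitative PCR is fluorescence quantitative PCR.
In a sixth aspect, the present disclosure provides a method for determining the mRNA content of a sample, comprising: constructing a sequencing library by the method according to the first aspect, and then quantifying the mRNA content of the sample by next-generation sequencing, third-generation sequencing, fluorescence spectroscopy, or quantitative PCR.
In a seventh aspect, the present disclosure provides a sequencing method, comprising: constructing a sequencing library by the method according to the first aspect, and then sequencing with the sequencing library.
In one embodiment, the sequencing method is a sequencing method for non-diagnostic purposes, for example research purposes.
In addition, the library construction method of the present disclosure allows high-throughput library construction, so that sequencing can resolve the effect of a drug or compound on cellular gene expression and help analyze the biological function of the drug or compound.
In an eighth aspect, the present disclosure provides a drug screening or drug evaluation method, comprising: seeding cells; treating the cells with a candidate drug; constructing a sequencing library by the method according to the first aspect; sequencing with the sequencing library; and constructing a gene expression profile of the candidate drug.
In one embodiment, the cells are seeded into a 384-well microplate.
In one embodiment, the sequencing is performed by next-generation sequencing or third-generation sequencing.
In one embodiment, the gene expression profile of the candidate drug is constructed by bioinformatic analysis to obtain the gene expression pattern after treatment with the candidate drug.
Screening compounds or drugs by gene expression profile is a new and distinctive mode of high-throughput drug screening. It can reveal gene-disease-drug associations, speed up the identification of disease-related candidate compounds, accelerate research into mechanisms of drug action, and uncover new uses for existing drugs. The data can be further integrated into a database of target gene expression associated with disease-related active compounds or drugs, greatly advancing drug screening and drug evaluation.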
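Brief Description of the Drawings
Figure 1 shows the principle of the library construction method of the present disclosure. It schematically shows the first probe sequence (whose 3'-end universal adapter sequence is bound by the blocking sequence) and the second probe sequence, both hybridized to mRNA, being joined by a nucleic acid ligase to form a nucleic acid molecule (i.e. step 6) of the method of the present disclosure).
Figure 2 shows GAPDH gene expression measured with paired probes of different lengths targeting the hGAPDH gene.
Figure 3 shows ACTB gene expression measured with paired probes of different lengths targeting the hACTB gene.
Figure 4 shows the fragment size distribution of the library product without the blocking sequence.
Figure 5 shows the fragment size distribution of the library product with the blocking sequence.
Figure 6 compares the amount of library product obtained from different starting inputs.
Figure 7 shows the correlation between Sample 1 and Sample 2 of gene expression levels counted with UMIs (UMI-count).
Figure 8 shows the correlation between Sample 1 and Sample 2 of gene expression levels counted without UMIs (count).
Figure 9 shows the correlation between qPCR results and the results of the present method.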
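Detailed Description
Unless otherwise defined, all technical and scientific terms used herein have the meaning commonly understood by a person of ordinary skill in the art to which the present disclosure belongs.
Unless otherwise stated, nucleic acid or polynucleotide sequences listed herein are in single-stranded form, written 5' to 3' from left to right. Nucleotides provided herein follow the format recommended by the IUPAC-IUB Biochemical Nomenclature Commission.
Unless otherwise stated, "polynucleotide" is a synonym of "nucleic acid" and refers to a polymeric form of nucleotides of any length, including deoxyribonucleotides or ribonucleotides, mixed sequences thereof, or analogs thereof. A polynucleotide may include modified nucleotides, such as methylated or capped nucleotides, and nucleotide analogs.
As used herein, the terms "comprising", "having", "including", and "containing" are to be construed as open-ended terms (i.e. meaning "including but not limited to").
As used herein, a "specific complementary" sequence refers to a base sequence that pairs with the target nucleic acid.
A UMI (Unique Molecular Identifier) molecular sequence, also called a molecular barcode or molecular tag, is used to distinguish different fragments within the same sample. Its purpose is to quantify the number of starting molecules more precisely and to reduce the non-uniformity caused by PCR amplification. A molecular barcode usually consists of a random sequence of about 10 nt (e.g. NNNNNNN) or of degenerate bases (e.g. NNNRNYN).
An index sequence is a stretch of sequence added to the DNA fragments of each sample during PCR amplification so that multiple samples can be sequenced together; it serves as a sample tag for splitting the reads by sample.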
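How an index sequence acts as a sample tag can be shown with a small demultiplexing sketch. It assumes a dual-index design in which every read carries an (i7, i5) index pair; the function name, the sample-sheet layout, and the index strings are hypothetical, and no mismatch tolerance is modeled.

```python
from collections import defaultdict


def demultiplex(reads, sample_sheet):
    """Split pooled reads by their (i7, i5) index pair.

    reads: iterable of (i7_index, i5_index, read_sequence) tuples.
    sample_sheet: dict mapping (i7_index, i5_index) -> sample name.
    Reads whose index pair is not in the sheet go to "undetermined".
    """
    by_sample = defaultdict(list)
    for i7, i5, sequence in reads:
        sample = sample_sheet.get((i7, i5), "undetermined")
        by_sample[sample].append(sequence)
    return dict(by_sample)


# Hypothetical index pairs for two wells of a plate.
sheet = {("AAGGTT", "CCTTAA"): "well_A1", ("GGAACC", "CCTTAA"): "well_A2"}
pooled = [
    ("AAGGTT", "CCTTAA", "ACGTACGT"),
    ("GGAACC", "CCTTAA", "TTTTCCCC"),
    ("AAGGTT", "CCTTAA", "GGGGAAAA"),
    ("TTTTTT", "CCTTAA", "ACACACAC"),
]
print(demultiplex(pooled, sheet))
```

With two indexes per read, the number of samples that can share a run is the product of the two index set sizes, which is the reason dual-end indexes let more samples be pooled.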
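In one embodiment, the present disclosure relates to a method for constructing a nucleic acid library, comprising the following steps:
- providing a solution containing a nucleic acid template;
- adding magnetic beads, a paired gene probe combination (designed against the nucleic acid template and comprising a first probe sequence (3'-end universal adapter sequence + UMI molecular sequence + 5'-end phosphorylated specific complementary sequence) and a second probe sequence (3'-end specific complementary sequence + UMI molecular sequence + 5'-end universal adapter sequence)), a blocking sequence (which binds to the universal adapter region of the probe sequence), and hybridization buffer;
- annealing to hybridize the gene probe combination to the nucleic acid template, wherein, in the hybridization buffer, the paired probes anneal to the same nucleic acid template molecule;
- adding wash buffer and, using a magnetic stand, washing away the unhybridized probes;
- adding nucleic acid ligase buffer and, using a magnetic stand, replacing the wash buffer;
- adding a nucleic acid ligase to join the probes of the gene probe combination hybridized to the same nucleic acid template molecule, forming a nucleic acid molecule;
- after ligation, replacing the ligation mixture with wash buffer using a magnetic stand, and then separating the ligated probes from the template with elution buffer;
- using the nucleic acid molecule as a template, extending the blocking sequence in the 5'-to-3' direction with a DNA polymerase so that the template is complemented into double-stranded DNA;
- performing PCR amplification on the double-stranded DNA template with a pair of dual-end universal adapter sequencing primers whose sequences are respectively identical or complementary to the 3'-end universal adapter sequence and the 5'-end universal adapter sequence, the primers containing index sequences for splitting the reads by sample;
- pooling, purifying, quantifying, and quality-checking the PCR products bearing different sequencing indexes to obtain the library product;
- sequencing and bioinformatic analysis to determine the sequence of the nucleic acid template.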
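In one embodiment, the present disclosure relates to a method for detecting nucleic acid expression levels, comprising the following steps:
- providing a solution containing a nucleic acid template;
- adding magnetic beads, a paired gene probe combination (designed against the nucleic acid template and comprising a first probe sequence (3'-end universal adapter sequence + UMI molecular sequence + 5'-end phosphorylated specific complementary sequence) and a second probe sequence (3'-end specific complementary sequence + UMI molecular sequence + 5'-end universal adapter sequence)), and hybridization buffer, wherein the 3'-end universal adapter sequence of the first probe sequence has been blocked in advance by the blocking sequence;
- annealing to hybridize the gene probe combination to the nucleic acid template, wherein, in the hybridization buffer, the paired probes anneal to the same nucleic acid template molecule;
- adding wash buffer and, using a magnetic stand, washing away the unhybridized probes;
- adding nucleic acid ligase buffer and, using a magnetic stand, replacing the wash buffer;
- adding a nucleic acid ligase to join the probes of the gene probe combination hybridized to the same nucleic acid template molecule, forming a nucleic acid molecule;
- adding elution buffer to separate the nucleic acid molecule from the nucleic acid template;
- using the nucleic acid molecule as a template, extending the blocking sequence in the 5'-to-3' direction with a DNA polymerase so that the template is complemented into double-stranded DNA;
- performing PCR amplification on the double-stranded DNA template with a PCR primer pair that pairs respectively with the 3'-end universal adapter sequence of the first probe sequence and the 5'-end universal adapter sequence of the second probe sequence, to obtain PCR products;
- pooling, purifying, quantifying, and sequencing the PCR products;
- bioinformatic analysis to obtain nucleic acid expression level information.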
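In one embodiment, the solution containing the nucleic acid template is a cell lysate, for example a lysate of primary cells, cultured cells, tumor tissue cells, or organoid cells.
In one embodiment, the nucleic acid template is a ribonucleic acid (RNA) template, for example extracted RNA, RNA released by lysis of cells, tissue, or FFPE samples, or RNA expressed in cells from an exogenous gene.
In one embodiment, the RNA binds to oligo-dT magnetic beads through its poly(A) tail, or binds to streptavidin magnetic beads through oligo-dT-biotin.
In one embodiment, the universal adapter sequence is compatible with a sequencer.
In one embodiment, the sequencer includes but is not limited to the Illumina NovaSeq 6000, the MGI DNBSEQ-T7, and the Thermo Fisher Scientific Ion S5.
In one embodiment, to strengthen blocking, the base sequence of the blocking sequence may contain various modified bases.
In one embodiment, the hybridization buffer contains a high salt concentration, such as 500 mM to 1 M NaCl.
It will be understood that, when the expression of multiple genes is to be detected, one or more gene probe combinations are added for each gene. In theory, the upper limit on the number of genes detected can extend to the whole genome.
In one embodiment, the nucleic acid ligase is a ligase capable of catalyzing the joining of single strands within a hybrid duplex, such as T4 DNA ligase or SplitR ligase.
In one embodiment, the elution buffer contains a low salt concentration, such as 100 mM to 500 mM NaCl, which effectively dissociates the probes from the nucleic acid template and releases them into the liquid.
In one embodiment, the universal adapter sequence is the TruSeq adapter sequence of the Illumina platform.
In one embodiment, when an Illumina sequencing platform is used, the pair of dual-end universal adapter sequencing primers is Illumina P5 (with i5 index) + read1 sequence and Illumina P7 (with i7 index) + read2 sequence.
In one embodiment, PCR amplification is performed with a high-fidelity PCR polymerase, such as Pfu, KOD, or KAPA high-fidelity polymerase.
In one embodiment, the PCR products are purified with magnetic beads to remove the primer-dimer band.
In one embodiment, the 3'-end universal adapter sequence of the first probe sequence is blocked in advance by the blocking sequence.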
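The method of the present disclosure overcomes the requirement of conventional methods to pre-isolate and purify the nucleic acid (such as RNA). Instead, cells are lysed directly to obtain a cell lysate, and the designed targeting probes are mixed directly with the lysate and incubated for hybridization. Each probe pair is designed with three distinct parts: a sequence complementary to the target nucleic acid (the specific complementary sequence), a UMI molecular sequence, and a universal adapter sequence. Through the complementary sequences and the UMI molecular sequences, each nucleic acid is specifically targeted and quantified by at least one probe pair. In this way each nucleic acid molecule is labeled with a unique sequence tag quickly and efficiently without a PCR or ligase-mediated ligation step, which increases quantification accuracy and is particularly suited to lysate systems of trace primary cells, cultured cells, tumor tissue cells, or organoid cells.
One aspect of the present disclosure relates to a sequencing method, comprising: constructing a sequencing library by the method of the present disclosure, and then sequencing with the sequencing library.
Recurrence and metastasis are the main causes of death in breast cancer patients, and patients with recurrence or metastasis generally have a poor prognosis. The multi-gene expression profile of breast cancer tumor tissue can guide prognostic evaluation and prediction of treatment response. Commonly used multi-gene expression profiles include the 21-gene expression recurrence risk assessment, the MammaPrint 70-gene test, and the PAM50 gene test, which are used for prognostic evaluation of early breast cancer patients within 5 years. The sequencing method of the present disclosure can perform targeted capture detection of more than 100 genes related to breast cancer recurrence and metastasis, such as Ki-67, STK15, Survivin, Cyclin B1, MYBL2, GRB7, Her-2, ER, PR, Bcl-2, and SCUBE2, obtain gene expression profile information, and build a gene expression-based scoring system that distinguishes groups at high and low risk of recurrence and metastasis within five years, providing more precise clinical medical advice.
One aspect of the present disclosure relates to a drug screening or drug evaluation method. In one embodiment, for 384, 96, or more drug-treated samples, sequencing libraries are constructed by the method of the present disclosure, and next-generation sequencing generates expression information for each target candidate gene in each sample. The expression information is divided into positively regulated and negatively regulated gene groups, and a differential expression profile of signature genes is constructed. A core algorithm then calculates the degree of similarity between gene expression patterns and assigns a score between -1 and 1. A score closer to 1 means that the drug molecules used to treat the different samples are positively correlated, indicating similar cellular effects. Conversely, a score closer to -1 means that the drug molecules are negatively correlated, indicating antagonistic cellular effects.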
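The text does not specify the core algorithm used for the similarity score. As a stand-in that has the stated range of -1 to 1, the sketch below uses the Pearson correlation between two drugs' differential expression signatures; this choice, the function name, and the example values are assumptions made for illustration only.

```python
import math


def signature_similarity(profile_a: dict, profile_b: dict) -> float:
    """Pearson correlation between two differential expression signatures.

    Each profile maps gene name -> log2 fold change versus control
    (positive = up-regulated, negative = down-regulated). Only genes
    present in both profiles are compared.
    """
    genes = sorted(set(profile_a) & set(profile_b))
    if len(genes) < 2:
        raise ValueError("need at least two shared genes")
    xs = [profile_a[g] for g in genes]
    ys = [profile_b[g] for g in genes]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a signature with no variation has no defined score")
    return cov / math.sqrt(var_x * var_y)


# Made-up signatures: drug_b mirrors drug_a, drug_c opposes it.
drug_a = {"GENE1": 2.0, "GENE2": -1.0, "GENE3": 0.5}
drug_b = {"GENE1": 4.0, "GENE2": -2.0, "GENE3": 1.0, "GENE4": 0.3}
drug_c = {"GENE1": -2.0, "GENE2": 1.0, "GENE3": -0.5}
print(signature_similarity(drug_a, drug_b))
print(signature_similarity(drug_a, drug_c))
```

Rank-based or gene-set enrichment scores are common alternatives with the same range; which one the applicant uses cannot be determined from the text.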
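The present disclosure is described in further detail below with reference to the drawings and examples. The following examples only illustrate the present disclosure and do not limit its scope. Experimental methods for which no specific conditions are given in the examples were carried out under conventional conditions known in the art or under the conditions recommended by the manufacturer.
Examples
Example 1
In this example, probe sequences of different lengths were designed against 60 bp reference sequences of the hGAPDH and hACTB genes (as shown in Table 1, the first probe sequence (D) and the second probe sequence (A) each match the template over 20-25 bp). Probes with the same matching length that target the same gene were paired as a probe pair.
Table 1. Probe sequences


Notes: 1. "-D-" denotes the first probe sequence and "-A-" denotes the second probe sequence;
2. The underlined part is the universal adapter sequence, the bolded part is the specific complementary sequence, and the UMI molecular sequence lies between them.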
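The experimental steps were as follows:
1. Normally cultured MDA-MB-231 cells were harvested, 10⁷ cells were counted out, and total RNA was extracted by the Trizol method.
2. 0.5 μg of RNA was taken, and 100 pmol of probe pair, 2 pmol of oligo-dT-biotin, and 5 μl of streptavidin magnetic beads were added.
3. In hybridization buffer, the mixture was incubated at 65 °C for 5 min and hybridized at 45 °C for 1 h.
4. The beads were captured on a magnetic stand and the supernatant was discarded. The beads were resuspended in wash buffer and captured on the magnetic stand; this was repeated twice. This bead purification removed excess unpaired probes.
5. The beads were resuspended in T4 DNA ligase buffer and captured on the magnetic stand, and the supernatant was discarded.
6. T4 DNA ligase was added, the beads were resuspended, and the mixture was incubated at 37 °C for 1 h.
7. Wash buffer was added to resuspend the beads, which were then captured on the magnetic stand; this was repeated twice.
8. Elution buffer was added to resuspend the beads, the mixture was incubated at 65 °C for 5 min, the beads were captured on the magnetic stand, and the supernatant was collected; the elution buffer elutes the ligation product from the beads.
9. The ligation product was complemented into double-stranded DNA by DNA polymerase.
10. Gene expression in the samples was relatively quantified by fluorescence quantitative PCR. The primers used for fluorescence quantitative PCR are shown in Table 2 below.
Table 2. Primer sequences
The quantification results are shown in Figures 2 and 3. There was no significant difference among the results for the different paired probes, indicating that all of the paired probes used in the experiment can effectively detect gene expression in the sample.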
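Example 2
In this example, probe pairs (first probe sequence and second probe sequence) targeting multiple different genes were designed, and a blocking sequence was designed against the 3'-end universal adapter sequence of the first probe sequence.
One set of experiments built libraries with the first probe sequence and the second probe sequence. The other set first blocked the 3'-end universal adapter of the first probe sequence with the blocking sequence and then built libraries with the second probe sequence. All other experimental materials, reagents, and library construction procedures were identical between the two sets.
The probe sequences and blocking sequence used are shown in Table 3 below. The experimental materials, procedures, etc. were the same as in Example 1. PCR products were purified with magnetic beads. Fragment size was measured on an Agilent 4150 TapeStation system.
Table 3. Probe sequences and blocking sequence


Notes: 1. "-D-" denotes the first probe sequence and "-A-" denotes the second probe sequence;
2. The underlined part is the universal adapter sequence, the bolded part is the specific complementary sequence, and the UMI molecular sequence lies between them.
The results are shown in Figures 4 and 5. The sample traces show that, compared with the library product without the blocking sequence (Figure 4), the proportion of the 190 bp target library band is greatly increased in the library product with the blocking sequence (Figure 5), indicating that the blocking sequence effectively improves the yield of library product. The difference between 190 bp and 192 bp is within the measurement error of the 4150 TapeStation.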
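Example 3
In this example, a gradient of sample amounts was set up to test the sensitivity of the disclosed method for detecting RNA in trace complex samples. SW620 cells were used as the cell sample for library construction. The cell gradient was 3200, 1600, 800, and 400 cells per library construction reaction.
The experimental procedures, etc. were the same as in Example 1. The probe sequences and blocking sequence were the same as in Example 2. Gene expression in the samples was relatively quantified by fluorescence quantitative PCR. The results are shown in Figure 6: the Ct values of gene expression across the cell samples were linearly correlated with R² = 0.9719, demonstrating that the method of the invention can effectively detect RNA expression of the probed genes in trace cell lysates, with a good linear relationship.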
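A linearity check of this kind can be sketched as an ordinary least-squares fit of Ct against log2 of the cell number, with R² as the goodness of fit. The text does not say which variable was regressed on which, so that pairing is an assumption; `linear_fit_r2` is a hypothetical helper, and the Ct values below are invented placeholders, not the data behind Figure 6.

```python
import math


def linear_fit_r2(xs, ys):
    """Least-squares line y = slope * x + intercept and its R squared."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or ss_tot == 0:
        raise ValueError("both series must vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return slope, intercept, 1.0 - ss_res / ss_tot


cells = [3200, 1600, 800, 400]        # gradient from Example 3
ct_values = [24.0, 25.0, 26.0, 27.0]  # placeholder Ct values, not measured data
slope, intercept, r2 = linear_fit_r2([math.log2(c) for c in cells], ct_values)
print(slope, intercept, r2)
```

For an ideal assay, halving the input raises Ct by about one cycle, so a slope near -1 against log2 input together with R² near 1 indicates proportional detection across the gradient.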
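Example 4
In this example, replicate samples were set up to study the effect of the UMI molecular sequence on the detection results of the disclosed method. SW620 cells were used as the cell sample for library construction. 4000 cells were used per library construction reaction.
The experimental procedures, etc. were the same as in Example 1. The probe combination consisted of probes for 1300 genes (not listed), and the blocking sequence was the same as in Example 2. The library products were sequenced on an Illumina NovaSeq 6000, and bioinformatic analysis resolved the RNA expression in the samples. The "count" value is the number of reads detected for each gene; "UMI_count" is the number of reads after PCR amplification effects are removed according to the UMI molecular sequence, so that each RNA molecule retains only one read.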
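The difference between "count" and "UMI_count" can be shown with a small sketch. It assumes each read has already been assigned to a gene and that its two UMIs have been extracted and concatenated; the function name and input layout are hypothetical, and the UMI error correction that production pipelines often apply is left out.

```python
from collections import defaultdict


def count_reads_and_umis(reads):
    """Compute per-gene read counts and UMI-deduplicated counts.

    reads: iterable of (gene, umi) tuples, where umi is the concatenation
    of the two probe UMIs observed in the read.
    Returns {gene: {"count": n_reads, "UMI_count": n_distinct_umis}}.
    """
    read_counts = defaultdict(int)
    umi_sets = defaultdict(set)
    for gene, umi in reads:
        read_counts[gene] += 1    # every read adds to "count"
        umi_sets[gene].add(umi)   # PCR duplicates collapse to one UMI
    return {
        gene: {"count": read_counts[gene], "UMI_count": len(umi_sets[gene])}
        for gene in read_counts
    }


# Three reads share one UMI pair, so they are treated as PCR copies of one molecule.
example_reads = [
    ("GAPDH", "ACGTTTGA"),
    ("GAPDH", "ACGTTTGA"),
    ("GAPDH", "ACGTTTGA"),
    ("GAPDH", "CCCCAAAA"),
    ("ACTB", "ACGTTTGA"),
]
print(count_reads_and_umis(example_reads))
```

Because UMIs are compared only within a gene, the same UMI pair seen on two different genes still counts as two molecules.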
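As shown in Figure 7, the UMI-count values of the 1300 genes in the two cell samples were linearly correlated with R² = 0.9961. As shown in Figure 8, the count values of the 1300 genes in the two cell samples were linearly correlated with R² = 0.9758. These results show that quantifying RNA by the UMI molecular sequence, which removes the effect of PCR, greatly improves data quality and the correlation between replicates.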
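Example 5
In this example, two cell types, HepG2 cells and MDA-MB-231 cells, were used to study the RNA detection accuracy of the disclosed method. The RNA content of the samples was measured in parallel by fluorescence quantitative PCR (primer sequences are shown in Table 4) and by the method of the present disclosure (the experimental procedures, etc. were the same as in Example 1, and the probe sequences and blocking sequence were the same as in Example 2), and the consistency of RNA expression measured by the two methods was compared.
(1) HepG2 cells and MDA-MB-231 cells were cultured, trypsinized, and seeded into a 384-well cell plate (3000 cells per well). PCR products were purified with magnetic beads and sequenced on an Illumina NovaSeq 6000, and bioinformatic analysis gave the UMI-count values of RNA expression in the samples.
(2) With the ACTB gene as the internal reference, the relative gene expression levels in HepG2 cells and MDA-MB-231 cells were calculated for each detection method.
The results are shown in Figure 9. The relative expression levels of 10 genes in the 2 cell types measured by the two methods correlated with R² = 0.9369, a high correlation, indicating that the disclosed method measures RNA expression reliably.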
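The internal-reference normalization in step (2) can be sketched for both readouts. The text does not give the formulas, so the sketch assumes the usual 2^(-ΔCt) convention for qPCR and a simple ratio to the ACTB UMI count for sequencing; the function names and the numbers are hypothetical.

```python
def relative_expression_from_ct(ct_target: float, ct_reference: float) -> float:
    """Relative expression by the 2^(-dCt) convention (assumes ~100% PCR efficiency)."""
    return 2.0 ** (-(ct_target - ct_reference))


def relative_expression_from_umi(umi_counts: dict, reference: str = "ACTB") -> dict:
    """Divide every gene's UMI count by the reference gene's UMI count."""
    reference_count = umi_counts[reference]
    if reference_count <= 0:
        raise ValueError("reference gene must have a positive UMI count")
    return {gene: n / reference_count for gene, n in umi_counts.items()}


# Placeholder values for one sample, not data from the disclosure.
print(relative_expression_from_ct(ct_target=22.0, ct_reference=20.0))
print(relative_expression_from_umi({"ACTB": 200, "GAPDH": 100}))
```

Once both readouts are expressed relative to the same reference gene, they are on comparable scales and can be correlated gene by gene, typically after a log transform.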
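Table 4. Probe sequences for fluorescence quantitative PCR
All publications, patent applications, patents, nucleic acid and amino acid sequences, and other references mentioned in the present disclosure are incorporated herein by reference in their entirety.
Although the present disclosure has been illustrated and described with reference to certain preferred embodiments, a person of ordinary skill in the art should understand that the above is a further detailed description of the present disclosure in connection with specific embodiments, and that the practice of the present disclosure is not limited to these descriptions. A person skilled in the art may make various changes in form and detail, including simple deductions or substitutions, without departing from the spirit and scope of the present disclosure.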

Claims (21)

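  1. A method for constructing a sequencing library, comprising the following steps:
    1) providing a sample solution containing mRNA;
    2) enriching the mRNA in the sample solution with magnetic beads;
    3) adding at least one gene probe combination, each gene probe combination of the at least one gene probe combination comprising a first probe sequence and a second probe sequence;
    4) adding a blocking sequence;
    5) annealing, so that the gene probe combination hybridizes with the mRNA;
    6) adding a nucleic acid ligase to ligate the first probe sequence and the second probe sequence of each gene probe combination hybridized to the mRNA, forming a nucleic acid molecule;
    7) adding an elution buffer to separate the nucleic acid molecule from the mRNA;
    8) using the nucleic acid molecule as a template, extending the blocking sequence under the action of a DNA polymerase so that the template is complemented into double-stranded DNA;
    9) performing PCR amplification with the double-stranded DNA as a template to construct the sequencing library;
    wherein the first probe sequence targets the 5' end of the mRNA and comprises a 3'-end universal adapter sequence, a UMI molecular sequence, and a 5'-end specific complementary sequence, the second probe sequence targets the 3' end of the mRNA and comprises a 3'-end specific complementary sequence, a UMI molecular sequence, and a 5'-end universal adapter sequence, and the blocking sequence binds to the 3'-end universal adapter sequence.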
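  2. The method according to claim 1, wherein the sample solution is a cell lysate, the cells being selected from primary cells, cultured cells, tumor tissue cells, and organoid cells.
  3. The method according to claim 1, wherein the magnetic beads are oligo-dT magnetic beads or streptavidin magnetic beads.
  4. The method according to claim 1, wherein steps 2) to 4) are performed in any order or simultaneously.
  5. The method according to claim 1, wherein the annealing is performed at a temperature of 37 °C-45 °C.
  6. The method according to claim 1, wherein the method further comprises, after step 5) and before step 6), resuspending the magnetic beads in nucleic acid ligase buffer and capturing the magnetic beads on a magnetic stand.
  7. The method according to claim 1, wherein the method further comprises, after step 6) and before step 7), resuspending the magnetic beads in wash buffer and capturing the magnetic beads on a magnetic stand.
  8. The method according to claim 1, wherein the nucleic acid ligase is T4 DNA ligase or SplitR ligase.
  9. The method according to claim 1, wherein, in step 9), PCR amplification is performed with a pair of primers complementary to the 3'-end universal adapter sequence and the 5'-end universal adapter sequence respectively, at least one (preferably both) of the pair of primers containing an index sequence.
  10. The method according to claim 1, wherein the 5'-end specific complementary sequence or the 3'-end specific complementary sequence is 20-25 bp long, and the UMI molecular sequence is a random sequence of 4 bp or more (preferably 4-6 bp).
  11. The method according to claim 1, wherein the first probe sequence, the second probe sequence, and/or the blocking sequence are single-stranded nucleotides containing natural or modified nucleotides.
  12. The method according to claim 1, wherein the length of the region blocked by the blocking sequence is less than or equal to the full length of the 3'-end universal adapter sequence.
  13. A sequencing library constructed by the method according to any one of claims 1 to 12.
  14. A gene probe combination, wherein the gene probe combination comprises a first probe sequence and a second probe sequence, the first probe sequence comprises a 3'-end universal adapter sequence, a UMI molecular sequence, and a 5'-end specific complementary sequence, the second probe sequence comprises a 3'-end specific complementary sequence, a UMI molecular sequence, and a 5'-end universal adapter sequence, and the first probe sequence further comprises a blocking sequence that binds to the 3'-end universal adapter sequence; preferably, the 5'-end specific complementary sequence or the 3'-end specific complementary sequence is 20-25 bp long, and the UMI molecular sequence is a random sequence of 4 bp or more (preferably 4-6 bp); preferably, the first probe sequence, the second probe sequence, and/or the blocking sequence are single-stranded nucleotides containing natural or modified nucleotides; preferably, the length of the region blocked by the blocking sequence is less than or equal to the full length of the 3'-end universal adapter sequence.
  15. A kit, comprising the gene probe combination according to claim 14.
  16. The kit according to claim 15, wherein the kit further comprises a pair of primers complementary to the 3'-end universal adapter sequence and the 5'-end universal adapter sequence respectively, at least one (preferably both) of the pair of primers containing an index sequence.
  17. Use of the gene probe combination according to claim 14 or the kit according to claim 15 or 16 in quantitative PCR or library construction.
  18. A method for determining the mRNA content of a sample, comprising: constructing a sequencing library by the method according to any one of claims 1 to 12, and then quantifying the mRNA content of the sample by next-generation sequencing, third-generation sequencing, fluorescence spectroscopy, or quantitative PCR.
  19. A sequencing method, comprising: constructing a sequencing library by the method according to any one of claims 1 to 12, and then sequencing with the sequencing library.
  20. The sequencing method according to claim 19, wherein the sequencing method is a sequencing method for non-diagnostic purposes.
  21. A drug screening or drug evaluation method, comprising: seeding cells; treating the cells with a candidate drug; constructing a sequencing library by the method according to any one of claims 1 to 12; sequencing with the sequencing library; and constructing a gene expression profile of the candidate drug.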
PCT/CN2023/112554 2022-08-16 2023-08-11 Method and kit for high-throughput construction of RNA sequencing libraries WO2024037449A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210982842.6A CN116065240A (zh) 2022-08-16 2022-08-16 Method and kit for high-throughput construction of RNA sequencing libraries
CN202210982842.6 2022-08-16

Publications (1)

Publication Number Publication Date
WO2024037449A1 true WO2024037449A1 (zh) 2024-02-22

Family

ID=86173744

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/112554 WO2024037449A1 (zh) 2022-08-16 2023-08-11 Method and kit for high-throughput construction of RNA sequencing libraries

Country Status (2)

Country Link
CN (1) CN116065240A (zh)
WO (1) WO2024037449A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116065240A (zh) * 2022-08-16 2023-05-05 格物致和生物科技(北京)有限公司 Method and kit for high-throughput construction of RNA sequencing libraries

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160068907A1 (en) * 2014-09-08 2016-03-10 BioSpyder Technologies, Inc. Profiling Expression at Transcriptome Scale
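WO2019144582A1 (zh) * 2018-01-26 2019-08-01 厦门艾德生物医药科技股份有限公司 Probe and method for high-throughput sequencing targeted capture of target regions for detecting gene mutations and known and unknown gene fusion types
CN111748551A (zh) * 2019-03-27 2020-10-09 纳昂达(南京)生物科技有限公司 Blocking sequence, capture kit, library hybridization capture method, and library construction method
CN116065240A (zh) * 2022-08-16 2023-05-05 格物致和生物科技(北京)有限公司 Method and kit for high-throughput construction of RNA sequencing libraries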

Also Published As

Publication number Publication date
CN116065240A (zh) 2023-05-05

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23854350

Country of ref document: EP

Kind code of ref document: A1