CN114369650A - Design method of capture probe, capture probe and application thereof - Google Patents

Design method of capture probe, capture probe and application thereof

Info

Publication number
CN114369650A
Authority
CN
China
Prior art keywords
sequence
artificial sequence
rna
capture probe
gene
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210276899.4A
Other languages
Chinese (zh)
Other versions
CN114369650B (en)
Inventor
董珊珊
余进
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Xianhu Botanical Garden Shenzhen Garden Research Center
Original Assignee
Shenzhen Xianhu Botanical Garden Shenzhen Garden Research Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Xianhu Botanical Garden Shenzhen Garden Research Center
Priority to CN202210276899.4A
Publication of CN114369650A
Application granted
Publication of CN114369650B
Legal status: Active
Anticipated expiration

Classifications

    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12Q: MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
    • C12Q1/00: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
    • C12Q1/68: Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
    • C12Q1/6869: Methods for sequencing
    • C: CHEMISTRY; METALLURGY
    • C12: BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
    • C12N: MICROORGANISMS OR ENZYMES; COMPOSITIONS THEREOF; PROPAGATING, PRESERVING, OR MAINTAINING MICROORGANISMS; MUTATION OR GENETIC ENGINEERING; CULTURE MEDIA
    • C12N15/00: Mutation or genetic engineering; DNA or RNA concerning genetic engineering, vectors, e.g. plasmids, or their isolation, preparation or purification; Use of hosts therefor
    • C12N15/09: Recombinant DNA-technology
    • C12N15/11: DNA or RNA fragments; Modified forms thereof; Non-coding nucleic acids having a biological activity
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B20/00: ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
    • G16B20/10: Ploidy or copy number detection
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00: ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/20: Polymerase chain reaction [PCR]; Primer or probe design; Probe optimisation
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B30/00: ICT specially adapted for sequence analysis involving nucleotides or amino acids
    • G16B30/10: Sequence alignment; Homology search
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16B: BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00: ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • G16B40/30: Unsupervised data analysis

Landscapes

  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Organic Chemistry (AREA)
  • Genetics & Genomics (AREA)
  • Wood Science & Technology (AREA)
  • Zoology (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Proteomics, Peptides & Aminoacids (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biotechnology (AREA)
  • Microbiology (AREA)
  • Physics & Mathematics (AREA)
  • Biochemistry (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Plant Pathology (AREA)
  • Analytical Chemistry (AREA)
  • Immunology (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention relates to the technical field of biology and discloses a method for designing capture probes, the capture probes, and their application. The design method comprises the following steps: collecting samples of bryophyte taxa, pretreating the samples, and performing transcriptome sequencing to obtain raw sequencing data; filtering the raw sequencing data and performing transcriptome assembly to obtain the protein sequences and nucleotide sequences of the samples; performing cluster analysis on the protein sequences to obtain orthologous single-copy genes, then aligning and extracting the nucleotide sequences of the orthologous single-copy genes to obtain target single-copy genes; and screening the target single-copy genes to obtain target gene sequences, designing short DNA sequences from the target gene sequences, deriving complementary short RNA sequences from the short DNA sequences, and synthesizing capture probes from the short RNA sequences. The capture probes provided by the invention are applied to genome sequencing of bryophytes and offer high efficiency, low cost, and a pronounced enrichment effect.

Description

Design method of capture probe, capture probe and application thereof
Technical Field
The invention relates to the technical field of biology, in particular to a design method of a capture probe, the capture probe and application thereof.
Background
Bryophytes comprise three major lineages: liverworts (7300 species), mosses (13000 species), and hornworts (250 species). Phylogenetic relationships within these three lineages have not yet been studied systematically. At present, researchers studying bryophyte phylogeny mainly rely on combined analyses of a few molecular markers, which suffer from insufficient molecular data, incomplete sampling, and poorly supported phylogenetic trees, particularly for some rapidly radiating groups (such as leafmoss) and for the placement of some problematic groups (such as hairy leafmoss).
The development of high-throughput sequencing has rapidly shifted molecular phylogenetics from the use of only a few DNA fragments to phylogenomics, which applies large-scale genomic data. The plant nuclear genome contains a very large number of genes with high evolutionary rates and is biparentally inherited, so it can reflect complex evolutionary relationships among species; nuclear-gene-based phylogenomic methods can effectively resolve species relationships at different taxonomic ranks. In recent years, researchers have obtained gene sets of species from transcriptome and/or genome sequencing data and identified orthologous single-copy genes by clustering for phylogenomic analysis. However, transcriptome sequencing is limited by the availability of fresh material, whereas genome sequencing is inefficient and costly because of its enormous data volume, which hinders large-scale phylogenomic applications.
Disclosure of Invention
The invention provides a design method of a capture probe, the capture probe and application thereof, which can improve the efficiency of genome sequencing and reduce the cost.
In a first aspect, the present invention provides a method for designing a capture probe, comprising:
collecting samples of bryophyte taxa, pretreating the samples, and performing transcriptome sequencing to obtain raw sequencing data;
filtering the raw sequencing data and performing transcriptome assembly to obtain the protein sequences and nucleotide sequences of the samples;
performing cluster analysis on the protein sequences to obtain orthologous single-copy genes, and aligning and extracting the nucleotide sequences of the orthologous single-copy genes to obtain target single-copy genes;
screening the target single-copy genes to obtain target gene sequences, designing short DNA sequences from the target gene sequences, deriving complementary short RNA sequences from the short DNA sequences, and synthesizing capture probes from the short RNA sequences.
In a second aspect, the invention provides a capture probe, wherein the capture probe is designed by the design method provided in the first aspect, and the nucleotide sequence of the capture probe comprises one or more of the probe sequence groups of SEQ ID No. 1-SEQ ID No. 100.
In a third aspect, the invention provides the use of a capture probe of the second aspect in gene sequencing.
The design method provided by the invention uses bryophyte transcriptome data generated in-house to obtain bryophyte single-copy gene data, and designs and synthesizes RNA hybridization capture probes from these single-copy genes. The capture probes provided by the invention can be applied to phylogenomic research on bryophytes, giving a high proportion of usable data, high genome sequencing efficiency, low cost, and a pronounced enrichment effect.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic flow chart illustrating a method for designing a capture probe according to an embodiment of the present invention;
FIG. 2 is a graph showing the results of a sequencing process using capture probes in gene sequencing according to an embodiment of the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantageous effects solved by the present invention more apparent, the present invention is further described in detail below with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The experimental procedures used in the following examples are all conventional procedures unless otherwise specified. The materials, reagents and the like used in the following examples are commercially available unless otherwise specified, and techniques not described in detail are performed according to standard methods well known to those skilled in the art. The cell lines, reagents and vectors mentioned in this application are commercially available or otherwise publicly available; they are given by way of example only, are not exclusive to the present invention, and may be replaced by other suitable tools or biological materials.
The present invention will be further illustrated by the following examples.
In one embodiment, as shown in FIG. 1, a method for designing a capture probe is provided, which includes the following steps S10-S40.
S10, collecting samples of bryophyte taxa, pretreating the samples, and performing transcriptome sequencing to obtain raw sequencing data.
Understandably, to ensure the richness and coverage of the samples, representative taxa of the bryophyte group should be selected as far as possible, and samples should be collected from multiple, dispersed regions according to the distribution of each taxon so as to reduce regional bias. Liverworts comprise 15 orders in total and are widely distributed; in this embodiment, liverwort samples were collected in the wild from several regions, yielding 40 fresh liverwort samples representing 13 orders. The samples are kept in plastic storage boxes to protect their integrity, and are pretreated under laboratory conditions to remove impurities and facilitate sample preparation. The transcriptome is the sum of all RNAs transcribed by a particular tissue or cell at a given developmental stage or functional state, and mainly comprises protein-coding mRNAs and non-coding RNAs. Transcriptome sequencing can comprehensively and rapidly obtain sequence information for nearly all transcripts of a specific tissue or organ of a species in a given state; it allows transcript structure and expression levels to be analyzed, unknown and rare transcripts to be discovered, and alternative splicing sites and coding-sequence single-nucleotide polymorphisms to be identified accurately, providing comprehensive transcriptome information.
S20, filtering the raw sequencing data and performing transcriptome assembly to obtain the protein sequences and nucleotide sequences of the samples.
Understandably, raw sequencing data contain adapter sequences and low-quality bases, which need to be filtered out. In sequence assembly, sequencing generates sequence fragments (reads) from the genome of the species under study; reads are joined according to their overlapping regions into longer contiguous sequences (contigs), contigs are joined into longer scaffolds, which may contain gaps, and the scaffolds are then anchored to chromosomes after their errors and gaps have been eliminated, yielding a high-quality whole-genome sequence. For transcriptome analysis with a reference genome, transcriptome assembly mainly consists of assembling the reads aligned to the reference genome into transcripts; commonly used software includes StringTie and Cufflinks. For transcriptome analysis without a reference genome, transcriptome assembly mainly consists of de novo assembly of reads into transcripts; commonly used software includes Trinity, Oases, and SOAPdenovo-Trans.
Understandably, in this embodiment, Trimmomatic is run on a supercomputing platform to filter adapter sequences, repetitive sequences, and low-quality bases from the raw sequencing data, yielding clean sequencing data. Trinity is then used to assemble the transcriptome from the clean data, a custom Perl script selects the longest transcript, and TransDecoder is run downstream with default parameters to predict coding regions, yielding the protein sequences and nucleotide sequences of the sample gene sets. Selecting the longest transcript with the custom Perl script filters the data effectively and removes redundancy.
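The longest-transcript selection described above can be sketched as follows. This is a minimal Python illustration, not the Perl script used in the embodiment, which is not reproduced in this document. It assumes Trinity-style transcript identifiers of the form TRINITY_DN10_c0_g1_i2, where the trailing _i<N> marks the isoform; the function names and the toy FASTA records are hypothetical.

```python
import re


def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def longest_isoform_per_gene(records):
    """Keep the longest transcript for each gene.

    Assumption: IDs look like TRINITY_DN10_c0_g1_i2, so stripping the
    trailing _i<N> gives the gene ID shared by all isoforms of that gene.
    """
    best = {}
    for header, seq in records:
        transcript_id = header.split()[0]
        gene_id = re.sub(r"_i\d+$", "", transcript_id)
        if gene_id not in best or len(seq) > len(best[gene_id][1]):
            best[gene_id] = (transcript_id, seq)
    return list(best.values())


# Toy input: two isoforms of one gene and a single isoform of another.
example = """>TRINITY_DN10_c0_g1_i1 len=6
ATGAAA
>TRINITY_DN10_c0_g1_i2 len=9
ATGAAATTT
>TRINITY_DN11_c0_g1_i1 len=4
ATGC
""".splitlines()

longest = longest_isoform_per_gene(read_fasta(example))
```

Keeping one transcript per gene before coding-region prediction is what removes isoform redundancy from the downstream protein set.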
S30, performing cluster analysis on the protein sequences to obtain orthologous single-copy genes, and aligning and extracting the nucleotide sequences of the orthologous single-copy genes to obtain target single-copy genes.
Understandably, proteins carry out and regulate molecular functions and are the main carriers of life activities. Remote homology detection of proteins is one of the main research tasks of structural and functional genomics: proteins with similar sequences tend to have similar structures and functions, so functionally similar proteins can be clustered into one class. Clustering is the process of dividing a collection of physical or abstract objects into classes of similar objects. Cluster analysis originated in taxonomy, but clustering is not the same as classification: in clustering, the classes to be formed are not known in advance. Clustering of gene expression data groups genes with similar expression profiles into one class; here, cluster analysis of the protein sequences yields clusters of orthologous protein families, from which orthologous single-copy genes are screened. A single-copy gene is a gene present in only one copy in the genome; most are constitutively expressed housekeeping genes.
Understandably, in this example, the protein sequences of all 40 bryophyte samples and of two stonewort outgroups were pooled and clustered with OrthoFinder, with the inflation parameter of the Markov clustering algorithm left at its default, I = 1.5; the gene-family clustering results were then screened with KinFin for 1-to-1 orthologous single-copy genes with a taxon occupancy > 0.65. The nucleotide sequences of the orthologous single-copy genes were aligned with MAFFT, the alignment matrices were trimmed with trimAl, and phylogenetic trees were built with IQ-TREE 2; long branches were then pruned according to the trees, and single-copy gene families yielding well-resolved trees were selected, giving the final target single-copy genes for downstream probe design.
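The occupancy-based screening of single-copy orthologs can be illustrated with a short Python sketch. In the embodiment this step is performed by KinFin on OrthoFinder output; the code below is only a simplified stand-in. The input layout (a dictionary from orthogroup ID to per-taxon gene lists), the function name, and the choice to compute occupancy over all taxa in the analysis are assumptions made for illustration.

```python
def single_copy_orthogroups(orthogroups, all_taxa, min_occupancy=0.65):
    """Return IDs of orthogroups that are single-copy in every taxon present
    and whose taxon occupancy is greater than min_occupancy.

    orthogroups: dict of orthogroup ID -> dict of taxon -> list of gene IDs
    all_taxa:    list of every taxon in the analysis (hypothetical layout)
    """
    selected = []
    for og_id, members in orthogroups.items():
        present = [t for t in all_taxa if members.get(t)]
        if not present:
            continue
        # 1-to-1 orthologs: exactly one gene copy in each taxon that has the gene.
        is_single_copy = all(len(members[t]) == 1 for t in present)
        occupancy = len(present) / len(all_taxa)
        if is_single_copy and occupancy > min_occupancy:
            selected.append(og_id)
    return selected


# Toy data with four taxa; gene names are placeholders.
taxa = ["sp1", "sp2", "sp3", "sp4"]
groups = {
    "OG1": {"sp1": ["a1"], "sp2": ["b1"], "sp3": ["c1"], "sp4": ["d1"]},        # occupancy 1.00
    "OG2": {"sp1": ["a2"], "sp2": ["b2"], "sp3": ["c2"]},                      # occupancy 0.75
    "OG3": {"sp1": ["a3", "a4"], "sp2": ["b3"], "sp3": ["c3"], "sp4": ["d3"]},  # duplicated in sp1
    "OG4": {"sp1": ["a5"], "sp2": ["b5"]},                                      # occupancy 0.50
}
kept = single_copy_orthogroups(groups, taxa)
```

In this toy case OG1 and OG2 pass; OG3 fails the single-copy test and OG4 fails the occupancy threshold.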
S40, screening the target single-copy genes to obtain target gene sequences, designing short DNA sequences from the target gene sequences, deriving complementary short RNA sequences from the short DNA sequences, and synthesizing capture probes from the short RNA sequences.
Understandably, the principle of hybridization capture is to design probes (in DNA or RNA form) that are partially or fully complementary to the target fragments. The sample is mixed with the probes, the probes capture the target fragments in the sample, and fragments for which no probe was designed are washed away and discarded; the probes and the captured fragments are then separated by denaturation (generally by adjusting the pH to alkaline), and the captured fragments can be used for sequencing library construction. From the screened target single-copy gene sequences, overlapping short DNA sequences can be designed, short RNA sequences complementary to them are obtained by base complementarity, and RNA hybridization capture probes are synthesized from the short RNA sequences for gene capture sequencing of the target taxa.
Understandably, the target single-copy genes obtained in this example comprise 1,390 single-copy genes, from which 371 genes with relatively conserved base sequences (sequence similarity 70%-85%), medium length (800-3000 bp), and high taxon occupancy (> 70%) were screened with Geneious. Two to three representative species were selected from each of the three subclasses of the bryophytes, giving 1,030 target gene sequences with a cumulative length of 1,031,187 bp. From the target gene sequences, 19,856 short DNA sequences with a step length of 80 bp and a mutual overlap of 44 bp were designed with SeqKit so as to cover the target gene sequences completely, and SeqKit was then used to generate the corresponding complementary short RNA sequences, with a total length of 3,988,480 bp. These short RNA sequences were produced by primer synthesis to obtain the capture probes. The screening conditions and design parameters for the target single-copy genes are critical: optimized conditions and parameters yield genes with stronger phylogenetic signal and probes with higher capture efficiency. Designing the probes from the gene sequences of two to three phylogenetically representative species effectively improves the capture efficiency of the probes on gene fragments across all taxa.
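The three screening criteria named above (similarity, length, occupancy) amount to a simple filter, sketched here in Python. The embodiment performs this step in Geneious; the record structure, the field names, the inclusive handling of range bounds, and the toy values are all assumptions made for illustration. The thresholds are passed as parameters, with defaults set to the values quoted for this example.

```python
from dataclasses import dataclass


@dataclass
class GeneStats:
    gene_id: str
    similarity: float  # sequence similarity across taxa, in percent
    length_bp: int     # gene length in bp
    occupancy: float   # fraction of taxa in which the gene is present


def screen_genes(genes, sim_range=(70.0, 85.0), len_range=(800, 3000),
                 min_occupancy=0.70):
    """Return IDs of genes that pass all three screening criteria.

    Range bounds are treated as inclusive (an assumption); occupancy must be
    strictly greater than min_occupancy.
    """
    kept = []
    for g in genes:
        if (sim_range[0] <= g.similarity <= sim_range[1]
                and len_range[0] <= g.length_bp <= len_range[1]
                and g.occupancy > min_occupancy):
            kept.append(g.gene_id)
    return kept


# Hypothetical candidates, one per outcome.
candidates = [
    GeneStats("geneA", 78.2, 1500, 0.90),  # passes all three filters
    GeneStats("geneB", 92.0, 1200, 0.95),  # too conserved
    GeneStats("geneC", 75.0, 500, 0.85),   # too short
    GeneStats("geneD", 80.0, 2500, 0.60),  # occupancy too low
]
selected = screen_genes(candidates)
```

Keeping the thresholds as arguments makes it easy to rerun the screen with a different similarity window or occupancy cut-off.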
Optionally, the pretreatment comprises separation, washing, microscopic examination, quick freezing and grinding.
Understandably, in this embodiment the collected samples are brought back to the laboratory in plastic fresh-keeping containers, and samples of the target species are isolated under a dissecting microscope in the laboratory. The isolated samples are washed several times with sterile water and examined again under the dissecting microscope to remove possible contamination by other species and by algae. Samples that pass microscopic examination are blotted dry with laboratory absorbent paper, wrapped in tin foil, and placed in a liquid nitrogen tank for quick freezing for 10 minutes. The quick-frozen samples are then taken out, transferred to a sterile mortar precooled with liquid nitrogen, and quickly ground by hand into powder.
Optionally, the transcriptome sequencing comprises: extracting RNA from the sample, and performing library construction and transcriptome sequencing on the RNA.
Understandably, plants contain secondary metabolites such as polysaccharides and polyphenols, which can bind tightly to RNA after cell lysis and form insoluble complexes or gelatinous precipitates that are difficult to remove. In one example, sample RNA is extracted with the Vazyme Fast Pure Plant Total RNA Isolation Kit (RC401), which is suitable for rapid RNA extraction from a wide range of plant samples. RNA is obtained following the kit protocol, and Illumina library construction and transcriptome sequencing are then performed on the RNA, yielding 6 G of raw sequencing data per sample.
Optionally, performing cluster analysis on the protein sequences to obtain orthologous single-copy genes comprises:
performing cluster analysis on the protein sequences of the bryophyte group and the protein sequences of the outgroup to obtain clusters of homologous protein families;
and screening out, from the clusters, orthologous single-copy genes with a taxon occupancy of more than 65%.
Understandably, an outgroup must be selected to root the phylogenetic tree: adding outgroup sequences determines the root, which marks the starting point of sequence evolution, and once the tree is rooted the order of evolutionary divergence can be read from it. The general principle is to choose as outgroup the species most closely related to, but outside, the target group. Different outgroup choices can change the tree topology considerably, including when different types within the same species are compared, so the more closely related the outgroup is to the target group, the better. Single-copy genes are valuable molecular markers in molecular systematics and play an extremely important role in resolving the backbone of the tree of life and the branches between the backbone and the tips. In this example, stoneworts were selected as the outgroup of the bryophytes; the protein sequences of all 40 bryophyte samples and of two stonewort outgroups were pooled and clustered with OrthoFinder, and the gene clustering results were screened with KinFin for 1-to-1 orthologous single-copy genes with a taxon occupancy > 0.65. OrthoFinder maps gene duplication events in the gene trees onto branches of the species tree and provides comparative-genomics statistics. KinFin takes the protein clustering file output by OrthoFinder, together with functional annotation data and a user-defined species classification, to produce rich annotation of orthologous groups and to screen orthologous single-copy genes.
Optionally, when the target single-copy genes are screened to obtain the target gene sequences, the screened sequences have a similarity of 70-80%, a length of 800-3000 bp, and a taxon occupancy of more than 70%.
Optionally, the short DNA sequences have a step length of 70-90 bp and a mutual overlap of 34-54 bp.
Understandably, the design step length of the short probe sequences affects capture efficiency, so the step length needs to be chosen for each sample group; consecutive short sequences overlap head to tail, and this end-to-end overlap improves capture efficiency. The short RNA sequences of the capture probes are derived from the complementary short DNA sequences; when the overlapping short DNA sequences are designed, the step length is 70-90 bp and the overlap is 34-54 bp. In one embodiment of the application, SeqKit is used to design 19,856 short DNA sequences with the optimal step length of 80 bp and an overlap of 44 bp so that they completely cover the target gene sequences, and SeqKit is then used to generate the complementary short RNA sequences from the short DNA sequences, with a total length of 3,988,480 bp.
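The tiling and complementation steps can be sketched in plain Python. The embodiment uses SeqKit for both; the code below is a stand-in, not a reproduction of those commands. It rests on several assumptions: the "step length" of 80 bp is read as the length of each short sequence, the overlap of 44 bp as the number of bases shared by consecutive short sequences, the final tile is anchored at the end of the gene to guarantee full coverage, and the RNA is taken as the base-by-base complement without reversal. The function names and the toy target sequence are hypothetical.

```python
def tile_sequence(seq, tile_len=80, overlap=44):
    """Split seq into tiles of tile_len bases that overlap by `overlap` bases.

    A final tile anchored at the end of the sequence is added when needed so
    that every base is covered (an assumption about how coverage is ensured).
    """
    if not 0 <= overlap < tile_len:
        raise ValueError("overlap must be non-negative and smaller than tile_len")
    if len(seq) <= tile_len:
        return [seq]
    step = tile_len - overlap
    starts = list(range(0, len(seq) - tile_len + 1, step))
    last_start = len(seq) - tile_len
    if starts[-1] != last_start:
        starts.append(last_start)
    return [seq[s:s + tile_len] for s in starts]


# DNA base -> complementary RNA base, written in lower case.
_DNA_TO_COMPLEMENTARY_RNA = str.maketrans("ACGT", "ugca")


def dna_to_complementary_rna(dna):
    """Return the base-by-base RNA complement of a DNA string (no reversal)."""
    return dna.upper().translate(_DNA_TO_COMPLEMENTARY_RNA)


# Toy 200-bp target; a real target gene sequence would be 800-3000 bp.
target = "ATGTCAGATC" * 20
tiles = tile_sequence(target)
probes = [dna_to_complementary_rna(t) for t in tiles]
```

With these settings consecutive tiles start 36 bp apart, so a smaller overlap yields fewer probes per gene and a larger overlap yields more.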
Optionally, the invention provides a capture probe, wherein the capture probe is designed by the above design method, and the nucleotide sequence of the capture probe comprises one or more of probe sequence groups SEQ ID No.1 to SEQ ID No. 100.
Optionally, the invention provides an application of the capture probe in gene sequencing.
Understandably, genome-level sequencing is broadly divided into whole-genome sequencing, whole-exome sequencing, and targeted sequencing. Whole-genome sequencing sequences all bases of the genome, whole-exome sequencing sequences all exons of the genome, and targeted sequencing sequences a selected set of genes. Targeted sequencing follows two main technical routes, multiplex PCR and hybridization capture. In hybridization capture sequencing, genomic DNA is sheared into fragments and capture probes designed for the target regions are added; the probes are partially or fully complementary to the target fragments, so they hybridize to and capture those fragments, thereby enriching them.
In one embodiment, the capture probe sequence group SEQ ID NO. 1-SEQ ID NO. 100 was designed and synthesized by the capture probe design method provided by the invention and applied directly to hybridization capture of a pooled DNA library of 52 bryophyte samples, at a library concentration of 20 ng/μL and a final volume of 20 μL. Following the probe capture protocol of the Arbor Biosciences single-copy nuclear gene capture kit, a hybridization capture library was obtained at a concentration of 10 ng/μL and a volume of 20 μL. The complete sequencing workflow comprises DNA extraction, purity and concentration measurement, DNA fragmentation, sequencing library preparation, library quality control, hybridization capture, enrichment of captured sequences, PCR product purification, PCR product quality control, sequencing, and bioinformatic analysis. The hybridization capture library was sequenced on an Illumina HiSeq 2000 sequencer to obtain 100 G of data; after the library was split by sample, each species received 2 G of data on average, and the HybPiper data analysis pipeline was run to extract the target gene sequences.
As a comparative example, the original, unenriched DNA libraries of the 52 bryophyte samples were each sequenced to 10 G by second-generation sequencing of conventional libraries, giving 520 G of data in total; the HybPiper data analysis pipeline was likewise run to recover the target gene sequences, as a control for the hybridization capture performance of the capture probes on the target gene sequences.
The lengths of the gene sequences recovered for the 371 target single-copy genes from the two data sets, obtained by capture-probe sequencing and by second-generation sequencing of conventional libraries, were each plotted as a heat map; the results are shown in FIG. 2.
A heat map is essentially a matrix of numerical values, each of which is assigned a color according to a preset color scale (in FIG. 2, the "scale of length of captured target gene" serves as the color scale). Comparing the two sets of results shows that, relative to second-generation sequencing of conventional libraries, the synthesized bryophyte gene capture probes require less sequencing data, achieve higher capture efficiency, and have a pronounced enrichment effect on the target gene sequences.
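The matrix underlying such a heat map can be sketched as follows. This is only an illustration of the idea that each cell holds a recovered gene length mapped to a color bin; it does not reproduce FIG. 2. The sample names, gene names, lengths, linear color scale, and number of bins are all made up for the example.

```python
def length_matrix(recovered, genes, samples):
    """Build a samples x genes matrix of recovered lengths in bp (0 if not recovered)."""
    return [[recovered.get((s, g), 0) for g in genes] for s in samples]


def to_color_bins(matrix, max_value, n_bins=5):
    """Map each value to an integer color bin 0..n_bins-1 on a linear scale."""
    return [
        [min(n_bins - 1, int(v * n_bins / max_value)) for v in row]
        for row in matrix
    ]


# Hypothetical recovered lengths, keyed by (sample, gene).
genes = ["gene1", "gene2", "gene3"]
samples = ["sampleA", "sampleB"]
recovered = {
    ("sampleA", "gene1"): 1500,
    ("sampleA", "gene2"): 900,
    ("sampleB", "gene1"): 3000,
    ("sampleB", "gene3"): 300,
}

matrix = length_matrix(recovered, genes, samples)
bins = to_color_bins(matrix, max_value=3000)
```

Building one such matrix per sequencing method, with the same genes and color scale, is what makes the two panels directly comparable.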
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.
Sequence listing
<110> Shenzhen city fairy lake botanical garden (Shenzhen city garden research center)
<120> design method of capture probe, capture probe and application thereof
<160> 100
<170> SIPOSequenceListing 1.0
<210> 1
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 1
uacagucuag aucuguaggu cuagggcccg cggaagcuag guaaacgucu ccgcuuacga 60
cuucuauguc cucguccuag 80
<210> 2
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 2
acgucuccgc uuacgacuuc uauguccucg uccuaguuuu cuaauacagg uacaagccca 60
ggucgucgca uugccguccu 80
<210> 3
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 3
uacagguaca agcccagguc gucgcauugc cguccuucuc aaacugcugc cacguuccag 60
aguucuuccu caaguugaua 80
<210> 4
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 4
ugcugccacg uuccagaguu cuuccucaag uugauauugu ucuagaacuu ccuaaaguuc 60
uuccucaaga cgacguugcc 80
<210> 5
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 5
gaacuuccua aaguucuucc ucaagacgac guugccaugu caccaaguuc uaggacucga 60
cccaguccac uaaguugaag 80
<210> 6
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 6
aaguucuagg acucgaccca guccacuaag uugaaguccc ccuagucgcc uucuuacaaa 60
gggucaaaga acacguccga 80
<210> 7
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 7
cccuagucgc cuucuuacaa agggucaaag aacacguccg accccaacac uucuuccuag 60
aguaguucua ggugcccaaa 80
<210> 8
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 8
uacagucuag aucuauaggu cuagggaccg cggaaacugg gaaaacgucu ccgcuuacga 60
cuccuaagcc cacguccaag 80
<210> 9
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 9
acgucuccgc uuacgacucc uaagcccacg uccaagauuu cuaauacagg uacacgccca 60
ggucgucgcu uugccuucuu 80
<210> 10
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 10
uacagguaca cgcccagguc gucgcuuugc cuucuuucuc aaacuggugc cacgucccag 60
aguuuuuccu uaaauuaaug 80
<210> 11
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 11
uggugccacg ucccagaguu uuuccuuaaa uuaauguugu uuuaagaguu ccugaaguuc 60
uuccucaaaa caacguugcc 80
<210> 12
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 12
agaguuccug aaguucuucc ucaaaacaac guugccgugu caacaagucc uaggccucaa 60
cccaguccac uaagucgaag 80
<210> 13
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 13
aaguccuagg ccucaaccca guccacuaag ucgaaguccc ucuagucgcc uuuuuacaca 60
gcgucaagga acaaagccgg 80
<210> 14
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 14
cucuagucgc cuuuuuacac agcgucaagg aacaaagccg gccucaacac uucuuccuag 60
acuaguucua ggugccuaaa 80
<210> 15
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 15
uacaggugca cgcccagguc gucgcuuugc cuucuuucuc gaacuguugg caggucccag 60
aauucuuucu caaguugaug 80
<210> 16
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 16
uguuggcagg ucccagaauu cuuucucaag uugauguuau uuuaggacuu ccugaaguuc 60
uuccucaaga cgacguugcc 80
<210> 17
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 17
ggacuuccug aaguucuucc ucaagacgac guugccuugu caacaaguuc uaggacucaa 60
cccaguccac uaaguugagg 80
<210> 18
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 18
aaguucuagg acucaaccca guccacuaag uugagguccc gcuggucgcu uucuuacaca 60
gcgucaaaga ccaaguccga 80
<210> 19
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 19
cgcuggucgc uuucuuacac agcgucaaag accaaguccg accucaacac uuuuuccuag 60
acuaguucua ggugcccaaa 80
<210> 20
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 20
uuuggccaca uaaaagugcc ccgcgggagc aaggcaugua uucaauaaag cguaaagagc 60
auaucaaaaa gcacuuaauc 80
<210> 21
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 21
uacuguaguc cuccuuaccc uugagguugu cgaugucaua guggagaacg cuucauauuu 60
gaccacaagg acccacuggu 80
<210> 22
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 22
agaacgcuuc auauuugacc acaaggaccc acuggucaga caccccuucu gaucguagua 60
augggcgaag uacauacugu 80
<210> 23
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 23
ccuucugauc guaguaaugg gcgaaguaca uacuguucaa gcuguuaugc auaguccgau 60
gcuaaccaua acugaaagau 80
<210> 24
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 24
uuaugcauag uccgaugcua accauaacug aaagauaguu uuuguuacau agaucuccua 60
gcuugacaag cuaauguuga 80
<210> 25
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 25
uuacauagau cuccuagcuu gacaagcuaa uguugaaacc cuaugacgac ccguccucgc 60
gaagucuuca gaguaagguu 80
<210> 26
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 26
gacgacccgu ccucgcgaag ucuucagagu aagguucaau auagucucua agaagucacc 60
gacaacaaca aaugcuacaa 80
<210> 27
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 27
ucucuaagaa gucaccgaca acaacaaaug cuacaacguu uagcugucag uaaagacuua 60
ugacguucua cccaucuccu 80
<210> 28
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 28
ugucaguaaa gacuuaugac guucuaccca ucuccuccac gcgugccuug caccaucacu 60
acaauaguaa uacgaacacc 80
<210> 29
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 29
gccuugcacc aucacuacaa uaguaauacg aacacccuuu auuuugucua aaccaacugu 60
ucucuguuca aagauaacuc 80
<210> 30
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 30
ugucuaaacc aacuguucuc uguucaaaga uaacuccuuc cacuacgguu ucguucccug 60
aaaccccagu acaaauaacu 80
<210> 31
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 31
acgguuucgu ucccugaaac cccaguacaa auaacuuugu ucacgauuuc gacccaaguu 60
auaauuccgu gagaaagccu 80
<210> 32
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 32
gauuucgacc caaguuauaa uuccgugaga aagccuucua ucgucgucgg gacggaccau 60
accuccgaaa uagcagucgu 80
<210> 33
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 33
cgucgggacg gaccauaccu ccgaaauagc agucguuuuu gacuccuaga ucaacuacaa 60
uuggauuuug guuguggauu 80
<210> 34
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 34
gacuccuaga ucaacuacaa uuggauuuug guuguggauu acgacgggac ugucucuuau 60
uuugaccccg aacaaggacg 80
<210> 35
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 35
ggaacgcuuc auauucgacc acaaggaccc acuaguuaga cauccuuucu ggucguagua 60
gugggcgaag uacaugcugu 80
<210> 36
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 36
cuuucugguc guaguagugg gcgaaguaca ugcuguucaa acuguugugg auaguccgau 60
gguaaccaua gcuaaagaac 80
<210> 37
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 37
uuguggauag uccgauggua accauagcua aagaacaggu uuugcuacau gaaccuucua 60
gcuugacagg cggaggucga 80
<210> 38
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 38
cuacaugaac cuucuagcuu gacaggcgga ggucgaaacc cuauggcgac cuguccucgc 60
uaagucuuca gaguaagggu 80
<210> 39
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 39
ggcgaccugu ccucgcuaag ucuucagagu aagggucgau guagucccua agcagacacc 60
gucaucacca aauacuacaa 80
<210> 40
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 40
ucccuaagca gacaccguca ucaccaaaua cuacaacguu uagcugucag aaagaacuua 60
ugucguucua cccaucuccu 80
<210> 41
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 41
ugucagaaag aacuuauguc guucuaccca ucuccuccac gcgugacucg cuccaucacu 60
acaauaauaa uacgaccaac 80
<210> 42
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 42
gacucgcucc aucacuacaa uaauaauacg accaaccuuu guuuugccug aaccaacuau 60
ucuccguuca aagguagcuc 80
<210> 43
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 43
ugccugaacc aacuauucuc cguucaaagg uagcuccuuc cacuggacuu ucgcucucua 60
aagccgcaau acaaguagcu 80
<210> 44
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 44
ggacuuucgc ucucuaaagc cgcaauacaa guagcucugu ucacgcuuuc gcccuaaguu 60
auaauuccga gagaaggccu 80
<210> 45
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 45
gcuuucgccc uaaguuauaa uuccgagaga aggccuuuua ucgucgacga aacggacccu 60
accuccgaaa caguagccgc 80
<210> 46
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 46
cgacgaaacg gacccuaccu ccgaaacagu agccgcuuug uccuccugaa ucagcugcag 60
uuagauuuug guugagguuu 80
<210> 47
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 47
uccuccugaa ucagcugcag uuagauuuug guugagguuu acgauugggc cuugucuugu 60
uucggccucc gacgcggacg 80
<210> 48
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 48
uaccguagcc cuccuuaccc uugagguugu cgaugucaca gaggcgagcg guucauauuc 60
gaccacaaag acccucuagu 80
<210> 49
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 49
cgagcgguuc auauucgacc acaaagaccc ucuagucagc caacccuucu gaucguagua 60
augggcgaag uacauacuau 80
<210> 50
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 50
ccuucugauc guaguaaugg gcgaaguaca uacuauucaa acuauugugu auaguccguu 60
gguaaccgua acuaaaagau 80
<210> 51
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 51
uuguguauag uccguuggua accguaacua aaagauagcu uuuguuacau ggaucuccug 60
uccugacaag cagacgucga 80
<210> 52
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 52
uuacauggau cuccuguccu gacaagcaga cgucgacacc cuaugacgac cuguccucgc 60
caagucuuca gaauaagguu 80
<210> 53
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 53
gacgaccugu ccucgccaag ucuucagaau aagguucgau guaagcucua agaagacacc 60
gacaacaaca gauacuacaa 80
<210> 54
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 54
gcucuaagaa gacaccgaca acaacagaua cuacaacguu uagcugucag caaagaguua 60
ugacgcucua cccaccuccu 80
<210> 55
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 55
ugucagcaaa gaguuaugac gcucuaccca ccuccuccaa gcgugacuug cuccuucacu 60
acaauaguaa uacgaccauc 80
<210> 56
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 56
gacuugcucc uucacuacaa uaguaauacg accauccuuu auucugccua gaccaacugu 60
ucucuguuca aagauaacuc 80
<210> 57
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 57
ugccuagacc aacuguucuc uguucaaaga uaacuccuuc cacuacgguu ccguucccug 60
aaaccccagu acaaguaacu 80
<210> 58
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 58
acgguuccgu ucccugaaac cccaguacaa guaacuuugu ucacgguuuc gucccaaguu 60
auaauuccgu gaaaaggcuu 80
<210> 59
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 59
gguuucgucc caaguuauaa uuccgugaaa aggcuuuuua ucgucgucga gauggaccau 60
accuccgaaa cagaagucgu 80
<210> 60
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 60
cgucgagaug gaccauaccu ccgaaacaga agucguuucg uccuccuaga ucaucuacau 60
uuagauuuug guugugguuu 80
<210> 61
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 61
uccuccuaga ucaucuacau uuagauuuug guugugguuu acgacgaguc aaucucuuau 60
ucagaccccc aacacggacg 80
<210> 62
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 62
uacccacgcu ugagcaacua aaugucgaaa cagcgugcuc caugacagca cgaccgucuc 60
auaugacgaa agaggccguu 80
<210> 63
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 63
ugacagcacg accgucucau augacgaaag aggccguuaa agucguguua acggcagguc 60
acaaauguuu ucaacgggcg 80
<210> 64
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 64
ucguguuaac ggcaggucac aaauguuuuc aacgggcggu uguuguuguu caagugaaug 60
ugaacacuag cuguaugaaa 80
<210> 65
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 65
uuguuguuca agugaaugug aacacuagcu guaugaaagu ugauggaaca acuucuaccu 60
aagugaauag accagcaacg 80
<210> 66
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 66
auggaacaac uucuaccuaa gugaauagac cagcaacgcc uacuucugaa gccuuccguc 60
uauggcaaac guaaagaccu 80
<210> 67
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 67
cuucugaagc cuuccgucua uggcaaacgu aaagaccugg cgcaguuucu ccugaagucc 60
ucugcaauac cuccacccuc 80
<210> 68
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 68
caguuucucc ugaaguccuc ugcaauaccu ccacccuccc gucuaugccg cuaacgagua 60
ucgaaccugu uccuuaagcc 80
<210> 69
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 69
cuaugccgcu aacgaguauc gaaccuguuc cuuaagccca guuuuaacuu ucuuguguac 60
uggaagacac aacuuguagg 80
<210> 70
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 70
uuuaacuuuc uuguguacug gaagacacaa cuuguagggc uccucuacuu guuugauagc 60
uuuuaauucg uuguccaaag 80
<210> 71
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 71
cucuacuugu uugauagcuu uuaauucguu guccaaaguc uucacuucuc gcaguacuac 60
cuguuguaac ucuuccauga 80
<210> 72
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 72
cacuucucgc aguacuaccu guuguaacuc uuccaugauc uagcaccacu cuucuagcuu 60
caaaaccacc uguuuugucu 80
<210> 73
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 73
gcaccacucu ucuagcuuca aaaccaccug uuuugucugu uagaagccug cguccggcug 60
uugaaagucg cagucccauc 80
<210> 74
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 74
gaagccugcg uccggcuguu gaaagucgca gucccaucug ucgacgcugc guuuuacacc 60
aaccgcuuaa aguuucacuu 80
<210> 75
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 75
gacgcugcgu uuuacaccaa ccgcuuaaag uuucacuucg acuaucacga ucgcuaauau 60
uaacaacacu aggacuagua 80
<210> 76
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 76
acuucgacua ucacgaucgc uaauauuaac aacacuagga cuaguauacc aauagguaaa 60
cggugccuaa auucacguuc 80
<210> 77
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 77
uggcagcacg agcgccucau guggcgaaaa agacccuuga aaucguguua gcgucaaguc 60
acagaugucu ucaauggacg 80
<210> 78
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 78
ucguguuagc gucaagucac agaugucuuc aauggacguu uauuguuguu uaaauggaug 60
uguacgcugg caguguggaa 80
<210> 79
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 79
uuguuguuua aauggaugug uacgcuggca guguggaagu ugauagaaca ccuucuaccu 60
aagugcauaa accagcaacg 80
<210> 80
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 80
auagaacacc uucuaccuaa gugcauaaac cagcaacgcc uacuccuuaa gccaucuguc 60
uaaggaaaac guaaaaaccu 80
<210> 81
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 81
cuccuuaagc caucugucua aggaaaacgu aaaaaccugg cgcacuuccu ucugaagucc 60
gcugcaauac cuccuccguc 80
<210> 82
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 82
cacuuccuuc ugaaguccgc ugcaauaccu ccuccgucuc gucugugacg guaucgcgua 60
ucggaccugu uucuuaagcc 80
<210> 83
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 83
cugugacggu aucgcguauc ggaccuguuu cuuaagccca gguuugaauu ccucguguac 60
gucaagacgc agcucguagg 80
<210> 84
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 84
uuugaauucc ucguguacgu caagacgcag cucguagguc uccucuacuu auucgauagc 60
uuuuaauuuc guguucaaag 80
<210> 85
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 85
cucuacuuau ucgauagcuu uuaauuucgu guucaaagcc uccacuuccc guaguacaac 60
cuguuguaac ucuuccaaga 80
<210> 86
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 86
cacuucccgu aguacaaccu guuguaacuc uuccaagaac uagcgccacu cuucuaacuu 60
cacgaccaac uauucugccu 80
<210> 87
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 87
gcgccacucu ucuaacuuca cgaccaacua uucugccuau uggaagcauu gguccggcug 60
uuaaaggucg cugucccguc 80
<210> 88
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 88
gaagcauugg uccggcuguu aaaggucgcu gucccguccg ucgacgcauc uuucuacacc 60
aacgucuuga aauuccacuu 80
<210> 89
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 89
gacgcaucuu ucuacaccaa cgucuugaaa uuccacuucg acuaacauga acgcuauuag 60
uaacagcacu aaaauuagua 80
<210> 90
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 90
acuucgacua acaugaacgc uauuaguaac agcacuaaaa uuaguauacc aauagguaua 60
cgguaccaaa guucacguuc 80
<210> 91
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 91
ugacagcacg agcgccucau gugacgcaag agaccuuuga aaucauguua acgccagguc 60
acagaugucu ucaauggacg 80
<210> 92
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 92
ucauguuaac gccaggucac agaugucuuc aauggacguu uauuguuguu uaaauggaug 60
uguacgcugg caguguggaa 80
<210> 93
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 93
uuguuguuua aauggaugug uacgcuggca guguggaagu ugauagaaca ccuucuaccg 60
aaguguauaa accaacagcg 80
<210> 94
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 94
auagaacacc uucuaccgaa guguauaaac caacagcggc uacuucugaa accggcuguc 60
uagggaaagc guaaaaaccu 80
<210> 95
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 95
cuucugaaac cggcugucua gggaaagcgu aaaaaccugg cacacuuccu ucugaaaucc 60
gcugcaauac cuccuccguc 80
<210> 96
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 96
cacuuccuuc ugaaauccgc ugcaauaccu ccuccgucuc gucuaugacg guaacgcgua 60
ucgaaccugu uccuuaagcc 80
<210> 97
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 97
cuaugacggu aacgcguauc gaaccuguuc cuuaagccca gguuugaauu ccucguguac 60
guuaagacgc agcugguagg 80
<210> 98
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 98
uuugaauucc ucguguacgu uaagacgcag cugguagguc uccucuacuu auuugacagu 60
uuuuaauuuc gaguccaaag 80
<210> 99
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 99
cucuacuuau uugacaguuu uuaauuucga guccaaaguc uucacuuccc cuaguacuac 60
cuguuguagc uuuuccaaga 80
<210> 100
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 100
cacuuccccu aguacuaccu guuguagcuu uuccaagaac uagcgccacu cuucuagcuu 60
cacgaccacc uguucugacu 80
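
Consecutive probes in the listing appear to tile their target with a shared stretch of sequence. The following Python check is purely illustrative (the helper name `suffix_prefix_overlap` is hypothetical and not part of the patent); it measures the longest suffix of SEQ ID No. 1 that is also a prefix of SEQ ID No. 2.

```python
def suffix_prefix_overlap(first: str, second: str) -> int:
    """Length of the longest suffix of `first` that is also a prefix of `second`."""
    for k in range(min(len(first), len(second)), 0, -1):
        if first.endswith(second[:k]):
            return k
    return 0


# SEQ ID No. 1 and SEQ ID No. 2, copied from the listing; the spaces that
# separate blocks of ten residues are removed before comparison.
seq_1 = ("uacagucuag aucuguaggu cuagggcccg cggaagcuag guaaacgucu ccgcuuacga "
         "cuucuauguc cucguccuag").replace(" ", "")
seq_2 = ("acgucuccgc uuacgacuuc uauguccucg uccuaguuuu cuaauacagg uacaagccca "
         "ggucgucgca uugccguccu").replace(" ", "")

overlap = suffix_prefix_overlap(seq_1, seq_2)
step = len(seq_1) - overlap
print(overlap, step)  # 36 44
```

For this pair the shared stretch is 36 nt, which lies inside the 34-54 bp overlap range recited in claim 6.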

Claims (8)

1. A method for designing a capture probe, comprising:
collecting a sample of a bryophyte group, preprocessing the sample, and performing transcriptome sequencing to obtain raw sequencing data;
filtering the raw sequencing data and performing transcriptome assembly to obtain a protein sequence and a nucleotide sequence of the sample;
performing cluster analysis on the protein sequence to obtain an orthologous single-copy gene, and aligning and extracting the nucleotide sequence of the orthologous single-copy gene to obtain a target single-copy gene;
screening the target single-copy gene to obtain a target gene sequence, designing short DNA sequences according to the target gene sequence, obtaining complementary short RNA sequences according to the short DNA sequences, and synthesizing a capture probe according to the short RNA sequences.
2. The method of claim 1, wherein the preprocessing comprises separation, washing, microscopic examination, rapid freezing, and grinding.
3. The method of claim 1, wherein the transcriptome sequencing comprises: extracting RNA from the sample, and performing library construction and transcriptome sequencing on the RNA.
4. The method for designing a capture probe according to claim 1, wherein performing cluster analysis on the protein sequence to obtain an orthologous single-copy gene comprises:
performing cluster analysis on the protein sequence of the bryophyte group together with the protein sequence of an outgroup to obtain clusters of homologous protein families;
and screening the clusters for orthologous single-copy genes having a cluster occupancy rate of more than 65%.
5. The method for designing a capture probe according to claim 1, wherein, in screening the target single-copy gene to obtain the target gene sequence, the screening criteria are a sequence similarity of 70% to 80%, a length of 800 to 3000 bp, and a group occupancy of > 70%.
6. The method for designing a capture probe according to claim 1, wherein the short DNA sequences have a step length of 70 to 90 bp and an overlap length of 34 to 54 bp.
7. A capture probe designed by the method of any one of claims 1 to 6, wherein the nucleotide sequence of the capture probe comprises one or more of the probe sequences in the group SEQ ID No. 1 to SEQ ID No. 100.
8. Use of the capture probe of claim 7 for gene sequencing.
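
The screening and probe-design steps recited in claims 1, 5 and 6 can be sketched as follows. This is a hedged illustration rather than the patented implementation: the function names are hypothetical, the screening thresholds are those of claim 5, the 80-nt window matches the `<211> 80` entries of the sequence listing, the 36-nt overlap is merely one value within the 34-54 bp range of claim 6, and a base-by-base complement without reversal is an assumption of this sketch.

```python
from typing import List


def passes_screen(similarity_pct: float, length_bp: int, occupancy_pct: float) -> bool:
    """Apply the claim 5 criteria: 70-80% similarity, 800-3000 bp, occupancy > 70%."""
    return (70.0 <= similarity_pct <= 80.0
            and 800 <= length_bp <= 3000
            and occupancy_pct > 70.0)


def tile(target: str, length: int = 80, overlap: int = 36) -> List[str]:
    """Cut `target` into windows of `length` nt whose neighbours share `overlap` nt.

    A trailing stretch too short for a full window is dropped in this sketch.
    """
    if not 0 <= overlap < length:
        raise ValueError("overlap must be non-negative and smaller than length")
    step = length - overlap
    return [target[i:i + length]
            for i in range(0, len(target) - length + 1, step)]


# Translation table for a DNA -> complementary RNA conversion (assumed convention).
_DNA_TO_RNA = str.maketrans("ACGT", "ugca")


def complementary_rna(dna: str) -> str:
    """Base-by-base RNA complement (A->u, C->g, G->c, T->a), without reversal."""
    return dna.upper().translate(_DNA_TO_RNA)


# Made-up 168-nt target used only to exercise the functions.
target = "ATGC" * 42
tiles = tile(target)
probes = [complementary_rna(t) for t in tiles]
```

With these illustrative settings the windows start every 44 nt, so the 168-nt example yields three 80-nt probes, each sharing 36 nt with the next.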
CN202210276899.4A 2022-03-21 2022-03-21 Design method of capture probe, capture probe and application thereof Active CN114369650B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210276899.4A CN114369650B (en) 2022-03-21 2022-03-21 Design method of capture probe, capture probe and application thereof

Publications (2)

Publication Number Publication Date
CN114369650A true CN114369650A (en) 2022-04-19
CN114369650B CN114369650B (en) 2022-06-17

Family

ID=81145189

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210276899.4A Active CN114369650B (en) 2022-03-21 2022-03-21 Design method of capture probe, capture probe and application thereof

Country Status (1)

Country Link
CN (1) CN114369650B (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3677692A1 (en) * 2011-04-13 2020-07-08 Spatial Transcriptomics AB Method and product for localised or spatial detection of nucleic acid in a tissue sample
AU2015203545A1 (en) * 2011-05-04 2015-07-23 Htg Molecular Diagnostics, Inc. Quantitative nuclease protection assay (qnpa) and sequencing (qnps) improvements
CN106086013A (en) * 2016-06-30 2016-11-09 厦门艾德生物医药科技股份有限公司 A kind of probe for nucleic acid enriching capture and method for designing
WO2018001258A1 (en) * 2016-06-30 2018-01-04 厦门艾德生物医药科技股份有限公司 Probe for nucleic acid enrichment and capture, and design method thereof
WO2019197541A1 (en) * 2018-04-11 2019-10-17 Université de Bourgogne Detection method of somatic genetic anomalies, combination of capture probes and kit of detection
CN112888794A (en) * 2018-05-31 2021-06-01 潘森纳丽斯股份有限公司 Compositions, methods and systems for processing or analyzing a multi-species nucleic acid sample
CN109337956A (en) * 2018-09-07 2019-02-15 上海思路迪生物医学科技有限公司 Design method, capture probe, capture probe group and the kit of capture probe
CN110699426A (en) * 2019-01-02 2020-01-17 上海臻迪基因科技有限公司 Gene target region enrichment method and kit
CN113278611A (en) * 2021-03-07 2021-08-20 华中科技大学同济医学院附属协和医院 Capture sequencing probes and uses thereof
CN113755555A (en) * 2021-09-03 2021-12-07 浙江工商大学 Capture probe set for detecting food allergen, preparation method and application thereof

Non-Patent Citations (7)

* Cited by examiner, † Cited by third party
Title
NIHARIKA SHARMA ET AL: "Transcriptome-wide profiling and expression analysis of transcription factor families in a liverwort, Marchantia polymorpha", 《BMC GENOMICS》 *
SARAH BANK ET AL: "Transcriptome and target DNA enrichment sequence data provide new insights into the phylogeny of vespid wasps (Hymenoptera: Aculeata: Vespidae)", 《MOLECULAR PHYLOGENETICS AND EVOLUTION》 *
STEFAN A. RENSING ET AL: "Moss transcriptome and beyond", 《TRENDS IN PLANT SCIENCE》 *
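何其邹洪 et al.: "Single-cell sequencing technology and its research progress in plants", 《中国细胞生物学学报》 *
毛建丰 et al.: "Technical strategies for testing whether hybridization has occurred by combining phylogenetic and population genetic analyses", 《生物多样性》 *
舒江平 et al.: "Phylogenomic analysis reveals complex reticulate evolutionary relationships among early land plants", 《生物多样性》 *
蒋费涛 et al.: "Transcriptomics technology and its research progress in plant systematics", 《现代盐化工》 *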

Also Published As

Publication number Publication date
CN114369650B (en) 2022-06-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant