CN114369650A

CN114369650A - Design method of capture probe, capture probe and application thereof

Info

Publication number: CN114369650A
Application number: CN202210276899.4A
Authority: CN
Inventors: 董珊珊; 余进
Original assignee: Shenzhen Xianhu Botanical Garden Shenzhen Garden Research Center
Current assignee: Shenzhen Xianhu Botanical Garden Shenzhen Garden Research Center
Priority date: 2022-03-21
Filing date: 2022-03-21
Publication date: 2022-04-19
Anticipated expiration: 2042-03-21
Also published as: CN114369650B

Abstract

The invention relates to the technical field of biology, and discloses a design method of a capture probe, the capture probe and application thereof. The design method of the capture probe comprises the following steps: collecting a sample of the bryophyte group, preprocessing the sample, and performing transcriptome sequencing to obtain raw sequencing data; filtering the raw sequencing data and performing transcriptome assembly to obtain a protein sequence and a nucleotide sequence of the sample; performing cluster analysis on the protein sequence to obtain orthologous single-copy genes, and aligning and extracting the nucleotide sequences of the orthologous single-copy genes to obtain target single-copy genes; screening the target single-copy genes to obtain a target gene sequence, designing short DNA sequences according to the target gene sequence, obtaining complementary short RNA sequences according to the short DNA sequences, and synthesizing a capture probe according to the short RNA sequences. The capture probe provided by the invention is applied to genome sequencing of bryophytes, and has the advantages of high efficiency, low cost and an obvious enrichment effect.

Description

Design method of capture probe, capture probe and application thereof

Technical Field

The invention relates to the technical field of biology, in particular to a design method of a capture probe, the capture probe and application thereof.

Background

Bryophytes comprise three major branches: liverworts (7,300 species), mosses (13,000 species) and hornworts (250 species), and the phylogenetic relationships within these three branches still lack systematic study. At present, scholars studying bryophyte phylogeny mainly rely on joint analysis of a few molecular markers, which suffers from insufficient molecular data, incomplete sampling and low support values for the resulting phylogenetic trees. These problems are most acute for rapidly radiating groups (such as leafmoss) and for the placement of some problematic groups (such as hairy leafmoss).

The development of high-throughput sequencing technology has rapidly shifted molecular phylogenetics from the initial use of only a few DNA fragments to phylogenomics, which applies large-scale genomic data. The plant nuclear genome contains a very large number of genes with high evolutionary rates, is biparentally inherited, and can reflect complex evolutionary relationships among species; nuclear-gene-based phylogenomic methods can therefore effectively resolve phylogenetic relationships at different taxonomic ranks. In recent years, researchers have obtained gene sets of species from transcriptome and/or genome sequencing data and identified orthologous single-copy genes by clustering for phylogenomic analysis. However, transcriptome sequencing is limited by the availability of fresh material, whereas genome sequencing is inefficient and costly because of the enormous data volume, which hinders large-scale phylogenomic applications.

Disclosure of Invention

The invention provides a design method of a capture probe, the capture probe and application thereof, which can improve the efficiency of genome sequencing and reduce the cost.

In a first aspect, the present invention provides a method for designing a capture probe, comprising:

collecting a sample of the bryophyte group, preprocessing the sample, and performing transcriptome sequencing to obtain raw sequencing data;

filtering the raw sequencing data and performing transcriptome assembly to obtain a protein sequence and a nucleotide sequence of the sample;

performing cluster analysis on the protein sequence to obtain orthologous single-copy genes, and aligning and extracting the nucleotide sequences of the orthologous single-copy genes to obtain target single-copy genes;

screening the target single copy gene to obtain a target gene sequence, designing a DNA short sequence according to the target gene sequence, obtaining a complementary RNA short sequence according to the DNA short sequence, and synthesizing a capture probe according to the RNA short sequence.

In a second aspect, the invention provides a capture probe, wherein the capture probe is designed by the design method provided in the first aspect, and the nucleotide sequence of the capture probe comprises one or more of the probe sequence groups of SEQ ID No. 1-SEQ ID No. 100.

In a third aspect, the invention provides the use of a capture probe of the second aspect in gene sequencing.

The design method of the capture probe provided by the invention uses self-generated bryophyte transcriptome data to obtain bryophyte single-copy gene data, and designs and synthesizes RNA hybridization capture probes according to the single-copy genes. The capture probe provided by the invention can be applied to phylogenomic research on bryophytes, with a high proportion of effective data, high genome sequencing efficiency, low cost and an obvious enrichment effect.

Drawings

In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.

FIG. 1 is a schematic flow chart illustrating a method for designing a capture probe according to an embodiment of the present invention;

FIG. 2 is a graph showing the results of a sequencing process using capture probes in gene sequencing according to an embodiment of the present invention.

Detailed Description

In order to make the technical problems, technical solutions and advantageous effects solved by the present invention more apparent, the present invention is further described in detail below with reference to the following embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

The experimental procedures used in the following examples are all conventional procedures unless otherwise specified. The materials, reagents and the like used in the following examples are commercially available unless otherwise specified, and techniques not described in detail are performed according to standard methods well known to those skilled in the art. The cell lines, reagents and vectors mentioned in this application are commercially available or otherwise publicly available; they are given by way of example only, are not exclusive to the present invention, and may be replaced by other suitable means or biological materials.

The present invention will be further illustrated by the following examples.

In one embodiment, as shown in FIG. 1, a method for designing a capture probe is provided, which includes the following steps S10-S40.

S10, collecting a sample of the bryophyte group, preprocessing the sample, and performing transcriptome sequencing to obtain raw sequencing data.

Understandably, to ensure the richness and coverage of the samples, representative taxa of the group should be selected as far as possible when collecting, and collection should be spread over several regions according to the distribution of the group in order to reduce regional bias. Liverworts comprise 15 orders in total and are widely distributed. In this embodiment, samples of liverwort taxa were collected in the wild from several regions, yielding 40 fresh liverwort samples representing 13 orders. The samples were kept in plastic preservation boxes to protect their integrity, and were pretreated under laboratory conditions to remove impurities and facilitate sample preparation. The transcriptome is the sum of all RNAs that a particular tissue or cell can transcribe at a certain developmental stage or functional state, and mainly comprises protein-coding mRNAs and non-coding RNAs. Transcriptome sequencing can comprehensively and quickly obtain the sequence information of almost all transcripts of a specific tissue or organ of a species in a given state, analyze the structure and expression level of the transcripts, discover unknown and rare transcripts, accurately identify alternative splicing sites and single nucleotide polymorphisms in coding sequences, and thus provide comprehensive transcriptome information.

S20, filtering the raw sequencing data and performing transcriptome assembly to obtain the protein sequence and the nucleotide sequence of the sample.

Understandably, the raw sequencing data contain some adapter sequences and low-quality bases, which need to be filtered out. In sequence assembly, a sequencing method generates sequence fragments (reads) from the genome of the species under study; the reads are joined according to the overlapping regions between them to form longer contiguous sequences (contigs); the contigs are joined into longer scaffolds, which may contain gaps; and the scaffolds are positioned on chromosomes after their errors and gaps are eliminated, yielding a high-quality whole-genome sequence. For transcriptome analysis with a reference genome, transcriptome assembly mainly assembles the reads aligned to the reference genome into transcripts, and common software includes StringTie and Cufflinks. For transcriptome analysis without a reference genome, transcriptome assembly mainly consists of de novo assembly of reads into transcripts, and common software includes Trinity, Oases and SOAPdenovo-Trans.

Understandably, in this embodiment, adapter sequences, repetitive sequences and low-quality bases in the raw sequencing data are filtered using the Trimmomatic software on a supercomputing platform to obtain clean sequencing data. Trinity is then used to perform transcriptome assembly on the clean data, a custom Perl script selects the longest transcript of each gene, and TransDecoder is used downstream with default parameters to predict coding regions, giving the protein sequences and nucleotide sequences of the gene set of each sample. Selecting the longest transcript with the custom Perl script filters the data effectively and removes redundancy.
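By way of illustration only, the longest-transcript selection could be sketched as follows. This is not the Perl script of the embodiment, which is not reproduced in the text. The sketch assumes Trinity-style identifiers in which the part before the final `_i<number>` names the gene, and the function names `parse_fasta` and `longest_isoforms` are hypothetical.

```python
import re
from typing import Dict, Iterable, Iterator, Tuple


def parse_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (identifier, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first whitespace-delimited token of the header.
            name, chunks = line[1:].split()[0], []
        else:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def longest_isoforms(records: Iterable[Tuple[str, str]]) -> Dict[str, Tuple[str, str]]:
    """Keep the longest transcript per gene.

    Assumption: the gene identifier is the transcript identifier with its
    trailing isoform suffix (_i1, _i2, ...) removed.
    """
    best: Dict[str, Tuple[str, str]] = {}
    for name, seq in records:
        gene = re.sub(r"_i\d+$", "", name)
        if gene not in best or len(seq) > len(best[gene][1]):
            best[gene] = (name, seq)
    return best


# Toy input: two isoforms of one gene and a single isoform of another.
example = """>TRINITY_DN10_c0_g1_i1 len=6
ATGAAA
>TRINITY_DN10_c0_g1_i2 len=9
ATGAAATTT
>TRINITY_DN11_c0_g1_i1 len=3
ATG
""".splitlines()

selected = longest_isoforms(parse_fasta(example))
```

In this toy input, the second isoform of the first gene is retained because it is the longer of the two.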

S30, performing cluster analysis on the protein sequence to obtain orthologous single-copy genes, and aligning and extracting the nucleotide sequences of the orthologous single-copy genes to obtain target single-copy genes.

Understandably, proteins are the executors and regulators of molecular functions and the main carriers of life activities. Remote homology detection of proteins is one of the main research tasks of structural and functional genomics; proteins with similar structures tend to have similar functions, and proteins with similar functions can be clustered into one class. Clustering is the process of dividing a collection of physical or abstract objects into classes of similar objects. Cluster analysis originated in taxonomy, but clustering is not the same as classification: in clustering, the classes into which the objects are to be divided are not known in advance. Gene expression data clustering, for example, groups genes with similar expression profiles into one class. Here, cluster analysis is performed on the protein sequences to obtain clusters of orthologous protein families, and orthologous single-copy genes are screened from these clusters. A single-copy gene is a gene present in only one copy in the genome; most are constitutively expressed housekeeping genes.

Understandably, in this example, the protein sequences of all 40 bryophyte samples and the protein sequences of two stonewort (charophyte) outgroups were pooled and clustered using the OrthoFinder software, with the inflation parameter of the Markov clustering algorithm left at the default I = 1.5, and the gene family clustering results were screened with the KinFin software for 1-to-1 orthologous single-copy genes with a taxon occupancy > 0.65. The nucleotide sequences of the orthologous single-copy genes were aligned with MAFFT, the alignment matrix was trimmed with trimAl, and phylogenetic trees were built with IQ-TREE 2. Long branches were then pruned according to the trees, and single-copy gene families giving well-resolved trees were retained as the final target single-copy genes for downstream probe design.
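The screening criterion can be illustrated with a minimal sketch. It does not reproduce KinFin's implementation or file formats. It assumes a hypothetical in-memory table mapping each orthogroup to the genes contributed by each taxon, treats an orthogroup as single-copy when every taxon present contributes exactly one gene, and measures occupancy as the fraction of taxa present.

```python
from typing import Dict, List


def single_copy_orthogroups(
    orthogroups: Dict[str, Dict[str, List[str]]],
    taxa: List[str],
    min_occupancy: float = 0.65,
) -> List[str]:
    """Return orthogroup IDs that are single-copy and exceed the occupancy cutoff."""
    kept = []
    for og_id, members in orthogroups.items():
        # Taxa that contribute at least one gene to this orthogroup.
        present = [t for t in taxa if members.get(t)]
        # Reject the orthogroup if any taxon has more than one copy.
        if any(len(members[t]) != 1 for t in present):
            continue
        # Occupancy: fraction of all taxa represented in the orthogroup.
        if len(present) / len(taxa) > min_occupancy:
            kept.append(og_id)
    return kept


# Toy data with four hypothetical taxa.
taxa = ["A", "B", "C", "D"]
orthogroups = {
    "OG1": {"A": ["a1"], "B": ["b1"], "C": ["c1"]},                     # 3/4 taxa, single-copy
    "OG2": {"A": ["a2", "a3"], "B": ["b2"], "C": ["c2"], "D": ["d2"]},  # duplicated in A
    "OG3": {"A": ["a4"], "B": ["b4"]},                                  # 2/4 taxa only
}
kept = single_copy_orthogroups(orthogroups, taxa)
```

Only `OG1` passes: `OG2` contains a duplicated gene and `OG3` falls below the occupancy cutoff.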

S40, screening the target single copy gene to obtain a target gene sequence, designing a DNA short sequence according to the target gene sequence, obtaining a complementary RNA short sequence according to the DNA short sequence, and synthesizing a capture probe according to the RNA short sequence.

Understandably, the principle of hybrid capture is to design probes (in DNA or RNA form) that are partially or fully complementary to the target fragments. The sample is mixed with the probes, the probes capture the target fragments in the sample, and fragments for which no probe was designed are washed away and discarded. The probes and the captured fragments are then separated by denaturation (generally by adjusting the pH to alkaline), and the captured fragments can be used for sequencing library construction. From the screened target single-copy gene sequences, mutually overlapping short DNA sequences can be designed, short RNA sequences complementary to the short DNA sequences are obtained according to the base-complementarity principle, and RNA hybridization capture probes are synthesized from the short RNA sequences for gene capture sequencing of the target group.
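The base-complementarity step can be sketched as below. The text does not state whether the RNA sequences are written as the plain complement or as the reverse complement of the DNA, so the sketch exposes that choice as a parameter. The function name `complementary_rna` is hypothetical.

```python
# Pairing rule from DNA template to complementary RNA: A->U, T->A, G->C, C->G.
_DNA_TO_RNA_COMPLEMENT = str.maketrans("ACGTacgt", "UGCAugca")


def complementary_rna(dna: str, reverse: bool = False) -> str:
    """Return the RNA sequence complementary to a short DNA sequence.

    Characters other than A, C, G and T (for example N) are passed
    through unchanged. With reverse=True the result is reversed, which
    gives the reverse complement.
    """
    rna = dna.translate(_DNA_TO_RNA_COMPLEMENT)
    return rna[::-1] if reverse else rna


plain = complementary_rna("ATGC")
flipped = complementary_rna("ATGC", reverse=True)
```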

Understandably, the target single-copy genes obtained in this example comprise 1,390 single-copy genes, from which 371 genes with relatively conserved base sequences (sequence similarity 70%-85%), moderate length (800-3000 bp) and high taxon occupancy (> 70%) were screened using the Geneious software. Two to three representative species were selected from each of the three subclasses of the bryophyte group, giving 1,030 target gene sequences with a cumulative total length of 1,031,187 bp. According to the target gene sequences, 19,856 mutually overlapping short DNA sequences with a step length of 80 bp and an overlap of 44 bp were designed with the SeqKit software so that the target gene sequences are completely covered, and SeqKit was then used to generate the corresponding complementary short RNA sequences, with a total length of 3,988,480 bp. These short RNA sequences were synthesized by primer synthesis to obtain the capture probes. The screening conditions and design parameters for the target single-copy genes are critical: optimized conditions and parameters yield genes with stronger phylogenetic signal and probes with higher capture efficiency. Designing the probes from the gene sequences of two to three phylogenetically representative species effectively improves the capture efficiency of the probes across the gene fragments of all groups.
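The tiling of a target gene sequence into overlapping short sequences can be illustrated with the following sketch. It is not the SeqKit invocation used in the embodiment. It assumes that the 80 bp figure is the length of each short sequence and that 44 bp is the overlap between consecutive short sequences, and it adds a final window anchored at the end of the target so that the sequence is fully covered. The function name `tile_sequence` is hypothetical.

```python
from typing import List


def tile_sequence(seq: str, probe_len: int = 80, overlap: int = 44) -> List[str]:
    """Split a target sequence into overlapping windows of fixed length."""
    if not 0 <= overlap < probe_len:
        raise ValueError("overlap must be non-negative and smaller than probe_len")
    if len(seq) <= probe_len:
        # Target no longer than one probe: return it as a single window.
        return [seq]
    step = probe_len - overlap  # offset between consecutive window starts
    starts = list(range(0, len(seq) - probe_len + 1, step))
    # Anchor one extra window at the 3' end if the regular grid stops short.
    if starts[-1] + probe_len < len(seq):
        starts.append(len(seq) - probe_len)
    return [seq[s:s + probe_len] for s in starts]


# A 200-nt toy target.
target = "".join("ACGT"[(i * i + i // 3) % 4] for i in range(200))
tiles = tile_sequence(target)
```

With these defaults consecutive windows start 36 nt apart; the offset and overlap are left as parameters because the optimal values are said to depend on the sample group.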

Optionally, the pretreatment comprises separation, washing, microscopic examination, quick freezing and grinding.

Understandably, in this embodiment the collected samples are brought back to the laboratory in plastic fresh-keeping packaging, and samples of the target species are separated under a dissecting microscope under laboratory conditions. The separated samples are washed several times with sterile water and examined again under the dissecting microscope to remove possible contamination by other species and by algae. Samples that pass the microscopic examination are blotted dry with laboratory absorbent paper, wrapped in tin foil and placed in a liquid nitrogen tank for quick freezing for 10 minutes. The quick-frozen samples are then taken out, transferred to a sterile mortar precooled with liquid nitrogen, and quickly ground by hand into powder.

Optionally, the transcriptome sequencing comprises: extracting RNA from the sample, and performing library construction and transcriptome sequencing on the RNA.

Understandably, plants contain secondary metabolites such as polysaccharides and polyphenols, which can bind tightly to RNA after cell lysis to form insoluble complexes or jelly-like precipitates that are difficult to remove. In one example, the RNA of the sample is extracted with the Vazyme FastPure Plant Total RNA Isolation Kit (RC401), which is suitable for rapid RNA extraction from a variety of plant samples. RNA is obtained according to the kit protocol, and Illumina library construction and transcriptome sequencing are then performed on the RNA to obtain 6 G of raw sequencing data per sample.

Optionally, performing cluster analysis on the protein sequence to obtain orthologous single-copy genes includes:

performing cluster analysis on the protein sequences of the bryophyte group and the protein sequences of the outgroup to obtain clusters of orthologous protein families;

and screening out, from the clusters, orthologous single-copy genes with a taxon occupancy of more than 65%.

Understandably, an outgroup must be selected to root the phylogenetic tree: adding an outgroup sequence determines the root, the root marks the starting point of the evolution of the sequences, and the order of evolution can be read from the tree once the root is fixed. The general principle is to select as the outgroup the species most closely related to, but outside, the target group. Different choices of outgroup can change the tree topology considerably, and when the ingroup consists of different types subdivided within the same species, the more closely related the outgroup, the better. Single-copy genes are valuable molecular markers in molecular systematics and play an extremely important role in reconstructing the trunk of the tree of life and the branches between the trunk and the tips. In this example, stonewort was selected as the outgroup of the bryophytes; the protein sequences of all 40 bryophyte samples and the protein sequences of the two stonewort outgroups were pooled and clustered using the OrthoFinder software, and the gene clustering results were screened with the KinFin software for 1-to-1 orthologous single-copy genes with a taxon occupancy > 0.65. OrthoFinder maps gene duplication events in the gene trees onto the branches of the species tree and provides a number of comparative genomics statistics. KinFin takes the protein clustering file output by OrthoFinder, functional annotation data and a user-defined species classification, produces rich annotation of the orthogroups, and screens the orthologous single-copy genes.

Optionally, the target single-copy genes are screened to obtain the target gene sequence under the following conditions: sequence similarity of 70-80%, length of 800-3000 bp, and taxon occupancy of more than 70%.
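The three screening conditions can be expressed as a simple filter. The sketch below is illustrative only and is not the Geneious workflow of the embodiment. The record type `GeneStats` and the function `screen_genes` are hypothetical, and the upper similarity bound is a parameter because the embodiment above cites 70%-85% while this paragraph cites 70-80%.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class GeneStats:
    gene_id: str
    similarity: float  # sequence similarity across taxa, in percent
    length: int        # gene length in bp
    occupancy: float   # fraction of taxa in which the gene was found (0 to 1)


def screen_genes(
    genes: Iterable[GeneStats],
    min_similarity: float = 70.0,
    max_similarity: float = 80.0,
    min_length: int = 800,
    max_length: int = 3000,
    min_occupancy: float = 0.70,
) -> List[GeneStats]:
    """Keep genes that satisfy all three screening conditions."""
    return [
        g for g in genes
        if min_similarity <= g.similarity <= max_similarity
        and min_length <= g.length <= max_length
        and g.occupancy > min_occupancy
    ]


# Toy candidates with made-up statistics.
candidates = [
    GeneStats("g1", 75.0, 1200, 0.90),  # passes all conditions
    GeneStats("g2", 92.0, 1500, 0.95),  # too conserved
    GeneStats("g3", 78.0, 500, 0.85),   # too short
    GeneStats("g4", 83.0, 2000, 0.80),  # passes only with the 85% upper bound
]
strict = [g.gene_id for g in screen_genes(candidates)]
relaxed = [g.gene_id for g in screen_genes(candidates, max_similarity=85.0)]
```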

Optionally, the step length of the DNA short sequences is 70-90 bp, and the length of the mutual overlapping is 34-54 bp.

Understandably, different design step lengths for the short sequences of the capture probes give different capture efficiencies, and the step length needs to be chosen for each sample group. The tail of each short sequence overlaps the head of the next, and this head-to-tail overlap improves capture efficiency. The short RNA sequences of the capture probes are obtained as complements of the short DNA sequences; when the overlapping short DNA sequences are designed, the step length is 70-90 bp and the overlap is 34-54 bp. In one embodiment of the application, the SeqKit software is used to design 19,856 short DNA sequences with the optimal step length of 80 bp and an overlap of 44 bp so that they completely cover the target gene sequence, and SeqKit is then used to generate the complementary short RNA sequences from the short DNA sequences, with a total length of 3,988,480 bp.
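The head-to-tail overlap between consecutive probes can be checked directly on probe sequences. The following sketch (the function name `shared_overlap` is hypothetical) measures the longest suffix of one probe that is also a prefix of the next, and applies it to SEQ ID NO. 1 and SEQ ID NO. 2 as printed in the sequence listing.

```python
def shared_overlap(left: str, right: str) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left[-k:] == right[:k]:
            return k
    return 0


# SEQ ID NO. 1 and SEQ ID NO. 2, copied from the sequence listing
# (the spaces only group the bases in tens and are removed).
seq_id_1 = (
    "uacagucuag aucuguaggu cuagggcccg cggaagcuag guaaacgucu ccgcuuacga"
    "cuucuauguc cucguccuag"
).replace(" ", "")
seq_id_2 = (
    "acgucuccgc uuacgacuuc uauguccucg uccuaguuuu cuaauacagg uacaagccca"
    "ggucgucgca uugccguccu"
).replace(" ", "")

overlap = shared_overlap(seq_id_1, seq_id_2)
offset = len(seq_id_1) - overlap
```

For these two 80-nt entries the shared stretch is 36 nt, so the second probe begins 44 nt after the start of the first.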

Optionally, the invention provides a capture probe, wherein the capture probe is designed by the above design method, and the nucleotide sequence of the capture probe comprises one or more of probe sequence groups SEQ ID No.1 to SEQ ID No. 100.

Optionally, the invention provides an application of the capture probe in gene sequencing.

Understandably, genome-level sequencing is broadly divided into whole-genome sequencing, whole-exome sequencing and targeted sequencing. Whole-genome sequencing sequences all bases of the genome, whole-exome sequencing sequences all exons of the genome, and targeted sequencing sequences a selected set of genes. Targeted sequencing follows two main technical routes: multiplex PCR and hybrid capture. In hybrid capture sequencing, genomic DNA is broken into fragments and capture probes designed for the target regions are added; the probes are partially or completely complementary to the target fragments, so that they hybridize to and capture the target fragments, thereby enriching them.

In one embodiment, the capture probe sequence group SEQ ID NO. 1 to SEQ ID NO. 100 is designed and synthesized by the capture probe design method provided by the invention and applied directly to hybrid capture of a pooled DNA library of 52 bryophyte samples, the DNA library having a concentration of 20 ng/µL and a final volume of 20 µL. Following the probe capture protocol of the Arbor Biosciences single-copy nuclear gene capture kit, a hybrid capture library is obtained with a concentration of 10 ng/µL and a volume of 20 µL. The complete sequencing process comprises DNA extraction, purity and concentration measurement, DNA fragmentation, sequencing library preparation, library quality control, hybridization capture, enrichment of captured sequences, PCR product purification, PCR product quality control, sequencing and bioinformatics analysis. The hybrid capture library is sequenced on an Illumina HiSeq 2000 sequencer to obtain 100 G of data; the library is demultiplexed, giving on average about 2 G of data per species, and the HybPiper data analysis pipeline is run to extract the target gene sequences.

In a comparative example, each of the 52 bryophyte samples in the original, unenriched DNA library is sequenced to 10 G by next-generation sequencing of a conventional library, giving 520 G of data in total, and the HybPiper data analysis pipeline is likewise run to recover the target gene sequences, as a control for the hybrid capture effect of the capture probes on the target gene sequences.
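The data volumes of the two approaches can be compared with a few lines of arithmetic, using only the figures stated above (52 samples, 100 G in total for the capture library, 10 G per sample for the conventional library). The variable names are illustrative, and the unit "G" is used as in the text.

```python
samples = 52

# Capture sequencing: one pooled library, 100 G in total.
capture_total_g = 100
capture_per_sample_g = capture_total_g / samples         # about 1.92 G per sample

# Conventional-library sequencing: 10 G for each sample.
conventional_per_sample_g = 10
conventional_total_g = conventional_per_sample_g * samples  # 520 G

# Ratio of sequencing volume between the two approaches.
fold_reduction = conventional_total_g / capture_total_g     # 5.2

print(f"capture: {capture_per_sample_g:.2f} G per sample")
print(f"conventional: {conventional_total_g} G total")
print(f"reduction: {fold_reduction:.1f}-fold")
```

On these figures the capture approach uses roughly one fifth of the sequencing volume of the conventional library.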

For the two data sets, obtained by capture probe sequencing and by next-generation sequencing of the conventional library, the lengths of the recovered sequences of the 371 target single-copy genes are each plotted as a heat map, and the results are shown in FIG. 2.

A heat map is essentially a matrix of numerical values, each of which is assigned a color according to a preset color scale (in FIG. 2, the "scale of length of captured target gene" serves as the color scale). Comparing the results of the two approaches, sequencing with the synthesized bryophyte gene capture probes requires less sequencing data than next-generation sequencing of the conventional library, achieves higher capture efficiency, and has a marked enrichment effect on the target gene sequences.
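The matrix behind such a heat map can be sketched as follows. The input layout (a mapping from sample and gene to recovered length), the number of color bins and the function names are assumptions for illustration; the values are toy data, not results from FIG. 2.

```python
from typing import Dict, List, Tuple


def length_matrix(
    recovered: Dict[Tuple[str, str], int],
    samples: List[str],
    genes: List[str],
) -> List[List[int]]:
    """Rows are samples, columns are genes; missing entries count as length 0."""
    return [[recovered.get((s, g), 0) for g in genes] for s in samples]


def to_color_bins(matrix: List[List[int]], n_bins: int = 5) -> List[List[int]]:
    """Map each length to a bin index 0..n_bins-1 on a linear scale up to the maximum."""
    peak = max(max(row) for row in matrix)
    if peak == 0:
        return [[0] * len(row) for row in matrix]
    return [[min(n_bins - 1, v * n_bins // peak) for v in row] for row in matrix]


# Toy data: recovered lengths in bp for two samples and two genes.
recovered = {("s1", "g1"): 1000, ("s1", "g2"): 500, ("s2", "g1"): 0}
matrix = length_matrix(recovered, ["s1", "s2"], ["g1", "g2"])
bins = to_color_bins(matrix)
```

Each bin index would then be looked up in the chosen color scale when the figure is drawn.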

The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Sequence listing

<110> Shenzhen Xianhu Botanical Garden (Shenzhen Garden Research Center)

<120> design method of capture probe, capture probe and application thereof

<160> 100

<170> SIPOSequenceListing 1.0

<210> 1

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 1

uacagucuag aucuguaggu cuagggcccg cggaagcuag guaaacgucu ccgcuuacga 60

cuucuauguc cucguccuag 80

<210> 2

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 2

acgucuccgc uuacgacuuc uauguccucg uccuaguuuu cuaauacagg uacaagccca 60

ggucgucgca uugccguccu 80

<210> 3

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 3

uacagguaca agcccagguc gucgcauugc cguccuucuc aaacugcugc cacguuccag 60

aguucuuccu caaguugaua 80

<210> 4

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 4

ugcugccacg uuccagaguu cuuccucaag uugauauugu ucuagaacuu ccuaaaguuc 60

uuccucaaga cgacguugcc 80

<210> 5

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 5

gaacuuccua aaguucuucc ucaagacgac guugccaugu caccaaguuc uaggacucga 60

cccaguccac uaaguugaag 80

<210> 6

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 6

aaguucuagg acucgaccca guccacuaag uugaaguccc ccuagucgcc uucuuacaaa 60

gggucaaaga acacguccga 80

<210> 7

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 7

cccuagucgc cuucuuacaa agggucaaag aacacguccg accccaacac uucuuccuag 60

aguaguucua ggugcccaaa 80

<210> 8

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 8

uacagucuag aucuauaggu cuagggaccg cggaaacugg gaaaacgucu ccgcuuacga 60

cuccuaagcc cacguccaag 80

<210> 9

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 9

acgucuccgc uuacgacucc uaagcccacg uccaagauuu cuaauacagg uacacgccca 60

ggucgucgcu uugccuucuu 80

<210> 10

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 10

uacagguaca cgcccagguc gucgcuuugc cuucuuucuc aaacuggugc cacgucccag 60

aguuuuuccu uaaauuaaug 80

<210> 11

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 11

uggugccacg ucccagaguu uuuccuuaaa uuaauguugu uuuaagaguu ccugaaguuc 60

uuccucaaaa caacguugcc 80

<210> 12

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 12

agaguuccug aaguucuucc ucaaaacaac guugccgugu caacaagucc uaggccucaa 60

cccaguccac uaagucgaag 80

<210> 13

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 13

aaguccuagg ccucaaccca guccacuaag ucgaaguccc ucuagucgcc uuuuuacaca 60

gcgucaagga acaaagccgg 80

<210> 14

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 14

cucuagucgc cuuuuuacac agcgucaagg aacaaagccg gccucaacac uucuuccuag 60

acuaguucua ggugccuaaa 80

<210> 15

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 15

uacaggugca cgcccagguc gucgcuuugc cuucuuucuc gaacuguugg caggucccag 60

aauucuuucu caaguugaug 80

<210> 16

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 16

uguuggcagg ucccagaauu cuuucucaag uugauguuau uuuaggacuu ccugaaguuc 60

uuccucaaga cgacguugcc 80

<210> 17

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 17

ggacuuccug aaguucuucc ucaagacgac guugccuugu caacaaguuc uaggacucaa 60

cccaguccac uaaguugagg 80

<210> 18

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 18

aaguucuagg acucaaccca guccacuaag uugagguccc gcuggucgcu uucuuacaca 60

gcgucaaaga ccaaguccga 80

<210> 19

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 19

cgcuggucgc uuucuuacac agcgucaaag accaaguccg accucaacac uuuuuccuag 60

acuaguucua ggugcccaaa 80

<210> 20

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 20

uuuggccaca uaaaagugcc ccgcgggagc aaggcaugua uucaauaaag cguaaagagc 60

auaucaaaaa gcacuuaauc 80

<210> 21

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 21

uacuguaguc cuccuuaccc uugagguugu cgaugucaua guggagaacg cuucauauuu 60

gaccacaagg acccacuggu 80

<210> 22

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 22

agaacgcuuc auauuugacc acaaggaccc acuggucaga caccccuucu gaucguagua 60

augggcgaag uacauacugu 80

<210> 23

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 23

ccuucugauc guaguaaugg gcgaaguaca uacuguucaa gcuguuaugc auaguccgau 60

gcuaaccaua acugaaagau 80

<210> 24

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 24

uuaugcauag uccgaugcua accauaacug aaagauaguu uuuguuacau agaucuccua 60

gcuugacaag cuaauguuga 80

<210> 25

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 25

uuacauagau cuccuagcuu gacaagcuaa uguugaaacc cuaugacgac ccguccucgc 60

gaagucuuca gaguaagguu 80

<210> 26

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 26

gacgacccgu ccucgcgaag ucuucagagu aagguucaau auagucucua agaagucacc 60

gacaacaaca aaugcuacaa 80

<210> 27

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 27

ucucuaagaa gucaccgaca acaacaaaug cuacaacguu uagcugucag uaaagacuua 60

ugacguucua cccaucuccu 80

<210> 28

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 28

ugucaguaaa gacuuaugac guucuaccca ucuccuccac gcgugccuug caccaucacu 60

acaauaguaa uacgaacacc 80

<210> 29

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 29

gccuugcacc aucacuacaa uaguaauacg aacacccuuu auuuugucua aaccaacugu 60

ucucuguuca aagauaacuc 80

<210> 30

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 30

ugucuaaacc aacuguucuc uguucaaaga uaacuccuuc cacuacgguu ucguucccug 60

aaaccccagu acaaauaacu 80

<210> 31

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 31

acgguuucgu ucccugaaac cccaguacaa auaacuuugu ucacgauuuc gacccaaguu 60

auaauuccgu gagaaagccu 80

<210> 32

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 32

gauuucgacc caaguuauaa uuccgugaga aagccuucua ucgucgucgg gacggaccau 60

accuccgaaa uagcagucgu 80

<210> 33

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 33

cgucgggacg gaccauaccu ccgaaauagc agucguuuuu gacuccuaga ucaacuacaa 60

uuggauuuug guuguggauu 80

<210> 34

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 34

gacuccuaga ucaacuacaa uuggauuuug guuguggauu acgacgggac ugucucuuau 60

uuugaccccg aacaaggacg 80

<210> 35

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 35

ggaacgcuuc auauucgacc acaaggaccc acuaguuaga cauccuuucu ggucguagua 60

gugggcgaag uacaugcugu 80

<210> 36

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 36

cuuucugguc guaguagugg gcgaaguaca ugcuguucaa acuguugugg auaguccgau 60

gguaaccaua gcuaaagaac 80

<210> 37

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 37

uuguggauag uccgauggua accauagcua aagaacaggu uuugcuacau gaaccuucua 60

gcuugacagg cggaggucga 80

<210> 38

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 38

cuacaugaac cuucuagcuu gacaggcgga ggucgaaacc cuauggcgac cuguccucgc 60

uaagucuuca gaguaagggu 80

<210> 39

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 39

ggcgaccugu ccucgcuaag ucuucagagu aagggucgau guagucccua agcagacacc 60

gucaucacca aauacuacaa 80

<210> 40

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 40

ucccuaagca gacaccguca ucaccaaaua cuacaacguu uagcugucag aaagaacuua 60

ugucguucua cccaucuccu 80

<210> 41

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 41

ugucagaaag aacuuauguc guucuaccca ucuccuccac gcgugacucg cuccaucacu 60

acaauaauaa uacgaccaac 80

<210> 42

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 42

gacucgcucc aucacuacaa uaauaauacg accaaccuuu guuuugccug aaccaacuau 60

ucuccguuca aagguagcuc 80

<210> 43

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 43

ugccugaacc aacuauucuc cguucaaagg uagcuccuuc cacuggacuu ucgcucucua 60

aagccgcaau acaaguagcu 80

<210> 44

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 44

ggacuuucgc ucucuaaagc cgcaauacaa guagcucugu ucacgcuuuc gcccuaaguu 60

auaauuccga gagaaggccu 80

<210> 45

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 45

gcuuucgccc uaaguuauaa uuccgagaga aggccuuuua ucgucgacga aacggacccu 60

accuccgaaa caguagccgc 80

<210> 46

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 46

cgacgaaacg gacccuaccu ccgaaacagu agccgcuuug uccuccugaa ucagcugcag 60

uuagauuuug guugagguuu 80

<210> 47

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 47

uccuccugaa ucagcugcag uuagauuuug guugagguuu acgauugggc cuugucuugu 60

uucggccucc gacgcggacg 80

<210> 48

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 48

uaccguagcc cuccuuaccc uugagguugu cgaugucaca gaggcgagcg guucauauuc 60

gaccacaaag acccucuagu 80

<210> 49

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 49

cgagcgguuc auauucgacc acaaagaccc ucuagucagc caacccuucu gaucguagua 60

augggcgaag uacauacuau 80

<210> 50

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 50

ccuucugauc guaguaaugg gcgaaguaca uacuauucaa acuauugugu auaguccguu 60

gguaaccgua acuaaaagau 80

<210> 51

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 51

uuguguauag uccguuggua accguaacua aaagauagcu uuuguuacau ggaucuccug 60

uccugacaag cagacgucga 80

<210> 52

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 52

uuacauggau cuccuguccu gacaagcaga cgucgacacc cuaugacgac cuguccucgc 60

caagucuuca gaauaagguu 80

<210> 53

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 53

gacgaccugu ccucgccaag ucuucagaau aagguucgau guaagcucua agaagacacc 60

gacaacaaca gauacuacaa 80

<210> 54

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 54

gcucuaagaa gacaccgaca acaacagaua cuacaacguu uagcugucag caaagaguua 60

ugacgcucua cccaccuccu 80

<210> 55

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 55

ugucagcaaa gaguuaugac gcucuaccca ccuccuccaa gcgugacuug cuccuucacu 60

acaauaguaa uacgaccauc 80

<210> 56

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 56

gacuugcucc uucacuacaa uaguaauacg accauccuuu auucugccua gaccaacugu 60

ucucuguuca aagauaacuc 80

<210> 57

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 57

ugccuagacc aacuguucuc uguucaaaga uaacuccuuc cacuacgguu ccguucccug 60

aaaccccagu acaaguaacu 80

<210> 58

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 58

acgguuccgu ucccugaaac cccaguacaa guaacuuugu ucacgguuuc gucccaaguu 60

auaauuccgu gaaaaggcuu 80

<210> 59

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 59

gguuucgucc caaguuauaa uuccgugaaa aggcuuuuua ucgucgucga gauggaccau 60

accuccgaaa cagaagucgu 80

<210> 60

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 60

cgucgagaug gaccauaccu ccgaaacaga agucguuucg uccuccuaga ucaucuacau 60

uuagauuuug guugugguuu 80

<210> 61

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 61

uccuccuaga ucaucuacau uuagauuuug guugugguuu acgacgaguc aaucucuuau 60

ucagaccccc aacacggacg 80

<210> 62

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 62

uacccacgcu ugagcaacua aaugucgaaa cagcgugcuc caugacagca cgaccgucuc 60

auaugacgaa agaggccguu 80

<210> 63

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 63

ugacagcacg accgucucau augacgaaag aggccguuaa agucguguua acggcagguc 60

acaaauguuu ucaacgggcg 80

<210> 64

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 64

ucguguuaac ggcaggucac aaauguuuuc aacgggcggu uguuguuguu caagugaaug 60

ugaacacuag cuguaugaaa 80

<210> 65

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 65

uuguuguuca agugaaugug aacacuagcu guaugaaagu ugauggaaca acuucuaccu 60

aagugaauag accagcaacg 80

<210> 66

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 66

auggaacaac uucuaccuaa gugaauagac cagcaacgcc uacuucugaa gccuuccguc 60

uauggcaaac guaaagaccu 80

<210> 67

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 67

cuucugaagc cuuccgucua uggcaaacgu aaagaccugg cgcaguuucu ccugaagucc 60

ucugcaauac cuccacccuc 80

<210> 68

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 68

caguuucucc ugaaguccuc ugcaauaccu ccacccuccc gucuaugccg cuaacgagua 60

ucgaaccugu uccuuaagcc 80

<210> 69

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 69

cuaugccgcu aacgaguauc gaaccuguuc cuuaagccca guuuuaacuu ucuuguguac 60

uggaagacac aacuuguagg 80

<210> 70

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 70

uuuaacuuuc uuguguacug gaagacacaa cuuguagggc uccucuacuu guuugauagc 60

uuuuaauucg uuguccaaag 80

<210> 71

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 71

cucuacuugu uugauagcuu uuaauucguu guccaaaguc uucacuucuc gcaguacuac 60

cuguuguaac ucuuccauga 80

<210> 72

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 72

cacuucucgc aguacuaccu guuguaacuc uuccaugauc uagcaccacu cuucuagcuu 60

caaaaccacc uguuuugucu 80

<210> 73

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 73

gcaccacucu ucuagcuuca aaaccaccug uuuugucugu uagaagccug cguccggcug 60

uugaaagucg cagucccauc 80

<210> 74

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 74

gaagccugcg uccggcuguu gaaagucgca gucccaucug ucgacgcugc guuuuacacc 60

aaccgcuuaa aguuucacuu 80

<210> 75

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 75

gacgcugcgu uuuacaccaa ccgcuuaaag uuucacuucg acuaucacga ucgcuaauau 60

uaacaacacu aggacuagua 80

<210> 76

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 76

acuucgacua ucacgaucgc uaauauuaac aacacuagga cuaguauacc aauagguaaa 60

cggugccuaa auucacguuc 80

<210> 77

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 77

uggcagcacg agcgccucau guggcgaaaa agacccuuga aaucguguua gcgucaaguc 60

acagaugucu ucaauggacg 80

<210> 78

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 78

ucguguuagc gucaagucac agaugucuuc aauggacguu uauuguuguu uaaauggaug 60

uguacgcugg caguguggaa 80

<210> 79

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 79

uuguuguuua aauggaugug uacgcuggca guguggaagu ugauagaaca ccuucuaccu 60

aagugcauaa accagcaacg 80

<210> 80

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 80

auagaacacc uucuaccuaa gugcauaaac cagcaacgcc uacuccuuaa gccaucuguc 60

uaaggaaaac guaaaaaccu 80

<210> 81

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 81

cuccuuaagc caucugucua aggaaaacgu aaaaaccugg cgcacuuccu ucugaagucc 60

gcugcaauac cuccuccguc 80

<210> 82

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 82

cacuuccuuc ugaaguccgc ugcaauaccu ccuccgucuc gucugugacg guaucgcgua 60

ucggaccugu uucuuaagcc 80

<210> 83

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 83

cugugacggu aucgcguauc ggaccuguuu cuuaagccca gguuugaauu ccucguguac 60

gucaagacgc agcucguagg 80

<210> 84

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 84

uuugaauucc ucguguacgu caagacgcag cucguagguc uccucuacuu auucgauagc 60

uuuuaauuuc guguucaaag 80

<210> 85

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 85

cucuacuuau ucgauagcuu uuaauuucgu guucaaagcc uccacuuccc guaguacaac 60

cuguuguaac ucuuccaaga 80

<210> 86

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 86

cacuucccgu aguacaaccu guuguaacuc uuccaagaac uagcgccacu cuucuaacuu 60

cacgaccaac uauucugccu 80

<210> 87

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 87

gcgccacucu ucuaacuuca cgaccaacua uucugccuau uggaagcauu gguccggcug 60

uuaaaggucg cugucccguc 80

<210> 88

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 88

gaagcauugg uccggcuguu aaaggucgcu gucccguccg ucgacgcauc uuucuacacc 60

aacgucuuga aauuccacuu 80

<210> 89

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 89

gacgcaucuu ucuacaccaa cgucuugaaa uuccacuucg acuaacauga acgcuauuag 60

uaacagcacu aaaauuagua 80

<210> 90

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 90

acuucgacua acaugaacgc uauuaguaac agcacuaaaa uuaguauacc aauagguaua 60

cgguaccaaa guucacguuc 80

<210> 91

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 91

ugacagcacg agcgccucau gugacgcaag agaccuuuga aaucauguua acgccagguc 60

acagaugucu ucaauggacg 80

<210> 92

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 92

ucauguuaac gccaggucac agaugucuuc aauggacguu uauuguuguu uaaauggaug 60

uguacgcugg caguguggaa 80

<210> 93

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 93

uuguuguuua aauggaugug uacgcuggca guguggaagu ugauagaaca ccuucuaccg 60

aaguguauaa accaacagcg 80

<210> 94

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 94

auagaacacc uucuaccgaa guguauaaac caacagcggc uacuucugaa accggcuguc 60

uagggaaagc guaaaaaccu 80

<210> 95

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 95

cuucugaaac cggcugucua gggaaagcgu aaaaaccugg cacacuuccu ucugaaaucc 60

gcugcaauac cuccuccguc 80

<210> 96

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 96

cacuuccuuc ugaaauccgc ugcaauaccu ccuccgucuc gucuaugacg guaacgcgua 60

ucgaaccugu uccuuaagcc 80

<210> 97

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 97

cuaugacggu aacgcguauc gaaccuguuc cuuaagccca gguuugaauu ccucguguac 60

guuaagacgc agcugguagg 80

<210> 98

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 98

uuugaauucc ucguguacgu uaagacgcag cugguagguc uccucuacuu auuugacagu 60

uuuuaauuuc gaguccaaag 80

<210> 99

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 99

cucuacuuau uugacaguuu uuaauuucga guccaaaguc uucacuuccc cuaguacuac 60

cuguuguagc uuuuccaaga 80

<210> 100

<211> 80

<212> RNA

<213> Artificial Sequence (Artificial Sequence)

<400> 100

cacuuccccu aguacuaccu guuguagcuu uuccaagaac uagcgccacu cuucuagcuu 60

cacgaccacc uguucugacu 80
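The entries above follow the usual numeric-identifier layout of a patent sequence listing: <210> is the sequence number, <211> the declared length, <212> the molecule type, <213> the organism, and <400> the residues, with running position counts at the end of each line. The following Python sketch is a minimal illustration, not part of the disclosed method. It parses such entries, checks that each sequence matches its declared length and alphabet, and measures the overlap shared by two consecutive probes. The function names `parse_listing`, `check_entry` and `overlap_length` are hypothetical; the sample text is copied from SEQ ID No. 22 and SEQ ID No. 23 above.

```python
import re

SAMPLE = """
<210> 22
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 22
agaacgcuuc auauuugacc acaaggaccc acuggucaga caccccuucu gaucguagua 60
augggcgaag uacauacugu 80
<210> 23
<211> 80
<212> RNA
<213> Artificial Sequence (Artificial Sequence)
<400> 23
ccuucugauc guaguaaugg gcgaaguaca uacuguucaa gcuguuaugc auaguccgau 60
gcuaaccaua acugaaagau 80
"""


def parse_listing(text):
    """Return {seq_id: (declared_length, molecule_type, sequence)}."""
    entries = {}
    # Each <210> tag opens a new entry, so split just before it.
    for chunk in re.split(r"(?=<210>)", text):
        m_id = re.search(r"<210>\s*(\d+)", chunk)
        m_len = re.search(r"<211>\s*(\d+)", chunk)
        m_type = re.search(r"<212>\s*(\w+)", chunk)
        m_seq = re.search(r"<400>\s*\d+\s*(.*)", chunk, re.S)
        if not (m_id and m_len and m_type and m_seq):
            continue  # preamble or incomplete entry
        # Keep residue letters only: drops spaces, newlines and position counts.
        seq = re.sub(r"[^a-z]", "", m_seq.group(1).lower())
        entries[int(m_id.group(1))] = (int(m_len.group(1)), m_type.group(1), seq)
    return entries


def check_entry(declared_length, molecule_type, seq):
    """True if the sequence has the declared length and a valid alphabet."""
    alphabet = set("acgu") if molecule_type == "RNA" else set("acgt")
    return len(seq) == declared_length and set(seq) <= alphabet


def overlap_length(left, right):
    """Length of the longest suffix of `left` that is also a prefix of `right`."""
    for k in range(min(len(left), len(right)), 0, -1):
        if left.endswith(right[:k]):
            return k
    return 0


entries = parse_listing(SAMPLE)
for seq_id, entry in sorted(entries.items()):
    print(seq_id, check_entry(*entry))
print(overlap_length(entries[22][2], entries[23][2]))
```

Run on these two entries, the sketch reports both as consistent with their declared length of 80 and finds that the last 36 nucleotides of SEQ ID No. 22 are identical to the first 36 nucleotides of SEQ ID No. 23.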

Claims

1. A method for designing a capture probe, comprising:

2. The method of claim 1, wherein the pre-treatment comprises separating, washing, microscopic examination, rapid freezing, and grinding.

3. The method of claim 1, wherein the transcriptome sequencing comprises: extracting RNA from the sample, and performing library construction and transcriptome sequencing on the RNA.

4. The method for designing a capture probe according to claim 1, wherein the clustering analysis of the protein sequence to obtain an orthologous single copy gene comprises:

5. The capture probe design method of claim 1, wherein the target single copy gene is screened to obtain the target gene sequence according to the following criteria: a sequence similarity of 70%-80%, a length of 800-3000 bp, and a group occupancy of > 70%.

6. The method for designing a capture probe according to claim 1, wherein the step length of the DNA short sequences is 70-90 bp, and the overlap length is 34-54 bp.

7. A capture probe designed by the method of any one of claims 1 to 6, wherein the nucleotide sequence of the capture probe comprises one or more sequences from the probe sequence group SEQ ID No. 1 to SEQ ID No. 100.

8. Use of the capture probe of claim 7 for gene sequencing.
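The screening thresholds of claim 5 and the tiling parameters of claim 6 can be illustrated with a short Python sketch. This is a hedged illustration under stated assumptions, not the applicant's implementation. The function names `passes_screen`, `tile` and `complementary_rna` are hypothetical. The range bounds are treated as inclusive. The 70-90 bp figure of claim 6 is read as the length of each DNA short sequence, which is consistent with the 80-nt entries in the sequence listing. The complementary RNA is taken as the reverse complement, and the actual strand orientation used for synthesis may differ.

```python
def passes_screen(similarity_pct, length_bp, occupancy_pct):
    """Apply the claim 5 thresholds (inclusive range bounds are an assumption)."""
    return (
        70 <= similarity_pct <= 80
        and 800 <= length_bp <= 3000
        and occupancy_pct > 70
    )


def tile(dna, length=80, overlap=36):
    """Cut `dna` into windows of `length` nt; neighbours share `overlap` nt.

    The defaults are illustrative values inside the claimed ranges
    (70-90 bp and 34-54 bp). Trailing bases that do not fill a whole
    window are dropped in this sketch.
    """
    step = length - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than length")
    return [dna[i:i + length] for i in range(0, len(dna) - length + 1, step)]


# DNA base -> complementary RNA base (A->U, C->G, G->C, T->A).
_COMPLEMENT = str.maketrans("ACGTacgt", "UGCAugca")


def complementary_rna(dna):
    """Return the reverse-complement RNA of a DNA string (orientation assumed)."""
    return dna.translate(_COMPLEMENT)[::-1]


# Usage on a made-up 200-nt target, not a sequence from the patent.
target = "ACGT" * 50
windows = tile(target)
probes = [complementary_rna(w) for w in windows]
print(len(windows), len(probes[0]))
```

With the default values each window starts 44 nt after the previous one, so a 200-nt target yields three 80-nt windows and three corresponding RNA probe sequences.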