CN111696627A - Design method of long-chain RNA specific probe
- Publication number: CN111696627A (also listed as CN202010225368.3A)
- Authority: CN (China)
- Prior art keywords: probe, probes, rna, long, candidate
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
- G16B25/20—Polymerase chain reaction [PCR]; Primer or probe design; Probe optimisation
-
- C—CHEMISTRY; METALLURGY
- C12—BIOCHEMISTRY; BEER; SPIRITS; WINE; VINEGAR; MICROBIOLOGY; ENZYMOLOGY; MUTATION OR GENETIC ENGINEERING
- C12Q—MEASURING OR TESTING PROCESSES INVOLVING ENZYMES, NUCLEIC ACIDS OR MICROORGANISMS; COMPOSITIONS OR TEST PAPERS THEREFOR; PROCESSES OF PREPARING SUCH COMPOSITIONS; CONDITION-RESPONSIVE CONTROL IN MICROBIOLOGICAL OR ENZYMOLOGICAL PROCESSES
- C12Q1/00—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions
- C12Q1/68—Measuring or testing processes involving enzymes, nucleic acids or microorganisms; Compositions therefor; Processes of preparing such compositions involving nucleic acids
- C12Q1/6813—Hybridisation assays
- C12Q1/6834—Enzymatic or biochemical coupling of nucleic acids to a solid phase
- C12Q1/6837—Enzymatic or biochemical coupling of nucleic acids to a solid phase using probe arrays or probe chips
Abstract
The invention relates to the technical field of biology, and in particular to a method for designing long-chain RNA specific probes. The probes can simultaneously detect the expression abundance or differential expression of at least two types of long-chain RNA among mRNA, circular RNA and long non-coding RNA. Probes obtained by this design method have high sensitivity and strong specificity, and allow simultaneous, rapid, high-throughput detection, in trace samples, of the expression abundance of regulatory RNA molecules such as circular RNA and long non-coding RNA together with protein-coding mRNA.
Description
Technical Field
The invention relates to the technical field of biology, and in particular to a method for designing long-chain RNA specific probes.
Background
Circular RNA (circRNA) is a novel class of RNA molecules characterized by a covalently closed loop, and is widely present in eukaryotes. circRNA is derived from the exon or intron regions of a gene and is abundant in mammalian cells. The formation of circRNA differs from the canonical splicing of linear RNA: a downstream 5' splice site (splice donor) is joined to an upstream 3' splice site (splice acceptor), forming a back-splice junction (backsplicing). The main models of circular RNA formation are as follows (see FIG. 1):
(1) "lariat-driven circularization" or "exon-skipping circularization", as shown in FIG. 1A;
(2) "intron-pairing-driven circularization" or "direct backsplicing circularization", as shown in FIG. 1B;
(3) the circular intronic RNA (ciRNA) formation model, as shown in FIG. 1C;
(4) the RNA-binding protein (RBP)-dependent circularization model, as shown in FIG. 1D;
(5) the alternative circularization model, analogous to alternative splicing, as shown in FIG. 1E.
Current studies indicate that most circRNAs are conserved across different species. In addition, their circular structure makes them resistant to degradation by RNase R. circRNA is attracting increasing attention because of the specificity and complexity of its expression regulation and its important role in disease development. Like miRNAs and long non-coding RNAs, circRNAs have become a new research hotspot in the RNA field. At present, the common technique for detecting the expression abundance of circular RNA is real-time PCR, but this method has low throughput. Because circRNA research has only just begun, a reliable high-throughput detection technology is urgently needed to meet research needs.
Long non-coding RNA (lncRNA) generally refers to linear non-coding RNA longer than 200 bases. Its overall expression abundance is lower than that of mRNA, its conservation is poor, about 40% of lncRNAs have a polyA tail, and its tissue expression specificity is comparatively strong. lncRNA has important functions in transcriptional silencing, transcriptional activation, chromosome modification, nuclear transport and the like. lncRNA has been compared to the dark matter of the universe; in recent years it has been found to be involved in a variety of biological processes, to be an important basis for maintaining gene function, and to be associated with a variety of complex diseases. According to its positional relationship to the nearest mRNA, a long non-coding RNA can be classified as: antisense long non-coding RNA (antisense lncRNA), sense long non-coding RNA (sense lncRNA), intron long non-coding RNA (intron lncRNA), intergenic long non-coding RNA (intergenic lncRNA), bidirectional long non-coding RNA (bidirectional lncRNA), or enhancer long non-coding RNA (enhancer lncRNA).
Gene chip technology is a revolutionary microarray technology for large-scale gene expression research: high-density DNA fragments are attached to a solid-phase surface such as a glass slide or silicon wafer in a defined order or arrangement, by high-speed robotic spotting or by in-situ synthesis; target fragments are labeled with fluorescence or biotin; and detection relies on the principle of complementary base hybridization. Gene chip technology is a leading-edge biotechnology in the life sciences that developed along with the Human Genome Project. The classification and diagnosis of diseases have been further improved, with gene chip-based feature selection techniques playing a key role. After ten years of development, gene chip technology has continued to improve and mature and is widely applied in various fields of life science. However, the prior art provides no gene chip or method capable of simultaneously detecting the expression abundance of multiple long-chain RNAs such as mRNA, circular RNA and long non-coding RNA.
Disclosure of Invention
In view of the above-mentioned disadvantages of the prior art, the present invention aims to provide a method for designing a long-chain RNA specific probe, which is used to solve the problems of the prior art.
In order to achieve the above objects and other related objects, the present invention provides in a first aspect a method for designing a specific probe for simultaneously detecting two or more long-chain RNAs, comprising the steps of:
S100, designing probes for a target gene as candidate probes according to preset probe types, wherein the preset probe types are at least two selected from an mRNA probe, a circular RNA probe and a long non-coding RNA probe;
S200, comparing each candidate probe sequence with the full-length target sequences of all the probes;
S300, if the comparison result of a candidate probe meets the preset values, retaining the candidate probe as a specific probe;
and if the comparison result of a candidate probe does not meet the preset values, eliminating the candidate probe;
S400, if a target gene has no retained specific probe, redesigning probes for the target gene as candidate probes according to the preset probe types, and continuing from step S200.
If the specific probes retained for the target gene cover only some of the preset probe types, candidate probes of the remaining preset probe types are redesigned for the target gene, and execution continues from step S200 until the specific probes retained for the target gene cover all the preset probe types.
Specifically, the preset values are met in S300 when: the similarity between the candidate probe and each of the full-length target sequences of all the probes does not exceed a first preset value, and the number of consecutive bases identical between the candidate probe and each of the full-length target sequences does not exceed a second preset value.
The preset values are not met when: the similarity between the candidate probe and at least one of the full-length target sequences of all the probes exceeds the first preset value, or the number of consecutive bases identical between the candidate probe and at least one of the full-length target sequences exceeds the second preset value.
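The two criteria of S300 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names are hypothetical, "similarity" is taken here as the best fraction of matching bases over all ungapped placements of the probe on a target, and the caller is assumed to pass only the target sequences against which the probe must be specific.

```python
from typing import Iterable


def longest_common_run(a: str, b: str) -> int:
    """Length of the longest stretch of consecutive identical bases shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def ungapped_similarity(probe: str, target: str) -> float:
    """Best fraction of matching bases over all ungapped placements of probe on target.

    Assumption: targets shorter than the probe are scored 0.0 (no partial overlaps).
    """
    n = len(probe)
    if n == 0 or len(target) < n:
        return 0.0
    best = 0
    for start in range(len(target) - n + 1):
        matches = sum(p == t for p, t in zip(probe, target[start:start + n]))
        best = max(best, matches)
    return best / n


def meets_preset_values(probe: str, targets: Iterable[str],
                        first_preset: float, second_preset: int) -> bool:
    """True only if BOTH criteria hold for EVERY target sequence."""
    return all(
        ungapped_similarity(probe, t) <= first_preset
        and longest_common_run(probe, t) <= second_preset
        for t in targets
    )
```

A production pipeline would normally delegate the comparison to an alignment program; the pure-Python version above only makes the pass/fail logic explicit.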
Preferably, the sequence used to design the mRNA probe in S100 is selected from the longest mRNA transcript sequence of each target gene.
Preferably, the sequence used to design the circular RNA probe in S100 is selected from fragments of the back-spliced sequence of the circular RNA of each target gene.
Preferably, the circular RNA probe in S100 is selected from circular RNA probes whose binding site on the back-spliced sequence of the target gene is located at the back-splice site.
Preferably, the specific sequence used in S100 to design an antisense long non-coding RNA probe or a sense long non-coding RNA probe is selected from: a fragment of the region of the antisense or sense long non-coding RNA that does not overlap the mRNA;
the specific sequence used to design an intron long non-coding RNA, intergenic long non-coding RNA, bidirectional long non-coding RNA or enhancer long non-coding RNA probe is selected from: the longest long non-coding RNA fragment of each target gene.
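To illustrate how a non-overlapping region might be located for an antisense or sense long non-coding RNA, the sketch below subtracts mRNA intervals from a lncRNA interval. It assumes, purely for illustration, that both are given as half-open coordinates on the same chromosome and that overlap is judged by genomic position regardless of strand; the function name and the coordinate convention are not from the patent.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) genomic coordinates


def non_overlapping_regions(lncrna: Interval, mrna_regions: List[Interval]) -> List[Interval]:
    """Return the parts of the lncRNA interval not covered by any mRNA interval."""
    start, end = lncrna
    free: List[Interval] = []
    cursor = start
    for m_start, m_end in sorted(mrna_regions):
        if m_end <= cursor:
            continue            # mRNA region lies entirely before the cursor
        if m_start >= end:
            break               # remaining mRNA regions lie past the lncRNA
        if m_start > cursor:
            free.append((cursor, m_start))
        cursor = max(cursor, m_end)
        if cursor >= end:
            break
    if cursor < end:
        free.append((cursor, end))
    return free
```

Candidate probes for these lncRNA types would then be drawn only from the returned intervals.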
In a second aspect, the present invention provides a system for designing long-chain RNA specific probes, which can be used to design specific probes for simultaneous detection of at least two long-chain RNAs among mRNAs, circular RNAs, or long-chain non-coding RNAs.
The system comprises:
a design module 1, configured to design probes for a target gene as candidate probes according to preset probe types, wherein the preset probe types are at least two selected from an mRNA probe, a circular RNA probe and a long non-coding RNA probe;
a comparison module 2, configured to compare the candidate probe sequences with the full-length target sequences of all the probes;
a screening module 3, configured to judge whether the comparison result between a candidate probe and the full-length target sequences of all the probes meets the preset values; if so, the candidate probe is retained as a specific probe; if not, the candidate probe is eliminated;
an iteration module 4, configured to judge whether the target gene has a retained specific probe; if not, probes for the target gene are redesigned as candidate probes according to the preset probe types, and execution continues from the comparison module 2.
If the specific probes retained for a target gene cover only some of the preset probe types, candidate probes of the remaining preset probe types are redesigned for the target gene, and execution continues from the comparison module 2 until the specific probes retained for the target gene cover all the preset probe types.
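The interplay of the four modules can be sketched as a small loop. This is a hedged illustration only: `design` and `is_specific` are hypothetical callables standing in for the design module 1 and for the comparison module 2 plus screening module 3, and the `max_rounds` cap is an added safeguard that the patent does not mention.

```python
from typing import Callable, Dict, Sequence


def design_specific_probes(
    gene: str,
    preset_types: Sequence[str],
    design: Callable[[str, str, int], str],   # stands in for design module 1
    is_specific: Callable[[str], bool],       # stands in for modules 2 and 3
    max_rounds: int = 10,                     # assumption: cap on redesign rounds
) -> Dict[str, str]:
    """Iterate (module 4) until every preset probe type has a retained probe."""
    retained: Dict[str, str] = {}
    for round_no in range(max_rounds):
        missing = [t for t in preset_types if t not in retained]
        if not missing:
            break
        for probe_type in missing:
            candidate = design(gene, probe_type, round_no)
            if is_specific(candidate):
                retained[probe_type] = candidate
    return retained


# Toy usage: the first circRNA candidate is rejected and redesigned once.
probes = design_specific_probes(
    "GENE1",
    ["mRNA", "circRNA", "lncRNA"],
    design=lambda gene, kind, rnd: f"{gene}-{kind}-{rnd}",
    is_specific=lambda cand: not cand.endswith("circRNA-0"),
)
```

Only the probe types still missing are redesigned in each round, which mirrors the rule that already retained specific probes are kept.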
Further, the sequence used to design the mRNA probe is selected from the longest mRNA transcript sequence of each target gene.
Further, a fragment whose binding site lies at the 3' end of the longest transcript of the target gene is selected as the mRNA candidate probe for that gene.
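As a rough sketch of picking mRNA candidate regions from the 3' end, the function below walks windows from the 3' end of a transcript toward the 5' end. The transcript is assumed to be written 5' to 3'; the function name, the window length, the number of windows and the step size are all illustrative assumptions rather than values from the patent.

```python
from typing import List


def three_prime_windows(transcript: str, probe_len: int, count: int, step: int = 1) -> List[str]:
    """Return up to `count` target windows, starting at the 3' end and moving 5'-ward."""
    if probe_len <= 0 or step <= 0:
        raise ValueError("probe_len and step must be positive")
    windows: List[str] = []
    end = len(transcript)
    while end - probe_len >= 0 and len(windows) < count:
        windows.append(transcript[end - probe_len:end])
        end -= step
    return windows
```

Returning several windows in 3'-to-5' order gives the redesign step an ordered list of fallbacks if the first candidate is eliminated.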
Further, the sequence used to design the circular RNA probe is selected from: the back-spliced sequence of the circular RNA of each target gene.
Further, the circular RNA probe is selected from circular RNA probes whose binding site on the back-spliced sequence of the target gene is located at the back-splice site.
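A probe located at the back-splice site has to span the junction where the end of the circularized sequence meets its start. The sketch below builds such a junction window and its reverse complement. It assumes the circRNA is supplied as a linear string whose last base is joined to its first base, and that the probe is the reverse complement of the target window; both are illustrative assumptions, and the function names are hypothetical.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA-alphabet sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def junction_target(circ_seq: str, probe_len: int) -> str:
    """Window of `probe_len` bases centered on the back-splice junction."""
    if probe_len < 2 or probe_len > len(circ_seq):
        raise ValueError("probe_len must be between 2 and len(circ_seq)")
    left = probe_len // 2          # bases taken from the end of the sequence
    right = probe_len - left       # bases taken from the start of the sequence
    return circ_seq[-left:] + circ_seq[:right]


def junction_probe(circ_seq: str, probe_len: int) -> str:
    """Probe complementary to the junction-spanning window."""
    return reverse_complement(junction_target(circ_seq, probe_len))
```

Because the junction sequence does not occur in the linear transcript of the same gene, a probe centered on it can distinguish the circular form from the linear form.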
Further, the sequence used to design the antisense long non-coding RNA or sense long non-coding RNA probe is selected from: the sequence of the region of the long non-coding RNA that does not overlap the mRNA of the target gene;
the sequence used to design the intron long non-coding RNA, intergenic long non-coding RNA, bidirectional long non-coding RNA or enhancer long non-coding RNA probe is selected from: the longest long non-coding RNA fragment of each target gene.
Specifically, in the screening module 3:
if the similarity between a candidate probe and each of the full-length target sequences of all the probes does not exceed a first preset value, and the number of consecutive bases identical between the candidate probe and each of the full-length target sequences does not exceed a second preset value, the comparison result between the candidate probe and the full-length target sequences of the probes is determined to meet the preset values;
if the similarity between a candidate probe and at least one of the full-length target sequences of all the probes exceeds the first preset value, or the number of consecutive bases identical between the candidate probe and at least one of the full-length target sequences exceeds the second preset value, the comparison result between the candidate probe and the full-length target sequences of the probes is determined not to meet the preset values.
A third aspect of the present invention provides a storage medium having stored thereon a computer program which, when executed by a computer, implements a method of designing a specific probe for simultaneously detecting two or more long-chain RNAs.
A fourth aspect of the present invention provides a service terminal comprising a processor and a memory; the memory is used for storing a computer program, and the processor is used for executing the computer program stored in the memory, so that the service terminal, when executing the program, implements a method for designing specific probes for simultaneously detecting two or more long-chain RNAs.
As described above, the long-chain RNA specific probe, the design method thereof and the gene chip of the invention have the following beneficial effects:
1) by using specifically designed probes and a gene chip, and relying on the hybridization principle of the gene chip, the long-chain RNA to be detected can be captured by specific hybridization without removing linear RNA;
2) nearly 80,000 circular RNAs, 70,000 long non-coding RNAs and 20,000 mRNAs can be detected simultaneously, achieving simultaneous, rapid, high-throughput detection, in trace samples, of the expression abundance of regulatory RNA molecules such as circular RNA and long non-coding RNA together with protein-coding mRNA;
3) gene chip technology is better applied to long-chain RNA expression profiling, overcoming the inability of existing approaches to detect the various RNAs of the transcriptome simultaneously, and providing a technical method for studying transcriptional regulatory networks.
Drawings
FIG. 1 shows five prior-art models of circRNA formation, wherein panel A is the lariat-driven or exon-skipping circularization model; panel B is the intron-pairing-driven or direct backsplicing circularization model; panel C is the ciRNA formation model; panel D is the RBP-dependent circularization model; and panel E is the alternative circularization model.
FIG. 2 is a schematic diagram showing the six prior-art classes of long non-coding RNA, classified according to their positional relationship on the genome, wherein a is intergenic long non-coding RNA, b is intron long non-coding RNA, c is bidirectional long non-coding RNA, d is enhancer long non-coding RNA, e is sense long non-coding RNA, and f is antisense long non-coding RNA.
FIG. 3 shows a schematic diagram of six regulatory mechanisms of long non-coding RNA in the prior art.
FIG. 4 is a flow chart showing the overall design of the long-chain RNA specific probe of the present invention.
FIG. 5 shows a schematic diagram of the design of the long-chain RNA probe of the present invention.
FIG. 6 shows a schematic diagram of a system for designing long-chain RNA specific probes according to the present invention.
FIG. 7 shows a schematic diagram of a service terminal for designing a long-chain RNA specific probe according to the present invention.
FIG. 8 is a graph comparing the probe expression values of the chip of the present invention with the relative expression values from the quantitative PCR assay, used to verify probe sensitivity and specificity.
FIG. 9 is a statistical chart showing the signal values of various long-chain RNAs detected by the gene chip according to the embodiment of the present invention.
FIG. 10 is a scanning diagram of a gene chip for detecting the expression abundance of long-chain RNA according to an embodiment of the present invention.
Description of element numbers in FIGS. 6 and 7
1 design module
2 comparison module
3 screening module
4 iteration module
5 processor
6 memory
Detailed Description
In the present invention, the term "long-chain RNA" includes mRNA, long non-coding RNA, circRNA and the like.
The term "probe" refers to a DNA or RNA nucleic acid sequence of known sequence that is complementary to a gene of interest.
The term "specific probe" refers to a probe that has strong specificity and shows no mutual interference when a plurality of long-chain RNAs are detected simultaneously.
The term "full-length target sequence" refers to the full-length sequence of the RNA that contains the fragment recognized by the probe; the RNA sequence given in a sequence database is typically the sense-strand DNA sequence.
The term "similarity": DNA sequences are composed of the four bases A, T, C and G; the degree of similarity between the bases of two sequences is scored using any of various existing scoring schemes (such as a match scoring matrix), and the resulting score, i.e. the similarity, represents how similar the sequences are.
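One simple way to turn a match scoring matrix into a similarity value is sketched below. It assumes two already aligned sequences of equal length and a matrix that scores 1 for identical bases and 0 otherwise; the normalization to the range 0 to 1 and the function name are illustrative choices, not definitions from the patent.

```python
def matrix_similarity(a: str, b: str, match: int = 1, mismatch: int = 0) -> float:
    """Score two aligned, equal-length sequences with a match/mismatch matrix."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    bases = "ATCG"
    matrix = {(x, y): (match if x == y else mismatch) for x in bases for y in bases}
    # Characters outside A/T/C/G (for example N) are scored as mismatches.
    score = sum(matrix.get((x, y), mismatch) for x, y in zip(a.upper(), b.upper()))
    return score / (match * len(a))
```

With these defaults the value is the fraction of identical positions, so a first preset value expressed as a percentage can be compared with it directly.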
The term "longest transcript": a transcript is the mature mRNA that is formed by transcription of a gene and encodes a protein; because different splicing patterns occur when mRNA is formed, a gene may have multiple transcripts, and the transcript with the longest sequence is the longest transcript of the gene.
One embodiment of the present invention provides a method for designing a specific probe for simultaneously detecting two or more long-chain RNAs, comprising the following steps:
S100, designing probes for a target gene as candidate probes according to preset probe types, wherein the preset probe types are at least two selected from an mRNA probe, a circular RNA probe and a long non-coding RNA probe;
S200, comparing each candidate probe sequence with the full-length target sequences of all the probes;
S300, if the comparison result of a candidate probe meets the preset values, retaining the candidate probe as a specific probe;
and if the comparison result of a candidate probe does not meet the preset values, eliminating the candidate probe;
S400, if a target gene has no retained specific probe, redesigning probes for the target gene as candidate probes according to the preset probe types, and continuing from step S200.
If the specific probes retained for the target gene cover only some of the preset probe types, candidate probes of the remaining preset probe types are redesigned for the target gene, and execution continues from step S200 until the specific probes retained for the target gene cover all the preset probe types.
Specifically, in step S100, the preset probe types may be two types (mRNA probes and circular RNA probes; mRNA probes and long non-coding RNA probes; or circular RNA probes and long non-coding RNA probes) or all three types (mRNA probes, circular RNA probes and long non-coding RNA probes).
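The admissible choices of preset probe types, namely any two or all three of the types, can be enumerated mechanically. The short sketch below does this with the standard library; the constant and function names are hypothetical.

```python
from itertools import combinations
from typing import List, Sequence, Tuple

PROBE_TYPES = ("mRNA", "circular RNA", "long non-coding RNA")


def allowed_presets(types: Sequence[str] = PROBE_TYPES) -> List[Tuple[str, ...]]:
    """Every selection of at least two probe types."""
    return [
        combo
        for size in range(2, len(types) + 1)
        for combo in combinations(types, size)
    ]
```

For the three types named in the text this yields four presets: three pairs and the full triple.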
In the preferred embodiment shown in FIG. 4, three preset probe types are used, namely mRNA probes, circular RNA probes and long non-coding RNA probes.
Specifically, in step S300:
if the similarity between a candidate probe and each of the full-length target sequences of all the probes does not exceed a first preset value, and the number of consecutive bases identical between the candidate probe and each of the full-length target sequences does not exceed a second preset value, the comparison result between the candidate probe and the full-length target sequences of the probes is determined to meet the preset values;
if the similarity between a candidate probe and at least one of the full-length target sequences of all the probes exceeds the first preset value, or the number of consecutive bases identical between the candidate probe and at least one of the full-length target sequences exceeds the second preset value, the comparison result between the candidate probe and the full-length target sequences of the probes is determined not to meet the preset values.
The determination that a candidate probe does not meet the preset values may be made after the candidate probe has been compared with the full-length target sequences of all the probes; alternatively, the comparison may be terminated as soon as the first full-length target sequence is found whose similarity to the candidate probe exceeds the first preset value, or whose number of consecutive bases identical to the candidate probe exceeds the second preset value, at which point the candidate probe is determined not to meet the preset values.
Further, the first preset value and the second preset value may vary with the comparison program used, and at the initial design stage it is necessary to verify experimentally that the preset values are reasonable, so as to ensure the specificity of the probes screened under them. For example, when the comparison is performed with the Bedtools offline comparison program, the first preset value may be a similarity of not more than 75%, and the second preset value may be not more than 15 consecutive identical bases.
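The screening rule of step S300 can be illustrated with a minimal Python sketch. It is not the comparison program used in the embodiment: "similarity" is assumed here to mean the best ungapped identity of the probe against any equal-length window of a target, the probe's own target is assumed to be excluded from the comparison, and the function and parameter names are hypothetical. The loop returns at the first violating target, which corresponds to the early-termination variant described above.

```python
from typing import Dict


def longest_common_run(a: str, b: str) -> int:
    """Length of the longest stretch of consecutive identical bases shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0] * (len(b) + 1)
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def best_window_identity(probe: str, target: str) -> float:
    """Assumed similarity measure: best ungapped identity over all probe-length windows."""
    n = len(probe)
    if n == 0 or len(target) < n:
        return 0.0  # assumption: targets shorter than the probe are not scored
    best = 0
    for off in range(len(target) - n + 1):
        matches = sum(1 for p, t in zip(probe, target[off:off + n]) if p == t)
        best = max(best, matches)
    return best / n


def is_specific(probe: str, own_target_id: str, targets: Dict[str, str],
                max_identity: float = 0.75, max_run: int = 15) -> bool:
    """Apply the two preset values; stop at the first target that violates either one."""
    for target_id, target in targets.items():
        if target_id == own_target_id:
            continue  # assumption: a probe is not screened against its own target
        if best_window_identity(probe, target) > max_identity:
            return False
        if longest_common_run(probe, target) > max_run:
            return False
    return True
```

A production screen would more plausibly rely on an alignment tool; the sketch only makes the two thresholds (75% and 15 bases) concrete.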
In a preferred embodiment as shown in FIG. 4, the sequence used to design the mRNA probes in S100 is selected from the longest transcript sequence of mRNA for each gene of interest.
The sequence of the target gene can be confirmed using the prior art, for example, the target gene sequence can be derived from the GenBank database.
The longest transcript of the gene of interest can be identified using prior art techniques, e.g., the longest transcript can be derived from the Refseq database.
Further, as shown in FIG. 5, an mRNA probe whose binding site on the longest transcript of the target gene is located at the 3' end of that transcript is selected as the mRNA candidate probe for the target gene. The 3' end of the longest transcript generally refers to the fragment within 300 bases of the first base at the 3' end.
Reverse transcription of the sample to be detected starts from the 3' end, so placing the mRNA probe at the 3' end can improve detection sensitivity. When a fragment of a non-overlapping region between an antisense long non-coding RNA or a synonymous long non-coding RNA and an mRNA is identified, the longest transcript of the candidate gene is used as the full-length target sequence of the target gene, so that more accurate identification can be achieved.
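As an illustration of the 3'-end rule, the following Python sketch extracts the last 300 bases of a transcript and checks whether a binding site lies inside that window. The function names are hypothetical, and the binding site is assumed to be given as the transcript-sense sequence that the probe covers.

```python
def three_prime_region(transcript: str, window: int = 300) -> str:
    """Return the last `window` bases of the transcript (the whole transcript if shorter)."""
    return transcript[-window:] if len(transcript) > window else transcript


def site_in_three_prime_region(transcript: str, site: str, window: int = 300) -> bool:
    """True if the last occurrence of `site` starts within the final `window` bases."""
    start = transcript.rfind(site)
    if start < 0:
        return False  # the site does not occur in this transcript
    return start >= max(0, len(transcript) - window)
```

For a 610-base transcript, for example, a site starting at position 500 falls inside the window, whereas a site starting at position 0 does not.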
Further, the sequence used for designing the circular RNA probe in S100 is selected from: a fragment of the reverse spliced sequence of the circular RNA.
The reverse splicing sequence refers to the sequence formed when the 5' end of the splice-donor exon and the 3' end of the splice-acceptor exon of the linear sequence of the circular RNA are joined end to end to form a ring.
The circular RNA linear sequences can be obtained using existing techniques, for example, from sequences derived from multiple databases such as circBase and CIRCpedia, after redundancy is removed based on sequence and chromosomal location.
The redundant sequence can be removed by existing software, for example, Bedtools.
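The redundancy-removal step can be pictured with a short Python sketch. It is a plain-Python stand-in rather than the Bedtools workflow mentioned above, and the record layout (name, chromosome, start, end, strand, sequence) is an assumption made for illustration. A record is treated as redundant when either its chromosomal location or its sequence has already been seen.

```python
from typing import Iterable, List, NamedTuple


class CircRecord(NamedTuple):
    # Hypothetical record layout for an entry merged from several databases.
    name: str
    chrom: str
    start: int
    end: int
    strand: str
    sequence: str


def remove_redundancy(records: Iterable[CircRecord]) -> List[CircRecord]:
    """Keep the first record for each chromosomal location and each distinct sequence."""
    seen_loci = set()
    seen_seqs = set()
    kept = []
    for rec in records:
        locus = (rec.chrom, rec.start, rec.end, rec.strand)
        seq = rec.sequence.upper()
        if locus in seen_loci or seq in seen_seqs:
            continue  # duplicate by position or by sequence
        seen_loci.add(locus)
        seen_seqs.add(seq)
        kept.append(rec)
    return kept
```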
Further, as shown in FIG. 5, in S100, a circular RNA probe having a binding site to the reverse splice sequence of the target gene at the reverse splice site is selected as a circular RNA candidate probe for the target gene.
The reverse splice site is the only region in which the circular RNA differs from the corresponding linear RNA; it is generated by joining the 5' end of the splice-donor exon and the 3' end of the splice-acceptor exon end to end.
Owing to the special splicing pattern of circular RNA, namely reverse splicing (back-splicing), the reverse splicing sequence has a specific reverse splice site that linear RNA lacks. A circular RNA probe designed at the reverse splice site can therefore specifically detect circular RNA in a sample.
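A minimal Python sketch of this idea follows. It assumes that the junction sequence can be built by joining the tail of the circular RNA's linear sequence to its head, and that a probe counts as junction-spanning only if it overlaps each side by a minimum number of bases; the function names and the 10-base overhang are hypothetical choices, not values from the embodiment.

```python
def backsplice_junction(linear_seq: str, flank: int = 30) -> str:
    """Join the last `flank` bases of the linear sequence to its first `flank` bases."""
    if flank <= 0 or len(linear_seq) < 2 * flank:
        raise ValueError("linear sequence too short for the requested flank")
    return linear_seq[-flank:] + linear_seq[:flank]


def spans_junction(probe: str, junction: str, flank: int, min_overhang: int = 10) -> bool:
    """True if the probe lies on the junction and covers both sides of the splice site."""
    pos = junction.find(probe)
    if pos < 0:
        return False
    left = flank - pos                  # bases upstream of the splice site
    right = pos + len(probe) - flank    # bases downstream of the splice site
    return left >= min_overhang and right >= min_overhang
```

A probe that lies entirely on one side of the site would also hybridize to the linear transcript, which is why both overhangs are checked.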
Further, since the long non-coding RNAs have different types and the rules of different types of long non-coding RNAs in designing probes are different, it is necessary to design probes according to the types of the long non-coding RNAs.
In the preferred embodiment as shown in fig. 4, in S100,
the sequence used for designing the antisense long non-coding RNA or the synonymous long non-coding RNA probe is selected from the group consisting of: a fragment of a non-overlapping region of an antisense long non-coding RNA or a synonymous long non-coding RNA and mRNA;
the sequences used to design intron long non-coding RNA, intergenic long non-coding RNA, bidirectional long non-coding RNA, or enhancer long non-coding RNA probes are selected from: a fragment of the longest long noncoding RNA of each target gene.
The reason for choosing the above sequences for designing long non-coding RNA probes is to prevent the long non-coding RNA probes from detecting mRNA of the same gene.
The transcript sequences of long non-coding RNAs can be obtained from sequences derived from multiple databases such as Ensembl, NCBI, UCSC, GENCODE, and NONCODE, after redundancy is removed according to sequence and chromosomal position.
In S100, the probes can be designed using existing probe design software, for example, eArray, Agilent's professional gene chip probe design software.
Specifically, in the preferred embodiment shown in FIG. 4, the specific probe design method can be used to design probes that detect mRNA, circular RNA, and long non-coding RNA simultaneously. The design method comprises the following steps: combining the longest mRNA transcripts, the circular RNA reverse splicing sequences, and the sequences for designing long non-coding RNA probes obtained as described above into one file, importing the file into the eArray software, setting the corresponding parameters, such as GC proportion and annealing temperature, according to existing probe design principles, and then designing the probes.
Further, owing to the complexity of the genome, and to avoid hybridization of a probe to multiple target sequences, specificity screening with iterative detection is required. That is, each candidate probe is compared with the full-length target sequences of all the probes, and specificity screening is carried out according to whether the preset values are met, the preset values concerning the similarity between each candidate probe and the full-length target sequences of all the probes and the length of consecutive identical bases between the candidate probe and those sequences. The set specificity screening conditions are: the similarity is not more than 75%, and the length of consecutive identical bases shared with a full-length target sequence is not more than 15. A candidate probe that satisfies the conditions is regarded as highly specific and is retained as a specific probe; a candidate probe that does not satisfy them is regarded as poorly specific and is discarded. It is then judged whether any specific probe has been retained; if not, the process returns to probe redesign. If specific probes have been retained, it is further judged whether each target gene has an mRNA specific probe, a long non-coding RNA specific probe, and a circular RNA specific probe among them. If so, the specific probe design for the target gene is complete; if not, probes of the missing types continue to be designed.
Further, when designing a probe using software, one probe may be designed for each probe type of each target gene, or a plurality of probes may be designed as candidate probes. When designing a probe, if the probe satisfies the condition of specific screening, the probe is directly used as a specific probe. If the probe does not meet the conditions for specific screening, the probe is redesigned. When the probes are redesigned, the number of the probes generated by the software can be set to be a plurality, and other parameters are not changed. After the software randomly generates a plurality of probes under the condition of meeting the set parameters, the probes are compared with the full-length target sequences of all the probes (including the newly designed probes and the screened specific probes), the probes meeting the specific screening are saved as specific probes, and the probes not meeting the specific screening are not saved. Any of the specific probes may be selected at the time of the experiment. If no specific probe has been generated this time, the probe is redesigned again until a specific probe is generated. Based on the existing probe design software, when designing probes for the same sequence, probes are randomly output in a plurality of probes meeting the parameter requirements according to the number of the probes required to be output, so that when redesigning the probes, different probe sequences can be obtained even if the same sequence and design parameters are adopted.
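The redesign loop of steps S100 to S400 can be sketched in Python as follows. The two callables stand in for the probe design software and for the specificity screen, and both are hypothetical; the `max_rounds` cap is also an assumption added so that the sketch always terminates, whereas the text simply repeats the redesign until a specific probe is obtained.

```python
from typing import Callable, Dict, List, Sequence

PROBE_TYPES = ("mRNA", "circRNA", "lncRNA")


def design_until_specific(
    design_candidates: Callable[[str, int], List[str]],
    is_specific: Callable[[str], bool],
    probe_types: Sequence[str] = PROBE_TYPES,
    batch_size: int = 5,
    max_rounds: int = 20,
) -> Dict[str, List[str]]:
    """Redesign only the probe types that still lack a specific probe."""
    specific: Dict[str, List[str]] = {}
    for _ in range(max_rounds):
        missing = [t for t in probe_types if t not in specific]
        if not missing:
            break  # every preset probe type has at least one specific probe
        for ptype in missing:
            candidates = design_candidates(ptype, batch_size)    # S100
            passed = [c for c in candidates if is_specific(c)]   # S200 and S300
            if passed:
                specific[ptype] = passed                         # retained probes
    return specific
```

Any of the probes stored under a given type could then be chosen for the experiment, as described above.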
Further, the length of the specific probe for mRNA, circular RNA or long non-coding RNA is 50-70 nt.
For example, the length may be 55 nt, 60 nt, 65 nt, or 70 nt.
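A basic parameter check on a candidate probe might look like the Python sketch below. The 50-70 nt length range comes from the text; the GC-fraction window of 0.30 to 0.70 is purely an illustrative assumption, since the embodiment leaves GC proportion and annealing temperature to the design software. The example sequence is SEQ ID NO:1.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def passes_basic_parameters(seq: str, min_len: int = 50, max_len: int = 70,
                            gc_range=(0.30, 0.70)) -> bool:
    """Length check from the text plus a hypothetical GC-fraction window."""
    if not (min_len <= len(seq) <= max_len):
        return False
    low, high = gc_range
    return low <= gc_fraction(seq) <= high


# SEQ ID NO:1, the ESR2 mRNA candidate probe given in this document
SEQ_ID_1 = "ATAAAAGAGTTTTGGGAATACACTGAGCTTTGAGTGAAAGAAGCTGCAGTGGCCTCCCTG"
print(len(SEQ_ID_1), passes_basic_parameters(SEQ_ID_1))
```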
Furthermore, more than two of the specific mRNA probes, the specific circular RNA probes or the specific long-chain non-coding RNA probes designed by the specific probe design method are integrated into a gene chip, and the gene chip can detect any two or three long-chain RNAs of the mRNA, the circular RNA or the long-chain non-coding RNA at the same time.
In a preferred embodiment, as shown in FIG. 6, a system for designing long-chain RNA specific probes is provided, which can be used to design specific probes for simultaneous detection of at least two long-chain RNAs among mRNAs, circular RNAs, or long non-coding RNAs.
The system comprises:
a design module 1, configured to design probes for a target gene as candidate probes according to preset probe types, where the preset probe types are selected from at least two of: an mRNA probe, a circular RNA probe, or a long non-coding RNA probe;
the comparison module 2 is used for comparing the candidate probe sequences with the full-length target sequences of all the probes;
the screening module 3 is used for judging whether the comparison results of the candidate probes and the full-length target sequences of all the probes conform to a preset value; if so, the candidate probe is retained as a specific probe; if not, the candidate probe is eliminated;
the iteration module 4 is used for judging whether the target gene has a retained specific probe; if not, the probes of the target gene are redesigned as candidate probes according to the preset probe types, and the comparison module 2 is executed again.
If the specific probes reserved for each target gene only include part of the types in the preset probe types, redesigning the other preset probe types of the target gene as candidate probes, and continuing to execute the comparison module 2 until all the specific probes reserved for the target gene include the probes of the preset types.
Specifically, in the design module 1, the preset probe types may be two types, namely mRNA probes and circular RNA probes, mRNA probes and long-chain non-coding RNA probes, or circular RNA probes and long-chain non-coding RNA probes; or all three types, namely mRNA probes, circular RNA probes, and long-chain non-coding RNA probes.
Further, the sequence used to design the mRNA probe is selected from the longest transcript sequence of mRNA for each target gene.
Further, a fragment whose binding site to the longest transcript of the target gene is located at the 3' end of the longest transcript is selected as an mRNA candidate probe for the target gene.
Further, the sequence used to design the circular RNA probe is selected from the group consisting of: a fragment selected from the reverse splice sequence of the circular RNA.
Further, a circular RNA sequence having a binding site for a reverse splice sequence of the target gene at the reverse splice site is selected as a circular RNA candidate probe for the target gene.
Further, the sequence used to design the anti-sense long non-coding RNA or synonymous long non-coding RNA probe is selected from the group consisting of: a fragment of the target gene that binds to a non-overlapping region of an antisense long non-coding RNA or a synonymous long non-coding RNA and an mRNA;
the sequences used to design intron long non-coding RNA, intergenic long non-coding RNA, bidirectional long non-coding RNA, or enhancer long non-coding RNA probes are selected from: the fragment of each target gene that binds to the longest sequence of the long non-coding RNA.
Specifically, in the screening module 3,
and if the similarity between a candidate probe and the full-length target sequences of all the probes does not exceed a first preset value and the base length continuously same with the full-length target sequences of all the probes does not exceed a second preset value, determining that the comparison result of the candidate probe and the full-length target sequences of the probes conforms to a preset value.
And if the similarity between a candidate probe and at least one of the full-length target sequences of all the probes exceeds a first preset value or the continuous identical base length between the candidate probe and at least one of the full-length target sequences of all the probes exceeds a second preset value, determining that the comparison result between the candidate probe and the full-length target sequences of the probes does not meet the preset value.
Further, the first preset value and the second preset value may be changed according to different comparison programs, and an experiment is required to verify whether the preset value is set reasonably at an initial design stage, so as to ensure the specificity of the probe screened under the preset value, for example, the first preset value may be not more than 75%, and the second preset value may be not more than 15.
Candidate probes are subjected to a screening module and an iteration module in order to increase specificity from probe to probe and from probe to RNA of interest. The specific probe obtained after screening and iteration can ensure that the circular RNA detection probe and the detection probe of long-chain non-coding RNA and mRNA have high specificity and sensitivity at the same time.
It should be noted that the division of the modules of the above apparatus is only a logical division, and the actual implementation may be wholly or partially integrated into one physical entity, or may be physically separated. And these modules can be realized in the form of software called by processing element; or may be implemented entirely in hardware; and part of the modules can be realized in the form of calling software by the processing element, and part of the modules can be realized in the form of hardware. For example, the x module may be a processing element that is set up separately, or may be implemented by being integrated in a chip of the apparatus, or may be stored in a memory of the apparatus in the form of program code, and the function of the x module may be called and executed by a processing element of the apparatus. Other modules are implemented similarly. In addition, all or part of the modules can be integrated together or can be independently realized. The processing element described herein may be an integrated circuit having signal processing capabilities. In implementation, each step of the above method or each module above may be implemented by an integrated logic circuit of hardware in a processor element or an instruction in the form of software.
Yet another embodiment of the present invention provides a storage medium having stored thereon a computer program which, when executed by a computer, implements a method of designing a specific probe for simultaneously detecting two or more long-chain RNAs.
Further, the storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic disk, U-disk, memory card, or optical disk.
As shown in fig. 8, still another embodiment of the present invention provides a service terminal including a processor 5 and a memory 6; the memory 6 is used for storing computer programs, and the processor 5 is used for executing the computer programs stored in the memory 6, so that the service terminal can realize a method for designing specific probes for detecting more than two long-chain RNAs simultaneously when being executed.
The memory 6 is used for storing a computer program. Preferably, the memory 6 comprises: various media that can store program codes, such as ROM, RAM, magnetic disk, U-disk, memory card, or optical disk.
The processor 5 is connected to the memory 6 and configured to execute the computer program stored in the memory 6, so that the service terminal executes the design method described above.
Preferably, the processor 5 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
Hereinafter, using ESR2 gene as an example, specific probes for simultaneously detecting mRNA, long non-coding RNA, and circular RNA of ESR2 gene were designed, i.e., three types of predetermined probes were mRNA-specific probes, long non-coding RNA-specific probes, and circular RNA-specific probes.
S100: The ESR2 gene sequence (NM_001214902) was found in GenBank, and the longest transcript of the ESR2 gene (accession number NM_001214902) was found in the RefSeq database. An mRNA probe was designed from the ESR2 gene sequence as a candidate probe, with the following sequence:
ATAAAAGAGTTTTGGGAATACACTGAGCTTTGAGTGAAAGAAGCTGCAGTGGCCTCCCTG(SEQ ID NO:1)
The long non-coding RNA ENST00000359491, which is transcribed in the same direction as the ESR2 gene and partially overlaps its exons, was found in the Ensembl database, and a probe for it was designed as a candidate probe, with the following sequence:
ATACCTGAGCAAGTGAAATTAAGAAGGGAATTGAAGCAAATATTCCTGACATCCAAGTGG(SEQ ID NO:2)
The circular RNA hsa_circ_0102409 derived from the ESR2 gene was found in the circBase database; it is formed by reverse splicing of exon 7 through exon 12 of the ESR2 gene, joined end to end. Based on the 5' end sequence (SEQ ID NO:3) and the 3' end sequence (SEQ ID NO:4) of the circular RNA, a probe covering the splice site was designed as a candidate probe (SEQ ID NO:5). The sequences are as follows:
GGATGAGGGGAAATGCGTAGAAGGAATTCTGGAAATCTTTGACATGCTCCTGGCAACTACTTCAAGGTTTCGAGAGTTAAAACTCCAACACAAAGAATATCTCTGTGTCAAGGCCATGATCCTGCTCAATTCCA(SEQ ID NO:3)
CCATTATACTTGCCCACGAATCTTTGAGAACATTATAATGACCTTTGTGCCTCTTCTTGCAAGGTGTTTTCTCAGCTGTTATCTCAAGACATGGATATAAAAAACTCACCATCTAGCCTTAATTCTCCTTCCTCCTACAACTGCAGTCAATCCATCTTACCCCTGGAGCACGGCTCCATATACATACCTTCCTCCTATGTAGACAGCCACCATGAATATCCAGCCATGACATTCTATAGCCCTGCTGTGATGAATTACAGCATTCCCAGCAATGTCACTAACTTGGAAGGTGGGCCTGGTCGGCAGACCACAAGCCCAAATGTGTTGTGGCCAACACCTGGGCACCTTTCTCCTTTAGTGGTCCATCGCCAGTTATCACATCTGTATGCGGAACCTCAAAAGAGTCCCTGGTGTGAAGCAAGATCGCTAGAACACACCTTACCTGTAAACAG(SEQ ID NO:4)
GCCATGATCCTGCTCAATTCCACCATTATACTTGCCCACGAATCTTTGAGAACATTATAA(SEQ ID NO:5)
The probe can specifically detect the circular RNA hsa_circ_0102409.
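The relationship between the three sequences can be checked with a short Python sketch. Only the last 28 bases of SEQ ID NO:3 and the first 43 bases of SEQ ID NO:4 are copied here, to keep the example short; the helper name is hypothetical. The sketch looks for the position at which the probe switches from the end of SEQ ID NO:3 to the start of SEQ ID NO:4.

```python
from typing import Optional

SEQ3_TAIL = "GTCAAGGCCATGATCCTGCTCAATTCCA"                  # last 28 nt of SEQ ID NO:3
SEQ4_HEAD = "CCATTATACTTGCCCACGAATCTTTGAGAACATTATAATGACC"   # first 43 nt of SEQ ID NO:4
PROBE = "GCCATGATCCTGCTCAATTCCACCATTATACTTGCCCACGAATCTTTGAGAACATTATAA"  # SEQ ID NO:5


def junction_split(probe: str, upstream: str, downstream: str) -> Optional[int]:
    """Return k such that probe[:k] ends `upstream` and probe[k:] starts `downstream`."""
    for k in range(1, len(probe)):
        if upstream.endswith(probe[:k]) and downstream.startswith(probe[k:]):
            return k
    return None


split = junction_split(PROBE, SEQ3_TAIL, SEQ4_HEAD)
print(split, len(PROBE) - split)
```

Read this way, the 60 nt probe consists of the last 22 bases of SEQ ID NO:3 followed by the first 38 bases of SEQ ID NO:4, so it straddles the splice site.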
S200: and (3) aligning the candidate probes with the full-length target sequences of all the probes.
S300: the comparison results all accord with the preset value and are reserved as specific probes.
S400: the specific probe comprises all the preset probe types, and the specific probe design of the gene is completed.
The specific probe design method for detecting mRNA, circular RNA or long non-coding RNA of other target genes in the gene chip is the same as that of ESR2 gene.
Verification of specificity and sensitivity of specific probe and synthesis of chip
In order to verify the sensitivity and specificity of the probes after screening and iterative detection, the THBS1 gene, the intron-derived long-chain non-coding RNA ENST00000478845, and the circular RNA hsa_circ_0034426, formed by circularization of exon 2 through exon 7, were selected for quantitative PCR verification. The results are shown in FIG. 7: the fold changes of the three long-chain RNAs of the THBS1 gene relative to the control group on the chip are consistent with the trend of expression change measured by quantitative PCR.
The quantitative PCR experiment steps are as follows:
first Strand cDNA Synthesis
1. RNA was removed from a -80 ℃ freezer and thawed at 4 ℃, and the reaction system was prepared in a 0.2 ml PCR tube as follows:
2. The PCR tube was incubated at 37 ℃ for 15 min, denatured at 98 ℃ for 5 min, and held at 4 ℃.
SYBR Green qPCR
1. The reaction mixture (384-well plate) was prepared in a 1.5 mL centrifuge tube:
2. The PCR tube was placed in a PCR instrument for reaction: incubation at 50 ℃ for 2 min, then at 95 ℃ for 10 min; then 40 cycles of 95 ℃ for 15 seconds and 60 ℃ for 1 min; finally, a melting curve step was added.
(III) the primer sequences are as follows:
THBS1
an upstream primer: GAACGGGACAACTGCCAGTA (SEQ ID NO:6)
A downstream primer: ACCTACAGCGAGTCCAGGAT (SEQ ID NO:7)
ENST00000478845
An upstream primer: TCGCGCATTCTTGGAAGTCT (SEQ ID NO:8)
A downstream primer: TGCCAGAGGGTGAAAAGCAA (SEQ ID NO:9)
hsa_circ_0034426
An upstream primer: CTGCAAAAAGGTGTCCTGCC (SEQ ID NO:10)
A downstream primer: TCAGGAACAGGACGCCTAGT (SEQ ID NO:11)
After verification was completed, Agilent was commissioned to custom-manufacture the high-throughput chip for detecting long-chain RNA gene expression abundance, using inkjet-printing in-situ chemical synthesis technology under strict quality control.
Detection of expression abundance of long-chain RNA
The experimental operation comprises the following specific steps:
1. extracting and purifying total RNA of sample
Total RNA was extracted from the samples with Trizol and then purified with a QIAGEN RNeasy Kit (cat. No. 74106); the detailed procedure is as follows (see RNeasy Mini Protocol):
1) Total RNA (100 μg or less) was dissolved in 100 μl of RNase-free water, and 350 μl of buffer RLT was added and mixed well.
2) Add 250 μl of absolute ethanol and mix well by pipetting.
3) A total of 700 μl of the total RNA-containing solution was transferred to an RNeasy mini column seated in a 2 ml centrifuge tube, centrifuged at 13200 rpm for 15 seconds, and the filtrate was discarded.
4) Mu.l of buffer RW1 was pipetted into an RNeasy mini column, centrifuged at 13200rpm for 15 seconds, and the filtrate was discarded.
5) Add 10 μl of DNase I to 70 μl of buffer RDD, mix well, add to the column, and let stand at room temperature for 15 min.
6) Mu.l of buffer RW1 was pipetted into an RNeasy mini column, centrifuged at 13200rpm for 15 seconds, and the filtrate was discarded.
7) Pipette 500 μl of buffer RPE into the RNeasy mini column, centrifuge at 13200 rpm for 15 seconds, and discard the filtrate; repeat this step once.
8) Replace the collection tube with a new one and centrifuge at 13200 rpm for 2 min, then transfer the column to the elution tube.
9) The RNeasy mini column was transferred into the collection tube.
10) Add 30 μl of RNase-free water, let stand for 1 min, and centrifuge at 13200 rpm for 1 min.
11) Transfer the 30 μl of sample in the elution tube back onto the column, let stand for 1 min, and centrifuge at 13200 rpm for 1 min.
12) The RNA concentration and A260/280 were determined from NanoDrop (NanoDrop ND-1000UV-VIS spectrophotometer).
2. Linear amplification of RNA and labeling of fluorescent cy3
1) A single marker Spike-In (RNA Spike-In Kit, One-Color, Agilent5188-5282) was prepared. Spike-in was diluted with dilution buffer according to different RNA starting amounts as shown in Table 1:
TABLE 1 RNA spike-in
2) Reverse transcription: reaction solutions having the compositions shown in table 2 were prepared:
TABLE 2 reverse transcription reaction solution composition
10-200ng of total RNA | 1.5μl |
Diluted one-dye spike in | 2.0μl |
T7 Promoter Primer | 0.8μl |
Nuclease-free water (white cap) | 1.0μl |
Total volume | 5.3μl |
Incubate in a PCR instrument (MJ PTC-100) at 65 ℃ for 10 min, then place in an ice bath for 5 min. Meanwhile, preheat the 5X First Strand Buffer at 80 ℃ for 3 min and keep it at room temperature until use. Prepare a reverse transcription mixed solution with the composition shown in Table 3:
TABLE 3 reverse transcription Mixed solution composition
5X First Strand Buffer | 2.0μl |
0.1M DTT | 1.0μl |
10mM dNTP mix | 0.5μl |
AffinityScript RNase Block Mix | 1.2μl |
Total volume | 4.7μl |
The above 4.7 μl of reverse transcription mixed solution was added to the denatured RNA in the ice bath, mixed well, centrifuged briefly, and incubated in the PCR instrument. Reaction conditions: 40 ℃ for 2 hours; inactivation at 70 ℃ for 15 minutes; hold at 4 ℃ for 5 minutes.
3) Fluorescent markers
A mixed solution of fluorescence Labeling (Low Input Quick Amp Labeling Kit, One-Color, Agilent 5190-:
TABLE 4 fluorescent labeling Mixed solution composition
Nuclease-free water | 0.75μl |
5X transcription buffer | 3.2μl |
0.1M DTT | 0.6μl |
NTP mix | 1.0μl |
T7 RNA polymerase mixture | 0.21μl |
Cy3-CTP | 0.24μl |
Total volume | 6.0μl |
Add the above 6.0 μl of fluorescent labeling mixed solution, mix well, centrifuge briefly, and incubate in the PCR instrument to obtain the fluorescently labeled product. Reaction conditions: 40 ℃ for 2 hours; hold at 4 ℃ for 5 minutes.
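When several samples are labeled in parallel, the per-reaction volumes of Table 4 are usually scaled into a master mix. The Python sketch below shows the arithmetic; the component names follow Table 4 (reading the first two rows as 0.75 μl of water and 5X transcription buffer), while the function name and the 10% pipetting overage are assumptions rather than part of the described protocol.

```python
import math

# Per-reaction volumes in microlitres, as listed in Table 4.
LABELING_MIX_UL = {
    "Nuclease-free water": 0.75,
    "5X transcription buffer": 3.2,
    "0.1M DTT": 0.6,
    "NTP mix": 1.0,
    "T7 RNA polymerase mixture": 0.21,
    "Cy3-CTP": 0.24,
}


def master_mix(per_reaction: dict, n_reactions: int, overage: float = 0.10) -> dict:
    """Scale each component for n reactions plus a fractional overage (assumed 10%)."""
    factor = n_reactions * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, vol in per_reaction.items()}


# The listed components add up to the 6.0 microlitre total given in Table 4.
assert math.isclose(sum(LABELING_MIX_UL.values()), 6.0)
print(master_mix(LABELING_MIX_UL, n_reactions=8))
```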
4) Fluorescent labeling product purification
A) Add 84 μl of nuclease-free water to a total volume of 100 μl.
B) Add 350 μl of RLT and mix well.
C) Add 250 μl of absolute ethanol and mix well without centrifugation.
D) Mu.l of mix was transferred to the column. 13000rpm, centrifuge at 4 ℃ for 30 sec. The flow-through was discarded.
E) Add 500 μl of RPE and centrifuge at 13000 rpm, 4 ℃ for 30 seconds. The flow-through was discarded.
F) Add another 500 μl of RPE and centrifuge at 13000 rpm, 4 ℃ for 60 seconds. The flow-through was discarded.
G) Replace the collection tube, centrifuge the empty column at 13000 rpm, 4 ℃ for 30 seconds, and transfer the column to the elution tube.
H) Add 30 μl of nuclease-free water, let stand for 1 min, and centrifuge at 13000 rpm, 4 ℃ for 30 seconds.
I) Transfer the 30 μl of sample in the elution tube back onto the column, let stand for 1 min, and centrifuge at 13000 rpm, 4 ℃ for 30 seconds.
J) RNA concentration, Cy3 concentration, A260/280 were measured using NanoDrop.
The requirements for the amount of probe used for purification of the fluorescently labeled product are shown in Table 5:
TABLE 5 amount of probe used for purification of fluorescently labeled product
1 × chip | cRNA>5μg | Cy3>6pmol/μg |
2 × chip | cRNA>3.75μg | Cy3>6pmol/μg |
4 × chip | cRNA>1.65μg | Cy3>6pmol/μg |
8 × chip | cRNA>0.825μg | Cy3>6pmol/μg |
3. Hybridization of Gene chip
The purified fluorescence labeling product and the probe on the circular RNA gene chip are hybridized by utilizing the base complementary hybridization principle. The Hybridization Kit used was Gene Expression Hybridization Kit (Agilent 5188-5242). The method comprises the following specific steps:
1) The fragmentation mix solution was prepared as in Table 6:
TABLE 6 Fragmentation mix solution composition
Component | 1x | 2x | 4x | 8x |
Cy3-cRNA | 5μg | 3.75μg | 1.65μg | 600ng |
10X blocking agent | 50μL | 25μL | 11μL | 5μL |
Nuclease-free water | Make up to 240μL | Make up to 120μL | Make up to 52.8μL | Make up to 24μL |
25X fragmentation buffer | 10μL | 5μL | 2.2μL | 1μL |
Total volume | 250μL | 125μL | 55μL | 25μL |
2) Preserving the temperature at 60 ℃ for 30min, then carrying out ice bath for 1min, and centrifuging for a short time.
3) Add an equal volume of 2X GEx hybridization buffer HI-RPM as shown in Table 7 and mix well.
TABLE 7 Hybridization mix solution composition
Component | 1x | 2x | 4x | 8x |
cRNA from the fragmentation mix solution | 250μL | 125μL | 55μL | 25μL |
2X GEx hybridization buffer HI-RPM | 250μL | 125μL | 55μL | 25μL |
4)13000rpm, centrifuged for 1min and then placed on ice.
5) The hybridization chamber (Agilent G2534A) was placed on a horizontal table top, a coverslip with gasket was placed, and samples were added in the volumes shown in Table 8:
TABLE 8 hybridization sample addition volume
Item | 1x | 2x | 4x | 8x |
Preparation volume | 500μL | 250μL | 110μL | 50μL |
Hybridization volume | 490μL | 240μL | 100μL | 40μL |
6) The gene chip with the "Agilent" side down was mounted on a coverslip and the hybridization chamber was assembled quickly and hybridized for 17h in a hybridization oven (Agilent G2545A) at 65 ℃ and 10 rpm.
4. Washing and scanning gene chip
1) 2 ml of 10% Triton X-102 was added to wash solution 1 and wash solution 2, and wash solution 2 was preheated at 37 ℃ overnight.
2) The gene chip which has completed the hybridization was taken out of the hybridization oven, the hybridization chamber was disassembled, and the gene chip was washed according to steps 1 to 3 in Table 9:
TABLE 9 Gene chip washing procedure
Step | Wash solution | Temperature | Washing time |
Chip disassembly | | Room temperature | - |
 | | Room temperature | |
Wash 2 | | 37℃ | 1min |
3) The washed gene chip was loaded into a slide holder and scanned by a scanner (Agilent Microarray ScannerG2565 CA). The scan parameters are shown in table 10:
TABLE 10 Gene chip Scan parameters
5. Data analysis
The original data are normalized by limma package in R software, and the expression abundance of long-chain RNA and the RNA with differential expression are analyzed by using Fold-change (expression difference multiple) and T test (Student's T-test) statistical methods.
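The two statistics named above can be illustrated with a standard-library Python sketch. It is a stand-in for illustration, not the limma workflow in R used in the embodiment: the inputs are assumed to be already normalized log2 intensities, the t statistic is the classical pooled-variance Student's t, and the function names and example values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def log2_fold_change(experimental: Sequence[float], control: Sequence[float]) -> float:
    """Difference of mean log2 intensities (experimental minus control)."""
    return mean(experimental) - mean(control)


def student_t(experimental: Sequence[float], control: Sequence[float]) -> float:
    """Two-sample Student's t statistic with pooled variance."""
    n1, n2 = len(experimental), len(control)
    pooled = ((n1 - 1) * variance(experimental) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    return (mean(experimental) - mean(control)) / sqrt(pooled * (1.0 / n1 + 1.0 / n2))


def is_two_fold_changed(experimental: Sequence[float], control: Sequence[float]) -> bool:
    """A 2-fold change corresponds to an absolute log2 fold change of at least 1."""
    return abs(log2_fold_change(experimental, control)) >= 1.0


tumor = [10.0, 11.0, 12.0]      # hypothetical log2 intensities, experimental group
adjacent = [8.0, 9.0, 10.0]     # hypothetical log2 intensities, control group
print(2 ** log2_fold_change(tumor, adjacent), student_t(tumor, adjacent))
```

Turning the t statistic into a p-value requires the t distribution with n1 + n2 - 2 degrees of freedom, which is left to a statistics package.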
The abundances of expressed mrnas, long non-coding RNAs, and circular RNAs in the experimental group and the control group are shown in fig. 9 and 10.
Taking prostate cancer tumor tissue (experimental group) and paracancerous tissue (control group) as an example, gene chip detection and data analysis identified 328 mRNAs up-regulated by at least 2-fold and 892 mRNAs down-regulated by at least 2-fold, for a total of 1220 mRNAs with 2-fold differential expression. There were 447 long-chain non-coding RNAs up-regulated by at least 2-fold and 840 down-regulated by at least 2-fold, for a total of 1287 long-chain non-coding RNAs with 2-fold differential expression. There were 508 circular RNAs up-regulated by at least 2-fold and 1706 down-regulated by at least 2-fold, for a total of 2214 circular RNAs with 2-fold differential expression.
In conclusion, the present invention effectively overcomes various disadvantages of the prior art and has high industrial utilization value.
The foregoing embodiments are merely illustrative of the principles and utilities of the present invention and are not intended to limit the invention. Any person skilled in the art can modify or change the above-mentioned embodiments without departing from the spirit and scope of the present invention. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical spirit of the present invention be covered by the claims of the present invention.
Sequence listing
<110> Shanghai biochip Co., Ltd
<120> design method of long-chain RNA specific probe
<160>11
<170>SIPOSequenceListing 1.0
<210>1
<211>60
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>1
ataaaagagt tttgggaata cactgagctt tgagtgaaag aagctgcagt ggcctccctg 60
<210>2
<211>60
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>2
atacctgagc aagtgaaatt aagaagggaa ttgaagcaaa tattcctgac atccaagtgg 60
<210>3
<211>134
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>3
ggatgagggg aaatgcgtag aaggaattct ggaaatcttt gacatgctcc tggcaactac 60
ttcaaggttt cgagagttaa aactccaaca caaagaatat ctctgtgtca aggccatgat 120
cctgctcaat tcca 134
<210>4
<211>452
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>4
ccattatact tgcccacgaa tctttgagaa cattataatg acctttgtgc ctcttcttgc 60
aaggtgtttt ctcagctgtt atctcaagac atggatataa aaaactcacc atctagcctt 120
aattctcctt cctcctacaa ctgcagtcaa tccatcttac ccctggagca cggctccata 180
tacatacctt cctcctatgt agacagccac catgaatatc cagccatgac attctatagc 240
cctgctgtga tgaattacag cattcccagc aatgtcacta acttggaagg tgggcctggt 300
cggcagacca caagcccaaa tgtgttgtgg ccaacacctg ggcacctttc tcctttagtg 360
gtccatcgcc agttatcaca tctgtatgcg gaacctcaaa agagtccctg gtgtgaagca 420
agatcgctag aacacacctt acctgtaaac ag 452
<210>5
<211>60
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>5
gccatgatcc tgctcaattc caccattata cttgcccacg aatctttgag aacattataa 60
<210>6
<211>20
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>6
gaacgggaca actgccagta 20
<210>7
<211>20
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>7
acctacagcg agtccaggat 20
<210>8
<211>20
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>8
tcgcgcattc ttggaagtct 20
<210>9
<211>20
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>9
tgccagaggg tgaaaagcaa 20
<210>10
<211>20
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>10
ctgcaaaaag gtgtcctgcc 20
<210>11
<211>20
<212>DNA
<213> Artificial Sequence (Artificial Sequence)
<400>11
tcaggaacag gacgcctagt 20
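Comparing the listed sequences shows that SEQ ID NO: 5 equals the last 22 bases of SEQ ID NO: 3 followed by the first 38 bases of SEQ ID NO: 4, which is the shape expected of a 60-base probe spanning a junction between the two sequences. The short Python sketch below checks this. The 22/38 split is read off the listed sequences and is not stated in the listing itself, and `clean` and `junction_probe` are hypothetical helper names.

```python
def clean(listing_text: str) -> str:
    """Remove the grouping spaces used in the sequence listing."""
    return listing_text.replace(" ", "").lower()


# SEQ ID NO: 3 (134 bases), copied from the listing.
SEQ3 = clean(
    "ggatgagggg aaatgcgtag aaggaattct ggaaatcttt gacatgctcc tggcaactac "
    "ttcaaggttt cgagagttaa aactccaaca caaagaatat ctctgtgtca aggccatgat "
    "cctgctcaat tcca"
)
# First 60 bases of SEQ ID NO: 4, copied from the listing.
SEQ4_HEAD = clean("ccattatact tgcccacgaa tctttgagaa cattataatg acctttgtgc ctcttcttgc")
# SEQ ID NO: 5 (60 bases), copied from the listing.
SEQ5 = clean("gccatgatcc tgctcaattc caccattata cttgcccacg aatctttgag aacattataa")


def junction_probe(left_seq: str, right_seq: str, left_len: int, right_len: int) -> str:
    """Join the last `left_len` bases of one sequence to the first `right_len` of another."""
    if left_len <= 0 or right_len <= 0:
        raise ValueError("both flank lengths must be positive")
    return left_seq[-left_len:] + right_seq[:right_len]


probe = junction_probe(SEQ3, SEQ4_HEAD, left_len=22, right_len=38)
print(probe == SEQ5)
```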
Claims (15)
1. A method for designing specific probes for the simultaneous detection of two or more types of long-chain RNA, comprising the following steps:
S100, designing a long-chain RNA probe of a target gene as a candidate probe according to preset probe types, wherein the preset probe types are selected from at least two of an mRNA probe, a circular RNA probe, or a long non-coding RNA probe;
S200, comparing the candidate probe sequence with the full-length target sequences of all the probes;
S300, if the comparison result of a candidate probe meets a preset value, retaining the candidate probe as a specific probe;
if the comparison result of a candidate probe does not meet the preset value, eliminating the candidate probe;
S400, if a target gene has no retained specific probe, redesigning a long-chain RNA probe of the target gene as a candidate probe according to the preset probe types, and continuing from S200;
if the specific probes retained for the target gene cover only some of the preset probe types, redesigning probes of the remaining preset probe types for the target gene as candidate probes, and continuing from S200 until the specific probes retained for the target gene include probes of all the preset types.
2. The method of claim 1, wherein the sequence used to design the mRNA probe in S100 is selected from the longest mRNA transcript sequence of each target gene.
3. The method of claim 1, wherein the sequence used to design the circular RNA probe in S100 is selected from a fragment of a reverse splice sequence of a circular RNA.
4. The method of claim 3, wherein the circular RNA probe in S100 is selected from circular RNA probes whose binding site on the reverse splice sequence of the target gene is located at the reverse splice site.
5. The method of claim 1, wherein in S100 the sequence used to design an antisense long non-coding RNA probe or a synonymous long non-coding RNA probe is selected from: a fragment of the region of the antisense long non-coding RNA or synonymous long non-coding RNA that does not overlap with mRNA; and the sequence used to design an intron long non-coding RNA, intergenic long non-coding RNA, bidirectional long non-coding RNA, or enhancer long non-coding RNA probe is selected from: a fragment of the longest long non-coding RNA of each target gene.
6. The method of claim 1, wherein in S300:
the preset value is met when: the similarity between the candidate probe and the full-length target sequences of all the probes does not exceed a first preset value, and the length of consecutive identical bases between the candidate probe and the full-length target sequences of all the probes does not exceed a second preset value;
the preset value is not met when: the similarity between the candidate probe and at least one of the full-length target sequences of the probes exceeds the first preset value, or the length of consecutive identical bases between the candidate probe and at least one of the full-length target sequences of the probes exceeds the second preset value.
7. A system for designing long-chain RNA-specific probes, wherein the system is used to design specific probes for the simultaneous detection of at least two long-chain RNAs among mRNA, circular RNA, and long non-coding RNA.
8. The system of claim 7, wherein the system comprises:
a design module for designing a probe of a target gene as a candidate probe according to preset probe types, wherein the preset probe types are selected from at least two of an mRNA probe, a circular RNA probe, or a long non-coding RNA probe;
a comparison module for comparing the candidate probe sequences with the full-length target sequences of all the probes;
a screening module for judging whether the comparison result between a candidate probe and the full-length target sequences of all the probes meets a preset value, and if so, retaining the candidate probe as a specific probe; if not, eliminating the candidate probe;
an iteration module for judging whether each target gene has a retained specific probe, wherein:
if a target gene has no retained specific probe, a probe of the target gene is redesigned as a candidate probe according to the preset probe types, and execution continues with the comparison module (2);
if the specific probes retained for the target gene cover only some of the preset probe types, probes of the remaining preset probe types are redesigned for the target gene as candidate probes, and execution continues with the comparison module (2) until the specific probes retained for the target gene include probes of all the preset types.
9. The system of claim 7, wherein the sequence used to design the mRNA probe is selected from the longest mRNA transcript sequence of each target gene.
10. The system of claim 7, wherein the sequence used to design the circular RNA probe is selected from: a fragment of a reverse splice sequence of a circular RNA.
11. The system of claim 10, wherein the circular RNA probe is selected from circular RNA probes whose binding site on the reverse splice sequence of the target gene is located at the reverse splice site.
12. The system of claim 7, wherein the sequence used to design an antisense long non-coding RNA probe or a synonymous long non-coding RNA probe is selected from: a fragment of the region of the antisense long non-coding RNA or synonymous long non-coding RNA that does not overlap with mRNA; and the sequence used to design an intron long non-coding RNA, intergenic long non-coding RNA, bidirectional long non-coding RNA, or enhancer long non-coding RNA probe is selected from: a fragment of the longest long non-coding RNA of each target gene.
13. The system of claim 7, wherein, in the screening module (3):
if the similarity between a candidate probe and the full-length target sequences of all the probes does not exceed a first preset value, and the length of consecutive identical bases between the candidate probe and the full-length target sequences of all the probes does not exceed a second preset value, the comparison result between the candidate probe and the full-length target sequences of the probes is determined to meet the preset value;
if the similarity between a candidate probe and at least one of the full-length target sequences of all the probes exceeds the first preset value, or the length of consecutive identical bases between the candidate probe and at least one of the full-length target sequences of all the probes exceeds the second preset value, the comparison result between the candidate probe and the full-length target sequences of the probes is determined not to meet the preset value.
14. A storage medium having a computer program stored thereon, wherein the computer program, when executed by a computer, implements the method of any of claims 1-6.
15. A service terminal, characterized in that the service terminal comprises a processor and a memory; the memory is adapted to store a computer program and the processor is adapted to execute the computer program stored by the memory to cause the service terminal to perform the method according to any of claims 1-6.
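The design, comparison, screening, and iteration steps recited in the claims (S100 to S400, with the two thresholds of claim 6) can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: "similarity" is approximated by the best ungapped window identity, candidates are simply successive windows of the target sequence, a probe's own target is excluded from the comparison, and all function names, threshold values, and toy sequences are hypothetical.

```python
from difflib import SequenceMatcher


def longest_common_run(probe, target):
    """Length of the longest stretch of consecutive identical bases shared by both."""
    matcher = SequenceMatcher(None, probe, target, autojunk=False)
    return matcher.find_longest_match(0, len(probe), 0, len(target)).size


def max_window_identity(probe, target):
    """Best fraction of matching positions over all ungapped placements (assumed metric)."""
    n = len(probe)
    best = 0
    for i in range(len(target) - n + 1):
        matches = sum(p == t for p, t in zip(probe, target[i:i + n]))
        best = max(best, matches)
    return best / n


def is_specific(probe, off_targets, max_similarity, max_run):
    """S300: keep a probe only if both thresholds hold against every off-target."""
    return all(
        max_window_identity(probe, t) <= max_similarity
        and longest_common_run(probe, t) <= max_run
        for t in off_targets
    )


def design_specific_probes(targets, probe_len, max_similarity, max_run):
    """targets maps (gene, probe_type) to a design sequence; returns the retained probes."""
    kept = {}
    for key, seq in targets.items():
        off_targets = [s for k, s in targets.items() if k != key]
        # S100 / S400: propose candidates one at a time until one passes screening.
        for start in range(len(seq) - probe_len + 1):
            candidate = seq[start:start + probe_len]
            if is_specific(candidate, off_targets, max_similarity, max_run):  # S200 / S300
                kept[key] = candidate
                break
    return kept


# Toy example: two probe types for one gene that share an 8-base prefix.
targets = {
    ("g1", "mRNA"): "ACGTACGTAAAAAAAAAA",
    ("g1", "circRNA"): "ACGTACGTCCCCCCCCCC",
}
probes = design_specific_probes(targets, probe_len=8, max_similarity=0.6, max_run=4)
for key, probe in probes.items():
    print(key, probe)
```

A production design would typically replace the window identity with an alignment tool and would add the probe-type-specific design rules of claims 2 to 5; neither is attempted here.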
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010225368.3A CN111696627B (en) | 2020-03-26 | 2020-03-26 | Design method of long-chain RNA specific probe |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111696627A (en) | 2020-09-22 |
CN111696627B (en) | 2024-02-23 |
Family
ID=72476294
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010225368.3A Active CN111696627B (en) | 2020-03-26 | 2020-03-26 | Design method of long-chain RNA specific probe |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111696627B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115083516A (en) * | 2022-07-13 | 2022-09-20 | 北京先声医学检验实验室有限公司 | Panel design and evaluation method for detecting gene fusion based on targeted RNA sequencing technology |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5639612A (en) * | 1992-07-28 | 1997-06-17 | Hitachi Chemical Company, Ltd. | Method for detecting polynucleotides with immobilized polynucleotide probes identified based on Tm |
US20100297622A1 (en) * | 2009-05-20 | 2010-11-25 | Honghua Li | Method for high-throughput gene expression profile analysis |
CN105803101A (en) * | 2016-05-20 | 2016-07-27 | 上海伯豪生物技术有限公司 | Probe, gene chip and method for detecting expression abundance of circular RNA |
CN106676109A (en) * | 2016-12-08 | 2017-05-17 | 新疆医科大学第附属医院 | ENST00000418539.1, preparation or diagnostic agent or medicine or kit, and application of ENST00000418539.1 |
WO2018001258A1 (en) * | 2016-06-30 | 2018-01-04 | 厦门艾德生物医药科技股份有限公司 | Probe for nucleic acid enrichment and capture, and design method thereof |
US20180142284A1 (en) * | 2015-05-01 | 2018-05-24 | The General Hospital Corporation | Multiplex analysis of gene expression in individual living cells |
CN108342390A (en) * | 2018-02-13 | 2018-07-31 | 中国科学院苏州生物医学工程技术研究所 | Long-chain non-coding RNA for early diagnosing human prostata cancer and preparation, purposes |
CN109706269A (en) * | 2019-02-06 | 2019-05-03 | 浙江农林大学 | The multiple linking probe that a variety of fowl respiratory pathogens can be detected expands identification reagent box |
Non-Patent Citations (2)
Title |
---|
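古再丽努尔・阿不都热依木;买尔旦・马合木提;韩静;张萌萌;马玉娇;柳惠斌;: "Expression profile analysis and preliminary validation of long non-coding RNAs in clear cell renal cell carcinoma" * |
贾纯琰;季小阳;白雪;戴豪扬;王建蒙;张文广;: "Regulatory mechanisms of long non-coding RNAs and methods for their prediction in livestock" * |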
Also Published As
Publication number | Publication date |
---|---|
CN111696627B (en) | 2024-02-23 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||