CN111696627A

CN111696627A - Design method of long-chain RNA specific probe

Info

Publication number: CN111696627A
Application number: CN202010225368.3A
Authority: CN
Inventors: 张晓娜; 曹群发; 庞盼盼; 韩峻松
Original assignee: SHANGHAI BIOCHIP CO Ltd
Current assignee: SHANGHAI BIOCHIP CO Ltd
Priority date: 2020-03-26
Filing date: 2020-03-26
Publication date: 2020-09-22
Anticipated expiration: 2040-03-26
Also published as: CN111696627B

Abstract

The invention relates to the technical field of biology, in particular to a method for designing long-chain RNA specific probes. The probes can simultaneously detect the expression abundance or differential expression of at least two types of long-chain RNA selected from mRNA, circular RNA and long-chain non-coding RNA. Probes obtained by the design method have high sensitivity and strong specificity, and enable simultaneous, rapid, high-throughput detection, in trace samples, of the expression abundance of regulatory RNA molecules such as circular RNA and long-chain non-coding RNA together with protein-coding mRNA.

Description

Design method of long-chain RNA specific probe

Technical Field

The invention relates to the technical field of biology, in particular to a design method of a long-chain RNA specific probe.

Background

Circular RNA (circRNA) is a novel class of RNA molecules characterized by a covalently closed loop and is widely present in eukaryotes. circRNA is derived from exon or intron regions of a gene and is abundant in mammalian cells. The formation of circRNA differs from the canonical splicing pattern of linear RNA: the 5' end of the splice-donor exon is joined to the 3' end of the splice-acceptor exon, forming a reverse splice site (backsplicing). The main models of circular RNA formation are as follows (see FIG. 1):

(1) "lariat-driven circularization" or "exon-skipping circularization", as shown in FIG. 1A;

(2) "intron-pairing-driven circularization" or "direct backsplicing circularization", as shown in FIG. 1B;

(3) the circular intronic RNA (ciRNA) formation model, as shown in FIG. 1C;

(4) the RNA-binding protein (RBP)-dependent circularization model, as shown in FIG. 1D;

(5) the alternative circularization model, analogous to alternative splicing, as shown in FIG. 1E.

Current studies indicate that most circRNAs are conserved across species. Moreover, their circular structure makes them resistant to degradation by RNase R. circRNA is attracting increasing attention because of the specificity and complexity of its expression regulation and its important role in disease development. Like miRNAs and long non-coding RNAs, circRNAs have become a new research hotspot in the RNA field. The technique currently in common use for measuring circRNA expression abundance is real-time PCR, but its throughput is low. Because circRNA research is still at an early stage, a reliable high-throughput detection technology is urgently needed to meet research requirements.

Long non-coding RNA (lncRNA) generally refers to linear non-coding RNA longer than 200 bases. Compared with mRNA, its overall expression abundance is lower, its sequence conservation is poorer, and its tissue-specific expression is stronger; about 40% of lncRNAs carry a poly(A) tail. lncRNA has important functions in transcriptional silencing, transcriptional activation, chromosome modification, nuclear transport and the like. lncRNA has been likened to the dark matter of the universe. In recent years it has been found to participate in a variety of biological processes, to be an important basis for maintaining gene function, and to be associated with a variety of complex diseases. According to its positional relationship to the nearest mRNA, a long non-coding RNA can be classified as: antisense long non-coding RNA (antisense lncRNA), synonymous long non-coding RNA (sense lncRNA), intron long non-coding RNA (intron lncRNA), intergenic long non-coding RNA (intergenic lncRNA), bidirectional long non-coding RNA (bidirectional lncRNA), or enhancer long non-coding RNA (enhancer lncRNA).

Gene chip (microarray) technology attaches high-density DNA fragments to a solid-phase surface such as a glass slide or silicon wafer in a defined order or arrangement, using a high-speed robot or in-situ synthesis; target fragments are labeled with fluorescence or biotin and detected on the principle of complementary base hybridization. It is a revolutionary technology that allows gene expression to be studied on a large scale, and a leading-edge biotechnology in the life sciences that developed alongside the Human Genome Project. Gene chip-based feature selection techniques have played a key role in further improving the classification and diagnosis of diseases. After ten years of development, gene chip technology has matured and is widely applied across the life sciences. However, the prior art provides no gene chip or method capable of simultaneously detecting the expression abundance of multiple types of long-chain RNA such as mRNA, circular RNA and long-chain non-coding RNA.

Disclosure of Invention

In view of the above-mentioned disadvantages of the prior art, the present invention aims to provide a method for designing a long-chain RNA specific probe, which is used to solve the problems of the prior art.

In order to achieve the above objects and other related objects, the present invention provides in a first aspect a method for designing a specific probe for simultaneously detecting two or more long-chain RNAs, comprising the steps of:

S100, designing probes for a target gene as candidate probes according to preset probe types, wherein the preset probe types are selected from at least two of: an mRNA probe, a circular RNA probe, and a long non-coding RNA probe;

S200, comparing each candidate probe sequence with the full-length target sequences of all the probes;

S300, if the comparison result of a candidate probe meets the preset values, retaining the candidate probe as a specific probe;

if the comparison result of a candidate probe does not meet the preset values, discarding the candidate probe;

S400, if a target gene has no retained specific probe, redesigning probes for the target gene as candidate probes according to the preset probe types, and returning to step S200;

if the specific probes retained for a target gene cover only some of the preset probe types, redesigning candidate probes of the remaining preset probe types for the target gene, and returning to step S200, until the specific probes retained for the target gene cover all of the preset probe types.

Specifically, a comparison result meets the preset values in S300 when: the similarity between the candidate probe and each of the full-length target sequences of all the probes does not exceed a first preset value, and the length of consecutive identical bases shared with each of those full-length target sequences does not exceed a second preset value.

A comparison result does not meet the preset values when: the similarity between the candidate probe and at least one of the full-length target sequences of all the probes exceeds the first preset value, or the length of consecutive identical bases shared between the candidate probe and at least one of those full-length target sequences exceeds the second preset value.
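The two screening criteria can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the comparison program used by the invention: similarity is approximated as the best ungapped identity of the probe over every position on a target, the function names are hypothetical, and the list of targets is assumed to exclude the candidate probe's own intended target. The default thresholds (75% and 15 bases) are the example values given in the detailed description.

```python
def longest_common_run(a, b):
    """Length of the longest stretch of consecutive identical bases shared by a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0] * (len(b) + 1)
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


def best_window_identity(probe, target):
    """Highest fraction of matching bases over all ungapped placements of probe on target.

    Assumption: this stands in for the 'similarity' score; a real alignment
    program may score differently.
    """
    n = len(probe)
    if n == 0 or len(target) < n:
        return 0.0
    best = 0
    for start in range(len(target) - n + 1):
        window = target[start:start + n]
        matches = sum(p == t for p, t in zip(probe, window))
        best = max(best, matches)
    return best / n


def meets_preset(probe, other_targets, max_similarity=0.75, max_run=15):
    """True if the probe passes both criteria against every sequence in other_targets.

    Returns False as soon as the first violating target is found.
    """
    for target in other_targets:
        if best_window_identity(probe, target) > max_similarity:
            return False
        if longest_common_run(probe, target) > max_run:
            return False
    return True
```

Because the function returns on the first violation, it corresponds to the early-termination variant of the comparison; an exhaustive variant would simply collect every violating target instead of returning.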

Preferably, the sequence used to design the mRNA probe in S100 is selected from the longest transcript sequence of mRNA of each target gene.

Preferably, the sequence used for designing the circular RNA probe in S100 is selected from fragments of reverse-spliced sequences of circular RNAs of respective target genes.

Preferably, the circular RNA probe in S100 is selected from circular RNA probes whose binding site to the reverse splicing sequence of the target gene is located at the reverse splicing site.

Preferably, the specific sequence used in S100 for designing an antisense long non-coding RNA or a synonymous long non-coding RNA probe is selected from the group consisting of: a fragment of a non-overlapping region of an antisense long non-coding RNA or a synonymous long non-coding RNA and mRNA;

specific sequences for designing intron long non-coding RNA, intergenic long non-coding RNA, bidirectional long non-coding RNA or enhancer long non-coding RNA probes are selected from: the longest long noncoding RNA fragment of each target gene.

In a second aspect, the present invention provides a system for designing long-chain RNA specific probes, which can be used to design specific probes for simultaneous detection of at least two long-chain RNAs among mRNAs, circular RNAs, or long-chain non-coding RNAs.

The system comprises:

a design module 1, configured to design probes for a target gene as candidate probes according to preset probe types, wherein the preset probe types are selected from at least two of: an mRNA probe, a circular RNA probe, and a long non-coding RNA probe;

a comparison module 2, configured to compare the candidate probe sequences with the full-length target sequences of all the probes;

a screening module 3, configured to judge whether the comparison result of each candidate probe against the full-length target sequences of all the probes meets the preset values; if so, the candidate probe is retained as a specific probe; if not, the candidate probe is discarded;

an iteration module 4, configured to judge whether each target gene has a retained specific probe; if not, probes for the target gene are redesigned as candidate probes according to the preset probe types, and the comparison module 2 is executed again.

If the specific probes retained for a target gene cover only some of the preset probe types, candidate probes of the remaining preset probe types are redesigned for the target gene, and the comparison module 2 is executed again, until the specific probes retained for the target gene cover all of the preset probe types.

Further, the sequence used to design the mRNA probe is selected from the sequence of the longest transcript of mRNA of each target gene.

Further, a fragment in which the binding site to the longest transcript of the target gene is located at the 3' end of the longest transcript is selected as an mRNA candidate probe for the target gene.

Further, the sequence used to design the circular RNA probe is selected from the group consisting of: reverse-spliced sequence of circular RNA of each target gene.

Further, the circular RNA probe is selected from circular RNA probes whose binding site on the reverse-spliced sequence of the target gene is located at the reverse splice site.

Further, the sequence used to design the antisense long non-coding RNA or synonymous long non-coding RNA probe is selected from: the sequence of the region of the long non-coding RNA that does not overlap the mRNA of the target gene;

the sequences used to design intron long non-coding RNA, intergenic long non-coding RNA, bidirectional long non-coding RNA, or enhancer long non-coding RNA probes are selected from: the longest long noncoding RNA fragment of each target gene.

Specifically, in the screening module 3:

if the similarity between a candidate probe and each of the full-length target sequences of all the probes does not exceed a first preset value, and the length of consecutive identical bases shared with each of those full-length target sequences does not exceed a second preset value, the comparison result of the candidate probe is determined to meet the preset values;

if the similarity between a candidate probe and at least one of the full-length target sequences of all the probes exceeds the first preset value, or the length of consecutive identical bases shared between the candidate probe and at least one of those full-length target sequences exceeds the second preset value, the comparison result of the candidate probe is determined not to meet the preset values.

A third aspect of the present invention provides a storage medium having stored thereon a computer program which, when executed by a computer, implements a method of designing a specific probe for simultaneously detecting two or more long-chain RNAs.

A fourth aspect of the present invention provides a service terminal comprising a processor and a memory; the memory is used for storing a computer program, and the processor is used for executing the computer program stored in the memory, so that the service terminal, when running, carries out the method for designing specific probes for simultaneously detecting two or more long-chain RNAs.

As described above, the long-chain RNA specific probe, the design method thereof and the gene chip of the invention have the following beneficial effects:

1) by using specifically designed probes and a gene chip, and relying on the hybridization principle of the gene chip, the long-chain RNA to be detected can be captured by specific hybridization without first removing linear RNA;

2) nearly 80,000 circular RNAs, 70,000 long-chain non-coding RNAs and 20,000 mRNAs are detected simultaneously, so that the expression abundance of regulatory RNA molecules such as circular RNA and long-chain non-coding RNA, together with protein-coding mRNA, can be detected simultaneously, rapidly and with high throughput in trace samples;

3) gene chip technology is better applied to long-chain RNA expression profiling, overcoming the inability of existing approaches to detect the various RNAs of the transcriptome simultaneously, and providing a technical method for studying transcriptional regulation networks.

Drawings

FIG. 1 shows five circRNA formation models in the prior art, wherein panel A is the lariat-driven circularization (exon-skipping) model; panel B is the intron-pairing-driven circularization (direct backsplicing) model; panel C is the ciRNA formation model; panel D is the RBP-dependent circularization model; and panel E is the alternative circularization model.

FIG. 2 is a schematic diagram showing six long non-coding RNAs classified according to their positional relationship on genome in the prior art, wherein a is intergenic long non-coding RNA, b is intron long non-coding RNA, c is bidirectional long non-coding RNA, d is enhancer long non-coding RNA, e is synonymous long non-coding RNA, and f is antisense long non-coding RNA.

FIG. 3 shows a schematic diagram of six regulatory mechanisms of long non-coding RNA in the prior art.

FIG. 4 is a flow chart showing the overall design of the long-chain RNA specific probe of the present invention.

FIG. 5 shows a schematic diagram of the design of the long-chain RNA probe of the present invention.

FIG. 6 shows a schematic diagram of a system for designing long-chain RNA specific probes according to the present invention.

FIG. 7 shows a schematic diagram of a service terminal for designing a long-chain RNA specific probe according to the present invention.

FIG. 8 is a graph comparing the probe expression values measured on the chip of the present invention with the relative expression values measured by quantitative PCR, used to verify probe sensitivity and specificity.

FIG. 9 is a statistical chart showing the signal values of various long-chain RNAs detected by the gene chip according to the embodiment of the present invention.

FIG. 10 is a scanning diagram of a gene chip for detecting the expression abundance of long-chain RNA according to an embodiment of the present invention.

Description of reference numerals in FIG. 6 and FIG. 7

1 design module

2 comparison module

3 screening module

4 iteration module

5 processor

6 memory

Detailed Description

In the present invention, the term "long-chain RNA" includes mRNA, long-chain non-coding RNA, circRNA and the like.

The term "probe" refers to a DNA or RNA nucleic acid sequence of known sequence that is complementary to a gene of interest.

The term "specific probe" refers to a probe which has strong specificity and no mutual interference when a plurality of long-chain RNAs are detected simultaneously.

The term "full-length target sequence" refers to the full-length sequence of the RNA that contains the fragment recognized by the probe; the RNA sequence given in a sequence database is typically the sense-strand DNA sequence.

The term "similarity" refers to the degree of likeness between two sequences. DNA sequences are composed of the four bases A, T, C and G, and the bases of two sequences can be compared and scored using any of various existing scoring schemes (such as a match scoring matrix); the resulting score, i.e. the similarity, expresses how alike the two sequences are.

The term "longest transcript": a transcript is a mature mRNA that is produced by transcription of a gene and encodes a protein. Because different splicing patterns can occur when mRNA is formed, one gene may have multiple transcripts; the transcript with the longest sequence is the longest transcript of the gene.
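As a small illustration of the "longest transcript" definition, the following Python sketch picks the longest of several transcripts of one gene. The function name and the transcript identifiers are hypothetical, and the tie-breaking rule is an assumption, since the source does not say how transcripts of equal length are handled.

```python
def longest_transcript(transcripts):
    """Return (transcript_id, sequence) for the longest transcript of one gene.

    `transcripts` maps a transcript ID to its sequence. If several transcripts
    share the maximum length, the first one encountered is returned
    (an assumption made for this sketch).
    """
    if not transcripts:
        raise ValueError("gene has no transcripts")
    best_id = max(transcripts, key=lambda tid: len(transcripts[tid]))
    return best_id, transcripts[best_id]


# Hypothetical transcripts of a single gene
example = {"tx1": "ACGT", "tx2": "ACGTACGT", "tx3": "AC"}
chosen_id, chosen_seq = longest_transcript(example)
```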

One embodiment of the present invention provides a method for designing a specific probe for simultaneously detecting two or more long-chain RNAs, comprising steps S100 to S400 described above.

Specifically, in step S100, the preset probe types may be two types (mRNA probes and circular RNA probes; mRNA probes and long non-coding RNA probes; or circular RNA probes and long non-coding RNA probes) or all three types (mRNA probes, circular RNA probes and long non-coding RNA probes).

In the preferred embodiment shown in FIG. 4, the predetermined probe species are three species, namely, mRNA probe, circular RNA probe and long non-coding RNA probe.

Specifically, in step S300:

The determination that a candidate probe does not meet the preset values can be made after the candidate probe has been compared with the full-length target sequences of all the probes. Alternatively, the comparison may be terminated as soon as the first full-length target sequence is found whose similarity to the candidate probe exceeds the first preset value, or whose length of consecutive identical bases shared with the candidate probe exceeds the second preset value, and the candidate probe is then determined not to meet the preset values.

Further, the first preset value and the second preset value may vary with the comparison program used, and whether the preset values are set reasonably must be verified experimentally at the initial design stage to ensure the specificity of the probes screened under those values. For example, when the comparison is performed with the Bedtools offline comparison program, the first preset value may be not more than 75% and the second preset value may be not more than 15 bases.

In a preferred embodiment as shown in FIG. 4, the sequence used to design the mRNA probes in S100 is selected from the longest transcript sequence of mRNA for each gene of interest.

The sequence of the target gene can be confirmed using the prior art, for example, the target gene sequence can be derived from the GenBank database.

The longest transcript of the gene of interest can be identified using prior art techniques, e.g., the longest transcript can be derived from the Refseq database.

Further, as shown in FIG. 5, an mRNA probe whose binding site on the longest transcript of the target gene is located at the 3' end of the longest transcript is selected as an mRNA candidate probe for the target gene. The 3' end of the longest transcript generally refers to the fragment within 300 bases of the 3'-terminal base.
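The restriction of mRNA candidate probes to the 3' end can be sketched as a simple window enumeration over the last 300 bases of the longest transcript. This is a hedged illustration only: the function name, the default probe length of 60 nt and the step size are assumptions for the example, and real design software also applies thermodynamic filters that are not modeled here.

```python
def three_prime_candidates(transcript, probe_len=60, region=300, step=1):
    """List (start, window) pairs for windows lying within the 3'-terminal region.

    `start` is the 0-based offset of the window in the transcript, which is
    assumed to be written 5' to 3'. Returns an empty list if the transcript
    is shorter than the probe length.
    """
    tail_start = max(0, len(transcript) - region)
    tail = transcript[tail_start:]
    return [
        (tail_start + i, tail[i:i + probe_len])
        for i in range(0, len(tail) - probe_len + 1, step)
    ]


# Hypothetical 800-base transcript: only the last 300 bases are considered
transcript = "A" * 500 + "C" * 300
candidates = three_prime_candidates(transcript, probe_len=60, region=300, step=60)
```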

Reverse transcription of the sample to be detected starts from the 3' end, so placing the mRNA probe at the 3' end improves detection sensitivity. When identifying the fragment of an antisense or synonymous long non-coding RNA that does not overlap an mRNA, the longest transcript of the candidate gene is used as the full-length target sequence of the target gene, which allows more accurate identification.

Further, the sequence used for designing the circular RNA probe in S100 is selected from: a fragment of the reverse spliced sequence of the circular RNA.

The reverse-spliced sequence refers to the sequence formed by joining, end to end, the 5' end of the splice-donor exon and the 3' end of the splice-acceptor exon of the linear sequence of the circular RNA so that it forms a ring.

The linear sequences of circular RNAs can be obtained using existing techniques, for example from sequences in multiple databases such as circBase and CIRCpedia, after redundancy is removed based on sequence and chromosomal location.

The redundant sequence can be removed by existing software, for example, Bedtools.
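Redundancy removal across databases can be sketched in plain Python as follows. This is a stand-in for illustration, not the Bedtools workflow named above: the record field names are hypothetical, and an entry is assumed to be redundant if it repeats either the genomic coordinates or the sequence of an entry already kept.

```python
def remove_redundancy(records):
    """Keep the first record for each genomic location and each sequence.

    Each record is a dict with the hypothetical keys
    'chrom', 'start', 'end', 'strand' and 'seq'.
    """
    seen_locations = set()
    seen_sequences = set()
    unique = []
    for rec in records:
        location = (rec["chrom"], rec["start"], rec["end"], rec["strand"])
        sequence = rec["seq"].upper()
        if location in seen_locations or sequence in seen_sequences:
            continue  # duplicate of an earlier entry
        seen_locations.add(location)
        seen_sequences.add(sequence)
        unique.append(rec)
    return unique


# Hypothetical entries collected from two databases
records = [
    {"source": "circBase", "chrom": "chr1", "start": 100, "end": 500, "strand": "+", "seq": "ACGTACGT"},
    {"source": "CIRCpedia", "chrom": "chr1", "start": 100, "end": 500, "strand": "+", "seq": "ACGTACGT"},
    {"source": "CIRCpedia", "chrom": "chr2", "start": 10, "end": 90, "strand": "-", "seq": "GGGTTTCC"},
]
deduplicated = remove_redundancy(records)
```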

Further, as shown in FIG. 5, in S100, a circular RNA probe having a binding site to the reverse splice sequence of the target gene at the reverse splice site is selected as a circular RNA candidate probe for the target gene.

The reverse splice site is the only region in which the circular RNA differs from the corresponding linear RNA; it is generated by joining the 5' end of the splice-donor exon and the 3' end of the splice-acceptor exon end to end.

Because of the special splicing pattern of circular RNA (reverse splicing), the reverse-spliced sequence contains a specific reverse splice site (backsplice site) that linear RNA lacks. A circular RNA probe designed at the reverse splice site can therefore specifically detect circular RNA in a sample.
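The junction-spanning target region can be sketched in Python. When the linear sequence of a circular RNA is written 5' to 3', circularization places its last base next to its first base, so a window across the reverse splice site consists of the tail of the sequence followed by its head. The function name and the centered placement of the junction are assumptions of this sketch; the source only requires that the binding site lie at the reverse splice site.

```python
def backsplice_window(linear_seq, probe_len=60):
    """Return the target window of length probe_len centered on the reverse splice site.

    The window is the last half of the linear sequence joined to its first
    half, which is the only arrangement absent from the linear RNA.
    """
    if probe_len < 2:
        raise ValueError("probe_len must be at least 2 to span the junction")
    if probe_len > len(linear_seq):
        raise ValueError("probe_len exceeds the length of the circular RNA")
    left = probe_len // 2          # bases taken from the 3' end
    right = probe_len - left       # bases taken from the 5' end
    return linear_seq[-left:] + linear_seq[:right]


# Hypothetical 16-base circular RNA, 8-base window across the junction
window = backsplice_window("AAAACCCCGGGGTTTT", probe_len=8)
```

The window returned here is the target-strand sequence; whether the synthesized probe is this sequence or its reverse complement depends on the labeling protocol and is left open.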

Further, since the long non-coding RNAs have different types and the rules of different types of long non-coding RNAs in designing probes are different, it is necessary to design probes according to the types of the long non-coding RNAs.

In the preferred embodiment as shown in FIG. 4, in S100,

the sequence used for designing the antisense long non-coding RNA or the synonymous long non-coding RNA probe is selected from the group consisting of: a fragment of a non-overlapping region of an antisense long non-coding RNA or a synonymous long non-coding RNA and mRNA;

the sequences used to design intron long non-coding RNA, intergenic long non-coding RNA, bidirectional long non-coding RNA, or enhancer long non-coding RNA probes are selected from: a fragment of the longest long noncoding RNA of each target gene.

The reason for choosing the above sequences for designing long non-coding RNA probes is to prevent the long non-coding RNA probes from detecting the mRNA of the same gene.
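Finding the region of an antisense or synonymous long non-coding RNA that does not overlap an mRNA amounts to interval subtraction on genomic coordinates. The following Python sketch shows one way to do it; the function name, the half-open coordinate convention and the example coordinates are assumptions made for illustration.

```python
def non_overlapping_regions(lncrna, mrna_intervals):
    """Return the parts of the lncRNA interval not covered by any mRNA interval.

    Intervals are (start, end) pairs with half-open coordinates on the same
    chromosome. The result is a list of (start, end) pairs in ascending order.
    """
    start, end = lncrna
    free = []
    cursor = start
    for m_start, m_end in sorted(mrna_intervals):
        if m_end <= cursor or m_start >= end:
            continue  # no overlap with the remaining lncRNA region
        if m_start > cursor:
            free.append((cursor, m_start))
        cursor = max(cursor, m_end)
        if cursor >= end:
            break
    if cursor < end:
        free.append((cursor, end))
    return free


# Hypothetical lncRNA at 100-500 overlapping two mRNA exons
regions = non_overlapping_regions((100, 500), [(50, 150), (300, 350)])
```

Probe design sequences would then be taken only from the returned regions.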

The transcript sequences of long non-coding RNAs can be obtained from sequences in multiple databases such as Ensembl, NCBI, UCSC, GENCODE and NONCODE, after removing redundancy according to sequence and chromosomal position.

In S100, the probes can be designed using existing probe design software, for example Agilent's gene chip probe design software eArray.

Specifically, in the preferred embodiment shown in FIG. 4, the design method yields specific probes for simultaneously detecting mRNA, circular RNA and long non-coding RNA. The design method comprises: combining the mRNA longest transcripts, the circular RNA reverse-spliced sequences and the sequences for designing long-chain non-coding RNA probes obtained as described above into one file, importing the file into the eArray software, setting the corresponding parameters (such as GC content and annealing temperature) according to established probe design principles, and then designing the probes.
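Combining the three kinds of design sequences into one file can be sketched as writing a single FASTA text, and GC content can be computed directly from a sequence. Both helpers below are illustrative assumptions: the header layout "gene|type" is hypothetical, and the input format and parameter settings actually expected by the eArray software are not specified in the source.

```python
def to_fasta(entries, line_width=60):
    """Build FASTA text from (gene, rna_type, sequence) tuples.

    The '>gene|rna_type' header layout is a hypothetical convention.
    """
    lines = []
    for gene, rna_type, seq in entries:
        lines.append(f">{gene}|{rna_type}")
        for i in range(0, len(seq), line_width):
            lines.append(seq[i:i + line_width])
    return "\n".join(lines) + "\n"


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


# Hypothetical design sequences for one gene
fasta_text = to_fasta([("ESR2", "mRNA", "ACGT"), ("ESR2", "circRNA", "GGCC")])
```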

Further, because of the complexity of the genome, hybridization of a probe to multiple target sequences must be avoided, which requires iterative specificity screening. Each candidate probe is compared with the full-length target sequences of all the probes, and two quantities are evaluated: the similarity between the candidate probe and each full-length target sequence, and the length of consecutive identical bases shared between the candidate probe and each full-length target sequence. The specificity screening conditions are: similarity not more than 75%, and length of consecutive identical bases shared with a full-length target sequence not more than 15. A candidate probe that satisfies the conditions is regarded as highly specific and is retained as a specific probe; a candidate probe that does not satisfy them is regarded as poorly specific and is discarded. It is then judged whether any specific probe has been retained; if not, the process returns to probe redesign. If specific probes have been retained, it is further judged whether each target gene has an mRNA specific probe, a long-chain non-coding RNA specific probe and a circular RNA specific probe among them. If so, the specific probe design for that target gene is complete; if not, probes of the missing types continue to be designed.
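The iterative design, screening and redesign cycle can be summarized in a short Python sketch. It is a schematic under stated assumptions: the design step and the specificity test are passed in as functions (standing in for the design software and the comparison program), all names are hypothetical, and the `max_rounds` limit is added here only so that the loop is guaranteed to stop; the source simply repeats until a specific probe is obtained.

```python
def design_specific_probes(targets, design_fn, is_specific, max_rounds=10):
    """Iteratively design probes until every (gene, probe_type) has a specific probe.

    targets     : dict mapping (gene, probe_type) to its design sequence
    design_fn   : callable(sequence) -> candidate probe (design / redesign)
    is_specific : callable(probe, key) -> bool (comparison + screening)
    Returns (retained, pending): retained probes by key, and keys still lacking one.
    """
    retained = {}
    pending = list(targets)
    for _ in range(max_rounds):
        if not pending:
            break
        still_pending = []
        for key in pending:
            candidate = design_fn(targets[key])      # S100, or redesign under S400
            if is_specific(candidate, key):          # S200 and S300
                retained[key] = candidate
            else:
                still_pending.append(key)            # missing type, redesign next round
        pending = still_pending
    return retained, pending
```

A gene is complete once every preset probe type for it appears in `retained`; keys left in `pending` indicate types for which no specific probe was found within the allowed rounds.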

Further, when designing probes with software, either one probe or a plurality of probes may be designed as candidate probes for each probe type of each target gene. If a designed probe satisfies the specificity screening conditions, it is used directly as a specific probe; if not, the probe is redesigned. When redesigning, the number of probes generated by the software can be set to more than one while the other parameters are left unchanged. After the software randomly generates a plurality of probes that satisfy the set parameters, these probes are compared with the full-length target sequences of all the probes (including the newly designed probes and the specific probes already screened); probes that pass the specificity screening are saved as specific probes and the others are not saved. Any of the saved specific probes may be selected for an experiment. If no specific probe is generated in a given round, the probe is redesigned again until a specific probe is generated. With existing probe design software, when probes are designed for the same sequence, the requested number of probes is output at random from among the probes that satisfy the parameter requirements, so redesign can yield different probe sequences even with the same input sequence and design parameters.
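The random output of a requested number of probes can be mimicked with a few lines of Python. This is only a sketch of the behavior described above; how the actual design software draws its output is not specified in the source, and the function name and the optional seed are assumptions.

```python
import random


def sample_candidates(valid_probes, n_out, seed=None):
    """Randomly pick up to n_out distinct probes from those meeting the design parameters.

    Passing a seed makes the draw reproducible; leaving it as None gives a
    different selection on each call, which is why redesign can yield new
    candidates from the same input sequence and parameters.
    """
    rng = random.Random(seed)
    pool = list(valid_probes)
    k = min(n_out, len(pool))
    return rng.sample(pool, k)


# Hypothetical probes that all satisfy the design parameters
pool = ["P1", "P2", "P3", "P4"]
picked = sample_candidates(pool, 2, seed=0)
```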

Further, the length of the specific probe for mRNA, circular RNA or long non-coding RNA is 50-70 nt.

For example, the length may be 55 nt, 60 nt, 65 nt or 70 nt.

Furthermore, two or more of the mRNA specific probes, circular RNA specific probes and long-chain non-coding RNA specific probes designed by the specific probe design method are integrated into a gene chip, which can then simultaneously detect any two or all three of mRNA, circular RNA and long-chain non-coding RNA.

In a preferred embodiment, as shown in FIG. 6, a system for designing long-chain RNA specific probes is provided, which can be used to design specific probes for simultaneous detection of at least two long-chain RNAs among mRNAs, circular RNAs, or long non-coding RNAs.

The system comprises the design module 1, the comparison module 2, the screening module 3 and the iteration module 4 described above.

Specifically, in the design module 1, the preset probe types may be two types (mRNA probes and circular RNA probes; mRNA probes and long-chain non-coding RNA probes; or circular RNA probes and long-chain non-coding RNA probes) or all three types (mRNA probes, circular RNA probes and long-chain non-coding RNA probes).

Further, the sequence used to design the mRNA probe is selected from the longest transcript sequence of mRNA for each target gene.

Further, a fragment whose binding site to the longest transcript of the target gene is located at the 3' end of the longest transcript is selected as an mRNA candidate probe for the target gene.

Further, the sequence used to design the circular RNA probe is selected from: a fragment of the reverse-spliced sequence of the circular RNA.

Further, a fragment whose binding site on the reverse-spliced sequence of the target gene is located at the reverse splice site is selected as a circular RNA candidate probe for the target gene.

Further, the sequence used to design the antisense long non-coding RNA or synonymous long non-coding RNA probe is selected from: a fragment of the region of the antisense or synonymous long non-coding RNA of the target gene that does not overlap the mRNA;

the sequences used to design intron long non-coding RNA, intergenic long non-coding RNA, bidirectional long non-coding RNA, or enhancer long non-coding RNA probes are selected from: a fragment of the longest long non-coding RNA sequence of each target gene.

Specifically, in the screening module 3, the first preset value and the second preset value may vary with the comparison program used, and whether the preset values are set reasonably must be verified experimentally at the initial design stage to ensure the specificity of the probes screened under those values. For example, the first preset value may be not more than 75% and the second preset value may be not more than 15.

Candidate probes are passed through the screening module and the iteration module in order to increase specificity both among the probes and between each probe and its target RNA. The specific probes obtained after screening and iteration ensure that the detection probes for circular RNA, long-chain non-coding RNA and mRNA all have high specificity and sensitivity.

It should be noted that the division of the modules of the above apparatus is only a logical division, and the actual implementation may be wholly or partially integrated into one physical entity, or may be physically separated. And these modules can be realized in the form of software called by processing element; or may be implemented entirely in hardware; and part of the modules can be realized in the form of calling software by the processing element, and part of the modules can be realized in the form of hardware. For example, the x module may be a processing element that is set up separately, or may be implemented by being integrated in a chip of the apparatus, or may be stored in a memory of the apparatus in the form of program code, and the function of the x module may be called and executed by a processing element of the apparatus. Other modules are implemented similarly. In addition, all or part of the modules can be integrated together or can be independently realized. The processing element described herein may be an integrated circuit having signal processing capabilities. In implementation, each step of the above method or each module above may be implemented by an integrated logic circuit of hardware in a processor element or an instruction in the form of software.

Yet another embodiment of the present invention provides a storage medium having stored thereon a computer program which, when executed by a computer, implements a method of designing a specific probe for simultaneously detecting two or more long-chain RNAs.

Further, the storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic disk, U-disk, memory card, or optical disk.

As shown in FIG. 8, still another embodiment of the present invention provides a service terminal comprising a processor 5 and a memory 6; the memory 6 stores a computer program, and the processor 5 executes the computer program stored in the memory 6, so that the service terminal carries out the method of designing specific probes for simultaneously detecting two or more long-chain RNAs.

The memory 6 is used for storing a computer program. Preferably, the memory 6 comprises: various media that can store program codes, such as ROM, RAM, magnetic disk, U-disk, memory card, or optical disk.

The processor 5 is connected to the memory 6 and configured to execute the computer program stored in the memory 6, so that the service terminal executes the design method described above.

Preferably, the processor 5 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

Hereinafter, taking the ESR2 gene as an example, specific probes for simultaneously detecting the mRNA, long non-coding RNA, and circular RNA of the ESR2 gene are designed; that is, the three preset probe types are mRNA-specific probes, long non-coding RNA-specific probes, and circular RNA-specific probes.

S100: The ESR2 gene sequence (NM_001214902) is found in GenBank, and the longest transcript of the ESR2 gene (accession number NM_001214902) is found in the RefSeq database. Based on this ESR2 sequence, an mRNA probe is designed as a candidate probe, with the following sequence:

ATAAAAGAGTTTTGGGAATACACTGAGCTTTGAGTGAAAGAAGCTGCAGTGGCCTCCCTG (SEQ ID NO:1)

The long non-coding RNA ENST00000359491, which is transcribed in the same direction as the ESR2 gene and partially overlaps its exons, is found in the Ensembl database, and a probe for it is designed as a candidate probe, with the following sequence:

ATACCTGAGCAAGTGAAATTAAGAAGGGAATTGAAGCAAATATTCCTGACATCCAAGTGG (SEQ ID NO:2)

The circular RNA hsa_circ_0102409 derived from the ESR2 gene is found in the circBase database; it is formed by back-splicing the 7th exon through the 12th exon of the ESR2 gene head to tail. Based on the 5' end sequence (SEQ ID NO:3) and the 3' end sequence (SEQ ID NO:4) of the circular RNA, a probe covering the splice site is designed as a candidate probe (SEQ ID NO:5). The sequences are as follows:

GGATGAGGGGAAATGCGTAGAAGGAATTCTGGAAATCTTTGACATGCTCCTGGCAACTACTTCAAGGTTTCGAGAGTTAAAACTCCAACACAAAGAATATCTCTGTGTCAAGGCCATGATCCTGCTCAATTCCA(SEQ ID NO:3)

CCATTATACTTGCCCACGAATCTTTGAGAACATTATAATGACCTTTGTGCCTCTTCTTGCAAGGTGTTTTCTCAGCTGTTATCTCAAGACATGGATATAAAAAACTCACCATCTAGCCTTAATTCTCCTTCCTCCTACAACTGCAGTCAATCCATCTTACCCCTGGAGCACGGCTCCATATACATACCTTCCTCCTATGTAGACAGCCACCATGAATATCCAGCCATGACATTCTATAGCCCTGCTGTGATGAATTACAGCATTCCCAGCAATGTCACTAACTTGGAAGGTGGGCCTGGTCGGCAGACCACAAGCCCAAATGTGTTGTGGCCAACACCTGGGCACCTTTCTCCTTTAGTGGTCCATCGCCAGTTATCACATCTGTATGCGGAACCTCAAAAGAGTCCCTGGTGTGAAGCAAGATCGCTAGAACACACCTTACCTGTAAACAG(SEQ ID NO:4)

GCCATGATCCTGCTCAATTCCACCATTATACTTGCCCACGAATCTTTGAGAACATTATAA (SEQ ID NO:5)

This probe can specifically detect the circular RNA hsa_circ_0102409.

S200: The candidate probes are aligned against the full-length target sequences of all the probes.

S300: Candidate probes whose alignment results all meet the preset values are retained as specific probes.

S400: The retained specific probes cover all the preset probe types, so the specific probe design for this gene is complete.
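The S100 to S400 loop can be sketched in Python as follows. This is a minimal sketch under stated assumptions: `propose` (design of one candidate, S100) and `passes_screen` (alignment plus preset-value check, S200 and S300) are hypothetical callables supplied by the caller, and the `max_rounds` cap is added here only so that the loop always terminates; the method itself iterates until every preset type is covered.

```python
from typing import Callable, Dict, Iterable, Optional

PRESET_TYPES = ("mRNA", "circRNA", "lncRNA")  # the three types of the ESR2 example


def design_specific_probes(
    gene: str,
    propose: Callable[[str, str, int], Optional[str]],
    passes_screen: Callable[[str], bool],
    preset_types: Iterable[str] = PRESET_TYPES,
    max_rounds: int = 10,  # assumption: safety cap, not part of the method
) -> Dict[str, str]:
    """Return one retained specific probe per preset type (possibly incomplete)."""
    wanted = tuple(preset_types)
    retained: Dict[str, str] = {}
    for round_no in range(max_rounds):
        missing = [t for t in wanted if t not in retained]
        if not missing:
            break  # S400: every preset type has a specific probe
        for probe_type in missing:
            candidate = propose(gene, probe_type, round_no)  # S100
            if candidate is not None and passes_screen(candidate):  # S200 + S300
                retained[probe_type] = candidate
    return retained


# Toy usage: the first circular RNA candidate fails screening and is redesigned.
def toy_propose(gene, probe_type, round_no):
    return f"{gene}:{probe_type}:{round_no}"


def toy_screen(candidate):
    return candidate != "ESR2:circRNA:0"


result = design_specific_probes("ESR2", toy_propose, toy_screen)
```

Only the probe types that are still missing are redesigned in each round, so probes that already passed screening are kept unchanged.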

The specific probe design method for detecting mRNA, circular RNA or long non-coding RNA of other target genes in the gene chip is the same as that of ESR2 gene.
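For the circular RNA probe, the way the candidate in S100 covers the splice site can be checked directly from the listed sequences. The Python sketch below joins the tail of SEQ ID NO:3 to the head of SEQ ID NO:4; the function name is hypothetical, and the 22/38 split is read off from the listed sequences rather than stated in the text.

```python
def junction_probe(upstream: str, downstream: str, left: int, length: int = 60) -> str:
    """Join the last `left` bases of `upstream` to the first `length - left`
    bases of `downstream`, so that the probe straddles the junction."""
    right = length - left
    if not 0 < left < length:
        raise ValueError("the probe must take bases from both sides of the junction")
    if left > len(upstream) or right > len(downstream):
        raise ValueError("flanking sequence is shorter than the requested arm")
    return upstream[-left:] + downstream[:right]


# Last 34 bases of SEQ ID NO:3 and first 60 bases of SEQ ID NO:4.
SEQ3_TAIL = "CTCTGTGTCAAGGCCATGATCCTGCTCAATTCCA"
SEQ4_HEAD = "CCATTATACTTGCCCACGAATCTTTGAGAACATTATAATGACCTTTGTGCCTCTTCTTGC"
SEQ5 = "GCCATGATCCTGCTCAATTCCACCATTATACTTGCCCACGAATCTTTGAGAACATTATAA"

probe = junction_probe(SEQ3_TAIL, SEQ4_HEAD, left=22)
print(probe == SEQ5)
```

Because the probe takes bases from both sides of the junction, neither flanking sequence on its own contains the full 60-base probe.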

Verification of specificity and sensitivity of specific probe and synthesis of chip

In order to verify the sensitivity and specificity of the probes after screening and iterative detection, the THBS1 gene, the long-chain non-coding RNA ENST00000478845 derived from an intron, and the circular RNA hsa_circ_0034426 formed by circularization of the 2nd exon through the 7th exon were selected for quantitative PCR verification. The result is shown in FIG. 7: the fold changes of the three long-chain RNAs of the THBS1 gene relative to the control group measured on the chip are consistent with the trend of expression change measured by quantitative PCR.

The quantitative PCR experiment steps are as follows:

(I) First-strand cDNA synthesis

1. RNA was removed from a -80 ℃ freezer, thawed at 4 ℃, and then placed in a 0.2 ml PCR tube to prepare the reaction system as follows:

2. The PCR tube was incubated at 37 ℃ for 15 min, denatured at 98 ℃ for 5 min, and held at 4 ℃.

(II) SYBR Green qPCR

1. The reaction mixture (for 384-well plates) was prepared in a 1.5 mL centrifuge tube:

2. The reactions were placed in a PCR instrument, incubated at 50 ℃ for 2 min, and then at 95 ℃ for 10 min; 40 cycles were then performed: 95 ℃ for 15 seconds and 60 ℃ for 1 min; finally, a melting curve stage was added.

(III) The primer sequences are as follows:

THBS1

An upstream primer: GAACGGGACAACTGCCAGTA (SEQ ID NO:6)

A downstream primer: ACCTACAGCGAGTCCAGGAT (SEQ ID NO:7)

ENST00000478845

An upstream primer: TCGCGCATTCTTGGAAGTCT (SEQ ID NO:8)

A downstream primer: TGCCAGAGGGTGAAAAGCAA (SEQ ID NO:9)

hsa_circ_0034426

An upstream primer: CTGCAAAAAGGTGTCCTGCC (SEQ ID NO:10)

A downstream primer: TCAGGAACAGGACGCCTAGT (SEQ ID NO:11)

After the verification was finished, Agilent was commissioned to custom-manufacture the high-throughput chip for detecting long-chain RNA gene expression abundance, using inkjet-printing in-situ chemical synthesis under strict quality control.

Detection of expression abundance of long-chain RNA

The experimental operation comprises the following specific steps:

1. extracting and purifying total RNA of sample

Total RNA was extracted from the samples with Trizol and then purified with a QIAGEN RNeasy kit (cat No. 74106). The detailed procedure is as follows (see the RNeasy Mini Protocol):

1) Total RNA (100 μg or less) was dissolved in 100 μl of RNase-free water, and 350 μl of buffer RLT was added and mixed well.

2) Add 250 μl of absolute ethanol and mix well with a pipette tip.

3) A total of 700 μl of the total RNA-containing solution was transferred to an RNeasy column seated in a 2 ml centrifuge tube, centrifuged at 13200 rpm for 15 seconds, and the filtrate was discarded.

4) Buffer RW1 was pipetted into the RNeasy mini column, centrifuged at 13200 rpm for 15 seconds, and the filtrate was discarded.

5) Add 10 μl of DNase I to 70 μl of buffer RDD, mix well, add to the column, and allow to stand at room temperature for 15 min.

6) Buffer RW1 was pipetted into the RNeasy mini column, centrifuged at 13200 rpm for 15 seconds, and the filtrate was discarded.

7) Pipette 500 μl of buffer RPE into the RNeasy mini column, centrifuge at 13200 rpm for 15 seconds, and discard the filtrate; repeat this step once.

8) Replace the collection tube with a new one, centrifuge at 13200 rpm for 2 min, and transfer the column to the elution tube.

9) The RNeasy mini column was transferred into the collection tube.

10) 30 μl of RNase-free water was pipetted onto the column, allowed to stand for 1 min, and centrifuged at 13200 rpm for 1 min.

11) The 30 μl of sample in the elution tube was transferred back to the column, allowed to stand for 1 min, and centrifuged at 13200 rpm for 1 min.

12) The RNA concentration and A260/280 were measured with a NanoDrop (NanoDrop ND-1000 UV-VIS spectrophotometer).

2. Linear amplification of RNA and Cy3 fluorescent labeling

1) A one-color Spike-In (RNA Spike-In Kit, One-Color, Agilent 5188-5282) was prepared. The Spike-In was diluted with dilution buffer according to the RNA starting amount, as shown in Table 1:

TABLE 1 RNA spike-in

2) Reverse transcription: reaction solutions having the compositions shown in table 2 were prepared:

TABLE 2 reverse transcription reaction solution composition

10-200ng of total RNA	1.5μl
Diluted one-dye spike-in	2.0μl
T7 Promoter Primer	0.8μl
Nuclease-free water (white cap)	1.0μl
Total volume	5.3μl

The reaction solution was incubated in a PCR instrument (MJ PTC-100) at 65 ℃ for 10 min and then placed in an ice bath for 5 min. Meanwhile, the 5X First Strand Buffer was preheated at 80 ℃ for 3 min and kept at room temperature until use. A reverse transcription mixed solution was prepared, with the composition shown in Table 3:

TABLE 3 reverse transcription Mixed solution composition

5X First Strand Buffer	2.0μl
0.1M DTT	1.0μl
10mM dNTP mix	0.5μl
AffinityScript RNase Block Mix	1.2μl
Total volume	4.7μl

The above 4.7 μl of reverse transcription mixed solution was added to the denatured RNA in the ice bath, mixed well, centrifuged, and reacted in the PCR instrument. Reaction conditions: 40 ℃ for 2 hours; inactivation at 70 ℃ for 15 minutes; hold at 4 ℃ for 5 minutes.

3) Fluorescent labeling

A fluorescent labeling mixed solution (Low Input Quick Amp Labeling Kit, One-Color, Agilent 5190-) was prepared, with the composition shown in Table 4:

TABLE 4 fluorescent labeling Mixed solution composition

Nuclease-free water	0.75μl
5X transcription buffer	3.2μl
0.1M DTT	0.6μl
NTP mix	1.0μl
T7 RNA polymerase mixture	0.21μl
Cy3-CTP	0.24μl
Total volume	6.0μl

The above 6.0 μl of fluorescent labeling mixed solution was added, mixed well, centrifuged, and reacted in the PCR instrument to obtain the fluorescently labeled product. Reaction conditions: 40 ℃ for 2 hours; hold at 4 ℃ for 5 minutes.

4) Fluorescent labeling product purification

A) Add 84 μl of nuclease-free water to a total volume of 100 μl.

B) Add 350 μl of RLT and mix well.

C) Add 250 μl of absolute ethanol and mix well without centrifugation.

D) Transfer the 700 μl of mix to the column. Centrifuge at 13000 rpm and 4 ℃ for 30 seconds. Discard the flow-through.

E) Add 500 μl of RPE and centrifuge at 13000 rpm and 4 ℃ for 30 seconds. Discard the flow-through.

F) Add another 500 μl of RPE and centrifuge at 13000 rpm and 4 ℃ for 60 seconds. Discard the flow-through.

G) Replace the collection tube, centrifuge the empty column at 13000 rpm and 4 ℃ for 30 seconds, and transfer the column to the elution tube.

H) Add 30 μl of nuclease-free water, let stand for 1 min, and centrifuge at 13000 rpm and 4 ℃ for 30 seconds.

I) Transfer the 30 μl of sample in the elution tube back to the column, let stand for 1 min, and centrifuge at 13000 rpm and 4 ℃ for 30 seconds.

J) Measure the RNA concentration, Cy3 concentration, and A260/280 with a NanoDrop.

The requirements for the amount of probe used for purification of the fluorescently labeled product are shown in Table 5:

TABLE 5 amount of probe used for purification of fluorescently labeled product

1 × chip	cRNA>5μg	Cy3>6pmol/μg
2 × chip	cRNA>3.75μg	Cy3>6pmol/μg
4 × chip	cRNA>1.65μg	Cy3>6pmol/μg
8 × chip	cRNA>0.825μg	Cy3>6pmol/μg
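The requirements of Table 5 can be checked from the NanoDrop readings with a short calculation, sketched below in Python. The sketch rests on stated assumptions: the function and variable names are hypothetical, the fourth row of Table 5 is taken to be the 8x format used in the later tables, and the Cy3 incorporation per μg of cRNA is computed as the Cy3 concentration (pmol/μl) divided by the cRNA concentration (ng/μl) times 1000, the usual form of this calculation, which the text itself does not spell out.

```python
# Minimum cRNA yield (ug) per chip format, from Table 5.
# Assumption: the fourth row of Table 5 is the 8x format.
MIN_YIELD_UG = {"1x": 5.0, "2x": 3.75, "4x": 1.65, "8x": 0.825}
MIN_CY3_PMOL_PER_UG = 6.0  # Cy3 requirement, same for every format


def labeling_qc(crna_ng_per_ul, cy3_pmol_per_ul, elution_ul, chip_format):
    """Return (yield in ug, Cy3 pmol per ug cRNA, pass/fail) for one sample."""
    yield_ug = crna_ng_per_ul * elution_ul / 1000.0
    cy3_per_ug = cy3_pmol_per_ul * 1000.0 / crna_ng_per_ul
    passed = (
        yield_ug > MIN_YIELD_UG[chip_format]
        and cy3_per_ug > MIN_CY3_PMOL_PER_UG
    )
    return yield_ug, cy3_per_ug, passed


# Hypothetical readings: 200 ng/ul cRNA, 2.0 pmol/ul Cy3, 30 ul eluate, 4x chip.
yield_ug, cy3_per_ug, passed = labeling_qc(200.0, 2.0, 30.0, "4x")
```

With these hypothetical readings the sample yields 6.0 μg of cRNA at 10 pmol Cy3 per μg, which satisfies the 4x row of Table 5.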

3. Hybridization of Gene chip

The purified fluorescence labeling product and the probe on the circular RNA gene chip are hybridized by utilizing the base complementary hybridization principle. The Hybridization Kit used was Gene Expression Hybridization Kit (Agilent 5188-5242). The method comprises the following specific steps:

1) The fragmentation mixed solution was prepared as shown in Table 6:

TABLE 6 fragmentation mix solution composition

Component	1x	2x	4x	8x
Cy3-cRNA	5μg	3.75μg	1.65μg	600ng
10X blocking agent	50μL	25μL	11μL	5μL
Nuclease-free water	Make up to 240μL	Make up to 120μL	Make up to 52.8μL	Make up to 24μL
25X fragmentation buffer	10μL	5μL	2.2μL	1μL
Total volume	250μL	125μL	55μL	25μL

2) Incubate at 60 ℃ for 30 min, then place in an ice bath for 1 min and centrifuge briefly.

3) Add an equal volume of 2X GEx hybridization buffer HI-RPM as shown in Table 7 and mix well.

TABLE 7 hybridization mix solution composition

Component	1x	2x	4x	8x
Fragmented cRNA mixed solution	250μL	125μL	55μL	25μL
2X GEx hybridization buffer HI-RPM	250μL	125μL	55μL	25μL

4) Centrifuge at 13000 rpm for 1 min and then place on ice.

5) The hybridization chamber (Agilent G2534A) was placed on a horizontal table top, a coverslip with gasket was placed, and samples were added in the volumes shown in Table 8:

TABLE 8 hybridization sample addition volume

Component	1x	2x	4x	8x
Preparation volume	500μL	250μL	110μL	50μL
Hybridization volume	490μL	240μL	100μL	40μL

6) The gene chip with the "Agilent" side down was mounted on a coverslip and the hybridization chamber was assembled quickly and hybridized for 17h in a hybridization oven (Agilent G2545A) at 65 ℃ and 10 rpm.
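The volumes in Tables 6 to 8 follow from the cRNA concentration measured after purification. The Python sketch below works them out for one chip format; the dictionary layout and function name are hypothetical, and the only input not taken from the tables is the measured cRNA concentration.

```python
# Per format: Cy3-cRNA (ng), 10X blocking agent (ul), volume before adding
# fragmentation buffer (ul), 25X fragmentation buffer (ul), loaded volume (ul).
# Values are taken from Tables 6 and 8.
HYB_TABLE = {
    "1x": (5000.0, 50.0, 240.0, 10.0, 490.0),
    "2x": (3750.0, 25.0, 120.0, 5.0, 240.0),
    "4x": (1650.0, 11.0, 52.8, 2.2, 100.0),
    "8x": (600.0, 5.0, 24.0, 1.0, 40.0),
}


def hybridization_plan(chip_format, crna_ng_per_ul):
    """Pipetting volumes (ul) for fragmentation and hybridization of one array."""
    crna_ng, blocking, pre_buffer, frag_buffer, loaded = HYB_TABLE[chip_format]
    crna_ul = crna_ng / crna_ng_per_ul
    water_ul = pre_buffer - blocking - crna_ul  # "make up to" volume in Table 6
    if water_ul < 0:
        raise ValueError("cRNA is too dilute to fit the Table 6 volume")
    fragmentation_total = pre_buffer + frag_buffer
    return {
        "crna_ul": crna_ul,
        "water_ul": water_ul,
        "fragmentation_total_ul": fragmentation_total,
        "hyb_buffer_ul": fragmentation_total,  # equal volume of 2X buffer (Table 7)
        "prepared_ul": 2 * fragmentation_total,  # preparation volume (Table 8)
        "loaded_ul": loaded,  # hybridization volume (Table 8)
    }


plan = hybridization_plan("4x", 200.0)  # hypothetical 200 ng/ul cRNA
```

For the 4x format at a hypothetical 200 ng/μl, this gives 8.25 μl of cRNA and 33.55 μl of water, and the prepared volume of 110 μl agrees with Table 8.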

4. Washing and scanning gene chip

1) 2 ml of 10% Triton X-102 was added to each of wash solution 1 and wash solution 2, and wash solution 2 was preheated at 37 ℃ overnight.

2) The gene chip which has completed the hybridization was taken out of the hybridization oven, the hybridization chamber was disassembled, and the gene chip was washed according to steps 1 to 3 in Table 9:

TABLE 9 Gene chip washing procedure

Step	Wash solution	Temperature	Washing time
Slide disassembly	GE Wash solution 1	Room temperature	-
Wash 1	GE Wash solution 1	Room temperature	1min
Wash 2	GE Wash solution 2	37℃	1min

GE Wash 1, GE Wash 2 in Table 9 were from Gene Expression Wash Buffer Kit (Gene Expression Wash Buffer Kit, brand: Agilent, cat # 5188-

3) The washed gene chip was loaded into a slide holder and scanned with a scanner (Agilent Microarray Scanner G2565CA). The scan parameters are shown in Table 10:

TABLE 10 Gene chip Scan parameters

5. Data analysis

The raw data were normalized with the limma package in R, and the expression abundance of the long-chain RNAs and the differentially expressed RNAs were analyzed using fold change (expression difference multiple) and Student's t-test.
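The two statistics named above can be illustrated with a small Python sketch. It is not the limma workflow used for the chip data: it assumes already normalized, linear-scale signal values, the function names and example values are hypothetical, it treats a fold change of exactly 2 as meeting the threshold, and it stops at the t statistic because the standard library has no t-distribution from which to obtain a p-value.

```python
import math
from statistics import mean, variance


def fold_change(case, control):
    """Ratio of mean signals, experimental group over control group."""
    return mean(case) / mean(control)


def student_t(case, control):
    """Two-sample Student's t statistic with pooled variance, plus degrees of freedom."""
    n1, n2 = len(case), len(control)
    pooled = ((n1 - 1) * variance(case) + (n2 - 1) * variance(control)) / (n1 + n2 - 2)
    t = (mean(case) - mean(control)) / math.sqrt(pooled * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def direction(fc, threshold=2.0):
    """Classify a fold change against the 2-fold criterion."""
    if fc >= threshold:
        return "up"
    if fc <= 1 / threshold:
        return "down"
    return "unchanged"


# Hypothetical normalized signals for one RNA in three tumor and three control samples.
tumor = [8.0, 10.0, 12.0]
control = [4.0, 5.0, 6.0]
fc = fold_change(tumor, control)
t_stat, dof = student_t(tumor, control)
call = direction(fc)
```

An RNA would be reported as differentially expressed only when the fold-change call and the t-test significance both meet the chosen criteria.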

The expression abundances of mRNAs, long non-coding RNAs, and circular RNAs in the experimental group and the control group are shown in FIG. 9 and FIG. 10.

Taking prostate cancer tumor tissue (experimental group) and paracancerous tissue (control group) as an example, gene chip detection and data analysis identified 328 up-regulated mRNAs and 892 down-regulated mRNAs with a 2-fold difference, for a total of 1220 mRNAs with 2-fold differential expression. There were 447 up-regulated and 840 down-regulated long-chain non-coding RNAs with a 2-fold difference, for a total of 1287 long-chain non-coding RNAs with 2-fold differential expression. There were 508 up-regulated and 1706 down-regulated circular RNAs with a 2-fold difference, for a total of 2214 circular RNAs with 2-fold differential expression.

In conclusion, the present invention effectively overcomes various disadvantages of the prior art and has high industrial utilization value.

The foregoing embodiments are merely illustrative of the principles and utilities of the present invention and are not intended to limit the invention. Any person skilled in the art can modify or change the above-mentioned embodiments without departing from the spirit and scope of the present invention. Accordingly, it is intended that all equivalent modifications or changes which can be made by those skilled in the art without departing from the spirit and technical spirit of the present invention be covered by the claims of the present invention.

Sequence listing

<110> Shanghai biochip Co., Ltd

<120> design method of long-chain RNA specific probe

<160>11

<170>SIPOSequenceListing 1.0

<210>1

<211>60

<212>DNA

<213> Artificial Sequence (Artificial Sequence)

<400>1

ataaaagagt tttgggaata cactgagctt tgagtgaaag aagctgcagt ggcctccctg 60

<210>2

<211>60

<212>DNA

<213> Artificial Sequence (Artificial Sequence)

<400>2

atacctgagc aagtgaaatt aagaagggaa ttgaagcaaa tattcctgac atccaagtgg 60

<210>3

<211>134

<212>DNA

<213> Artificial Sequence (Artificial Sequence)

<400>3

ggatgagggg aaatgcgtag aaggaattct ggaaatcttt gacatgctcc tggcaactac 60

ttcaaggttt cgagagttaa aactccaaca caaagaatat ctctgtgtca aggccatgat 120

cctgctcaat tcca 134

<210>4

<211>452

<212>DNA

<213> Artificial Sequence (Artificial Sequence)

<400>4

ccattatact tgcccacgaa tctttgagaa cattataatg acctttgtgc ctcttcttgc 60

aaggtgtttt ctcagctgtt atctcaagac atggatataa aaaactcacc atctagcctt 120

aattctcctt cctcctacaa ctgcagtcaa tccatcttac ccctggagca cggctccata 180

tacatacctt cctcctatgt agacagccac catgaatatc cagccatgac attctatagc 240

cctgctgtga tgaattacag cattcccagc aatgtcacta acttggaagg tgggcctggt 300

cggcagacca caagcccaaa tgtgttgtgg ccaacacctg ggcacctttc tcctttagtg 360

gtccatcgcc agttatcaca tctgtatgcg gaacctcaaa agagtccctg gtgtgaagca 420

agatcgctag aacacacctt acctgtaaac ag 452

<210>5

<211>60

<212>DNA

<213> Artificial Sequence (Artificial Sequence)

<400>5

gccatgatcc tgctcaattc caccattata cttgcccacg aatctttgag aacattataa 60

<210>6

<211>20

<212>DNA

<213> Artificial Sequence (Artificial Sequence)

<400>6

gaacgggaca actgccagta 20

<210>7

<211>20

<212>DNA

<213> Artificial Sequence (Artificial Sequence)

<400>7

acctacagcg agtccaggat 20

<210>8

<211>20

<212>DNA

<213> Artificial Sequence (Artificial Sequence)

<400>8

tcgcgcattc ttggaagtct 20

<210>9

<211>20

<212>DNA

<213> Artificial Sequence (Artificial Sequence)

<400>9

tgccagaggg tgaaaagcaa 20

<210>10

<211>20

<212>DNA

<213> Artificial Sequence (Artificial Sequence)

<400>10

ctgcaaaaag gtgtcctgcc 20

<210>11

<211>20

<212>DNA

<213> Artificial Sequence (Artificial Sequence)

<400>11

tcaggaacag gacgcctagt 20

Claims

1. A design method of a specific probe for detecting more than two long-chain RNAs simultaneously comprises the following steps:

S100, designing a long-chain RNA probe of a target gene as a candidate probe according to the type of a preset probe, wherein the type of the preset probe is selected from at least two of an mRNA probe, a circular RNA probe, or a long non-coding RNA probe;

if the comparison result of a candidate probe does not accord with the preset value, the candidate probe is eliminated;

S400, if a target gene has no retained specific probe, redesigning a long-chain RNA probe of the target gene as a candidate probe according to the type of a preset probe, and continuing to execute S200;

if the specific probes reserved for the target gene only include part of the types in the preset probe types, redesigning the other preset probe types of the target gene as candidate probes, and continuing to execute S200 until all the specific probes reserved for the target gene include the probes of the preset types.

2. The method of claim 1, wherein the sequence used to design the mRNA probes in S100 is selected from the group consisting of the longest transcript sequence of mRNA of each gene of interest.

3. The method of claim 1, wherein the sequence used to design the circular RNA probe in S100 is selected from a fragment of a reverse spliced sequence of circular RNA.

4. The method of claim 3, wherein the circular RNA probe in S100 is selected from circular RNA probes whose binding site to the reverse splice sequence of the target gene is located at the reverse splice site.

5. The method of claim 1, wherein the sequence used in S100 for designing an antisense long non-coding RNA or synonymous long non-coding RNA probe is selected from the group consisting of: a fragment of a non-overlapping region of an antisense long non-coding RNA or a synonymous long non-coding RNA and mRNA; specific sequences for designing intron long non-coding RNA, intergenic long non-coding RNA, bidirectional long non-coding RNA or enhancer long non-coding RNA probes are selected from: a fragment of the longest long noncoding RNA of each target gene.

6. The method of claim 1, wherein in S300:

the preset values are met: the similarity between the candidate probe and the full-length target sequences of all the probes does not exceed a first preset value, and the continuous same base length between the candidate probe and the full-length target sequences of all the probes does not exceed a second preset value;

the non-compliance with the preset values is: the similarity between a candidate probe and at least one of the full-length target sequences of the probe exceeds a first preset value, or the base length of continuous identity between the candidate probe and at least one of the full-length target sequences of the probe exceeds a second preset value.

7. A system for designing long-chain RNA-specific probes, wherein said system can be used to design specific probes for simultaneous detection of at least two long-chain RNAs of mRNA, circular RNA or long-chain non-coding RNA.

8. The system of claim 7, wherein the system comprises:

the design module is used for designing a probe of a target gene as a candidate probe according to the type of a preset probe, wherein the type of the preset probe is selected from at least two of an mRNA probe, a circular RNA probe, or a long non-coding RNA probe;

the comparison module is used for comparing the candidate probe sequences with the full-length target sequences of all the probes;

the screening module is used for judging whether the comparison result of the candidate probe and the full-length target sequences of all the probes accords with a preset value or not, and if so, retaining the candidate probe as a specific probe; if not, eliminating the candidate probe;

an iteration module for judging whether each target gene has a reserved specific probe,

if a target gene has no reserved specific probe, redesigning the probe of the target gene as a candidate probe according to the type of a preset probe, and continuously executing the comparison module (2);

if the specific probes reserved for the target gene only comprise part of the types in the preset probe types, redesigning the other preset probe types of the target gene as candidate probes, and continuing to execute the comparison module (2) until all the specific probes reserved for the target gene comprise the probes of the preset types.

9. The system of claim 7, wherein the sequence used to design the mRNA probes is selected from the longest mRNA transcript sequence of each target gene.

10. The system of claim 7, wherein the sequence used to design the circular RNA probe is selected from the group consisting of: a fragment of an inverted splice sequence of a circular RNA.

11. The system of claim 10, wherein the circular RNA probe is selected from circular RNA probes that are positioned at a reverse splice site with respect to the binding site of a reverse splice sequence of a target gene.

12. The system of claim 7, wherein the sequence used to design the anti-sense long non-coding RNA or synonymous long non-coding RNA probe is selected from the group consisting of: a fragment of a non-overlapping region of an antisense long non-coding RNA or a synonymous long non-coding RNA and mRNA; the sequences used to design intron long non-coding RNA, intergenic long non-coding RNA, bidirectional long non-coding RNA, or enhancer long non-coding RNA probes are selected from: a fragment of the longest long noncoding RNA of each target gene.

13. The system of claim 7, wherein, in the screening module 3,

if the similarity between a candidate probe and the full-length target sequences of all the probes does not exceed a first preset value and the base length continuously same with the full-length target sequences of all the probes does not exceed a second preset value, determining that the comparison result of the candidate probe and the full-length target sequences of the probes conforms to a preset value;

14. A storage medium having a computer program stored thereon, wherein the computer program, when executed by a computer, implements the method of any of claims 1-6.

15. A service terminal, characterized in that the service terminal comprises a processor and a memory; the memory is adapted to store a computer program and the processor is adapted to execute the computer program stored by the memory to cause the service terminal to perform the method according to any of claims 1-6.