CN110265084A - Method and related device for predicting elements enriched in or depleted of riboSNitches in the cancer genome - Google Patents
- Publication number
- CN110265084A (application number CN201910484578.1A)
- Authority
- CN
- China
- Prior art keywords
- ribosnitch
- mutation
- cancer
- value
- group
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/20—Allele or variant detection, e.g. single nucleotide polymorphism [SNP] detection
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B20/00—ICT specially adapted for functional genomics or proteomics, e.g. genotype-phenotype associations
- G16B20/50—Mutagenesis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B30/00—ICT specially adapted for sequence analysis involving nucleotides or amino acids
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B5/00—ICT specially adapted for modelling or simulations in systems biology, e.g. gene-regulatory networks, protein interaction networks or metabolic networks
Abstract
The present invention relates to the technical field of detecting the effect of mutations on RNA secondary structure, and in particular to a method and related device for predicting elements enriched in or depleted of riboSNitches in the cancer genome. The invention provides software named SNIPER, which can be used to predict riboSNitches and to identify non-coding elements that are enriched in or depleted of riboSNitches in tumors. The disclosure covers both a description of the SNIPER software and the first analysis of riboSNitches among somatic mutations in cancer.
Description
Technical field
The present invention relates to the technical field of detecting the effect of mutations on RNA secondary structure, and in particular to a method and related device for predicting elements enriched in or depleted of riboSNitches in the cancer genome.
Background art
RNA secondary structure can influence cellular processes in many ways, including RNA stability, RNA localization, RNA transcription, RNA processing, and even protein translation. Single nucleotide variants (SNVs) that alter RNA secondary structure are called riboSNitches. Such mutations may affect human health and cause certain human diseases. In particular, for mutations in some non-coding regions, secondary structure is closely tied to the transcription and translation of the gene. The present invention focuses on riboSNitches in non-coding regions, especially the untranslated regions (UTRs) of genes and non-coding RNAs (ncRNAs). At present, research on mutations that alter RNA secondary structure is still very limited. Moreover, although large numbers of mutations arise during cancer initiation and progression, no one has yet systematically studied the effect of somatic mutations on RNA secondary structure during that process.
RNA secondary structure plays a key role in gene regulation by influencing RNA localization, stability, splicing, and protein translation efficiency. Most of the human genome is transcribed [1], and RNA structure may affect post-transcriptional regulation and every stage of protein translation, including initiation, elongation, and termination [2]. In-depth study of RNA secondary structure may therefore help us better understand its molecular and biological roles in regulation.
In an RNA secondary structure, each base is in one of two states: single-stranded or double-stranded. Several research groups have developed methods that detect RNA secondary structure using probes specific to single-stranded or double-stranded regions, including SHAPE-Seq, FragSeq, Mod-seq, and PARS [3–6]. With the rapid development of next-generation sequencing, probe-based methods for determining RNA secondary structure have gained in efficiency and accuracy [7]. Researchers have also developed software that predicts RNA secondary structure from sequence, including ViennaRNA, RNA-MoIP, and RNAsnp [8–10]. A recent study used PARS to analyze RNA structure in a family trio (father, mother, and child) and found that nearly 15% of the transcribed SNVs in the family altered local RNA secondary structure and were identified as riboSNitches. These variants are heritable, which reveals how widespread riboSNitches are in the human genome [6,11]. Genome-wide studies have shown that riboSNitches are significantly depleted near RNA regulatory elements such as miRNA and protein binding sites, which indicates that secondary structure in these regions is relatively conserved [11]. Recent studies also show that mutations altering local RNA secondary structure may affect RNA-binding proteins (RBPs) and their binding affinity. These findings highlight the importance of RNA secondary structure around binding sites. They indicate that SNVs can affect the binding efficiency of RBPs and miRNAs by altering local RNA secondary structure, and can thereby play a key role in gene regulation.
Pathogenic SNVs that disrupt key RNA secondary structures may affect RNA function by altering secondary structure and ultimately lead to disease. Studies have found that disease-associated point mutations causing hyperferritinemia cataract syndrome and retinoblastoma may regulate protein expression of the gene by altering the RNA secondary structure of the 5' UTR [14–16]; these mutations have all been shown to be riboSNitches. Researchers have also found a riboSNitch in the long non-coding RNA (lncRNA) ribonuclease MRP that may be associated with human cartilage-hair hypoplasia [17]. In addition, a riboSNitch in the 3' UTR of FKBP5 was found to significantly alter the secondary structure of the gene's 3' UTR and thereby affect the binding of miR-320a; the researchers indicated that treatment of chronic post-traumatic pain could be mediated by changing the secondary structure [18]. These studies illustrate the importance of riboSNitches in humans, and such riboSNitches may serve as targets for treating the associated diseases.
With the rapid development of next-generation sequencing, cancer genome studies have sequenced thousands of tumor samples and revealed the landscape of somatic mutations across many cancer types. Most previous studies concentrated on the coding sequences (CDS) of the genome and the untranslated regions (UTRs) of coding genes, such as protein-coding genes; few have concentrated on non-coding regions. RiboSNitches have been found in non-small cell lung cancer, particularly in UTRs and around microRNA binding sites [19]. In retinoblastoma, an SNV in the 5' UTR of RB1 has been found to alter expression and may be carcinogenic through changes in RNA structure [14].
RNA secondary structure has important functions in post-transcriptional regulation and protein translation, yet somatic mutations that alter secondary structure in the cancer genome have not been studied. The studies above lay the groundwork for us and indicate that riboSNitches are common in cancer development. However, the role of riboSNitches in the cancer genome remains unclear. We therefore used two cancer genome databases, TCGA and ICGC, to analyze riboSNitches in the cancer genome, in order to determine whether riboSNitches are widespread in the non-coding regions of the cancer genome (including the UTRs of coding genes and long non-coding genes) and whether they are associated with tumorigenesis.
Summary of the invention
The present invention provides software named SNIPER, which can be used to predict riboSNitches and to identify non-coding elements enriched in or depleted of riboSNitches in tumors. The disclosure covers both a description of the SNIPER software and the first analysis of riboSNitches among somatic mutations in cancer. We found that riboSNitches in cancer are more likely to be pathogenic. In addition, by constructing a trinucleotide random mutation model, we predicted elements that are significantly enriched in or depleted of riboSNitches in cancer. We consider these elements to be very important in cancer pathogenesis; as non-coding genetic elements closely associated with cancer initiation and progression, they may regulate gene or protein expression by altering RNA secondary structure. In short, the invention highlights the importance of RNA secondary structure in the cancer genome and provides a new strategy for identifying new cancer-related genes.
One object of the present invention is to provide a method for predicting the degree to which a mutation affects RNA secondary structure based on the MeanDiff value, comprising: calculating the difference in base-pairing probability between a cancer sequence and the corresponding normal sequence using the MeanDiff equation, which is as follows:
where k is the position of the mutation in the transcript, w is the window size, BPP_ref,i and BPP_alt,i denote the base-pairing probability at the i-th base of the reference sequence and of the mutant sequence respectively, and i ranges over [k-w, k+w];
ranking the MeanDiff values of all mutations in descending order, where a larger MeanDiff value predicts a larger effect of the mutation on RNA secondary structure.
A further object of the present invention is to provide a method for predicting the degree to which a mutation affects RNA secondary structure based on the EucDiff value, comprising: calculating the difference in base-pairing probability between a cancer sequence and the corresponding normal sequence using the EucDiff equation, which is as follows:
where k is the position of the mutation in the transcript, w is the window size, BPP_ref,i and BPP_alt,i denote the base-pairing probability at the i-th base of the reference sequence and of the mutant sequence respectively, and i ranges over [k-w, k+w];
ranking the EucDiff values of all mutations in descending order, where a larger EucDiff value predicts a larger effect of the mutation on RNA secondary structure.
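The equations referred to in the two passages above are not reproduced in this text. Going only by the metric names and by the symbols that are defined (k, w, BPP_ref,i, BPP_alt,i, and i in [k-w, k+w]), one plausible reading is a windowed mean absolute difference and a windowed Euclidean distance. The form below is an assumption offered for orientation, not the patent's verbatim formulas:

```latex
\mathrm{MeanDiff} = \frac{1}{2w+1} \sum_{i=k-w}^{k+w}
  \left| \mathrm{BPP}_{\mathrm{ref},i} - \mathrm{BPP}_{\mathrm{alt},i} \right|
\qquad
\mathrm{EucDiff} = \sqrt{\sum_{i=k-w}^{k+w}
  \left( \mathrm{BPP}_{\mathrm{ref},i} - \mathrm{BPP}_{\mathrm{alt},i} \right)^{2}}
```

Under this reading both values are zero when the mutation leaves every pairing probability in the window unchanged, and both grow as the two profiles diverge, which matches the stated rule that larger values indicate a larger structural effect.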
In further embodiments of the above MeanDiff-based and EucDiff-based methods, the mutations in the top 2.5% of the ranking are riboSNitches and the mutations in the bottom 2.5% are non-riboSNitches. In a further embodiment of any of the above methods, w is 2 bp, 5 bp, 10 bp, 15 bp, 20 bp, 25 bp, 50 bp, or 200 bp, preferably 200 bp. In a further embodiment of any of the above methods, the RNA secondary structure is that of the mature transcript. In a further embodiment of any of the above methods, the mutation sequence data come from the ICGC database, the TCGA database, the 1000 Genomes Project database, or other somatic mutation data. In a further embodiment of any of the above methods, the method further comprises filtering out all indels; converting mutations annotated on hg38 to hg19; and removing low-confidence single nucleotide variants (SNVs). In a further embodiment of any of the above methods, removing low-confidence SNVs means filtering out regions of anomalous alignment, repetitive sequence regions, and regions with a very high error rate. In a further embodiment of any of the above methods, the method further comprises selecting one major transcript for each gene and annotating the cancer somatic mutations. In a further embodiment of any of the above methods, RNAplfold is used to calculate the base-pairing probabilities of the cancer sequence and the corresponding normal sequence.
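The scoring and labelling steps described above can be sketched in a few lines. This is a minimal illustration, not the SNIPER implementation: it assumes MeanDiff is a windowed mean absolute difference and EucDiff a windowed Euclidean distance (inferred from the names, since the equations are not reproduced in the text), that positions are 0-based, and that a window running past the transcript ends is simply clipped. The function names are hypothetical.

```python
import math

def _window(n, k, w):
    # Clip [k-w, k+w] to valid 0-based indices of a length-n profile (assumption).
    return range(max(0, k - w), min(n - 1, k + w) + 1)

def mean_diff(bpp_ref, bpp_alt, k, w):
    """Assumed MeanDiff: mean absolute difference in pairing probability in the window."""
    idx = _window(len(bpp_ref), k, w)
    return sum(abs(bpp_ref[i] - bpp_alt[i]) for i in idx) / len(idx)

def euc_diff(bpp_ref, bpp_alt, k, w):
    """Assumed EucDiff: Euclidean distance between the two profiles in the window."""
    idx = _window(len(bpp_ref), k, w)
    return math.sqrt(sum((bpp_ref[i] - bpp_alt[i]) ** 2 for i in idx))

def label_by_rank(scores, fraction=0.025):
    """Label the top `fraction` of mutations riboSNitch and the bottom `fraction` non-riboSNitch."""
    ranked = sorted(scores, key=scores.get, reverse=True)  # descending by score
    n = max(1, int(len(ranked) * fraction))
    labels = {m: "riboSNitch" for m in ranked[:n]}
    labels.update({m: "non-riboSNitch" for m in ranked[-n:]})
    return labels
```

Mutations in the middle of the ranking receive no label, which mirrors the text: only the two 2.5% tails are classified.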
A further object of the present invention is to provide a device for predicting the degree to which a mutation affects RNA secondary structure, comprising:
a processor, including a module for obtaining sequences from a database; a computing module for calculating the MeanDiff value or the EucDiff value; a sorting module for ranking the MeanDiff values or EucDiff values; and an output module for outputting the ranking results; and a memory storing instructions that, when executed by the processor, cause the processor to perform the method of any of the above embodiments.
Another object of the present invention is to provide a method for predicting the degree to which a mutation affects RNA secondary structure based on both the MeanDiff value and the EucDiff value, comprising:
calculating the difference in base-pairing probability between a cancer sequence and the corresponding normal sequence using the MeanDiff and EucDiff equations respectively, which are as follows:
where k is the position of the mutation in the transcript, w is the window size, BPP_ref,i and BPP_alt,i denote the base-pairing probability at the i-th base of the reference sequence and of the mutant sequence respectively, and i ranges over [k-w, k+w];
ranking the MeanDiff values and the EucDiff values of all mutations separately in descending order, where a mutation that ranks near the top for both MeanDiff and EucDiff is predicted to have a large effect on RNA secondary structure.
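Requiring a mutation to rank near the top under both metrics amounts to intersecting two top-ranked sets. The sketch below shows that idea under stated assumptions: scores are held in plain dictionaries keyed by a mutation identifier, and the cutoff fraction is a parameter. The function names are hypothetical and not taken from SNIPER.

```python
def top_fraction(scores, fraction=0.025):
    """Return the identifiers in the top `fraction` of a descending ranking."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    n = max(1, int(len(ranked) * fraction))
    return set(ranked[:n])

def call_ribosnitches(mean_scores, euc_scores, fraction=0.025):
    """Keep only mutations that are top-ranked by MeanDiff and by EucDiff."""
    return top_fraction(mean_scores, fraction) & top_fraction(euc_scores, fraction)
```

Using the intersection is the stricter choice: a mutation with one large, localized change (high EucDiff) but a small average change (low MeanDiff), or the reverse, is not called.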
In further embodiments of the above method for predicting the degree to which a mutation affects RNA secondary structure based on the MeanDiff and EucDiff values, the mutations in the top 2.5% of the ranking are riboSNitches and the mutations in the bottom 2.5% are non-riboSNitches. According to any of the above methods, w is 2 bp, 5 bp, 10 bp, 15 bp, 20 bp, 25 bp, 50 bp, or 200 bp, preferably 200 bp. According to any of the above methods, the RNA secondary structure is that of the mature transcript. According to any of the above methods, the mutation sequence data come from the ICGC database, the TCGA database, the 1000 Genomes Project database, or other somatic mutation data. According to any of the above methods, the method further comprises filtering out all indels; converting mutations annotated on hg38 to hg19; and removing low-confidence single nucleotide variants (SNVs). According to any of the above methods, removing low-confidence SNVs means filtering out regions of anomalous alignment, repetitive sequence regions, and regions with a very high error rate. According to any of the above methods, the method further comprises selecting one major transcript for each gene and annotating the cancer somatic mutations. According to any of the above methods, RNAplfold is used to calculate the base-pairing probabilities of the cancer sequence and the corresponding normal sequence.
Another object of the present invention is to provide a device for predicting the degree to which a mutation affects RNA secondary structure, comprising:
a processor, including a module for obtaining sequences from a database; a computing module for calculating the MeanDiff and EucDiff values; a sorting module for ranking the MeanDiff and EucDiff values; a module for taking the intersection of the two rankings; and an output module for outputting the ranking and intersection results; and a memory storing instructions that cause the processor to perform the method of any of the above embodiments.
Another object of the present invention is to provide a method for predicting elements enriched in or depleted of riboSNitches in the cancer genome, comprising:
taking predicted mutations as the prediction group; taking mutations actually observed in cancer as the observation group, where the same mutation occurring at the same site in different patients is counted separately; calculating the RNA secondary structure of the mutant sequences in the prediction group and of the actual mutant sequences in the observation group;
calculating the MeanDiff value and the EucDiff value of each mutation in the two groups, using the following equations:
where k is the position of the mutation in the transcript, w is the window size, BPP_ref,i and BPP_alt,i denote the base-pairing probability at the i-th base of the reference sequence and of the mutant sequence respectively, and i ranges over [k-w, k+w];
ranking the MeanDiff values and the EucDiff values of all mutations separately in descending order, where the intersection of the top 2.5% of mutations by MeanDiff and the top 2.5% by EucDiff gives the riboSNitches of the prediction group and of the observation group;
comparing the numbers of riboSNitches in the prediction group and the observation group with a one-sided Fisher's exact test and a hypergeometric distribution test to identify significant riboSNitch enrichment, and correcting the test p-values with the Benjamini-Hochberg (BH) method to obtain false discovery rate (FDR) values; after FDR correction, results with FDR < 0.05 are considered elements enriched in or depleted of riboSNitches.
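The statistical step can be illustrated with a standard-library sketch. It assumes a 2x2 table per element laid out as [[observed riboSNitches, observed other], [predicted riboSNitches, predicted other]]; that layout, and the function names, are illustrative assumptions rather than details from the patent. The one-sided Fisher p-value is computed directly from the hypergeometric distribution, and the Benjamini-Hochberg adjustment follows its usual step-up definition.

```python
from math import comb

def fisher_greater(a, b, c, d):
    """One-sided Fisher's exact test, P(X >= a), for the table [[a, b], [c, d]]."""
    n1, n2, m = a + b, c + d, a + c           # row totals and first-column total
    total = comb(n1 + n2, m)
    upper = min(n1, m)
    # Sum hypergeometric probabilities of tables at least as extreme as observed.
    return sum(comb(n1, x) * comb(n2, m - x) for x in range(a, upper + 1)) / total

def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (FDR) in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):              # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Testing for depletion instead of enrichment would use the opposite tail, P(X <= a), with the same hypergeometric terms. Elements whose adjusted value falls below 0.05 would then be reported, as described above.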
In further embodiments of the above method for predicting elements enriched in or depleted of riboSNitches in the cancer genome, according to any of the methods, the mutant sequences of the prediction group are obtained as follows: from the mutation rates of the intragenic mutation spectrum and the trinucleotide counts of each transcript, the number of random mutations for each trinucleotide context in each transcript is obtained; repeated random sampling is performed according to these mutation numbers, and the transcript sequence of each gene is randomly mutated according to the neutral mutation rate to obtain the prediction group; the intragenic mutation spectrum is used to represent the neutral mutation rate in the cancer genome. In a further embodiment, according to any of the methods, w is 2 bp, 5 bp, 10 bp, 15 bp, 20 bp, 25 bp, 50 bp, or 200 bp, preferably 200 bp. In a further embodiment, according to any of the methods, random mutation is performed on non-coding regions, preferably the 5' UTR, the 3' UTR, and/or lncRNA. In a further embodiment, according to any of the methods, the number of random mutation rounds is 1000. In a further embodiment, according to any of the methods, the mutation sequence data come from the ICGC database, the TCGA database, the 1000 Genomes Project database, or other somatic mutation data. In a further embodiment, according to any of the methods, the method further comprises comparing the riboSNitches in the cancer genome with other riboSNitches, using a one-sided Fisher's exact test to determine whether riboSNitches are enriched in regions of the cancer genome; all elements with a P value below 10^-3 are considered elements with cancer-specific enrichment or depletion of riboSNitches. In a further embodiment, according to any of the methods, RNAplfold is used to calculate the RNA secondary structure of the sequences. In a further embodiment, according to any of the methods, w is 200 bp and the maximum base-pair span is 150 bp.
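One round of the trinucleotide random mutation model might look like the sketch below. Several details are assumptions made for illustration only: rates are given as expected mutations per occurrence of a context for a specific alternate base, the expected count is rounded to an integer, and sites are drawn without replacement within a round. None of these choices is stated in the text, and the function names are hypothetical.

```python
import random
from collections import defaultdict

def trinucleotide_positions(seq):
    """Map each trinucleotide context to the 0-based positions of its central base."""
    positions = defaultdict(list)
    for i in range(1, len(seq) - 1):
        positions[seq[i - 1:i + 2]].append(i)
    return positions

def simulate_mutations(seq, rates, rng):
    """Draw one round of random SNVs.

    rates: {(context, alt_base): expected mutations per occurrence of context}
    Returns a sorted list of (position, ref_base, alt_base).
    """
    positions = trinucleotide_positions(seq)
    mutations = []
    for (context, alt), rate in rates.items():
        sites = positions.get(context, [])
        n = min(len(sites), round(rate * len(sites)))  # mutations for this context
        for i in rng.sample(sites, n):                 # sites chosen at random
            mutations.append((i, seq[i], alt))
    return sorted(mutations)
```

Calling `simulate_mutations` 1000 times with independent random states would give the 1000 rounds mentioned above; each round's mutant sequences would then be scored in the same way as the observed mutations.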
Another object of the present invention is to provide a device for predicting elements enriched in or depleted of riboSNitches in the cancer genome, comprising: a processor, including a module for obtaining sequences from a database; a computing module for calculating the MeanDiff and EucDiff values described in claim 1; a sorting module for ranking the MeanDiff and EucDiff values; a module for taking the intersection of the two rankings; a module for comparing the numbers of riboSNitches in the prediction group and the observation group; a module for performing the statistical tests; a module for identifying elements enriched in or depleted of riboSNitches; and an output module for outputting the results; and a memory storing instructions that, when executed by the processor, cause the processor to perform the method of any of the above embodiments.
The invention also includes a computer-readable medium storing any of the above instructions, where the instructions, when executed by a processor, cause the processor to perform any of the above methods.
The beneficial effects of the present invention are as follows. The method provided by the invention can predict the degree to which a somatic mutation affects RNA secondary structure. The software named SNIPER provided by the invention can be used to predict riboSNitches and to identify non-coding elements enriched in or depleted of riboSNitches in tumors. The invention analyzes riboSNitches among somatic mutations in cancer for the first time and finds that riboSNitches in cancer are more likely to be pathogenic. The invention constructs a trinucleotide random mutation model and predicts elements that are significantly enriched in or depleted of riboSNitches in cancer. We consider these elements to be very important in cancer pathogenesis; as non-coding genetic elements closely associated with cancer initiation and progression, they may regulate gene or protein expression by altering RNA secondary structure.
Brief description of the drawings
Fig. 1 is a flow chart of SNIPER.
Fig. 2 shows ROC curves for the different methods and window sizes. Panels A and B show the ROC curves of MeanDiff and EucDiff, respectively, under different window sizes. Panel C shows the ROC curve obtained with a window size of 50 bp when mutations in the intersection of the top 2.5% by MeanDiff and by EucDiff are taken as riboSNitches and mutations in the intersection of the bottom 2.5% by MeanDiff and by EucDiff are taken as non-riboSNitches.
Fig. 3 shows the difference in pathogenicity between riboSNitches and non-riboSNitches. Panels A and B show the distribution of pathogenicity for riboSNitches (red) and non-riboSNitches (blue) in the ICGC data set and the TCGA data set, respectively. All mutations were divided into the five classes shown in the figure according to their FATHMM scores, and significance was calculated with a chi-square test. Panels C and D show the FATHMM score distributions of riboSNitches (red) and non-riboSNitches (blue) in the ICGC data set and the TCGA data set, respectively. P values were calculated with the Mann-Whitney test. Panel E shows, for benign and pathogenic mutations, the proportions of riboSNitches (red) and non-riboSNitches (blue) among all mutations. Significance was calculated with a chi-square test.
Figs. 4-7 show the distribution of MeanDiff or EucDiff values across mutation types.
Fig. 8 shows non-coding elements enriched in riboSNitches in cancer. The Manhattan plots show 5' UTRs (panel A), 3' UTRs (panel B), and lncRNAs (panel C) enriched in riboSNitches; only non-coding elements with FDR < 0.2 are labeled. Genes in bold have FDR < 0.05, and genes in blue were identified as elements with cancer-specific enrichment.
Fig. 9 shows non-coding elements depleted of riboSNitches in cancer. The Manhattan plots show 5' UTRs (panel A), 3' UTRs (panel B), and lncRNAs (panel C) depleted of riboSNitches; non-coding elements with FDR < 0.2 are labeled. Genes in bold have FDR < 0.05, and genes in blue were identified as elements with cancer-specific depletion.
Detailed description of embodiments
The present invention is described in detail below with reference to specific examples, which should not be construed as limiting the scope of the invention.
Example
1.1 Materials and methods
1.1.1 Data collection
Most of the cancer somatic mutation data used in this study come from the ICGC and TCGA databases. In addition, we collected somatic mutations from 25 whole-genome-sequenced melanomas and 100 whole-genome-sequenced gastric cancers [20,21]. The mutation data set for normal individuals was obtained from the 1000 Genomes Project database; the invention uses the Phase 3 data [22].
First, we filtered out all indels and kept only point mutations for further analysis. We then used the UCSC liftOver tool to convert all mutations annotated on hg38 to hg19 [23]. To remove low-confidence SNVs, we filtered the somatic mutations from the cancer databases and the germline mutations from the 1000 Genomes Project against a set of filter intervals downloaded from the Broad Institute (https://personal.broadinstitute.org/anshul/projects/encode/rawdata/blacklists). This list is the set of all blacklist regions for the hg19 human reference genome. The blacklist regions include regions of anomalous alignment, repetitive sequence regions, regions with a very high error rate, and other regions prone to alignment errors.
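The blacklist filtering step can be sketched as an interval lookup. The example assumes the blacklist is available as BED-style (chromosome, start, end) tuples with 0-based, half-open coordinates and that each SNV carries a 0-based position; both conventions, and all function names, are assumptions for illustration. Overlapping regions are merged first so that a binary search per SNV is enough.

```python
from bisect import bisect_right
from collections import defaultdict

def build_index(regions):
    """Merge overlapping (chrom, start, end) regions into sorted per-chromosome arrays."""
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))
    index = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        merged = [list(intervals[0])]
        for start, end in intervals[1:]:
            if start <= merged[-1][1]:            # overlaps or touches the previous one
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index

def in_blacklist(index, chrom, pos0):
    """True if the 0-based position falls inside a blacklisted interval."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    j = bisect_right(starts, pos0) - 1            # last interval starting at or before pos0
    return j >= 0 and pos0 < ends[j]

def filter_snvs(snvs, index):
    """Keep SNVs, given as (chrom, pos0, ref, alt), that lie outside every blacklisted region."""
    return [v for v in snvs if not in_blacklist(index, v[0], v[1])]
```

VCF positions are 1-based, so a real pipeline would subtract 1 before the lookup or adjust the comparison accordingly.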
We used fathmm-MKL to predict the potential impact of riboSNitches and non-riboSNitches [24]. According to the pathogenicity score, we divided all variants into five classes: benign (score ∈ [0, 0.2)), likely benign (score ∈ [0.2, 0.4)), potentially pathogenic (score ∈ [0.4, 0.6)), likely pathogenic (score ∈ [0.6, 0.8)), and pathogenic (score ∈ [0.8, 1]).
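The five score intervals translate directly into a small lookup. The sketch below uses the interval boundaries stated above; the English class labels and the function name are an illustrative rendering, not identifiers from fathmm-MKL.

```python
def pathogenicity_class(score):
    """Map a pathogenicity score in [0, 1] to one of the five classes."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie in [0, 1]")
    if score < 0.2:
        return "benign"
    if score < 0.4:
        return "likely benign"
    if score < 0.6:
        return "potentially pathogenic"
    if score < 0.8:
        return "likely pathogenic"
    return "pathogenic"                      # [0.8, 1], upper bound inclusive
```

Each lower bound is inclusive and each upper bound exclusive except for the last class, so a score of exactly 0.2 is "likely benign" and a score of exactly 1 is "pathogenic".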
Information on miRNA-target interactions comes from TargetScan v7.1 and the miRanda-mirSVR database [25,26]. Only high-confidence microRNA binding sites were used in our analysis: for TargetScan, binding sites of conserved miRNA families with PCT ≥ 90; for miRanda, conserved miRNA binding sites with a high mirSVR score (> 1). Information on RBP-target interactions comes from CLIPdb; the invention mainly uses CLIP-seq data from the HeLa cell line [27]. CLIPdb contains binding-site predictions from several methods; the invention mainly uses the PiRaNhA predictions [28].
Known cancer genes come from the Cancer Gene Census (CGC) in the COSMIC database [29], which includes established oncogenes and tumor suppressor genes. Cancer-related lncRNAs were downloaded from the Lnc2Cancer database [30], which contains information on all lncRNAs that may be associated with cancer.
1.1.2 Gene annotation
The GTF file with the gene annotation coordinates comes from the GENCODE website (https://www.gencodegenes.org). The invention mainly uses the GENCODE v19 annotation [31]. Using the "gene_type" attribute in the GTF file, genes are classified as protein-coding genes, pseudogenes, long non-coding RNAs, other small non-coding genes, and so on. To annotate the genes containing mutations more accurately, we obtained a list of 19035 human protein-coding genes and a list of 3435 long non-coding RNAs from the HGNC database [32]. In the subsequent analyses, we considered only somatic mutations in coding genes and lncRNAs listed in the HGNC database.
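Reading the "gene_type" attribute from a GTF file takes only a little parsing. The sketch below assumes the usual nine tab-separated GTF columns with quoted key-value pairs in the last column; the function names are hypothetical, and the gene identifiers used in any example input are made up.

```python
import re

# Matches quoted attributes such as: gene_type "protein_coding";
ATTRIBUTE = re.compile(r'(\S+) "([^"]*)"')

def parse_attributes(field):
    """Turn the ninth GTF column into a dict of its quoted attributes."""
    return dict(ATTRIBUTE.findall(field))

def gene_types(gtf_lines):
    """Return {gene_id: gene_type} for every 'gene' feature line."""
    types = {}
    for line in gtf_lines:
        if line.startswith("#"):                 # skip header comments
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != "gene":   # only gene-level records
            continue
        attrs = parse_attributes(cols[8])
        types[attrs["gene_id"]] = attrs.get("gene_type", "unknown")
    return types
```

The resulting mapping can then be intersected with the HGNC gene lists so that only mutations in listed coding genes and lncRNAs are carried forward.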
It is worth noting that one gene can produce multiple transcripts through alternative splicing, and different primary sequences
may form different secondary structures, i.e. different transcripts can fold into different RNA secondary structures. To reduce this
uncertainty and the computational cost, we selected only one principal transcript per gene for RNA structure prediction.
The principal transcript was chosen as follows: (1) We first ranked the transcripts of multi-transcript protein-coding
genes by their principal splice isoform (APPRIS) level; APPRIS provides a reliable classification of the transcript
isoforms of alternatively spliced genes in the human genome [33]. APPRIS levels range from 1 to 5, with 1 regarded as the most reliable
transcript. (2) If several transcripts of a gene have the same APPRIS level, a transcript with a CCDS ID is
considered more reliable [34]. (3) If several transcripts have the same APPRIS level and all have a CCDS ID, they are
ranked by the transcript support level annotated in GENCODE, where 1 denotes the best-supported transcript. (4) If none of the above
criteria identifies a principal transcript, the longest transcript is selected. In short, the principal transcript is selected
according to the following priority: APPRIS > CCDS > transcript support level > transcript length.
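The priority rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the SNIPER code: the `Transcript` record, its field names and the function `select_principal` are hypothetical, and treating a missing APPRIS or support level as the worst possible rank is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Transcript:
    """Hypothetical record holding the annotation fields used for ranking."""
    transcript_id: str
    appris_level: Optional[int]   # 1 (most reliable) to 5; None if not annotated
    has_ccds: bool                # True if the transcript carries a CCDS ID
    support_level: Optional[int]  # GENCODE level, 1 is best; None if missing
    length: int                   # mature transcript length


def select_principal(transcripts: List[Transcript]) -> Transcript:
    """Pick one transcript per gene: APPRIS > CCDS > support level > length."""
    def key(t: Transcript):
        # Assumption: missing annotations rank after every annotated value.
        appris = t.appris_level if t.appris_level is not None else 99
        level = t.support_level if t.support_level is not None else 99
        # Tuples compare lexicographically, which encodes the priority order;
        # the negative length makes the longest transcript win the last tie.
        return (appris, 0 if t.has_ccds else 1, level, -t.length)
    return min(transcripts, key=key)
```

Because the sort key is a tuple, each later criterion is consulted only when all earlier ones are tied, which mirrors steps (1) to (4).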
After the principal transcript of each gene had been selected, all cancer somatic mutations were annotated. Note also
that in the analyses of the invention we considered only the RNA secondary structure of mature transcripts. In total, we obtained
3332314 cancer somatic mutations from the TCGA database, the ICGC database and previous studies, and 1917818 germline
mutations from the 1000 Genomes dataset.
1.1.3 RNA secondary structure prediction and riboSNitch detection
In the present invention we mainly use RNAplfold (http://www.tbi.univie.ac.at/RNA/) to predict RNA
secondary structure; RNAplfold is the ViennaRNA program for predicting local RNA secondary structure [8].
Because RNA folding is co-transcriptional, i.e. the RNA folds while it is being transcribed, we set the maximum base-pair span and the
window size to 150 bp and 200 bp, respectively [35]. RNAplfold gives, for each site of a given sequence, the probability
of pairing with other sites. From the output base-pairing probability matrix (BPPM) we can reliably detect differences
in RNA structure between wild type and mutant, that is, quantify the effect of each mutation on the local RNA secondary structure.
A riboSNitch is defined as an SNV that has a significant effect on local RNA secondary structure [6]. In cancer genomes, to
predict the effect of somatic mutations on secondary structure, we use RNAplfold to compute the change in pairing probability
between the tumor sequence and the corresponding normal sequence, and from this predict the riboSNitches among the somatic
mutations. Because the raw sequencing data are not available, we use the reference sequence of each gene as the normal sequence.
The reference sequence of each transcript was extracted with the getfasta module of BEDTools [36], using the GENCODE v19 GTF file.
The corresponding tumor sequence was obtained by replacing the reference base with the mutant base. RNAplfold was then applied
to the normal and tumor sequences to predict the base-pairing probability at each site. In our study we excluded intron sequences
and predicted the RNA secondary structure of mature transcripts only.
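As an illustration of how the tumor sequence is derived from the reference, the sketch below substitutes one base and assembles an RNAplfold command line with the window and span described above. The function names are hypothetical, positions are assumed to be 1-based on the mature transcript, and the command is only built, not executed; `-W` and `-L` are the ViennaRNA options for window size and maximum base-pair span.

```python
from typing import List


def apply_snv(reference: str, position: int, ref_base: str, alt_base: str) -> str:
    """Return the mutant sequence for a single-base substitution.

    `position` is assumed to be 1-based on the mature transcript.
    """
    idx = position - 1
    if not 0 <= idx < len(reference):
        raise ValueError("position lies outside the transcript")
    if reference[idx].upper() != ref_base.upper():
        # Guard against coordinate or strand mistakes before folding.
        raise ValueError("reference base does not match the sequence")
    return reference[:idx] + alt_base + reference[idx + 1:]


def rnaplfold_command(window: int = 200, span: int = 150) -> List[str]:
    """Build (but do not run) an RNAplfold call: -W window size, -L max span."""
    return ["RNAplfold", "-W", str(window), "-L", str(span)]
```

Checking the stated reference base against the sequence before substituting is a cheap way to catch mutations annotated on the wrong strand or transcript.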
To quantify differences in RNA secondary structure, we use two measures, MeanDiff and EucDiff, of the
difference in base-pairing probability (BPP), i.e. of the effect of a point mutation on local secondary structure. Notably,
structural change is not confined to a single base, so we compute the change in base-pairing probability within a window of w bp
around the mutation site (w = 200 bp in all subsequent calculations of the invention). In the equations defining MeanDiff and EucDiff,
k is the position of the mutation in the transcript, w is the window size, BPP_ref,i and BPP_alt,i are the base-pairing
probabilities of the i-th base in the reference sequence and the mutant sequence, respectively, and i ranges over [k-w, k+w].
Finally, all mutations are ranked from largest to smallest by MeanDiff and by EucDiff.
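The equations themselves are not reproduced in this text. A minimal sketch consistent with the symbol definitions above, and with the names of the two measures, is given below; it assumes that MeanDiff is the mean absolute BPP difference over the window and that EucDiff is the Euclidean distance between the two BPP profiles over the same window. The exact normalisation used in the invention may differ, and the function names and the 0-based index convention are hypothetical.

```python
import math
from typing import Sequence


def _window(k: int, w: int, n: int) -> range:
    """Indices i in [k-w, k+w], clipped to the transcript (0-based k)."""
    return range(max(0, k - w), min(n, k + w + 1))


def mean_diff(bpp_ref: Sequence[float], bpp_alt: Sequence[float], k: int, w: int) -> float:
    """Assumed form: mean absolute difference of per-base pairing probability."""
    if len(bpp_ref) != len(bpp_alt):
        raise ValueError("profiles must have the same length")
    idx = _window(k, w, len(bpp_ref))
    return sum(abs(bpp_ref[i] - bpp_alt[i]) for i in idx) / len(idx)


def euc_diff(bpp_ref: Sequence[float], bpp_alt: Sequence[float], k: int, w: int) -> float:
    """Assumed form: Euclidean distance between the two profiles in the window."""
    if len(bpp_ref) != len(bpp_alt):
        raise ValueError("profiles must have the same length")
    idx = _window(k, w, len(bpp_ref))
    return math.sqrt(sum((bpp_ref[i] - bpp_alt[i]) ** 2 for i in idx))
```

Here `bpp_ref` and `bpp_alt` stand for per-base pairing probabilities derived from the RNAplfold output for the reference and mutant sequences; clipping the window at the transcript ends is a design choice of this sketch.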
In addition, mutations that strongly affect local secondary structure may affect the binding of miRNAs or
RBPs to their target sites. We therefore also mapped all riboSNitches and non-riboSNitches onto known
miRNA and RBP binding sites. In our analysis, using the binding sites determined in 1.1.1, a somatic mutation located
within a binding site or within 20 bp of it is considered to have the potential to affect miRNA or RBP binding.
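The 20 bp proximity rule can be expressed as a simple interval check. The sketch below is illustrative only: the function names are hypothetical, and it assumes that mutation positions and binding-site coordinates are inclusive and lie on the same transcript.

```python
from typing import Iterable, List, Tuple


def near_binding_site(mut_pos: int, site_start: int, site_end: int, flank: int = 20) -> bool:
    """True if the mutation lies inside the site or within `flank` bp of it."""
    if site_start > site_end:
        raise ValueError("site_start must not exceed site_end")
    return site_start - flank <= mut_pos <= site_end + flank


def mutations_near_sites(
    mut_positions: Iterable[int],
    sites: Iterable[Tuple[int, int]],
    flank: int = 20,
) -> List[int]:
    """Keep the mutations that fall near at least one binding site."""
    site_list = list(sites)
    return [
        pos for pos in mut_positions
        if any(near_binding_site(pos, start, end, flank) for start, end in site_list)
    ]
```

For genome-scale data an interval tree or a sorted sweep would be faster than this pairwise scan, but the criterion is the same.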
1.1.4 Assessing the performance of MeanDiff and EucDiff on a standard dataset
To verify the performance of MeanDiff and EucDiff, we used as a standard dataset the riboSNitch and non-riboSNitch
sequences determined experimentally in previously published work; this dataset was derived from
large-scale experimental data [6,35]. It contains 1058 riboSNitch and 1058
non-riboSNitch sequences, each 101 bp long. RNAplfold was used to compute the RNA structure of all sequences, and the corresponding
MeanDiff and EucDiff values were computed with different window sizes, w = 2 bp, 5 bp, 10 bp, 15 bp, 20 bp, 25 bp
and 50 bp (the maximum window possible for these sequences). Following the earlier study, mutations in the top 2.5% by MeanDiff or EucDiff
are considered riboSNitches and those in the bottom 2.5% non-riboSNitches [35]. Finally, ROC curves were computed for MeanDiff and EucDiff,
and the performance at each window size was assessed by the AUC value. Both AUC values and ROC curves
were computed with the R package "pROC" [37].
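For readers without R at hand, the AUC can be illustrated with a plain-Python stand-in for pROC. The sketch uses the rank interpretation of the AUC: the probability that a randomly chosen riboSNitch receives a higher score than a randomly chosen non-riboSNitch, with ties counted as one half. The function name is hypothetical and the scores in the usage note are made up.

```python
from typing import Sequence


def auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties contribute 0.5. The quadratic loop is adequate for roughly a
    thousand sequences per class.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("both classes need at least one score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

Calling `auc` once per window size with the MeanDiff (or EucDiff) scores of the experimentally labelled sequences gives one value per window, which is the kind of comparison described above.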
1.1.5 Detecting non-coding elements significantly enriched in or depleted of riboSNitches
Cancer is an evolutionary process in which a large proportion of somatic mutations are neutral mutations that are not under selection. We use a
random mutation procedure based on the mutation spectrum to simulate neutral mutations in cancer, and we call the resulting set
of random mutations the "prediction group". The random mutation rate of the prediction group can be estimated from intergenic or
intronic regions, because both are regarded as neutrally mutating regions that are not under selective pressure. To simulate
neutral evolution in cancer more accurately, we chose the intronic mutation spectrum of cancer to represent the neutral mutation rate
of the cancer genome. Because the TCGA dataset contains very few intronic mutations, only the ICGC dataset was used in the
subsequent analyses of the invention.
Because different transcripts have different sequence backgrounds, the nucleotide composition of each transcript must also be
taken into account when simulating random mutations for the prediction group. Since both mutation rate and mutation type depend
on the sequence context of the transcript, we considered 96 mutation types when computing mutation rates, i.e. we took into account
the base immediately 5' and the base immediately 3' of the mutated site. We calculated the mutation frequency of each of the 96
possible types in the intronic mutation spectrum: 6 substitution types (C > A, C > G, C > T, T > A, T > C, T > G) x 4 types of
5' base x 4 types of 3' base. During random mutation we likewise accounted for sequence differences between transcripts by
considering the 5' and 3' neighbouring bases, which gives 32 trinucleotides in total: 2 types of central base x 4 types of
5' base x 4 types of 3' base. Only two central bases, cytosine (C) and thymine (T), need to be considered because of base
complementarity. From the intronic mutation rates obtained above and the number of each trinucleotide in each
transcript, the number of random mutations for each trinucleotide context of each transcript can be obtained. We then
performed repeated random sampling according to these numbers, randomly mutating the transcript sequence of each gene at the
neutral mutation rate to obtain the prediction group. The random sampling was simulated in R. In the invention, the
non-coding regions (5' UTR, 3' UTR and lncRNA) of each transcript were randomly mutated 1000 times.
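The 96 mutation types and 32 trinucleotide contexts described above can be enumerated directly. The sketch below is an illustration, not the SNIPER implementation: the `5'[ref>alt]3'` label format, the function names and the input `rate_per_type` (a mapping from mutation type to a per-site rate estimated from intronic mutations) are assumptions.

```python
from collections import Counter
from itertools import product
from typing import Dict, List

BASES = "ACGT"
SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def mutation_types() -> List[str]:
    """6 substitutions x 4 5' bases x 4 3' bases = 96 labels."""
    return [f"{five}[{sub}]{three}"
            for sub in SUBSTITUTIONS
            for five, three in product(BASES, repeat=2)]


def trinucleotides() -> List[str]:
    """2 central pyrimidines x 4 5' bases x 4 3' bases = 32 contexts."""
    return [five + mid + three
            for mid in "CT"
            for five, three in product(BASES, repeat=2)]


def normalise(five: str, ref: str, alt: str, three: str) -> str:
    """Fold a purine-reference mutation onto the complementary strand."""
    if ref in "CT":
        return f"{five}[{ref}>{alt}]{three}"
    # On the reverse complement the 3' neighbour becomes the 5' neighbour.
    rc = lambda base: base.translate(COMPLEMENT)
    return f"{rc(three)}[{rc(ref)}>{rc(alt)}]{rc(five)}"


def expected_mutations(sequence: str, rate_per_type: Dict[str, float]) -> Counter:
    """Expected number of mutations of each type over one transcript."""
    seq = sequence.upper()
    expected: Counter = Counter()
    for i in range(1, len(seq) - 1):
        five, mid, three = seq[i - 1], seq[i], seq[i + 1]
        if any(b not in BASES for b in (five, mid, three)):
            continue  # skip ambiguous bases such as N
        for alt in BASES:
            if alt != mid:
                key = normalise(five, mid, alt, three)
                expected[key] += rate_per_type.get(key, 0.0)
    return expected
```

The expected counts returned by `expected_mutations` are what a sampling step would then use to draw random mutations for the prediction group.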
After obtaining the randomly mutated sequences, we use RNAplfold to compute the RNA secondary structure of the mutant sequences
and from it the MeanDiff and EucDiff of each mutation, i.e. the effect of the mutation on secondary structure. To reduce
false positives among the predicted riboSNitches, in the invention only mutations in the intersection of the top 2.5% by
MeanDiff and the top 2.5% by EucDiff are considered riboSNitches. Because our study focuses on mutations in non-coding regions, we
performed random mutation only in non-coding regions; that is, the random mutations based on the intronic mutation spectrum
were also simulated and evaluated only within non-coding regions.
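The intersection rule can be sketched as follows. The function names and the representation of scores as dictionaries keyed by a mutation identifier are hypothetical; rounding the cutoff up so that at least one mutation is retained is an assumption of this sketch.

```python
import math
from typing import Dict, Hashable, Set


def top_fraction(scores: Dict[Hashable, float], fraction: float = 0.025) -> Set[Hashable]:
    """Identifiers of the mutations with the highest scores."""
    if not scores:
        return set()
    # Assumption: round the cutoff up and always keep at least one mutation.
    n_keep = max(1, math.ceil(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n_keep])


def call_ribosnitches(
    mean_diff: Dict[Hashable, float],
    euc_diff: Dict[Hashable, float],
    fraction: float = 0.025,
) -> Set[Hashable]:
    """Mutations in the top fraction by MeanDiff and also by EucDiff."""
    return top_fraction(mean_diff, fraction) & top_fraction(euc_diff, fraction)
```

Requiring both measures to agree is stricter than either measure alone, which is the stated reason for taking the intersection.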
After obtaining the prediction group by random mutation, we took the somatic mutations actually observed in cancer as the
observation group. Comparing the number of riboSNitches in the observation group with that in the prediction group
identifies the genetic elements that are significantly enriched in or significantly depleted of riboSNitches in the cancer genome
(elements here are limited to non-coding regions: UTRs and lncRNAs). Because different patients may carry the same mutation at
the same site, such mutations are counted separately in the observation group, and the prediction group is correspondingly
simulated by random mutation with replacement.
After obtaining the observed and expected numbers of riboSNitches, we identified significant enrichment of riboSNitches with a
one-sided Fisher's exact test and a hypergeometric test, and corrected the resulting
P values with the BH method to obtain FDR values [38]. After false discovery rate correction, elements with FDR < 0.05 are considered
enriched in or depleted of riboSNitches. The whole workflow and its code are packaged as the software SNIPER, written in Perl; statistical analysis is performed in R.
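The two statistical steps can be illustrated in plain Python (the invention itself performs them in R). The layout of the 2x2 table is an assumption made for illustration: rows are the observation and prediction groups, columns are riboSNitch and non-riboSNitch counts for one element. The function names are hypothetical.

```python
from math import comb
from typing import List, Sequence


def fisher_greater(a: int, b: int, c: int, d: int) -> float:
    """One-sided Fisher's exact test, P(X >= a), for the table [[a, b], [c, d]].

    Assumed layout: a, b = observed riboSNitch / non-riboSNitch counts;
    c, d = the same counts in the prediction group.
    """
    row1, col1, total = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    numerator = sum(comb(row1, x) * comb(total - row1, col1 - x)
                    for x in range(a, upper + 1))
    return numerator / comb(total, col1)


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """BH-adjusted P values (FDR), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

Testing for depletion uses the opposite tail, which can be obtained by swapping the two rows of the table before calling `fisher_greater`.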
1.1.6 Elements with cancer-specific enrichment or depletion of riboSNitches
To obtain cancer-specific elements, we compared the riboSNitches in cancer genomes with those in the 1000 Genomes
dataset. A one-sided Fisher's exact test was used to determine whether riboSNitches are enriched in a given region of the
cancer genome. All elements with a P value below 10^-3 were considered cancer-specific.
1.1.7 Functions of the SNIPER software package
The SNIPER pipeline mainly comprises two parts: the first computes the MeanDiff and
EucDiff values of mutations, and the second identifies elements enriched in or depleted of riboSNitches on the basis of the
intronic mutation rate of cancer samples.
First, RNA secondary structures are predicted for all somatic mutations of the ICGC dataset and, for each transcript, for the
1000 random mutations generated from the intronic mutation frequencies of the 96 mutation types and the trinucleotide composition
of the transcript. MeanDiff and EucDiff are then used to compute the secondary structure difference between the reference and
mutant sequences. Next, mutations in the top 2.5% by MeanDiff and EucDiff are defined as riboSNitches, and those in the bottom
2.5% by MeanDiff and EucDiff as non-riboSNitches. Finally, elements enriched in or depleted of riboSNitches are detected by
comparing the observed and expected numbers of riboSNitches.
1.2 Results
1.2.1 MeanDiff and EucDiff are effective methods for detecting riboSNitches
For each mutation, we replace the original base in the gene's reference sequence (from the human reference genome) with the
mutant base to obtain the mutant sequence. We then predict the RNA secondary structures of the reference and mutant
sequences with RNA secondary structure prediction software and, from these predictions, assess the effect of the mutation on RNA
structure. Following the approach of previous riboSNitch studies [35], we chose the BPPM-based algorithm RNAplfold to predict RNA
conformation. Moreover, because RNA folding is co-transcriptional, RNAplfold, which predicts local pairing probabilities, is
well suited to predicting the local RNA secondary structure around a mutation.
The invention introduces two new measures for identifying riboSNitches: MeanDiff and EucDiff. To assess the
performance of the two methods, we used a standard dataset of 1,058 riboSNitch and 1,058 non-riboSNitch sequences,
each 101 bp long. The riboSNitch and non-riboSNitch sequences in this dataset were identified in a family trio by the
experimental method PARS [6,35]. We took the mutations in the top and bottom 2.5% by MeanDiff or EucDiff value as
riboSNitches and non-riboSNitches, respectively. Comparing ROC curves and AUC values, we found that under identical conditions
MeanDiff and EucDiff outperform the best values reported in the earlier article, which were obtained with the software SNPfold (Table 1).
This shows that MeanDiff and EucDiff distinguish riboSNitch from non-riboSNitch mutations more accurately.
Table 1. AUC values of the MeanDiff, EucDiff and SNPfold methods at different window sizes
* NA indicates that no result for that window size is given in Corley et al. (2015)
To examine the effect of window size on riboSNitch prediction, we compared window sizes of 2 bp, 5 bp,
10 bp, 15 bp, 20 bp, 25 bp and 50 bp (the maximum possible for the standard dataset). As Fig. 2 shows,
the AUC increases with window size, i.e. riboSNitches and non-riboSNitches are separated better. In
subsequent analyses, the accuracy of riboSNitch prediction can therefore be improved by enlarging the window. Because
RNA folding is co-transcriptional, when searching for riboSNitches in cancer genomes with RNAplfold
we used a window size of 200 bp and a maximum base-pair span of 150 bp.
In addition, the ROC curves show that on this standard dataset MeanDiff performs slightly better than
EucDiff. To reduce the error rate, however, in subsequent analyses we regard mutations in the intersection of the top 2.5% by
MeanDiff and the top 2.5% by EucDiff as riboSNitches, and mutations in the intersection of the bottom 2.5% by MeanDiff and by
EucDiff as non-riboSNitches. With riboSNitches and non-riboSNitches defined in this way, the final AUC
reaches 0.774 (Fig. 2), a further improvement in the accuracy of riboSNitch prediction that supports the subsequent prediction
of riboSNitches in cancer genomes.
1.2.2 riboSNitches and non-riboSNitches in cancer genomes
To find riboSNitches in cancer genomes, we collected a large amount of somatic mutation data, including
the TCGA and ICGC datasets and somatic mutation data from previously published whole-genome sequencing of 25 melanomas and
100 gastric cancers [20,39]. Because only a small fraction of the genome is transcribed into RNA and folded as a transcript,
we used only the point mutations that fall within this small fraction. We also filtered out intronic
mutations and considered only point mutations on mature transcripts. For each cancer type, the number of somatic mutations in
transcribed regions and the number of riboSNitches are listed in Table S1.
After all mutations falling on exons had been obtained, we took into account that a single gene can, through alternative splicing,
give rise to several different transcripts, so that one point mutation may affect different transcripts differently. To reduce
the computational load, we selected only the principal transcript of each gene for subsequent analyses; the selection
procedure is described in detail in section 1.1.2 (Gene annotation). To obtain the riboSNitch and non-riboSNitch
mutations among the somatic mutations, we predicted the secondary structure of the sequences before and after each somatic
mutation with RNAplfold and computed MeanDiff and EucDiff from the resulting probability matrices. Finally, somatic mutations
in the intersection of the top 2.5% by MeanDiff and by EucDiff were considered riboSNitches, and somatic mutations in the
intersection of the bottom 2.5% by MeanDiff and by EucDiff were considered non-riboSNitches.
1.2.3 riboSNitches in cancer genomes are more likely to be pathogenic mutations
To determine whether riboSNitches and non-riboSNitches have different functional consequences, we used
fathmm-MKL to annotate the functional impact score of somatic mutations [24]. Because the fathmm-MKL prediction score is a
continuous value from 0 to 1, we divided the scores from low to high into five intervals: benign, likely benign, potentially
pathogenic, likely pathogenic and pathogenic. We found that the predicted functional consequences of riboSNitches and
non-riboSNitches range from benign to pathogenic, and that the two groups differ most markedly in the benign and pathogenic
categories (Fig. 3A and 3B). Overall, riboSNitches also have higher fathmm-MKL scores than non-riboSNitches. Because a higher
score indicates a more pathogenic mutation, we conclude that in cancer genomes riboSNitches are more likely to be pathogenic
than non-riboSNitches (Fig. 3C and 3D), which is consistent with previous studies of normal
human genomes [16]. These results also suggest that somatic mutations that significantly alter RNA secondary structure are more
likely to be pathogenic.
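The five-interval grouping of scores can be sketched as below. The text does not state the interval boundaries, so equal-width bins of 0.2 are assumed here purely for illustration; the function name is hypothetical.

```python
from typing import Sequence

CATEGORIES = (
    "benign",
    "likely benign",
    "potentially pathogenic",
    "likely pathogenic",
    "pathogenic",
)


def score_category(score: float, labels: Sequence[str] = CATEGORIES) -> str:
    """Map a score in [0, 1] to one of five ordered categories.

    Assumption: equal-width intervals; the boundaries used in the
    invention are not given in the text.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie between 0 and 1")
    # A score of exactly 1.0 belongs to the last interval.
    index = min(int(score * len(labels)), len(labels) - 1)
    return labels[index]
```

Counting riboSNitches and non-riboSNitches per category then gives the kind of distribution compared between the two groups.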
To confirm that riboSNitches are more likely to be pathogenic, we collected a total of 91183 pathogenic mutations and 79090
benign mutations from ClinVar, UniProt and the Human Gene Mutation Database (HGMD) [40–42] and tested
whether riboSNitches are indeed over-represented among pathogenic mutations. Because most mutations in these databases come
from normal human samples, we used as cutoffs the MeanDiff and EucDiff values at the top 2.5% and bottom 2.5% of the 1000 Genomes
data; as before, mutations in the intersection of the top 2.5% were taken as riboSNitches and those in the intersection of the
bottom 2.5% as non-riboSNitches. As shown in Fig. 3E, the distributions of riboSNitches and non-riboSNitches differ clearly
between benign and pathogenic mutations: pathogenic mutations tend to be riboSNitches, whereas benign variants tend to be
non-riboSNitches (P value = 2.87E-05, chi-square test). Known benign and pathogenic point mutations from these databases thus
confirm that pathogenic mutations contain more riboSNitches and benign mutations contain more non-riboSNitches.
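A chi-square test on a 2x2 table of this kind can be computed as follows. This sketch uses the Pearson statistic without continuity correction and obtains the P value for one degree of freedom from the complementary error function; whether a continuity correction was applied in the invention is not stated. The function name is hypothetical and the counts in the usage note are made up, since the underlying counts are not given in the text.

```python
from math import erfc, sqrt
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square statistic and P value for the table [[a, b], [c, d]].

    Assumed layout: rows = pathogenic / benign mutations,
    columns = riboSNitch / non-riboSNitch. No continuity correction.
    """
    n = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("every row and column needs a non-zero total")
    chi2 = n * (a * d - b * c) ** 2 / margins
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2.0))
    return chi2, p_value
```

For example, `chi_square_2x2(10, 20, 20, 10)` on these made-up counts gives a statistic of about 6.67 and a P value just under 0.01.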
1.2.4 Features of riboSNitches in cancer genomes
To characterize riboSNitches in cancer genomes further, we first divided the mutations into six
mutation types (C > A, C > G, C > T, T > A, T > C, T > G). We found that the distributions of MeanDiff and EucDiff values differ
between mutation types, which indicates that different mutation types may affect RNA secondary structure differently (Fig. 4-7).
Interestingly, in both the ICGC and TCGA datasets C > G mutations have higher MeanDiff and EucDiff values than the other
mutation types, indicating that C > G mutations have a larger effect on RNA secondary structure. In addition, mutations in the
5' UTR have higher MeanDiff and EucDiff values than mutations in the 3' UTR, lncRNAs and protein-coding regions
(P < 2.2e-16, Mann-Whitney test); that is, mutations in the 5' UTR alter RNA secondary structure more readily, which may
be related to the many highly conserved structural domains in 5' UTRs. Interestingly, more than 80% of the somatic mutations
in 5' UTRs in cancer genomes occur at GC base pairs. Because a GC pair is held by three hydrogen bonds, GC base pairs are
structurally more stable than AT base pairs. This result therefore suggests that the many mutations at GC pairs in cancer
genomes may affect gene function by disrupting the stability of local RNA structure, and may also help to explain why
C > T mutations are more frequent in cancer genomes.
1.2.4 riboSNitches that affect gene function
To determine whether riboSNitches are significantly enriched in functional regions of the cancer genome, we
collected high-confidence miRNA binding sites from the TargetScan and miRanda datasets [25,26] and
RBP binding sites from the CLIPdb database. Compared with non-riboSNitches,
riboSNitches are significantly depleted around RBP binding sites in cancer genomes (P value = 1.79E-07, one-sided Fisher's exact test),
which is consistent with the earlier study of a family trio [6] and indicates that RBP binding targets are under purifying
selection in cancer genomes as well. In contrast, riboSNitches are enriched around miRNA binding targets (P value = 5E-21, one-sided Fisher's exact test),
which suggests that riboSNitches may affect gene function in cancer by influencing miRNA binding.
1.2.5 SNIPER, a computational framework for predicting elements enriched in or depleted of riboSNitches
In this study we set out to develop a computational framework that identifies riboSNitches among mutations and
identifies elements enriched in or depleted of riboSNitches from cancer somatic mutations. Our hypothesis is that, compared with
other genes in the cancer genome, a gene enriched in riboSNitches carries more mutations that affect RNA secondary structure,
which indicates that the gene has undergone positive selection on RNA structure during the initiation and progression of cancer.
Having shown that MeanDiff and EucDiff can predict riboSNitches, we use the same two measures to
determine the number of riboSNitches in the cancer genome; we then generate the prediction group by repeated sampling, mutating
at random according to the intronic mutation rate in cancer, and finally compare the numbers of riboSNitches in the prediction
and observation groups with statistical tests to obtain the elements significantly enriched in or depleted of riboSNitches.
The SNIPER framework is described in detail in section 1.1.5. Unlike previous methods, it simulates mutations with the intronic
mutation rate rather than the exonic mutation rate, so the riboSNitch-enriched genes detected by SNIPER can be regarded as
genes under positive selection on structure in cancer. Moreover, section 1.2.3 showed that riboSNitches are more likely to be
pathogenic mutations, so we speculate that riboSNitch-enriched genes are also functionally important genes in the cancer genome
that may play a considerable role in the initiation and progression of cancer. Similarly, genes depleted of riboSNitches may be
structurally conserved in cancer; they may be essential genes of the cell and play an important role in cancer cells.
Although primary sequence is important for the regulation of gene expression, RNA secondary structure also plays an important
role in RNA and even protein expression, in particular in post-transcriptional regulation such as the interaction of RBPs or
miRNAs with their binding sites. Our method therefore helps to characterize mutations that alter RNA secondary structure: it
can identify the point mutations that significantly change structure and the genes and elements that are enriched in or
depleted of riboSNitches. We believe that these mutations, and the genes enriched in them, may be involved in cancer development
and may affect gene function.
1.2.6 Identifying non-coding elements specifically enriched in riboSNitches in cancer
To obtain the non-coding elements enriched in riboSNitches across the genome, we applied
SNIPER to the ICGC dataset to detect riboSNitch-enriched elements and analyzed the function of the genes containing these elements
in the initiation and progression of cancer. The RNA secondary structure of UTR regions is very important: mutations there may
affect gene expression by altering microRNA or RBP binding and thereby contribute to tumorigenesis [12,14,18,19,43]. In our
analysis, the effect of each mutation on secondary structure was predicted with both MeanDiff and EucDiff, and mutations in the
intersection of the top 2.5% by each measure were taken as the final riboSNitches. After obtaining the numbers of riboSNitches
in the observation and prediction groups, we computed statistical significance with Fisher's exact test and an enrichment test,
and corrected the P values with the BH method [38]. Because the function of protein-coding sequence is complex, and because
the RNA structures of the 3'UTR and 5'UTR are essential for gene regulation and translational stability, we applied SNIPER only
to non-coding elements. The remainder of this section therefore mainly uses SNIPER to identify candidate elements in the
non-coding regions of protein-coding genes (UTRs) and in long non-coding RNAs (lncRNAs).
First, we applied SNIPER to somatic mutations in 5'UTRs to find the 5' UTRs enriched in mutations that alter secondary
structure. As shown in Fig. 8, the 5' UTRs of two genes have FDR values below 0.05: KAT6A and NOTCH2.
Both genes are closely associated with cancer, and both are known cancer genes in the COSMIC Cancer Gene Census
(CGC). Among the genes with riboSNitch-enriched 5'UTRs, cancer genes are
224 times more enriched than expected under a random distribution (P value < 2.2e-16, chi-square test); that is, cancer genes
are clearly enriched in our results (Table S2: enrichment scores at different FDR thresholds), which further shows that SNIPER
can indeed find cancer-related genes. NOTCH2 is a cell-membrane receptor closely related to cell proliferation and
differentiation. It acts both as an oncogene and as a tumor suppressor gene and plays an important role in cancer signaling
pathways [44,45]. KAT6A is a lysine acetyltransferase gene that previous studies have shown to participate in and control the
growth of breast cancer cells [46]. In addition, when the q value threshold is relaxed to 0.2, the
5' UTR of RALGPS2 is also identified as a riboSNitch-enriched region. RALGPS2 is considered a potential driver in cancer
and has been shown to affect the survival and cell cycle of lung cancer cells [47]. In summary,
NOTCH2, KAT6A and RALGPS2 are potential cancer driver genes, which shows that SNIPER, which starts from the effect of mutations
on RNA secondary structure, can also help to find cancer-related driver genes.
We also used our method to identify elements enriched in riboSNitches in 3'UTRs. As for 5'UTRs,
SNIPER identified seven riboSNitch-enriched 3'UTRs at a q value cutoff of 0.05: CLCNKB, CYP4B1, SLC9B1,
CCDC104, POLR2M, ACAD11 and DIO1 (Fig. 8B). CYP4B1 is a cytochrome enzyme; previous
studies found that the gene is more highly expressed in patients with bladder tumors [48]. SLC9B1 is a Na+/H+ transporter
that helps to maintain cell homeostasis [49]. POLR2M is subunit M of RNA polymerase II; it plays a key role in gene transcription
and is considered a candidate driver gene of prostate cancer [50]. ACAD11 belongs to the acyl-CoA dehydrogenase family; it is
involved in cell survival and plays a key role in TP53-related pathways [51]. DIO1 encodes type I iodothyronine deiodinase, an
important regulator of cell proliferation, differentiation and metabolism [52]. In addition, when the q value cutoff is relaxed
to 0.2, the 3'UTRs of SMO, SRPK1, FOXD4 and DBP are also identified as riboSNitch-enriched. Of these, SRPK1 has been clearly
reported to have a tumor-suppressive effect and is a candidate driver gene [53,54]. Together, these results show that SNIPER can
identify candidate cancer driver elements not only in 5' UTRs but also in 3' UTRs.
To determine whether the riboSNitch-enriched elements identified above are truly cancer-specific, i.e. whether the
number of riboSNitches in an element differs clearly between cancer and normal individuals, we compared the riboSNitches
observed in cancer genomes with those observed in the 1000 Genomes data. Among 5'UTRs, we found that KAT6A
and RALGPS2 are elements with cancer-specific enrichment of riboSNitches. Among 3' UTRs, apart from CLCNKB and SMO, the
other nine genes (CYP4B1, SLC9B1, CCDC104, POLR2M, ACAD11, DIO1, SRPK1, FOXD4 and DBP) were
identified as 3' UTR elements with cancer-specific enrichment of riboSNitches. These results indicate that elements with
cancer-specific riboSNitch enrichment are likely to be cancer-related functional elements or putative driver elements.
For lncRNAs, at a q value cutoff of 0.05 only USP30-AS1 was identified as a riboSNitch-enriched
lncRNA, and its enrichment is not cancer-specific. When the q value cutoff was relaxed to
0.1, three further lncRNAs were identified: LINC01365, ZNF503-AS1 and LINC00689. Of these,
ZNF503-AS1 and LINC00689 were predicted to be lncRNAs with cancer-specific enrichment of riboSNitches (Fig. 8C). Interestingly,
ZNF503-AS1 promotes the proliferation and migration of pigment epithelial cells by regulating its antisense gene ZNF503, and the
expression of ZNF503-AS1 has been shown to be a prognostic indicator of squamous cell lung carcinoma [55,56]. Using FuncPred, a
website that predicts lncRNA function from gene networks [57], we found that three of the lncRNAs predicted above (USP30-AS1,
ZNF503-AS1 and LINC00689) are potentially associated with cancer-related processes, each with an FDR below 0.05.
1.2.7 Identifying non-coding elements specifically depleted of riboSNitches in cancer
Our method can also predict elements that are significantly depleted of riboSNitches in the cancer genome; such
elements, linked to RNA secondary structure, may be essential for the initiation and progression of cancer. We identified the
elements significantly depleted of riboSNitches among 5'UTRs, 3'UTRs and lncRNAs (Table S3). At a q value cutoff of 0.05, we
found four 5'UTR elements significantly depleted of riboSNitches (ING3, RBM22, NSA2 and TAF2), all of which show cancer-specific
depletion (Fig. 9A). We found 22 3'UTR elements significantly depleted of riboSNitches, of which only
two (KPNA4 and GABBR2) were identified as cancer-specific. Among these UTR elements, ING3, RBM22,
NSA2 and KPNA4 are listed as conditionally essential genes in cancer cells in the OGEE v2 database [58]. Our results thus provide
evidence that regions with cancer-specific riboSNitch depletion may be essential elements in cancer. In addition, at a q value
cutoff of 0.05 we found seven lncRNAs significantly depleted of riboSNitches, of which only LINC00698 is cancer-specific;
up-regulation of this gene has been reported to be possibly related to the initiation and progression of cancer [59].
1.3 Discussion
Our study starts from the perspective of RNA secondary structure. It provides a new method for detecting potential cancer driver genes and highlights the importance, for the regulation of gene expression in cancer, of non-coding mutations that affect RNA secondary structure. To our knowledge, this is the first comprehensive analysis of somatic mutations that significantly alter RNA secondary structure in the two cancer databases TCGA and ICGC. We found that different mutation types affect RNA secondary structure differently, and that such mutations are enriched around miRNA binding sites but depleted around RBP binding sites. These results indicate that somatic mutations in cancer genomes can also influence the occurrence and development of cancer by altering RNA secondary structure [19], and may even have the potential to regulate gene or protein expression [12,13,18].
In addition, we found that riboSNitches in cancer genomes are more likely to be disease-causing mutations, which agrees with the earlier conclusion drawn from a family trio [6]. This suggests that mutations that significantly alter RNA secondary structure in cancer genomes are more likely to be disease-related and may even lead to cancer. We therefore developed a new method, SNIPER, to detect riboSNitches in cancer and to predict non-coding elements enriched in riboSNitches. We built a neutral mutation model from the cancer intronic mutation spectrum and the trinucleotide background of each transcript, and used it to construct a prediction group of random cancer mutations. By comparing the riboSNitches in the cancer databases with those in the prediction group, elements enriched in or depleted of riboSNitches can be predicted. Because the 96-type mutation spectrum used in our analysis is derived from introns, SNIPER can detect positive selection signals in the observation group more effectively during prediction. As shown above in the present invention, non-coding elements enriched in riboSNitches are likely to be cancer drivers, whereas most non-coding elements depleted of riboSNitches are essential genes of cancer genomes. We also identified lncRNAs significantly enriched in or depleted of riboSNitches, but more experimental data are needed before the functions of these lncRNAs in cancer progression can be studied. In summary, we successfully built a method, SNIPER, that finds elements enriched in and depleted of riboSNitches in cancer genomes. Our method can also help identify more cancer drivers and essential genes.
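The neutral mutation model described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the function name `sample_neutral_mutations`, the input format of `rates` (a dictionary from a trinucleotide and an alternative base to a relative mutation rate) and the toy values are hypothetical and are not taken from SNIPER itself. A real 96-type spectrum is usually folded by strand, which this sketch does not attempt.

```python
import random

def sample_neutral_mutations(seq, rates, n_mut, rng):
    """Draw n_mut random substitutions on seq, weighted by trinucleotide context.

    seq   : transcript sequence over A/C/G/T
    rates : {(trinucleotide, alt_base): relative rate}  (hypothetical format)
    rng   : a random.Random instance, so that runs are reproducible
    Returns a list of (position, ref_base, alt_base) with 0-based positions.
    """
    candidates, weights = [], []
    # The first and last base have no complete trinucleotide context.
    for pos in range(1, len(seq) - 1):
        tri = seq[pos - 1:pos + 2]
        for alt in "ACGT":
            if alt == seq[pos]:
                continue
            weight = rates.get((tri, alt), 0.0)
            if weight > 0:
                candidates.append((pos, seq[pos], alt))
                weights.append(weight)
    if not candidates:
        return []
    # Sampling with replacement, weighted by the context-specific rate.
    return rng.choices(candidates, weights=weights, k=n_mut)

# Toy example: only two contexts have a non-zero rate.
toy_rates = {("ACG", "T"): 5.0, ("CGT", "A"): 1.0}
prediction_group = sample_neutral_mutations("AACGTT", toy_rates, 10, random.Random(0))
```

Passing the random generator in explicitly keeps each round of random mutation reproducible, which is convenient when the sampling is repeated many times.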
Many experimental techniques and software tools have been developed to detect and analyze RNA secondary structure genome-wide [6,9,10,60,61]. Because RNA secondary structure is highly variable both in vivo and in vitro, it is still difficult to predict accurately with any single method. Some software identifies the effect of a single mutation on RNA secondary structure by integrating different computational methods. It should be noted, however, that riboSNitch prediction based on experimental data is still more effective than prediction by existing software [62]. Since we lack experimental data on RNA secondary structure in cancer genomes, using MeanDiff and EucDiff to predict the effect of somatic mutations on RNA secondary structure is still acceptable for studying riboSNitch mutations in genome-wide cancer databases in the present invention. As shown in Fig. 2, MeanDiff and EucDiff perform far better than the other methods listed in previous studies. Predicting riboSNitches in cancer genomes with these two methods is therefore feasible. Further work is still needed to explore why mutations affect RNA secondary structure and the biological mechanisms by which riboSNitches influence gene expression.
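The two scores can be illustrated with a short sketch. The equations themselves are not reproduced in this text, so the code below assumes the natural reading of the names: MeanDiff as the mean absolute difference, and EucDiff as the Euclidean distance, between the base-pairing probability profiles of the reference and mutant sequences over a window of w positions on either side of the mutated position k. The function names and the clipping of the window at transcript ends are assumptions made for illustration.

```python
import math

def window(k, w, n):
    """Indices i in [k - w, k + w], clipped to a transcript of length n (0-based)."""
    return range(max(0, k - w), min(n, k + w + 1))

def mean_diff(bpp_ref, bpp_alt, k, w):
    """Assumed MeanDiff: mean absolute change in base-pairing probability."""
    idx = window(k, w, len(bpp_ref))
    return sum(abs(bpp_ref[i] - bpp_alt[i]) for i in idx) / len(idx)

def euc_diff(bpp_ref, bpp_alt, k, w):
    """Assumed EucDiff: Euclidean distance between the two probability profiles."""
    idx = window(k, w, len(bpp_ref))
    return math.sqrt(sum((bpp_ref[i] - bpp_alt[i]) ** 2 for i in idx))

# Toy profiles: per-base pairing probabilities before and after a mutation at k = 2.
ref = [0.1, 0.5, 0.9, 0.5, 0.1]
alt = [0.1, 0.2, 0.5, 0.5, 0.1]
scores = (mean_diff(ref, alt, k=2, w=1), euc_diff(ref, alt, k=2, w=1))
```

Under these assumptions, MeanDiff reflects the average shift across the window, while EucDiff gives more weight to a few large local changes, which is one reason to require a mutation to rank highly on both.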
Existing methods that use somatic mutations to identify non-coding driver elements in cancer mainly look for positive selection signals in non-coding regions by comparing the mutation rate of a target region with that of its flanking regions. This approach can help find more functional elements, such as promoters, enhancers and silencers [63,64]. In the present invention, we developed a new method, SNIPER, which takes the effect of somatic mutations on RNA secondary structure as its metric: by comparing the number of riboSNitches in an element with that in the random-mutation prediction group, it identifies positive selection signals for significant changes in secondary structure in cancer genomes. Although most of the genome can be transcribed, we focused only on changes in RNA secondary structure in the UTRs of coding genes and in lncRNAs. Although many studies have been carried out to predict potentially functional lncRNAs [57,65,66], the molecular functions of a large number of lncRNAs remain to be explored.
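The comparison between an element's observed riboSNitch count and the random-mutation prediction group can be illustrated with a one-sided Fisher's exact test, which is the tail probability of a hypergeometric distribution. The sketch below uses only the standard library; the function name `fisher_one_sided` and the layout of the 2x2 table are assumptions made for illustration, not the SNIPER implementation.

```python
from math import comb

def fisher_one_sided(a, b, c, d, alternative="greater"):
    """One-sided Fisher's exact test on the 2x2 table [[a, b], [c, d]].

    a: riboSNitches in the observation group   b: other observed mutations
    c: riboSNitches in the prediction group    d: other predicted mutations
    "greater" tests for enrichment in the observation group,
    "less" tests for depletion.
    """
    if alternative not in ("greater", "less"):
        raise ValueError("alternative must be 'greater' or 'less'")
    row1, col1, n = a + b, a + c, a + b + c + d
    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    total = comb(n, row1)

    def pmf(x):
        # Hypergeometric probability of x riboSNitches among the observed mutations.
        return comb(col1, x) * comb(n - col1, row1 - x) / total

    xs = range(a, hi + 1) if alternative == "greater" else range(lo, a + 1)
    return min(1.0, sum(pmf(x) for x in xs))

# Toy table: 3 of 4 observed mutations are riboSNitches versus 1 of 4 predicted ones.
p_enriched = fisher_one_sided(3, 1, 1, 3, alternative="greater")
p_depleted = fisher_one_sided(3, 1, 1, 3, alternative="less")
```

Running the test once with "greater" and once with "less" yields separate p-values for enrichment and depletion of the same element.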
Next-generation sequencing allows genome-wide analysis of variation in the human genome and has greatly improved our understanding of variants that affect RNA secondary structure. In cancer genomes in particular, the accumulation of cancer genome sequencing data has given us large databases of cancer somatic mutations. Although the total number of riboSNitches in the observation group we finally used is less than 2% of the number of somatic mutations, SNIPER can still find, genome-wide, regions that are enriched in or depleted of riboSNitches in cancer samples. If mutation data continue to accumulate, we believe that we will be able to further analyze how elements enriched in or depleted of riboSNitches are distributed across cancer types. In addition, more data and better riboSNitch prediction methods would help us identify candidate cancer drivers and essential elements in non-coding regions more effectively.
In the present invention, we carried out a preliminary analysis of the characteristics of riboSNitches in cancer genomes and found that riboSNitches in cancer genomes are more likely to be pathogenic mutations. We also built a computational framework, SNIPER, to effectively identify potential drivers and essential genes in cancer. By exploring RNA secondary structure in cancer genomes, we obtained potential functional elements related to RNA secondary structure. However, our method and approach need more data for validation. In short, we emphasize the importance of riboSNitches in cancer genomes, but assessing whether these mutations really take part in tumorigenesis, and how they affect post-transcriptional regulation and translation in cancer, remains a challenge.
The present invention has been described above with reference to preferred embodiments. Those skilled in the art may make equivalent modifications or variations without departing from the spirit of the invention, and these likewise fall within the scope of the claims.
Bibliography
1. Abbosh, C. et al. Phylogenetic ctDNA analysis depicts early-stage lung cancer evolution. Nature 545, 446–451 (2017).
2. Mortimer, S. A., Kidwell, M. A. & Doudna, J. A. Insights into RNA structure and function from genome-wide studies. Nat. Rev. Genet. 15, 469–479 (2014).
3. Julius, B., Lucks. Multiplexed RNA structure characterization with selective 2’-hydroxyl acylation analyzed by primer extension sequencing (SHAPE-Seq).
4. Underwood, J. G. et al. FragSeq: transcriptome-wide RNA structure probing using high-throughput sequencing. Nat. Methods 7, 995–1001 (2010).
5. Talkish, J., May, G., Lin, Y., Woolford, J. L. & McManus, C. J. Mod-seq: high-throughput sequencing for chemical probing of RNA structure. RNA 20, 713–720 (2014).
6. Wan, Y. et al. Landscape and variation of RNA secondary structure across the human transcriptome. Nature 505, 706–709 (2014).
7. Bai, Y., Dai, X., Harrison, A., Johnston, C. & Chen, M. Toward a next-generation atlas of RNA secondary structure. Brief. Bioinform. 17, 63–77 (2016).
8. Hofacker, I. L. RNA Secondary Structure Analysis Using the Vienna RNA Package. in Current Protocols in Bioinformatics (eds. Baxevanis, A. D., Petsko, G. A., Stein, L. D. & Stormo, G. D.) (John Wiley & Sons, Inc., 2009).
9. Yao, J., Reinharz, V., Major, F. & Waldispühl, J. RNA-MoIP: prediction of RNA secondary structure and local 3D motifs from sequence data. Nucleic Acids Res. 45, W440–W444 (2017).
10. Sabarinathan, R. et al. RNAsnp: Efficient Detection of Local RNA Secondary Structure Changes Induced by SNPs. Hum. Mutat. 34, 546–556 (2013).
11. Lokody, I. RNA: riboSNitches reveal heredity in RNA secondary structure. Nat. Rev. Genet. 15, 219–219 (2014).
12. Luo, Z., Yang, Q. & Yang, L. RNA Structure Switches RBP Binding. Mol. Cell 64, 219–220 (2016).
13. Taliaferro, J. M. et al. RNA Sequence Context Effects Measured In Vitro Predict In Vivo Protein Binding and Regulation. Mol. Cell 64, 294–306 (2016).
14. Kutchko, K. M. et al. Multiple conformations are a conserved and regulatory feature of the RB1 5′ UTR. RNA 21, 1274–1285 (2015).
15. Martin, J. S. et al. Structural effects of linkage disequilibrium on the transcriptome. RNA 18, 77–87 (2012).
16. Halvorsen, M., Martin, J. S., Broadaway, S. & Laederach, A. Disease-Associated Mutations That Alter the RNA Structural Ensemble. PLoS Genet. 6, e1001074 (2010).
17. Rogler, L. E. et al. Small RNAs derived from lncRNA RNase MRP have gene-silencing activity relevant to human cartilage–hair hypoplasia. Hum. Mol. Genet. 23, 368–382 (2014).
18. Linnstaedt, S. D. et al. A Functional riboSNitch in the 3′ Untranslated Region of FKBP5 Alters MicroRNA-320a Binding Efficiency and Mediates Vulnerability to Chronic Post-Traumatic Pain. J. Neurosci. 38, 8407–8420 (2018).
19. Sabarinathan, R. et al. Transcriptome-Wide Analysis of UTRs in Non-Small Cell Lung Cancer Reveals Cancer-Related Genes with SNV-Induced Changes on RNA Secondary Structure and miRNA Target Sites. PLoS ONE 9, e82699 (2014).
20. Berger, M. F. et al. Melanoma genome sequencing reveals frequent PREX2 mutations. Nature (2012). doi:10.1038/nature11071
21. Wang, K. et al. Exome sequencing identifies frequent mutation of ARID1A in molecular subtypes of gastric cancer. Nat. Genet. 43, 1219–1223 (2011).
22. Mu, X. J., Lu, Z. J., Kong, Y., Lam, H. Y. K. & Gerstein, M. B. Analysis of genomic variation in non-coding elements using population-scale sequencing data from the 1000 Genomes Project. Nucleic Acids Res. 39, 7058–7076 (2011).
23. Rosenbloom, K. R. et al. The UCSC Genome Browser database: 2015 update. Nucleic Acids Res. 43, D670–D681 (2015).
24. Shihab, H. A. et al. An integrative approach to predicting the functional effects of non-coding and coding sequence variation. Bioinformatics 31, 1536–1543 (2015).
25. Agarwal, V., Bell, G. W., Nam, J.-W. & Bartel, D. P. Predicting effective microRNA target sites in mammalian mRNAs. eLife 4, e05005 (2015).
26. Betel, D., Koppal, A., Agius, P., Sander, C. & Leslie, C. Comprehensive modeling of microRNA targets predicts functional non-conserved and non-canonical sites. Genome Biol. 11, R90 (2010).
27. Yang, Y.-C. T. et al. CLIPdb: a CLIP-seq database for protein-RNA interactions. BMC Genomics 16, 51 (2015).
28. Uren, P. J. et al. Site identification in high-throughput RNA–protein interaction data. Bioinformatics 28, 3013–3020 (2012).
29. Forbes, S. A. et al. COSMIC: exploring the world’s knowledge of somatic mutations in human cancer. Nucleic Acids Res. 43, D805–D811 (2015).
30. Ning, S. et al. Lnc2Cancer: a manually curated database of experimentally supported lncRNAs associated with various human cancers. Nucleic Acids Res. 44, D980–D985 (2016).
31. Harrow, J. et al. GENCODE: The reference human genome annotation for The ENCODE Project. Genome Res. 22, 1760–1774 (2012).
32. Yates, B. et al. Genenames.org: the HGNC and VGNC resources in 2017. Nucleic Acids Res. 45, D619–D625 (2017).
33. Rodriguez, J. M. et al. APPRIS: annotation of principal and alternative splice isoforms. Nucleic Acids Res. 41, D110–D117 (2013).
34. Pruitt, K. D. et al. The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes. Genome Res. 19, 1316–1323 (2009).
35. Corley, M., Solem, A., Qu, K., Chang, H. Y. & Laederach, A. Detecting riboSNitches with RNA folding algorithms: a genome-wide benchmark. Nucleic Acids Res. 43, 1859–1868 (2015).
36. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
37. Robin, X. et al. pROC: an open-source package for R and S+ to analyze and compare ROC curves. BMC Bioinformatics 12, 77 (2011).
38. Hochberg, Y. & Benjamini, Y. More powerful procedures for multiple significance testing. Stat. Med. 9, 811–818 (1990).
39. Wang, K. et al. Whole-genome sequencing and comprehensive molecular profiling identify new driver mutations in gastric cancer. Nat. Genet. 46, 573–582 (2014).
40. Landrum, M. J. et al. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res. 42, D980–D985 (2014).
41. Apweiler, R. et al. UniProt: the Universal Protein knowledgebase. Nucleic Acids Res. 32, D115–D119 (2004).
42. Stenson, P. D. et al. The Human Gene Mutation Database: building a comprehensive mutation repository for clinical and molecular genetics, diagnostic testing and personalized genomic medicine. Hum. Genet. 133, 1–9 (2014).
43. Lackey, L. L., Coria, A., Tolson, C., McArthur, E. & Laederach, A. Abstract 505: Somatic and inherited riboSNitches in TPT1 and LCP1 mRNA secondary structures. Cancer Res. 77, 505–505 (2017).
44. Agrawal, N. et al. Exome Sequencing of Head and Neck Squamous Cell Carcinoma Reveals Inactivating Mutations in NOTCH1. Science 333, 1154–1157 (2011).
45. Hayashi, T. et al. Not all NOTCH Is Created Equal: The Oncogenic Role of NOTCH2 in Bladder Cancer and Its Implications for Targeted Therapy. Clin. Cancer Res. 22, 2981–2992 (2016).
46. Turner-Ivey, B. et al. KAT6A, a Chromatin Modifier from the 8p11-p12 Amplicon is a Candidate Oncogene in Luminal Breast Cancer. Neoplasia 16, 644–655 (2014).
47. Santos, A. O., Parrini, M. C. & Camonis, J. RalGPS2 Is Essential for Survival and Cell Cycle Progression of Lung Cancer Cells Independently of Its Established Substrates Ral GTPases. PLOS ONE 11, e0154840 (2016).
48. Imaoka, S. et al. CYP4B1 Is a Possible Risk Factor for Bladder Cancer in Humans. Biochem. Biophys. Res. Commun. 277, 776–780 (2000).
49. Chintapalli, V. R. et al. Transport proteins NHA1 and NHA2 are essential for survival, but have distinct transport modalities. Proc. Natl. Acad. Sci. 112, 11720–11725 (2015).
50. Schinke, E. N. et al. A novel approach to identify driver genes involved in androgen-independent prostate cancer. Mol. Cancer 13, 120 (2014).
51. Jiang, D. et al. Analysis of p53 transactivation domain mutants reveals Acad11 as a metabolic target important for p53 pro-survival function. Cell Rep. 10, 1096–1109 (2015).
52. P. et al. Restoration of type 1 iodothyronine deiodinase expression in renal cancer cells downregulates oncoproteins and affects key metabolic pathways as well as anti-oxidative system. PLoS ONE 12, (2017).
53. Gammons, M. V. et al. Targeting SRPK1 to control VEGF-mediated tumour angiogenesis in metastatic melanoma. Br. J. Cancer 111, 477–485 (2014).
54. Mavrou, A. et al. Serine–arginine protein kinase 1 (SRPK1) inhibition as a potential novel targeted therapeutic strategy in prostate cancer. Oncogene 34, 4311–4319 (2015).
55. Tang, R.-X. et al. Identification of a RNA-Seq based prognostic signature with five lncRNAs for lung squamous cell carcinoma. Oncotarget 8, 50761–50773 (2017).
56. Chen, X. et al. LncRNA ZNF503-AS1 promotes RPE differentiation by downregulating ZNF503 expression. Cell Death Dis. 8, e3046 (2017).
57. Perron, U., Provero, P. & Molineris, I. In silico prediction of lncRNA function using tissue specific and evolutionary conserved expression. BMC Bioinformatics 18, (2017).
58. Chen, W.-H., Lu, G., Chen, X., Zhao, X.-M. & Bork, P. OGEE v2: an update of the online gene essentiality database with special focus on differentially essential genes in human cancer cell lines. Nucleic Acids Res. 45, D940–D944 (2017).
59. Wang, H. et al. Comprehensive analysis of aberrantly expressed profiles of lncRNAs and miRNAs with associated ceRNA network in muscle-invasive bladder cancer. Oncotarget 7, 86174–86185 (2016).
60. Lackey, L., Coria, A., Woods, C., McArthur, E. & Laederach, A. Allele-specific SHAPE-MaP assessment of the effects of somatic variation and protein binding on mRNA structure. RNA 24, 513–528 (2018).
61. Ouyang, Z., Snyder, M. P. & Chang, H. Y. SeqFold: Genome-scale reconstruction of RNA secondary structure integrating high-throughput sequencing data. Genome Res. 23, 377–387 (2013).
62. Woods, C. T. & Laederach, A. Classification of RNA structure change by ‘gazing’ at experimental data. Bioinformatics 33, 1647–1655 (2017).
63. Lanzós, A. et al. Discovery of Cancer Driver Long Noncoding RNAs across 1112 Tumour Genomes: New Candidates and Distinguishing Features. Sci. Rep. 7, (2017).
64. Mularoni, L., Sabarinathan, R., Deu-Pons, J., Gonzalez-Perez, A. & López-Bigas, N. OncodriveFML: a general framework to identify coding and non-coding regions with cancer driver mutations. Genome Biol. 17, (2016).
65. Baytak, E. et al. Whole transcriptome analysis reveals dysregulated oncogenic lncRNAs in natural killer/T-cell lymphoma and establishes MIR155HG as a target of PRDM1. Tumor Biol. 39, 1010428317701648 (2017).
66. Li, Y. et al. LncRNA ontology: inferring lncRNA functions based on chromatin states and expression patterns. Oncotarget 6, 39793–39805 (2015).
Claims (10)
1. A method for predicting elements enriched in or depleted of riboSNitches in a cancer genome, comprising:
taking predicted mutations as a prediction group and mutations actually occurring in cancer as an observation group, wherein identical mutations occurring at the same site in different patients are counted separately; calculating the RNA secondary structure of the mutant sequences of the prediction group and of the actual mutant sequences of the observation group;
calculating the MeanDiff value and the EucDiff value of each mutation in the two groups, the calculation equations being as follows;
wherein k is the position of the mutation in the transcript, w is the window size, BPP_ref,i and BPP_alt,i denote the base-pairing probability of the i-th base of the reference sequence and of the mutant sequence, respectively, and i ranges over [k-w, k+w];
sorting the MeanDiff values and the EucDiff values of all mutations separately in descending order, wherein the intersection of the mutations with the highest 2.5% of MeanDiff values and the mutations with the highest 2.5% of EucDiff values gives the riboSNitches of the prediction group and of the observation group;
comparing the numbers of riboSNitches in the prediction group and the observation group, performing a one-sided Fisher's exact test and a hypergeometric distribution test to identify significant riboSNitch enrichment, and correcting the test p-values with the BH method to obtain false discovery rate (FDR) values; after FDR correction, results with FDR < 0.05 are regarded as elements enriched in or depleted of riboSNitches.
2. The method of claim 1, wherein the mutant sequences of the prediction group are prepared by: obtaining, from the mutation rates of the intronic mutation spectrum and the number of each trinucleotide in each transcript, the number of random mutations for each trinucleotide context of each transcript; performing repeated random sampling according to that number; and randomly mutating the transcript sequence of each gene according to the neutral mutation rate, thereby obtaining the prediction group; wherein the intronic mutation spectrum is used to represent the neutral mutation rate in the cancer genome.
3. The method of any preceding claim, wherein w is 2 bp, 5 bp, 10 bp, 15 bp, 20 bp, 25 bp, 50 bp or 200 bp, preferably 200 bp.
4. The method of any preceding claim, wherein the random mutation is performed on non-coding regions, the non-coding regions preferably being 5' UTRs, 3' UTRs and/or lncRNAs.
5. The method of claim 4, wherein the random mutation is performed 1000 times.
6. The method of any preceding claim, wherein the mutant sequence data used come from the ICGC database, the TCGA database, the 1000 Genomes database or other somatic mutation data.
7. The method of any preceding claim, further comprising comparing the riboSNitches in the cancer genome with other riboSNitches, and using a one-sided Fisher's exact test to determine whether riboSNitches are enriched in a region of the cancer genome, wherein all elements with a P value less than 10^-3 are regarded as elements with cancer-specific riboSNitch enrichment or depletion.
8. The method of any preceding claim, wherein the RNA secondary structure of the sequences is calculated using RNAplfold.
9. The method of any preceding claim, wherein w is 200 bp and the maximum span of a base pair is 150 bp.
10. A device for predicting elements enriched in or depleted of riboSNitches in a cancer genome, comprising:
a processor, comprising a module for obtaining sequences from a database; a computing module for calculating the MeanDiff and EucDiff values of claim 1; a sorting module for sorting the MeanDiff and EucDiff values; a module for taking the intersection of the two sorted results; a module for comparing the numbers of riboSNitches in the prediction group and the observation group; a module for performing the tests; a module for identifying elements enriched in or depleted of riboSNitches; and an output module for outputting results; and
a memory storing instructions which cause the processor to perform the method of any one of claims 1-9.
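The ranking and multiple-testing steps recited in claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the 2.5% cut is taken by truncating the number of mutations times the fraction (with at least one mutation kept), and the text does not say whether the prediction and observation groups are ranked together or separately, so the sketch simply ranks whatever list it is given. The BH step is the standard Benjamini-Hochberg adjustment.

```python
def top_fraction(scores, frac=0.025):
    """Indices of the highest-scoring fraction of mutations (at least one)."""
    n_top = max(1, int(len(scores) * frac))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:n_top])

def call_ribosnitches(mean_diffs, euc_diffs, frac=0.025):
    """Mutations that fall in the top `frac` by both MeanDiff and EucDiff."""
    return top_fraction(mean_diffs, frac) & top_fraction(euc_diffs, frac)

def benjamini_hochberg(pvalues):
    """BH-adjusted p-values (FDR), returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Toy example with an exaggerated 50% cut so that four mutations suffice.
hits = call_ribosnitches([0.9, 0.1, 0.8, 0.2], [0.7, 0.9, 0.8, 0.1], frac=0.5)
fdr = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
significant = [i for i, q in enumerate(fdr) if q < 0.05]
```

Elements whose adjusted value falls below 0.05 would then be reported, in line with the FDR < 0.05 threshold stated in claim 1.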
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910484578.1A CN110265084A (en) | 2019-06-05 | 2019-06-05 | The method and relevant device of riboSnitch element are rich in or lacked in prediction cancer gene group |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110265084A true CN110265084A (en) | 2019-09-20 |
Family
ID=67916793
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910484578.1A Pending CN110265084A (en) | 2019-06-05 | 2019-06-05 | The method and relevant device of riboSnitch element are rich in or lacked in prediction cancer gene group |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110265084A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111705133A (en) * | 2020-07-15 | 2020-09-25 | 南京凡亦达生物科技有限公司 | Application of LncRNAs in preparation of primary liver cancer early diagnosis kit |
CN112687329A (en) * | 2019-10-17 | 2021-04-20 | 中国科学技术大学 | Cancer prediction system based on non-cancer tissue mutation information and construction method thereof |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105528532A (en) * | 2014-09-30 | 2016-04-27 | 深圳华大基因科技有限公司 | A feature analysis method for RNA editing sites |
CN106755377A (en) * | 2016-12-12 | 2017-05-31 | 浙江省中医院 | A kind of gastric cancer serum Testing and appraisal kit and method |
CN107109698A (en) * | 2014-09-22 | 2017-08-29 | 加利福尼亚大学董事会 | RNA STITCH are sequenced:For RNA in directly mapping cell:The measure of RNA interactions |
US20190147975A1 (en) * | 2016-04-07 | 2019-05-16 | Dana-Farber Cancer Institute, Inc. | Sf3b1 suppression as a therapy for tumors harboring sf3b1 copy loss |
-
2019
- 2019-06-05 CN CN201910484578.1A patent/CN110265084A/en active Pending
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107109698A (en) * | 2014-09-22 | 2017-08-29 | 加利福尼亚大学董事会 | RNA STITCH are sequenced:For RNA in directly mapping cell:The measure of RNA interactions |
CN105528532A (en) * | 2014-09-30 | 2016-04-27 | 深圳华大基因科技有限公司 | A feature analysis method for RNA editing sites |
US20190147975A1 (en) * | 2016-04-07 | 2019-05-16 | Dana-Farber Cancer Institute, Inc. | Sf3b1 suppression as a therapy for tumors harboring sf3b1 copy loss |
CN106755377A (en) * | 2016-12-12 | 2017-05-31 | 浙江省中医院 | A kind of gastric cancer serum Testing and appraisal kit and method |
Non-Patent Citations (1)
Title |
---|
FUNAN HE ET AL.: "Integrative Analysis of Somatic Mutations in Non-coding Regions Altering RNA Secondary Structures in Cancer Genomes", 《SCIENTIFIC REPORTS》 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112687329A (en) * | 2019-10-17 | 2021-04-20 | 中国科学技术大学 | Cancer prediction system based on non-cancer tissue mutation information and construction method thereof |
CN112687329B (en) * | 2019-10-17 | 2024-05-17 | 中国科学技术大学 | Cancer prediction system based on non-cancer tissue mutation information and construction method thereof |
CN111705133A (en) * | 2020-07-15 | 2020-09-25 | 南京凡亦达生物科技有限公司 | Application of LncRNAs in preparation of primary liver cancer early diagnosis kit |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190920 |