CN111028887A - Method and device for identifying ncRNA (non-coding ribonucleic acid) cooperative competition network - Google Patents
- Publication number
- CN111028887A (also listed as CN201911229601.9A)
- Authority
- CN
- China
- Prior art keywords
- ncrna
- competition
- mrna
- cooperative
- cooperative competition
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16B—BIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
- G16B25/00—ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
- G16B25/10—Gene or protein expression profiling; Expression-ratio estimation or normalisation
- G16B15/00—ICT specially adapted for analysing two-dimensional or three-dimensional molecular structures, e.g. structural or functional relations or structure alignment
- G16B40/00—ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
- G16B50/00—ICT programming tools or database systems specially adapted for bioinformatics
Abstract
The invention provides a method and a device for identifying an ncRNA (non-coding ribonucleic acid) cooperative competition network, and relates to the technical field of gene identification. The method acquires ncRNA and mRNA expression profile data of matched samples of a target disease type and, according to the ncRNA and mRNA expression profile data and preset ncRNA-mRNA competition relationship data, determines each ncRNA-ncRNA pair composed of two ncRNAs that satisfy preset conditions to be a cooperative competition relationship pair. In this way the ncRNA-ncRNA cooperative competition relationship pairs are identified, and with them the ncRNA cooperative competition network in which multiple ncRNAs cooperate to compete with target gene mRNAs, which can provide a reference for clinical diagnosis and targeted therapy of complex human diseases such as cancer.
Description
Technical Field
The invention relates to the technical field of gene identification, in particular to a method and a device for identifying ncRNA (non-coding ribonucleic acid) cooperative competition networks.
Background
MicroRNA (miRNA) is an endogenous small non-coding regulatory RNA about 22 nucleotides long that can regulate the expression level of the messenger RNA (mRNA) of protein-coding genes. Existing research shows that miRNAs play an important regulatory role in biological processes such as cell differentiation, cell proliferation, cell growth, cell migration, and apoptosis, as well as in cancer. According to the competing endogenous RNA (ceRNA) hypothesis, gene expression is regulated by different gene transcripts competing with one another for miRNA response elements (MREs). Transcripts in such a competitive relationship are collectively called ceRNAs and include protein-coding mRNA, long non-coding RNA (lncRNA), pseudogene transcripts, circular RNA (circRNA), and so on; the RNA regulatory network they form is called the ceRNA interaction network.
The ceRNA interaction network is closely related to many complex human diseases (such as cancer). It can serve as a novel biomarker for the diagnosis and targeted therapy of these diseases and can provide a reference for their clinical diagnosis and targeted therapy.
In general, in a ceRNA interaction network involving non-coding RNA (ncRNA), the competition relationship between ncRNAs and target gene mRNAs is many-to-many. This means that multiple ncRNAs can cooperatively compete with the same target gene mRNA, forming an ncRNA cooperative competition network. Although studying the cooperative competition relationships in this network can help explain the cooperative competition mechanism of ncRNAs in complex human diseases, the prior art offers no feasible scheme for identifying the ncRNA cooperative competition network.
Disclosure of Invention
The invention aims to provide a method and a device for identifying a ncRNA (non-coding ribonucleic acid) cooperative competition network, which can screen the ncRNA cooperative competition network associated with complex diseases and provide reference for clinical diagnosis and targeted treatment of human complex diseases such as cancer.
In a first aspect, an embodiment of the present invention provides a method for identifying an ncRNA cooperative competition network, including: acquiring ncRNA and mRNA expression profile data of matched samples of a target disease type; and determining, according to the ncRNA and mRNA expression profile data and preset ncRNA-mRNA competition relationship data, that an ncRNA-ncRNA pair composed of two ncRNAs in the expression profile data that satisfy preset conditions is a cooperative competition relationship pair.
In an optional embodiment, determining, according to the ncRNA and mRNA expression profile data and the preset ncRNA-mRNA competition relationship data, that an ncRNA-ncRNA pair composed of two ncRNAs satisfying the preset conditions is a cooperative competition relationship pair includes: obtaining, from the ncRNA and mRNA expression profile data, an ncRNA1-ncRNA2 pair composed of ncRNA1 and ncRNA2; calculating, according to the preset ncRNA-mRNA competition relationship data, the cooperative competition mRNA statistical significance probability value, the positive correlation significance probability value, and the sensitivity partial correlation coefficient value corresponding to the ncRNA1-ncRNA2 pair; and, if the ncRNA1-ncRNA2 pair simultaneously satisfies that the cooperative competition mRNA statistical significance probability value is smaller than a first threshold, the positive correlation significance probability value is smaller than a second threshold, and the sensitivity partial correlation coefficient value is larger than a third threshold, determining that the ncRNA1-ncRNA2 pair is a cooperative competition relationship pair.
In an alternative embodiment, calculating the cooperative competition mRNA statistical significance probability value corresponding to the ncRNA1-ncRNA2 pair includes: using a hypergeometric distribution test, according to the preset ncRNA-mRNA competition relationship data, to measure the statistical significance probability value of the mRNAs with which ncRNA1 and ncRNA2 in the pair cooperatively compete.
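The hypergeometric test named above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the test is applied to the overlap between the sets of mRNAs that each ncRNA competes with, and that the background size is the number of distinct mRNAs in the competition data. The function and parameter names (`shared_mrna_pvalue`, `n_background`) are hypothetical.

```python
from math import comb


def shared_mrna_pvalue(targets1, targets2, n_background):
    """Upper-tail hypergeometric p-value for the overlap of two mRNA sets.

    targets1, targets2: sets of mRNAs competing with ncRNA1 and ncRNA2.
    n_background: total number of candidate mRNAs (an assumed definition).
    """
    k1, k2 = len(targets1), len(targets2)
    k_shared = len(targets1 & targets2)
    total = comb(n_background, k2)
    # P(X >= k_shared) when drawing k2 mRNAs from a background holding k1 "hits"
    upper = sum(
        comb(k1, i) * comb(n_background - k1, k2 - i)
        for i in range(k_shared, min(k1, k2) + 1)
    )
    return upper / total


# Two ncRNAs sharing 2 of their 3 competing mRNAs, 10 mRNAs in the background
p = shared_mrna_pvalue({"m1", "m2", "m3"}, {"m2", "m3", "m4"}, 10)
```

A small p-value indicates that the two ncRNAs share more competing mRNAs than expected by chance.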
In an alternative embodiment, calculating the positive correlation significance probability value corresponding to the ncRNA1-ncRNA2 pair includes: calculating the Pearson correlation coefficient between ncRNA1 and ncRNA2 in the pair; and calculating the positive correlation significance probability value from the Pearson correlation coefficient.
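A minimal sketch of this step is given below. The text does not say how the probability value is derived from the Pearson coefficient, so the sketch assumes a one-sided test for positive correlation and, to stay within the standard library, approximates it with the Fisher z-transform instead of the exact t-distribution. The names `pearson_r` and `positive_correlation_pvalue` are hypothetical.

```python
from math import atanh, sqrt
from statistics import NormalDist, fmean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length expression vectors."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def positive_correlation_pvalue(x, y):
    """One-sided p-value for r > 0 (Fisher z approximation; needs > 3 samples)."""
    n = len(x)
    r = pearson_r(x, y)
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite for |r| == 1
    z = atanh(r) * sqrt(n - 3)
    return 1.0 - NormalDist().cdf(z)


ncrna1 = [1, 2, 3, 4, 5, 6]
ncrna2 = [2, 4, 5, 4, 6, 7]
p_pos = positive_correlation_pvalue(ncrna1, ncrna2)
```

For small sample sizes the exact t-based p-value would be preferable; the approximation is used here only to keep the example self-contained.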
In an alternative embodiment, calculating the sensitivity partial correlation coefficient value corresponding to the ncRNA1-ncRNA2 pair includes: calculating the sensitivity partial correlation coefficient value from the correlation coefficient value between ncRNA1 and ncRNA2 in the pair and the partial correlation coefficient value between ncRNA1 and ncRNA2 conditioned on the corresponding mRNA.
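The passage names the two inputs but does not state here how they are combined. The sketch below assumes the commonly used definition, sensitivity = correlation minus partial correlation, and further assumes a single conditioning mRNA expression vector with the first-order partial correlation formula. Both are assumptions, and all names are hypothetical.

```python
from math import sqrt
from statistics import fmean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def sensitivity_partial_correlation(nc1, nc2, mrna):
    """Assumed definition: r(nc1, nc2) minus r(nc1, nc2 | mrna)."""
    r12 = pearson_r(nc1, nc2)
    r1m = pearson_r(nc1, mrna)
    r2m = pearson_r(nc2, mrna)
    # First-order partial correlation of nc1 and nc2 given the mRNA
    partial = (r12 - r1m * r2m) / sqrt((1 - r1m ** 2) * (1 - r2m ** 2))
    return r12 - partial


mrna = [1, 2, 3, 4, 5, 6, 7, 8]
nc1 = [m + e for m, e in zip(mrna, [0.5, -0.5, 0.5, -0.5, 0.5, -0.5, 0.5, -0.5])]
nc2 = [m + e for m, e in zip(mrna, [0.5, 0.5, -0.5, -0.5, 0.5, 0.5, -0.5, -0.5])]
s = sensitivity_partial_correlation(nc1, nc2, mrna)
```

In this toy data both ncRNAs track the mRNA, so most of their correlation disappears once the mRNA is conditioned on and the sensitivity value is large.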
In an alternative embodiment, determining that the ncRNA1-ncRNA2 pair is a cooperative competition relationship pair when it simultaneously satisfies that the cooperative competition mRNA statistical significance probability value is smaller than the first threshold, the positive correlation significance probability value is smaller than the second threshold, and the sensitivity partial correlation coefficient value is larger than the third threshold includes: if the cooperative competition mRNA statistical significance probability value is less than 0.05, the positive correlation significance probability value is less than 0.05, and the sensitivity partial correlation coefficient value is greater than 0.1, all at the same time, determining that the ncRNA1-ncRNA2 pair is a cooperative competition relationship pair.
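The three-condition decision rule can be sketched as below, with the thresholds 0.05, 0.05, and 0.1 from the text as defaults. The `PairStats` container and the function name are hypothetical, introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairStats:
    shared_mrna_p: float    # cooperative competition mRNA significance probability value
    positive_corr_p: float  # positive correlation significance probability value
    sensitivity_pc: float   # sensitivity partial correlation coefficient value


def is_cooperative_competition_pair(stats, first=0.05, second=0.05, third=0.1):
    """True only when all three conditions hold at the same time."""
    return (
        stats.shared_mrna_p < first
        and stats.positive_corr_p < second
        and stats.sensitivity_pc > third
    )


accepted = is_cooperative_competition_pair(PairStats(0.01, 0.003, 0.25))
rejected = is_cooperative_competition_pair(PairStats(0.01, 0.003, 0.05))
```

Keeping the thresholds as parameters mirrors the text, which first describes generic first, second, and third thresholds and only then gives concrete values.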
In an optional embodiment, before determining, according to the ncRNA and mRNA expression profile data and the preset ncRNA-mRNA competition relationship data, that an ncRNA-ncRNA pair composed of two ncRNAs satisfying the preset conditions is a cooperative competition relationship pair, the method further includes: fusing a plurality of different databases to acquire prior ncRNA-mRNA competition network data associated with the ncRNA and mRNA expression profile data of the matched samples of the target disease type, thereby obtaining the ncRNA-mRNA competition relationship data.
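One way to read this fusion step is sketched below. It assumes that "fusing" means taking the union of the ncRNA-mRNA pairs reported by each database and keeping only pairs whose members both appear in the expression profile data; the text does not specify either point. The function name and the toy source lists are hypothetical.

```python
def fuse_competition_sources(sources, expressed_genes):
    """Union of (ncRNA, mRNA) pairs across sources, restricted to expressed genes."""
    fused = set()
    for pairs in sources:
        for ncrna, mrna in pairs:
            # Keep only relationships relevant to the expression profile data
            if ncrna in expressed_genes and mrna in expressed_genes:
                fused.add((ncrna, mrna))
    return fused


source_a = [("lnc1", "geneA"), ("lnc2", "geneB")]
source_b = [("lnc1", "geneA"), ("lnc3", "geneC")]
expressed = {"lnc1", "lnc2", "geneA", "geneB", "geneC"}
competition = fuse_competition_sources([source_a, source_b], expressed)
```

Using a set removes pairs reported by more than one database, so each competition relationship is counted once.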
In an optional embodiment, before determining, according to the ncRNA and mRNA expression profile data and the preset ncRNA-mRNA competition relationship data, that an ncRNA-ncRNA pair composed of two ncRNAs satisfying the preset conditions is a cooperative competition relationship pair, the method further includes: preprocessing the ncRNA and mRNA expression profile data to remove duplicate entries and to remove ncRNAs and mRNAs that have no gene name.
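The preprocessing step can be sketched as follows. The sketch assumes the expression profile is available as (gene name, expression values) rows and that, for duplicates, the first occurrence is kept; the text does not say which duplicate to retain. The function name is hypothetical.

```python
def preprocess_expression(rows):
    """Drop rows without a gene name and keep the first row per gene name."""
    cleaned = {}
    for name, values in rows:
        if name is None or not name.strip():
            continue  # ncRNA or mRNA without a gene name
        key = name.strip()
        if key not in cleaned:  # later duplicates are discarded (assumption)
            cleaned[key] = list(values)
    return cleaned


raw = [
    ("lnc1", [1.0, 2.0]),
    ("", [9.0, 9.0]),
    ("lnc1", [3.0, 4.0]),
    (None, [0.0, 0.0]),
    ("geneA", [5.0, 6.0]),
]
profile = preprocess_expression(raw)
```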
In an alternative embodiment, the method further comprises evaluating, in the following manner, the ncRNA-ncRNA cooperative competition network formed by the ncRNA1-ncRNA2 pairs determined to be cooperative competition relationship pairs:
1) fitting whether the node connectivity of the ncRNA-ncRNA cooperative competition network obeys a power-law distribution, to determine whether the network is a scale-free network;
2) determining the nodes whose connectivity ranks in the top 10% of the ncRNA-ncRNA cooperative competition network as hub nodes;
3) determining each ncRNA1-ncRNA2 pair in which both ncRNAs are associated with the target disease type to be an ncRNA-ncRNA cooperative competition pair corresponding to the target disease type;
4) identifying ncRNA-ncRNA cooperative competition modules in the ncRNA-ncRNA cooperative competition network using a Markov clustering algorithm;
5) using the ncRNAs previously known to be associated with the target disease type and a hypergeometric distribution test, determining each ncRNA-ncRNA cooperative competition module whose significance probability value of functional association with the target disease type is less than 0.05 to be a module corresponding to the target disease type;
6) for each ncRNA-ncRNA cooperative competition module: calculating a risk value for each matched sample of the target disease type using a multivariate Cox model; dividing the samples into a high-risk sample set and a low-risk sample set according to their risk values; calculating a risk value from the high-risk sample set and the low-risk sample set; calculating, with a log-rank test, the significance probability value of the difference in survival time between the two sets to obtain a log-rank test significance probability value; and determining each module with a risk value greater than 1 and a log-rank test significance probability value less than 0.05 to be a biomarker of the target disease type.
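Step 2) of the evaluation above, selecting hub nodes, can be sketched as below. The sketch assumes that connectivity means node degree in an undirected network given as an edge list, and that ties are broken alphabetically; neither detail is stated in the text. The function name is hypothetical.

```python
from collections import Counter
from math import ceil


def hub_nodes(edges, top_fraction=0.10):
    """Return the nodes whose degree ranks in the top fraction of the network."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    n_hubs = max(1, ceil(top_fraction * len(degree)))
    # Highest degree first; alphabetical order breaks ties (assumption)
    ranked = sorted(degree, key=lambda node: (-degree[node], node))
    return ranked[:n_hubs]


# Star-shaped toy network: one central ncRNA paired with nine others
edges = [("lncHub", f"lnc{i}") for i in range(9)]
hubs = hub_nodes(edges)
```

With ten nodes in the toy network, the top 10% is a single node, the center of the star.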
In alternative embodiments, the target disease type includes any of the following: glioblastoma multiforme, squamous cell carcinoma of the lung, ovarian cancer, and prostate cancer.
In alternative embodiments, the ncRNA comprises any one of: long non-coding RNA, circular RNA, and pseudogenes.
In a second aspect, an embodiment of the present invention provides an apparatus for identifying an ncRNA cooperative competition network, including: an acquisition module configured to acquire ncRNA and mRNA expression profile data of matched samples of a target disease type; and an identification module configured to determine, according to the ncRNA and mRNA expression profile data and preset ncRNA-mRNA competition relationship data, that an ncRNA-ncRNA pair composed of two ncRNAs in the expression profile data that satisfy preset conditions is a cooperative competition relationship pair.
In an alternative embodiment, the identification module comprises: an acquisition submodule configured to obtain, from the ncRNA and mRNA expression profile data, an ncRNA1-ncRNA2 pair composed of ncRNA1 and ncRNA2; a calculation submodule configured to calculate, according to the preset ncRNA-mRNA competition relationship data, the cooperative competition mRNA statistical significance probability value, the positive correlation significance probability value, and the sensitivity partial correlation coefficient value corresponding to the ncRNA1-ncRNA2 pair; and an identification submodule configured to determine that the ncRNA1-ncRNA2 pair is a cooperative competition relationship pair if the pair simultaneously satisfies that the cooperative competition mRNA statistical significance probability value is smaller than a first threshold, the positive correlation significance probability value is smaller than a second threshold, and the sensitivity partial correlation coefficient value is larger than a third threshold.
In an alternative embodiment, the calculation submodule is specifically configured to use a hypergeometric distribution test, according to the preset ncRNA-mRNA competition relationship data, to measure the statistical significance probability value of the mRNAs with which ncRNA1 and ncRNA2 in the ncRNA1-ncRNA2 pair cooperatively compete.
In an alternative embodiment, the calculation submodule is specifically configured to calculate the Pearson correlation coefficient between ncRNA1 and ncRNA2 in the ncRNA1-ncRNA2 pair, and to calculate the positive correlation significance probability value from the Pearson correlation coefficient.
In an alternative embodiment, the calculation submodule is specifically configured to calculate the sensitivity partial correlation coefficient value from the correlation coefficient value between ncRNA1 and ncRNA2 in the ncRNA1-ncRNA2 pair and the partial correlation coefficient value between ncRNA1 and ncRNA2 conditioned on the corresponding mRNA.
In an alternative embodiment, the identification submodule is specifically configured to determine that the ncRNA1-ncRNA2 pair is a cooperative competition relationship pair if the cooperative competition mRNA statistical significance probability value is less than 0.05, the positive correlation significance probability value is less than 0.05, and the sensitivity partial correlation coefficient value is greater than 0.1, all at the same time.
In an alternative embodiment, the apparatus further comprises: a competition data module configured to, before the identification module determines that an ncRNA-ncRNA pair composed of two ncRNAs satisfying the preset conditions is a cooperative competition relationship pair, fuse a plurality of different databases to acquire prior ncRNA-mRNA competition network data associated with the ncRNA and mRNA expression profile data of the matched samples of the target disease type, thereby obtaining the ncRNA-mRNA competition relationship data.
In an alternative embodiment, the apparatus further comprises: a preprocessing module configured to, before the identification module determines that an ncRNA-ncRNA pair composed of two ncRNAs satisfying the preset conditions is a cooperative competition relationship pair, preprocess the ncRNA and mRNA expression profile data to remove duplicate entries and to remove ncRNAs and mRNAs that have no gene name.
In an alternative embodiment, the apparatus further comprises: an evaluation module configured to evaluate, in the following manner, the ncRNA-ncRNA cooperative competition network formed by the ncRNA1-ncRNA2 pairs determined to be cooperative competition relationship pairs:
1) fitting whether the node connectivity of the ncRNA-ncRNA cooperative competition network obeys a power-law distribution, to determine whether the network is a scale-free network;
2) determining the nodes whose connectivity ranks in the top 10% of the ncRNA-ncRNA cooperative competition network as hub nodes;
3) determining each ncRNA1-ncRNA2 pair in which both ncRNAs are associated with the target disease type to be an ncRNA-ncRNA cooperative competition pair corresponding to the target disease type;
4) identifying ncRNA-ncRNA cooperative competition modules in the ncRNA-ncRNA cooperative competition network using a Markov clustering algorithm;
5) using the ncRNAs previously known to be associated with the target disease type and a hypergeometric distribution test, determining each ncRNA-ncRNA cooperative competition module whose significance probability value of functional association with the target disease type is less than 0.05 to be a module corresponding to the target disease type;
6) for each ncRNA-ncRNA cooperative competition module: calculating a risk value for each matched sample of the target disease type using a multivariate Cox model; dividing the samples into a high-risk sample set and a low-risk sample set according to their risk values; calculating a risk value from the high-risk sample set and the low-risk sample set; calculating, with a log-rank test, the significance probability value of the difference in survival time between the two sets to obtain a log-rank test significance probability value; and determining each module with a risk value greater than 1 and a log-rank test significance probability value less than 0.05 to be a biomarker of the target disease type.
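The log-rank comparison at the end of step 6) can be sketched as below. This is a minimal two-group log-rank test under stated assumptions: the Cox-model risk values and the high-risk/low-risk split are taken as already computed, each sample is a (survival time, event) tuple with event 1 for death and 0 for censoring, and the p-value uses the chi-square distribution with one degree of freedom. The function name is hypothetical.

```python
from math import erfc, sqrt


def logrank_pvalue(high, low):
    """Two-group log-rank test p-value for lists of (time, event) samples."""
    event_times = sorted({t for t, e in high + low if e == 1})
    observed = expected = variance = 0.0
    for t in event_times:
        n1 = sum(1 for time, _ in high if time >= t)  # at risk, high-risk set
        n2 = sum(1 for time, _ in low if time >= t)   # at risk, low-risk set
        d1 = sum(1 for time, e in high if time == t and e == 1)
        d2 = sum(1 for time, e in low if time == t and e == 1)
        n, d = n1 + n2, d1 + d2
        observed += d1
        expected += d * n1 / n
        if n > 1:
            variance += n1 * n2 * d * (n - d) / (n * n * (n - 1))
    if variance == 0.0:
        return 1.0
    chi2 = (observed - expected) ** 2 / variance
    # Survival function of the chi-square distribution with 1 degree of freedom
    return erfc(sqrt(chi2 / 2.0))


high_risk = [(1, 1), (2, 1), (3, 1), (4, 1), (5, 1)]
low_risk = [(10, 1), (11, 1), (12, 1), (13, 1), (14, 1)]
p_logrank = logrank_pvalue(high_risk, low_risk)
```

A module would then be kept as a candidate biomarker only if this p-value is below 0.05 and its risk value is greater than 1, as stated in the text.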
In alternative embodiments, the target disease type includes any of the following: glioblastoma multiforme, squamous cell carcinoma of the lung, ovarian cancer, and prostate cancer.
In alternative embodiments, the ncRNA comprises any one of: long non-coding RNA, circular RNA, and pseudogenes.
In a third aspect, an embodiment of the present invention provides an ncRNA cooperative competition network identification device, including a processor, a storage medium, and a bus, wherein the storage medium stores machine-readable instructions executable by the processor; when the device runs, the processor and the storage medium communicate through the bus, and the processor executes the machine-readable instructions to perform the method of the first aspect.
In a fourth aspect, the present invention further provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, performs the method according to the first aspect.
The invention has the beneficial effects that:
The invention acquires ncRNA and target gene mRNA expression profile data of matched samples of a target disease type and, according to the ncRNA and mRNA expression profile data and preset ncRNA-mRNA competition relationship data, determines each ncRNA-ncRNA pair composed of two ncRNAs that satisfy preset conditions to be a cooperative competition relationship pair. The ncRNA-ncRNA cooperative competition relationship pairs can thereby be identified, and with them the ncRNA cooperative competition network in which multiple ncRNAs cooperate to compete with target gene mRNAs, so that a reference can be provided for clinical diagnosis and targeted therapy of complex human diseases such as cancer.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a schematic flow chart of an ncRNA cooperative competition network identification method according to an embodiment of the present invention;
Fig. 2 is another schematic flow chart of the ncRNA cooperative competition network identification method according to an embodiment of the present invention;
Fig. 3 is a schematic flow chart of an ncRNA cooperative competition network identification method according to an embodiment of the present invention;
Fig. 4 is a schematic flow chart of an ncRNA cooperative competition network identification method according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of an ncRNA cooperative competition network identification apparatus according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of an identification module according to an embodiment of the present invention;
Fig. 7 is another schematic structural diagram of an ncRNA cooperative competition network identification apparatus according to an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of an ncRNA cooperative competition network identification apparatus according to an embodiment of the present invention;
Fig. 9 is a schematic structural diagram of an ncRNA cooperative competition network identification apparatus according to an embodiment of the present invention;
Fig. 10 is a schematic structural diagram of an ncRNA cooperative competition network identification device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations.
Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without any inventive step, are within the scope of the present invention.
It should be noted that: like reference numbers and letters denote like items in the following figures and formulas, and thus, once an item is defined in one figure or formula, it need not be further defined and explained in subsequent figures or formulas. It should also be noted that the descriptions of first, second, third, etc. are merely used for distinguishing and are not intended to indicate relative importance.
Some embodiments of the invention are described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
The embodiment of the invention provides a method for identifying an ncRNA (non-coding RNA) cooperative competition network. The method may be executed by a terminal device with computing capability, such as a desktop computer, a notebook computer, a server, a cloud terminal, a customized terminal or an intelligent terminal, which is not limited herein.
Fig. 1 is a schematic flow diagram of a ncRNA cooperative competition network identification method according to an embodiment of the present invention, and as shown in fig. 1, the ncRNA cooperative competition network identification method may include:
S110, acquiring ncRNA and mRNA expression profile data of matched samples of the target disease type.
The target disease type may be any one of glioblastoma multiforme, lung squamous cell carcinoma, ovarian cancer and prostate cancer; the present invention does not specifically limit the target disease type.
Taking Glioblastoma multiforme (GBM) as the target disease type and lncRNA as the ncRNA, for example, the expression profile data may be obtained as follows: lncRNA and mRNA expression profile data of glioblastoma multiforme matched samples are collected from The Cancer Genome Atlas (TCGA), an internationally known cancer gene expression profile database. The address of TCGA is https:// cancerrgeneme.
S120, determining, according to the ncRNA and mRNA expression profile data and preset ncRNA-mRNA competition relationship data, that an ncRNA-ncRNA pair formed by two ncRNAs that satisfy preset conditions in the expression profile data is a cooperative competition relationship pair.
In an optional embodiment, before step S120, the method may further include: fusing a plurality of different databases to acquire prior ncRNA-mRNA competition network data associated with the ncRNA and mRNA expression profile data of the matched samples of the target disease type, thereby obtaining the preset ncRNA-mRNA competition relationship data.
For example, the databases can be fused in advance, so that the prior ncRNA-mRNA competition relationships are already available when step S120 is executed.
The ncRNA-mRNA competition relationship refers to the competition between an ncRNA and an mRNA that share miRNA response elements (MREs). The prior ncRNA-mRNA competition network data may be computationally predicted or experimentally validated, and may originate from a single database or from a fusion of several different databases.
Again taking the aforementioned glioblastoma multiforme as an example, the lncRNA-mRNA competition relationship pairs associated with the glioblastoma multiforme expression profile data can be obtained by integrating four databases: miRSponge, LncCeRBase, LncACTdb v2.0 and ENCORI. lncRNAs associated with glioblastoma multiforme can also be collected from three databases: LncRNADisease v2.0, Lnc2Cancer v2.0 and MNDR v2.0.
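As an illustration of the database fusion described above, the following minimal Python sketch merges ncRNA-mRNA competition pairs taken from several sources into one lookup table. The function names and the toy identifiers are hypothetical, and each real database would need its own parsing step, which is not shown here.

```python
def merge_competition_pairs(*sources):
    """Union (ncRNA, mRNA) pairs from several sources, dropping duplicates."""
    merged = set()
    for pairs in sources:
        for ncrna, mrna in pairs:
            merged.add((ncrna.strip(), mrna.strip()))
    return merged


def targets_by_ncrna(pairs):
    """Map each ncRNA to the set of mRNAs it competes with."""
    table = {}
    for ncrna, mrna in pairs:
        table.setdefault(ncrna, set()).add(mrna)
    return table


# Toy records standing in for exports from two different databases
# (hypothetical identifiers, not real database content).
db_a = [("LNC1", "GENE1"), ("LNC1", "GENE2")]
db_b = [("LNC1", "GENE2"), ("LNC2", "GENE2")]
competition = targets_by_ncrna(merge_competition_pairs(db_a, db_b))
```

Keeping the fused data as a mapping from ncRNA to a set of mRNAs makes the later counting of shared mRNAs a plain set intersection.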
Fig. 2 is another schematic flow chart of the ncRNA cooperative competition network identification method provided in the embodiment of the present invention, and optionally, as shown in fig. 2, the step S120 may specifically include:
S121, obtaining an ncRNA1-ncRNA2 pair composed of ncRNA1 and ncRNA2 in the ncRNA and mRNA expression profile data.
In the ncRNA1-ncRNA2 pair, ncRNA1 and ncRNA2 denote two ncRNAs of the same type. Alternatively, the type of ncRNA can be any of long non-coding RNA (lncRNA), circular RNA (circRNA), or pseudogene transcript (also referred to as pseudogene).
Taking ncRNA cooperative competition involving lncRNA, circRNA and pseudogene as an example, the cooperative competition modes may specifically include the following six types: pseudogene-pseudogene, pseudogene-circRNA, pseudogene-lncRNA, circRNA-circRNA, circRNA-lncRNA and lncRNA-lncRNA.
S122, calculating, according to the preset ncRNA-mRNA competition relationship data, the cooperative competition mRNA statistical significance probability value, the positive correlation significance probability value and the sensitivity partial correlation coefficient value corresponding to the ncRNA1-ncRNA2 pair.
S123, when the ncRNA1-ncRNA2 pair simultaneously satisfies that the cooperative competition mRNA statistical significance probability value is smaller than a first threshold, the positive correlation significance probability value is smaller than a second threshold, and the sensitivity partial correlation coefficient value is larger than a third threshold, determining that the ncRNA1-ncRNA2 pair is a cooperative competition relationship pair.
In an alternative embodiment, the first threshold may be 0.05, the second threshold may be 0.05 and the third threshold may be 0.1. If the cooperative competition mRNA statistical significance probability value is less than 0.05, the positive correlation significance probability value is less than 0.05, and the sensitivity partial correlation coefficient value is greater than 0.1, it may be determined that the ncRNA1-ncRNA2 pair is a cooperative competition relationship pair.
Alternatively, the second threshold may be the same as the first threshold, or may be different from the first threshold. It should be noted that the first threshold, the second threshold and the third threshold are only exemplary in the embodiment of the present invention, and in order to improve the accuracy of the ncRNA cooperative competition network identification, a person skilled in the art may set the specific values of the first threshold, the second threshold and the third threshold to other values according to actual needs, for example: the second threshold may also be 0.01, 0.001, etc., and the invention is not limited in this regard.
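The decision rule of step S123 can be sketched as a small Python function. The default values are the exemplary thresholds given above; the function name, and the choice to treat a positive correlation value that was not calculated as failing the condition, are assumptions made for illustration.

```python
def is_cooperative_competition_pair(p_shared, p_positive, sppc,
                                    first=0.05, second=0.05, third=0.1):
    """Return True when all three conditions of step S123 hold.

    p_shared   : significance probability of the shared (co-competed) mRNAs
    p_positive : positive correlation significance probability, or None
                 when it was not calculated (assumption: counts as failing)
    sppc       : sensitivity partial correlation coefficient value
    """
    if p_positive is None:
        return False
    return p_shared < first and p_positive < second and sppc > third
```

A stricter second threshold such as 0.01 can be passed as `second=0.01` without changing the other conditions.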
As described above, the embodiment of the present invention obtains ncRNA and target gene mRNA expression profile data of matched samples of the target disease type and, according to these data and the preset ncRNA-mRNA competition relationship data, determines that ncRNA-ncRNA pairs formed by two ncRNAs satisfying the preset conditions are cooperative competition relationship pairs. In this way, the ncRNA cooperative competition network formed by multiple ncRNAs that cooperatively compete for target gene mRNAs can be identified, which in turn provides a reference for clinical diagnosis and targeted therapy of complex human diseases such as cancer.
In an alternative embodiment, the step of calculating the cooperative competition mRNA statistical significance probability value corresponding to the ncRNA1-ncRNA2 pair in step S122 may comprise: according to the preset ncRNA-mRNA competition relationship data, using a hypergeometric distribution test to measure the statistical significance of the mRNAs for which ncRNA1 and ncRNA2 in the pair cooperatively compete.
The statistical significance probability value calculation formula may be as follows:
$$p\text{-value} = 1 - \sum_{i=0}^{L_1-1} \frac{\binom{M_1}{i}\binom{N_1-M_1}{K_1-i}}{\binom{N_1}{K_1}}$$
wherein p-value represents the statistical significance probability value of the mRNAs for which ncRNA1 and ncRNA2 cooperatively compete; $N_1$ represents the number of all mRNAs in the data set; $M_1$ and $K_1$ represent the numbers of mRNAs that compete with ncRNA1 and ncRNA2, respectively; and $L_1$ (usually not less than 3) represents the number of mRNAs for which ncRNA1 and ncRNA2 cooperatively compete.
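The hypergeometric test described above can be sketched in Python using only the standard library. The sketch assumes that the p-value is the upper-tail probability of observing at least L1 shared mRNAs; the function and parameter names are illustrative and not taken from the source.

```python
from math import comb


def shared_mrna_p_value(n_total, m_first, k_second, l_shared):
    """Probability of at least `l_shared` shared mRNAs under the
    hypergeometric distribution.

    n_total  : number of all mRNAs in the data set (N1)
    m_first  : number of mRNAs competing with ncRNA1 (M1)
    k_second : number of mRNAs competing with ncRNA2 (K1)
    l_shared : number of mRNAs both ncRNAs compete for (L1)
    """
    denominator = comb(n_total, k_second)
    lower_tail = sum(
        comb(m_first, i) * comb(n_total - m_first, k_second - i)
        for i in range(l_shared)
    ) / denominator
    return 1.0 - lower_tail
```

For instance, with 10 mRNAs in total, 5 competing with each ncRNA and 3 shared, the function returns 0.5, so such an overlap would not count as significant at the 0.05 level.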
Fig. 3 is a schematic flow chart of a ncRNA cooperative competition network identification method according to an embodiment of the present invention.
As shown in FIG. 3, in an alternative embodiment, the step of calculating the positive correlation significance probability value corresponding to the ncRNA1-ncRNA2 pair in step S122 may comprise:
S1221, calculating the Pearson correlation coefficient between ncRNA1 and ncRNA2 in the ncRNA1-ncRNA2 pair.
The Pearson Correlation (PC) value between ncRNA1 and ncRNA2 in each ncRNA1-ncRNA2 pair is calculated as follows:
$$PC_{x,y} = \frac{\sum_{i=1}^{s}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{s}(x_i-\bar{x})^2}\,\sqrt{\sum_{i=1}^{s}(y_i-\bar{y})^2}}$$
wherein $PC_{x,y}$ represents the Pearson correlation coefficient between ncRNA1 and ncRNA2; the variables $x=(x_1,x_2,\dots,x_s)$ and $y=(y_1,y_2,\dots,y_s)$ represent the gene expression levels of ncRNA1 and ncRNA2, respectively; $\bar{x}$ and $\bar{y}$ represent the mean expression levels of x and y, respectively; and s is the number of matched samples.
S1222, calculating the positive correlation significance probability value according to the Pearson correlation coefficient.
When $PC_{x,y}$ is greater than 0, the positive correlation significance probability value is calculated as follows (it is not calculated when $PC_{x,y}$ is not greater than 0):
p-value = 2pt(t-value);
$$t\text{-value} = PC_{x,y}\sqrt{\frac{s-2}{1-PC_{x,y}^{2}}}$$
wherein $PC_{x,y}$ represents the Pearson correlation coefficient between ncRNA1 and ncRNA2; the pt() function calculates the probability p value corresponding to the t-value, that is, p-value in the formula represents the positive correlation significance probability value; and s is the number of matched samples.
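A minimal Python sketch of steps S1221 and S1222 follows. It computes the Pearson correlation coefficient and, as an assumption, the usual t statistic for a correlation coefficient with s - 2 degrees of freedom; the source text does not reproduce the t-value expression. Converting the t statistic into a probability needs a Student t distribution function, for example `scipy.stats.t.sf`, which is deliberately left out so that the sketch depends only on the standard library.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two expression vectors."""
    s = len(x)
    mean_x, mean_y = sum(x) / s, sum(y) / s
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


def t_statistic(r, s):
    """t statistic of a correlation r over s samples (assumed form,
    with s - 2 degrees of freedom)."""
    return r * sqrt((s - 2) / (1.0 - r * r))


# Toy expression values over five matched samples (illustrative only).
ncrna1 = [1, 2, 3, 4, 5]
ncrna2 = [2, 1, 4, 3, 5]
r = pearson(ncrna1, ncrna2)          # 0.8
t = t_statistic(r, len(ncrna1))      # about 2.31
```

Only pairs with a positive coefficient proceed to the significance calculation, which matches the rule that the value is not calculated when the coefficient is not greater than 0.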
In an alternative embodiment, the step of calculating the sensitivity partial correlation coefficient value corresponding to the ncRNA1-ncRNA2 pair in step S122 may comprise: calculating the sensitivity partial correlation coefficient value according to the correlation coefficient value between ncRNA1 and ncRNA2 in the pair and the partial correlation coefficient value between ncRNA1 and ncRNA2 conditioned on the corresponding mRNAs.
Specifically, to calculate the Sensitivity Partial Pearson Correlation (SPPC) value, the condition variable mRNA is taken into account. Under the precondition that the mRNAs for which ncRNA1 and ncRNA2 cooperatively compete are considered, the sensitivity partial correlation coefficient value between ncRNA1 and ncRNA2 is defined as follows:
$$SPPC_{x,y} = PC_{x,y} - PPC_{x,y|Z}$$
wherein $PC_{x,y}$ is the correlation coefficient value between ncRNA1 and ncRNA2, and $PPC_{x,y|Z}$ is the partial correlation coefficient value between ncRNA1 and ncRNA2 under the precondition that the mRNAs for which they cooperatively compete are considered. Suppose the ncRNA1-ncRNA2 pair cooperatively competes for m (usually not less than 3) mRNAs, denoted $Z=(Z_1,Z_2,\dots,Z_m)$; then the partial correlation coefficient value $PPC_{x,y|Z} = cor(x,y|(Z_1,Z_2,\dots,Z_m))$ is calculated as follows:
$$cor(x,y|(Z_1,\dots,Z_m)) = \frac{cor(x,y|(Z_1,\dots,Z_{m-1})) - cor(x,Z_m|(Z_1,\dots,Z_{m-1}))\,cor(y,Z_m|(Z_1,\dots,Z_{m-1}))}{\sqrt{1-cor(x,Z_m|(Z_1,\dots,Z_{m-1}))^2}\,\sqrt{1-cor(y,Z_m|(Z_1,\dots,Z_{m-1}))^2}}$$
wherein $x=(x_1,x_2,\dots,x_s)$, $y=(y_1,y_2,\dots,y_s)$ and $Z_i=(z_{i,1},z_{i,2},\dots,z_{i,s})$ $(i\in[1,2,\dots,m])$; $cor(x,y|(Z_1,\dots,Z_m))$ represents the partial correlation coefficient value between x and y under the condition $(Z_1,\dots,Z_m)$; $cor(x,y|(Z_1,\dots,Z_{m-1}))$ represents the partial correlation coefficient value between x and y under the condition $(Z_1,\dots,Z_{m-1})$; $cor(x,Z_m|(Z_1,\dots,Z_{m-1}))$ represents the partial correlation coefficient value between x and $Z_m$ under the condition $(Z_1,\dots,Z_{m-1})$; and $cor(y,Z_m|(Z_1,\dots,Z_{m-1}))$ represents the partial correlation coefficient value between y and $Z_m$ under the condition $(Z_1,\dots,Z_{m-1})$.
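The recursive partial correlation can be sketched in Python as follows. The recursion removes one conditioning mRNA at a time, as in the definitions above. Two points are assumptions rather than statements from the source: the sensitivity value is taken to be the plain difference between the Pearson correlation and the partial correlation, and all function names are illustrative.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two vectors."""
    s = len(x)
    mean_x, mean_y = sum(x) / s, sum(y) / s
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / sqrt(s_xx * s_yy)


def partial_cor(x, y, controls):
    """cor(x, y | controls), conditioning variables removed one by one."""
    if not controls:
        return pearson(x, y)
    *rest, last = controls
    r_xy = partial_cor(x, y, rest)
    r_xz = partial_cor(x, last, rest)
    r_yz = partial_cor(y, last, rest)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def sensitivity_partial_cor(x, y, controls):
    """ASSUMPTION: correlation minus partial correlation given the mRNAs."""
    return pearson(x, y) - partial_cor(x, y, controls)


# Toy data: x and y are both driven by z, plus mutually unrelated noise,
# so the correlation between x and y vanishes once z is controlled for.
z = [-3, -1, 1, 3]
x = [-2, -2, 0, 4]
y = [-2, -4, 4, 2]
ppc = partial_cor(x, y, [z])                 # close to 0
sppc = sensitivity_partial_cor(x, y, [z])    # close to pearson(x, y)
```

The recursion makes three calls per conditioning variable, so its cost grows quickly with m; for many shared mRNAs a matrix-based computation of partial correlations would be a more practical choice.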
Based on the cooperative competition mRNA statistical significance probability value, the positive correlation significance probability value and the sensitivity partial correlation coefficient value calculated for the ncRNA1-ncRNA2 pair in step S122, step S123 can determine whether the ncRNA1-ncRNA2 pair is a cooperative competition relationship pair, thereby realizing ncRNA cooperative competition network identification.
In an optional embodiment, before step S120, the method may further include: preprocessing the ncRNA and mRNA expression profile data to remove duplicate items as well as ncRNAs and mRNAs without gene names.
Accordingly, when the ncRNA1-ncRNA2 pair composed of ncRNA1 and ncRNA2 is obtained in step S121, the pair can be obtained from the preprocessed ncRNA and mRNA expression profile data of the matched samples of the target disease type.
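The preprocessing step can be sketched in Python as below. The source only states that duplicate items and entries without gene names are removed; keeping the first occurrence of a duplicated name is an assumption of this sketch, as are the function name and the row layout.

```python
def preprocess(rows):
    """Drop rows without a gene name and repeated gene names.

    rows: iterable of (gene_name, expression_values) tuples.
    Assumption: when a gene name repeats, the first row is kept.
    """
    seen = set()
    cleaned = []
    for name, values in rows:
        name = (name or "").strip()
        if not name or name in seen:
            continue
        seen.add(name)
        cleaned.append((name, list(values)))
    return cleaned


# Toy expression rows (hypothetical identifiers).
raw = [("A", [1, 2]), ("", [3, 4]), ("A", [5, 6]), (None, [7, 8]), ("B", [9, 10])]
clean = preprocess(raw)
```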
Optionally, the ncRNA cooperative competition network identification method may further include:
evaluating, in the following manner, the ncRNA-ncRNA cooperative competition network formed by the ncRNA1-ncRNA2 pairs identified as cooperative competition relationship pairs:
1) fitting whether the connectivity of the ncRNA-ncRNA cooperative competition network obeys a power-law distribution, to determine whether the ncRNA-ncRNA cooperative competition network is a scale-free network (network topology analysis):
Previous studies have shown that real biomolecular networks tend to be scale-free networks. In a scale-free biomolecular network, most molecules are connected through a few hub molecules, which means that the molecules do not hold equal positions in the network and that the hub molecules play a key role in maintaining the integrity of the network.
A scale-free network is one whose connectivity distribution obeys a power-law distribution of the form y = bx^a (x is the connectivity, y is the frequency of occurrence of that connectivity, and a and b are parameters). To evaluate whether the identified ncRNA-ncRNA cooperative competition network is a scale-free network, one can fit whether the connectivity of the network obeys a power-law distribution. The goodness of fit is measured by the statistic R^2; the closer R^2 is to 1, the closer the network is to a power-law distribution.
2) Determining the nodes whose connectivity ranks in the top 10% of the ncRNA-ncRNA cooperative competition network as hub nodes (identifying hub ncRNAs):
Hub ncRNAs play a key role in maintaining the integrity of the ncRNA-ncRNA cooperative competition network, and can serve as biomarkers that provide a reference for clinical diagnosis and targeted therapy of complex human diseases such as cancer. Typically, the nodes with high connectivity (the top 10%) are regarded as hub nodes. In this embodiment, the ncRNAs whose connectivity ranks in the top 10% are regarded as hub ncRNAs.
3) Determining the ncRNA1-ncRNA2 pairs in which both ncRNAs are associated with the target disease type, to obtain the ncRNA-ncRNA cooperative competition pairs corresponding to the target disease type (identifying ncRNA-ncRNA cooperative competition pairs associated with the target disease type):
Based on the ncRNAs associated with the target disease type, the ncRNA-ncRNA cooperative competition pairs associated with the target disease type are extracted. An ncRNA-ncRNA cooperative competition pair is regarded as associated with the target disease type if and only if both ncRNAs in the pair are associated with the target disease type.
4) Based on the ncRNA-ncRNA cooperative competition network, identifying ncRNA-ncRNA cooperative competition modules using a Markov clustering algorithm (identifying ncRNA-ncRNA cooperative competition modules):
Based on the ncRNA-ncRNA cooperative competition network, the ncRNA-ncRNA cooperative competition modules are identified using the Markov clustering algorithm (MCL). Each ncRNA-ncRNA cooperative competition module contains at least 3 ncRNAs.
5) According to the prior ncRNAs associated with the target disease type and a hypergeometric distribution test, determining an ncRNA-ncRNA cooperative competition module whose significance probability value of functional association with the target disease type is less than 0.05 as an ncRNA-ncRNA cooperative competition module corresponding to the target disease type (ncRNA-ncRNA cooperative competition module enrichment analysis):
Based on the prior ncRNAs associated with the target disease type, a hypergeometric distribution test is used to test whether each ncRNA-ncRNA cooperative competition module is functionally associated with the target disease type, calculated as follows:
$$p\text{-value} = 1 - \sum_{i=0}^{L_2-1} \frac{\binom{M_2}{i}\binom{N_2-M_2}{K_2-i}}{\binom{N_2}{K_2}}$$
wherein p-value represents the significance probability value of functional association with the target disease type; $N_2$ represents the number of ncRNAs in the data set; $M_2$ represents the number of ncRNAs in the data set that are associated with the target disease type; $K_2$ is the number of ncRNAs in the ncRNA-ncRNA cooperative competition module; and $L_2$ represents the number of ncRNAs in the module that are associated with the target disease type. When the p-value is less than 0.05, the ncRNA-ncRNA cooperative competition module is identified as a module associated with the target disease type.
6) For each ncRNA-ncRNA cooperative competition module, calculating a risk value of each matched sample of the target disease type by using a multivariate Cox model; dividing the matched samples into a high-risk sample set and a low-risk sample set according to their risk values; calculating a hazard ratio from the high-risk sample set and the low-risk sample set; calculating, with the log-rank test, the significance probability value of the difference in survival time between the high-risk sample set and the low-risk sample set; and determining an ncRNA-ncRNA cooperative competition module whose hazard ratio is greater than 1 and whose log-rank test significance probability value is less than 0.05 as a biomarker of the target disease type (ncRNA-ncRNA cooperative competition module survival analysis):
For each ncRNA-ncRNA cooperative competition module, a multivariate Cox model is applied to calculate the risk value of each sample of the target disease type, as follows:
$$h(t,R) = h_0(t)\exp(\beta' R) = h_0(t)\exp(\beta_1 R_1 + \beta_2 R_2 + \dots + \beta_k R_k)$$
wherein $h(t,R)$ is the hazard function value at time t of a sample of the target disease type with covariates R; t is the survival time; $R=(R_1,R_2,\dots,R_k)'$ are the ncRNAs that may influence survival time; $h_0(t)$ is the baseline hazard function value when all covariates are 0; and $\beta=(\beta_1,\beta_2,\dots,\beta_k)'$ are the regression coefficients of the Cox model.
According to the hazard function value $h(t,R)$ of each sample, the samples of the target disease type are divided equally into a high-risk group and a low-risk group. The hazard ratio (HR) between the high-risk and low-risk groups is calculated as follows:
$$HR = \frac{h(t,R_h)}{h(t,R_l)} = \exp[\beta'(R_h - R_l)]$$
wherein $h(t,R_h)$ is the hazard function value of the high-risk group of the target disease type; $h(t,R_l)$ is the hazard function value of the low-risk group of the target disease type; $R_h$ are the high-risk ncRNAs that may affect survival time; and $R_l$ are the low-risk ncRNAs that may affect survival time. The threshold for HR may be set to 1.
Further, the log-rank test can be used to compare whether the survival times of the high-risk and low-risk groups of samples of the target disease type differ. The chi-square test statistic $\chi^2$ is calculated as follows:
$$\chi^2 = \sum \frac{(A-T)^2}{T}$$
wherein A is the observed number of deaths and T is the theoretical number of deaths for the target disease type. The larger the calculated $\chi^2$ value, the smaller the significance p-value of the difference, indicating that the survival times of the high-risk and low-risk groups are less likely to be the same. When the HR value is greater than 1 and the log-rank test significance p-value is less than 0.05, the ncRNA-ncRNA cooperative competition module is identified as a module biomarker of the target disease type.
Based on the foregoing embodiments, the embodiment of the present invention further provides a method for identifying a ncRNA cooperative competition network, and fig. 4 is a schematic flow chart of the method for identifying a ncRNA cooperative competition network provided by the embodiment of the present invention.
As shown in fig. 4, the ncRNA cooperative competition network identification method may include:
S401, acquiring ncRNA and mRNA expression profile data of matched samples of the target disease type.
S402, preprocessing the ncRNA and mRNA expression profile data to remove duplicate items as well as ncRNAs and mRNAs without gene names.
S403, obtaining an ncRNA1-ncRNA2 pair composed of ncRNA1 and ncRNA2 in the ncRNA and mRNA expression profile data.
S404, according to the preset ncRNA-mRNA competition relationship data, using a hypergeometric distribution test to measure the statistical significance probability value of the mRNAs for which ncRNA1 and ncRNA2 in the ncRNA1-ncRNA2 pair cooperatively compete.
S405, calculating the Pearson correlation coefficient between ncRNA1 and ncRNA2 in the ncRNA1-ncRNA2 pair.
S406, calculating the positive correlation significance probability value according to the Pearson correlation coefficient.
S407, calculating the sensitivity partial correlation coefficient value according to the correlation coefficient value between ncRNA1 and ncRNA2 in the ncRNA1-ncRNA2 pair and the partial correlation coefficient value between ncRNA1 and ncRNA2 conditioned on the corresponding mRNAs.
S408, judging whether the cooperative competition mRNA statistical significance probability value is less than 0.05, the positive correlation significance probability value is less than 0.05, and the sensitivity partial correlation coefficient value is greater than 0.1.
If yes, go to step S409; if not, the process ends, or a new ncRNA1-ncRNA2 pair is obtained and the above process is repeated (not shown in the figure).
S409, determining that the ncRNA1-ncRNA2 pair is a cooperative competition relationship pair.
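Steps S403 and S404 need, for every candidate pair, the number of mRNAs each ncRNA competes with and the mRNAs they share. The Python sketch below enumerates such candidates from a mapping of ncRNA to competing mRNAs. The mapping layout, the function name and the default minimum of 3 shared mRNAs (taken from the remark that the shared count is usually not less than 3) are assumptions for illustration.

```python
from itertools import combinations


def candidate_pairs(competition, min_shared=3):
    """List (ncRNA1, ncRNA2, M1, K1, shared_mRNAs) for every pair of
    ncRNAs that shares at least `min_shared` competing mRNAs.

    competition: dict mapping an ncRNA name to the set of mRNAs it
    competes with (hypothetical layout).
    """
    result = []
    for first, second in combinations(sorted(competition), 2):
        shared = competition[first] & competition[second]
        if len(shared) >= min_shared:
            result.append((first, second,
                           len(competition[first]),
                           len(competition[second]),
                           sorted(shared)))
    return result


# Toy competition data (hypothetical identifiers).
toy = {
    "LNC1": {"g1", "g2", "g3", "g4"},
    "LNC2": {"g2", "g3", "g4"},
    "LNC3": {"g1"},
}
pairs = candidate_pairs(toy)
```

Each returned tuple carries the counts that the significance test of step S404 uses, and the shared mRNA list supplies the conditioning variables for step S407.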
To make the technical solution of the ncRNA cooperative competition network identification method provided by the embodiment of the present invention clearer, the steps of the method are described through the following specific examples:
Example 1
Taking the lncRNA-lncRNA cooperative competition in glioblastoma multiforme as an example, the method for identifying lncRNA-lncRNA cooperative competition network in this embodiment is implemented by the following steps:
Step 1: Data source acquisition
lncRNA and mRNA expression profile data of glioblastoma multiforme (GBM) matched samples were collected from the internationally known cancer gene expression profile database TCGA (The Cancer Genome Atlas, https://cancergenome.nih.gov/). After preprocessing (removing duplicate items and lncRNAs and mRNAs without gene names), expression profile data of 9704 lncRNAs and 18282 mRNAs in 451 glioblastoma multiforme matched samples, together with the clinical information of the samples, were finally obtained. In this example, the ncRNA is lncRNA.
The prior lncRNA-mRNA competition network data are obtained by fusing a plurality of different databases, specifically by integrating the four databases miRSponge, LncCeRBase, LncACTdb v2.0 and ENCORI. Finally, 10099 lncRNA-mRNA competition relationship pairs associated with the glioblastoma multiforme expression profile data were obtained. In addition, 166 lncRNAs associated with glioblastoma multiforme were collected from the LncRNADisease v2.0, Lnc2Cancer v2.0 and MNDR v2.0 databases.
Step 2: identification of lncRNA-lncRNA cooperative competition network
In the lncRNA-lncRNA cooperative competition network, each lncRNA-lncRNA cooperative competition pair must satisfy the following conditions: the significance probability p value of the cooperatively competed mRNAs is less than 0.05, the positive correlation significance probability p value is less than 0.05, and the sensitivity partial correlation coefficient value is greater than 0.1.
Step 3: Evaluation of lncRNA-lncRNA cooperative competition network
The identified lncRNA-lncRNA cooperative competition network can then be evaluated from the following six aspects:
1) network topology analysis
Previous studies have shown that real biomolecular networks tend to be scale-free networks. In a scale-free biomolecular network, most molecules are connected through a few hub molecules, which means that the molecules do not hold equal positions in the network and that the hub molecules play a key role in maintaining the integrity of the network.
A scale-free network is one whose connectivity distribution obeys a power-law distribution of the form y = bx^a (x is the connectivity, y is the frequency of occurrence of that connectivity, and a and b are parameters). To evaluate whether the identified lncRNA-lncRNA cooperative competition network is a scale-free network, one can fit whether the connectivity of the network obeys a power-law distribution. The goodness of fit is measured by the statistic R^2; the closer R^2 is to 1, the closer the network is to a power-law distribution.
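One way to carry out such a fit is ordinary least squares on the log-transformed values, since log y = log b + a log x is linear. The Python sketch below does this and reports R^2 of the linear fit. Fitting in log-log space is an assumption of the sketch (the source does not say how the fit is performed), as is the use of raw counts for the frequency y, which only affects the parameter b.

```python
from collections import Counter
from math import exp, log


def fit_power_law(degrees):
    """Fit y = b * x**a to a degree list; return (a, b, r_squared).

    x is a connectivity value and y is the number of nodes having it.
    Needs at least two distinct positive degrees with differing counts.
    """
    counts = Counter(d for d in degrees if d > 0)
    xs = [log(k) for k in counts]
    ys = [log(c) for c in counts.values()]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = s_xy / s_xx
    intercept = mean_y - a * mean_x
    ss_res = sum((y - (intercept + a * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return a, exp(intercept), 1.0 - ss_res / ss_tot


# Toy degree list that follows y = 16 * x**-2 exactly.
toy_degrees = [1] * 16 + [2] * 4 + [4] * 1
a, b, r_squared = fit_power_law(toy_degrees)
```

On real networks the points scatter around the line, and an R^2 well below 1 would suggest that the network departs from the scale-free pattern.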
2) Identifying hub lncRNAs
Hub lncRNAs play a key role in maintaining the integrity of the lncRNA-lncRNA cooperative competition network, and can serve as biomarkers that provide a reference for clinical diagnosis and targeted therapy of complex human diseases such as cancer. Typically, the nodes with high connectivity (the top 10%) are regarded as hub nodes. In this example, the lncRNAs whose connectivity ranks in the top 10% are regarded as hub lncRNAs.
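Selecting the top 10% of nodes by connectivity can be sketched in Python as follows. The rounding rule (rounding down, but keeping at least one hub) and the alphabetical tie-breaking are assumptions of this sketch, because the source does not specify them.

```python
def hub_nodes(edges, top_percent=10):
    """Return the nodes whose degree ranks in the top `top_percent` percent.

    edges: iterable of (node_a, node_b) cooperative competition pairs.
    Assumptions: the hub count is rounded down (minimum 1) and ties are
    broken by node name.
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    ranked = sorted(degree, key=lambda node: (-degree[node], node))
    n_hubs = max(1, len(ranked) * top_percent // 100)
    return ranked[:n_hubs]


# Toy star-shaped network: one centre linked to ten leaves.
star = [("H", "L%d" % i) for i in range(10)]
hubs = hub_nodes(star)
```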
3) Identifying glioblastoma multiforme-associated lncRNA-lncRNA cooperative competition pairs
Based on the lncRNAs associated with glioblastoma multiforme, the glioblastoma multiforme-associated lncRNA-lncRNA cooperative competition pairs are extracted. An lncRNA-lncRNA cooperative competition pair is regarded as associated with glioblastoma multiforme if and only if both lncRNAs in the pair are associated with glioblastoma multiforme.
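The "if and only if both" rule translates directly into a filter. The short Python sketch below is illustrative only; the function name and the toy identifiers are hypothetical.

```python
def disease_associated_pairs(pairs, disease_lncrnas):
    """Keep only the pairs in which both lncRNAs are disease-associated."""
    known = set(disease_lncrnas)
    return [(a, b) for a, b in pairs if a in known and b in known]


# Toy data (hypothetical identifiers).
all_pairs = [("LNC1", "LNC2"), ("LNC1", "LNC3"), ("LNC2", "LNC4")]
associated = disease_associated_pairs(all_pairs, ["LNC1", "LNC2", "LNC4"])
```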
4) Identifying lncRNA-lncRNA cooperative competition modules
Based on the lncRNA-lncRNA cooperative competition network, the lncRNA-lncRNA cooperative competition modules are identified using the Markov clustering algorithm (MCL). Each lncRNA-lncRNA cooperative competition module contains at least 3 lncRNAs.
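To show the idea behind Markov clustering, the following Python sketch implements a simplified version with dense matrices: self-loops are added, the matrix is made column-stochastic, and expansion (matrix squaring) alternates with inflation (element-wise powering followed by renormalization). It is not the MCL software itself; the inflation value of 2.0, the fixed iteration count and the way modules are read from the final matrix are assumptions chosen for illustration.

```python
def markov_clustering(edges, inflation=2.0, iterations=30, min_size=3):
    """Simplified Markov clustering; returns modules with >= min_size nodes."""
    nodes = sorted({node for edge in edges for node in edge})
    index = {node: i for i, node in enumerate(nodes)}
    n = len(nodes)
    matrix = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        matrix[index[u]][index[v]] = 1.0
        matrix[index[v]][index[u]] = 1.0
    for i in range(n):
        matrix[i][i] = 1.0  # self-loops

    def normalize_columns(m):
        for j in range(n):
            total = sum(m[i][j] for i in range(n))
            for i in range(n):
                m[i][j] /= total
        return m

    matrix = normalize_columns(matrix)
    for _ in range(iterations):
        # Expansion: let flow spread along paths of length two.
        matrix = [[sum(matrix[i][k] * matrix[k][j] for k in range(n))
                   for j in range(n)] for i in range(n)]
        # Inflation: strengthen strong flows and weaken weak ones.
        matrix = normalize_columns(
            [[value ** inflation for value in row] for row in matrix])

    modules = set()
    for i in range(n):
        members = frozenset(nodes[j] for j in range(n) if matrix[i][j] > 1e-6)
        if len(members) >= min_size:
            modules.add(members)
    return sorted(sorted(module) for module in modules)


# Toy network: two triangles plus one isolated pair (dropped, size < 3).
toy_edges = [("A", "B"), ("B", "C"), ("A", "C"),
             ("D", "E"), ("E", "F"), ("D", "F"),
             ("G", "H")]
modules = markov_clustering(toy_edges)
```

The `min_size=3` default reflects the requirement that each module contains at least 3 lncRNAs; for networks with thousands of nodes a sparse-matrix implementation would be needed.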
5) lncRNA-lncRNA cooperative competition module enrichment analysis
Based on the prior glioblastoma multiforme-associated lncRNAs, a hypergeometric distribution test is used to test whether each lncRNA-lncRNA cooperative competition module is functionally associated with glioblastoma multiforme, calculated as follows:
$$p\text{-value} = 1 - \sum_{i=0}^{L_2-1} \frac{\binom{M_2}{i}\binom{N_2-M_2}{K_2-i}}{\binom{N_2}{K_2}}$$
wherein $N_2$ represents the number of lncRNAs in the data set; $M_2$ represents the number of glioblastoma multiforme-associated lncRNAs in the data set; $K_2$ is the number of lncRNAs in the lncRNA-lncRNA cooperative competition module; and $L_2$ represents the number of lncRNAs in the module that are associated with glioblastoma multiforme.
In this example, when the significance probability p value is less than 0.05, the lncRNA-lncRNA cooperative competition module is identified as a glioblastoma multiforme-associated module.
6) lncRNA-lncRNA cooperative competition module survival analysis
For each lncRNA-lncRNA cooperative competition module, a multivariate Cox model is applied to calculate the risk value of each glioblastoma multiforme sample, as follows:
$$h(t,R) = h_0(t)\exp(\beta' R) = h_0(t)\exp(\beta_1 R_1 + \beta_2 R_2 + \dots + \beta_k R_k)$$
wherein $h(t,R)$ is the hazard function value at time t of a glioblastoma multiforme sample with covariates R; t is the survival time; $R=(R_1,R_2,\dots,R_k)'$ are the lncRNAs that may influence survival time; $h_0(t)$ is the baseline hazard function value when all covariates are 0; and $\beta=(\beta_1,\beta_2,\dots,\beta_k)'$ are the regression coefficients of the Cox model.
Based on the hazard function value $h(t,R)$ of each sample, the 451 glioblastoma multiforme samples were divided equally into a high-risk group and a low-risk group. The hazard ratio (HR) between the high-risk and low-risk groups of glioblastoma multiforme samples was calculated as follows:
$$HR = \frac{h(t,R_h)}{h(t,R_l)} = \exp[\beta'(R_h - R_l)]$$
wherein $h(t,R_h)$ is the hazard function value of the high-risk group of glioblastoma multiforme; $h(t,R_l)$ is the hazard function value of the low-risk group of glioblastoma multiforme; $R_h$ are the high-risk lncRNAs that may affect survival time; and $R_l$ are the low-risk lncRNAs that may affect survival time. The threshold for HR in this example is set to 1.
Further, the log-rank test can be used to compare whether the survival times of the high-risk and low-risk groups of glioblastoma multiforme samples differ. The chi-square test statistic $\chi^2$ is calculated as follows:
$$\chi^2 = \sum \frac{(A-T)^2}{T}$$
wherein A is the observed number of glioblastoma multiforme deaths and T is the theoretical number of glioblastoma multiforme deaths. The larger the calculated $\chi^2$ value, the smaller the significance p-value of the difference, indicating that the survival times of the high-risk and low-risk groups of glioblastoma multiforme samples are less likely to be the same. In this example, if the HR value is greater than 1 and the log-rank test significance probability p value is less than 0.05, the lncRNA-lncRNA cooperative competition module is identified as a glioblastoma multiforme module biomarker.
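The risk scoring, the split into two groups and the log-rank statistic can be sketched in Python with the standard library. The sketch assumes that the Cox regression coefficients have already been estimated elsewhere, since fitting the model is beyond its scope. It also assumes that the expected deaths of each group are accumulated over the distinct death times and that the p-value is read from a chi-square distribution with one degree of freedom. All names are illustrative.

```python
from math import erfc, sqrt


def risk_scores(beta, covariates):
    """Linear predictor beta'R for each sample (beta assumed pre-fitted)."""
    return [sum(b * r for b, r in zip(beta, row)) for row in covariates]


def split_by_median(scores):
    """Return (high_risk_indices, low_risk_indices), split at the middle."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    half = len(order) // 2
    return set(order[half:]), set(order[:half])


def log_rank(times, events, high):
    """Chi-square statistic sum((A - T)**2 / T) over both groups, and its
    p-value for one degree of freedom.

    times  : survival time of each sample
    events : 1 if death was observed, 0 if censored
    high   : set of sample indices in the high-risk group
    """
    samples = range(len(times))
    observed = {True: 0, False: 0}
    expected = {True: 0.0, False: 0.0}
    for t in sorted({times[i] for i in samples if events[i]}):
        at_risk = [i for i in samples if times[i] >= t]
        deaths = [i for i in at_risk if times[i] == t and events[i]]
        n_high = sum(1 for i in at_risk if i in high)
        expected[True] += len(deaths) * n_high / len(at_risk)
        expected[False] += len(deaths) * (len(at_risk) - n_high) / len(at_risk)
        for i in deaths:
            observed[i in high] += 1
    chi2 = sum((observed[g] - expected[g]) ** 2 / expected[g]
               for g in (True, False))
    return chi2, erfc(sqrt(chi2 / 2.0))


# Toy data: four samples, all deaths observed, high-risk samples die first.
chi2, p_value = log_rank([1, 2, 3, 4], [1, 1, 1, 1], high={0, 1})
```

The hazard ratio itself is not computed here; with fitted coefficients it would follow from the exponential expression given above.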
Example 2
Taking the lncRNA-lncRNA cooperative competition corresponding to the lung squamous cell carcinoma as an example, the method for identifying the lncRNA-lncRNA cooperative competition network in the embodiment is implemented by the following steps:
In step 1 of this example, lncRNA and mRNA expression profile data of Lung Squamous Cell Carcinoma (LSCC) matched samples were collected from the internationally known cancer gene expression profile database TCGA (The Cancer Genome Atlas, https:// cancer gene.). After preprocessing (removing duplicate items and lncRNAs and mRNAs without gene names), expression profile data of 9704 lncRNAs and 18282 mRNAs in 113 lung squamous cell carcinoma matched samples, together with the clinical information of the samples, were finally obtained. In this example, the ncRNA is lncRNA.
The prior lncRNA-mRNA competition network data are the same as in Example 1, and 10099 lncRNA-mRNA competition relationship pairs associated with the lung cancer expression profile data were finally obtained. In addition, 429 lncRNAs associated with lung cancer can be collected from the three databases LncRNADisease v2.0, Lnc2Cancer v2.0 and MNDR v2.0.
The other steps are the same as in Example 1 and are not described again here.
Example 3
Taking lncRNA-lncRNA cooperative competition in ovarian cancer as an example, the method for identifying lncRNA-lncRNA cooperative competition network in this embodiment is implemented by the following steps:
In step 1 of this example, lncRNA and mRNA expression profile data of ovarian cancer (Ovarian Cancer, OvCa) matched samples were collected from the internationally well-known cancer gene expression profile database TCGA (The Cancer Genome Atlas, https:// Cancer. nih. gov. /). After preprocessing (removing duplicates and lncRNAs and mRNAs without gene names), expression profile data for 9704 lncRNAs and 18282 mRNAs in 585 ovarian cancer matched samples, together with sample clinical information, were finally obtained. In this example, the ncRNA is lncRNA,
and the prior lncRNA-mRNA competition network data are the same as in Example 1; 10099 lncRNA-mRNA competition relationship pairs associated with the ovarian cancer expression profile data were finally obtained. In addition, 140 lncRNAs associated with ovarian cancer were collected from three databases: LncRNADisease v2.0, Lnc2Cancer v2.0 and MNDR v2.0.
Other steps are the same as embodiment 1 and are not described herein again.
Example 4
Taking the lncRNA-lncRNA cooperative competition corresponding to prostate cancer as an example, the method for identifying lncRNA-lncRNA cooperative competition network in this embodiment is implemented by the following steps:
In step 1 of this example, lncRNA and mRNA expression profile data of prostate cancer (Prostate Cancer, PrCa) matched samples were collected from the Memorial Sloan-Kettering Cancer Center (MSKCC, https://www.mskcc.org/). After preprocessing (removing duplicates and lncRNAs and mRNAs without gene names), expression profile data for 9704 lncRNAs and 18282 mRNAs in 150 prostate cancer matched samples, together with sample clinical information, were finally obtained. In this example, the ncRNA is lncRNA,
and the prior lncRNA-mRNA competition network data are the same as in Example 1; 10099 lncRNA-mRNA competition relationship pairs associated with the prostate cancer expression profile data were finally obtained. In addition, 141 lncRNAs associated with prostate cancer were collected from three databases: LncRNADisease v2.0, Lnc2Cancer v2.0 and MNDR v2.0.
Other steps are the same as embodiment 1, and are not repeated herein.
Based on the foregoing Examples 1-4, the evaluation of the lncRNA-lncRNA cooperative competition network identification results is shown in Tables 1-6 below. Table 1 gives the topology analysis of the lncRNA-lncRNA cooperative competition networks mined in Examples 1-4; Table 2 gives the pivot lncRNAs mined in Examples 1-4; Table 3 gives the disease-associated lncRNA-lncRNA cooperative competition relationships mined in Examples 1-4; Table 4 gives the lncRNA-lncRNA cooperative competition modules mined in Examples 1-4; Table 5 gives the lncRNA-lncRNA cooperative competition modules significantly enriched for disease in Examples 1-4; Table 6 gives the lncRNA-lncRNA cooperative competition modules serving as biomarkers in Examples 1-4.
TABLE 1 lncRNA-lncRNA Competition network topology analysis mined in examples 1-4
TABLE 2 Pivot lncRNAs mined in examples 1-4
TABLE 3 Disease-associated lncRNA-lncRNA cooperative competition relationships mined in examples 1-4
TABLE 4 lncRNA-lncRNA Competition Module mined in examples 1-4
TABLE 5 lncRNA-lncRNA Competition Module associated with disease enrichment in examples 1-4
TABLE 6 lncRNA-lncRNA Competition Module serving as biomarker in examples 1-4
As shown in Table 1, the lncRNA-lncRNA cooperative competition networks mined in the four data sets of Examples 1-4 (GBM, LSCC, OvCa and PrCa) substantially conform to the scale-free characteristics of real biomolecular networks (the goodness-of-fit statistic R² is greater than 0.69 in all four cases). A portion of the pivot lncRNAs and of the lncRNA-lncRNA cooperative competition relationships are associated with the diseases (GBM, LSCC, OvCa and PrCa), as shown in Tables 2 and 3. Most of the mined lncRNA-lncRNA cooperative competition modules (see Table 4) are significantly enriched for the diseases (see Table 5) and can serve as biomarkers (see Table 6). The method provided by the invention gives substantially consistent results across the four data sets and can robustly identify the lncRNA-lncRNA cooperative competition network.
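The scale-free check reported above can be sketched as a least-squares fit of the degree distribution on a log-log scale. This is a minimal illustration, assuming that the goodness-of-fit statistic R² is the coefficient of determination of that linear fit; the function name `power_law_r_squared` and the toy degree list in the usage note are hypothetical.

```python
import math
from collections import Counter

def power_law_r_squared(degrees):
    """Fit log10 P(k) = c - gamma * log10 k and return (gamma, r_squared).

    degrees: one node degree per node of the cooperative competition network.
    Requires at least two distinct positive degrees with differing frequencies.
    """
    counts = Counter(d for d in degrees if d > 0)
    n_nodes = sum(counts.values())
    xs = [math.log10(k) for k in counts]
    ys = [math.log10(c / n_nodes) for c in counts.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r_squared = (sxy * sxy) / (sxx * syy)
    return -slope, r_squared
```

A network would then be treated as approximately scale-free when the returned R² is high; for example, `power_law_r_squared([1] * 64 + [2] * 16 + [4] * 4 + [8])` describes a degree distribution that follows a power law with exponent 2.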
In conclusion, the ncRNA cooperative competition network identification method provided by the invention can effectively mine the cooperative competition relationships between ncRNAs, and the identified ncRNA cooperative competition network substantially conforms to the scale-free characteristics of a real biomolecular network. Based on the identified ncRNA cooperative competition network, disease-associated pivot ncRNAs, disease-associated ncRNA cooperative competition networks and modules, and disease biomarkers can be further identified. In particular, when applied to gene expression profile data sets of complex diseases, the method provides technical support and a means of understanding for the clinical diagnosis and targeted therapy of complex human diseases such as cancer, and has important biological significance.
Based on the ncRNA cooperative competition network identification method provided by the method embodiments, an embodiment of the invention correspondingly provides an ncRNA cooperative competition network identification device. Fig. 5 is a schematic structural diagram of an ncRNA cooperative competition network identification device according to an embodiment of the present invention. As shown in fig. 5, the device may include: an acquisition module 10, configured to acquire ncRNA and mRNA expression profile data of matched samples of a target disease type; and an identification module 20, configured to determine, according to the ncRNA and mRNA expression profile data and preset ncRNA-mRNA competition relationship data, that an ncRNA-ncRNA pair composed of two ncRNAs in the expression profile data that meet preset conditions is a cooperative competition relationship pair.
Fig. 6 is a schematic structural diagram of an identification module according to an embodiment of the present invention.
As shown in fig. 6, in an alternative embodiment, the identification module 20 includes: an acquisition submodule 21, configured to obtain ncRNA1-ncRNA2 pairs composed of ncRNA1 and ncRNA2 in the ncRNA and mRNA expression profile data; a calculation submodule 22, configured to calculate, according to the preset ncRNA-mRNA competition relationship data, the cooperative competition mRNA statistical significance probability value, the positive correlation significance probability value, and the sensitivity partial correlation coefficient value corresponding to each ncRNA1-ncRNA2 pair; and an identification submodule 23, configured to determine that the ncRNA1-ncRNA2 pair is a cooperative competition relationship pair when the pair simultaneously satisfies the conditions that the cooperative competition mRNA statistical significance probability value is less than a first threshold, the positive correlation significance probability value is less than a second threshold, and the sensitivity partial correlation coefficient value is greater than a third threshold.
In an alternative embodiment, the calculation submodule 22 is specifically configured to measure, using a hypergeometric distribution test algorithm and according to the preset ncRNA-mRNA competition relationship data, the statistical significance probability value of the cooperative competition mRNAs shared by ncRNA1 and ncRNA2 in the ncRNA1-ncRNA2 pair.
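The hypergeometric test on shared mRNAs can be sketched as follows. This is only an illustration, assuming the usual upper-tail form (the probability of observing at least as many shared mRNAs as were found by chance); the function name, the set-based inputs, and the choice of `n_total` as the number of mRNAs in the data set are assumptions made for this example.

```python
from math import comb

def shared_mrna_p_value(mrnas_1, mrnas_2, n_total):
    """Upper-tail hypergeometric p-value for the mRNAs shared by two ncRNAs.

    mrnas_1, mrnas_2 : sets of mRNAs that compete with ncRNA1 and ncRNA2
    n_total          : total number of mRNAs considered (assumed background size)
    Returns (number_of_shared_mrnas, p_value).
    """
    shared = len(mrnas_1 & mrnas_2)
    k1, k2 = len(mrnas_1), len(mrnas_2)
    denominator = comb(n_total, k2)
    # Sum P(X = i) for i from the observed overlap up to the largest possible one.
    numerator = sum(
        comb(k1, i) * comb(n_total - k1, k2 - i)
        for i in range(shared, min(k1, k2) + 1)
    )
    return shared, numerator / denominator
```

A small p-value indicates that the two ncRNAs share more competing mRNAs than expected by chance, which is the first of the three conditions applied to each pair.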
In an alternative embodiment, the calculation submodule 22 is specifically configured to calculate the Pearson correlation coefficient between ncRNA1 and ncRNA2 in the ncRNA1-ncRNA2 pair, and to calculate the positive correlation significance probability value from the Pearson correlation coefficient.
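As an illustration of this step, the sketch below computes the Pearson correlation coefficient and a one-sided p-value for positive correlation. The text does not state how the p-value is derived, so the sketch substitutes the Fisher z-transformation (a normal approximation) in order to stay within the Python standard library; this choice and all names are assumptions.

```python
import math
from statistics import NormalDist

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long expression vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def positive_correlation_p_value(x, y):
    """Return (r, p) where p approximates P(correlation this high | no correlation).

    Uses the Fisher z-transformation, so it needs more than three samples
    and is only an approximation of an exact test.
    """
    n = len(x)
    r = pearson_r(x, y)
    # Clip to keep atanh finite when the correlation is (numerically) perfect.
    r_clipped = max(min(r, 0.999999), -0.999999)
    z = math.atanh(r_clipped) * math.sqrt(n - 3)
    return r, 1.0 - NormalDist().cdf(z)
```

Because the test is one-sided, a strongly negative correlation yields a p-value close to 1 and therefore never satisfies the positive correlation condition.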
In an alternative embodiment, the calculation submodule 22 is specifically configured to calculate the sensitivity partial correlation coefficient value according to the correlation coefficient value between ncRNA1 and ncRNA2 in the ncRNA1-ncRNA2 pair and the correlation coefficient value between ncRNA1 and ncRNA2 under the condition of the corresponding mRNAs.
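One plausible reading of this step is sketched below. It assumes that the sensitivity partial correlation is the plain correlation between the two ncRNAs minus their partial correlation given the shared mRNAs, and it simplifies the conditioning to a single variable (the per-sample mean expression of the shared mRNAs) so that the first-order partial correlation formula can be used. Both the definition and the simplification are assumptions, as are all names.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def sensitivity_partial_correlation(nc1, nc2, shared_mrnas):
    """Correlation of nc1 and nc2 minus their partial correlation given mRNAs.

    nc1, nc2     : expression vectors of the two ncRNAs (one value per sample)
    shared_mrnas : list of expression vectors of the shared mRNAs
    """
    # Simplification: summarize the shared mRNAs by their per-sample mean.
    z = [sum(column) / len(column) for column in zip(*shared_mrnas)]
    r_xy = pearson_r(nc1, nc2)
    r_xz = pearson_r(nc1, z)
    r_yz = pearson_r(nc2, z)
    # First-order partial correlation; undefined if |r_xz| or |r_yz| equals 1.
    partial = (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    return r_xy - partial
```

A large positive value means that much of the correlation between the two ncRNAs disappears once the shared mRNAs are accounted for, which is the behaviour expected of a cooperative competition pair.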
In an alternative embodiment, the identification submodule 23 is specifically configured to determine that the ncRNA1-ncRNA2 pair is a cooperative competition relationship pair if the following are simultaneously satisfied: the cooperative competition mRNA statistical significance probability value is less than 0.05, the positive correlation significance probability value is less than 0.05, and the sensitivity partial correlation coefficient value is greater than 0.1.
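The three-condition decision rule with the thresholds 0.05, 0.05 and 0.1 can be written as a short filter. The function names and the dictionary layout of the candidate statistics are hypothetical; only the thresholds come from the text.

```python
def is_cooperative_competition_pair(p_shared, p_positive, spc,
                                    p_shared_max=0.05,
                                    p_positive_max=0.05,
                                    spc_min=0.1):
    """Apply the three conditions simultaneously to one ncRNA1-ncRNA2 pair."""
    return p_shared < p_shared_max and p_positive < p_positive_max and spc > spc_min

def select_pairs(candidates):
    """Keep the pairs that satisfy all three conditions.

    candidates: dict mapping (ncRNA1, ncRNA2) to a tuple
                (p_shared, p_positive, sensitivity_partial_correlation).
    """
    return [pair for pair, stats in candidates.items()
            if is_cooperative_competition_pair(*stats)]
```

For example, a pair with statistics `(0.01, 0.02, 0.3)` is retained, while a pair with `(0.01, 0.20, 0.3)` is rejected because the positive correlation is not significant.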
Fig. 7 is another schematic structural diagram of a ncRNA cooperative competition network identification apparatus according to an embodiment of the present invention.
As shown in fig. 7, in an alternative embodiment, the device further comprises: a competition data module 30, configured to obtain the ncRNA-mRNA competition relationship data by fusing multiple different databases to acquire prior ncRNA-mRNA competition network data associated with the ncRNA and mRNA expression profile data of the target disease type matched samples, before the identification module 20 determines, according to the ncRNA and mRNA expression profile data and the preset ncRNA-mRNA competition relationship data, that an ncRNA-ncRNA pair composed of two ncRNAs meeting the preset conditions is a cooperative competition relationship pair.
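The database fusion performed by the competition data module can be pictured as a set union followed by a filter on the genes present in the expression profile data. This is a minimal sketch under that assumption; the function name and the representation of each database as a set of (ncRNA, mRNA) tuples are hypothetical.

```python
def fuse_competition_pairs(databases, ncrnas_in_profile, mrnas_in_profile):
    """Merge prior ncRNA-mRNA competition pairs from several databases.

    databases         : iterable of sets of (ncRNA, mRNA) tuples, one per database
    ncrnas_in_profile : set of ncRNA names present in the expression profile data
    mrnas_in_profile  : set of mRNA names present in the expression profile data
    Returns the pairs whose two members both occur in the expression data.
    """
    fused = set()
    for pairs in databases:
        fused.update(pairs)          # the union removes pairs listed more than once
    return {
        (ncrna, mrna)
        for ncrna, mrna in fused
        if ncrna in ncrnas_in_profile and mrna in mrnas_in_profile
    }
```

Restricting the fused pairs to genes measured in the matched samples is what ties the prior competition network to a specific disease data set.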
Fig. 8 is a schematic structural diagram of a ncRNA cooperative competition network recognition apparatus according to an embodiment of the present invention.
As shown in fig. 8, in an alternative embodiment, the device further comprises: a preprocessing module 40, configured to preprocess the ncRNA and mRNA expression profile data, so as to remove duplicate entries and ncRNAs and mRNAs without gene names, before the identification module 20 determines, according to the ncRNA and mRNA expression profile data and the preset ncRNA-mRNA competition relationship data, that an ncRNA-ncRNA pair composed of two ncRNAs meeting the preset conditions is a cooperative competition relationship pair.
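The preprocessing step can be sketched as a single pass over the expression rows. The text does not say how duplicate entries are resolved, so keeping the first occurrence is an assumption of this sketch, as are the function name and the row layout.

```python
def preprocess_expression(rows):
    """Remove rows without a gene name and duplicate gene entries.

    rows: iterable of (gene_name, expression_values) tuples; gene_name may be
          None or blank for unnamed transcripts.
    Returns a dict mapping each retained gene name to its expression values.
    """
    cleaned = {}
    for name, values in rows:
        if name is None or not name.strip():
            continue                  # drop ncRNAs and mRNAs without a gene name
        key = name.strip()
        if key in cleaned:
            continue                  # assumption: keep the first duplicate only
        cleaned[key] = values
    return cleaned
```

Other duplicate policies, such as averaging the repeated rows, would fit the same interface.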
Fig. 9 is a schematic structural diagram of a ncRNA cooperative competition network recognition apparatus according to an embodiment of the present invention.
As shown in fig. 9, in an alternative embodiment, the device further comprises: an evaluation module 50, configured to evaluate, in the following manner, the ncRNA-ncRNA cooperative competition network formed by the ncRNA1-ncRNA2 pairs determined to be cooperative competition relationship pairs: 1) fitting whether the connectivity (node degree) of the ncRNA-ncRNA cooperative competition network obeys a power-law distribution, to determine whether the network is a scale-free network; 2) determining nodes whose connectivity ranks in the top 10% of the ncRNA-ncRNA cooperative competition network as pivot nodes; 3) determining the ncRNA1-ncRNA2 pairs in which both ncRNAs are associated with the target disease type, to obtain the ncRNA-ncRNA cooperative competition relationship pairs corresponding to the target disease type; 4) identifying ncRNA-ncRNA cooperative competition modules from the ncRNA-ncRNA cooperative competition network using a Markov clustering algorithm; 5) according to the ncRNAs known a priori to be associated with the target disease type and a hypergeometric distribution test algorithm, determining each ncRNA-ncRNA cooperative competition module whose significance probability value of functional association with the target disease type is less than 0.05 as an ncRNA-ncRNA cooperative competition module corresponding to the target disease type; 6) for each ncRNA-ncRNA cooperative competition module, calculating a risk score for each target disease type matched sample using a multivariate Cox model; dividing the matched samples into a high-risk sample set and a low-risk sample set according to their risk scores; calculating a hazard ratio (HR) from the high-risk sample set and the low-risk sample set; calculating the significance probability value of the difference in survival time between the high-risk and low-risk sample sets using the log-rank test algorithm, to obtain the log-rank test significance probability value; and determining each ncRNA-ncRNA cooperative competition module with a hazard ratio greater than 1 and a log-rank test significance probability value less than 0.05 as a biomarker of the target disease type.
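Items 2) and 3) of the evaluation can be illustrated with a short sketch that works directly on the list of cooperative competition pairs. Rounding the top 10% up to at least one node and breaking ties by node name are assumptions of this sketch, and the function names are hypothetical.

```python
import math
from collections import Counter

def pivot_nodes(edges, top_fraction=0.10):
    """Return the nodes whose degree ranks in the top fraction of the network.

    edges: iterable of (ncRNA1, ncRNA2) cooperative competition pairs.
    """
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    n_pivots = max(1, math.ceil(len(degree) * top_fraction))
    # Sort by descending degree; ties are broken alphabetically (assumption).
    ranked = sorted(degree, key=lambda node: (-degree[node], node))
    return ranked[:n_pivots]

def disease_associated_pairs(edges, disease_ncrnas):
    """Keep the pairs in which both ncRNAs are associated with the disease."""
    return [(a, b) for a, b in edges
            if a in disease_ncrnas and b in disease_ncrnas]
```

In a star-shaped toy network with one central ncRNA linked to nine others, `pivot_nodes` returns only the central node, since one node is the top 10% of ten.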
In alternative embodiments, the target disease type includes any of the following: glioblastoma multiforme, squamous cell carcinoma of the lung, ovarian cancer, and prostate cancer.
In alternative embodiments, the ncRNA comprises any one of: long non-coding RNA, circular RNA, and pseudogenes.
The above modules may be one or more integrated circuits configured to implement the above methods, for example: one or more Application Specific Integrated Circuits (ASICs), one or more Digital Signal Processors (DSPs), or one or more Field Programmable Gate Arrays (FPGAs), among others. As another example, when a module is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a Central Processing Unit (CPU) or another processor capable of calling program code. As a further example, these modules may be integrated together and implemented in the form of a System-on-a-Chip (SoC).
An embodiment of the invention also provides an ncRNA cooperative competition network identification device, which may be a desktop computer, a notebook computer, a server, a cloud platform, a customized terminal, an intelligent terminal, or the like.
Fig. 10 is a schematic structural diagram of a ncRNA cooperative competition network identification device according to an embodiment of the present invention.
As shown in fig. 10, the ncRNA cooperative competition network identification device may include a processor 100, a storage medium 200 and a bus 300. The storage medium 200 stores machine-readable instructions executable by the processor 100. When the ncRNA cooperative competition network identification device runs, the processor 100 communicates with the storage medium 200 through the bus 300, and the processor 100 executes the machine-readable instructions to perform the ncRNA cooperative competition network identification method of the foregoing method embodiments.
It is noted that a processor may include one or more processing cores (e.g., a single-core processor or a multi-core processor). Merely by way of example, a processor may include a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), an Application Specific Instruction Set Processor (ASIP), a Graphics Processing Unit (GPU), a Physical Processing Unit (PPU), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), a Programmable Logic Device (PLD), a controller, a microcontroller unit, a Reduced Instruction Set Computer (RISC), a microprocessor, or the like, or any combination thereof.
The storage medium may include mass storage, removable storage, volatile read-and-write memory, or Read-Only Memory (ROM), among others, or any combination thereof. By way of example, mass storage may include magnetic disks, optical disks, solid-state drives, and the like; removable storage may include flash drives, floppy disks, optical disks, memory cards, zip disks, tapes, and the like; volatile read-and-write memory may include Random Access Memory (RAM); the RAM may include Dynamic RAM (DRAM), Double Data Rate Synchronous Dynamic RAM (DDR SDRAM), Static RAM (SRAM), Thyristor-Based RAM (T-RAM), Zero-capacitor RAM (Z-RAM), and the like. By way of example, ROM may include Mask ROM (MROM), Programmable ROM (PROM), Erasable Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), Compact Disc ROM (CD-ROM), Digital Versatile Disc ROM (DVD-ROM), and the like.
For ease of illustration, only one processor is depicted in the ncRNA cooperative competition network identification device. However, it should be noted that the ncRNA cooperative competition network identification device in the present invention may also include a plurality of processors, and thus the steps described in the present invention as performed by one processor may also be performed by a plurality of processors jointly or individually. For example, if the processor of the ncRNA cooperative competition network identification device performs steps A and B, it should be understood that steps A and B may also be performed by two different processors together or separately. For example, a first processor performs step A and a second processor performs step B, or the first processor and the second processor perform steps A and B together.
Optionally, the present invention further provides a computer readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the steps of the ncRNA cooperative competition network identification method described in the foregoing method embodiments.
In the several embodiments provided in the present invention, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device) or a processor to execute some of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, those skilled in the art will appreciate that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (10)
1. A method for identifying a ncRNA cooperative competition network is characterized by comprising the following steps:
acquiring ncRNA and mRNA expression profile data of a target disease type matching sample;
and determining that the ncRNA-ncRNA pair formed by two ncRNAs meeting preset conditions in the ncRNA and mRNA expression profile data is a cooperative competition relationship pair according to the ncRNA and mRNA expression profile data and preset ncRNA-mRNA competition relationship data.
2. The method of claim 1, wherein determining that a ncRNA-ncRNA pair consisting of two ncRNAs meeting a preset condition in the ncRNA and mRNA expression profile data is a cooperative competition pair according to the ncRNA and mRNA expression profile data and preset ncRNA-mRNA competition relationship data comprises:
obtaining ncRNA1-ncRNA2 pairs composed of ncRNA1 and ncRNA2 in the ncRNA and mRNA expression profile data;
calculating, according to the preset ncRNA-mRNA competition relationship data, the cooperative competition mRNA statistical significance probability value, the positive correlation significance probability value and the sensitivity partial correlation coefficient value corresponding to the ncRNA1-ncRNA2 pair;
if the ncRNA1-ncRNA2 pair simultaneously satisfies the conditions that the cooperative competition mRNA statistical significance probability value is less than a first threshold, the positive correlation significance probability value is less than a second threshold, and the sensitivity partial correlation coefficient value is greater than a third threshold, determining that the ncRNA1-ncRNA2 pair is a cooperative competition relationship pair.
3. The method of claim 2, wherein calculating the cooperative competition mRNA statistical significance probability value corresponding to the ncRNA1-ncRNA2 pair comprises:
measuring, using a hypergeometric distribution test algorithm and according to the preset ncRNA-mRNA competition relationship data, the statistical significance probability value of the cooperative competition mRNAs shared by ncRNA1 and ncRNA2 in the ncRNA1-ncRNA2 pair.
4. The method of claim 3, wherein calculating the positive correlation significance probability value corresponding to the ncRNA1-ncRNA2 pair comprises:
calculating the Pearson correlation coefficient between ncRNA1 and ncRNA2 in the ncRNA1-ncRNA2 pair;
and calculating the positive correlation significance probability value from the Pearson correlation coefficient.
5. The method of claim 4, wherein calculating the sensitivity partial correlation coefficient value corresponding to the ncRNA1-ncRNA2 pair comprises:
calculating the sensitivity partial correlation coefficient value according to the correlation coefficient value between ncRNA1 and ncRNA2 in the ncRNA1-ncRNA2 pair and the correlation coefficient value between ncRNA1 and ncRNA2 under the condition of the corresponding mRNAs.
6. The method of any one of claims 2-5, wherein determining that the ncRNA1-ncRNA2 pair is a cooperative competition relationship pair, if the ncRNA1-ncRNA2 pair simultaneously satisfies the conditions that the cooperative competition mRNA statistical significance probability value is less than a first threshold, the positive correlation significance probability value is less than a second threshold, and the sensitivity partial correlation coefficient value is greater than a third threshold, comprises:
if the following are simultaneously satisfied: the cooperative competition mRNA statistical significance probability value is less than 0.05, the positive correlation significance probability value is less than 0.05, and the sensitivity partial correlation coefficient value is greater than 0.1, determining that the ncRNA1-ncRNA2 pair is a cooperative competition relationship pair.
7. The method of any one of claims 1 to 5, wherein before determining that ncRNA-ncRNA pairs consisting of two ncRNAs satisfying a predetermined condition in the ncRNA and mRNA expression profile data are in a cooperative competition relationship pair according to the ncRNA and mRNA expression profile data and the predetermined ncRNA-mRNA competition relationship data, the method further comprises:
and acquiring prior ncRNA-mRNA competitive network data associated with the ncRNA and mRNA expression profile data of the target disease type matching sample by fusing a plurality of different databases to obtain the ncRNA-mRNA competitive relationship data.
8. The method according to any one of claims 1-5, further comprising:
evaluating, in the following manner, the ncRNA-ncRNA cooperative competition network formed by the ncRNA1-ncRNA2 pairs determined to be cooperative competition relationship pairs:
1) fitting whether the connectivity of the ncRNA-ncRNA cooperative competition network obeys power law distribution or not to determine whether the ncRNA-ncRNA cooperative competition network belongs to a scale-free network or not;
2) determining nodes whose connectivity ranks in the top 10% of the ncRNA-ncRNA cooperative competition network as pivot nodes;
3) determining the ncRNA1-ncRNA2 pairs in which both ncRNAs are associated with the target disease type, to obtain the ncRNA-ncRNA cooperative competition relationship pairs corresponding to the target disease type;
4) identifying the ncRNA-ncRNA cooperative competition module by utilizing a Markov clustering algorithm based on the ncRNA-ncRNA cooperative competition network;
5) determining a ncRNA-ncRNA cooperative competition module with a significance probability value related to the target disease type functionality smaller than 0.05 as a ncRNA-ncRNA cooperative competition module corresponding to the target disease type according to the ncRNAs related to the target disease type in a priori manner and a hyper-geometric distribution test algorithm;
6) for each ncRNA-ncRNA cooperative competition module, calculating a risk score for each target disease type matched sample using a multivariate Cox model; dividing the target disease type matched samples into a high-risk sample set and a low-risk sample set according to their risk scores; calculating a hazard ratio from the high-risk sample set and the low-risk sample set; calculating the significance probability value of the difference in survival time between the high-risk sample set and the low-risk sample set using a log-rank test algorithm, to obtain a log-rank test significance probability value; and determining each ncRNA-ncRNA cooperative competition module with a hazard ratio greater than 1 and a log-rank test significance probability value less than 0.05 as a biomarker of the target disease type.
9. The method of any one of claims 1 to 5, wherein said ncRNA comprises any one of: long non-coding RNA, circular RNA, and pseudogenes.
10. An ncRNA cooperative competition network recognition device, comprising:
the acquisition module is used for acquiring ncRNA and mRNA expression profile data of the target disease type matching sample;
and the recognition module is used for determining that the ncRNA-ncRNA pair formed by two ncRNAs meeting preset conditions in the ncRNA and mRNA expression profile data is a cooperative competition relationship pair according to the ncRNA and mRNA expression profile data and preset ncRNA-mRNA competition relationship data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911229601.9A CN111028887B (en) | 2019-12-04 | 2019-12-04 | Method and device for identifying ncRNA (non-coding ribonucleic acid) cooperative competition network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911229601.9A CN111028887B (en) | 2019-12-04 | 2019-12-04 | Method and device for identifying ncRNA (non-coding ribonucleic acid) cooperative competition network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111028887A true CN111028887A (en) | 2020-04-17 |
CN111028887B CN111028887B (en) | 2021-04-06 |
Family
ID=70204255
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911229601.9A Expired - Fee Related CN111028887B (en) | 2019-12-04 | 2019-12-04 | Method and device for identifying ncRNA (non-coding ribonucleic acid) cooperative competition network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111028887B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112071369A (en) * | 2020-09-10 | 2020-12-11 | 暨南大学附属第一医院(广州华侨医院) | Module marker mining method and device, computer equipment and storage medium |
CN113539360A (en) * | 2021-07-21 | 2021-10-22 | 西北工业大学 | IncRNA characteristic recognition method based on correlation optimization and immune enrichment |
CN113921085A (en) * | 2021-10-26 | 2022-01-11 | 李永生 | Prediction method for non-coding RNA gene synergistic regulation and control effect |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102799796A (en) * | 2011-05-24 | 2012-11-28 | 上海聚类生物科技有限公司 | Method for association analysis of long noncoding ribonucleic acid (LncRNA) and messenger ribonucleic acid (mRNA) |
CN106202993A (en) * | 2016-07-12 | 2016-12-07 | 王亚帝 | Utilize the method that mrna expression spectrum combines screening cardiotoxicity induced by anthracyclines gene with competitive endogenous RNA express spectra |
WO2019147663A1 (en) * | 2018-01-24 | 2019-08-01 | Freenome Holdings, Inc. | Methods and systems for abnormality detection in the patterns of nucleic acids |
-
2019
- 2019-12-04 CN CN201911229601.9A patent/CN111028887B/en not_active Expired - Fee Related
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102799796A (en) * | 2011-05-24 | 2012-11-28 | 上海聚类生物科技有限公司 | Method for association analysis of long noncoding ribonucleic acid (LncRNA) and messenger ribonucleic acid (mRNA) |
CN106202993A (en) * | 2016-07-12 | 2016-12-07 | 王亚帝 | Utilize the method that mrna expression spectrum combines screening cardiotoxicity induced by anthracyclines gene with competitive endogenous RNA express spectra |
WO2019147663A1 (en) * | 2018-01-24 | 2019-08-01 | Freenome Holdings, Inc. | Methods and systems for abnormality detection in the patterns of nucleic acids |
Non-Patent Citations (2)
Title |
---|
JUNPENG ZHANG 等: "《Inferring and analyzing module-specific lncRNA-mRNA causal regulatory networks in human cancer》", 《BRIEFINGS IN BIOINFORMATICS》 * |
王腾玉 等: "《基于lncRNA-mRNA网络识别高血压相关的lncRNA及其功能》", 《国际遗传学杂志》 * |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112071369A (en) * | 2020-09-10 | 2020-12-11 | 暨南大学附属第一医院(广州华侨医院) | Module marker mining method and device, computer equipment and storage medium |
CN112071369B (en) * | 2020-09-10 | 2021-08-03 | 暨南大学附属第一医院(广州华侨医院) | Module marker mining method and device, computer equipment and storage medium |
CN113539360A (en) * | 2021-07-21 | 2021-10-22 | 西北工业大学 | lncRNA characteristic recognition method based on correlation optimization and immune enrichment |
CN113921085A (en) * | 2021-10-26 | 2022-01-11 | 李永生 | Prediction method for non-coding RNA gene synergistic regulation and control effect |
CN113921085B (en) * | 2021-10-26 | 2023-08-04 | 李永生 | Prediction method for synergistic regulation and control effect of non-coding RNA genes |
Also Published As
Publication number | Publication date |
---|---|
CN111028887B (en) | 2021-04-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111028887B (en) | Method and device for identifying ncRNA (non-coding ribonucleic acid) cooperative competition network | |
Sun et al. | NTSMDA: prediction of miRNA–disease associations by integrating network topological similarity | |
Stanfield et al. | Myometrial transcriptional signatures of human parturition | |
US20200370112A1 (en) | Methods utilizing single cell genetic data for cell population analysis and applications thereof | |
CN104704499A (en) | Systems and methods relating to network-based biomarker signatures | |
CN111081317A (en) | Gene spectrum-based breast cancer lymph node metastasis prediction method and prediction system | |
Larsson et al. | Comparative microarray analysis | |
CN110322926B (en) | Identification method and device of miRNA sponge module | |
Leng et al. | Construction of a long non‑coding RNA-mediated competitive endogenous RNA network reveals global patterns and regulatory markers in gestational diabetes | |
JP2022524484A (en) | How to predict the survival rate of cancer patients | |
CN111383709B (en) | Recognition method and device for ceRNA competition module, electronic equipment and storage medium | |
CN107463797B (en) | Biological information analysis method and device for high-throughput sequencing, equipment and storage medium | |
CN116798632A (en) | Stomach cancer molecular typing and prognosis prediction model construction method based on metabolic genes and application | |
Karman et al. | Lung gene expression and single cell analyses reveal two subsets of idiopathic pulmonary fibrosis (IPF) patients associated with different pathogenic mechanisms | |
CN115148291A (en) | Single-sample ceRNA competition module identification method and device, electronic equipment and storage medium | |
CN110993020B (en) | Identification method of miRNA sponge interaction | |
Angeloni et al. | Functional genomics meta-analysis to identify gene set enrichment networks in cardiac hypertrophy | |
CN108108589B (en) | Method for identifying esophageal squamous carcinoma marker based on network index difference analysis | |
CN113724789A (en) | Single-sample ceRNA network identification method, device, electronic equipment and storage medium | |
KR20240046481A (en) | Systems and methods for associating compounds with physiological conditions using fingerprint analysis | |
CN110462056A (en) | Sample source detection method, device and storage medium based on DNA sequencing data | |
Zhu et al. | A global similarity learning for clustering of single-cell RNA-seq data | |
CN116486908B (en) | Single cell miRNA sponge network reasoning method, device, equipment and storage medium | |
Xie et al. | Clustering single-cell RNA sequencing data via iterative smoothing and self-supervised discriminative embedding | |
Kojima et al. | Identifying regulational alterations in gene regulatory networks by state space representation of vector autoregressive models and variational annealing |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | Granted publication date: 20210406 |