CN112071369B - Module marker mining method and device, computer equipment and storage medium - Google Patents


Info

Publication number
CN112071369B
Authority
CN
China
Prior art keywords
network
mirna
profile data
expression profile
disease
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010944760.3A
Other languages
Chinese (zh)
Other versions
CN112071369A (en
Inventor
王莹莹
李克深
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
First Affiliated Hospital of Jinan University
Original Assignee
First Affiliated Hospital of Jinan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by First Affiliated Hospital of Jinan University filed Critical First Affiliated Hospital of Jinan University
Priority to CN202010944760.3A priority Critical patent/CN112071369B/en
Publication of CN112071369A publication Critical patent/CN112071369A/en
Application granted granted Critical
Publication of CN112071369B publication Critical patent/CN112071369B/en

Classifications

    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B40/00ICT specially adapted for biostatistics; ICT specially adapted for bioinformatics-related machine learning or data mining, e.g. knowledge discovery or pattern finding
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B25/00ICT specially adapted for hybridisation; ICT specially adapted for gene or protein expression
    • G16B25/10Gene or protein expression profiling; Expression-ratio estimation or normalisation
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16BBIOINFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR GENETIC OR PROTEIN-RELATED DATA PROCESSING IN COMPUTATIONAL MOLECULAR BIOLOGY
    • G16B45/00ICT specially adapted for bioinformatics-related data visualisation, e.g. displaying of maps or networks

Landscapes

  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biotechnology (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Genetics & Genomics (AREA)
  • Artificial Intelligence (AREA)
  • Bioethics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Molecular Biology (AREA)
  • Databases & Information Systems (AREA)
  • Epidemiology (AREA)
  • Evolutionary Computation (AREA)
  • Public Health (AREA)
  • Software Systems (AREA)
  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention discloses a module marker mining method and device, computer equipment and a storage medium. The method comprises: constructing an initial miDCN network; constructing an mDCN network and a mi-m-DCN network, and expanding the initial miDCN network to form an expanded miDCN network; performing module mining on the initial miDCN network and the expanded miDCN network to obtain miDCN modules; acquiring a gold standard miRNA set; acquiring the connection relations between the miRNAs of the gold standard miRNA set within a miDCN module; and calculating the similarity between the miRNAs in the gold standard miRNA set and the remaining miRNAs in the miDCN module, and taking the miRNA with the highest similarity in the miDCN module as a new marker. The invention combines data from two levels, miRNA and mRNA, makes full use of the biological relationship between them, and mines miRNA module markers of complex diseases in a way that is more comprehensive, systematic and portable.

Description

Module marker mining method and device, computer equipment and storage medium
Technical Field
The invention relates to a module marker mining method and device, computer equipment and a storage medium, and belongs to the field of module marker mining.
Background
Complex diseases are controlled by multiple genetic factors. Based on relevant theories in biology and medicine, their pathogenesis is thought to be influenced by many genes, each of which has only a weak effect, with no major-effect gene; this phenomenon is called the "micro-effect". To find such micro-effect genes, current module marker mining algorithms for these diseases focus mainly on the gene level, that is, the mRNA level.
However, such analysis does not consider the influence of other factors on changes in gene expression; it is a single-level analysis and is not comprehensive enough. Expression changes at this level are controlled to some extent by microRNA (usually abbreviated as miRNA), a small non-coding RNA that regulates the expression of its target genes after transcription and is negatively correlated with the expression of those target genes, so the micro-effect of genes is also affected by it.
Disclosure of Invention
In view of the above, the invention provides a module marker mining method, a module marker mining device, a computer device and a storage medium, which combine data of two layers of miRNA and mRNA, fully utilize biological relationship between the miRNA and the mRNA, mine miRNA module markers of complex diseases, and have the characteristics of more comprehensiveness, systematicness and portability.
A first object of the present invention is to provide a module marker mining method.
A second object of the present invention is to provide a module marker mining device.
A third object of the present invention is to provide a computer device.
It is a fourth object of the present invention to provide a storage medium.
The first purpose of the invention can be achieved by adopting the following technical scheme:
a module marker mining method, the module markers being miRNA module markers of a complex disease, the method comprising:
constructing an initial miDCN network based on miRNA expression profile data;
constructing an mDCN network and a mi-m-DCN network based on the miRNA expression profile data and mRNA expression profile data, and expanding the initial miDCN network to form an expanded miDCN network;
performing module mining on the initial miDCN network and the expanded miDCN network to obtain miDCN modules;
acquiring a gold standard miRNA set, wherein the gold standard miRNA set comprises the miRNAs related to the disease under study in a public database;
acquiring the connection relations between the miRNAs of the gold standard miRNA set within a miDCN module;
and calculating the similarity between the miRNAs in the gold standard miRNA set and the remaining miRNAs in the miDCN module, and taking the miRNA with the highest similarity in the miDCN module as a new marker.
Further, constructing the initial miDCN network based on the miRNA expression profile data specifically comprises:
generating a matrix with x rows and z columns from the miRNA expression profile data, wherein each row represents one miRNA and each column represents one sample; let E denote the miRNA expression profile data, S the disease group expression profile data consisting of the m samples of the disease under study, and C the control group expression profile data consisting of the z-m samples of the control group;
denoting any pair of miRNAs by i-j, calculating the correlation coefficient r of i-j and the p value representing its statistical significance in the disease group expression profile data S and in the control group expression profile data C, wherein the correlation coefficient and p value calculated in S are denoted $r_{s,i-j}$ and $p_{s,i-j}$, and the correlation coefficient and p value calculated in C are denoted $r_{c,i-j}$ and $p_{c,i-j}$;
calculating the $DC_{i-j}$ score if at least one of the p value of i-j in the disease group expression profile data S and the p value of i-j in the control group expression profile data C is less than or equal to a set threshold;
constructing all miRNA pairs having DC values into the initial miDCN network, wherein the network nodes are miRNAs and the edge weights are the DC values.
Further, calculating the $DC_{i-j}$ score if at least one of the p value of i-j in the disease group expression profile data S and the p value of i-j in the control group expression profile data C is less than or equal to the set threshold specifically comprises:
if the p value of i-j in S and the p value of i-j in C are both less than or equal to the set threshold, calculating the $DC_{i-j}$ score according to the following formula:
$DC_{i-j} = |r_{s,i-j} \cdot (1 - p_{s,i-j}) - r_{c,i-j} \cdot (1 - p_{c,i-j})|$
if the p value of i-j in S is less than or equal to the set threshold and the p value of i-j in C is greater than the set threshold, calculating the $DC_{i-j}$ score according to the following formula:
$DC_{i-j} = |r_{s,i-j} \cdot (1 - p_{s,i-j}) - r_{c,i-j} \cdot p_{c,i-j}|$
if the p value of i-j in C is less than or equal to the set threshold and the p value of i-j in S is greater than the set threshold, calculating the $DC_{i-j}$ score according to the following formula:
$DC_{i-j} = |r_{s,i-j} \cdot p_{s,i-j} - r_{c,i-j} \cdot (1 - p_{c,i-j})|$.
Further, constructing all miRNA pairs having DC values into the initial miDCN network specifically comprises:
constructing a disease TBD-miDCN network, wherein the disease TBD-miDCN network comprises all miRNA relationship pairs for which both $p_{s,i-j}$ and $p_{c,i-j}$ are statistically significant;
constructing a disease generation-miDCN network, wherein the disease generation-miDCN network comprises all miRNA relationship pairs for which $p_{s,i-j}$ is statistically significant;
constructing a disease deletion-miDCN network, wherein the disease deletion-miDCN network comprises all miRNA relationship pairs for which $p_{c,i-j}$ is statistically significant.
Further, constructing the mDCN network and the mi-m-DCN network based on the miRNA expression profile data and the mRNA expression profile data, and expanding the initial miDCN network to form the expanded miDCN network, specifically comprises:
generating a matrix with x rows and z columns from the mRNA expression profile data, wherein each row represents one mRNA and each column represents one sample; let E1 denote the mRNA expression profile data, S1 the disease group expression profile data consisting of the m samples of the disease under study, and C1 the control group expression profile data consisting of the z-m samples of the control group;
denoting any pair of mRNAs by i1-j1, calculating the correlation coefficient r of i1-j1 and the p value representing its statistical significance in the disease group expression profile data S1 and in the control group expression profile data C1, wherein the correlation coefficient and p value calculated in S1 are denoted $r_{s1,i1-j1}$ and $p_{s1,i1-j1}$, and the correlation coefficient and p value calculated in C1 are denoted $r_{c1,i1-j1}$ and $p_{c1,i1-j1}$;
calculating the $DC_{i1-j1}$ score if at least one of the p value of i1-j1 in S1 and the p value of i1-j1 in C1 is less than or equal to a set threshold;
constructing all mRNA pairs having DC values into the mDCN network, wherein the network nodes are mRNAs and the edge weights are the DC values;
generating a matrix with x rows and z columns from the miRNA expression profile data and the mRNA expression profile data, wherein each row represents one miRNA-mRNA pair and each column represents one sample; let E2 denote the miRNA-mRNA expression profile data, S2 the disease group expression profile data consisting of the m samples of the disease under study, and C2 the control group expression profile data consisting of the z-m samples of the control group;
denoting any miRNA-mRNA pair by i2-j2, calculating the correlation coefficient r of i2-j2 and the p value representing its statistical significance in the disease group expression profile data S2 and in the control group expression profile data C2, wherein the correlation coefficient and p value calculated in S2 are denoted $r_{s2,i2-j2}$ and $p_{s2,i2-j2}$, and the correlation coefficient and p value calculated in C2 are denoted $r_{c2,i2-j2}$ and $p_{c2,i2-j2}$;
calculating the $DC_{i2-j2}$ score if at least one of the p value of i2-j2 in S2 and the p value of i2-j2 in C2 is less than or equal to a set threshold and the correlation coefficient r < 0;
constructing all miRNA-mRNA pairs having DC values into a mi-m-DCN-e network; selecting the results of a plurality of miRNA target gene prediction algorithms and calculating a TNet score for each miRNA-mRNA pair, namely the number of target gene prediction algorithms that predict the pair as a target relationship, and constructing a mi-m-DCN-t network, wherein the network nodes are miRNA-mRNA pairs and the edge weights are the TNet scores;
taking the initial miDCN network, the mDCN network and the mi-m-DCN-e network or mi-m-DCN-t network as input, predicting new miRNA-miRNA relations with a computer algorithm that uses the similarity relations between nodes as its calculation basis, and adding them to the miDCN network to form the expanded miDCN network.
Further, constructing all mRNA pairs having DC values into the mDCN network specifically comprises:
constructing a disease TBD-mDCN network, wherein the disease TBD-mDCN network comprises all mRNA relationship pairs for which both $p_{s1,i1-j1}$ and $p_{c1,i1-j1}$ are statistically significant;
constructing a disease generation-mDCN network, wherein the disease generation-mDCN network comprises all mRNA relationship pairs for which $p_{s1,i1-j1}$ is statistically significant;
constructing a disease deletion-mDCN network, wherein the disease deletion-mDCN network comprises all mRNA relationship pairs for which $p_{c1,i1-j1}$ is statistically significant.
Further, constructing all miRNA-mRNA pairs having DC values into the mi-m-DCN-e network specifically comprises:
constructing a disease TBD-mi-m-DCN-e network, wherein the disease TBD-mi-m-DCN-e network comprises all miRNA-mRNA pairs for which both $p_{s2,i2-j2}$ and $p_{c2,i2-j2}$ are statistically significant;
constructing a disease generation-mi-m-DCN-e network, wherein the disease generation-mi-m-DCN-e network comprises all miRNA-mRNA pairs for which $p_{s2,i2-j2}$ is statistically significant;
constructing a disease deletion-mi-m-DCN-e network, wherein the disease deletion-mi-m-DCN-e network comprises all miRNA-mRNA pairs for which $p_{c2,i2-j2}$ is statistically significant.
Further, performing module mining on the initial miDCN network and the expanded miDCN network to obtain miDCN modules specifically comprises:
performing module mining on the initial miDCN network and the expanded miDCN network using a module mining algorithm;
discarding every module in which the percentage of nodes belonging to the initial miDCN network is greater than a set percentage, as well as every module with only one edge, thereby screening out the miDCN modules of the expanded miDCN network;
and searching a network literature database using the miRNAs contained in the screened miDCN modules and the disease under study as keywords, and classifying the miRNAs in the miDCN modules by function.
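The two screening rules above can be sketched as follows. This is only a minimal illustration, assuming each mined module is available as a list of edges (miRNA pairs). The function name `screen_modules`, the parameter `max_initial_fraction` and its default of 0.5 are hypothetical, since the text speaks only of a "set percentage", and the miRNA names are toy values.

```python
def screen_modules(modules, initial_nodes, max_initial_fraction=0.5):
    """Keep the modules of the expanded network that pass both screening rules.

    modules: list of modules, each a list of (miRNA_a, miRNA_b) edges.
    initial_nodes: set of miRNAs that are nodes of the initial miDCN network.
    max_initial_fraction: hypothetical stand-in for the "set percentage".
    """
    kept = []
    for edges in modules:
        if len(edges) <= 1:
            continue  # discard modules with only one edge (or none)
        nodes = {node for edge in edges for node in edge}
        fraction = len(nodes & initial_nodes) / len(nodes)
        if fraction > max_initial_fraction:
            continue  # too many nodes already belong to the initial network
        kept.append(edges)
    return kept


# Toy modules with hypothetical miRNA names.
modules = [
    [("miR-a", "miR-b")],                                           # one edge
    [("miR-a", "miR-b"), ("miR-b", "miR-c")],                       # 2 of 3 nodes initial
    [("miR-a", "miR-x"), ("miR-x", "miR-y"), ("miR-y", "miR-z")],   # 1 of 4 nodes initial
]
screened = screen_modules(modules, initial_nodes={"miR-a", "miR-b"})
```

With these toy inputs only the third module survives: the first has a single edge and the second draws two thirds of its nodes from the initial network.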
The second purpose of the invention can be achieved by adopting the following technical scheme:
a module marker mining device, the module markers being miRNA module markers of a complex disease, the device comprising:
a constructing unit, used for constructing an initial miDCN network based on miRNA expression profile data;
an extension unit, used for constructing an mDCN network and a mi-m-DCN network based on the miRNA expression profile data and mRNA expression profile data, and extending the initial miDCN network to form an extended miDCN network;
a mining unit, used for performing module mining on the initial miDCN network and the extended miDCN network to obtain miDCN modules;
a first acquisition unit, used for acquiring a gold standard miRNA set, wherein the gold standard miRNA set comprises the miRNAs related to the disease under study in a public database;
a second acquisition unit, used for acquiring the connection relations between the miRNAs of the gold standard miRNA set within a miDCN module;
and a calculating unit, used for calculating the similarity between the miRNAs in the gold standard miRNA set and the remaining miRNAs in the miDCN module, and taking the miRNA with the highest similarity in the miDCN module as a new marker.
The third purpose of the invention can be achieved by adopting the following technical scheme:
a computer device comprises a processor and a memory for storing a program executable by the processor, wherein the processor executes the program stored in the memory to realize the module marker mining method.
The fourth purpose of the invention can be achieved by adopting the following technical scheme:
a storage medium stores a program which, when executed by a processor, implements the modular marker mining method described above.
Compared with the prior art, the invention has the following beneficial effects:
the invention uses miRNA expression profile data and mRNA expression profile data to construct an initial miDCN network, an mDCN network and a mi-m-DCN network, expands the initial miDCN network to form an expanded miDCN network, and performs module mining on the initial and expanded miDCN networks to obtain miDCN modules. Because data from two biological levels are combined and analysed with several classical methods, the results obtained are more accurate; because the intrinsic biological connections between the data are fully considered, the results have more practical significance. The method can be used to mine miRNA module markers related to complex polygenic diseases such as schizophrenia.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the structures shown in the drawings without creative efforts.
Fig. 1 is a flowchart of a module marker mining method according to embodiment 1 of the present invention.
Fig. 2 is a schematic diagram of the module marker mining method according to embodiment 1 of the present invention.
Fig. 3 is a flowchart of constructing the initial miDCN network according to embodiment 1 of the present invention.
Fig. 4 is a flowchart of expanding the initial miDCN network according to embodiment 1 of the present invention.
Fig. 5 is a block diagram showing the structure of the module marker mining device according to embodiment 2 of the present invention.
Fig. 6 is a block diagram of a computer device according to embodiment 3 of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be described in detail and completely with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some embodiments of the present invention, but not all embodiments, and all other embodiments obtained by a person of ordinary skill in the art without creative efforts based on the embodiments of the present invention belong to the protection scope of the present invention.
Example 1:
As shown in Figs. 1 and 2, this embodiment provides a module marker mining method, in which the module markers are miRNA module markers of schizophrenia. The method includes the following steps:
S101, constructing an initial miDCN network based on miRNA expression profile data.
The miRNA expression profile data may be derived from deep sequencing data or chip data; in this embodiment, the miRNA expression profile data are obtained from a public database. The DCN network refers to a Data Communication Network. The miDCN network is a data communication network based on miRNA; correspondingly, the mDCN network described below is a data communication network based on mRNA, and the mi-m-DCN network is a data communication network based on miRNA and mRNA.
As shown in fig. 3, the step S101 specifically includes:
(1) Data splitting and combination calculation: a matrix of x rows and z columns is generated from the miRNA expression profile data, wherein each row represents one miRNA and each column represents one sample. Let E denote the miRNA expression profile data, S the disease group expression profile data consisting of the m samples of the disease under study, and C the control group expression profile data consisting of the z-m samples of the control group. The number of all possible pairwise combinations of the x miRNAs is
$\binom{x}{2} = \frac{x(x-1)}{2}$
For any pair of combinations k, the following steps (2) to (3) are performed, and E, S and C are expressed as follows:
Figure GDA0003038905830000062
In this example, after routine bioinformatics preprocessing of the expression profile data E, x = 230, z = 30 and m = 15; that is, E contains 230 rows (230 miRNAs) and 30 columns (30 samples), where group S is the disease group expression profile data consisting of 15 samples from schizophrenia patients and group C is the control group expression profile data consisting of 15 samples from healthy people. The number of all possible pairwise combinations of the 230 miRNAs is therefore
$\binom{230}{2} = \frac{230 \times 229}{2} = 26335$.
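The data splitting and pair enumeration of step (1) can be sketched as follows. This is a minimal sketch under stated assumptions: `split_groups` assumes that the first m columns of the matrix are the disease samples and the remaining columns are the control samples, which the text does not state explicitly, and the function names and miRNA names are hypothetical.

```python
from itertools import combinations
from math import comb


def split_groups(matrix, m):
    """Split an x-by-z expression matrix into disease and control parts.

    Assumption: the first m columns are disease samples, the rest controls.
    """
    disease = [row[:m] for row in matrix]
    control = [row[m:] for row in matrix]
    return disease, control


def mirna_pairs(names):
    """Return every unordered pair (i, j) of distinct miRNA names."""
    return list(combinations(names, 2))


# Toy example: 4 hypothetical miRNAs give 4 * 3 / 2 = 6 pairs.
pairs = mirna_pairs(["miR-1", "miR-2", "miR-3", "miR-4"])

# Pair count for the x = 230 miRNAs of this embodiment.
n_pairs_230 = comb(230, 2)

# Toy 1-by-4 matrix split with m = 2.
disease, control = split_groups([[1.0, 2.0, 3.0, 4.0]], m=2)
```

Each pair produced this way is then processed independently in steps (2) and (3).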
(2) Correlation calculation: denoting any pair of miRNAs by i-j, the correlation coefficient r of i-j and the p value representing its statistical significance are calculated in the disease group expression profile data S and in the control group expression profile data C, using Spearman rank correlation or a similar correlation coefficient method.
In this embodiment, Spearman rank correlation is used to calculate the correlation coefficient r and the p value of i-j in the disease group expression profile data S and in the control group expression profile data C. The correlation coefficient and p value calculated in S are denoted $r_{s,i-j}$ and $p_{s,i-j}$, and the correlation coefficient and p value calculated in C are denoted $r_{c,i-j}$ and $p_{c,i-j}$.
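As an illustration of the correlation step, the sketch below computes the Spearman rank correlation coefficient of two expression vectors in pure Python, as the Pearson correlation of their ranks with ties given their average rank. It computes r only; the p value is not derived here, and in practice both values would normally be taken from a statistics library (for example, `scipy.stats.spearmanr` returns the coefficient together with a p value). The function names and input numbers are illustrative assumptions.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(a, b):
    """Pearson correlation; undefined (division by zero) for constant input."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5


def spearman_r(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(a), average_ranks(b))


# Toy expression values of two miRNAs across five samples.
r_up = spearman_r([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0])
r_down = spearman_r([1.0, 2.0, 3.0, 4.0, 5.0], [5.0, 4.0, 3.0, 2.0, 1.0])
```

For each miRNA pair the calculation is run twice, once on the columns of S and once on the columns of C.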
(3) DC value calculation: the $DC_{i-j}$ score is calculated if at least one of the p value of i-j in the disease group expression profile data S and the p value of i-j in the control group expression profile data C is less than or equal to a set threshold.
For the p values obtained in step (2), the threshold is set to 0.05. The miRNA pairs with at least one p value less than or equal to the set threshold are selected and their $DC_{i-j}$ scores are calculated, specifically as follows:
if the p value of i-j in the disease group expression profile data S and the p value of i-j in the control group expression profile data C are both less than or equal to the set threshold, the $DC_{i-j}$ score is calculated according to the following formula:
$DC_{i-j} = |r_{s,i-j} \cdot (1 - p_{s,i-j}) - r_{c,i-j} \cdot (1 - p_{c,i-j})|$
if the p value of i-j in S is less than or equal to the set threshold and the p value of i-j in C is greater than the set threshold, the $DC_{i-j}$ score is calculated according to the following formula:
$DC_{i-j} = |r_{s,i-j} \cdot (1 - p_{s,i-j}) - r_{c,i-j} \cdot p_{c,i-j}|$
if the p value of i-j in C is less than or equal to the set threshold and the p value of i-j in S is greater than the set threshold, the $DC_{i-j}$ score is calculated according to the following formula:
$DC_{i-j} = |r_{s,i-j} \cdot p_{s,i-j} - r_{c,i-j} \cdot (1 - p_{c,i-j})|$
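As the three formulas show, if the correlation is significant in both S and C, each r value is multiplied by its own 1-p and the absolute value of the difference is taken; if the correlation is significant in only one of S and C, the r value of the non-significant group is multiplied directly by its p, and the absolute value of the difference is taken. Writing $\theta$ for the set threshold, this is summarized as:
$$
DC_{i-j} =
\begin{cases}
|r_{s,i-j} \cdot (1 - p_{s,i-j}) - r_{c,i-j} \cdot (1 - p_{c,i-j})|, & p_{s,i-j} \le \theta,\ p_{c,i-j} \le \theta \\
|r_{s,i-j} \cdot (1 - p_{s,i-j}) - r_{c,i-j} \cdot p_{c,i-j}|, & p_{s,i-j} \le \theta,\ p_{c,i-j} > \theta \\
|r_{s,i-j} \cdot p_{s,i-j} - r_{c,i-j} \cdot (1 - p_{c,i-j})|, & p_{s,i-j} > \theta,\ p_{c,i-j} \le \theta
\end{cases}
$$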
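The three cases of the DC score can be written as one small function. The sketch below is an illustration only: the name `dc_score` and the convention of returning `None` for a pair that is significant in neither group are assumptions, and the input numbers are made up to exercise each case.

```python
def dc_score(r_s, p_s, r_c, p_c, threshold=0.05):
    """DC score of one pair, or None if neither p value is significant."""
    sig_s = p_s <= threshold
    sig_c = p_c <= threshold
    if not (sig_s or sig_c):
        return None  # no DC value, so the pair contributes no edge
    # Significant group: r * (1 - p). Non-significant group: r * p.
    term_s = r_s * (1 - p_s) if sig_s else r_s * p_s
    term_c = r_c * (1 - p_c) if sig_c else r_c * p_c
    return abs(term_s - term_c)


both = dc_score(0.8, 0.01, -0.5, 0.02)       # significant in S and C
only_s = dc_score(0.8, 0.01, 0.1, 0.60)      # significant in S only
only_c = dc_score(0.2, 0.50, -0.7, 0.04)     # significant in C only
neither = dc_score(0.1, 0.70, 0.2, 0.90)     # significant in neither
```

A pair whose correlation flips sign between the disease and control groups, as in the first call, receives a large score, which is what makes it a strong edge in the network.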
(4) Constructing the initial miDCN network: all miRNA pairs having DC values are constructed into the initial miDCN network, wherein the network nodes are miRNAs and the edge weights are the DC values.
According to how the DC value was calculated in step (3), three miDCN networks are constructed:
A. Constructing a disease TBD-miDCN network (TBD: to be determined, i.e. the to-be-assigned group), which comprises 3375 miRNA relationship pairs for which both $p_{s,i-j}$ and $p_{c,i-j}$ are statistically significant, denoted raw-TBD-miDCN.
B. Constructing a disease generation-miDCN network, which comprises 5274 miRNA relationship pairs for which $p_{s,i-j}$ is statistically significant.
C. Constructing a disease deletion-miDCN network, which comprises 3293 miRNA relationship pairs for which $p_{c,i-j}$ is statistically significant.
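The assignment of scored pairs to the three networks can be sketched as below. The sketch assumes the three networks are mutually exclusive, following the three DC formula cases (significant in both groups, in S only, in C only); the text does not state this exclusivity outright. The function name, the dictionary keys "TBD", "gain" and "lost", and the toy statistics are hypothetical.

```python
def build_midcn_networks(pair_stats, threshold=0.05):
    """Sort pairs into three weighted edge lists keyed by network name.

    pair_stats maps (miRNA_i, miRNA_j) -> (r_s, p_s, r_c, p_c).
    Returns {"TBD": {...}, "gain": {...}, "lost": {...}}, each mapping
    a pair to its DC value (the edge weight).
    """
    networks = {"TBD": {}, "gain": {}, "lost": {}}
    for pair, (r_s, p_s, r_c, p_c) in pair_stats.items():
        sig_s, sig_c = p_s <= threshold, p_c <= threshold
        if sig_s and sig_c:
            key = "TBD"    # significant in disease and control
        elif sig_s:
            key = "gain"   # assumed: relation present only in disease
        elif sig_c:
            key = "lost"   # assumed: relation present only in control
        else:
            continue       # no DC value, no edge
        term_s = r_s * (1 - p_s) if sig_s else r_s * p_s
        term_c = r_c * (1 - p_c) if sig_c else r_c * p_c
        networks[key][pair] = abs(term_s - term_c)
    return networks


# Toy statistics for four hypothetical miRNA pairs.
pair_stats = {
    ("miR-1", "miR-2"): (0.8, 0.01, -0.5, 0.02),
    ("miR-1", "miR-3"): (0.8, 0.01, 0.1, 0.60),
    ("miR-2", "miR-3"): (0.2, 0.50, -0.7, 0.04),
    ("miR-3", "miR-4"): (0.1, 0.70, 0.2, 0.90),
}
networks = build_midcn_networks(pair_stats)
```

Here each toy pair lands in a different network and the fourth pair is dropped because neither p value reaches the threshold.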
S102, constructing an mDCN network and an mi-m-DCN network based on miRNA expression profile data and mRNA expression profile data, and expanding the initial miDCN network to form an expanded miDCN network.
As shown in fig. 4, the step S102 specifically includes:
(1) constructing an mDCN network, using the method of step S101, as follows:
A. Data splitting and combination calculation: a matrix with x rows and z columns is generated from the mRNA expression profile data, wherein each row represents one mRNA and each column represents one sample. Let E1 denote the mRNA expression profile data, S1 the disease group expression profile data consisting of the m samples of the disease under study, and C1 the control group expression profile data consisting of the z-m samples of the control group.
B. Correlation calculation: denoting any pair of mRNAs by i1-j1, the correlation coefficient r of i1-j1 and the p value representing its statistical significance are calculated in the disease group expression profile data S1 and in the control group expression profile data C1. The correlation coefficient and p value calculated in S1 are denoted $r_{s1,i1-j1}$ and $p_{s1,i1-j1}$, and the correlation coefficient and p value calculated in C1 are denoted $r_{c1,i1-j1}$ and $p_{c1,i1-j1}$.
C. DC value calculation: the $DC_{i1-j1}$ score is calculated if at least one of the p value of i1-j1 in S1 and the p value of i1-j1 in C1 is less than or equal to a set threshold.
D. Constructing an mDCN network: constructing all mRNA pairs with DC values into an mDCN network; where the network node is the mRNA and the weight of the edge is the DC value.
In this example, the mRNA expression profile data E1 were obtained from a public database. After routine bioinformatics preprocessing, x = 1311, z = 30 and m = 18; that is, E1 contains 1311 rows (1311 mRNAs) and 30 columns (30 samples), where group S1 is the disease group expression profile data consisting of 18 samples from schizophrenia patients and group C1 is the control group expression profile data consisting of 12 samples from healthy people. The number of all possible pairwise combinations of the 1311 mRNAs is
$\binom{1311}{2} = \frac{1311 \times 1310}{2} = 858705$.
Three mDCN networks were constructed: the disease TBD-mDCN network has 120097 edges (denoted raw-TBD-mDCN), the disease deletion-mDCN network has 139231 edges (denoted raw-lost-mDCN), and the disease generation-mDCN network has 235870 edges (denoted raw-gain-mDCN).
(2) Constructing the mi-m-DCN network by using the method of the step S101 as follows:
A. data splitting and combining calculation: generating a matrix with x rows and z columns according to the miRNA expression profile data and the mRNA expression profile data; wherein each row represents a miRNA-mRNA pair, each column represents a sample, let E2 represent miRNA-mRNA expression profile data, S2 represent disease group expression profile data consisting of m samples of the disease under study, and C2 represent control group expression profile data consisting of z-m +1 samples of the control group.
B. And (3) correlation calculation: any one miRNA-mRNA pair is represented by i2-j2, a correlation coefficient r of i2-j2 in disease group expression profile data S2 and control group expression profile data C2 and a p value representing statistical significance are calculated, wherein the correlation coefficient r and the p value calculated in the disease group expression profile data S2 are respectively marked as rs2,i2-j2And ps2,i2-j2The correlation coefficient r and the p value calculated in the control group expression profile data C2 were respectively designated as rc2,i2-j2And pc2,i2-j2
C. And (3) calculating a DC value: if at least one of the p-value of i2-j2 in the disease group expression profile data S2 and the p-value of i2-j2 in the control group expression profile data C2 is less than or equal to a set threshold value, and the correlation coefficient r<0, then DC is performedi2-j2And (4) calculating the score.
D. Constructing the mi-m-DCN networks: all miRNA-mRNA pairs having DC values are constructed into a mi-m-DCN-e network. In addition, the results of a plurality of miRNA target gene prediction algorithms are selected, and a TNet score is calculated for each miRNA-mRNA pair, namely the number of target gene prediction algorithms that predict the pair to be a targeting relationship; these scores are used to construct a mi-m-DCN-t network, in which the network nodes are miRNA-mRNA pairs and the edge weights are TNet scores.
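The TNet score described in step D can be illustrated with a minimal sketch. It assumes that each prediction algorithm's output is available as a list of (miRNA, mRNA) pairs; the function name `tnet_scores`, the gene names and the example predictions are hypothetical and are not taken from the embodiment.

```python
from collections import Counter


def tnet_scores(predictions):
    """Count, for each (miRNA, mRNA) pair, how many algorithms predict it."""
    counts = Counter()
    for pairs in predictions.values():
        # A set guarantees that one algorithm contributes at most one vote per pair.
        counts.update(set(pairs))
    return counts


# Hypothetical outputs of three target gene prediction algorithms.
predictions = {
    "TargetScan": [("miR-184", "GENE_A"), ("miR-346", "GENE_B")],
    "PITA": [("miR-184", "GENE_A")],
    "RNA22": [("miR-184", "GENE_A"), ("miR-184", "GENE_C")],
}

scores = tnet_scores(predictions)
# Pairs with TNet >= 1 become weighted edges of the mi-m-DCN-t network.
edges = {pair: score for pair, score in scores.items() if score >= 1}
```

Here the pair (miR-184, GENE_A) is predicted by all three hypothetical algorithms and therefore receives the largest edge weight.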
In this example, based on the miRNA and mRNA expression profile data of step S101 and step S102 (1), the number of all possible combinations between the 1311 mRNAs and the 230 miRNAs is 1311 × 230 = 301530. All miRNA-mRNA pairs with r < 0 and p ≤ 0.05 were selected to construct two mi-m-DCN-e networks: the disease deletion-mi-m-DCN-e network has 32369 edges (denoted raw-lost-mi-m-DCN-e) and the disease generation-mi-m-DCN-e network has 13210 edges (denoted raw-gain-mi-m-DCN-e). No miRNA-mRNA pair met the construction condition of the disease TBD-mi-m-DCN-e network, so that network was not constructed. In addition, the results of 10 miRNA target gene prediction algorithms (DIANA-microT, mirSVR, PicTar5, RNA22, RNAhybrid, TargetScan, PITA, MirTarget2, TargetMiner, MiRanda) were selected and a TNet score was calculated for each miRNA-mRNA combination; a total of 82071 miRNA-mRNA combinations had a TNet score greater than or equal to 1 and were used to construct the mi-m-DCN-t network (denoted raw-mi-m-DCN-t).
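Steps B and C can be sketched as follows, using the three-case DC formulas stated in claim 3. The correlation coefficients and p values are taken as inputs, since the correlation method itself is not fixed here. Two points are assumptions of this sketch rather than statements of the embodiment: the labels "TBD", "gain" and "lost" are treated as mutually exclusive, and the condition r < 0 is applied to every group in which the correlation is significant. The function names are hypothetical.

```python
def dc_score(r_s, p_s, r_c, p_c, threshold=0.05):
    """Return (network label, DC score), or None if neither p value is significant."""
    sig_s = p_s <= threshold  # significant in the disease group
    sig_c = p_c <= threshold  # significant in the control group
    if sig_s and sig_c:
        return "TBD", abs(r_s * (1 - p_s) - r_c * (1 - p_c))
    if sig_s:
        return "gain", abs(r_s * (1 - p_s) - r_c * p_c)
    if sig_c:
        return "lost", abs(r_s * p_s - r_c * (1 - p_c))
    return None


def mirna_mrna_dc(r_s, p_s, r_c, p_c, threshold=0.05):
    """DC score for a miRNA-mRNA pair, keeping only negative significant correlations."""
    # Assumption: r must be negative in each group where the correlation is significant.
    if p_s <= threshold and r_s >= 0:
        return None
    if p_c <= threshold and r_c >= 0:
        return None
    return dc_score(r_s, p_s, r_c, p_c, threshold)


# Hypothetical pair: significant negative correlation in the disease group only.
gain_example = mirna_mrna_dc(-0.8, 0.01, -0.1, 0.6)
# Hypothetical pair: significant negative correlation in the control group only.
lost_example = mirna_mrna_dc(0.2, 0.5, -0.9, 0.001)
```

Under these assumptions the first pair would be an edge of the disease generation (gain) network and the second an edge of the disease deletion (lost) network.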
(3) Expanding the miDCN network: the initial miDCN network, the mDCN network and the mi-m-DCN-e network or mi-m-DCN-t network are used as input; a computer algorithm based on the similarity relations between nodes predicts new miRNA-miRNA relations, which are added to the miDCN network to form an expanded miDCN network.
The computer algorithm is a random-walk-based algorithm published in the journal Methods in 2017 (see Peng W, Lan W, Zhong J, et al. A novel method of predicting microRNA-disease associations based on microRNA, disease, gene and environment factor networks [J]. Methods, 2017, 124: 69-77). New miRNA-miRNA relations are predicted and added to the two constructed miDCN networks respectively, forming four new extended miDCN networks: extended-lost-miDCN-e, extended-gain-miDCN-e, extended-lost-miDCN-t and extended-gain-miDCN-t.
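To illustrate how a random walk can propose new miRNA-miRNA relations, the following is a minimal sketch of a generic random walk with restart on a small heterogeneous graph. It is not the algorithm of the cited paper; the graph, the edge weights, the restart probability and all names are hypothetical.

```python
def add_edge(adjacency, u, v, weight=1.0):
    """Add an undirected weighted edge to a dict-of-dicts adjacency map."""
    adjacency.setdefault(u, {})[v] = weight
    adjacency.setdefault(v, {})[u] = weight


def random_walk_with_restart(adjacency, seed, restart=0.5, tol=1e-10, max_iter=1000):
    """Return the stationary visiting probability of each node for a walk from seed."""
    prob = {node: 1.0 if node == seed else 0.0 for node in adjacency}
    for _ in range(max_iter):
        nxt = {node: 0.0 for node in adjacency}
        nxt[seed] = restart  # with probability `restart` the walker jumps back
        for u, neighbours in adjacency.items():
            total = sum(neighbours.values())
            for v, w in neighbours.items():
                # Otherwise it moves to a neighbour in proportion to the edge weight.
                nxt[v] += (1.0 - restart) * prob[u] * w / total
        delta = sum(abs(nxt[n] - prob[n]) for n in adjacency)
        prob = nxt
        if delta < tol:
            break
    return prob


graph = {}
add_edge(graph, "miR-a", "miR-b", 0.9)    # hypothetical miDCN edge (DC weight)
add_edge(graph, "miR-a", "GENE_1", 2.0)   # hypothetical mi-m-DCN edge (TNet weight)
add_edge(graph, "miR-c", "GENE_1", 3.0)   # hypothetical mi-m-DCN edge
add_edge(graph, "GENE_1", "GENE_2", 0.7)  # hypothetical mDCN edge

scores = random_walk_with_restart(graph, "miR-a")
# miRNAs not yet linked to the seed are candidate new miRNA-miRNA relations.
candidates = sorted(
    (n for n in scores
     if n.startswith("miR-") and n != "miR-a" and n not in graph["miR-a"]),
    key=scores.get,
    reverse=True,
)
```

In this toy graph miR-c is reached from miR-a only through the shared target GENE_1, which is the kind of indirect relation that an expansion step could add to the miDCN network.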
S103, performing module mining on the initial and extended miDCN networks to obtain miDCN modules.
(1) Module mining: module mining is performed on the initial and expanded miDCN networks by using a module mining algorithm.
In this embodiment, for the following seven miDCN networks:
raw-TBD-miDCN, raw-lost-miDCN, extended-lost-miDCN-e, extended-lost-miDCN-t, raw-gain-miDCN, extended-gain-miDCN-e, extended-gain-miDCN-t
module mining is performed by using any of various computer module mining algorithms, for example the network module mining method "cluster_label_prop" in the igraph package of the R language.
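The idea behind label propagation can be shown with a simplified sketch in Python. It is a deterministic variant (fixed node order, ties broken by the smallest label) written for illustration only; the igraph implementation is randomized and may return different modules. The function name and the toy network are hypothetical.

```python
from collections import Counter, defaultdict


def label_propagation(adjacency, max_iter=100):
    """Group nodes into modules by repeatedly adopting the most common neighbour label."""
    labels = {node: node for node in adjacency}  # every node starts in its own module
    for _ in range(max_iter):
        changed = False
        for node in sorted(adjacency):
            if not adjacency[node]:
                continue  # isolated nodes keep their own label
            counts = Counter(labels[nb] for nb in adjacency[node])
            best = max(counts.values())
            top = sorted(label for label, c in counts.items() if c == best)
            if labels[node] in top:
                continue  # the current label is already one of the most frequent
            labels[node] = top[0]
            changed = True
        if not changed:
            break
    modules = defaultdict(set)
    for node, label in labels.items():
        modules[label].add(node)
    return list(modules.values())


# Hypothetical network made of two separate triangles of miRNAs.
network = {
    "miR-1": {"miR-2", "miR-3"},
    "miR-2": {"miR-1", "miR-3"},
    "miR-3": {"miR-1", "miR-2"},
    "miR-4": {"miR-5", "miR-6"},
    "miR-5": {"miR-4", "miR-6"},
    "miR-6": {"miR-4", "miR-5"},
}
modules = label_propagation(network)
```

Each triangle ends up as one module, because labels can only spread along edges.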
(2) Module screening: all modules in which the proportion of initial miDCN network nodes exceeds a set percentage, and all modules having only one edge, are discarded, so as to screen out the miDCN modules of the expanded miDCN network.
In this embodiment, the percentage is set to 90%: all modules in which initial miDCN network nodes account for 90% or more, and all modules having only one edge, are discarded. Finally, twelve miDCN modules are screened out from five miDCN networks (raw-TBD-miDCN, raw-lost-miDCN, extended-lost-miDCN-e, raw-gain-miDCN, extended-gain-miDCN-e).
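The screening rule can be sketched as below. The sketch assumes that the percentage refers to the share of a module's nodes that already belong to the initial miDCN network, and it also drops modules without any internal edge; both readings are assumptions, and the function name and example data are hypothetical.

```python
def screen_modules(modules, initial_nodes, edges, max_initial_fraction=0.9):
    """Keep modules that have more than one edge and are not dominated by initial nodes."""
    kept = []
    for module in modules:
        if not module:
            continue
        # Count the edges whose two endpoints both lie inside the module.
        n_edges = sum(1 for u, v in edges if u in module and v in module)
        if n_edges <= 1:
            continue  # single-edge (or edgeless) module
        initial_fraction = len(module & initial_nodes) / len(module)
        if initial_fraction >= max_initial_fraction:
            continue  # module consists almost entirely of initial-network nodes
        kept.append(module)
    return kept


initial_nodes = {"miR-a", "miR-b", "miR-c", "miR-d"}
edges = [
    ("miR-a", "miR-b"), ("miR-b", "miR-c"), ("miR-a", "miR-c"),
    ("miR-d", "miR-x"), ("miR-x", "miR-y"), ("miR-d", "miR-y"),
    ("miR-p", "miR-q"),
]
modules = [
    {"miR-a", "miR-b", "miR-c"},  # 100% initial nodes: discarded
    {"miR-d", "miR-x", "miR-y"},  # one third initial nodes, three edges: kept
    {"miR-p", "miR-q"},           # only one edge: discarded
]
kept = screen_modules(modules, initial_nodes, edges)
```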
(3) Module function annotation: the miRNAs contained in the screened miDCN modules, together with the disease under study, are used as keywords to search an online literature database, and the miRNAs in the miDCN modules are classified according to function.
In this embodiment, the miRNAs contained in the screened miDCN modules and "schizophrenia" are used as keywords to search an online literature database, and the miRNAs in the miDCN modules are classified into three types according to function: associated with schizophrenia, associated with other psychiatric disorders, and no evidence of association.
The following steps S104 to S106 are the steps of verifying the new marker. The miRNAs in the miDCN modules obtained in step S103 are classified, the function of each miDCN module is summarized, and the function of the miRNAs in the module that have no functional annotation is predicted. Specifically, the miRNAs in a miDCN module are classified according to their biological meaning, for example directly related to the occurrence of schizophrenia, related to psychiatric diseases other than schizophrenia, or having no evidence of relation to psychiatric diseases. If 80% of the miRNAs contained in a miDCN module are directly related to the occurrence of schizophrenia, the function of that miDCN module can be summarized as directly related to the occurrence of schizophrenia, and the remaining 20% of the miRNAs in the module can be predicted to be related to the occurrence of schizophrenia as well, possibly indirectly rather than directly.
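The 80% rule above can be sketched as a small helper. The sketch assumes that each miRNA in a module carries one category label or None when no annotation was found; the function name, the category strings and the example module are hypothetical.

```python
from collections import Counter


def summarize_module(annotations, threshold=0.8):
    """Return (module function, predicted functions of the remaining miRNAs)."""
    counts = Counter(c for c in annotations.values() if c is not None)
    if not counts:
        return None, {}
    category, n = counts.most_common(1)[0]
    if n / len(annotations) < threshold:
        return None, {}  # no category is dominant enough to summarize the module
    # The remaining miRNAs are predicted to share the module's function.
    predicted = {m: category for m, c in annotations.items() if c != category}
    return category, predicted


# Hypothetical module: four of five miRNAs are annotated as schizophrenia-related.
module_annotations = {
    "miR-a": "schizophrenia",
    "miR-b": "schizophrenia",
    "miR-c": "schizophrenia",
    "miR-d": "schizophrenia",
    "miR-e": None,
}
function, predicted = summarize_module(module_annotations)
```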
S104, obtaining a gold standard miRNA set.
Specifically, mirnas confirmed in public databases that are associated with the disease under study were selected as the gold-standard miRNA pool.
S105, obtaining the connection relation between the miRNAs of the gold standard miRNA set in the miDCN module.
S106, calculating the similarity of the miRNA in the gold standard miRNA set and the other miRNA in the miDCN module, and taking the miRNA with the highest similarity in the miDCN module as a new marker.
Specifically, the similarity between the degrees of the miRNAs in the gold standard set and those of the remaining miRNAs in the module is calculated by using methods such as the Pearson correlation coefficient or the Spearman rank correlation, and the miRNA with the largest correlation coefficient and a significant p value is selected as the new marker.
In this embodiment, the connection relations within the miDCN modules obtained in step S103 are obtained for the miRNAs of the gold standard miRNA set, and the similarity between the degrees of the miRNAs in the gold standard set and those of the remaining miRNAs in the module is calculated by using the Spearman rank correlation. In one miDCN module, hsa-miR-184 was found to have the highest degree similarity with hsa-miR-346 of the gold standard miRNA set, with r = 1 and p = 0. A literature search showed that miR-184 is related to MDD (major depressive disorder) [PMID: 27468165]. Therefore, miR-184 was selected as a new marker.
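A Spearman rank correlation between degree profiles can be sketched as follows. How the degree of a miRNA is turned into a vector is not specified above; the sketch assumes, purely for illustration, that each miRNA is described by its degree in several networks. All degree values and the name hsa-miR-x are hypothetical, and the p value is not computed.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical degree profiles (degree of each miRNA in four networks).
gold_profile = [3, 5, 2, 8]  # gold standard miRNA, e.g. hsa-miR-346
candidate_profiles = {
    "hsa-miR-184": [4, 9, 1, 12],
    "hsa-miR-x": [7, 2, 6, 1],
}
best = max(candidate_profiles, key=lambda m: spearman(gold_profile, candidate_profiles[m]))
```

With these made-up profiles the candidate whose degrees rise and fall in the same order as the gold standard miRNA obtains the highest coefficient.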
After steps S101 to S106 were performed, the new marker was verified. In this example, a schizophrenia model was established in SD rats using MK-801. After successful modeling was confirmed by behavioral tests such as the water maze, rat brain tissue was collected, and a PCR experiment verified that miR-184 is up-regulated in the brain tissue of the schizophrenia model rats. This confirms its correlation with schizophrenia and illustrates the accuracy of the technical solution of this embodiment.
The PCR results are as follows:
Figure GDA0003038905830000111
The calculation method is as follows:
(1) Step 1: normalize sample differences using the internal reference: $\Delta Ct = Ct_{\text{internal reference}} - Ct_{\text{target}}$
(2) Step 2: compare the other samples with the control samples: $\Delta\Delta Ct = \Delta Ct_{\text{case group}} - \Delta Ct_{\text{control group}}$
(3) Step 3: calculate using the formula: fold change $= 2^{-\Delta\Delta Ct}$
According to the above steps, the results of three replicates were 1.972465, 2.789487 and 2.928171, demonstrating that miR-184 is up-regulated in the brain tissue of the schizophrenia model rats.
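The three calculation steps can be sketched in a few lines. The sketch uses the common convention ΔCt = Ct(target) − Ct(internal reference), under which a fold change above 1 indicates up-regulation; note that step 1 above writes the subtraction in the opposite order, so the sign convention here is an assumption. The Ct values are hypothetical and are not the measured data.

```python
def fold_change(ct_target_case, ct_ref_case, ct_target_ctrl, ct_ref_ctrl):
    """Relative expression of the target in the case sample versus the control sample."""
    # Step 1: normalize each sample to its internal reference
    # (assumed convention: target minus reference).
    delta_case = ct_target_case - ct_ref_case
    delta_ctrl = ct_target_ctrl - ct_ref_ctrl
    # Step 2: compare the case sample with the control sample.
    delta_delta = delta_case - delta_ctrl
    # Step 3: convert the cycle difference into a fold change.
    return 2 ** (-delta_delta)


# Hypothetical Ct values: the target reaches threshold one cycle earlier in the case sample.
fc = fold_change(ct_target_case=24.0, ct_ref_case=18.0,
                 ct_target_ctrl=25.0, ct_ref_ctrl=18.0)
```

In this made-up example the target amplifies one cycle earlier relative to the reference, which corresponds to a two-fold higher expression.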
Those skilled in the art will appreciate that all or part of the steps in the method for implementing the above embodiments may be implemented by a program instructing associated hardware, and the corresponding program may be stored in a computer-readable storage medium.
It should be noted that although the method operations of the above-described embodiments are depicted in the drawings in a particular order, this does not require or imply that these operations must be performed in this particular order, or that all of the illustrated operations must be performed, to achieve desirable results. Rather, the depicted steps may change the order of execution. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step execution, and/or one step broken down into multiple step executions.
Example 2:
As shown in fig. 5, this embodiment provides a module marker mining device, where the module marker is a miRNA module marker of a complex disease. The device includes a construction unit 501, an expansion unit 502, a mining unit 503, a first obtaining unit 504, a second obtaining unit 505 and a calculation unit 506. The specific functions of each unit are as follows:
The construction unit 501 is configured to construct an initial miDCN network based on miRNA expression profile data.
The expansion unit 502 is configured to construct an mDCN network and a mi-m-DCN network based on the miRNA expression profile data and the mRNA expression profile data, and to expand the initial miDCN network to form an expanded miDCN network.
The mining unit 503 is configured to perform module mining on the initial and extended miDCN networks to obtain miDCN modules.
The first obtaining unit 504 is configured to obtain a gold standard miRNA set, wherein the gold standard miRNA set comprises miRNAs related to the disease under study in a public database.
The second obtaining unit 505 is configured to obtain the connection relations between the miRNAs of the gold standard miRNA set in the miDCN module.
The calculation unit 506 is configured to calculate the similarity between the miRNAs in the gold standard miRNA set and the remaining miRNAs in the miDCN module, and to take the miRNA with the highest similarity in the miDCN module as a new marker.
The specific implementation of each unit in this embodiment may refer to embodiment 1, which is not described herein any more; it should be noted that the apparatus provided in this embodiment is only illustrated by the division of the above functional units, and in practical applications, the functions may be completed by allocating the above functions to different functional units as needed, that is, dividing the internal structure into different functional units to complete all or part of the functions described above.
Example 3:
As shown in fig. 6, this embodiment provides a computer device, which may be a computer, including a processor 602, a memory, an input device 603, a display 604 and a network interface 605 connected via a system bus 601. The processor 602 provides computing and control capability. The memory includes a nonvolatile storage medium 606 and an internal memory 607; the nonvolatile storage medium 606 stores an operating system, a computer program and a database, and the internal memory 607 provides an environment for running the operating system and the computer program in the nonvolatile storage medium 606. When the computer program is executed by the processor 602, the module marker mining method of embodiment 1 is implemented as follows:
constructing an initial miDCN network based on miRNA expression profile data;
constructing an mDCN network and a mi-m-DCN network based on miRNA expression profile data and mRNA expression profile data, and expanding the initial miDCN network to form an expanded miDCN network;
performing module mining on the initial miDCN network and the expanded miDCN network to obtain miDCN modules;
acquiring a gold standard miRNA set; wherein the set of gold standard miRNAs comprises miRNAs related to a disease under study in a public database;
acquiring the connection relation between miRNAs of a gold standard miRNA set in a miDCN module;
and calculating the similarity of the miRNA in the gold standard miRNA set and the rest miRNAs in the miDCN module, and taking the miRNA with the highest similarity in the miDCN module as a new marker.
Example 4:
This embodiment provides a storage medium, which is a computer-readable storage medium storing a computer program. When the computer program is executed by a processor, the module marker mining method of embodiment 1 is implemented as follows:
constructing an initial miDCN network based on miRNA expression profile data;
constructing an mDCN network and a mi-m-DCN network based on miRNA expression profile data and mRNA expression profile data, and expanding the initial miDCN network to form an expanded miDCN network;
performing module mining on the initial miDCN network and the expanded miDCN network to obtain miDCN modules;
acquiring a gold standard miRNA set; wherein the set of gold standard miRNAs comprises miRNAs related to a disease under study in a public database;
acquiring the connection relation between miRNAs of a gold standard miRNA set in a miDCN module;
and calculating the similarity of the miRNA in the gold standard miRNA set and the rest miRNAs in the miDCN module, and taking the miRNA with the highest similarity in the miDCN module as a new marker.
It should be noted that the computer readable medium of the present embodiment may be a computer readable signal medium, a computer readable storage medium, or any combination of the two. A computer readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this embodiment, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable signal medium, by contrast, may include a propagated data signal with a computer readable program embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, an electromagnetic signal, an optical signal, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
The computer program embodied on the computer readable storage medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In summary, the present invention constructs the initial miDCN network from miRNA expression profile data, constructs the mDCN network and the mi-m-DCN network from miRNA expression profile data and mRNA expression profile data, expands the initial miDCN network to form an expanded miDCN network, and performs module mining on the initial and expanded miDCN networks to obtain miDCN modules. Because data from two biological levels are combined and analyzed with several classical methods, the results are more accurate; because the intrinsic biological relations among the data are fully considered, the results are of greater practical significance. The method can be used to mine miRNA module markers related to polygenic diseases, such as schizophrenia and other complex diseases.
The above description covers only preferred embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any equivalent substitution or change made by a person skilled in the art according to the technical solution and inventive concept of the present invention, within the scope disclosed by the present invention, falls within the protection scope of the present invention.

Claims (9)

1. A method of modular marker mining, said modular markers being miRNA modular markers for complex diseases, said method comprising:
constructing an initial miDCN network based on miRNA expression profile data;
constructing an mDCN network and a mi-m-DCN network based on miRNA expression profile data and mRNA expression profile data, and expanding the initial miDCN network to form an expanded miDCN network;
performing module mining on the initial miDCN network and the expanded miDCN network to obtain miDCN modules;
acquiring a gold standard miRNA set; wherein the set of gold standard miRNAs comprises miRNAs related to a disease under study in a public database;
acquiring the connection relation between miRNAs of the gold standard miRNA set in the miDCN module;
calculating the similarity between the degrees of the miRNAs in the gold standard miRNA set and those of the other miRNAs in the miDCN module, and taking the miRNA with the highest similarity in the miDCN module as a new marker;
wherein performing module mining on the initial miDCN network and the expanded miDCN network to obtain the miDCN modules specifically comprises:
performing module mining on the initial miDCN network and the expanded miDCN network by using a module mining algorithm;
discarding all modules in which the proportion of initial miDCN network nodes is larger than a set percentage and all modules having only one edge, so as to screen out the miDCN modules of the expanded miDCN network;
and using the miRNAs contained in the screened miDCN modules and the disease under study as keywords to search an online literature database, and classifying the miRNAs in the miDCN modules according to function.
2. The module marker mining method according to claim 1, wherein constructing an initial miDCN network based on miRNA expression profile data specifically comprises:
generating a matrix with x rows and z columns according to miRNA expression profile data; wherein each row represents a miRNA, each column represents a sample, E represents the miRNA expression profile data, S represents disease group expression profile data consisting of m samples of the disease under study, and C represents control group expression profile data consisting of z-m+1 samples of a control group;
denoting any pair of miRNAs by i-j, and calculating the correlation coefficient r of i-j and the p value representing statistical significance in the disease group expression profile data S and in the control group expression profile data C, wherein the correlation coefficient r and the p value calculated in the disease group expression profile data S are denoted $r_{s,i-j}$ and $p_{s,i-j}$ respectively, and the correlation coefficient r and the p value calculated in the control group expression profile data C are denoted $r_{c,i-j}$ and $p_{c,i-j}$ respectively;
calculating the $DC_{i-j}$ score when at least one of the p value of i-j in the disease group expression profile data S and the p value of i-j in the control group expression profile data C is less than or equal to a set threshold;
constructing all miRNA pairs having DC values into the initial miDCN network; wherein the network nodes are miRNAs and the edge weights are DC values.
3. The module marker mining method according to claim 2, wherein calculating the $DC_{i-j}$ score when at least one of the p value of i-j in the disease group expression profile data S and the p value of i-j in the control group expression profile data C is less than or equal to a set threshold specifically comprises:
if the p value of i-j in the disease group expression profile data S and the p value of i-j in the control group expression profile data C are both less than or equal to the set threshold, calculating the $DC_{i-j}$ score according to the following formula:
$DC_{i-j} = |r_{s,i-j} \cdot (1 - p_{s,i-j}) - r_{c,i-j} \cdot (1 - p_{c,i-j})|$
if the p value of i-j in the disease group expression profile data S is less than or equal to the set threshold and the p value of i-j in the control group expression profile data C is greater than the set threshold, calculating the $DC_{i-j}$ score according to the following formula:
$DC_{i-j} = |r_{s,i-j} \cdot (1 - p_{s,i-j}) - r_{c,i-j} \cdot p_{c,i-j}|$
if the p value of i-j in the control group expression profile data C is less than or equal to the set threshold and the p value of i-j in the disease group expression profile data S is greater than the set threshold, calculating the $DC_{i-j}$ score according to the following formula:
$DC_{i-j} = |r_{s,i-j} \cdot p_{s,i-j} - r_{c,i-j} \cdot (1 - p_{c,i-j})|$.
4. The module marker mining method according to claim 2, wherein constructing all miRNA pairs having DC values into an initial miDCN network comprises:
constructing a disease TBD-miDCN network; wherein the disease TBD-miDCN network comprises all miRNA relationship pairs for which $p_{s,i-j}$ and $p_{c,i-j}$ are both statistically significant;
constructing a disease generation-miDCN network; wherein the disease generation-miDCN network comprises all miRNA relationship pairs for which $p_{s,i-j}$ is statistically significant;
constructing a disease deletion-miDCN network; wherein the disease deletion-miDCN network comprises all miRNA relationship pairs for which $p_{c,i-j}$ is statistically significant.
5. The module marker mining method according to claim 1, wherein constructing the mDCN network and the mi-m-DCN network based on the miRNA expression profile data and the mRNA expression profile data, and expanding the initial miDCN network to form an expanded miDCN network, specifically comprises:
generating a matrix with x rows and z columns according to mRNA expression profile data; wherein each row represents one mRNA, each column represents one sample, E1 represents the mRNA expression profile data, S1 represents disease group expression profile data consisting of m samples of the disease under study, and C1 represents control group expression profile data consisting of z-m+1 samples of the control group;
denoting any pair of mRNAs by i1-j1, and calculating the correlation coefficient r of i1-j1 and the p value representing statistical significance in the disease group expression profile data S1 and in the control group expression profile data C1, wherein the correlation coefficient r and the p value calculated in the disease group expression profile data S1 are denoted $r_{s1,i1-j1}$ and $p_{s1,i1-j1}$ respectively, and the correlation coefficient r and the p value calculated in the control group expression profile data C1 are denoted $r_{c1,i1-j1}$ and $p_{c1,i1-j1}$ respectively;
calculating the $DC_{i1-j1}$ score when at least one of the p value of i1-j1 in the disease group expression profile data S1 and the p value of i1-j1 in the control group expression profile data C1 is less than or equal to a set threshold;
constructing all mRNA pairs having DC values into an mDCN network; wherein the network nodes are mRNAs and the edge weights are DC values;
generating a matrix with x rows and z columns according to the miRNA expression profile data and the mRNA expression profile data; wherein each row represents a miRNA-mRNA pair, each column represents a sample, E2 represents the miRNA-mRNA expression profile data, S2 represents disease group expression profile data consisting of m samples of the disease under study, and C2 represents control group expression profile data consisting of z-m+1 samples of a control group;
denoting any miRNA-mRNA pair by i2-j2, and calculating the correlation coefficient r of i2-j2 and the p value representing statistical significance in the disease group expression profile data S2 and in the control group expression profile data C2, wherein the correlation coefficient r and the p value calculated in the disease group expression profile data S2 are denoted $r_{s2,i2-j2}$ and $p_{s2,i2-j2}$ respectively, and the correlation coefficient r and the p value calculated in the control group expression profile data C2 are denoted $r_{c2,i2-j2}$ and $p_{c2,i2-j2}$ respectively;
calculating the $DC_{i2-j2}$ score if at least one of the p value of i2-j2 in the disease group expression profile data S2 and the p value of i2-j2 in the control group expression profile data C2 is less than or equal to a set threshold and the correlation coefficient r is less than 0;
constructing all miRNA-mRNA pairs having DC values into a mi-m-DCN-e network, selecting the results of a plurality of miRNA target gene prediction algorithms, calculating a TNet score for each miRNA-mRNA pair, namely the number of target gene prediction algorithms that predict the miRNA-mRNA pair to be a targeting relationship, and constructing a mi-m-DCN-t network; wherein the network nodes are miRNA-mRNA pairs and the edge weights are TNet scores;
using the initial miDCN network, the mDCN network and the mi-m-DCN-e network or mi-m-DCN-t network as input, predicting new miRNA-miRNA relations by a computer algorithm based on the similarity relations between nodes, and adding the new relations to the miDCN network to form an expanded miDCN network.
6. The module marker mining method according to claim 5, wherein constructing all mRNA pairs having DC values into an mDCN network comprises:
constructing a disease TBD-mDCN network; wherein the disease TBD-mDCN network comprises all mRNA relationship pairs for which $p_{s1,i1-j1}$ and $p_{c1,i1-j1}$ are both statistically significant;
constructing a disease generation-mDCN network; wherein the disease generation-mDCN network comprises all mRNA relationship pairs for which $p_{s1,i1-j1}$ is statistically significant;
constructing a disease deletion-mDCN network; wherein the disease deletion-mDCN network comprises all mRNA relationship pairs for which $p_{c1,i1-j1}$ is statistically significant;
and wherein constructing all miRNA-mRNA pairs having DC values into a mi-m-DCN-e network specifically comprises:
constructing a disease TBD-mi-m-DCN-e network; wherein the disease TBD-mi-m-DCN-e network comprises all miRNA-mRNA pairs for which $p_{s2,i2-j2}$ and $p_{c2,i2-j2}$ are both statistically significant;
constructing a disease generation-mi-m-DCN-e network; wherein the disease generation-mi-m-DCN-e network comprises all miRNA-mRNA pairs for which $p_{s2,i2-j2}$ is statistically significant;
constructing a disease deletion-mi-m-DCN-e network; wherein the disease deletion-mi-m-DCN-e network comprises all miRNA-mRNA pairs for which $p_{c2,i2-j2}$ is statistically significant.
7. A modular marker mining device, said modular marker being a miRNA modular marker of a complex disease, said device comprising:
a construction unit, configured to construct an initial miDCN network based on miRNA expression profile data;
an expansion unit, configured to construct an mDCN network and a mi-m-DCN network based on the miRNA expression profile data and the mRNA expression profile data, and to expand the initial miDCN network to form an expanded miDCN network;
a mining unit, configured to perform module mining on the initial and expanded miDCN networks to obtain miDCN modules;
a first acquisition unit, configured to acquire a gold standard miRNA set; wherein the set of gold standard miRNAs comprises miRNAs related to a disease under study in a public database;
a second acquisition unit, configured to acquire the connection relation between miRNAs of the gold standard miRNA set in the miDCN module;
a calculation unit, configured to calculate the similarity between the degrees of the miRNAs in the gold standard miRNA set and those of the other miRNAs in the miDCN module, and to take the miRNA with the highest similarity in the miDCN module as a new marker;
wherein performing module mining on the initial miDCN network and the expanded miDCN network to obtain the miDCN modules specifically comprises:
performing module mining on the initial miDCN network and the expanded miDCN network by using a module mining algorithm;
discarding all modules in which the proportion of initial miDCN network nodes is larger than a set percentage and all modules having only one edge, so as to screen out the miDCN modules of the expanded miDCN network;
and using the miRNAs contained in the screened miDCN modules and the disease under study as keywords to search an online literature database, and classifying the miRNAs in the miDCN modules according to function.
8. A computer device comprising a processor and a memory for storing a program executable by the processor, wherein the processor, when executing the program stored in the memory, implements the module marker mining method of any one of claims 1 to 6.
9. A storage medium storing a program, wherein the program, when executed by a processor, implements the module marker mining method according to any one of claims 1 to 6.
CN202010944760.3A 2020-09-10 2020-09-10 Module marker mining method and device, computer equipment and storage medium Active CN112071369B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010944760.3A CN112071369B (en) 2020-09-10 2020-09-10 Module marker mining method and device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112071369A CN112071369A (en) 2020-12-11
CN112071369B true CN112071369B (en) 2021-08-03

Family

ID=73663329

Country Status (1)

Country Link
CN (1) CN112071369B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109243538A (en) * 2018-07-19 2019-01-18 长沙学院 A kind of method and system of predictive disease and LncRNA incidence relation
CN110444248A (en) * 2019-07-22 2019-11-12 山东大学 Cancer Biology molecular marker screening technique and system based on network topology parameters
CN111009285A (en) * 2019-05-28 2020-04-14 江南大学 Biological data network processing method based on similarity network fusion algorithm
CN111028887A (en) * 2019-12-04 2020-04-17 电子科技大学 Method and device for identifying ncRNA (non-coding ribonucleic acid) cooperative competition network

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103937888B (en) * 2014-04-14 2016-08-17 上海交通大学 Differentiate screening and the application of the blood plasma microRNA mark of gastric cancer
CN108121896B (en) * 2017-12-19 2020-07-24 深圳先进技术研究院 Disease relation analysis method and device based on miRNA

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
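"Identifying cancer miRNA/lncRNA markers based on multivariate biomolecular networks: prostate cancer metastasis as an example"; 林宇鑫 (Lin Yuxin); China Doctoral Dissertations Full-text Database; 20200415; abstract pp. 1-2, main text pp. 7-40 *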

Also Published As

Publication number Publication date
CN112071369A (en) 2020-12-11

Similar Documents

Publication Publication Date Title
CN103778349B (en) Biomolecular network analysis method based on function module
Alföldi et al. Comparative genomics as a tool to understand evolution and disease
US10347359B2 (en) Method and system for network modeling to enlarge the search space of candidate genes for diseases
CN115132270A (en) Drug screening method and system
CN112071369B (en) Module marker mining method and device, computer equipment and storage medium
Liao et al. Identifying human microRNA–disease associations by a new diffusion-based method
Lajoie et al. Computational discovery of regulatory elements in a continuous expression space
WO2005096208A1 (en) Base sequence retrieval apparatus
US20230410936A1 (en) Network approach to navigating the human genome
Celebi et al. Prediction of Drug-Drug interactions using pharmacological similarities of drugs
Kasukawa et al. Construction of representative transcript and protein sets of human, mouse, and rat as a platform for their transcriptome and proteome analysis
Stoica et al. Predicting gene functions from text using a cross-species approach
CN114360642A (en) Cancer transcriptome data processing method based on gene co-expression network analysis
Li et al. Neural precision medicine by mining implicit treatment concepts
KR101701168B1 (en) Genomic profile method for in-silico interaction-resolution pathway activity quantification
Churbanov et al. A method of precise mRNA/DNA homology-based gene structure prediction
CN110706748A (en) Competitive endogenous RNA network regulation and analysis system and method
KR102483880B1 (en) disease profiling information providing system based on multiple database information and method therefor
CN109256215A (en) Disease-associated miRNA prediction method and system based on self-avoiding random walk
Xu et al. Computational gene prediction using neural networks and similarity search
Leung et al. Filtering of false positive microRNA candidates by a clustering-based approach
KR20180090680A (en) Genome analysis system
CN104182654B (en) Protein-protein interaction network based gene set identification method
Hernández-Lobato et al. Regulator discovery from gene expression time series of malaria parasites: a hierarchical approach
Xie et al. Conditional random field for candidate gene prioritization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant